Add support for sending "assume" hints to compilers #1808

Dead2 · 2024-11-03T20:38:07Z

Add support for giving compilers (and static analyzers) hints about value ranges of variables, allowing them to better optimize the code.
It is very important that Assume hints are always true, therefore make debug builds check the values and throw errors if the assumption is not true.
Make Assert() do Assume() in release builds.
Add hints about value ranges coming into inffast
Remove always-true Assert in zng_tr_tally_lit()

Just letting existing Asserts() calls go to Assume() reduces compiled code size by 104 bytes on x86-64.
Performance seems to be unchanged with these hints (within 0.01%).

The intentions behind adding this:

Improve static analyzer hinting, letting them avoid false-positives.
Possibly letting the compilers better optimize code and for example not try to optimize for big lengths where we already know that we will only have short lengths (for example 258bytes is the max length of matches, so there is no need to optimize for the 64kB lengths that the datatype would allow).
Make adding asserts/assume to new code more attractive.
This would then also make code more self-documenting and make it easier for anyone to understand the limitations of a function.

Summary by CodeRabbit

New Features
- Introduced new macros for enhanced error-checking and debugging: z_assume and Assume.
Bug Fixes
- Improved error handling in compression and decompression functions with additional assertions to validate parameters.
Documentation
- Updated comments to clarify the purpose of new macros and assertions.

coderabbitai · 2024-11-03T20:38:13Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The changes made in this pull request enhance the error handling and validation mechanisms in the deflate compression algorithm. Modifications include the introduction of assertions in the zng_tr_tally_lit and zng_tr_tally_dist functions, ensuring parameters are within valid ranges. Additionally, new macros for assumptions and assertions are defined in zbuild.h, improving the robustness of the code. The INFLATE_FAST function in inffast_tpl.h has also been updated to include precondition checks, reinforcing the integrity of the decoding process without altering its core logic.

Changes

File	Change Summary
deflate_p.h	- Added `check_match` function for validating match parameters. - Updated `zng_tr_tally_lit` and `zng_tr_tally_dist` with assertions for parameter validation. - Defined macros `FLUSH_BLOCK_ONLY` and `FLUSH_BLOCK` for block flushing.
inffast_tpl.h	- Enhanced `INFLATE_FAST` function with `Assume` statements for precondition checks on state and buffers. Updated comments for clarity.
zbuild.h	- Introduced new macros `z_assume` and `Assume` for compiler hints and runtime checks. Updated `Assert` macro to utilize `Assume`.

Possibly related PRs

Misc inflate cleanup #1797: Modifications to the inflate_state structure, focusing on distance validation and state management, related to error handling improvements in deflate_p.h.
Compute the "safe" distance properly #1801: Changes to the inflate_fast function regarding memory operations for distance calculations, aligning with enhancements in zng_tr_tally_dist.
Simplify inflate fast by dispatching to chunkmemset for all chunkcopy cases #1802: Updates to memory handling functions in chunkset_tpl.h enhance safety and efficiency, paralleling the main PR's focus on robust error handling.
Reorder variables in inflate functions to reduce padding holes #1803: Reordering of variables in inflate functions to reduce padding holes, indirectly relating to improvements in structure and efficiency.

Suggested labels

cleanup

Suggested reviewers

nmoinvaz
KungFuJesus

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)

zbuild.h (2)
214-229: LGTM! Consider enhancing documentation.

The implementation correctly handles different compiler capabilities for optimization hints. The macro is well-structured with appropriate fallbacks.

Consider adding a code example in the documentation to demonstrate proper usage, as this is a new feature that developers will need to understand:
/* The Assume() macro provides hints to compiler and static analyzers about what
   value range is valid for a variable, helping them optimize for real use.
   To ensure this does not hide bugs, compilation in debug mode will cause it to
   check the assumption and throw an error if it fails.
+
+ Example:
+   void process_buffer(char* buf, size_t len) {
+     Assume(len <= 1024);  // Hint to compiler about maximum size
+     ...
+   }
 */
Line range hint 214-257: Consider documenting optimization impact guidelines.

The introduction of compiler hints through Assume and modified Assert is a powerful feature for optimization. To ensure proper usage:

Consider adding documentation about when to use Assume vs Assert

Document guidelines for measuring performance impact

Add examples of common optimization patterns

Would you like me to help draft this documentation?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 94aacd8 and 09b74f3.

📒 Files selected for processing (3)

deflate_p.h (0 hunks)
inffast_tpl.h (1 hunks)
zbuild.h (2 hunks)

💤 Files with no reviewable changes (1)

deflate_p.h

🔇 Additional comments (2)

zbuild.h (1)

257-257: Verify impact of Assert behavior change in release builds.

The modification to use Assume instead of removing Assert calls in release builds is a significant change that could affect existing code.

Let's analyze the impact:

✅ Verification successful

Based on the results, I can now generate the final response:

Assert macro change appears safe for release builds

The codebase analysis shows that Assert is primarily used for:

Input validation and invariant checks (e.g., buffer sizes, value ranges)
Boundary checks in memory operations
Parameter validation in low-level functions

All observed Assert usages:

Are not in tight loops that would impact performance
Validate critical conditions that should hold true for correct operation
Check invariants that compilers can use for optimization

The change to use Assume in release builds is actually beneficial as it:

Provides optimization hints to the compiler about valid conditions
Maintains safety checks in debug builds
Doesn't introduce overhead in performance-critical paths

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all Assert usage in the codebase to evaluate impact

# Search for Assert usage
echo "Analyzing Assert usage patterns..."
rg "Assert\(" --type cpp -B 2 -A 2

# Look for potential performance-critical sections
echo "Checking Assert usage in tight loops..."
ast-grep --pattern 'while($_) { $$$ Assert($_, $_) $$$ }'
ast-grep --pattern 'for($_; $_; $_) { $$$ Assert($_, $_) $$$ }'

Length of output: 4812

inffast_tpl.h (1)

119-125: Well-placed assumptions that match documented invariants!

The added Assume statements effectively translate the documented entry assumptions into compiler hints, which can help with optimizations while maintaining runtime checks in debug builds.

Let's verify that callers maintain these invariants:

✅ Verification successful

Assumptions are well-defined and correctly enforced by callers

The verification shows that:

Constants INFLATE_FAST_MIN_HAVE (15) and INFLATE_FAST_MIN_LEFT (260) are properly defined in inflate_p.h
Callers in inflate.c and infback.c explicitly check these minimum requirements before invoking INFLATE_FAST
The assumptions align with the documented requirements and runtime checks

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if callers maintain the required invariants before calling INFLATE_FAST

# Search for all calls to INFLATE_FAST
echo "Searching for INFLATE_FAST callers..."
rg -l "INFLATE_FAST" | while read -r file; do
  echo "=== Analyzing $file ==="
  # Look for setup code before INFLATE_FAST calls
  rg -B 10 "INFLATE_FAST\s*\(" "$file"
done

# Verify the constants are properly defined
echo "Checking for INFLATE_FAST_MIN_HAVE and INFLATE_FAST_MIN_LEFT definitions..."
rg "INFLATE_FAST_MIN_(HAVE|LEFT)\b"

Length of output: 3826

zbuild.h

codecov · 2024-11-03T20:46:43Z

Codecov Report

Attention: Patch coverage is 96.42857% with 2 lines in your changes missing coverage. Please review.

Project coverage is 80.36%. Comparing base (860e4cf) to head (932c615).

Files with missing lines	Patch %	Lines
trees.c	91.30%	2 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #1808      +/-   ##
===========================================
- Coverage    80.44%   80.36%   -0.09%     
===========================================
  Files          139      139              
  Lines        11176    11170       -6     
  Branches      2867     2898      +31     
===========================================
- Hits          8991     8977      -14     
- Misses        1221     1225       +4     
- Partials       964      968       +4

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

zbuild.h

Dead2 · 2024-11-03T21:20:15Z

MSVC had support for this before the stated version, but I found some references to bugs, so ended up setting VS 2012 as lower end.
Clang had support from 4.5 or thereabouts, but I set version 5 as lowest for the same reason.

It is possible to emulate this using unreachable attributes in older GCC versions, however the semantics for that differ a little, therefore I chose to only use the proper assume hint at this time.

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)

zbuild.h (1)

225-233: Consider adding support for additional compilers.

The implementation is solid and safely handles Clang, GCC, and MSVC. Consider extending support to:

Intel compiler using __builtin_assume

ARM compiler using __builtin_assume

This would broaden the optimization benefits across more platforms.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between ad20a13 and bfa8e5e.

📒 Files selected for processing (2)

inffast_tpl.h (1 hunks)
zbuild.h (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

inffast_tpl.h

🔇 Additional comments (3)

zbuild.h (3)

214-224: LGTM! Well-documented feature with clear example.

The documentation clearly explains the purpose, behavior, and usage of the Assume macro, with a practical example that demonstrates its application for providing size hints to the compiler.

234-238: LGTM! Clean and safe implementation.

The Assume macro implementation properly handles both debug and release modes, with appropriate error reporting in debug builds and optimization hints in release builds.

261-261: Verify impact on existing Assert usage.

The modification to use Assume in release builds is correct and aligns with the PR objectives. However, let's verify the impact on existing Assert calls in the codebase.

✅ Verification successful

Based on the analysis of Assert usage patterns in the codebase, I can generate the final response:

Assert usage verification completed - safe to proceed

All Assert calls in the codebase are used for invariant checking and validation with simple conditions. No instances found where:

Assertions contain function calls that could have side effects
Assertions are used for control flow
Assertions contain complex expressions that need to be evaluated

The conditions are primarily:

Boundary checks (e.g., len <= UINT8_MAX)
Null pointer checks (e.g., buf != NULL)
Size validations (e.g., s->w_size <= UINT16_MAX)
State invariants (e.g., state->pending == 0)

These patterns are compatible with the change to use Assume in release builds.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Analyze Assert usage patterns to ensure compatibility with new behavior

# Find all Assert calls in the codebase
echo "Analyzing Assert usage patterns..."
rg -p 'Assert\((.*?)\)' --type c --type h

# Look for potential issues like Assert calls with side effects
echo -e "\nChecking for Assert calls that might have side effects..."
rg -p 'Assert\((.*?[+\-*/%=].*?)\)' --type c --type h

Length of output: 8572

Dead2 · 2024-11-03T21:57:42Z

I'll have to redo this PR, as I had initially suspected, not all the asserts are in fact usable as assume hints (some variables do not exist in release builds, so those cause failures).

I need to consider whether to add all Assume hints separately, or whether to introduce another Assert() that does Assume in release builds, while the normal one does not (name suggestions?). Thoughts?

We could possibly do it like this if it looks clean, not sure.

Assert(buf != NULL, "lost buf"); Assume(buf != NULL);

mtl1979 · 2024-12-10T14:52:28Z

I'll have to redo this PR, as I had initially suspected, not all the asserts are in fact usable as assume hints (some variables do not exist in release builds, so those cause failures).

I need to consider whether to add all Assume hints separately, or whether to introduce another Assert() that does Assume in release builds, while the normal one does not (name suggestions?). Thoughts?

Something like Verify() for release builds would work... It makes clear that we don't assume the condition holds, we want to make sure it holds.

nmoinvaz · 2024-12-10T17:10:09Z

zbuild.h

+    }
+ */
+#if defined(__clang__) && __clang_major__ >= 4
+  #define z_assume(...) do { __builtin_assume(__VA_ARGS__); } while(0)


#define -> # define

Dead2 · 2024-12-10T21:27:35Z

I need to consider whether to add all Assume hints separately, or whether to introduce another Assert() that does Assume in release builds, while the normal one does not (name suggestions?). Thoughts?

Something like Verify() for release builds would work... It makes clear that we don't assume the condition holds, we want to make sure it holds.

Not sure what you mean Verify would do. We don't want to add actual checks to the release builds.
That is why the hint is named assume, meaning it is completely safe for the compiler to assume that the hint is always true. Using an assert would give compilers and static checkers the same hint but also perform the check and thus cost performance and add instructions to the compiled code. Assume gives the hint but does not add any verification code to the build, if you assumed wrong it would more than likely break due to incorrect optimizations.

mtl1979 · 2024-12-12T13:13:15Z

I need to consider whether to add all Assume hints separately, or whether to introduce another Assert() that does Assume in release builds, while the normal one does not (name suggestions?). Thoughts?

Something like Verify() for release builds would work... It makes clear that we don't assume the condition holds, we want to make sure it holds.

Not sure what you mean Verify would do. We don't want to add actual checks to the release builds.

Just suggesting names... Maybe use Verify() for Debug builds and Assume() for Release builds.

value ranges of variables, allowing them to better optimize the code. Adds Assume() and AssertHint() macros.

useful hints to compiler and static analyzers.

Dead2 added the enhancement label Nov 3, 2024

coderabbitai bot reviewed Nov 3, 2024

View reviewed changes

zbuild.h Outdated Show resolved Hide resolved

Dead2 force-pushed the hint-input-ranges branch from 09b74f3 to ad20a13 Compare November 3, 2024 20:49

phprus reviewed Nov 3, 2024

View reviewed changes

zbuild.h Outdated Show resolved Hide resolved

Dead2 force-pushed the hint-input-ranges branch from ad20a13 to bfa8e5e Compare November 3, 2024 21:14

coderabbitai bot reviewed Nov 3, 2024

View reviewed changes

Dead2 marked this pull request as draft November 9, 2024 00:16

nmoinvaz reviewed Dec 10, 2024

View reviewed changes

Dead2 added 5 commits February 11, 2025 23:20

Remove Assert that is always true due to limited range of data-type

57a38d5

Add support for giving compilers (and static analyzers) hints about

293a771

value ranges of variables, allowing them to better optimize the code. Adds Assume() and AssertHint() macros.

Convert various Assert() calls to AssertHint() where these can provide

2435cd4

useful hints to compiler and static analyzers.

Add hints about value ranges coming into inffast.

08ec577

Add various Assume hints

932c615

Dead2 force-pushed the hint-input-ranges branch from bfa8e5e to 932c615 Compare February 11, 2025 22:22

Uh oh!

Add support for sending "assume" hints to compilers #1808

Are you sure you want to change the base?

Add support for sending "assume" hints to compilers #1808

Conversation

Dead2 commented Nov 3, 2024 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Nov 3, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Possibly related PRs

Suggested labels

Suggested reviewers

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codecov bot commented Nov 3, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Dead2 commented Nov 3, 2024

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Dead2 commented Nov 3, 2024

Uh oh!

mtl1979 commented Dec 10, 2024

Uh oh!

nmoinvaz Dec 10, 2024

Choose a reason for hiding this comment

Uh oh!

Dead2 commented Dec 10, 2024

Uh oh!

mtl1979 commented Dec 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Dead2 commented Nov 3, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 3, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

codecov bot commented Nov 3, 2024 •

edited

Loading

mtl1979 commented Dec 12, 2024 •

edited

Loading