这是indexloc提供的服务,不要输入任何密码
Skip to content

Conversation

@Dead2
Copy link
Member

@Dead2 Dead2 commented Nov 3, 2024

  • Add support for giving compilers (and static analyzers) hints about value ranges of variables, allowing them to better optimize the code.
  • It is very important that Assume hints are always true, therefore make debug builds check the values and throw errors if the assumption is not true.
  • Make Assert() do Assume() in release builds.
  • Add hints about value ranges coming into inffast
  • Remove always-true Assert in zng_tr_tally_lit()

Just letting existing Asserts() calls go to Assume() reduces compiled code size by 104 bytes on x86-64.
Performance seems to be unchanged with these hints (within 0.01%).

The intentions behind adding this:

  • Improve static analyzer hinting, letting them avoid false-positives.
  • Possibly letting the compilers better optimize code and for example not try to optimize for big lengths where we already know that we will only have short lengths (for example 258bytes is the max length of matches, so there is no need to optimize for the 64kB lengths that the datatype would allow).
  • Make adding asserts/assume to new code more attractive.
  • This would then also make code more self-documenting and make it easier for anyone to understand the limitations of a function.

Summary by CodeRabbit

  • New Features

    • Introduced new macros for enhanced error-checking and debugging: z_assume and Assume.
  • Bug Fixes

    • Improved error handling in compression and decompression functions with additional assertions to validate parameters.
  • Documentation

    • Updated comments to clarify the purpose of new macros and assertions.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 3, 2024

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The changes made in this pull request enhance the error handling and validation mechanisms in the deflate compression algorithm. Modifications include the introduction of assertions in the zng_tr_tally_lit and zng_tr_tally_dist functions, ensuring parameters are within valid ranges. Additionally, new macros for assumptions and assertions are defined in zbuild.h, improving the robustness of the code. The INFLATE_FAST function in inffast_tpl.h has also been updated to include precondition checks, reinforcing the integrity of the decoding process without altering its core logic.

Changes

File Change Summary
deflate_p.h - Added check_match function for validating match parameters.
- Updated zng_tr_tally_lit and zng_tr_tally_dist with assertions for parameter validation.
- Defined macros FLUSH_BLOCK_ONLY and FLUSH_BLOCK for block flushing.
inffast_tpl.h - Enhanced INFLATE_FAST function with Assume statements for precondition checks on state and buffers. Updated comments for clarity.
zbuild.h - Introduced new macros z_assume and Assume for compiler hints and runtime checks. Updated Assert macro to utilize Assume.

Possibly related PRs

Suggested labels

cleanup

Suggested reviewers

  • nmoinvaz
  • KungFuJesus

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
zbuild.h (2)

214-229: LGTM! Consider enhancing documentation.

The implementation correctly handles different compiler capabilities for optimization hints. The macro is well-structured with appropriate fallbacks.

Consider adding a code example in the documentation to demonstrate proper usage, as this is a new feature that developers will need to understand:

/* The Assume() macro provides hints to compiler and static analyzers about what
   value range is valid for a variable, helping them optimize for real use.
   To ensure this does not hide bugs, compilation in debug mode will cause it to
   check the assumption and throw an error if it fails.
+
+ Example:
+   void process_buffer(char* buf, size_t len) {
+     Assume(len <= 1024);  // Hint to compiler about maximum size
+     ...
+   }
 */

Line range hint 214-257: Consider documenting optimization impact guidelines.

The introduction of compiler hints through Assume and modified Assert is a powerful feature for optimization. To ensure proper usage:

  1. Consider adding documentation about when to use Assume vs Assert
  2. Document guidelines for measuring performance impact
  3. Add examples of common optimization patterns

Would you like me to help draft this documentation?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 94aacd8 and 09b74f3.

📒 Files selected for processing (3)
  • deflate_p.h (0 hunks)
  • inffast_tpl.h (1 hunks)
  • zbuild.h (2 hunks)
💤 Files with no reviewable changes (1)
  • deflate_p.h
🔇 Additional comments (2)
zbuild.h (1)

257-257: Verify impact of Assert behavior change in release builds.

The modification to use Assume instead of removing Assert calls in release builds is a significant change that could affect existing code.

Let's analyze the impact:

✅ Verification successful

Based on the results, I can now generate the final response:

Assert macro change appears safe for release builds

The codebase analysis shows that Assert is primarily used for:

  • Input validation and invariant checks (e.g., buffer sizes, value ranges)
  • Boundary checks in memory operations
  • Parameter validation in low-level functions

All observed Assert usages:

  • Are not in tight loops that would impact performance
  • Validate critical conditions that should hold true for correct operation
  • Check invariants that compilers can use for optimization

The change to use Assume in release builds is actually beneficial as it:

  • Provides optimization hints to the compiler about valid conditions
  • Maintains safety checks in debug builds
  • Doesn't introduce overhead in performance-critical paths
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all Assert usage in the codebase to evaluate impact

# Search for Assert usage
echo "Analyzing Assert usage patterns..."
rg "Assert\(" --type cpp -B 2 -A 2

# Look for potential performance-critical sections
echo "Checking Assert usage in tight loops..."
ast-grep --pattern 'while($_) { $$$ Assert($_, $_) $$$ }'
ast-grep --pattern 'for($_; $_; $_) { $$$ Assert($_, $_) $$$ }'

Length of output: 4812

inffast_tpl.h (1)

119-125: Well-placed assumptions that match documented invariants!

The added Assume statements effectively translate the documented entry assumptions into compiler hints, which can help with optimizations while maintaining runtime checks in debug builds.

Let's verify that callers maintain these invariants:

✅ Verification successful

Assumptions are well-defined and correctly enforced by callers

The verification shows that:

  • Constants INFLATE_FAST_MIN_HAVE (15) and INFLATE_FAST_MIN_LEFT (260) are properly defined in inflate_p.h
  • Callers in inflate.c and infback.c explicitly check these minimum requirements before invoking INFLATE_FAST
  • The assumptions align with the documented requirements and runtime checks
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if callers maintain the required invariants before calling INFLATE_FAST

# Search for all calls to INFLATE_FAST
echo "Searching for INFLATE_FAST callers..."
rg -l "INFLATE_FAST" | while read -r file; do
  echo "=== Analyzing $file ==="
  # Look for setup code before INFLATE_FAST calls
  rg -B 10 "INFLATE_FAST\s*\(" "$file"
done

# Verify the constants are properly defined
echo "Checking for INFLATE_FAST_MIN_HAVE and INFLATE_FAST_MIN_LEFT definitions..."
rg "INFLATE_FAST_MIN_(HAVE|LEFT)\b"

Length of output: 3826

@codecov
Copy link

codecov bot commented Nov 3, 2024

Codecov Report

Attention: Patch coverage is 96.42857% with 2 lines in your changes missing coverage. Please review.

Project coverage is 80.36%. Comparing base (860e4cf) to head (932c615).

Files with missing lines Patch % Lines
trees.c 91.30% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1808      +/-   ##
===========================================
- Coverage    80.44%   80.36%   -0.09%     
===========================================
  Files          139      139              
  Lines        11176    11170       -6     
  Branches      2867     2898      +31     
===========================================
- Hits          8991     8977      -14     
- Misses        1221     1225       +4     
- Partials       964      968       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Dead2 Dead2 force-pushed the hint-input-ranges branch from 09b74f3 to ad20a13 Compare November 3, 2024 20:49
@Dead2 Dead2 force-pushed the hint-input-ranges branch from ad20a13 to bfa8e5e Compare November 3, 2024 21:14
@Dead2
Copy link
Member Author

Dead2 commented Nov 3, 2024

MSVC had support for this before the stated version, but I found some references to bugs, so ended up setting VS 2012 as lower end.
Clang had support from 4.5 or thereabouts, but I set version 5 as lowest for the same reason.

It is possible to emulate this using unreachable attributes in older GCC versions, however the semantics for that differ a little, therefore I chose to only use the proper assume hint at this time.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
zbuild.h (1)

225-233: Consider adding support for additional compilers.

The implementation is solid and safely handles Clang, GCC, and MSVC. Consider extending support to:

  • Intel compiler using __builtin_assume
  • ARM compiler using __builtin_assume

This would broaden the optimization benefits across more platforms.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between ad20a13 and bfa8e5e.

📒 Files selected for processing (2)
  • inffast_tpl.h (1 hunks)
  • zbuild.h (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • inffast_tpl.h
🔇 Additional comments (3)
zbuild.h (3)

214-224: LGTM! Well-documented feature with clear example.

The documentation clearly explains the purpose, behavior, and usage of the Assume macro, with a practical example that demonstrates its application for providing size hints to the compiler.


234-238: LGTM! Clean and safe implementation.

The Assume macro implementation properly handles both debug and release modes, with appropriate error reporting in debug builds and optimization hints in release builds.


261-261: Verify impact on existing Assert usage.

The modification to use Assume in release builds is correct and aligns with the PR objectives. However, let's verify the impact on existing Assert calls in the codebase.

✅ Verification successful

Based on the analysis of Assert usage patterns in the codebase, I can generate the final response:

Assert usage verification completed - safe to proceed

All Assert calls in the codebase are used for invariant checking and validation with simple conditions. No instances found where:

  • Assertions contain function calls that could have side effects
  • Assertions are used for control flow
  • Assertions contain complex expressions that need to be evaluated

The conditions are primarily:

  • Boundary checks (e.g., len <= UINT8_MAX)
  • Null pointer checks (e.g., buf != NULL)
  • Size validations (e.g., s->w_size <= UINT16_MAX)
  • State invariants (e.g., state->pending == 0)

These patterns are compatible with the change to use Assume in release builds.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Analyze Assert usage patterns to ensure compatibility with new behavior

# Find all Assert calls in the codebase
echo "Analyzing Assert usage patterns..."
rg -p 'Assert\((.*?)\)' --type c --type h

# Look for potential issues like Assert calls with side effects
echo -e "\nChecking for Assert calls that might have side effects..."
rg -p 'Assert\((.*?[+\-*/%=].*?)\)' --type c --type h

Length of output: 8572

@Dead2
Copy link
Member Author

Dead2 commented Nov 3, 2024

I'll have to redo this PR, as I had initially suspected, not all the asserts are in fact usable as assume hints (some variables do not exist in release builds, so those cause failures).

I need to consider whether to add all Assume hints separately, or whether to introduce another Assert() that does Assume in release builds, while the normal one does not (name suggestions?). Thoughts?

We could possibly do it like this if it looks clean, not sure.

Assert(buf != NULL, "lost buf"); Assume(buf != NULL);

@Dead2 Dead2 marked this pull request as draft November 9, 2024 00:16
@mtl1979
Copy link
Collaborator

mtl1979 commented Dec 10, 2024

I'll have to redo this PR, as I had initially suspected, not all the asserts are in fact usable as assume hints (some variables do not exist in release builds, so those cause failures).

I need to consider whether to add all Assume hints separately, or whether to introduce another Assert() that does Assume in release builds, while the normal one does not (name suggestions?). Thoughts?

Something like Verify() for release builds would work... It makes clear that we don't assume the condition holds, we want to make sure it holds.

zbuild.h Outdated
}
*/
#if defined(__clang__) && __clang_major__ >= 4
#define z_assume(...) do { __builtin_assume(__VA_ARGS__); } while(0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#define -> # define

@Dead2
Copy link
Member Author

Dead2 commented Dec 10, 2024

I need to consider whether to add all Assume hints separately, or whether to introduce another Assert() that does Assume in release builds, while the normal one does not (name suggestions?). Thoughts?

Something like Verify() for release builds would work... It makes clear that we don't assume the condition holds, we want to make sure it holds.

Not sure what you mean Verify would do. We don't want to add actual checks to the release builds.
That is why the hint is named assume, meaning it is completely safe for the compiler to assume that the hint is always true. Using an assert would give compilers and static checkers the same hint but also perform the check and thus cost performance and add instructions to the compiled code. Assume gives the hint but does not add any verification code to the build, if you assumed wrong it would more than likely break due to incorrect optimizations.

@mtl1979
Copy link
Collaborator

mtl1979 commented Dec 12, 2024

I need to consider whether to add all Assume hints separately, or whether to introduce another Assert() that does Assume in release builds, while the normal one does not (name suggestions?). Thoughts?

Something like Verify() for release builds would work... It makes clear that we don't assume the condition holds, we want to make sure it holds.

Not sure what you mean Verify would do. We don't want to add actual checks to the release builds.

Just suggesting names... Maybe use Verify() for Debug builds and Assume() for Release builds.

@Dead2 Dead2 force-pushed the hint-input-ranges branch from bfa8e5e to 932c615 Compare February 11, 2025 22:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants