-
-
Notifications
You must be signed in to change notification settings - Fork 308
adler32_rvv: Fix two overflow problems #1826
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe changes in the Changes
Suggested Labels
Suggested Reviewers
Note: No sequence diagram was generated as the changes are relatively straightforward and do not introduce significant new control flow or interactions. Tip CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
arch/riscv/adler32_rvv.c (1)
113-115: LGTM: Proper bounds checking before scalar fallbackThe modulo operations correctly prevent overflow when transitioning to scalar processing for remaining bytes < vl. This is essential since vector calculations can produce large intermediate values.
Consider adding a comment explaining why these modulo operations are necessary here, as it's a subtle but critical detail for maintaining numerical stability in the hybrid vector-scalar implementation.
+ /* Ensure values are within bounds before scalar processing */ sum2 %= BASE; adler %= BASE;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
arch/riscv/adler32_rvv.c(2 hunks)
🔇 Additional comments (1)
arch/riscv/adler32_rvv.c (1)
75-75: LGTM: Overflow protection for v_buf32_accu
The added modulo operation correctly prevents overflow in v_buf32_accu when processing large input data blocks.
Let's verify the fix handles large inputs correctly:
✅ Verification successful
LGTM: Overflow protection for v_buf32_accu is correctly implemented
The added modulo operation is verified to be correct based on the following evidence:
- The test suite includes extensive test cases with various input sizes, including a large 5552-byte test string
- The fuzzer test validates correctness by comparing results with reference implementation
- The modulo operation is correctly placed within the NMAX block check, alongside the existing v_adler32_prev_accu modulo
- The implementation follows the standard Adler-32 algorithm requirements by using BASE as modulo value
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for test cases that verify overflow handling
rg -l "test.*adler.*overflow|test.*large.*input" test/
Length of output: 56
Script:
#!/bin/bash
# Let's try a broader search for test files related to adler32
fd "test.*adler" test/
# Also search for any test files in the RISC-V specific directory
fd "test" arch/riscv/
# Look for any test vectors or test data in the codebase
rg -i "test.*data|test.*vector" -A 5 | grep -i "adler"
Length of output: 289
Script:
#!/bin/bash
# Let's examine the test files to see how they verify adler32 implementation
cat test/test_adler32.cc
# Also check the fuzzer test to understand test data generation
cat test/fuzz/fuzzer_checksum.c
# Look for any constants or defines related to overflow prevention
rg -i "NMAX|BASE" arch/riscv/adler32_rvv.c -B 2 -A 2
Length of output: 38746
8a11d0c to
2161085
Compare
There are currently some overflow problems in adler32_rvv implementation, which can lead to wrong results for some input, and these problems could be easily exhibited when running `git fsck` with zlib-ng suitituting the system zlib on a big git repository. These problems and the solutions are the following: - When the input data is long enough, the v_buf32_accu can overflow too. Add it to the modulo code that happens per ~NMAX bytes. - When the vector data is reduced to scalar ones, the resulting scalar value (and the proceeded length) may lead to the calculation of sum2 to overflow. Add mod BASE to all these reductions and initial calculation of sum2. - When the remaining data less than vl bytes, the code falls back to a scalar implementation; however the sum2 and alder2 values are just reduced from vectors and could be very big that makes sum2 overflows in the scalar code. Modulo them before the scalar code to prevent such overflow (because vl is surely quite smaller than NMAX). Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
2161085 to
5fff59c
Compare
|
Added another overflow fix about the reduction from vector to scalar. |
nmoinvaz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to add unit tests for this or is it not practical?
|
For the first problem, I have a 2M random .bin file that can lead to this issue being exhibited when running adler32 on it. |
|
For the second problem (in the commit message now), I think crafting a dataset with very long Then for the third, attaching (vl - 1) 0xFF's to a data that will be calculated to very big sum2 and adler may exhibit the problems? Well these two dataset construction methods both depend on a pre-determined vl ... (My testing platform is SpacemiT K1, which should have VLEN=256 and the vl is 32 in this situation) |
|
And surely the behavior between QEMU and K1 differs on these problems -- QEMU defaults to VLEN=128 (vl = 16). |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #1826 +/- ##
===========================================
- Coverage 32.25% 32.13% -0.13%
===========================================
Files 67 67
Lines 5745 5745
Branches 1239 1239
===========================================
- Hits 1853 1846 -7
- Misses 3637 3646 +9
+ Partials 255 253 -2 ☔ View full report in Codecov by Sentry. |
Dead2
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There are currently two overflow problems in adler32_rvv implementation, which can lead to wrong results for some input, and these problems could be easily exhibited when running
git fsckwith zlib-ng suitituting the system zlib on a big git repository.These problems and the solutions are the following:
Summary by CodeRabbit
New Features
Bug Fixes