-
-
Notifications
You must be signed in to change notification settings - Fork 308
Remove here pointer in inflate_fast. #1953
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughRefactors Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested labels
Suggested reviewers
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. 📜 Recent review detailsConfiguration used: CodeRabbit UI 💡 Knowledge Base configuration:
You can enable these sources in your CodeRabbit configuration. 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (163)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
inffast_tpl.h (1)
153-161: Avoid magic constant 4 in memcpy size; prefer sizeof(code) for safety and forward-compatUsing literal 4 assumes the layout/packing of
struct codenever changes. It’s correct today, but brittle. Usingsizeof(code)keeps the optimization (it’s a compile-time constant) and protects against future struct changes or packing differences.Apply this diff:
- memcpy(&here, &lcode[hold & lmask], 4); + memcpy(&here, &lcode[hold & lmask], sizeof(code)); @@ - memcpy(&here, &lcode[hold & lmask], 4); + memcpy(&here, &lcode[hold & lmask], sizeof(code)); @@ - memcpy(&here, &lcode[hold & lmask], 4); + memcpy(&here, &lcode[hold & lmask], sizeof(code)); @@ - memcpy(&here, &dcode[hold & dmask], 4); + memcpy(&here, &dcode[hold & dmask], sizeof(code)); @@ - memcpy(&here, &dcode[here.val + BITS(here.op)], 4); + memcpy(&here, &dcode[here.val + BITS(here.op)], sizeof(code)); @@ - memcpy(&here, &lcode[here.val + BITS(here.op)], 4); + memcpy(&here, &lcode[here.val + BITS(here.op)], sizeof(code));Optionally, if you want an extra belt-and-suspenders, add a compile-time assertion in a common header (or here) to document the expectation:
/* e.g., after includes or near code type definition site */ _Static_assert(sizeof(code) == 4, "inflate code table entries must be 4 bytes");(Use your project’s static-assert macro if available.)
Also applies to: 177-177, 279-279, 286-286
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
inffast_tpl.h(3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1805
File: inffast_tpl.h:257-262
Timestamp: 2024-10-29T02:22:52.846Z
Learning: In `inffast_tpl.h`, when AVX512 is enabled, the branch involving `chunkcopy_safe` is intentionally eliminated to optimize performance.
📚 Learning: 2024-10-08T19:37:14.998Z
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1797
File: inflate.h:127-130
Timestamp: 2024-10-08T19:37:14.998Z
Learning: In the `inflate_state` structure, fields like `lenbits`, `lencode`, `distcode`, and `distbits` may be moved within the structure for optimization purposes; they are not necessarily new additions.
Applied to files:
inffast_tpl.h
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (38)
- GitHub Check: Ubuntu Clang
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu Clang
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu Clang
- GitHub Check: macOS GCC Symbol Prefix & Compat
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu Clang
- GitHub Check: macOS GCC Symbol Prefix & Compat
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu Clang
- GitHub Check: macOS GCC Symbol Prefix & Compat
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu Clang
- GitHub Check: macOS GCC Symbol Prefix & Compat
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu Clang
- GitHub Check: macOS GCC Symbol Prefix & Compat
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu Clang
- GitHub Check: macOS GCC Symbol Prefix & Compat
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu Clang
- GitHub Check: macOS GCC Symbol Prefix & Compat
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu Clang
- GitHub Check: macOS GCC Symbol Prefix & Compat
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
🔇 Additional comments (3)
inffast_tpl.h (3)
110-110: Switching from pointer to value forhereremoves aliasing and enables single 32-bit loads — good changeThis reduces pointer-chasing and lets the compiler emit a single 32-bit load per entry, which explains the small speedup you’re seeing. No functional concerns here.
165-173: Literal/length decode control flow remains correct after the refactorDropping bits before branching and using
here.op/here.valdirectly preserves the original semantics. Theop & MAX_BITSmasking for extra bits is intact. LGTM.
278-280: 2nd-level table lookups:BITS(here.op)is valid in these branchesIn the 2nd-level branches
((here.op & 64) == 0),here.opencodes the number of index bits (0..15), so usingBITS(here.op)is semantically equivalent to the oldBITS(op)usage. Control flow matches the original pattern withgotoback to the decode labels. Looks good.Also applies to: 285-288
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (5)
inffast_tpl.h (5)
110-110: Good change to load-by-value; add a guard for struct size assumptionsSwitching
herefrom a pointer to a by-valuecodeimproves locality and enables single 32-bit loads. However, the code relies oncodebeing exactly 4 bytes elsewhere in this file. Make that explicit to future-proof the change.You can add a compile-time assertion near the declarations:
/* Ensure code table entries are exactly 4 bytes */ #if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L _Static_assert(sizeof(code) == 4, "code entries must be 4 bytes"); #endif
153-169: Use sizeof(here) instead of hard-coded 4 in memcpy for robustnessReplace the hard-coded size with
sizeof(here)to avoid hidden footguns ifcodelayout changes. This keeps intent clear and lets the compiler optimize the constant size.- memcpy(&here, &lcode[hold & lmask], 4); + memcpy(&here, &lcode[hold & lmask], sizeof(here)); @@ - DROPBITS(here.bits); - memcpy(&here, &lcode[hold & lmask], 4); + DROPBITS(here.bits); + memcpy(&here, &lcode[hold & lmask], sizeof(here)); @@ - DROPBITS(here.bits); - memcpy(&here, &lcode[hold & lmask], 4); + DROPBITS(here.bits); + memcpy(&here, &lcode[hold & lmask], sizeof(here));
184-187: Two minor nits: sizeof(here) and pre-refill ordering
- Use
sizeof(here)instead of4to copy the entry.- Consider refilling before indexing
dcodeto make the dependency explicit that enough bits are available to index the first-level distance table. The top-of-loop REFILL guarantees headroom, so this is a style/readability nit rather than correctness.- memcpy(&here, &dcode[hold & dmask], 4); + memcpy(&here, &dcode[hold & dmask], sizeof(here)); - if (bits < MAX_BITS + MAX_DIST_EXTRA_BITS) { + if (bits < MAX_BITS + MAX_DIST_EXTRA_BITS) { REFILL(); }
285-287: Mask table-bit count when indexing 2nd-level distance tableWhile
here.opshould already be < 64 in this branch, explicitly masking toMAX_BITSprevents accidental inclusion of flag bits if table generation changes and preserves original semantics that usedop & 15.- } else if ((here.op & 64) == 0) { /* 2nd level distance code */ - memcpy(&here, &dcode[here.val + BITS(here.op)], 4); + } else if ((here.op & 64) == 0) { /* 2nd level distance code */ + memcpy(&here, &dcode[here.val + BITS(here.op & MAX_BITS)], sizeof(here)); goto dodist;
293-294: Likewise mask when indexing 2nd-level length table and use sizeof(here)Same rationale as distance: mask the bit count and avoid the literal 4.
- } else if ((here.op & 64) == 0) { /* 2nd level length code */ - memcpy(&here, &lcode[here.val + BITS(here.op)], 4); + } else if ((here.op & 64) == 0) { /* 2nd level length code */ + memcpy(&here, &lcode[here.val + BITS(here.op & MAX_BITS)], sizeof(here)); goto dolen;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
inffast_tpl.h(3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1805
File: inffast_tpl.h:257-262
Timestamp: 2024-10-29T02:22:52.846Z
Learning: In `inffast_tpl.h`, when AVX512 is enabled, the branch involving `chunkcopy_safe` is intentionally eliminated to optimize performance.
📚 Learning: 2024-10-08T19:37:14.998Z
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1797
File: inflate.h:127-130
Timestamp: 2024-10-08T19:37:14.998Z
Learning: In the `inflate_state` structure, fields like `lenbits`, `lencode`, `distcode`, and `distbits` may be moved within the structure for optimization purposes; they are not necessarily new additions.
Applied to files:
inffast_tpl.h
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (161)
- GitHub Check: Windows GCC
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows ClangCl Win64
- GitHub Check: Windows MSVC 2022 v140 Win32
- GitHub Check: Windows MSVC 2022 v140 Win64
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2022 v143 Win32
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
- GitHub Check: Ubuntu GCC SSE2 UBSAN
- GitHub Check: Ubuntu GCC Symbol Prefix
- GitHub Check: Windows GCC
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows ClangCl Win64
- GitHub Check: Windows MSVC 2022 v140 Win32
- GitHub Check: Windows MSVC 2022 v140 Win64
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2022 v143 Win32
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
- GitHub Check: Ubuntu GCC SSE2 UBSAN
- GitHub Check: Ubuntu GCC Symbol Prefix
- GitHub Check: Windows GCC
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows ClangCl Win64
- GitHub Check: Windows MSVC 2022 v140 Win32
- GitHub Check: Windows MSVC 2022 v140 Win64
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2022 v143 Win32
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
- GitHub Check: Ubuntu GCC SSE2 UBSAN
- GitHub Check: Ubuntu GCC Symbol Prefix
- GitHub Check: Windows GCC
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows ClangCl Win64
- GitHub Check: Windows MSVC 2022 v140 Win32
- GitHub Check: Windows MSVC 2022 v140 Win64
- GitHub Check: Windows MSVC 2022 v142 Win64
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2022 v143 Win32
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
- GitHub Check: Ubuntu GCC SSE2 UBSAN
- GitHub Check: Ubuntu GCC Symbol Prefix
- GitHub Check: Windows GCC
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows ClangCl Win64
- GitHub Check: Windows MSVC 2022 v140 Win32
- GitHub Check: Windows MSVC 2022 v140 Win64
- GitHub Check: Windows MSVC 2022 v142 Win64
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2022 v143 Win32
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
- GitHub Check: Ubuntu GCC SSE2 UBSAN
- GitHub Check: Ubuntu GCC Symbol Prefix
- GitHub Check: Windows GCC
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows ClangCl Win64
- GitHub Check: Windows MSVC 2022 v140 Win32
- GitHub Check: Windows MSVC 2022 v140 Win64
- GitHub Check: Windows MSVC 2022 v142 Win64
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2022 v143 Win32
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
- GitHub Check: Ubuntu GCC SSE2 UBSAN
- GitHub Check: Ubuntu GCC Symbol Prefix
- GitHub Check: Windows GCC
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows ClangCl Win64
- GitHub Check: Windows MSVC 2022 v140 Win32
- GitHub Check: Windows MSVC 2022 v140 Win64
- GitHub Check: Windows MSVC 2022 v142 Win64
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2022 v143 Win32
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
- GitHub Check: Ubuntu GCC No Chorba
- GitHub Check: Ubuntu GCC SSE2 UBSAN
- GitHub Check: Ubuntu GCC Symbol Prefix
- GitHub Check: Windows GCC
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows ClangCl Win64
- GitHub Check: Windows MSVC 2022 v140 Win32
- GitHub Check: Windows MSVC 2022 v140 Win64
- GitHub Check: Windows MSVC 2022 v142 Win64
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2022 v143 Win32
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
- GitHub Check: Ubuntu GCC No Chorba
- GitHub Check: Ubuntu GCC SSE2 UBSAN
- GitHub Check: Ubuntu GCC Symbol Prefix
- GitHub Check: Windows GCC
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows ClangCl Win64
- GitHub Check: Windows MSVC 2022 v140 Win32
- GitHub Check: Windows MSVC 2022 v140 Win64
- GitHub Check: Windows MSVC 2022 v142 Win64
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2022 v143 Win32
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
- GitHub Check: Ubuntu GCC No Chorba
- GitHub Check: Ubuntu GCC SSE2 UBSAN
- GitHub Check: Ubuntu GCC Symbol Prefix
- GitHub Check: Windows GCC
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows ClangCl Win64
- GitHub Check: Windows MSVC 2022 v140 Win32
- GitHub Check: Windows MSVC 2022 v140 Win64
- GitHub Check: Windows MSVC 2022 v142 Win64
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2022 v143 Win32
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
- GitHub Check: Ubuntu GCC No Chorba
- GitHub Check: Ubuntu GCC SSE2 UBSAN
- GitHub Check: Ubuntu GCC Symbol Prefix
🔇 Additional comments (1)
inffast_tpl.h (1)
172-180: Literal/length handling translated correctlyThe switch from pointer dereference to value fields is consistent.
DROPBITS(here.bits)remains in the right place, andopis correctly derived fromhere.op & MAX_BITS.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #1953 +/- ##
==========================================
Coverage ? 81.42%
==========================================
Files ? 162
Lines ? 13919
Branches ? 3119
==========================================
Hits ? 11334
Misses ? 1574
Partials ? 1011 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I did a bunch of testing on two different machines and with different settings to avoid any golden cacheline alignment offsetting things, but all my tests using deflatebench concludes this is slower (as yours seems to do as well..?). Deflatebench: 3.84% slower on average. Develop i7-11700K (AVX512 disabled) function-aligned=64PR #1953Not sure what causes this. |
Mine showed slight improvement, but my results were listed as PR, then DEVELOP. So I reversed it for easier reading. |
struct code hereis only 4 bytes and can easily be replaced by a single integer copy which may be faster than pointer indirection.DEVELOP
PR
Measured decompression performance improvement of about ~1.2%.
Summary by CodeRabbit