Remove here pointer in inflate_fast. #1953

nmoinvaz · 2025-08-18T01:04:45Z

struct code here is only 4 bytes and can easily be replaced by a single integer copy which may be faster than pointer indirection.

DEVELOP

OS: Darwin 24.6.0 Darwin Kernel Version 24.6.0: Mon Jul 14 11:29:54 PDT 2025; root:xnu-11417.140.69~1/RELEASE_ARM64_T8122 arm64
CPU: arm
Tool: /Users/nathan/Source/deflatebench/minigzip-develop Size: 159,736 B
Timing: GNU time (gtime)
Levels: 0-9       
Runs: 60         Trim worst: 30        

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 0    100.008%      0.030/0.030/0.030/0.000        0.030/0.030/0.030/0.000      211,973,953
 1     44.409%      0.640/0.640/0.640/0.000        0.320/0.320/0.320/0.000       94,127,497
 2     35.519%      1.000/1.008/1.010/0.004        0.310/0.310/0.310/0.000       75,286,322
 3     33.882%      1.260/1.266/1.270/0.005        0.290/0.292/0.300/0.004       71,815,206
 4     33.175%      1.430/1.439/1.440/0.003        0.290/0.290/0.290/0.000       70,317,229
 5     32.661%      1.510/1.518/1.530/0.006        0.280/0.280/0.280/0.000       69,228,146
 6     32.507%      1.780/1.797/1.800/0.005        0.280/0.280/0.280/0.000       68,902,143
 7     32.255%      2.580/2.593/2.600/0.007        0.280/0.280/0.280/0.000       68,366,759
 8     32.167%      4.030/4.042/4.050/0.007        0.280/0.280/0.280/0.000       68,180,762
 9     31.887%      4.430/4.446/4.450/0.006        0.270/0.270/0.270/0.000       67,586,442

 avg1  40.847%                        1.878                          0.263
 avg2  45.386%                        2.087                          0.292
 tot                                563.380                         78.950      865,784,459

PR

OS: Darwin 24.6.0 Darwin Kernel Version 24.6.0: Mon Jul 14 11:29:54 PDT 2025; root:xnu-11417.140.69~1/RELEASE_ARM64_T8122 arm64
CPU: arm
Tool: /Users/nathan/Source/zlib-ng/build/minigzip Size: 159,736 B
Timing: GNU time (gtime)
Levels: 0-9       
Runs: 60         Trim worst: 30        

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 0    100.008%      0.030/0.030/0.030/0.000        0.030/0.030/0.030/0.000      211,973,953
 1     44.409%      0.640/0.640/0.640/0.000        0.310/0.310/0.310/0.000       94,127,497
 2     35.519%      1.000/1.008/1.010/0.004        0.300/0.300/0.300/0.000       75,286,322
 3     33.882%      1.240/1.246/1.250/0.005        0.290/0.290/0.290/0.000       71,815,206
 4     33.175%      1.410/1.417/1.420/0.005        0.280/0.280/0.280/0.000       70,317,229
 5     32.661%      1.500/1.508/1.510/0.004        0.280/0.280/0.280/0.000       69,228,146
 6     32.507%      1.780/1.787/1.790/0.005        0.280/0.280/0.280/0.000       68,902,143
 7     32.255%      2.570/2.583/2.590/0.007        0.280/0.280/0.280/0.000       68,366,759
 8     32.167%      4.020/4.033/4.040/0.007        0.280/0.280/0.280/0.000       68,180,762
 9     31.887%      4.430/4.440/4.450/0.006        0.270/0.270/0.270/0.000       67,586,442

 avg1  40.847%                        1.869                          0.260
 avg2  45.386%                        2.077                          0.289
 tot                                560.740                         78.000      865,784,459

Measured decompression performance improvement of about ~1.2%.

Summary by CodeRabbit

Refactor
- Internal decoding table access was reworked to use value-based reads for more consistent and robust handling.
- Control flow for literal/length/distance processing was streamlined while preserving behavior.
- No user-facing feature changes; targeted at maintainability with potential minor stability and performance improvements.

coderabbitai · 2025-08-18T01:04:51Z

Walkthrough

Refactors inffast_tpl.h to load 4-byte code-table entries by value using memcpy and replaces all pointer-based here->... accesses with value-based here....; updates related bit-field checks, second-level lookups, and length/distance expansion to use here fields.

Changes

Cohort / File(s)	Summary
Value-based code entry access refactor `inffast_tpl.h`	Change `here` from pointer to value; replace all `here->field` with `here.field`; load `lcode`/`dcode` entries via `memcpy(&here, &... , 4)`; update length/distance expansion and 2nd-level lookups to use `here.val`/`here.op`/`here.bits`; adjust control checks to use `here.op`.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

cleanup

Suggested reviewers

KungFuJesus

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 206f3a8 and cbf343a.

📒 Files selected for processing (1)

inffast_tpl.h (3 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

inffast_tpl.h

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (163)

GitHub Check: macOS GCC UBSAN
GitHub Check: Windows GCC Compat No Opt
GitHub Check: Windows MSVC 2022 v142 Win64
GitHub Check: Windows MSVC 2022 v141 Win32
GitHub Check: Windows MSVC 2022 v141 Win64
GitHub Check: Windows GCC
GitHub Check: Windows MSVC 2022 v140 Win64
GitHub Check: Windows MSVC 2022 v143 Win32
GitHub Check: Windows MSVC 2022 v143 Win64
GitHub Check: Ubuntu MinGW i686
GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
GitHub Check: Ubuntu GCC No Chorba
GitHub Check: Fuzzing
GitHub Check: EL9 GCC S390X DFLTCC
GitHub Check: macOS GCC UBSAN
GitHub Check: Windows GCC Compat No Opt
GitHub Check: Windows MSVC 2022 v142 Win64
GitHub Check: Windows MSVC 2022 v141 Win32
GitHub Check: Windows MSVC 2022 v141 Win64
GitHub Check: Windows GCC
GitHub Check: Windows MSVC 2022 v140 Win64
GitHub Check: Windows MSVC 2022 v143 Win32
GitHub Check: Windows MSVC 2022 v143 Win64
GitHub Check: Ubuntu MinGW i686
GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
GitHub Check: Ubuntu GCC No Chorba
GitHub Check: Fuzzing
GitHub Check: EL9 GCC S390X DFLTCC
GitHub Check: macOS GCC UBSAN
GitHub Check: Windows GCC Compat No Opt
GitHub Check: Windows MSVC 2022 v142 Win64
GitHub Check: Windows MSVC 2022 v141 Win32
GitHub Check: Windows MSVC 2022 v141 Win64
GitHub Check: Windows GCC
GitHub Check: Windows MSVC 2022 v140 Win64
GitHub Check: Windows MSVC 2022 v143 Win32
GitHub Check: Windows MSVC 2022 v143 Win64
GitHub Check: Ubuntu MinGW i686
GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
GitHub Check: Ubuntu GCC No Chorba
GitHub Check: Fuzzing
GitHub Check: EL9 GCC S390X DFLTCC
GitHub Check: macOS GCC UBSAN
GitHub Check: Windows GCC Compat No Opt
GitHub Check: Windows MSVC 2022 v142 Win64
GitHub Check: Windows MSVC 2022 v141 Win32
GitHub Check: Windows MSVC 2022 v141 Win64
GitHub Check: Windows GCC
GitHub Check: Windows MSVC 2022 v140 Win64
GitHub Check: Windows MSVC 2022 v143 Win32
GitHub Check: Windows MSVC 2022 v143 Win64
GitHub Check: Ubuntu MinGW i686
GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
GitHub Check: Ubuntu GCC No Chorba
GitHub Check: Fuzzing
GitHub Check: macOS GCC UBSAN
GitHub Check: Windows GCC Compat No Opt
GitHub Check: Windows MSVC 2022 v142 Win64
GitHub Check: Windows MSVC 2022 v141 Win32
GitHub Check: Windows MSVC 2022 v141 Win64
GitHub Check: Windows GCC
GitHub Check: Windows MSVC 2022 v140 Win64
GitHub Check: Windows MSVC 2022 v143 Win32
GitHub Check: Windows MSVC 2022 v143 Win64
GitHub Check: Ubuntu MinGW i686
GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
GitHub Check: Ubuntu GCC No Chorba
GitHub Check: Fuzzing
GitHub Check: macOS GCC UBSAN
GitHub Check: Windows GCC Compat No Opt
GitHub Check: Windows MSVC 2022 v142 Win64
GitHub Check: Windows MSVC 2022 v141 Win32
GitHub Check: Windows MSVC 2022 v141 Win64
GitHub Check: Windows GCC
GitHub Check: Windows MSVC 2022 v140 Win64
GitHub Check: Windows MSVC 2022 v143 Win32
GitHub Check: Windows MSVC 2022 v143 Win64
GitHub Check: Ubuntu MinGW i686
GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
GitHub Check: Ubuntu GCC No Chorba
GitHub Check: Fuzzing
GitHub Check: macOS GCC UBSAN
GitHub Check: Windows GCC Compat No Opt
GitHub Check: Windows MSVC 2022 v142 Win64
GitHub Check: Windows MSVC 2022 v141 Win32
GitHub Check: Windows MSVC 2022 v141 Win64
GitHub Check: Windows GCC
GitHub Check: Windows MSVC 2022 v140 Win64
GitHub Check: Windows MSVC 2022 v143 Win32
GitHub Check: Windows MSVC 2022 v143 Win64
GitHub Check: Ubuntu MinGW i686
GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
GitHub Check: Ubuntu GCC No Chorba
GitHub Check: Fuzzing
GitHub Check: macOS GCC UBSAN
GitHub Check: Windows GCC Compat No Opt
GitHub Check: Windows MSVC 2022 v142 Win64
GitHub Check: Windows MSVC 2022 v141 Win32
GitHub Check: Windows MSVC 2022 v141 Win64
GitHub Check: Windows GCC
GitHub Check: Windows MSVC 2022 v140 Win64
GitHub Check: Windows MSVC 2022 v143 Win32
GitHub Check: Windows MSVC 2022 v143 Win64
GitHub Check: Ubuntu MinGW i686
GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
GitHub Check: Ubuntu GCC No Chorba
GitHub Check: Fuzzing
GitHub Check: macOS GCC UBSAN
GitHub Check: Windows GCC Compat No Opt
GitHub Check: Windows MSVC 2022 v142 Win64
GitHub Check: Windows MSVC 2022 v141 Win32
GitHub Check: Windows MSVC 2022 v141 Win64
GitHub Check: Windows GCC
GitHub Check: Windows MSVC 2022 v140 Win64
GitHub Check: Windows MSVC 2022 v143 Win32
GitHub Check: Windows MSVC 2022 v143 Win64
GitHub Check: Ubuntu MinGW i686
GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
GitHub Check: Ubuntu GCC No Chorba
GitHub Check: Fuzzing
GitHub Check: macOS GCC UBSAN
GitHub Check: Windows GCC Compat No Opt
GitHub Check: Windows MSVC 2022 v142 Win64
GitHub Check: Windows MSVC 2022 v141 Win32
GitHub Check: Windows MSVC 2022 v141 Win64
GitHub Check: Windows GCC
GitHub Check: Windows MSVC 2022 v140 Win64
GitHub Check: Windows MSVC 2022 v143 Win32
GitHub Check: Windows MSVC 2022 v143 Win64
GitHub Check: Ubuntu MinGW i686
GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
GitHub Check: Ubuntu GCC No Chorba
GitHub Check: Fuzzing

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

Visit our Status Page to check the current availability of CodeRabbit.
Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

inffast_tpl.h (1)
153-161: Avoid magic constant 4 in memcpy size; prefer sizeof(code) for safety and forward-compat

Using literal 4 assumes the layout/packing of struct code never changes. It’s correct today, but brittle. Using sizeof(code) keeps the optimization (it’s a compile-time constant) and protects against future struct changes or packing differences.

Apply this diff:
-        memcpy(&here, &lcode[hold & lmask], 4);
+        memcpy(&here, &lcode[hold & lmask], sizeof(code));
@@
-            memcpy(&here, &lcode[hold & lmask], 4);
+            memcpy(&here, &lcode[hold & lmask], sizeof(code));
@@
-                memcpy(&here, &lcode[hold & lmask], 4);
+                memcpy(&here, &lcode[hold & lmask], sizeof(code));
@@
-            memcpy(&here, &dcode[hold & dmask], 4);
+            memcpy(&here, &dcode[hold & dmask], sizeof(code));
@@
-                memcpy(&here, &dcode[here.val + BITS(here.op)], 4);
+                memcpy(&here, &dcode[here.val + BITS(here.op)], sizeof(code));
@@
-            memcpy(&here, &lcode[here.val + BITS(here.op)], 4);
+            memcpy(&here, &lcode[here.val + BITS(here.op)], sizeof(code));
Optionally, if you want an extra belt-and-suspenders, add a compile-time assertion in a common header (or here) to document the expectation:
/* e.g., after includes or near code type definition site */
_Static_assert(sizeof(code) == 4, "inflate code table entries must be 4 bytes");
(Use your project’s static-assert macro if available.)

Also applies to: 177-177, 279-279, 286-286

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 3a52db3 and 5b226ee.

📒 Files selected for processing (1)

inffast_tpl.h (3 hunks)

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1805
File: inffast_tpl.h:257-262
Timestamp: 2024-10-29T02:22:52.846Z
Learning: In `inffast_tpl.h`, when AVX512 is enabled, the branch involving `chunkcopy_safe` is intentionally eliminated to optimize performance.

📚 Learning: 2024-10-08T19:37:14.998Z

Learnt from: Dead2
PR: zlib-ng/zlib-ng#1797
File: inflate.h:127-130
Timestamp: 2024-10-08T19:37:14.998Z
Learning: In the `inflate_state` structure, fields like `lenbits`, `lencode`, `distcode`, and `distbits` may be moved within the structure for optimization purposes; they are not necessarily new additions.

Applied to files:

inffast_tpl.h

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (38)

GitHub Check: Ubuntu Clang
GitHub Check: EL9 Clang S390X DFLTCC MSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu Clang
GitHub Check: EL9 Clang S390X DFLTCC MSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu Clang
GitHub Check: macOS GCC Symbol Prefix & Compat
GitHub Check: EL9 Clang S390X DFLTCC MSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu Clang
GitHub Check: macOS GCC Symbol Prefix & Compat
GitHub Check: EL9 Clang S390X DFLTCC MSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu Clang
GitHub Check: macOS GCC Symbol Prefix & Compat
GitHub Check: EL9 Clang S390X DFLTCC MSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu Clang
GitHub Check: macOS GCC Symbol Prefix & Compat
GitHub Check: EL9 Clang S390X DFLTCC MSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu Clang
GitHub Check: macOS GCC Symbol Prefix & Compat
GitHub Check: EL9 Clang S390X DFLTCC MSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu Clang
GitHub Check: macOS GCC Symbol Prefix & Compat
GitHub Check: EL9 Clang S390X DFLTCC MSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu Clang
GitHub Check: macOS GCC Symbol Prefix & Compat
GitHub Check: EL9 Clang S390X DFLTCC MSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu Clang
GitHub Check: macOS GCC Symbol Prefix & Compat
GitHub Check: EL9 Clang S390X DFLTCC MSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN

🔇 Additional comments (3)

inffast_tpl.h (3)

110-110: Switching from pointer to value for here removes aliasing and enables single 32-bit loads — good change

This reduces pointer-chasing and lets the compiler emit a single 32-bit load per entry, which explains the small speedup you’re seeing. No functional concerns here.

165-173: Literal/length decode control flow remains correct after the refactor

Dropping bits before branching and using here.op/here.val directly preserves the original semantics. The op & MAX_BITS masking for extra bits is intact. LGTM.

278-280: 2nd-level table lookups: BITS(here.op) is valid in these branches

In the 2nd-level branches ((here.op & 64) == 0), here.op encodes the number of index bits (0..15), so using BITS(here.op) is semantically equivalent to the old BITS(op) usage. Control flow matches the original pattern with goto back to the decode labels. Looks good.

Also applies to: 285-288

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (5)

inffast_tpl.h (5)
110-110: Good change to load-by-value; add a guard for struct size assumptions

Switching here from a pointer to a by-value code improves locality and enables single 32-bit loads. However, the code relies on code being exactly 4 bytes elsewhere in this file. Make that explicit to future-proof the change.

You can add a compile-time assertion near the declarations:
/* Ensure code table entries are exactly 4 bytes */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
_Static_assert(sizeof(code) == 4, "code entries must be 4 bytes");
#endif
153-169: Use sizeof(here) instead of hard-coded 4 in memcpy for robustness

Replace the hard-coded size with sizeof(here) to avoid hidden footguns if code layout changes. This keeps intent clear and lets the compiler optimize the constant size.
-        memcpy(&here, &lcode[hold & lmask], 4);
+        memcpy(&here, &lcode[hold & lmask], sizeof(here));
@@
-            DROPBITS(here.bits);
-            memcpy(&here, &lcode[hold & lmask], 4);
+            DROPBITS(here.bits);
+            memcpy(&here, &lcode[hold & lmask], sizeof(here));
@@
-                DROPBITS(here.bits);
-                memcpy(&here, &lcode[hold & lmask], 4);
+                DROPBITS(here.bits);
+                memcpy(&here, &lcode[hold & lmask], sizeof(here));
184-187: Two minor nits: sizeof(here) and pre-refill ordering

Use sizeof(here) instead of 4 to copy the entry.

Consider refilling before indexing dcode to make the dependency explicit that enough bits are available to index the first-level distance table. The top-of-loop REFILL guarantees headroom, so this is a style/readability nit rather than correctness.
-            memcpy(&here, &dcode[hold & dmask], 4);
+            memcpy(&here, &dcode[hold & dmask], sizeof(here));
-            if (bits < MAX_BITS + MAX_DIST_EXTRA_BITS) {
+            if (bits < MAX_BITS + MAX_DIST_EXTRA_BITS) {
                 REFILL();
             }
285-287: Mask table-bit count when indexing 2nd-level distance table

While here.op should already be < 64 in this branch, explicitly masking to MAX_BITS prevents accidental inclusion of flag bits if table generation changes and preserves original semantics that used op & 15.
-            } else if ((here.op & 64) == 0) {      /* 2nd level distance code */
-                memcpy(&here, &dcode[here.val + BITS(here.op)], 4);
+            } else if ((here.op & 64) == 0) {      /* 2nd level distance code */
+                memcpy(&here, &dcode[here.val + BITS(here.op & MAX_BITS)], sizeof(here));
                 goto dodist;
293-294: Likewise mask when indexing 2nd-level length table and use sizeof(here)

Same rationale as distance: mask the bit count and avoid the literal 4.
-        } else if ((here.op & 64) == 0) {       /* 2nd level length code */
-            memcpy(&here, &lcode[here.val + BITS(here.op)], 4);
+        } else if ((here.op & 64) == 0) {       /* 2nd level length code */
+            memcpy(&here, &lcode[here.val + BITS(here.op & MAX_BITS)], sizeof(here));
             goto dolen;

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 5b226ee and 206f3a8.

📒 Files selected for processing (1)

inffast_tpl.h (3 hunks)

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1805
File: inffast_tpl.h:257-262
Timestamp: 2024-10-29T02:22:52.846Z
Learning: In `inffast_tpl.h`, when AVX512 is enabled, the branch involving `chunkcopy_safe` is intentionally eliminated to optimize performance.

📚 Learning: 2024-10-08T19:37:14.998Z

Learnt from: Dead2
PR: zlib-ng/zlib-ng#1797
File: inflate.h:127-130
Timestamp: 2024-10-08T19:37:14.998Z
Learning: In the `inflate_state` structure, fields like `lenbits`, `lencode`, `distcode`, and `distbits` may be moved within the structure for optimization purposes; they are not necessarily new additions.

Applied to files:

inffast_tpl.h

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (161)

GitHub Check: Windows GCC
GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
GitHub Check: Windows ClangCl Win64
GitHub Check: Windows MSVC 2022 v140 Win32
GitHub Check: Windows MSVC 2022 v140 Win64
GitHub Check: Windows MSVC 2022 v141 Win64
GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
GitHub Check: Windows MSVC 2022 v143 Win32
GitHub Check: Ubuntu MinGW i686
GitHub Check: EL9 GCC S390X DFLTCC UBSAN
GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
GitHub Check: Ubuntu GCC SSE2 UBSAN
GitHub Check: Ubuntu GCC Symbol Prefix
GitHub Check: Windows GCC
GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
GitHub Check: Windows ClangCl Win64
GitHub Check: Windows MSVC 2022 v140 Win32
GitHub Check: Windows MSVC 2022 v140 Win64
GitHub Check: Windows MSVC 2022 v141 Win64
GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
GitHub Check: Windows MSVC 2022 v143 Win32
GitHub Check: Ubuntu MinGW i686
GitHub Check: EL9 GCC S390X DFLTCC UBSAN
GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
GitHub Check: Ubuntu GCC SSE2 UBSAN
GitHub Check: Ubuntu GCC Symbol Prefix
GitHub Check: Windows GCC
GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
GitHub Check: Windows ClangCl Win64
GitHub Check: Windows MSVC 2022 v140 Win32
GitHub Check: Windows MSVC 2022 v140 Win64
GitHub Check: Windows MSVC 2022 v141 Win64
GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
GitHub Check: Windows MSVC 2022 v143 Win32
GitHub Check: Ubuntu MinGW i686
GitHub Check: EL9 GCC S390X DFLTCC UBSAN
GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
GitHub Check: Ubuntu GCC SSE2 UBSAN
GitHub Check: Ubuntu GCC Symbol Prefix
GitHub Check: Windows GCC
GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
GitHub Check: Windows ClangCl Win64
GitHub Check: Windows MSVC 2022 v140 Win32
GitHub Check: Windows MSVC 2022 v140 Win64
GitHub Check: Windows MSVC 2022 v142 Win64
GitHub Check: Windows MSVC 2022 v141 Win64
GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
GitHub Check: Windows MSVC 2022 v143 Win32
GitHub Check: Ubuntu MinGW i686
GitHub Check: EL9 GCC S390X DFLTCC UBSAN
GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
GitHub Check: Ubuntu GCC SSE2 UBSAN
GitHub Check: Ubuntu GCC Symbol Prefix
GitHub Check: Windows GCC
GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
GitHub Check: Windows ClangCl Win64
GitHub Check: Windows MSVC 2022 v140 Win32
GitHub Check: Windows MSVC 2022 v140 Win64
GitHub Check: Windows MSVC 2022 v142 Win64
GitHub Check: Windows MSVC 2022 v141 Win64
GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
GitHub Check: Windows MSVC 2022 v143 Win32
GitHub Check: Ubuntu MinGW i686
GitHub Check: EL9 GCC S390X DFLTCC UBSAN
GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
GitHub Check: Ubuntu GCC SSE2 UBSAN
GitHub Check: Ubuntu GCC Symbol Prefix
GitHub Check: Windows GCC
GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
GitHub Check: Windows ClangCl Win64
GitHub Check: Windows MSVC 2022 v140 Win32
GitHub Check: Windows MSVC 2022 v140 Win64
GitHub Check: Windows MSVC 2022 v142 Win64
GitHub Check: Windows MSVC 2022 v141 Win64
GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
GitHub Check: Windows MSVC 2022 v143 Win32
GitHub Check: Ubuntu MinGW i686
GitHub Check: EL9 GCC S390X DFLTCC UBSAN
GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
GitHub Check: Ubuntu GCC SSE2 UBSAN
GitHub Check: Ubuntu GCC Symbol Prefix
GitHub Check: Windows GCC
GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
GitHub Check: Windows ClangCl Win64
GitHub Check: Windows MSVC 2022 v140 Win32
GitHub Check: Windows MSVC 2022 v140 Win64
GitHub Check: Windows MSVC 2022 v142 Win64
GitHub Check: Windows MSVC 2022 v141 Win64
GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
GitHub Check: Windows MSVC 2022 v143 Win32
GitHub Check: Ubuntu MinGW i686
GitHub Check: EL9 GCC S390X DFLTCC UBSAN
GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
GitHub Check: Ubuntu GCC No Chorba
GitHub Check: Ubuntu GCC SSE2 UBSAN
GitHub Check: Ubuntu GCC Symbol Prefix
GitHub Check: Windows GCC
GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
GitHub Check: Windows ClangCl Win64
GitHub Check: Windows MSVC 2022 v140 Win32
GitHub Check: Windows MSVC 2022 v140 Win64
GitHub Check: Windows MSVC 2022 v142 Win64
GitHub Check: Windows MSVC 2022 v141 Win64
GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
GitHub Check: Windows MSVC 2022 v143 Win32
GitHub Check: Ubuntu MinGW i686
GitHub Check: EL9 GCC S390X DFLTCC UBSAN
GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
GitHub Check: Ubuntu GCC No Chorba
GitHub Check: Ubuntu GCC SSE2 UBSAN
GitHub Check: Ubuntu GCC Symbol Prefix
GitHub Check: Windows GCC
GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
GitHub Check: Windows ClangCl Win64
GitHub Check: Windows MSVC 2022 v140 Win32
GitHub Check: Windows MSVC 2022 v140 Win64
GitHub Check: Windows MSVC 2022 v142 Win64
GitHub Check: Windows MSVC 2022 v141 Win64
GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
GitHub Check: Windows MSVC 2022 v143 Win32
GitHub Check: Ubuntu MinGW i686
GitHub Check: EL9 GCC S390X DFLTCC UBSAN
GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
GitHub Check: Ubuntu GCC No Chorba
GitHub Check: Ubuntu GCC SSE2 UBSAN
GitHub Check: Ubuntu GCC Symbol Prefix
GitHub Check: Windows GCC
GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
GitHub Check: Windows ClangCl Win64
GitHub Check: Windows MSVC 2022 v140 Win32
GitHub Check: Windows MSVC 2022 v140 Win64
GitHub Check: Windows MSVC 2022 v142 Win64
GitHub Check: Windows MSVC 2022 v141 Win64
GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
GitHub Check: Windows MSVC 2022 v143 Win32
GitHub Check: Ubuntu MinGW i686
GitHub Check: EL9 GCC S390X DFLTCC UBSAN
GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
GitHub Check: Ubuntu GCC AARCH64 ASAN
GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
GitHub Check: Ubuntu GCC No Chorba
GitHub Check: Ubuntu GCC SSE2 UBSAN
GitHub Check: Ubuntu GCC Symbol Prefix

🔇 Additional comments (1)

inffast_tpl.h (1)

172-180: Literal/length handling translated correctly

The switch from pointer dereference to value fields is consistent. DROPBITS(here.bits) remains in the right place, and op is correctly derived from here.op & MAX_BITS.

codecov · 2025-08-18T02:40:21Z

Codecov Report

❌ Patch coverage is 93.10345% with 2 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@3a52db3). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
inffast_tpl.h	93.10%	0 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #1953   +/-   ##
==========================================
  Coverage           ?   81.42%           
==========================================
  Files              ?      162           
  Lines              ?    13919           
  Branches           ?     3119           
==========================================
  Hits               ?    11334           
  Misses             ?     1574           
  Partials           ?     1011

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Dead2 · 2025-08-18T12:39:24Z

I did a bunch of testing on two different machines and with different settings to avoid any golden cacheline alignment offsetting things, but all my tests using deflatebench concludes this is slower (as yours seems to do as well..?).

Deflatebench: 3.84% slower on average.
Uncompress_bench is too close to be sure, but possibly slightly faster, so perhaps this is only faster on highly compressible data?

Develop i7-11700K (AVX512 disabled) function-aligned=64

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     54.185%  0.0785/0.0787/0.0789/0.0001    0.0303/0.0311/0.0312/0.0002        8,526,745
 2     43.871%  0.1288/0.1292/0.1295/0.0002    0.0306/0.0307/0.0307/0.0000        6,903,702
 3     42.388%  0.1542/0.1545/0.1548/0.0002    0.0294/0.0295/0.0296/0.0000        6,670,239
 4     41.647%  0.1779/0.1783/0.1785/0.0002    0.0287/0.0288/0.0288/0.0000        6,553,746
 5     41.216%  0.1940/0.1945/0.1948/0.0002    0.0286/0.0286/0.0287/0.0000        6,485,936
 6     41.038%  0.2400/0.2406/0.2409/0.0002    0.0282/0.0283/0.0284/0.0000        6,457,827
 7     40.778%  0.3377/0.3382/0.3386/0.0003    0.0285/0.0287/0.0287/0.0000        6,416,941
 8     40.704%  0.4453/0.4459/0.4463/0.0003    0.0285/0.0286/0.0286/0.0000        6,405,249
 9     40.409%  0.5232/0.5238/0.5242/0.0002    0.0278/0.0278/0.0279/0.0000        6,358,951

 avg1  42.915%                       0.2537                         0.0291
 tot                                68.5096                         7.8627       60,779,336

   text    data     bss     dec     hex filename
 164758    1344       8  166110   288de libz-ng.so.2

uncompress_bench/uncompress_bench/1             70.4 ns         70.4 ns     39207767
uncompress_bench/uncompress_bench/64             216 ns          216 ns     12870185
uncompress_bench/uncompress_bench/1024           317 ns          317 ns      8854565
uncompress_bench/uncompress_bench/16384         2595 ns         2595 ns      1083860
uncompress_bench/uncompress_bench/131072       10142 ns        10142 ns       274575
uncompress_bench/uncompress_bench/1048576      75101 ns        75102 ns        37394

PR #1953

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     54.185%  0.0785/0.0787/0.0789/0.0001    0.0317/0.0325/0.0326/0.0002        8,526,745
 2     43.871%  0.1285/0.1290/0.1294/0.0002    0.0312/0.0319/0.0320/0.0001        6,903,702
 3     42.388%  0.1542/0.1544/0.1547/0.0001    0.0306/0.0307/0.0307/0.0000        6,670,239
 4     41.647%  0.1777/0.1782/0.1785/0.0002    0.0299/0.0299/0.0300/0.0000        6,553,746
 5     41.216%  0.1942/0.1945/0.1948/0.0002    0.0297/0.0298/0.0298/0.0000        6,485,936
 6     41.038%  0.2399/0.2404/0.2409/0.0003    0.0294/0.0294/0.0295/0.0000        6,457,827
 7     40.778%  0.3373/0.3379/0.3383/0.0002    0.0296/0.0297/0.0297/0.0000        6,416,941
 8     40.704%  0.4453/0.4458/0.4462/0.0003    0.0295/0.0296/0.0297/0.0000        6,405,249
 9     40.409%  0.5227/0.5236/0.5240/0.0003    0.0285/0.0286/0.0286/0.0000        6,358,951

 avg1  42.915%                       0.2536                         0.0302
 tot                                68.4788                         8.1653       60,779,336

   text    data     bss     dec     hex filename
 165142    1344       8  166494   28a5e libz-ng.so.2
 
uncompress_bench/uncompress_bench/1             70.3 ns         70.3 ns     40325817
uncompress_bench/uncompress_bench/64             225 ns          225 ns     12460990
uncompress_bench/uncompress_bench/1024           328 ns          328 ns      8625743
uncompress_bench/uncompress_bench/16384         2586 ns         2586 ns      1081615
uncompress_bench/uncompress_bench/131072       10048 ns        10048 ns       274041
uncompress_bench/uncompress_bench/1048576      74889 ns        74889 ns        37330

Not sure what causes this.
@KungFuJesus want to verify my results?

nmoinvaz · 2025-08-18T17:06:47Z

(as yours seems to do as well..?).

Mine showed slight improvement, but my results were listed as PR, then DEVELOP. So I reversed it for easier reading.

nmoinvaz added 2 commits August 17, 2025 18:02

Remove here pointer in inflate_fast.

9f7082b

Remove redundant op variable usage where possible.

5b226ee

nmoinvaz added the optimization label Aug 18, 2025

Add missing Tracevv calls for decoding literals in inflate_fast.

206f3a8

coderabbitai bot reviewed Aug 18, 2025

View reviewed changes

Use sizeof(code) in memcpy for here structure in inflate_fast.

cbf343a

coderabbitai bot reviewed Aug 18, 2025

View reviewed changes

nmoinvaz marked this pull request as draft August 18, 2025 17:08

nmoinvaz closed this Aug 18, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Remove here pointer in inflate_fast. #1953

Remove here pointer in inflate_fast. #1953

Uh oh!

nmoinvaz commented Aug 18, 2025 •

edited

Loading

Uh oh!

coderabbitai bot commented Aug 18, 2025 •

edited

Loading

Chat

Support

CodeRabbit Commands (Invoked using PR/Issue comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Status, Documentation and Community

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

codecov bot commented Aug 18, 2025

Uh oh!

Dead2 commented Aug 18, 2025

Uh oh!

nmoinvaz commented Aug 18, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Remove here pointer in inflate_fast. #1953

Remove here pointer in inflate_fast. #1953

Uh oh!

Conversation

nmoinvaz commented Aug 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

DEVELOP

PR

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Aug 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Chat

Support

CodeRabbit Commands (Invoked using PR/Issue comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Status, Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Aug 18, 2025

Codecov Report

Uh oh!

Dead2 commented Aug 18, 2025

Develop i7-11700K (AVX512 disabled) function-aligned=64

PR #1953

Uh oh!

nmoinvaz commented Aug 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

nmoinvaz commented Aug 18, 2025 •

edited

Loading

coderabbitai bot commented Aug 18, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

nmoinvaz commented Aug 18, 2025 •

edited

Loading