这是indexloc提供的服务,不要输入任何密码
Skip to content

Conversation

@nmoinvaz
Copy link
Member

@nmoinvaz nmoinvaz commented Aug 18, 2025

struct code here is only 4 bytes and can easily be replaced by a single integer copy which may be faster than pointer indirection.

DEVELOP

OS: Darwin 24.6.0 Darwin Kernel Version 24.6.0: Mon Jul 14 11:29:54 PDT 2025; root:xnu-11417.140.69~1/RELEASE_ARM64_T8122 arm64
CPU: arm
Tool: /Users/nathan/Source/deflatebench/minigzip-develop Size: 159,736 B
Timing: GNU time (gtime)
Levels: 0-9       
Runs: 60         Trim worst: 30        

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 0    100.008%      0.030/0.030/0.030/0.000        0.030/0.030/0.030/0.000      211,973,953
 1     44.409%      0.640/0.640/0.640/0.000        0.320/0.320/0.320/0.000       94,127,497
 2     35.519%      1.000/1.008/1.010/0.004        0.310/0.310/0.310/0.000       75,286,322
 3     33.882%      1.260/1.266/1.270/0.005        0.290/0.292/0.300/0.004       71,815,206
 4     33.175%      1.430/1.439/1.440/0.003        0.290/0.290/0.290/0.000       70,317,229
 5     32.661%      1.510/1.518/1.530/0.006        0.280/0.280/0.280/0.000       69,228,146
 6     32.507%      1.780/1.797/1.800/0.005        0.280/0.280/0.280/0.000       68,902,143
 7     32.255%      2.580/2.593/2.600/0.007        0.280/0.280/0.280/0.000       68,366,759
 8     32.167%      4.030/4.042/4.050/0.007        0.280/0.280/0.280/0.000       68,180,762
 9     31.887%      4.430/4.446/4.450/0.006        0.270/0.270/0.270/0.000       67,586,442

 avg1  40.847%                        1.878                          0.263
 avg2  45.386%                        2.087                          0.292
 tot                                563.380                         78.950      865,784,459

PR

OS: Darwin 24.6.0 Darwin Kernel Version 24.6.0: Mon Jul 14 11:29:54 PDT 2025; root:xnu-11417.140.69~1/RELEASE_ARM64_T8122 arm64
CPU: arm
Tool: /Users/nathan/Source/zlib-ng/build/minigzip Size: 159,736 B
Timing: GNU time (gtime)
Levels: 0-9       
Runs: 60         Trim worst: 30        

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 0    100.008%      0.030/0.030/0.030/0.000        0.030/0.030/0.030/0.000      211,973,953
 1     44.409%      0.640/0.640/0.640/0.000        0.310/0.310/0.310/0.000       94,127,497
 2     35.519%      1.000/1.008/1.010/0.004        0.300/0.300/0.300/0.000       75,286,322
 3     33.882%      1.240/1.246/1.250/0.005        0.290/0.290/0.290/0.000       71,815,206
 4     33.175%      1.410/1.417/1.420/0.005        0.280/0.280/0.280/0.000       70,317,229
 5     32.661%      1.500/1.508/1.510/0.004        0.280/0.280/0.280/0.000       69,228,146
 6     32.507%      1.780/1.787/1.790/0.005        0.280/0.280/0.280/0.000       68,902,143
 7     32.255%      2.570/2.583/2.590/0.007        0.280/0.280/0.280/0.000       68,366,759
 8     32.167%      4.020/4.033/4.040/0.007        0.280/0.280/0.280/0.000       68,180,762
 9     31.887%      4.430/4.440/4.450/0.006        0.270/0.270/0.270/0.000       67,586,442

 avg1  40.847%                        1.869                          0.260
 avg2  45.386%                        2.077                          0.289
 tot                                560.740                         78.000      865,784,459

Measured decompression performance improvement of about ~1.2%.

Summary by CodeRabbit

  • Refactor
    • Internal decoding table access was reworked to use value-based reads for more consistent and robust handling.
    • Control flow for literal/length/distance processing was streamlined while preserving behavior.
    • No user-facing feature changes; targeted at maintainability with potential minor stability and performance improvements.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Aug 18, 2025

Walkthrough

Refactors inffast_tpl.h to load 4-byte code-table entries by value using memcpy and replaces all pointer-based here->... accesses with value-based here....; updates related bit-field checks, second-level lookups, and length/distance expansion to use here fields.

Changes

Cohort / File(s) Summary
Value-based code entry access refactor
inffast_tpl.h
Change here from pointer to value; replace all here->field with here.field; load lcode/dcode entries via memcpy(&here, &... , 4); update length/distance expansion and 2nd-level lookups to use here.val/here.op/here.bits; adjust control checks to use here.op.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

cleanup

Suggested reviewers

  • KungFuJesus

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 206f3a8 and cbf343a.

📒 Files selected for processing (1)
  • inffast_tpl.h (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • inffast_tpl.h
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (163)
  • GitHub Check: macOS GCC UBSAN
  • GitHub Check: Windows GCC Compat No Opt
  • GitHub Check: Windows MSVC 2022 v142 Win64
  • GitHub Check: Windows MSVC 2022 v141 Win32
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: Windows GCC
  • GitHub Check: Windows MSVC 2022 v140 Win64
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC No Chorba
  • GitHub Check: Fuzzing
  • GitHub Check: EL9 GCC S390X DFLTCC
  • GitHub Check: macOS GCC UBSAN
  • GitHub Check: Windows GCC Compat No Opt
  • GitHub Check: Windows MSVC 2022 v142 Win64
  • GitHub Check: Windows MSVC 2022 v141 Win32
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: Windows GCC
  • GitHub Check: Windows MSVC 2022 v140 Win64
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC No Chorba
  • GitHub Check: Fuzzing
  • GitHub Check: EL9 GCC S390X DFLTCC
  • GitHub Check: macOS GCC UBSAN
  • GitHub Check: Windows GCC Compat No Opt
  • GitHub Check: Windows MSVC 2022 v142 Win64
  • GitHub Check: Windows MSVC 2022 v141 Win32
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: Windows GCC
  • GitHub Check: Windows MSVC 2022 v140 Win64
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC No Chorba
  • GitHub Check: Fuzzing
  • GitHub Check: EL9 GCC S390X DFLTCC
  • GitHub Check: macOS GCC UBSAN
  • GitHub Check: Windows GCC Compat No Opt
  • GitHub Check: Windows MSVC 2022 v142 Win64
  • GitHub Check: Windows MSVC 2022 v141 Win32
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: Windows GCC
  • GitHub Check: Windows MSVC 2022 v140 Win64
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC No Chorba
  • GitHub Check: Fuzzing
  • GitHub Check: macOS GCC UBSAN
  • GitHub Check: Windows GCC Compat No Opt
  • GitHub Check: Windows MSVC 2022 v142 Win64
  • GitHub Check: Windows MSVC 2022 v141 Win32
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: Windows GCC
  • GitHub Check: Windows MSVC 2022 v140 Win64
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC No Chorba
  • GitHub Check: Fuzzing
  • GitHub Check: macOS GCC UBSAN
  • GitHub Check: Windows GCC Compat No Opt
  • GitHub Check: Windows MSVC 2022 v142 Win64
  • GitHub Check: Windows MSVC 2022 v141 Win32
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: Windows GCC
  • GitHub Check: Windows MSVC 2022 v140 Win64
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC No Chorba
  • GitHub Check: Fuzzing
  • GitHub Check: macOS GCC UBSAN
  • GitHub Check: Windows GCC Compat No Opt
  • GitHub Check: Windows MSVC 2022 v142 Win64
  • GitHub Check: Windows MSVC 2022 v141 Win32
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: Windows GCC
  • GitHub Check: Windows MSVC 2022 v140 Win64
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC No Chorba
  • GitHub Check: Fuzzing
  • GitHub Check: macOS GCC UBSAN
  • GitHub Check: Windows GCC Compat No Opt
  • GitHub Check: Windows MSVC 2022 v142 Win64
  • GitHub Check: Windows MSVC 2022 v141 Win32
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: Windows GCC
  • GitHub Check: Windows MSVC 2022 v140 Win64
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC No Chorba
  • GitHub Check: Fuzzing
  • GitHub Check: macOS GCC UBSAN
  • GitHub Check: Windows GCC Compat No Opt
  • GitHub Check: Windows MSVC 2022 v142 Win64
  • GitHub Check: Windows MSVC 2022 v141 Win32
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: Windows GCC
  • GitHub Check: Windows MSVC 2022 v140 Win64
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC No Chorba
  • GitHub Check: Fuzzing
  • GitHub Check: macOS GCC UBSAN
  • GitHub Check: Windows GCC Compat No Opt
  • GitHub Check: Windows MSVC 2022 v142 Win64
  • GitHub Check: Windows MSVC 2022 v141 Win32
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: Windows GCC
  • GitHub Check: Windows MSVC 2022 v140 Win64
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC No Chorba
  • GitHub Check: Fuzzing
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
inffast_tpl.h (1)

153-161: Avoid magic constant 4 in memcpy size; prefer sizeof(code) for safety and forward-compat

Using literal 4 assumes the layout/packing of struct code never changes. It’s correct today, but brittle. Using sizeof(code) keeps the optimization (it’s a compile-time constant) and protects against future struct changes or packing differences.

Apply this diff:

-        memcpy(&here, &lcode[hold & lmask], 4);
+        memcpy(&here, &lcode[hold & lmask], sizeof(code));
@@
-            memcpy(&here, &lcode[hold & lmask], 4);
+            memcpy(&here, &lcode[hold & lmask], sizeof(code));
@@
-                memcpy(&here, &lcode[hold & lmask], 4);
+                memcpy(&here, &lcode[hold & lmask], sizeof(code));
@@
-            memcpy(&here, &dcode[hold & dmask], 4);
+            memcpy(&here, &dcode[hold & dmask], sizeof(code));
@@
-                memcpy(&here, &dcode[here.val + BITS(here.op)], 4);
+                memcpy(&here, &dcode[here.val + BITS(here.op)], sizeof(code));
@@
-            memcpy(&here, &lcode[here.val + BITS(here.op)], 4);
+            memcpy(&here, &lcode[here.val + BITS(here.op)], sizeof(code));

Optionally, if you want an extra belt-and-suspenders, add a compile-time assertion in a common header (or here) to document the expectation:

/* e.g., after includes or near code type definition site */
_Static_assert(sizeof(code) == 4, "inflate code table entries must be 4 bytes");

(Use your project’s static-assert macro if available.)

Also applies to: 177-177, 279-279, 286-286

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 3a52db3 and 5b226ee.

📒 Files selected for processing (1)
  • inffast_tpl.h (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1805
File: inffast_tpl.h:257-262
Timestamp: 2024-10-29T02:22:52.846Z
Learning: In `inffast_tpl.h`, when AVX512 is enabled, the branch involving `chunkcopy_safe` is intentionally eliminated to optimize performance.
📚 Learning: 2024-10-08T19:37:14.998Z
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1797
File: inflate.h:127-130
Timestamp: 2024-10-08T19:37:14.998Z
Learning: In the `inflate_state` structure, fields like `lenbits`, `lencode`, `distcode`, and `distbits` may be moved within the structure for optimization purposes; they are not necessarily new additions.

Applied to files:

  • inffast_tpl.h
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (38)
  • GitHub Check: Ubuntu Clang
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu Clang
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu Clang
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu Clang
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu Clang
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu Clang
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu Clang
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu Clang
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu Clang
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu Clang
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
🔇 Additional comments (3)
inffast_tpl.h (3)

110-110: Switching from pointer to value for here removes aliasing and enables single 32-bit loads — good change

This reduces pointer-chasing and lets the compiler emit a single 32-bit load per entry, which explains the small speedup you’re seeing. No functional concerns here.


165-173: Literal/length decode control flow remains correct after the refactor

Dropping bits before branching and using here.op/here.val directly preserves the original semantics. The op & MAX_BITS masking for extra bits is intact. LGTM.


278-280: 2nd-level table lookups: BITS(here.op) is valid in these branches

In the 2nd-level branches ((here.op & 64) == 0), here.op encodes the number of index bits (0..15), so using BITS(here.op) is semantically equivalent to the old BITS(op) usage. Control flow matches the original pattern with goto back to the decode labels. Looks good.

Also applies to: 285-288

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
inffast_tpl.h (5)

110-110: Good change to load-by-value; add a guard for struct size assumptions

Switching here from a pointer to a by-value code improves locality and enables single 32-bit loads. However, the code relies on code being exactly 4 bytes elsewhere in this file. Make that explicit to future-proof the change.

You can add a compile-time assertion near the declarations:

/* Ensure code table entries are exactly 4 bytes */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
_Static_assert(sizeof(code) == 4, "code entries must be 4 bytes");
#endif

153-169: Use sizeof(here) instead of hard-coded 4 in memcpy for robustness

Replace the hard-coded size with sizeof(here) to avoid hidden footguns if code layout changes. This keeps intent clear and lets the compiler optimize the constant size.

-        memcpy(&here, &lcode[hold & lmask], 4);
+        memcpy(&here, &lcode[hold & lmask], sizeof(here));
@@
-            DROPBITS(here.bits);
-            memcpy(&here, &lcode[hold & lmask], 4);
+            DROPBITS(here.bits);
+            memcpy(&here, &lcode[hold & lmask], sizeof(here));
@@
-                DROPBITS(here.bits);
-                memcpy(&here, &lcode[hold & lmask], 4);
+                DROPBITS(here.bits);
+                memcpy(&here, &lcode[hold & lmask], sizeof(here));

184-187: Two minor nits: sizeof(here) and pre-refill ordering

  • Use sizeof(here) instead of 4 to copy the entry.
  • Consider refilling before indexing dcode to make the dependency explicit that enough bits are available to index the first-level distance table. The top-of-loop REFILL guarantees headroom, so this is a style/readability nit rather than correctness.
-            memcpy(&here, &dcode[hold & dmask], 4);
+            memcpy(&here, &dcode[hold & dmask], sizeof(here));
-            if (bits < MAX_BITS + MAX_DIST_EXTRA_BITS) {
+            if (bits < MAX_BITS + MAX_DIST_EXTRA_BITS) {
                 REFILL();
             }

285-287: Mask table-bit count when indexing 2nd-level distance table

While here.op should already be < 64 in this branch, explicitly masking to MAX_BITS prevents accidental inclusion of flag bits if table generation changes and preserves original semantics that used op & 15.

-            } else if ((here.op & 64) == 0) {      /* 2nd level distance code */
-                memcpy(&here, &dcode[here.val + BITS(here.op)], 4);
+            } else if ((here.op & 64) == 0) {      /* 2nd level distance code */
+                memcpy(&here, &dcode[here.val + BITS(here.op & MAX_BITS)], sizeof(here));
                 goto dodist;

293-294: Likewise mask when indexing 2nd-level length table and use sizeof(here)

Same rationale as distance: mask the bit count and avoid the literal 4.

-        } else if ((here.op & 64) == 0) {       /* 2nd level length code */
-            memcpy(&here, &lcode[here.val + BITS(here.op)], 4);
+        } else if ((here.op & 64) == 0) {       /* 2nd level length code */
+            memcpy(&here, &lcode[here.val + BITS(here.op & MAX_BITS)], sizeof(here));
             goto dolen;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 5b226ee and 206f3a8.

📒 Files selected for processing (1)
  • inffast_tpl.h (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1805
File: inffast_tpl.h:257-262
Timestamp: 2024-10-29T02:22:52.846Z
Learning: In `inffast_tpl.h`, when AVX512 is enabled, the branch involving `chunkcopy_safe` is intentionally eliminated to optimize performance.
📚 Learning: 2024-10-08T19:37:14.998Z
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1797
File: inflate.h:127-130
Timestamp: 2024-10-08T19:37:14.998Z
Learning: In the `inflate_state` structure, fields like `lenbits`, `lencode`, `distcode`, and `distbits` may be moved within the structure for optimization purposes; they are not necessarily new additions.

Applied to files:

  • inffast_tpl.h
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (161)
  • GitHub Check: Windows GCC
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64
  • GitHub Check: Windows MSVC 2022 v140 Win32
  • GitHub Check: Windows MSVC 2022 v140 Win64
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC SSE2 UBSAN
  • GitHub Check: Ubuntu GCC Symbol Prefix
  • GitHub Check: Windows GCC
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64
  • GitHub Check: Windows MSVC 2022 v140 Win32
  • GitHub Check: Windows MSVC 2022 v140 Win64
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC SSE2 UBSAN
  • GitHub Check: Ubuntu GCC Symbol Prefix
  • GitHub Check: Windows GCC
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64
  • GitHub Check: Windows MSVC 2022 v140 Win32
  • GitHub Check: Windows MSVC 2022 v140 Win64
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC SSE2 UBSAN
  • GitHub Check: Ubuntu GCC Symbol Prefix
  • GitHub Check: Windows GCC
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64
  • GitHub Check: Windows MSVC 2022 v140 Win32
  • GitHub Check: Windows MSVC 2022 v140 Win64
  • GitHub Check: Windows MSVC 2022 v142 Win64
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC SSE2 UBSAN
  • GitHub Check: Ubuntu GCC Symbol Prefix
  • GitHub Check: Windows GCC
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64
  • GitHub Check: Windows MSVC 2022 v140 Win32
  • GitHub Check: Windows MSVC 2022 v140 Win64
  • GitHub Check: Windows MSVC 2022 v142 Win64
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC SSE2 UBSAN
  • GitHub Check: Ubuntu GCC Symbol Prefix
  • GitHub Check: Windows GCC
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64
  • GitHub Check: Windows MSVC 2022 v140 Win32
  • GitHub Check: Windows MSVC 2022 v140 Win64
  • GitHub Check: Windows MSVC 2022 v142 Win64
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC SSE2 UBSAN
  • GitHub Check: Ubuntu GCC Symbol Prefix
  • GitHub Check: Windows GCC
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64
  • GitHub Check: Windows MSVC 2022 v140 Win32
  • GitHub Check: Windows MSVC 2022 v140 Win64
  • GitHub Check: Windows MSVC 2022 v142 Win64
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC No Chorba
  • GitHub Check: Ubuntu GCC SSE2 UBSAN
  • GitHub Check: Ubuntu GCC Symbol Prefix
  • GitHub Check: Windows GCC
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64
  • GitHub Check: Windows MSVC 2022 v140 Win32
  • GitHub Check: Windows MSVC 2022 v140 Win64
  • GitHub Check: Windows MSVC 2022 v142 Win64
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC No Chorba
  • GitHub Check: Ubuntu GCC SSE2 UBSAN
  • GitHub Check: Ubuntu GCC Symbol Prefix
  • GitHub Check: Windows GCC
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64
  • GitHub Check: Windows MSVC 2022 v140 Win32
  • GitHub Check: Windows MSVC 2022 v140 Win64
  • GitHub Check: Windows MSVC 2022 v142 Win64
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC No Chorba
  • GitHub Check: Ubuntu GCC SSE2 UBSAN
  • GitHub Check: Ubuntu GCC Symbol Prefix
  • GitHub Check: Windows GCC
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64
  • GitHub Check: Windows MSVC 2022 v140 Win32
  • GitHub Check: Windows MSVC 2022 v140 Win64
  • GitHub Check: Windows MSVC 2022 v142 Win64
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC No Chorba
  • GitHub Check: Ubuntu GCC SSE2 UBSAN
  • GitHub Check: Ubuntu GCC Symbol Prefix
🔇 Additional comments (1)
inffast_tpl.h (1)

172-180: Literal/length handling translated correctly

The switch from pointer dereference to value fields is consistent. DROPBITS(here.bits) remains in the right place, and op is correctly derived from here.op & MAX_BITS.

@codecov
Copy link

codecov bot commented Aug 18, 2025

Codecov Report

❌ Patch coverage is 93.10345% with 2 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@3a52db3). Learn more about missing BASE report.

Files with missing lines Patch % Lines
inffast_tpl.h 93.10% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #1953   +/-   ##
==========================================
  Coverage           ?   81.42%           
==========================================
  Files              ?      162           
  Lines              ?    13919           
  Branches           ?     3119           
==========================================
  Hits               ?    11334           
  Misses             ?     1574           
  Partials           ?     1011           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Dead2
Copy link
Member

Dead2 commented Aug 18, 2025

I did a bunch of testing on two different machines and with different settings to avoid any golden cacheline alignment offsetting things, but all my tests using deflatebench concludes this is slower (as yours seems to do as well..?).

Deflatebench: 3.84% slower on average.
Uncompress_bench is too close to be sure, but possibly slightly faster, so perhaps this is only faster on highly compressible data?

Develop i7-11700K (AVX512 disabled) function-aligned=64

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     54.185%  0.0785/0.0787/0.0789/0.0001    0.0303/0.0311/0.0312/0.0002        8,526,745
 2     43.871%  0.1288/0.1292/0.1295/0.0002    0.0306/0.0307/0.0307/0.0000        6,903,702
 3     42.388%  0.1542/0.1545/0.1548/0.0002    0.0294/0.0295/0.0296/0.0000        6,670,239
 4     41.647%  0.1779/0.1783/0.1785/0.0002    0.0287/0.0288/0.0288/0.0000        6,553,746
 5     41.216%  0.1940/0.1945/0.1948/0.0002    0.0286/0.0286/0.0287/0.0000        6,485,936
 6     41.038%  0.2400/0.2406/0.2409/0.0002    0.0282/0.0283/0.0284/0.0000        6,457,827
 7     40.778%  0.3377/0.3382/0.3386/0.0003    0.0285/0.0287/0.0287/0.0000        6,416,941
 8     40.704%  0.4453/0.4459/0.4463/0.0003    0.0285/0.0286/0.0286/0.0000        6,405,249
 9     40.409%  0.5232/0.5238/0.5242/0.0002    0.0278/0.0278/0.0279/0.0000        6,358,951

 avg1  42.915%                       0.2537                         0.0291
 tot                                68.5096                         7.8627       60,779,336

   text    data     bss     dec     hex filename
 164758    1344       8  166110   288de libz-ng.so.2

uncompress_bench/uncompress_bench/1             70.4 ns         70.4 ns     39207767
uncompress_bench/uncompress_bench/64             216 ns          216 ns     12870185
uncompress_bench/uncompress_bench/1024           317 ns          317 ns      8854565
uncompress_bench/uncompress_bench/16384         2595 ns         2595 ns      1083860
uncompress_bench/uncompress_bench/131072       10142 ns        10142 ns       274575
uncompress_bench/uncompress_bench/1048576      75101 ns        75102 ns        37394

PR #1953

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     54.185%  0.0785/0.0787/0.0789/0.0001    0.0317/0.0325/0.0326/0.0002        8,526,745
 2     43.871%  0.1285/0.1290/0.1294/0.0002    0.0312/0.0319/0.0320/0.0001        6,903,702
 3     42.388%  0.1542/0.1544/0.1547/0.0001    0.0306/0.0307/0.0307/0.0000        6,670,239
 4     41.647%  0.1777/0.1782/0.1785/0.0002    0.0299/0.0299/0.0300/0.0000        6,553,746
 5     41.216%  0.1942/0.1945/0.1948/0.0002    0.0297/0.0298/0.0298/0.0000        6,485,936
 6     41.038%  0.2399/0.2404/0.2409/0.0003    0.0294/0.0294/0.0295/0.0000        6,457,827
 7     40.778%  0.3373/0.3379/0.3383/0.0002    0.0296/0.0297/0.0297/0.0000        6,416,941
 8     40.704%  0.4453/0.4458/0.4462/0.0003    0.0295/0.0296/0.0297/0.0000        6,405,249
 9     40.409%  0.5227/0.5236/0.5240/0.0003    0.0285/0.0286/0.0286/0.0000        6,358,951

 avg1  42.915%                       0.2536                         0.0302
 tot                                68.4788                         8.1653       60,779,336

   text    data     bss     dec     hex filename
 165142    1344       8  166494   28a5e libz-ng.so.2
 
uncompress_bench/uncompress_bench/1             70.3 ns         70.3 ns     40325817
uncompress_bench/uncompress_bench/64             225 ns          225 ns     12460990
uncompress_bench/uncompress_bench/1024           328 ns          328 ns      8625743
uncompress_bench/uncompress_bench/16384         2586 ns         2586 ns      1081615
uncompress_bench/uncompress_bench/131072       10048 ns        10048 ns       274041
uncompress_bench/uncompress_bench/1048576      74889 ns        74889 ns        37330

Not sure what causes this.
@KungFuJesus want to verify my results?

@nmoinvaz
Copy link
Member Author

nmoinvaz commented Aug 18, 2025

(as yours seems to do as well..?).

Mine showed slight improvement, but my results were listed as PR, then DEVELOP. So I reversed it for easier reading.

@nmoinvaz nmoinvaz marked this pull request as draft August 18, 2025 17:08
@nmoinvaz nmoinvaz closed this Aug 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants