这是indexloc提供的服务,不要输入任何密码
Skip to content

Conversation

@mtl1979
Copy link
Collaborator

@mtl1979 mtl1979 commented Jan 26, 2025

Size of hold was inconsistent between different inflate versions. Make all use 64-bit variable.
In send_bits there was no guard against shifting by variable bit width, which will overflow.

Summary by CodeRabbit

Release Notes

  • Performance Improvements

    • Enhanced bit buffer handling by expanding variable types to support larger data values.
    • Improved data processing capacity in compression and decompression functions.
  • Bug Fixes

    • Added safety checks to prevent potential buffer overflow in bit manipulation operations.
  • Technical Updates

    • Updated internal data type representations to increase range and precision of bit-level operations.
    • Reduced memory footprint in the inflate_state structure by modifying the padding array size.
    • Refined control flow for bit processing, ensuring proper handling of buffer states.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 26, 2025

Walkthrough

The pull request modifies the data type of the hold variable across multiple files in the zlib library, changing it from uint32_t to uint64_t in infback.c, inflate.c, and inflate.h to accommodate larger values. Additionally, the trees_emit.h file's send_bits macro is updated to include stricter conditions for handling the bit buffer, ensuring proper management of buffer capacity and preventing overflow.

Changes

File Change Summary
infback.c Changed hold variable type from uint32_t to uint64_t in PULLBYTE() macro and inflateBack() function
inflate.c Updated hold variable type to uint64_t in inflatePrime() function and PULLBYTE() macro; modified state->head->time and state->check assignments in inflate() function
inflate.h Modified inflate_state structure to change hold member type from uint32_t to uint64_t and reduced padding array size from two to one
trees_emit.h Added conditional check in send_bits macro to prevent buffer overflow when bi_valid reaches BIT_BUF_SIZE

Possibly related PRs

  • Add uncompress benchmark #1860: The changes in the main PR are related to the modifications of the hold variable in the PULLBYTE() macro, which is also updated in the retrieved PR, indicating a direct connection at the code level.
  • Disable CRC32-VX Extention for some Clang versions #1852: The changes in the main PR are related to the modifications of the hold variable in the PULLBYTE() macro, which is also updated in the retrieved PR, indicating a direct connection at the code level.
  • Fix typos #1825: The changes in the main PR are related to the modifications of the hold variable in the inflateBack() function, which is also addressed in the retrieved PR that tests the functionality of inflateBack().

Suggested Labels

bug, enhancement, Testing Framework

Suggested Reviewers

  • Dead2
  • nmoinvaz

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e191d30 and 54e3e0d.

📒 Files selected for processing (4)
  • infback.c (2 hunks)
  • inflate.c (6 hunks)
  • inflate.h (2 hunks)
  • trees_emit.h (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • trees_emit.h
  • inflate.h
  • infback.c
⏰ Context from checks skipped due to timeout of 90000ms (81)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu 20.04 Clang 6
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu Emscripten WASM32
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu 20.04 Clang 6
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu Emscripten WASM32
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu 20.04 Clang 6
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu Emscripten WASM32
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu 20.04 Clang 6
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu Emscripten WASM32
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu 20.04 Clang 6
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC Symbol Prefix
  • GitHub Check: Ubuntu Emscripten WASM32
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu 20.04 Clang 6
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat (ARM64)
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC Symbol Prefix
  • GitHub Check: Ubuntu Emscripten WASM32
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu 20.04 Clang 6
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat (ARM64)
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC Symbol Prefix
  • GitHub Check: Ubuntu Emscripten WASM32
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu 20.04 Clang 6
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat (ARM64)
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC Symbol Prefix
  • GitHub Check: Ubuntu Emscripten WASM32
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu 20.04 Clang 6
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat (ARM64)
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC Symbol Prefix
  • GitHub Check: Ubuntu Emscripten WASM32
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu 20.04 Clang 6
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat (ARM64)
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC Symbol Prefix
  • GitHub Check: Ubuntu Emscripten WASM32
🔇 Additional comments (7)
inflate.c (7)

294-294: LGTM! Explicit casting prevents shift overflow.

The explicit cast to uint64_t before the shift operation prevents potential overflow issues when shifting 32-bit values.


390-391: LGTM! Consistent type safety in bit manipulation.

The explicit cast to uint64_t in the PULLBYTE macro ensures type safety during bit manipulation operations.


482-482: LGTM! Core type change enables safer bit manipulation.

The change to uint64_t for the hold variable provides the foundation for safer bit manipulation throughout the inflate process.


580-580: LGTM! Explicit cast prevents potential data loss.

The explicit cast to unsigned when assigning to state->head->time prevents potential data loss when converting from 64-bit to 32-bit.


707-707: LGTM! Safe type conversion for ZSWAP32.

The explicit cast to unsigned ensures safe interaction with the ZSWAP32 macro which expects a 32-bit value.


1131-1131: LGTM! Consistent type safety pattern.

The explicit cast to unsigned maintains consistency with the earlier ZSWAP32 usage and ensures type safety.


294-1131: Excellent type safety improvements throughout the inflate process.

The changes form a comprehensive and consistent pattern of improvements:

  1. Standardizing hold as a 64-bit variable
  2. Adding explicit casts for all bit manipulations
  3. Ensuring safe conversions when interacting with 32-bit fields and macros

These changes effectively address the shift overflow issues mentioned in the PR objectives while maintaining type safety throughout the code.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@codecov
Copy link

codecov bot commented Jan 26, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.81%. Comparing base (287c4dc) to head (54e3e0d).
Report is 1 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1859      +/-   ##
===========================================
- Coverage    80.38%   79.81%   -0.58%     
===========================================
  Files          139      139              
  Lines        11176    11172       -4     
  Branches      2867     2864       -3     
===========================================
- Hits          8984     8917      -67     
- Misses        1214     1268      +54     
- Partials       978      987       +9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mtl1979 mtl1979 changed the title Fix shift overflow in inflate. Fix shift overflow in inflate and send_code. Jan 26, 2025
@mtl1979 mtl1979 force-pushed the clang-static branch 3 times, most recently from aa81e2f to 6cc1a76 Compare January 26, 2025 21:09
@mtl1979 mtl1979 marked this pull request as ready for review January 26, 2025 22:31
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
inflate.h (1)

124-124: LGTM! The type change is necessary for preventing shift overflow.

The change from uint32_t to uint64_t for the hold member is a breaking change that affects the ABI. This requires a version bump in the library.

Consider adding a note in the changelog about this breaking change to help users upgrade safely.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e3510e and 6cc1a76.

📒 Files selected for processing (4)
  • infback.c (2 hunks)
  • inflate.c (6 hunks)
  • inflate.h (1 hunks)
  • trees_emit.h (1 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
inflate.c

[warning] 294-294: inflate.c#L294
Added line #L294 was not covered by tests


[warning] 482-482: inflate.c#L482
Added line #L482 was not covered by tests


[warning] 580-580: inflate.c#L580
Added line #L580 was not covered by tests


[warning] 707-707: inflate.c#L707
Added line #L707 was not covered by tests

🔇 Additional comments (7)
trees_emit.h (1)

56-58: LGTM! The overflow protection is correctly implemented.

The added conditional check prevents potential overflow by ensuring bi_buf is only updated when there's enough space in the buffer.

Let's verify that all callers handle the case when the shift operation is skipped:

✅ Verification successful

The overflow protection is safe across all callers

The verification shows that all uses of send_bits properly handle the buffer state:

  • Used with small, controlled bit lengths (2-7 bits)
  • Buffer state (bi_buf, bi_valid) is consistently tracked
  • Buffer fullness is managed through proper flushing mechanisms
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for direct uses of send_bits macro
rg -A 5 "send_bits\s*\(" --type c

Length of output: 3143

infback.c (2)

105-105: LGTM! The type change in PULLBYTE macro is correct.

The change to use uint64_t for the shift operation prevents overflow and is consistent with the type change in inflate_state.


157-157: LGTM! The hold variable declaration is updated correctly.

The change to use uint64_t for the hold variable is consistent with the type change in inflate_state.

inflate.c (4)

294-294: LGTM! The hold calculation in inflatePrime is updated correctly.

The change to use uint64_t for the shift operation prevents overflow.

Note: Static analysis indicates this line is not covered by tests.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 294-294: inflate.c#L294
Added line #L294 was not covered by tests


482-482: LGTM! The hold variable declaration is updated correctly.

The change to use uint64_t for the hold variable is consistent with the type change in inflate_state.

Note: Static analysis indicates this line is not covered by tests.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 482-482: inflate.c#L482
Added line #L482 was not covered by tests


580-580: LGTM! The type cast for head->time assignment is correct.

The change ensures proper type handling when assigning to head->time.

Note: Static analysis indicates this line is not covered by tests.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 580-580: inflate.c#L580
Added line #L580 was not covered by tests


707-707: LGTM! The type casts for ZSWAP32 are correct.

The changes ensure proper type handling when using ZSWAP32.

Note: Static analysis indicates line 707 is not covered by tests.

Let's verify the test coverage for these changes:

Also applies to: 1131-1131

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 707-707: inflate.c#L707
Added line #L707 was not covered by tests

@mtl1979 mtl1979 force-pushed the clang-static branch 3 times, most recently from 58d1c4b to 49c8e86 Compare January 27, 2025 20:35
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
inflate.c (1)

294-294: Overall approach to fixing bit buffer overflow looks good.

The changes consistently use a 64-bit buffer for bit operations, which prevents overflow during bit accumulation and shifting. However, there are several narrowing conversions to 32-bit integers that need verification:

  1. Time value assignment
  2. Dictionary ID comparison
  3. Check value comparison

Please ensure that these values cannot exceed 32 bits in any valid use case.

Consider adding explicit range checks before these narrowing conversions to fail fast if the values exceed 32 bits, rather than silently truncating them.

Also applies to: 390-390, 580-580, 707-707, 1131-1131

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4e07526 and 49c8e86.

📒 Files selected for processing (4)
  • infback.c (2 hunks)
  • inflate.c (6 hunks)
  • inflate.h (2 hunks)
  • trees_emit.h (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • trees_emit.h
  • inflate.h
  • infback.c
⏰ Context from checks skipped due to timeout of 90000ms (170)
  • GitHub Check: macOS GCC UBSAN
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: macOS Clang ASAN
  • GitHub Check: Windows GCC Compat No Opt
  • GitHub Check: Windows GCC Native Instructions (AVX)
  • GitHub Check: Windows GCC
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No ACLE UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC UBSAN
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: macOS Clang ASAN
  • GitHub Check: Windows GCC Compat No Opt
  • GitHub Check: Windows GCC Native Instructions (AVX)
  • GitHub Check: Windows GCC
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No ACLE UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC UBSAN
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: macOS Clang ASAN
  • GitHub Check: Windows GCC Compat No Opt
  • GitHub Check: Windows GCC Native Instructions (AVX)
  • GitHub Check: Windows GCC
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No ACLE UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC UBSAN
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: macOS Clang ASAN
  • GitHub Check: Windows GCC Compat No Opt
  • GitHub Check: Windows GCC Native Instructions (AVX)
  • GitHub Check: Windows GCC
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No ACLE UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC UBSAN
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: macOS Clang ASAN
  • GitHub Check: Windows GCC Compat No Opt
  • GitHub Check: Windows GCC Native Instructions (AVX)
  • GitHub Check: Windows GCC
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No ACLE UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC UBSAN
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: macOS Clang ASAN
  • GitHub Check: Windows GCC Compat No Opt
  • GitHub Check: Windows GCC Native Instructions (AVX)
  • GitHub Check: Windows GCC
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No ACLE UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC UBSAN
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: macOS Clang ASAN
  • GitHub Check: Windows GCC Compat No Opt
  • GitHub Check: Windows GCC Native Instructions (AVX)
  • GitHub Check: Windows GCC
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No ACLE UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC UBSAN
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: macOS Clang ASAN
  • GitHub Check: Windows GCC Compat No Opt
  • GitHub Check: Windows GCC Native Instructions (AVX)
  • GitHub Check: Windows GCC
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No ACLE UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC UBSAN
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: macOS Clang ASAN
  • GitHub Check: Windows GCC Compat No Opt
  • GitHub Check: Windows GCC Native Instructions (AVX)
  • GitHub Check: Windows GCC
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No ACLE UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC UBSAN
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: macOS Clang ASAN
  • GitHub Check: Windows GCC Compat No Opt
  • GitHub Check: Windows GCC Native Instructions (AVX)
  • GitHub Check: Windows GCC
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No ACLE UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
🔇 Additional comments (4)
inflate.c (4)

294-294: LGTM! Prevents bit shift overflow.

The cast to uint64_t before the left shift operation prevents potential overflow when state->bits approaches 32.


390-390: LGTM! Safe bit accumulation.

The cast to uint64_t ensures safe bit accumulation in the hold buffer, preventing overflow when shifting bytes into higher positions.


707-707: Verify the safety of dictionary ID and check value conversions.

The casts from uint64_t to unsigned in both the dictionary ID and check value comparisons could potentially truncate data. Let's verify that these values are guaranteed to be 32-bit.

Also applies to: 1131-1131

✅ Verification successful

The conversions from uint64_t to unsigned are safe

The narrowing conversions are safe because:

  • Dictionary IDs and check values are Adler-32 checksums, which by definition produce 32-bit values (RFC1950)
  • The ZSWAP32 macro operates on 32-bit values, and the input values cannot exceed this width
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if there are any bounds checks or documentation about the expected ranges
# of dictionary IDs and check values.

# Search for documentation or bounds checks on dictionary IDs
rg -A 5 -B 5 'dictid|dict.*id' .

# Search for documentation or bounds checks on check values
rg -A 5 -B 5 'check.*=.*hold' .

Length of output: 21072


580-580: Verify the safety of narrowing conversion.

The cast from uint64_t to unsigned could potentially truncate data. Let's verify that the time value cannot exceed the range of unsigned.

✅ Verification successful

The narrowing conversion is safe

The conversion from uint64_t to unsigned is safe because NEEDBITS(32) explicitly ensures that only 32 bits are loaded into hold before the assignment, which fits within the range of unsigned.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if there are any bounds checks on the time value before this assignment
# or any documentation about the expected range of the time value.

# Search for any bounds checks or documentation about time value
rg -A 5 -B 5 'time.*=.*hold' .
rg -A 5 -B 5 'gz_header.*time' .

Length of output: 548


Script:

#!/bin/bash
# Check NEEDBITS macro definition and hold variable type
rg -B 5 -A 2 'NEEDBITS.*\(' inflate.c
rg -B 2 -A 2 'hold.*=' inflate.c | grep -v 'time.*='

Length of output: 8591

@mtl1979 mtl1979 requested review from nmoinvaz and phprus January 28, 2025 10:53
@Dead2 Dead2 requested review from KungFuJesus and phprus and removed request for phprus February 1, 2025 11:42
@Dead2
Copy link
Member

Dead2 commented Feb 3, 2025

Not sure why Github thought I wanted to remove a reviewer, that was not my intention.

I do want to see reviews before I merge this though, next stable release is waiting for this currently.

@nmoinvaz
Copy link
Member

nmoinvaz commented Feb 3, 2025

Should probably performance test the send_bits. To me it seems like a lot of extra if statements just to hide warnings.

@phprus
Copy link
Contributor

phprus commented Feb 3, 2025

I don't see any problems in PR now.
Tomorrow I will try to analyze this PR in more detail.

Copy link
Contributor

@phprus phprus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should wait for @nmoinvaz performance tests, but I don't think these are false positive warnings.

I think this PR needs to be merged.

@mtl1979
Copy link
Collaborator Author

mtl1979 commented Feb 5, 2025

Should probably performance test the send_bits. To me it seems like a lot of extra if statements just to hide warnings.

I'm not quite sure if compilers can optimize out some of the if statements especially when there is more than one macro invocation in series. The current version is in my opinion a lot more readable than the previous one.

As far as I managed to check, running without static analyzer never hits the added code and neither of the two length variables is ever zero with valid stream.

@Dead2
Copy link
Member

Dead2 commented Feb 5, 2025

I had some gremlins causing benchmarks to be highly random (+-5% run to run) recently, finally figured it out, turns out I should not disable deeper C-states on the idling hyperthread while the other one is running the benchmark. I am thinking this change in behavior might have been caused by a kernel or cpu microcode update. 🤷‍♂️
(As some of you know, I force the benchmark to run on a specific core, disable turbo, force the core to run at a specific Mhz and use cpuset and a few kernel tricks to force other tasks off that core and its HT sibling.)

Develop

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     54.185%      0.078/0.082/0.083/0.001        0.027/0.030/0.031/0.001        8,526,745
 2     43.871%      0.144/0.149/0.151/0.001        0.027/0.030/0.031/0.001        6,903,702
 3     42.390%      0.178/0.182/0.183/0.001        0.025/0.028/0.030/0.001        6,670,664
 4     41.644%      0.206/0.212/0.214/0.002        0.024/0.028/0.030/0.001        6,553,205
 5     41.215%      0.235/0.238/0.241/0.002        0.024/0.028/0.029/0.001        6,485,659
 6     41.032%      0.277/0.285/0.290/0.004        0.023/0.027/0.029/0.002        6,456,912
 7     40.778%      0.381/0.388/0.392/0.003        0.024/0.028/0.029/0.001        6,416,941
 8     40.704%      0.492/0.501/0.506/0.004        0.022/0.027/0.030/0.002        6,405,249
 9     40.409%      0.588/0.595/0.599/0.003        0.024/0.027/0.028/0.001        6,358,951

 avg1  42.914%                        0.292                          0.028
 tot                                 78.931                          7.601       60,778,028

   text    data     bss     dec     hex filename
 130504    1344       8  131856   20310 libz-ng.so.2

compress_bench/compress_bench/1                 5347 ns         5338 ns       124032
compress_bench/compress_bench/8                 5631 ns         5621 ns       117456
compress_bench/compress_bench/16                5669 ns         5660 ns       120083
compress_bench/compress_bench/32                6035 ns         6025 ns       112324
compress_bench/compress_bench/64                6344 ns         6334 ns       106719
compress_bench/compress_bench/512               6809 ns         6798 ns       100918
compress_bench/compress_bench/4096             12010 ns        11988 ns        58512
compress_bench/compress_bench/32768            46215 ns        46130 ns        15176
uncompress_bench/uncompress_bench/1              150 ns          149 ns      4683675
uncompress_bench/uncompress_bench/64             310 ns          309 ns      2264460
uncompress_bench/uncompress_bench/1024           484 ns          483 ns      1449716
uncompress_bench/uncompress_bench/16384         4637 ns         4629 ns       150867
uncompress_bench/uncompress_bench/131072       26841 ns        26795 ns        26045
uncompress_bench/uncompress_bench/1048576     207574 ns       207153 ns         3372

PR #1859 eca360a1ce671e29c2a808e915367ae5573fb8e7

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     54.185%      0.085/0.090/0.092/0.002        0.027/0.030/0.031/0.001        8,526,745
 2     43.871%      0.150/0.154/0.156/0.002        0.025/0.030/0.032/0.001        6,903,702
 3     42.390%      0.180/0.185/0.188/0.002        0.026/0.029/0.031/0.001        6,670,664
 4     41.644%      0.209/0.216/0.219/0.003        0.024/0.029/0.031/0.002        6,553,205
 5     41.215%      0.237/0.242/0.245/0.002        0.026/0.029/0.031/0.001        6,485,659
 6     41.032%      0.282/0.291/0.294/0.003        0.026/0.028/0.029/0.001        6,456,912
 7     40.778%      0.378/0.388/0.394/0.004        0.024/0.029/0.031/0.002        6,416,941
 8     40.704%      0.492/0.501/0.508/0.004        0.026/0.029/0.030/0.001        6,405,249
 9     40.409%      0.585/0.594/0.603/0.005        0.024/0.028/0.029/0.001        6,358,951

 avg1  42.914%                        0.296                          0.029
 tot                                 79.869                          7.831       60,778,028

   text    data     bss     dec     hex filename
 132680    1344       8  134032   20b90 libz-ng.so.2

compress_bench/compress_bench/1                 5313 ns         5305 ns       122435
compress_bench/compress_bench/8                 5500 ns         5491 ns       119030
compress_bench/compress_bench/16                5526 ns         5517 ns       121097
compress_bench/compress_bench/32                6119 ns         6109 ns       115237
compress_bench/compress_bench/64                6483 ns         6473 ns       107138
compress_bench/compress_bench/512               7012 ns         7000 ns        98111
compress_bench/compress_bench/4096             12357 ns        12338 ns        56095
compress_bench/compress_bench/32768            47198 ns        47114 ns        14858
uncompress_bench/uncompress_bench/1              147 ns          147 ns      4759144
uncompress_bench/uncompress_bench/64             310 ns          309 ns      2263728
uncompress_bench/uncompress_bench/1024           485 ns          484 ns      1446700
uncompress_bench/uncompress_bench/16384         4650 ns         4644 ns       150566
uncompress_bench/uncompress_bench/131072       27031 ns        26985 ns        26080
uncompress_bench/uncompress_bench/1048576     207922 ns       207502 ns         3369

The PR adds 2176 bytes of code.
Compression is 1.2% slower on average.

  • Level 1 9.76% slower
  • Level 2 3.36% slower
  • Level 6, 2.1% slower
  • Levels 7-9 unchanged, lower levels up to 2.1% slower.
    Decompression 3% slower on average.

PR + added UNLIKELY() hints to all three if checks.

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     54.185%      0.083/0.087/0.088/0.001        0.027/0.031/0.032/0.002        8,526,745
 2     43.871%      0.152/0.156/0.158/0.002        0.027/0.031/0.032/0.001        6,903,702
 3     42.390%      0.182/0.186/0.188/0.002        0.026/0.029/0.030/0.001        6,670,664
 4     41.644%      0.212/0.217/0.220/0.002        0.024/0.029/0.030/0.001        6,553,205
 5     41.215%      0.235/0.242/0.245/0.002        0.022/0.028/0.030/0.002        6,485,659
 6     41.032%      0.284/0.290/0.294/0.003        0.024/0.028/0.029/0.001        6,456,912
 7     40.778%      0.381/0.389/0.395/0.004        0.026/0.029/0.031/0.001        6,416,941
 8     40.704%      0.491/0.502/0.509/0.004        0.026/0.029/0.031/0.001        6,405,249
 9     40.409%      0.588/0.595/0.599/0.003        0.025/0.028/0.029/0.001        6,358,951

 avg1  42.914%                        0.296                          0.029
 tot                                 79.953                          7.808       60,778,028

   text    data     bss     dec     hex filename
 132360    1344       8  133712   20a50 libz-ng.so.2

compress_bench/compress_bench/1                 5294 ns         5286 ns       121578
compress_bench/compress_bench/8                 5575 ns         5566 ns       118902
compress_bench/compress_bench/16                5632 ns         5623 ns       119614
compress_bench/compress_bench/32                5921 ns         5912 ns       114086
compress_bench/compress_bench/64                6263 ns         6253 ns       108188
compress_bench/compress_bench/512               6775 ns         6765 ns       100570
compress_bench/compress_bench/4096             11951 ns        11931 ns        58166
compress_bench/compress_bench/32768            46690 ns        46609 ns        15017
uncompress_bench/uncompress_bench/1              150 ns          150 ns      4669256
uncompress_bench/uncompress_bench/64             309 ns          308 ns      2269007
uncompress_bench/uncompress_bench/1024           483 ns          483 ns      1449211
uncompress_bench/uncompress_bench/16384         4657 ns         4650 ns       150678
uncompress_bench/uncompress_bench/131072       26900 ns        26855 ns        26199
uncompress_bench/uncompress_bench/1048576     208094 ns       207662 ns         3368

Unlikely hints shrink the code by 320 bytes.
Performance numbers are not accurate enough to tell for sure. compress_bench>512 seems to be faster.

IOW, performance takes a noticeable hit with these changes, at least in my tests on an i9-9900K (x86-64 w/AVX2).

@mtl1979
Copy link
Collaborator Author

mtl1979 commented Feb 5, 2025

As far as I managed to check, running without static analyzer never hits the added code and neither of the two length variables is ever zero with valid stream.

Do you think this is a false positive? (I'm not sure, but I can't prove it)

If so, then we need to add assert's (maybe instead of conditions).

Assert's may tell the static analyzer that this is an impossible path.

I thought about using an assert but those don't generally trap on release code, so no easy way to verify if it is false positive. I wanted to see that the code pass all tests on CI first.

@mtl1979
Copy link
Collaborator Author

mtl1979 commented Feb 5, 2025

Because of the performance hit, I would suggest we try to fix it another way. That code path is called a lot to emit bits. We can try and use the Dead2's assume PR? Or even downgrade the clang static analyzer version.

I have no problems with the uint64_t changes. It is only the if statements that are the problem.

I'm open for suggestions.

@nmoinvaz
Copy link
Member

nmoinvaz commented Feb 5, 2025

Referring to:

/home/runner/work/zlib-ng/zlib-ng/trees.c:569:5: warning: Left shift overflows the capacity of 'uint64_t' [core.BitwiseShift]
  569 |     send_bits(s, lcodes-257, 5, bi_buf, bi_valid); /* not +255 as stated in appnote.txt */
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/runner/work/zlib-ng/zlib-ng/trees_emit.h:56:23: note: expanded from macro 'send_bits'
   56 |         bi_buf |= val << bi_valid;\
      |                   ~~~~^~~~~~~~~~~

Idk if this will work:

#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wbitwise-shift"
bi_buf |= val << bi_valid;
#pragma clang diagnostic pop

There is no real case where bi_valid will ever be greater than BIT_BUF_SIZE (64)

@nmoinvaz
Copy link
Member

nmoinvaz commented Feb 5, 2025

We could also attempt something like this:

bi_buf |= val << (bi_valid & BIT_BUF_SIZE);

Which I think is the same as max(bi_valid, BIT_BUF_SIZE)

@mtl1979
Copy link
Collaborator Author

mtl1979 commented Feb 5, 2025

We could also attempt something like this:

bi_buf |= val << (bi_valid & BIT_BUF_SIZE);

Which I think is the same as max(bi_valid, BIT_BUF_SIZE)

That doesn't work... You can use "bitwise and" only with numbers where either lower or higher bits are all ones.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
inflate.c (1)

294-294: Architecture-level suggestion for bit buffer handling.

All the changes consistently upgrade the bit buffer from 32-bit to 64-bit types. Consider adding a type definition for the bit buffer to make future changes easier.

+/* Define bit buffer type for consistency */
+typedef uint64_t bit_buf_t;
+
 /* ... */
-uint64_t hold;              /* bit buffer */
+bit_buf_t hold;            /* bit buffer */

Also applies to: 390-390, 482-482, 580-580, 707-707, 1131-1131

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 741ac23 and de4c941.

📒 Files selected for processing (4)
  • infback.c (2 hunks)
  • inflate.c (6 hunks)
  • inflate.h (2 hunks)
  • trees_emit.h (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • infback.c
  • trees_emit.h
  • inflate.h
⏰ Context from checks skipped due to timeout of 90000ms (57)
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
🔇 Additional comments (6)
inflate.c (6)

482-482: Consistent type change for hold variable.

The change from implicit to uint64_t type for the hold variable in the inflate() function aligns with the PR's goal of standardizing the bit buffer size.


707-707: Verify ZSWAP32 operation with unsigned cast.

The cast to unsigned before ZSWAP32 operation needs verification to ensure it doesn't affect the byte-swapping operation.

✅ Verification successful

🏁 Script executed:

#!/bin/bash
# Description: Check the implementation of ZSWAP32 macro
rg -A 2 'ZSWAP32\('

Length of output: 2555


Cast to Unsigned is Appropriate for the ZSWAP32 Operation

After reviewing the definitions of the ZSWAP32 macro (which include implementations using built-in byte swap routines and manual bit-shifting) and its usage across the codebase, the explicit cast to unsigned in ZSWAP32((unsigned)hold) is correct. The cast ensures that the input is treated as a 32-bit unsigned value, which is exactly what ZSWAP32 is designed to operate on, preserving the intended bit pattern for accurate byte swapping.


390-390: Verify the PULLBYTE macro usage.

The change to uint64_t in the PULLBYTE macro is consistent with the PR's objective to standardize the hold variable size.

✅ Verification successful

🏁 Script executed:

#!/bin/bash
# Description: Check for any other instances of PULLBYTE macro to ensure consistent usage
rg -A 2 'PULLBYTE\(\)'

Length of output: 1809


PULLBYTE macro usage appears consistent with the PR’s objective.

  • The macro definition in inflate.c uses a cast to uint64_t, and similar usage is found across the codebase.
  • No conflicting usage or deviations were identified in the preliminary search.

1131-1131: Verify data check comparison.

The cast to unsigned in the data check comparison is consistent with other changes, but we should verify it doesn't affect the checksum validation.

✅ Verification successful

🏁 Script executed:

#!/bin/bash
# Description: Check other instances of checksum validation
rg -A 2 'state->check'

Length of output: 3428


Checksum validation comparison verified.

  • The usage of ZSWAP32((unsigned)hold) in the checksum comparison is consistent with similar conversions used in other sections (e.g., checksum updates in both inflate.c and arch/s390/dfltcc_inflate.c).
  • The cast to unsigned does not affect the correctness of the checksum validation, as it maintains consistency with how the checksum value is computed and compared throughout the codebase.

294-294: Verify the bit shift operation.

The change from implicit to explicit 64-bit type for hold is part of the fix for shift overflow. However, we should verify that the shift operation is safe.

✅ Verification successful

🏁 Script executed:

#!/bin/bash
# Description: Check if bits can exceed 32 in inflatePrime
ast-grep --pattern $'if \(bits > $_\) {
  $$$
}'

Length of output: 49


I'll gather more context around the bit shift operation in inflate.c to verify that the usage of state->bits is appropriately safeguarded. Let's check the surrounding code where this operation occurs and search for any bounds checks on state->bits.


🏁 Script executed:

#!/bin/bash
# 1. Look up the context of the shift operation in inflate.c to see how 'state->bits' is handled.
rg "hold \+= \(uint64_t\)value << state->bits" -n -C 5 inflate.c

# 2. Search for any conditional checks on 'state->bits' in the file.
rg "if\s*\(.*state->bits\s*[<>]=" -n -C 5 inflate.c

Length of output: 426


Shift operation in inflate.c (line 294) is safe as implemented.
The condition at line 291 ensures that the sum of bits to be shifted (state->bits plus bits) never exceeds 32, keeping the shift amount well within safe bounds for a 64-bit value. This verification confirms that the explicit cast and the bounded shift ensure no overflow occurs during the operation.


580-580: Verify time value truncation.

The cast to unsigned when assigning to state->head->time could potentially truncate values. We should verify if this is intentional and safe.

✅ Verification successful

🏁 Script executed:

#!/bin/bash
# Description: Check the type of state->head->time to understand potential truncation
ast-grep --pattern $'struct $_head {
  $$$
  unsigned time;
  $$$
}'

Length of output: 69


I'm not seeing enough context from the initial AST search. Let's run a few additional shell scripts to locate both the assignment context for state->head->time and how hold is defined in inflate.c so we can better assess whether the cast to unsigned truncates values intentionally.


🏁 Script executed:

#!/bin/bash
# 1. Search for the assignment of state->head->time in inflate.c to see surrounding context.
rg -n "state->head->time" inflate.c -C 5

# 2. Search for the declaration or usage of 'hold' to understand its type.
rg -n "hold" inflate.c -C 10

# 3. Attempt to locate the definition of the structure containing 'time'. 
#    This searches for lines defining a member 'time' in any struct in inflate.c.
rg -n "time[[:space:]]*;" inflate.c -C 5

Length of output: 13028


Attention: The 32-bit cast is intentional and safe.
The header’s time field is specified as a 32‐bit value (consistent with the GZIP format), and the code explicitly reads 32 bits (via the NEEDBITS(32) macro) into the 64-bit accumulator before casting. In this context, truncating to an unsigned 32-bit value is by design and does not lead to data loss.

@mtl1979
Copy link
Collaborator Author

mtl1979 commented Feb 5, 2025

Idk if this will work:

#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wbitwise-shift"
bi_buf |= val << bi_valid;
#pragma clang diagnostic pop

There is no real case where bi_valid will ever be greater than BIT_BUF_SIZE (64)

You can't use pragmas within preprocessor macros.

@Dead2
Copy link
Member

Dead2 commented Feb 6, 2025

Retested with updated PR

For reference, Develop:

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     54.185%      0.078/0.082/0.083/0.001        0.027/0.030/0.031/0.001        8,526,745
 2     43.871%      0.144/0.149/0.151/0.001        0.027/0.030/0.031/0.001        6,903,702
 3     42.390%      0.178/0.182/0.183/0.001        0.025/0.028/0.030/0.001        6,670,664
 4     41.644%      0.206/0.212/0.214/0.002        0.024/0.028/0.030/0.001        6,553,205
 5     41.215%      0.235/0.238/0.241/0.002        0.024/0.028/0.029/0.001        6,485,659
 6     41.032%      0.277/0.285/0.290/0.004        0.023/0.027/0.029/0.002        6,456,912
 7     40.778%      0.381/0.388/0.392/0.003        0.024/0.028/0.029/0.001        6,416,941
 8     40.704%      0.492/0.501/0.506/0.004        0.022/0.027/0.030/0.002        6,405,249
 9     40.409%      0.588/0.595/0.599/0.003        0.024/0.027/0.028/0.001        6,358,951

 avg1  42.914%                        0.292                          0.028
 tot                                 78.931                          7.601       60,778,028

   text    data     bss     dec     hex filename
 130504    1344       8  131856   20310 libz-ng.so.2

compress_bench/compress_bench/1                 5347 ns         5338 ns       124032
compress_bench/compress_bench/8                 5631 ns         5621 ns       117456
compress_bench/compress_bench/16                5669 ns         5660 ns       120083
compress_bench/compress_bench/32                6035 ns         6025 ns       112324
compress_bench/compress_bench/64                6344 ns         6334 ns       106719
compress_bench/compress_bench/512               6809 ns         6798 ns       100918
compress_bench/compress_bench/4096             12010 ns        11988 ns        58512
compress_bench/compress_bench/32768            46215 ns        46130 ns        15176
uncompress_bench/uncompress_bench/1              150 ns          149 ns      4683675
uncompress_bench/uncompress_bench/64             310 ns          309 ns      2264460
uncompress_bench/uncompress_bench/1024           484 ns          483 ns      1449716
uncompress_bench/uncompress_bench/16384         4637 ns         4629 ns       150867
uncompress_bench/uncompress_bench/131072       26841 ns        26795 ns        26045
uncompress_bench/uncompress_bench/1048576     207574 ns       207153 ns         3372

This PR:

Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     54.185%      0.083/0.087/0.088/0.001        0.028/0.030/0.031/0.001        8,526,745
 2     43.871%      0.149/0.152/0.154/0.002        0.028/0.031/0.033/0.001        6,903,702
 3     42.390%      0.179/0.184/0.186/0.002        0.026/0.029/0.031/0.001        6,670,664
 4     41.644%      0.207/0.213/0.215/0.002        0.024/0.029/0.030/0.002        6,553,205
 5     41.215%      0.236/0.240/0.243/0.002        0.024/0.028/0.030/0.001        6,485,659
 6     41.032%      0.278/0.287/0.291/0.003        0.025/0.028/0.029/0.001        6,456,912
 7     40.778%      0.381/0.389/0.394/0.003        0.027/0.028/0.030/0.001        6,416,941
 8     40.704%      0.493/0.501/0.506/0.003        0.027/0.029/0.030/0.001        6,405,249
 9     40.409%      0.583/0.594/0.598/0.003        0.025/0.028/0.029/0.001        6,358,951

 avg1  42.914%                        0.294                          0.029
 tot                                 79.421                          7.776       60,778,028

   text    data     bss     dec     hex filename
 130376    1344       8  131728   20290 libz-ng.so.2

compress_bench/compress_bench/1                 5336 ns         5327 ns       123379
compress_bench/compress_bench/8                 5646 ns         5606 ns       119203
compress_bench/compress_bench/16                5863 ns         5854 ns       113485
compress_bench/compress_bench/32                6057 ns         6047 ns       112427
compress_bench/compress_bench/64                6357 ns         6347 ns       108111
compress_bench/compress_bench/512               6950 ns         6939 ns        98693
compress_bench/compress_bench/4096             11944 ns        11923 ns        58593
compress_bench/compress_bench/32768            46457 ns        46378 ns        15090
uncompress_bench/uncompress_bench/1              151 ns          151 ns      4643511
uncompress_bench/uncompress_bench/64             309 ns          309 ns      2268901
uncompress_bench/uncompress_bench/1024           485 ns          484 ns      1445545
uncompress_bench/uncompress_bench/16384         4663 ns         4655 ns       150376
uncompress_bench/uncompress_bench/131072       26968 ns        26924 ns        26216
uncompress_bench/uncompress_bench/1048576     207662 ns       207227 ns         3374

It is now 128 bytes smaller than Develop.
Compression is now only 0.6% slower on average.
Level 1 is 6% slower, resulting in this having a larger slowdown in ms, not only in percent.
Level 2 is 2% slower
Levels 3-6 are around 0.5-1% slower
Levels 7-9 no change.

Confirmed these benchmarks with another run of before/after compiled with increased function alignment, showing the same trends as the above (0.58% slower on average).

@Dead2
Copy link
Member

Dead2 commented Feb 7, 2025

I benchmarked this alternative:

--- a/trees_emit.h
+++ b/trees_emit.h
@@ -45,10 +45,10 @@ extern Z_INTERNAL const int base_dist[D_CODES];
     uint32_t total_bits = bi_valid + len;\
     send_bits_trace(s, val, len);\
     sent_bits_add(s, len);\
-    if (total_bits < BIT_BUF_SIZE) {\
+    if (total_bits < BIT_BUF_SIZE && bi_valid < BIT_BUF_SIZE) {\
         bi_buf |= val << bi_valid;\
         bi_valid = total_bits;\
-    } else if (bi_valid == BIT_BUF_SIZE) {\
+    } else if (bi_valid >= BIT_BUF_SIZE) {\
         put_uint64(s, bi_buf);\
         bi_buf = val;\
         bi_valid = len;\

Wouldn't this also fix the problem? It keeps the block ordering more optimal.

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     54.185%      0.082/0.085/0.087/0.001        0.027/0.030/0.031/0.001        8,526,745
 2     43.871%      0.148/0.151/0.152/0.001        0.026/0.030/0.032/0.002        6,903,702
 3     42.390%      0.179/0.181/0.183/0.001        0.025/0.028/0.030/0.001        6,670,664
 4     41.644%      0.206/0.213/0.215/0.003        0.026/0.028/0.029/0.001        6,553,205
 5     41.215%      0.233/0.239/0.242/0.002        0.024/0.028/0.029/0.001        6,485,659
 6     41.032%      0.283/0.288/0.291/0.002        0.024/0.027/0.029/0.002        6,456,912
 7     40.778%      0.384/0.388/0.391/0.002        0.023/0.026/0.029/0.002        6,416,941
 8     40.704%      0.493/0.500/0.505/0.004        0.025/0.028/0.030/0.001        6,405,249
 9     40.409%      0.589/0.597/0.603/0.004        0.024/0.027/0.029/0.001        6,358,951

 avg1  42.914%                        0.294                          0.028
 tot                                 79.269                          7.571       60,778,028

   text    data     bss     dec     hex filename
 130184    1344       8  131536   201d0 libz-ng.so.2

Surprisingly this results in even smaller code, -320bytes, the compiler must see some optimization opportunity that I don't.
On average 0.43% slower compression.

@nmoinvaz
Copy link
Member

nmoinvaz commented Feb 7, 2025

That does look a bit better. I think the logic is probably ok. Is it necessary to change bi_valid == BIT_BUF_SIZE to bi_valid >= BIT_BUF_SIZE?

@mtl1979
Copy link
Collaborator Author

mtl1979 commented Feb 7, 2025

I benchmarked this alternative:

--- a/trees_emit.h
+++ b/trees_emit.h
@@ -45,10 +45,10 @@ extern Z_INTERNAL const int base_dist[D_CODES];
     uint32_t total_bits = bi_valid + len;\
     send_bits_trace(s, val, len);\
     sent_bits_add(s, len);\
-    if (total_bits < BIT_BUF_SIZE) {\
+    if (total_bits < BIT_BUF_SIZE && bi_valid < BIT_BUF_SIZE) {\
         bi_buf |= val << bi_valid;\
         bi_valid = total_bits;\
-    } else if (bi_valid == BIT_BUF_SIZE) {\
+    } else if (bi_valid >= BIT_BUF_SIZE) {\
         put_uint64(s, bi_buf);\
         bi_buf = val;\
         bi_valid = len;\

Wouldn't this also fix the problem? It keeps the block ordering more optimal.

That's exactly the code what I thought first, but it's kinda less readable than the current "alternative"... If it performs better, I can change... I didn't like the extra test at top as every extra check usually means penalty somewhere and rely on optimizer to mask the penalty.

@mtl1979
Copy link
Collaborator Author

mtl1979 commented Feb 7, 2025

That does look a bit better. I think the logic is probably ok. Is it necessary to change bi_valid == BIT_BUF_SIZE to bi_valid >= BIT_BUF_SIZE?

Static analyser does always require that and we really try to not mess too much with it to avoid the risk of replacing bad code with another bad code...

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
inflate.c (1)

291-292: Consider adding overflow checks for bits parameter.

The function checks if bits > 16 or if the total bits exceed 32, but with the change to uint64_t, we might want to add additional validation:

-if (bits > 16 || state->bits + (unsigned int)bits > 32)
+if (bits > 16 || state->bits + (unsigned int)bits > 32 || 
+    ((uint64_t)1 << bits) > UINT64_MAX / (1ULL << state->bits))
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between de4c941 and e191d30.

📒 Files selected for processing (4)
  • infback.c (2 hunks)
  • inflate.c (6 hunks)
  • inflate.h (2 hunks)
  • trees_emit.h (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • trees_emit.h
  • inflate.h
  • infback.c
⏰ Context from checks skipped due to timeout of 90000ms (113)
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC Symbol Prefix
  • GitHub Check: Ubuntu Emscripten WASM32
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt
  • GitHub Check: Ubuntu GCC AARCH64 No NEON
  • GitHub Check: Ubuntu GCC AARCH64 No ACLE
  • GitHub Check: Ubuntu GCC AARCH64
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC Symbol Prefix
  • GitHub Check: Ubuntu Emscripten WASM32
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt
  • GitHub Check: Ubuntu GCC AARCH64 No NEON
  • GitHub Check: Ubuntu GCC AARCH64 No ACLE
  • GitHub Check: Ubuntu GCC AARCH64
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC Symbol Prefix
  • GitHub Check: Ubuntu Emscripten WASM32
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt
  • GitHub Check: Ubuntu GCC AARCH64 No NEON
  • GitHub Check: Ubuntu GCC AARCH64 No ACLE
  • GitHub Check: Ubuntu GCC AARCH64
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC Symbol Prefix
  • GitHub Check: Ubuntu Emscripten WASM32
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt
  • GitHub Check: Ubuntu GCC AARCH64 No NEON
  • GitHub Check: Ubuntu GCC AARCH64 No ACLE
  • GitHub Check: Ubuntu GCC AARCH64
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC Symbol Prefix
  • GitHub Check: Ubuntu Emscripten WASM32
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt
  • GitHub Check: Ubuntu GCC AARCH64 No NEON
  • GitHub Check: Ubuntu GCC AARCH64 No ACLE
  • GitHub Check: Ubuntu GCC AARCH64
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC Symbol Prefix
  • GitHub Check: Ubuntu Emscripten WASM32
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt
  • GitHub Check: Ubuntu GCC AARCH64 No NEON
  • GitHub Check: Ubuntu GCC AARCH64 No ACLE
  • GitHub Check: Ubuntu GCC AARCH64
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC Symbol Prefix
  • GitHub Check: Ubuntu Emscripten WASM32
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt
  • GitHub Check: Ubuntu GCC AARCH64 No NEON
  • GitHub Check: Ubuntu GCC AARCH64 No ACLE
  • GitHub Check: Ubuntu GCC AARCH64
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC Symbol Prefix
  • GitHub Check: Ubuntu Emscripten WASM32
  • GitHub Check: Ubuntu GCC AARCH64 No NEON
  • GitHub Check: Ubuntu GCC AARCH64 No ACLE
  • GitHub Check: Ubuntu GCC AARCH64
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC Symbol Prefix
  • GitHub Check: Ubuntu Emscripten WASM32
  • GitHub Check: Ubuntu GCC AARCH64 No NEON
  • GitHub Check: Ubuntu GCC AARCH64 No ACLE
  • GitHub Check: Ubuntu GCC AARCH64
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC Symbol Prefix
  • GitHub Check: Ubuntu Emscripten WASM32
  • GitHub Check: Ubuntu GCC AARCH64 No NEON
  • GitHub Check: Ubuntu GCC AARCH64 No ACLE
  • GitHub Check: Ubuntu GCC AARCH64
🔇 Additional comments (4)
inflate.c (4)

482-482: LGTM! The change to uint64_t aligns with the PR objectives.

The change from unsigned to uint64_t for the hold variable ensures consistent 64-bit storage across all versions of the inflate function, preventing potential overflow issues.


386-392: LGTM! The PULLBYTE macro has been updated correctly.

The modification to use uint64_t in the bit accumulator ensures proper handling of larger values and maintains consistency with the hold variable type change.


294-294: LGTM! The shift operation in inflatePrime is now safer.

The change to use uint64_t for the shift operation prevents potential overflow when shifting by large bit values.


580-580: Verify the type casting implications.

The code adds explicit casts to unsigned in several places:

  1. state->head->time = (unsigned)(hold)
  2. strm->adler = state->check = ZSWAP32((unsigned)hold)
  3. ZSWAP32((unsigned)hold)) != state->check

While these casts appear correct, we should verify that:

  1. No data loss occurs when casting from uint64_t to unsigned
  2. The ZSWAP32 macro correctly handles the unsigned value

Also applies to: 707-707, 1131-1131

✅ Verification successful

🏁 Script executed:

#!/bin/bash
# Search for ZSWAP32 macro definition and usage
rg -A 5 "ZSWAP32" --type c

Length of output: 4821


CAST IMPLICATIONS in inflate.c

After reviewing the output for macro definitions and usage, it appears that the explicit casts to unsigned (typically a 32‑bit value) are consistently applied to the variable hold before passing it to ZSWAP32. In our codebase, hold is used in a controlled context where only 32 meaningful bits are expected, and ZSWAP32 is defined to operate on a 32‑bit value. This confirms that:

• No unexpected data loss occurs when converting a uint64_t to an unsigned, given that only the lower 32 bits are used.
• The ZSWAP32 macro is implemented to correctly handle the unsigned value across different platforms.

Thus, the changes in explicit casts appear safe and correct.

@Dead2
Copy link
Member

Dead2 commented Feb 7, 2025

Interestingly, testing the code in godbolt (although as a function, not a macro, so less chances for the compiler to optimize) seems to indicate that changing position on the two if checks would allow it to only have 2 extra code blocks instead of 3.
It is of course quite logical now that I think about it.
There is no code size change, and performance is similar, compression is only 0.31% slower, so it is possibly slightly faster than the other way around.

https://godbolt.org/z/91e5WT1sP

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     54.185%      0.081/0.085/0.086/0.001        0.028/0.030/0.031/0.001        8,526,745
 2     43.871%      0.147/0.151/0.153/0.001        0.025/0.029/0.030/0.001        6,903,702
 3     42.390%      0.179/0.183/0.184/0.002        0.025/0.028/0.030/0.001        6,670,664
 4     41.644%      0.210/0.214/0.216/0.001        0.026/0.028/0.030/0.001        6,553,205
 5     41.215%      0.233/0.239/0.242/0.002        0.023/0.027/0.029/0.001        6,485,659
 6     41.032%      0.282/0.287/0.291/0.003        0.025/0.028/0.029/0.001        6,456,912
 7     40.778%      0.381/0.387/0.391/0.003        0.023/0.027/0.029/0.001        6,416,941
 8     40.704%      0.495/0.500/0.505/0.003        0.024/0.028/0.029/0.002        6,405,249
 9     40.409%      0.589/0.594/0.599/0.003        0.025/0.027/0.028/0.001        6,358,951

 avg1  42.914%                        0.293                          0.028
 tot                                 79.177                          7.564       60,778,028

   text    data     bss     dec     hex filename
 130184    1344       8  131536   201d0 libz-ng.so.2
--- a/trees_emit.h
+++ b/trees_emit.h
@@ -45,10 +45,10 @@ extern Z_INTERNAL const int base_dist[D_CODES];
     uint32_t total_bits = bi_valid + len;\
     send_bits_trace(s, val, len);\
     sent_bits_add(s, len);\
-    if (total_bits < BIT_BUF_SIZE) {\
+    if (bi_valid < BIT_BUF_SIZE && total_bits < BIT_BUF_SIZE) {\
         bi_buf |= val << bi_valid;\
         bi_valid = total_bits;\
-    } else if (bi_valid == BIT_BUF_SIZE) {\
+    } else if (bi_valid >= BIT_BUF_SIZE) {\
         put_uint64(s, bi_buf);\
         bi_buf = val;\
         bi_valid = len;\

@mtl1979
Copy link
Collaborator Author

mtl1979 commented Feb 7, 2025

Interestingly, testing the code in godbolt (although as a function, not a macro, so less chances for the compiler to optimize) seems to indicate that changing position on the two if checks would allow it to only have 2 extra code blocks instead of 3. It is of course quite logical now that I think about it. There is no code size change, and performance is similar, compression is only 0.31% slower, so it is possibly slightly faster than the other way around.

This is how I "parse" the logic.

  1. Check if there is room for more bits, if there is, add as many bits as possible
  2. Check if there is 64 used bits and flush if there is. After that check if we still have more bits to add and add them if there is
  3. Update bi_valid by taking lowest 6 bits of bi_valid + len

@Dead2
Copy link
Member

Dead2 commented Feb 7, 2025

That's exactly the code what I thought first, but it's kinda less readable than the current "alternative"... If it performs better, I can change... I didn't like the extra test at top as every extra check usually means penalty somewhere and rely on optimizer to mask the penalty.

I guess the extra jump is more expensive than the extra test.
IIRC, on Broadwell for example, a tight loop of only jumps (worst case) is one jump every 12 cycles. In real code a large part of that latency can be hidden if the instruction and the target has already been decoded. Not sure what best case is, but I think ~4 cycles.

@mtl1979
Copy link
Collaborator Author

mtl1979 commented Feb 7, 2025

That's exactly the code what I thought first, but it's kinda less readable than the current "alternative"... If it performs better, I can change... I didn't like the extra test at top as every extra check usually means penalty somewhere and rely on optimizer to mask the penalty.

I guess the extra jump is more expensive than the extra test. IIRC, on Broadwell for example, a tight loop of only jumps (worst case) is one jump every 12 cycles. In real code a large part of that latency can be hidden if the instruction and the target has already been decoded. Not sure what best case is, but I think ~4 cycles.

Modern compilers and processors are pretty good at guessing if jump is taken most of the time and reorder the code to make it as fast as possible. Like I said earlier, I'm open for suggestions from anyone, but I'm still worried if @nmoinvaz has trouble decoding the logic, then how can we expect that new or less experienced contributors can decode the logic and avoid breaking it later on... Honestly thinking a lot of the people who have contributed regularly are more experienced coders than I am.

@Dead2
Copy link
Member

Dead2 commented Feb 7, 2025

--- a/trees_emit.h
+++ b/trees_emit.h
@@ -45,10 +45,10 @@ extern Z_INTERNAL const int base_dist[D_CODES];
     uint32_t total_bits = bi_valid + len;\
     send_bits_trace(s, val, len);\
     sent_bits_add(s, len);\
-    if (total_bits < BIT_BUF_SIZE) {\
+    if (bi_valid < BIT_BUF_SIZE && total_bits < BIT_BUF_SIZE) {\
         bi_buf |= val << bi_valid;\
         bi_valid = total_bits;\
-    } else if (bi_valid == BIT_BUF_SIZE) {\
+    } else if (bi_valid >= BIT_BUF_SIZE) {\
         put_uint64(s, bi_buf);\
         bi_buf = val;\
         bi_valid = len;\

Also there should not be any need for checking bi_valid < BIT_BUF_SIZE in addition to total_bits < BIT_BUF_SIZE, right?
After all, total_bits = bi_valid + len, so if that is smaller than BIT_BUF_SIZE, then bi_valid will automatically also be smaller.

That would make the diff only

--- a/trees_emit.h
+++ b/trees_emit.h
@@ -48,7 +48,7 @@ extern Z_INTERNAL const int base_dist[D_CODES];
     if (total_bits < BIT_BUF_SIZE) {\
         bi_buf |= val << bi_valid;\
         bi_valid = total_bits;\
-    } else if (bi_valid == BIT_BUF_SIZE) {\
+    } else if (bi_valid >= BIT_BUF_SIZE) {\
         put_uint64(s, bi_buf);\
         bi_buf = val;\
         bi_valid = len;\

This also reduces the code size further.

   text    data     bss     dec     hex filename
 130056    1344       8  131408   20150 libz-ng.so.2

@mtl1979
Copy link
Collaborator Author

mtl1979 commented Feb 7, 2025

Also there should not be any need for checking bi_valid < BIT_BUF_SIZE in addition to total_bits < BIT_BUF_SIZE, right? After all, total_bits = bi_valid + len, so if that is smaller than BIT_BUF_SIZE, then bi_valid will automatically also be smaller.

Static analyser doesn't work that way... total_bits is not evaluated at that point yet, because the value can't be evaluated at build time (during code tree generation). First time it is evaluated is after value bi_valid is replaced with len, because that is often a constant. Essentially it's an explicit hint (not assume hint) to static analyser of what the valid range is.

@Dead2
Copy link
Member

Dead2 commented Feb 7, 2025

Also there should not be any need for checking bi_valid < BIT_BUF_SIZE in addition to total_bits < BIT_BUF_SIZE, right? After all, total_bits = bi_valid + len, so if that is smaller than BIT_BUF_SIZE, then bi_valid will automatically also be smaller.

Static analyser doesn't work that way... total_bits is not evaluated at that point yet, because the value can't be evaluated at build time (during code tree generation). First time it is evaluated is after value bi_valid is replaced with len, because that is often a constant. Essentially it's an explicit hint (not assume hint) to static analyser of what the valid range is.

Right, I had completely changed focus from static analyzer to optimizing the code while retaining correctness.
It is tricky if we cannot give range hints.

@mtl1979
Copy link
Collaborator Author

mtl1979 commented Feb 7, 2025

Right, I had completely changed focus from static analyzer to optimizing the code while retaining correctness. It is tricky if we cannot give range hints.

I have Clang 18 and that doesn't yet support range hints... Next big update will be April 2026 and then maybe the default becomes either Clang 19 or Clang 20 depending on if Clang 20 is stable enough.

Visual C++ has supported range hints for quite a while already. I haven't tested gcc 13 and later as those are broken for me.

Copy link
Member

@Dead2 Dead2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good. Unless there is further discussion/changes, I'll merge this once comments have been added.

@Dead2 Dead2 merged commit 43b2703 into zlib-ng:develop Feb 8, 2025
149 of 150 checks passed
@Dead2 Dead2 mentioned this pull request Feb 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants