Conversation

@am11
Contributor

@am11 am11 commented Dec 12, 2024

Fix misspelling of ensure

@coderabbitai
Contributor

coderabbitai bot commented Dec 12, 2024

Walkthrough

The changes involve extensive updates to the zlib library, focusing on enhancing the testing coverage for the inflate() and inflateBack() functions, as well as modifications to the header files zlib-ng.h.in and zlib.h.in. New test cases address various edge cases and error conditions, including a specific CVE. The header files have been updated to include new functions, macros, and support for 64-bit file offsets, alongside version information updates and the removal of certain undocumented functions.

Changes

| File | Change Summary |
| --- | --- |
| test/infcover.c | Added extensive test cases for inflate() and inflateBack() functions, covering edge cases, error conditions, and specific CVE (CVE-2022-37434). |
| zlib-ng.h.in | Added new macros and constants for compression parameters, introduced functions zng_deflateSetParams, zng_deflateGetParams, and several undocumented functions. Added gzfile operations functions zng_gzvprintf and zng_gzopen_w (Windows). |
| zlib.h.in | Updated version information for zlib-ng, clarified API differences, added support for 64-bit file offsets and new functions (deflatePrime, inflateGetDictionary, inflateResetKeep, deflateResetKeep, gzvprintf), removed inflateBackInit and inflateBack, and made minor adjustments to comments and formatting. |

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 43d74a2 and 7e2c637.

📒 Files selected for processing (3)
  • test/infcover.c (1 hunks)
  • zlib-ng.h.in (1 hunks)
  • zlib.h.in (1 hunks)
🔇 Additional comments (4)
test/infcover.c (2)

629-629: Typo Correction Enhances Comment Clarity

The corrected comment improves the readability and understanding of the code explanation.


Line range hint 893-893: Typo Correction Improves Documentation Accuracy

The updated comment provides clearer guidance to the developers.

zlib-ng.h.in (1)

Line range hint 1-1: No Changes Detected for Review

The file zlib-ng.h.in has no annotated code changes to review.

zlib.h.in (1)

Line range hint 1-1: No Changes Detected for Review

The file zlib.h.in has no annotated code changes to review.


@nmoinvaz
Member

Please squash the commits.

@am11
Contributor Author

am11 commented Dec 16, 2024

@nmoinvaz GitHub has a Squash and Merge option in the UI for someone with merge rights.

@Dead2 Dead2 merged commit 4fa76be into zlib-ng:develop Dec 20, 2024
142 of 148 checks passed
@codecov

codecov bot commented Dec 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 0.00%. Comparing base (43d74a2) to head (7e2c637).
Report is 4 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop   #1825       +/-   ##
===========================================
- Coverage    32.25%       0   -32.26%     
===========================================
  Files           67       0       -67     
  Lines         5745       0     -5745     
  Branches      1239       0     -1239     
===========================================
- Hits          1853       0     -1853     
+ Misses        3637       0     -3637     
+ Partials       255       0      -255     


@am11 am11 deleted the patch-4 branch December 20, 2024 23:18
@Dead2 Dead2 mentioned this pull request Dec 31, 2024
fneddy pushed a commit to fneddy/zlib-ng that referenced this pull request Jan 23, 2025
rkausch-fender pushed a commit to cclsoftware/zlib-ng that referenced this pull request Jan 28, 2025
When building with CMake toolchain provided by NDK, the ARCH variable is
not "aarch64", but "aarch64-none-linux-android26" (or similar). The
strict string match check causes the WITH_ARMV6 option to be enabled in
such a case. In result, arch/arm/slide_hash_armv6.c is compiled, which
is not intended to be used on aarch64, and fails.

Relax the check and assume aarch64 if the ARCH variable contains aarch64.

Allow overriding the CMAKE_CXX_STANDARD, CMAKE_CXX_STANDARD_REQUIRED, and CMAKE_CXX_EXTENSIONS variables for tests and benchmarks.

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

Fix overriding of CMAKE_C_STANDARD, CMAKE_C_STANDARD_REQUIRED, and CMAKE_C_EXTENSIONS. A false value is allowed for CMAKE_C_STANDARD_REQUIRED and CMAKE_C_EXTENSIONS.

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

Use target include instead of raw include

[CI] Don't try to use macOS 11 as it's no longer supported.

Replace non-ascii characters to fix MSVC warning

Force Visual C++ to treat source files as UTF-8.

Disable MSVC warning 4324 (struct padded due to alignment)

Simplify chunking in the copy ladder here

As it turns out, trying to peel off the remainder with so many branches
inflated the code size so much that this function wouldn't inline
without some fairly aggressive optimization flags. Only catching
vector-sized chunks here makes the loop body small enough, and having the
byte-by-byte copy idiom at the bottom gives the compiler some flexibility
to optimize the tail as it sees fit.
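
The shape described above can be sketched as follows. This is only an illustration, not the code from the commit: `copy_chunks` and `CHUNK_SIZE` are hypothetical names, the chunk width of 16 bytes is an assumption, and the buffers are assumed not to overlap.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical chunk width; a SIMD build would use its vector size. */
#define CHUNK_SIZE 16

/* Copy len bytes from src to dst; returns the advanced dst pointer.
 * Assumes the two buffers do not overlap. */
static unsigned char *copy_chunks(unsigned char *dst, const unsigned char *src, size_t len) {
    /* Main loop: whole chunks only, which keeps the loop body small. */
    while (len >= CHUNK_SIZE) {
        memcpy(dst, src, CHUNK_SIZE); /* fixed size, usually lowered to one load/store pair */
        dst += CHUNK_SIZE;
        src += CHUNK_SIZE;
        len -= CHUNK_SIZE;
    }
    /* Tail: plain byte-by-byte idiom that the compiler may optimize freely. */
    while (len--) {
        *dst++ = *src++;
    }
    return dst;
}
```

Copying 37 bytes with this sketch would take two passes through the chunk loop and five through the byte loop.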

Explicitly set CMake policy 0169 to silence warning

The recommended `FetchContent_MakeAvailable()` was introduced in CMake
3.14, which is newer than the version given to `cmake_minimum_required()`.

CMake policies also affect subdirectories.

The `cmake_minimum_required(VERSION)` command implicitly calls
`cmake_policy(VERSION)`.

Closes zlib-ng#1788

Compute the "safe" distance properly

The safe pointer that is computed is an exclusive, not inclusive, bound.
While we were probably rarely, if ever, bitten by this, it still makes
sense to apply the limit properly.

Don't use 'dmax' and 'sane' variables unless their checks have been compiled in.

Add variable 'wbufsize' to track window buffer including padding, to allow
the chunkset code to spill garbage data into the padding area if available.

Reorder 'inflate_state' struct to improve cache-locality of variables
needed by inffast (from 6 cachelines to 1).
Also fill in some unnecessary holes.

configure: Fix linker flags for Haiku.

configure: add --mandir to override $mandir on command line.

Reorder variables in inflate functions to reduce padding holes
due to variable alignment requirements.

Simplify avx2 chunkset a bit

Put length 16 in the length checking ladder and take care of it there
since it's also a simple case to handle. We kind of went out of our way
to pretend 128 bit vectors didn't exist when using avx2 but this can be
handled in a single instruction. Strangely the intrinsic uses vector
register operands but the instruction itself assumes a memory operand
for the source. This also means we don't have to handle this case in our
"GET_CHUNK_MAG" function.

Make chunkset_avx2 half chunk aware

This gives us appreciable gains on a number of fronts.  The first being
we're inlining a pretty hot function that was getting dispatched to
regularly. Another is that we're able to do a safe lagged copy of a
distance that is smaller, so CHUNKCOPY gets its teeth back here for
smaller sizes, without having to do another dispatch to a function.

We're also now doing two overlapping writes at once and letting the CPU
do its store forwarding. This was an enhancement @dougallj had suggested
a while back.

Additionally, the "half chunk mag" here is fundamentally less
complicated because it doesn't require synthesizing cross lane permutes
with a blend operation, so we can optimistically do that first if the
len is small enough that a full 32 byte chunk doesn't make any sense.

Try to simplify the inflate loop by collapsing most cases to chunksets

Make an AVX512 inflate fast with low cost masked writes

This takes advantage of the fact that on AVX512 architectures, masked
moves are incredibly cheap. There are many places where we have to
fall back to the safe C implementation of chunkcopy_safe because of the
assumed overwriting that occurs. We're able to sidestep most of the branching
needed here by simply controlling the bounds of our writes with a mask.

Force use of latest Windows SDK with 32-bit ARM support

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

Fix casting warning/error in test_compress_bound.cc

Fixes the following error when building with msvc compiler
```
test_compress_bound.cc
D:\zlib-ng\test\test_compress_bound.cc(41,50): error C2220: the following warning is treated as an error
D:\zlib-ng\test\test_compress_bound.cc(41,50): warning C4267: 'argument': conversion from 'size_t' to 'unsigned long', possible loss of data
D:\zlib-ng\test\test_compress_bound.cc(43,68): warning C4267: 'argument': conversion from 'size_t' to 'unsigned long', possible loss of data
```
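
For context, C4267 fires because `unsigned long` is 32 bits wide on 64-bit Windows while `size_t` is 64 bits. One generic way to make the narrowing explicit is sketched below. Both `size_to_ulong` and `takes_ulong` are hypothetical helpers invented for this illustration; the actual fix in test_compress_bound.cc may look different.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for an API whose parameter is unsigned long. */
static unsigned long takes_ulong(unsigned long n) {
    return n;
}

/* Narrow a size_t explicitly. Values that do not fit are clamped to
 * ULONG_MAX instead of being silently truncated. */
static unsigned long size_to_ulong(size_t n) {
    if (n > ULONG_MAX) {
        return ULONG_MAX;
    }
    return (unsigned long)n; /* explicit cast, so no implicit-conversion warning */
}
```

A call site would then read `takes_ulong(size_to_ulong(len))` rather than passing the `size_t` directly.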

Remove unused HAVE_CHUNKMEMSET_1 define

Fix native detection of CRC instruction

It's unclear if raspberry pi OS's shipped GCC doesn't properly detect
ACLE or not (/proc/cpuinfo claims to support AES), but in any case, the
preprocessor macro for that flag is not defined with -march=native on a
raspberry pi 5. Unfortunately that means when built "WITH_NATIVE", we do
not get a fast CRC function.  The CRC32 preprocessor macro _IS_ defined,
and the auto detection when built without NATIVE support does properly
get dispatched to. Since we only need the scalar CRC32 and not the polynomial
stuff anyhow, let's make it be an || condition and not a && one.

Bump codecov/codecov-action from 4 to 5

Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 4 to 5.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](codecov/codecov-action@v4...v5)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Address deprecated cmake version warning.

Use cmake_minimum_required(VERSION <min>...<policy_max>) syntax to set
the policy at the same time as the compatible CMake version.

Enable AVX2 functions to be built with BMI2 instructions

While these are technically different instructions, no such CPU exists
that has AVX2 that doesn't have BMI2. Enabling BMI2 allows us to
eliminate several flag stalls by having flagless versions of shifts, and
allows us to not clobber and move around GPRs so much in scalar code.
There's usually a sizeable benefit for enabling it. Since we're building
with BMI2 for AVX2 functions, let's also just make sure the CPU claims
to support it (just to cover our bases).

zbuild: Provide a fallback for "ALIGNED_(x)" for other compilers

Improve pipelining for AVX512 chunking

For reasons that aren't quite so clear, using the masked writes here
did not pipeline very well. Either setting up the mask stalled things
or masked moves have issues overlapping regular moves. Simply putting
the masked moves behind a branch that is rarely taken seemed to do the
trick in improving the ILP. While here, put masked loads behind the same
branch in case there were ever a hazard for overreading.

Since we long ago made unaligned reads safe (by using memcpy or intrinsics),
it is time to replace the UNALIGNED_OK checks that have since really only been
used to select the optimal comparison sizes for the arch instead.

Revert "Since we long ago make unaligned reads safe (by using memcpy or intrinsics),"

This reverts commit 80fffd7.
It was mistakenly pushed to develop instead of going through a PR and the appropriate reviews.

added in-tree build artifacts to .gitignore

Fix typos (zlib-ng#1825)

Since we long ago made unaligned reads safe (by using memcpy or intrinsics),
it is time to replace the UNALIGNED_OK checks that have since really only been
used to select the optimal comparison sizes for the arch instead.

adler32_rvv: Fix some overflow problems

There are currently some overflow problems in the adler32_rvv
implementation, which can lead to wrong results for some inputs. These
problems are easily exhibited when running `git fsck` with zlib-ng
substituting for the system zlib on a big git repository.

These problems and the solutions are the following:

- When the input data is long enough, v_buf32_accu can overflow too.
  Add it to the modulo code that happens per ~NMAX bytes.
- When the vector data is reduced to scalar values, the resulting scalar
  value (and the processed length) may cause the calculation of sum2
  to overflow. Add mod BASE to all these reductions and to the initial
  calculation of sum2.
- When the remaining data is less than vl bytes, the code falls back to a
  scalar implementation; however, the sum2 and adler2 values have just been
  reduced from vectors and could be big enough to make sum2 overflow
  in the scalar code. Reduce them modulo BASE before the scalar code to
  prevent such overflow (because vl is surely much smaller than NMAX).

Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
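
To make the role of NMAX and BASE concrete, here is a minimal scalar Adler-32 sketch that defers the modulo to once per NMAX bytes. It is not the RVV code from the commit; `adler32_scalar` is a hypothetical name, while the constants 65521 and 5552 are the standard Adler-32 values.

```c
#include <stddef.h>
#include <stdint.h>

#define BASE 65521U /* largest prime smaller than 65536 */
#define NMAX 5552   /* most bytes that can be summed before 32-bit overflow */

/* Scalar Adler-32: sums run unreduced for up to NMAX bytes, then both
 * accumulators are reduced mod BASE so that neither can overflow. */
static uint32_t adler32_scalar(uint32_t adler, const uint8_t *buf, size_t len) {
    uint32_t sum2 = (adler >> 16) & 0xffff;
    adler &= 0xffff;
    while (len > 0) {
        size_t n = len < NMAX ? len : NMAX;
        len -= n;
        while (n--) {
            adler += *buf++;
            sum2 += adler;
        }
        /* Reduce before the next block (or before returning). */
        adler %= BASE;
        sum2 %= BASE;
    }
    return adler | (sum2 << 16);
}
```

Starting from the initial value 1, the nine bytes of "Wikipedia" give 0x11E60398. The overflow cases listed above arise when a value enters such a loop without having been reduced below BASE first.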

Make big endians first class citizens again

No longer do the big iron of yore which lack SIMD optimized loads need
to search strings a byte at a time like primitive machines of the vax
era. This guard here was mostly due to the fact that the string
comparison was searched with "count trailing zero", which assumes an
endianness.  We can just conditionally use leading zeros when on big
endian and stop using the extremely naive C implementation. This makes
things a tad bit faster.
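
The endianness dependence can be sketched in a few lines. This is an illustration under assumptions, not the zlib-ng compare256 code: `match_len8` is a hypothetical helper, and it relies on the GCC/Clang builtins `__builtin_ctzll` and `__builtin_clzll` and the `__BYTE_ORDER__` macro.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return how many leading bytes (0..8) of a and b are equal. */
static unsigned match_len8(const uint8_t *a, const uint8_t *b) {
    uint64_t va, vb;
    memcpy(&va, a, sizeof(va)); /* memcpy keeps the load alignment-safe */
    memcpy(&vb, b, sizeof(vb));
    uint64_t diff = va ^ vb;
    if (diff == 0) {
        return 8;
    }
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    /* Big endian: the first byte in memory is the most significant one. */
    return (unsigned)__builtin_clzll(diff) / 8;
#else
    /* Little endian: the first byte in memory is the least significant one. */
    return (unsigned)__builtin_ctzll(diff) / 8;
#endif
}
```

Using trailing zeros on a big endian machine would report the position of the last differing byte rather than the first, which is why the bit count has to be selected at compile time.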

Fix "RLE" compression with big endian architectures

This was missed in zlib-ng#1831. The RLE methods compare a string of bytes
directly with itself to directly derive a simple run length encoding.
They use similar but not identical methods to compare256. This needs
a similar endianness check at compile time to know which compare bit
count to use (leading or trailing).

Set OPTIMAL_CMP for 32-bit PowerPC

Update s390x actions-runner docker

Fix unaligned access in ACLE based crc32

This fixes a rightful complaint from the alignment sanitizer that we
alias memory in an unaligned fashion. A nice added bonus is that this
improves performance a tiny bit on the larger buffers, perhaps due to
loops that idiomatically decrement a count and increment a single buffer
pointer rather than the maze of conditional pointer reassignments.

While here, let's write a unit test just for this. Since this is the only
variant that accesses memory in a potentially unaligned fashion that doesn't
explicitly go byte by byte or use intrinsics that don't require alignment,
we'll enable it only for this function for now. Adding more tests later if
need be should be possible. For everything else not crc, we're relying on
ubsan to hopefully catch things by chance.

Improved setting of OPTIMAL_CMP on ARM

Use GCC's may_alias attribute for unaligned memory access

Rename functions to get rid of old and now misleading "unaligned" naming

Continued cleanup of old UNALIGNED_OK checks
- Remove obsolete checks
- Fix checks that are inconsistent
- Stop compiling compare256/longest_match variants that never get called
- Improve how the generic compare256 functions are handled.
- Allow overriding OPTIMAL_CMP

This simplifies the code and avoids having a lot of code in the compiled library that can never get executed.

2.2.3 Release
rkausch-fender added a commit to cclsoftware/zlib-ng that referenced this pull request Jun 13, 2025
commit 860e4cf
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Sun Feb 9 13:19:01 2025 +0100

    2.2.4 Release

commit 43b2703
Author: Mika Lindqvist <postmaster@raasu.org>
Date:   Sun Jan 26 21:31:36 2025 +0200

    Fix shift overflow in inflate and send_code.

commit 287c4dc
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Sun Feb 2 21:05:37 2025 -0500

    Fix an unfortunate bug with Visual Studio 2015

    Evidently this instruction, despite the intrinsic having a register operand,
    is a memory-register instruction. There seems to be no alignment requirement
    for the source operand. Because of this, compilers when not optimized are doing
    the unaligned load and then dumping back to the stack to do the broadcasting load.
    In doing this, MSVC seems to be dumping to the stack with an aligned move at an
    unaligned address, causing a segfault.  GCC does not seem to make this mistake, as
    it stashes to an aligned address.

    If we're on Visual Studio 2015, let's just do the longer 9 cycle sequence of a 128
    bit load followed by a vinserti128. This _should_ fix this (issue zlib-ng#1861).

commit a3c0430
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Wed Jan 29 18:46:34 2025 +0100

    Fix -Wmaybe-uninitialized warnings in benchmarks.

commit 057104f
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Wed Jan 29 16:54:36 2025 +0100

    Add uncompress benchmark

commit a0fa247
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Sun Jan 26 15:05:24 2025 +0100

    s390x: Add workaround to install custom Clang 19.1.5 rpms to actions-runner
    image in order to avoid the VX compiler bug in older clang versions.

commit 05305ed
Author: Vladislav Shchapov <vladislav@shchapov.ru>
Date:   Fri Jan 24 01:45:41 2025 +0500

    Remove unused include directories

    Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

commit 69a60bf
Author: Vladislav Shchapov <vladislav@shchapov.ru>
Date:   Fri Jan 24 01:45:26 2025 +0500

    Rename "arch/power/fallback_builtins.h" to avoid possible conflict with "fallback_builtins.h" in zlib-ng sources directory

    Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

commit 7701ce9
Author: Mika Lindqvist <postmaster@raasu.org>
Date:   Sun Jan 26 13:19:08 2025 +0200

    [abicheck] Regenerate ABI files for zlib
    * Generate using Ubuntu 24.04.1 LTS to fix mismatch in function signatures of gzseek() and gztell()

commit 5e3510e
Author: Eduard Stefes <eduard.stefes@ibm.com>
Date:   Tue Jan 21 10:48:07 2025 +0100

    Disable CRC32-VX Extension for some Clang versions
    We have to disable the CRC32-VX implementation for some Clang versions
    (18 <= version < 19.1.2) that generate bad code for the IBM S390 VGFMA intrinsics.
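
The version window from the message above can be written as a small predicate. This is only a sketch of the range check: `clang_version_affected` is a hypothetical helper, it assumes minor and patch numbers stay below 100, and the real commit may express the condition in the build system or the preprocessor instead.

```c
/* Return 1 when major.minor.patch lies in the half-open range
 * [18.0.0, 19.1.2), otherwise 0. Assumes minor and patch are below 100. */
static int clang_version_affected(int major, int minor, int patch) {
    long v = major * 10000L + minor * 100L + patch;
    return v >= 180000L && v < 190102L;
}
```

On Clang, the arguments could be taken from the predefined macros `__clang_major__`, `__clang_minor__`, and `__clang_patchlevel__`.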

commit 8cebc9c
Author: Vladislav Shchapov <vladislav@shchapov.ru>
Date:   Thu Jan 23 23:25:09 2025 +0500

    Increase cmake workflow timeout

    Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

commit 608871e
Author: Nathan Moinvaziri <nathan@nathanm.com>
Date:   Mon Jan 20 10:26:51 2025 -0800

    Use Ubuntu 20.04 for PPC64LE tests due to broken qemu.

commit 62d52a5
Author: Nathan Moinvaziri <nathan@nathanm.com>
Date:   Thu Jan 9 15:47:06 2025 -0800

    Use Ubuntu 22.04 for AARCH64 tests

    It seems that qemu might be failing. Tests on Raspberry Pi 5 with Ubuntu 24.04
    appear to work just fine.

commit b7dc018
Author: Nathan Moinvaziri <nathan@nathanm.com>
Date:   Sun Jan 5 08:01:41 2025 -0800

    Add missing compiler-rt libraries for Ubuntu 24. zlib-ng#1840

commit a95ee9e
Author: Nathan Moinvaziri <nathan@nathanm.com>
Date:   Wed Jan 1 16:20:17 2025 -0800

    Ignore gcovr parser errors.

commit bdfe700
Author: Nathan Moinvaziri <nathan@nathanm.com>
Date:   Wed Jan 1 14:41:27 2025 -0800

    Don't pin gcovr version any longer. zlib-ng#1840

commit 2ffbbdb
Author: Nathan Moinvaziri <nathan@nathanm.com>
Date:   Sat Jan 4 22:05:25 2025 -0800

    Use correct version of gcov for cross-compilers.

commit 6286088
Author: Nathan Moinvaziri <nathan@nathanm.com>
Date:   Thu Jan 2 15:17:33 2025 -0800

    Use Ubuntu 24 crossbuild-essential packages.

commit fbba9cb
Author: Nathan Moinvaziri <nathan@nathanm.com>
Date:   Wed Jan 1 14:46:59 2025 -0800

    Remove package qemu for Ubuntu 24. zlib-ng#1840

commit 7077052
Author: Nathan Moinvaziri <nathan@nathanm.com>
Date:   Wed Jan 1 14:38:12 2025 -0800

    Upgrade CI from Clang-11 to Clang 15 for Ubuntu 24. zlib-ng#1840

commit 212563d
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Sat Jan 4 21:19:42 2025 +0100

    Improve image/container rebuild script to work properly under cron.

commit 9064a25
Author: Dmitry Kurtaev <dmitry.kurtaev@gmail.com>
Date:   Wed Jan 15 20:28:44 2025 +0300

    Workaround error G6E97C40B

    Warning as an error with GCC from Ubuntu 24.04:
    ```
    /home/runner/work/dotnet_riscv/dotnet_riscv/runtime/src/native/external/zlib-ng/arch/riscv/riscv_features.c(25,33): error G6E97C40B: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses] [/home/runner/work/dotnet_riscv/dotnet_riscv/runtime/src/native/libs/build-native.proj]
    ```
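
For readers unfamiliar with this diagnostic: `&&` binds tighter than `||`, so `a || b && c` already means `a || (b && c)`, and the warning only asks for the grouping to be spelled out. A minimal sketch follows; `has_feature` and its parameters are hypothetical and are not taken from riscv_features.c.

```c
#include <stdbool.h>

/* Explicit parentheses keep the meaning of a || b && c unchanged and
 * silence -Wparentheses. */
static bool has_feature(bool a, bool b, bool c) {
    return a || (b && c);
}
```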

commit 6d24fb8
Author: Sam James <sam@gentoo.org>
Date:   Thu Jan 9 11:36:40 2025 +0000

    cmake: disable LTO for some configure checks

    Some of zlib-ng's configure tests define a function expecting it to be compiled but
    don't call that function, or don't use its return value. This is risky with
    LTO where the whole thing may be optimised out, which has happened before:
    * zlib-ng#1616
    * zlib-ng#1622
    * https://gitlab.kitware.com/cmake/cmake/-/issues/26103

    Closes: zlib-ng#1841

commit 787c7f6
Author: Vladislav Shchapov <vladislav@shchapov.ru>
Date:   Wed Jan 1 13:53:16 2025 +0500

    Force use of latest Windows SDK with 32-bit ARM support for release workflows

    Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

commit cbb6ec1
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Sun Dec 29 19:01:35 2024 +0100

    2.2.3 Release

commit bf05e88
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Fri Dec 20 23:31:37 2024 +0100

    Continued cleanup of old UNALIGNED_OK checks
    - Remove obsolete checks
    - Fix checks that are inconsistent
    - Stop compiling compare256/longest_match variants that never get called
    - Improve how the generic compare256 functions are handled.
    - Allow overriding OPTIMAL_CMP

    This simplifies the code and avoids having a lot of code in the compiled library that can never get executed.

commit 1aeb291
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Sun Dec 22 13:25:27 2024 +0100

    Rename functions to get rid of old and now misleading "unaligned" naming

commit d7e121e
Author: Cameron Cawley <ccawley2011@gmail.com>
Date:   Thu Jul 27 21:07:29 2023 +0100

    Use GCC's may_alias attribute for unaligned memory access

commit fc90e7b
Author: Cameron Cawley <ccawley2011@gmail.com>
Date:   Sun Dec 22 13:43:30 2024 +0000

    Improved setting of OPTIMAL_CMP on ARM

commit 06bba67
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Sat Dec 21 11:04:47 2024 -0500

    Fix unaligned access in ACLE based crc32

    This fixes a rightful complaint from the alignment sanitizer that we
    alias memory in an unaligned fashion. A nice added bonus is that this
    improves performance a tiny bit on the larger buffers, perhaps due to
    loops that idiomatically decrement a count and increment a single buffer
    pointer rather than the maze of conditional pointer reassignments.

    While here, let's write a unit test just for this. Since this is the only
    variant that accesses memory in a potentially unaligned fashion that doesn't
    explicitly go byte by byte or use intrinsics that don't require alignment,
    we'll enable it only for this function for now. Adding more tests later if
    need be should be possible. For everything else not crc, we're relying on
    ubsan to hopefully catch things by chance.

commit 87d8e95
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Mon Sep 16 13:15:46 2024 +0200

    Update s390x actions-runner docker

commit 005c2d3
Author: Cameron Cawley <ccawley2011@gmail.com>
Date:   Sat Dec 21 17:30:18 2024 +0000

    Set OPTIMAL_CMP for 32-bit PowerPC

commit 90913e8
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Sat Dec 21 10:09:58 2024 -0500

    Fix "RLE" compression with big endian architectures

    This was missed in zlib-ng#1831. The RLE methods compare a string of bytes
    directly with itself to directly derive a simple run length encoding.
    They use similar but not identical methods to compare256. This needs
    a similar endianness check at compile time to know which compare bit
    count to use (leading or trailing).

commit 04d1b75
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Fri Dec 20 18:53:51 2024 -0500

    Make big endians first class citizens again

    No longer do the big iron of yore which lack SIMD optimized loads need
    to search strings a byte at a time like primitive machines of the vax
    era. This guard here was mostly due to the fact that the string
    comparison was searched with "count trailing zero", which assumes an
    endianness.  We can just conditionally use leading zeros when on big
    endian and stop using the extremely naive C implementation. This makes
    things a tad bit faster.

commit dbccbd1
Author: Icenowy Zheng <uwu@icenowy.me>
Date:   Sun Dec 15 01:31:48 2024 +0800

    adler32_rvv: Fix some overflow problems

    There are currently some overflow problems in the adler32_rvv
    implementation, which can lead to wrong results for some inputs. These
    problems are easily exhibited when running `git fsck` with zlib-ng
    substituting for the system zlib on a big git repository.

    These problems and the solutions are the following:

    - When the input data is long enough, v_buf32_accu can overflow too.
      Add it to the modulo code that happens per ~NMAX bytes.
    - When the vector data is reduced to scalar values, the resulting scalar
      value (and the processed length) may cause the calculation of sum2
      to overflow. Add mod BASE to all these reductions and to the initial
      calculation of sum2.
    - When the remaining data is less than vl bytes, the code falls back to a
      scalar implementation; however, the sum2 and adler2 values have just been
      reduced from vectors and could be big enough to make sum2 overflow
      in the scalar code. Reduce them modulo BASE before the scalar code to
      prevent such overflow (because vl is surely much smaller than NMAX).

    Signed-off-by: Icenowy Zheng <uwu@icenowy.me>

commit 509f6b5
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Tue Dec 17 23:02:32 2024 +0100

    Since we long ago made unaligned reads safe (by using memcpy or intrinsics),
    it is time to replace the UNALIGNED_OK checks that have since really only been
    used to select the optimal comparison sizes for the arch instead.

commit 4fa76be
Author: Adeel Mujahid <3840695+am11@users.noreply.github.com>
Date:   Sat Dec 21 00:35:50 2024 +0200

    Fix typos (zlib-ng#1825)

commit c295c28
Author: Eduard Stefes <eduard.stefes@ibm.com>
Date:   Wed Dec 4 09:15:27 2024 +0100

    added in-tree build artifacts to .gitignore

commit 037ab0f
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Tue Dec 17 23:09:31 2024 +0100

    Revert "Since we long ago make unaligned reads safe (by using memcpy or intrinsics),"

    This reverts commit 80fffd7.
    It was mistakenly pushed to develop instead of going through a PR and the appropriate reviews.

commit 80fffd7
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Tue Dec 17 23:02:32 2024 +0100

    Since we long ago made unaligned reads safe (by using memcpy or intrinsics),
    it is time to replace the UNALIGNED_OK checks that have since really only been
    used to select the optimal comparison sizes for the arch instead.

commit 43d74a2
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Sat Nov 30 09:23:28 2024 -0500

    Improve pipelining for AVX512 chunking

    For reasons that aren't quite so clear, using the masked writes here
    did not pipeline very well. Either setting up the mask stalled things
    or masked moves have issues overlapping regular moves. Simply putting
    the masked moves behind a branch that is rarely taken seemed to do the
    trick in improving the ILP. While here, put masked loads behind the same
    branch in case there were ever a hazard for overreading.

commit a4e7c34
Author: Detlef Riekenberg <wine.dev@web.de>
Date:   Fri Nov 29 22:59:52 2024 +0100

    zbuild: Provide a fallback for "ALIGNED_(x)" for other compilers

commit 7020cb3
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Wed Nov 27 19:00:52 2024 -0500

    Enable AVX2 functions to be built with BMI2 instructions

    While these are technically different instructions, no such CPU exists
    that has AVX2 that doesn't have BMI2. Enabling BMI2 allows us to
    eliminate several flag stalls by having flagless versions of shifts, and
    allows us to not clobber and move around GPRs so much in scalar code.
    There's usually a sizeable benefit for enabling it. Since we're building
    with BMI2 for AVX2 functions, let's also just make sure the CPU claims
    to support it (just to cover our bases).
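
The runtime side of this can be sketched as a dispatch check that requires both feature bits. The struct and function below are hypothetical stand-ins; an actual build would fill the flags from CPUID, and the real zlib-ng field names may differ.

```c
#include <stdbool.h>

/* Hypothetical feature record, assumed to be filled in from CPUID elsewhere. */
struct cpu_features_sketch {
    bool has_avx2;
    bool has_bmi2;
};

/* The AVX2 kernels are compiled with BMI2 enabled, so they are selected
 * only when the CPU reports both feature bits. */
static bool can_use_avx2_kernels(const struct cpu_features_sketch *cf) {
    return cf->has_avx2 && cf->has_bmi2;
}
```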

commit 11bef87
Author: Bradley Lowekamp <blowekamp@mail.nih.gov>
Date:   Tue Nov 26 09:12:49 2024 -0500

    Address deprecated cmake version warning.

    Use cmake_minimum_required(VERSION <min>...<policy_max>) syntax to set
    the policy at the same time as the compatible CMake version.

commit 2562fd1
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Sun Dec 1 07:13:42 2024 +0000

    Bump codecov/codecov-action from 4 to 5

    Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 4 to 5.
    - [Release notes](https://github.com/codecov/codecov-action/releases)
    - [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
    - [Commits](codecov/codecov-action@v4...v5)

    ---
    updated-dependencies:
    - dependency-name: codecov/codecov-action
      dependency-type: direct:production
      update-type: version-update:semver-major
    ...

    Signed-off-by: dependabot[bot] <support@github.com>

commit 785444d
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Thu Nov 28 14:05:32 2024 -0500

    Fix native detection of CRC instruction

    It's unclear if Raspberry Pi OS's shipped GCC doesn't properly detect
    ACLE or not (/proc/cpuinfo claims to support AES), but in any case, the
    preprocessor macro for that flag is not defined with -march=native on a
    Raspberry Pi 5. Unfortunately that means when built "WITH_NATIVE", we do
    not get a fast CRC function.  The CRC32 preprocessor macro _IS_ defined,
    and the auto detection when built without NATIVE support does properly
    get dispatched to. Since we only need the scalar CRC32 and not the polynomial
    stuff anyhow, let's make it be an || condition and not a && one.

commit 3c11f65
Author: Pavel P <pavlov.pavel@gmail.com>
Date:   Thu Nov 28 01:18:20 2024 +0200

    Remove unused HAVE_CHUNKMEMSET_1 define

commit 7fdc3aa
Author: Pavel P <pavlov.pavel@gmail.com>
Date:   Wed Nov 27 23:13:34 2024 +0200

    Fix casting warning/error in test_compress_bound.cc

    Fixes the following error when building with msvc compiler
    ```
    test_compress_bound.cc
    D:\zlib-ng\test\test_compress_bound.cc(41,50): error C2220: the following warning is treated as an error
    D:\zlib-ng\test\test_compress_bound.cc(41,50): warning C4267: 'argument': conversion from 'size_t' to 'unsigned long', possible loss of data
    D:\zlib-ng\test\test_compress_bound.cc(43,68): warning C4267: 'argument': conversion from 'size_t' to 'unsigned long', possible loss of data
    ```

commit 5456966
Author: Vladislav Shchapov <vladislav@shchapov.ru>
Date:   Sun Nov 24 18:34:40 2024 +0500

    Force use of latest Windows SDK with 32-bit ARM support

    Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

commit 0ed5ac8
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Wed Sep 25 17:56:36 2024 -0400

    Make an AVX512 inflate fast with low cost masked writes

    This takes advantage of the fact that on AVX512 architectures, masked
    moves are incredibly cheap. There are many places where we have to
    fall back to the safe C implementation of chunkcopy_safe because of the
    assumed overwriting that occurs. We're able to sidestep most of the
    branching needed here by simply controlling the bounds of our writes with a mask.
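
The idea of bounding a write with a mask can be modelled in portable scalar C. This is only a model of the semantics, not the AVX512 code from the commit; `low_bits_mask` and `masked_store64` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Build a mask whose low `len` bits are set (saturating at 64 bits). */
static uint64_t low_bits_mask(size_t len) {
    return len >= 64 ? UINT64_MAX : ((UINT64_C(1) << len) - 1);
}

/* Scalar model of a masked 64-byte store: byte i of `chunk` is written to
 * dst[i] only if bit i of `mask` is set, so bytes past the permitted
 * length are never touched. */
static void masked_store64(unsigned char *dst, const unsigned char *chunk,
                           uint64_t mask) {
    for (size_t i = 0; i < 64; i++) {
        if (mask & (UINT64_C(1) << i))
            dst[i] = chunk[i];
    }
}
```

With a hardware masked store the loop and its branch disappear; the mask alone decides which bytes land in memory, which is what removes the need for a separate bounds-checked copy path.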

commit 94aacd8
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Mon Sep 23 18:26:04 2024 -0400

    Try to simplify the inflate loop by collapsing most cases to chunksets

commit e874b34
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Thu Sep 12 17:47:30 2024 -0400

    Make chunkset_avx2 half chunk aware

    This gives us appreciable gains on a number of fronts.  The first being
    we're inlining a pretty hot function that was getting dispatched to
    regularly. Another is that we're able to do a safe lagged copy of a
    distance that is smaller, so CHUNKCOPY gets its teeth back here for
    smaller sizes, without having to do another dispatch to a function.

    We're also now doing two overlapping writes at once and letting the CPU
    do its store forwarding. This was an enhancement @dougallj had suggested
    a while back.

    Additionally, the "half chunk mag" here is fundamentally less
    complicated because it doesn't require synthesizing cross lane permutes
    with a blend operation, so we can optimistically do that first if the
    len is small enough that a full 32 byte chunk doesn't make any sense.

commit b52e703
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Wed Sep 11 18:34:54 2024 -0400

    Simplify avx2 chunkset a bit

    Put length 16 in the length checking ladder and take care of it there
    since it's also a simple case to handle. We kind of went out of our way
    to pretend 128 bit vectors didn't exist when using avx2 but this can be
    handled in a single instruction. Strangely the intrinsic uses vector
    register operands but the instruction itself assumes a memory operand
    for the source. This also means we don't have to handle this case in our
    "GET_CHUNK_MAG" function.

commit dae668d
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Wed Oct 9 16:27:43 2024 +0200

    Reorder variables in inflate functions to reduce padding holes
    due to variable alignment requirements.
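
A minimal sketch of how member order creates or removes padding holes, using two made-up structs rather than the actual inflate state:

```c
#include <stdint.h>

/* Small fields separated by a large one: the compiler usually has to pad
 * after flag_a (to align counter) and again after flag_b (tail padding). */
struct holes_sketch {
    uint8_t  flag_a;
    uint64_t counter;
    uint8_t  flag_b;
};

/* Same members with the largest first: the two small fields share the
 * space that would otherwise be padding. */
struct dense_sketch {
    uint64_t counter;
    uint8_t  flag_a;
    uint8_t  flag_b;
};
```

On common 64-bit ABIs the first layout typically occupies 24 bytes and the second 16, though exact sizes depend on the target's alignment rules.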

commit 1ec47b7
Author: Mika Lindqvist <postmaster@raasu.org>
Date:   Sat Sep 28 08:09:17 2024 +0300

    configure: add --mandir to override $mandir on command line.

commit 22a4cbb
Author: Mika Lindqvist <postmaster@raasu.org>
Date:   Fri Sep 27 17:09:22 2024 +0300

    configure: Fix linker flags for Haiku.

commit 18af700
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Wed Sep 25 17:25:19 2024 +0200

    Reorder 'inflate_state' struct to improve cache-locality of variables
    needed by inffast (from 6 cachelines to 1).
    Also fill in some unnecessary holes.

commit a5c20ed
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Wed Sep 25 17:21:28 2024 +0200

    Add variable 'wbufsize' to track window buffer including padding, to allow
    the chunkset code to spill garbage data into the padding area if available.

commit 39e9c86
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Wed Sep 25 17:18:49 2024 +0200

    Don't use 'dmax' and 'sane' variables unless their checks have been compiled in.

commit 3297953
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Thu Oct 3 17:17:44 2024 -0400

    Compute the "safe" distance properly

    The safe pointer that is computed is an exclusive, not inclusive, bound.
    While we were probably rarely, if ever, bitten by this, it still makes
    sense to apply the limit properly.
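
The distinction between an exclusive and an inclusive bound can be sketched as follows. The function names are hypothetical and the code only illustrates the arithmetic; it does not reproduce the actual change in the inflate code.

```c
#include <stddef.h>

/* `safe` is treated as an exclusive bound: it points one past the last
 * writable byte, so the room left is safe - out with no "+ 1". */
static size_t writable_bytes(const unsigned char *out,
                             const unsigned char *safe) {
    return (size_t)(safe - out);
}

/* Clamp a requested copy length to the room that is actually available. */
static size_t clamp_len(const unsigned char *out, const unsigned char *safe,
                        size_t len) {
    size_t room = writable_bytes(out, safe);
    return len < room ? len : room;
}
```

Treating the same pointer as inclusive would report one byte more than is really writable, which is the kind of off-by-one the commit describes.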

commit 8d10c30
Author: FantasqueX <fantasquex@gmail.com>
Date:   Fri Sep 20 00:53:18 2024 +0800

    Explicitly set CMake policy 0169 to silence warning

    The recommended `FetchContent_MakeAvailable()` is introduced in CMake
    3.14 which is greater than `cmake_minimum_required()`.

    CMake policies will affect subdirectories.

    The `cmake_minimum_required(VERSION)` command implicitly calls
    `cmake_policy(VERSION)`.

    Closes zlib-ng#1788

commit b80eb4c
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Sun Sep 15 12:23:50 2024 -0400

    Simplify chunking in the copy ladder here

    As it turns out, trying to peel off the remainder with so many branches
    caused the code size to inflate so much that this function
    wouldn't inline without some fairly aggressive optimization flags. Only
    catching vector sized chunks here makes the loop body small enough and
    having the byte by byte copy idiom at the bottom gives the compiler some
    flexibility that it is likely to do something there.
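
The shape described above, a loop that handles only whole vector-sized chunks followed by a plain byte-by-byte tail, might look roughly like this. It is a simplified sketch that assumes non-overlapping buffers and a 16-byte chunk; `CHUNK_SKETCH` and `copy_chunks_then_bytes` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

enum { CHUNK_SKETCH = 16 };  /* assumed vector width in bytes */

/* Copy `len` bytes from src to dst (buffers must not overlap) and return
 * the advanced destination pointer. */
static unsigned char *copy_chunks_then_bytes(unsigned char *dst,
                                             const unsigned char *src,
                                             size_t len) {
    /* Main loop: whole chunks only, which keeps the loop body small. */
    while (len >= CHUNK_SKETCH) {
        memcpy(dst, src, CHUNK_SKETCH);
        dst += CHUNK_SKETCH;
        src += CHUNK_SKETCH;
        len -= CHUNK_SKETCH;
    }
    /* Tail: simple byte copy idiom that the compiler is free to optimize. */
    while (len > 0) {
        *dst++ = *src++;
        len--;
    }
    return dst;
}
```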

commit 8a1205f
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Wed Sep 25 20:52:26 2024 +0200

    Disable MSVC warning 4324 (struct padded due to alignment)

commit 13d0a89
Author: Mika Lindqvist <postmaster@raasu.org>
Date:   Wed Sep 18 21:55:40 2024 +0300

    Force Visual C++ to treat source files as UTF-8.

commit a689e10
Author: FantasqueX <fantasquex@gmail.com>
Date:   Fri Sep 20 00:05:26 2024 +0800

    Replace non-ascii characters to fix MSVC warning

commit 8e19f15
Author: Mika Lindqvist <postmaster@raasu.org>
Date:   Fri Feb 23 13:21:28 2024 +0200

    [CI] Don't try to use macOS 11 as it's no longer supported.

commit 09f8404
Author: Letu Ren <fantasquex@gmail.com>
Date:   Tue Sep 17 21:49:27 2024 +0800

    Use target include instead of raw include

commit efca012
Author: Vladislav Shchapov <vladislav@shchapov.ru>
Date:   Tue Sep 17 20:10:34 2024 +0500

    Fix override of CMAKE_C_STANDARD, CMAKE_C_STANDARD_REQUIRED, CMAKE_C_EXTENSIONS. False value is allowed for CMAKE_C_STANDARD_REQUIRED and CMAKE_C_EXTENSIONS.

    Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

commit ce93943
Author: Vladislav Shchapov <vladislav@shchapov.ru>
Date:   Tue Sep 17 20:08:41 2024 +0500

    Allow overriding CMAKE_CXX_STANDARD, CMAKE_CXX_STANDARD_REQUIRED, CMAKE_CXX_EXTENSIONS variables for tests and benchmarks.

    Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

commit 68e31fa
Author: Bartosz Taudul <wolf@nereid.pl>
Date:   Tue Sep 17 12:46:11 2024 +0200

    Fix build on aarch64 android.

    When building with CMake toolchain provided by NDK, the ARCH variable is
    not "aarch64", but "aarch64-none-linux-android26" (or similar). The
    strict string match check causes the WITH_ARMV6 option to be enabled in
    such a case. In result, arch/arm/slide_hash_armv6.c is compiled, which
    is not intended to be used on aarch64, and fails.

    Relax the check and assume aarch64 if the ARCH variable contains aarch64.