Fix typos #1825
Walkthrough
The changes involve extensive updates to the zlib library, focusing on enhancing the testing coverage.

Recent review details
Configuration used: CodeRabbit UI
Files selected for processing (3)
Additional comments (4)
test/infcover.c (2)
The corrected comment improves the readability and understanding of the code explanation.
The updated comment provides clearer guidance to the developers.
zlib-ng.h.in (1)
zlib.h.in (1)
Please squash the commits.

@nmoinvaz GitHub has a Squash and Merge option in the UI for someone with merge rights.
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@             Coverage Diff              @@
##           develop    #1825       +/-   ##
============================================
- Coverage    32.25%        0    -32.26%
============================================
  Files           67        0        -67
  Lines         5745        0      -5745
  Branches      1239        0      -1239
============================================
- Hits          1853        0      -1853
+ Misses        3637        0      -3637
+ Partials       255        0       -255

commit 860e4cf
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Sun Feb 9 13:19:01 2025 +0100

    2.2.4 Release

commit 43b2703
Author: Mika Lindqvist <postmaster@raasu.org>
Date:   Sun Jan 26 21:31:36 2025 +0200

    Fix shift overflow in inflate and send_code.

commit 287c4dc
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Sun Feb 2 21:05:37 2025 -0500

    Fix an unfortunate bug with Visual Studio 2015

    Evidently this instruction, despite the intrinsic having a register
    operand, is a memory-register instruction. There seems to be no
    alignment requirement for the source operand. Because of this,
    compilers, when not optimizing, do the unaligned load and then dump
    back to the stack to do the broadcasting load. In doing this, MSVC
    seems to be dumping to the stack with an aligned move at an unaligned
    address, causing a segfault. GCC does not seem to make this mistake,
    as it stashes to an aligned address.

    If we're on Visual Studio 2015, let's just do the longer 9 cycle
    sequence of a 128 bit load followed by a vinserti128. This _should_
    fix this (issue zlib-ng#1861).

commit a3c0430
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Wed Jan 29 18:46:34 2025 +0100

    Fix -Wmaybe-uninitialized warnings in benchmarks.

commit 057104f
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Wed Jan 29 16:54:36 2025 +0100

    Add uncompress benchmark

commit a0fa247
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Sun Jan 26 15:05:24 2025 +0100

    s390x: Add workaround to install custom Clang 19.1.5 rpms to
    actions-runner image in order to avoid the VX compiler bug in older
    clang versions.

commit 05305ed
Author: Vladislav Shchapov <vladislav@shchapov.ru>
Date:   Fri Jan 24 01:45:41 2025 +0500

    Remove unused include directories

    Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

commit 69a60bf
Author: Vladislav Shchapov <vladislav@shchapov.ru>
Date:   Fri Jan 24 01:45:26 2025 +0500

    Rename "arch/power/fallback_builtins.h" to avoid possible conflict
    with "fallback_builtins.h" in zlib-ng sources directory

    Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

commit 7701ce9
Author: Mika Lindqvist <postmaster@raasu.org>
Date:   Sun Jan 26 13:19:08 2025 +0200

    [abicheck] Regenerate ABI files for zlib

    * Generate using Ubuntu 24.04.1 LTS to fix mismatch in function
      signatures of gzseek() and gztell()

commit 5e3510e
Author: Eduard Stefes <eduard.stefes@ibm.com>
Date:   Tue Jan 21 10:48:07 2025 +0100

    Disable CRC32-VX Extension for some Clang versions

    We have to disable the CRC32-VX implementation for some Clang
    versions (18 <= version < 19.1.2) that generate bad code for the
    IBM S390 VGFMA intrinsics.

commit 8cebc9c
Author: Vladislav Shchapov <vladislav@shchapov.ru>
Date:   Thu Jan 23 23:25:09 2025 +0500

    Increase cmake workflow timeout

    Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

commit 608871e
Author: Nathan Moinvaziri <nathan@nathanm.com>
Date:   Mon Jan 20 10:26:51 2025 -0800

    Use Ubuntu 20.04 for PPC64LE tests due to broken qemu.

commit 62d52a5
Author: Nathan Moinvaziri <nathan@nathanm.com>
Date:   Thu Jan 9 15:47:06 2025 -0800

    Use Ubuntu 22.04 for AARCH64 tests

    It seems that qemu might be failing. Tests on Raspberry Pi 5 with
    Ubuntu 24.04 appear to work just fine.

commit b7dc018
Author: Nathan Moinvaziri <nathan@nathanm.com>
Date:   Sun Jan 5 08:01:41 2025 -0800

    Add missing compiler-rt libraries for Ubuntu 24. zlib-ng#1840

commit a95ee9e
Author: Nathan Moinvaziri <nathan@nathanm.com>
Date:   Wed Jan 1 16:20:17 2025 -0800

    Ignore gcovr parser errors.

commit bdfe700
Author: Nathan Moinvaziri <nathan@nathanm.com>
Date:   Wed Jan 1 14:41:27 2025 -0800

    Don't pin gcovr version any longer. zlib-ng#1840

commit 2ffbbdb
Author: Nathan Moinvaziri <nathan@nathanm.com>
Date:   Sat Jan 4 22:05:25 2025 -0800

    Use correct version of gcov for cross-compilers.

commit 6286088
Author: Nathan Moinvaziri <nathan@nathanm.com>
Date:   Thu Jan 2 15:17:33 2025 -0800

    Use Ubuntu 24 crossbuild-essential packages.

commit fbba9cb
Author: Nathan Moinvaziri <nathan@nathanm.com>
Date:   Wed Jan 1 14:46:59 2025 -0800

    Remove package qemu for Ubuntu 24. zlib-ng#1840

commit 7077052
Author: Nathan Moinvaziri <nathan@nathanm.com>
Date:   Wed Jan 1 14:38:12 2025 -0800

    Upgrade CI from Clang-11 to Clang 15 for Ubuntu 24. zlib-ng#1840

commit 212563d
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Sat Jan 4 21:19:42 2025 +0100

    Improve image/container rebuild script to work properly under cron.

commit 9064a25
Author: Dmitry Kurtaev <dmitry.kurtaev@gmail.com>
Date:   Wed Jan 15 20:28:44 2025 +0300

    Workaround error G6E97C40B

    Warning as an error with GCC from Ubuntu 24.04:
    ```
    /home/runner/work/dotnet_riscv/dotnet_riscv/runtime/src/native/external/zlib-ng/arch/riscv/riscv_features.c(25,33): error G6E97C40B: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses] [/home/runner/work/dotnet_riscv/dotnet_riscv/runtime/src/native/libs/build-native.proj]
    ```

commit 6d24fb8
Author: Sam James <sam@gentoo.org>
Date:   Thu Jan 9 11:36:40 2025 +0000

    cmake: disable LTO for some configure checks

    Some of zlib-ng's configure tests define a function expecting it to
    be compiled but don't call that function, or don't use its return
    value.
    This is risky with LTO where the whole thing may be optimised out,
    which has happened before:
    * zlib-ng#1616
    * zlib-ng#1622
    * https://gitlab.kitware.com/cmake/cmake/-/issues/26103

    Closes: zlib-ng#1841

commit 787c7f6
Author: Vladislav Shchapov <vladislav@shchapov.ru>
Date:   Wed Jan 1 13:53:16 2025 +0500

    Force use of latest Windows SDK with 32-bit ARM support for release
    workflows

    Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

commit cbb6ec1
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Sun Dec 29 19:01:35 2024 +0100

    2.2.3 Release

commit bf05e88
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Fri Dec 20 23:31:37 2024 +0100

    Continued cleanup of old UNALIGNED_OK checks

    - Remove obsolete checks
    - Fix checks that are inconsistent
    - Stop compiling compare256/longest_match variants that never get
      called
    - Improve how the generic compare256 functions are handled.
    - Allow overriding OPTIMAL_CMP

    This simplifies the code and avoids having a lot of code in the
    compiled library that can never get executed.

commit 1aeb291
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Sun Dec 22 13:25:27 2024 +0100

    Rename functions to get rid of old and now misleading "unaligned"
    naming

commit d7e121e
Author: Cameron Cawley <ccawley2011@gmail.com>
Date:   Thu Jul 27 21:07:29 2023 +0100

    Use GCC's may_alias attribute for unaligned memory access

commit fc90e7b
Author: Cameron Cawley <ccawley2011@gmail.com>
Date:   Sun Dec 22 13:43:30 2024 +0000

    Improved setting of OPTIMAL_CMP on ARM

commit 06bba67
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Sat Dec 21 11:04:47 2024 -0500

    Fix unaligned access in ACLE based crc32

    This fixes a rightful complaint from the alignment sanitizer that we
    alias memory in an unaligned fashion.
    A nice added bonus is that this improves performance a tiny bit on
    the larger buffers, perhaps due to loops that idiomatically decrement
    a count and increment a single buffer pointer rather than the maze of
    conditional pointer reassignments.

    While here, let's write a unit test just for this. Since this is the
    only variant that accesses memory in a potentially unaligned fashion
    that doesn't explicitly go byte by byte or use intrinsics that don't
    require alignment, we'll enable it only for this function for now.
    Adding more tests later if need be should be possible. For everything
    else not crc, we're relying on ubsan to hopefully catch things by
    chance.

commit 87d8e95
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Mon Sep 16 13:15:46 2024 +0200

    Update s390x actions-runner docker

commit 005c2d3
Author: Cameron Cawley <ccawley2011@gmail.com>
Date:   Sat Dec 21 17:30:18 2024 +0000

    Set OPTIMAL_CMP for 32-bit PowerPC

commit 90913e8
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Sat Dec 21 10:09:58 2024 -0500

    Fix "RLE" compression with big endian architectures

    This was missed in zlib-ng#1831. The RLE methods compare a string of
    bytes directly with itself to directly derive a simple run length
    encoding. They use similar but not identical methods to compare256.
    This needs a similar endianness check at compile time to know which
    compare bit count to use (leading or trailing).

commit 04d1b75
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Fri Dec 20 18:53:51 2024 -0500

    Make big endians first class citizens again

    No longer do the big iron of yore which lack SIMD optimized loads
    need to search strings a byte at a time like primitive machines of
    the vax era. This guard here was mostly due to the fact that the
    string comparison was searched with "count trailing zero", which
    assumes an endianness. We can just conditionally use leading zeros
    when on big endian and stop using the extremely naive C
    implementation. This makes things a tad bit faster.
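
The two big endian commits above depend on one idea: after XORing two machine words, the first mismatching byte is found with a trailing-zero count on little endian and a leading-zero count on big endian. The sketch below illustrates that idea only. It is not the zlib-ng code: the names `host_is_little_endian`, `count_trailing_zeros64`, `count_leading_zeros64`, and `match_len8` are hypothetical, the bit scans are plain loops rather than compiler builtins, and the endianness check is done at run time here, whereas the commit message describes a compile-time check.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 when the lowest-addressed byte of a
 * multi-byte integer is its least significant byte. */
static int host_is_little_endian(void) {
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Portable bit scans so the sketch needs no builtins; x must be nonzero. */
static unsigned count_trailing_zeros64(uint64_t x) {
    unsigned n = 0;
    while ((x & 1) == 0) { x >>= 1; n++; }
    return n;
}

static unsigned count_leading_zeros64(uint64_t x) {
    unsigned n = 0;
    while ((x >> 63) == 0) { x <<= 1; n++; }
    return n;
}

/* Number of leading bytes (0..8) that are equal in a and b. */
static unsigned match_len8(const unsigned char *a, const unsigned char *b) {
    uint64_t wa, wb, diff;
    memcpy(&wa, a, sizeof(wa));   /* memcpy keeps the load alignment-safe */
    memcpy(&wb, b, sizeof(wb));
    diff = wa ^ wb;               /* nonzero bits mark mismatching bytes */
    if (diff == 0)
        return 8;
    if (host_is_little_endian())
        return count_trailing_zeros64(diff) / 8;  /* first byte = low bits */
    return count_leading_zeros64(diff) / 8;       /* first byte = high bits */
}
```

Calling `match_len8` on two 8-byte buffers that first differ at index 3 gives 3 on either byte order, which is the property the endianness switch is there to preserve.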

commit dbccbd1
Author: Icenowy Zheng <uwu@icenowy.me>
Date:   Sun Dec 15 01:31:48 2024 +0800

    adler32_rvv: Fix some overflow problems

    There are currently some overflow problems in the adler32_rvv
    implementation, which can lead to wrong results for some input, and
    these problems can easily be exhibited when running `git fsck` with
    zlib-ng substituting the system zlib on a big git repository.

    These problems and the solutions are the following:

    - When the input data is long enough, the v_buf32_accu can overflow
      too. Add it to the modulo code that happens per ~NMAX bytes.
    - When the vector data is reduced to scalar ones, the resulting
      scalar value (and the processed length) may lead to the
      calculation of sum2 to overflow. Add mod BASE to all these
      reductions and the initial calculation of sum2.
    - When the remaining data is less than vl bytes, the code falls back
      to a scalar implementation; however the sum2 and adler2 values are
      just reduced from vectors and could be so big that sum2 overflows
      in the scalar code. Modulo them before the scalar code to prevent
      such overflow (because vl is surely quite smaller than NMAX).

    Signed-off-by: Icenowy Zheng <uwu@icenowy.me>

commit 509f6b5
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Tue Dec 17 23:02:32 2024 +0100

    Since we long ago made unaligned reads safe (by using memcpy or
    intrinsics), it is time to replace the UNALIGNED_OK checks that have
    since really only been used to select the optimal comparison sizes
    for the arch instead.
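
The adler32_rvv fixes in commit dbccbd1 all enforce the same invariant: the running sums must be reduced mod BASE before they can exceed 32 bits, at least once per NMAX bytes. A minimal scalar sketch of that invariant follows. It is not the RVV code from the commit: `adler32_sketch` is a hypothetical name, and the constants are the standard Adler-32 values (BASE = 65521, NMAX = 5552).

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u  /* largest prime smaller than 65536 */
#define ADLER_NMAX 5552u   /* bytes that can be summed before reducing */

/* Scalar Adler-32 with the modulo deferred to once per NMAX-byte block. */
static uint32_t adler32_sketch(uint32_t adler, const unsigned char *buf, size_t len) {
    uint32_t a = adler & 0xffffu;          /* low half: byte sum */
    uint32_t b = (adler >> 16) & 0xffffu;  /* high half: sum of sums (sum2) */

    while (len > 0) {
        size_t block = len < ADLER_NMAX ? len : ADLER_NMAX;
        len -= block;
        while (block--) {
            a += *buf++;
            b += a;
        }
        /* Reduce both sums before the next block so neither can overflow. */
        a %= ADLER_BASE;
        b %= ADLER_BASE;
    }
    return (b << 16) | a;
}
```

Starting from the initial value 1, `adler32_sketch(1, (const unsigned char *)"Wikipedia", 9)` returns `0x11E60398`. A vectorized version has to keep the same bound for every accumulator it introduces, which is the point of the first bullet in the commit message.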

commit 4fa76be
Author: Adeel Mujahid <3840695+am11@users.noreply.github.com>
Date:   Sat Dec 21 00:35:50 2024 +0200

    Fix typos (zlib-ng#1825)

commit c295c28
Author: Eduard Stefes <eduard.stefes@ibm.com>
Date:   Wed Dec 4 09:15:27 2024 +0100

    added in-tree build artifacts to .gitignore

commit 037ab0f
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Tue Dec 17 23:09:31 2024 +0100

    Revert "Since we long ago make unaligned reads safe (by using memcpy
    or intrinsics),"

    This reverts commit 80fffd7.
    It was mistakenly pushed to develop instead of going through a PR
    and the appropriate reviews.

commit 80fffd7
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Tue Dec 17 23:02:32 2024 +0100

    Since we long ago made unaligned reads safe (by using memcpy or
    intrinsics), it is time to replace the UNALIGNED_OK checks that have
    since really only been used to select the optimal comparison sizes
    for the arch instead.

commit 43d74a2
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Sat Nov 30 09:23:28 2024 -0500

    Improve pipelining for AVX512 chunking

    For reasons that aren't quite so clear, using the masked writes here
    did not pipeline very well. Either setting up the mask stalled
    things or masked moves have issues overlapping regular moves. Simply
    putting the masked moves behind a branch that is rarely taken seemed
    to do the trick in improving the ILP. While here, put masked loads
    behind the same branch in case there were ever a hazard for
    overreading.

commit a4e7c34
Author: Detlef Riekenberg <wine.dev@web.de>
Date:   Fri Nov 29 22:59:52 2024 +0100

    zbuild: Provide a fallback for "ALIGNED_(x)" for other compilers

commit 7020cb3
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Wed Nov 27 19:00:52 2024 -0500

    Enable AVX2 functions to be built with BMI2 instructions

    While these are technically different instructions, no such CPU
    exists that has AVX2 that doesn't have BMI2.
    Enabling BMI2 allows us to eliminate several flag stalls by having
    flagless versions of shifts, and allows us to not clobber and move
    around GPRs so much in scalar code. There's usually a sizeable
    benefit for enabling it. Since we're building with BMI2 for AVX2
    functions, let's also just make sure the CPU claims to support it
    (just to cover our bases).

commit 11bef87
Author: Bradley Lowekamp <blowekamp@mail.nih.gov>
Date:   Tue Nov 26 09:12:49 2024 -0500

    Address deprecated cmake version warning.

    Use cmake_minimum_required(VERSION <min>...<policy_max>) syntax to
    set the policy at the same time as the compatible CMake version.

commit 2562fd1
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Sun Dec 1 07:13:42 2024 +0000

    Bump codecov/codecov-action from 4 to 5

    Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 4 to 5.
    - [Release notes](https://github.com/codecov/codecov-action/releases)
    - [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
    - [Commits](codecov/codecov-action@v4...v5)

    ---
    updated-dependencies:
    - dependency-name: codecov/codecov-action
      dependency-type: direct:production
      update-type: version-update:semver-major
    ...

    Signed-off-by: dependabot[bot] <support@github.com>

commit 785444d
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Thu Nov 28 14:05:32 2024 -0500

    Fix native detection of CRC instruction

    It's unclear if raspberry pi OS's shipped GCC doesn't properly
    detect ACLE or not (/proc/cpuinfo claims to support AES), but in any
    case, the preprocessor macro for that flag is not defined with
    -march=native on a raspberry pi 5. Unfortunately that means when
    built "WITH_NATIVE", we do not get a fast CRC function. The CRC32
    preprocessor macro _IS_ defined, and the auto detection when built
    without NATIVE support does properly get dispatched to. Since we
    only need the scalar CRC32 and not the polynomial stuff anyhow,
    let's make it be an || condition and not a && one.

commit 3c11f65
Author: Pavel P <pavlov.pavel@gmail.com>
Date:   Thu Nov 28 01:18:20 2024 +0200

    Remove unused HAVE_CHUNKMEMSET_1 define

commit 7fdc3aa
Author: Pavel P <pavlov.pavel@gmail.com>
Date:   Wed Nov 27 23:13:34 2024 +0200

    Fix casting warning/error in test_compress_bound.cc

    Fixes the following error when building with msvc compiler
    ```
    test_compress_bound.cc
    D:\zlib-ng\test\test_compress_bound.cc(41,50): error C2220: the following warning is treated as an error
    D:\zlib-ng\test\test_compress_bound.cc(41,50): warning C4267: 'argument': conversion from 'size_t' to 'unsigned long', possible loss of data
    D:\zlib-ng\test\test_compress_bound.cc(43,68): warning C4267: 'argument': conversion from 'size_t' to 'unsigned long', possible loss of data
    ```

commit 5456966
Author: Vladislav Shchapov <vladislav@shchapov.ru>
Date:   Sun Nov 24 18:34:40 2024 +0500

    Force use of latest Windows SDK with 32-bit ARM support

    Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

commit 0ed5ac8
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Wed Sep 25 17:56:36 2024 -0400

    Make an AVX512 inflate fast with low cost masked writes

    This takes advantage of the fact that on AVX512 architectures,
    masked moves are incredibly cheap. There are many places where we
    have to fall back to the safe C implementation of chunkcopy_safe
    because of the assumed overwriting that occurs. We're able to
    sidestep most of the branching needed here by simply controlling the
    bounds of our writes with a mask.

commit 94aacd8
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Mon Sep 23 18:26:04 2024 -0400

    Try to simplify the inflate loop by collapsing most cases to
    chunksets

commit e874b34
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Thu Sep 12 17:47:30 2024 -0400

    Make chunkset_avx2 half chunk aware

    This gives us appreciable gains on a number of fronts. The first
    being we're inlining a pretty hot function that was getting
    dispatched to regularly.
    Another is that we're able to do a safe lagged copy of a distance
    that is smaller, so CHUNKCOPY gets its teeth back here for smaller
    sizes, without having to do another dispatch to a function.

    We're also now doing two overlapping writes at once and letting the
    CPU do its store forwarding. This was an enhancement @dougallj had
    suggested a while back.

    Additionally, the "half chunk mag" here is fundamentally less
    complicated because it doesn't require synthesizing cross lane
    permutes with a blend operation, so we can optimistically do that
    first if the len is small enough that a full 32 byte chunk doesn't
    make any sense.

commit b52e703
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Wed Sep 11 18:34:54 2024 -0400

    Simplify avx2 chunkset a bit

    Put length 16 in the length checking ladder and take care of it
    there since it's also a simple case to handle. We kind of went out
    of our way to pretend 128 bit vectors didn't exist when using avx2
    but this can be handled in a single instruction. Strangely the
    intrinsic uses vector register operands but the instruction itself
    assumes a memory operand for the source. This also means we don't
    have to handle this case in our "GET_CHUNK_MAG" function.

commit dae668d
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Wed Oct 9 16:27:43 2024 +0200

    Reorder variables in inflate functions to reduce padding holes due
    to variable alignment requirements.

commit 1ec47b7
Author: Mika Lindqvist <postmaster@raasu.org>
Date:   Sat Sep 28 08:09:17 2024 +0300

    configure: add --mandir to override $mandir on command line.

commit 22a4cbb
Author: Mika Lindqvist <postmaster@raasu.org>
Date:   Fri Sep 27 17:09:22 2024 +0300

    configure: Fix linker flags for Haiku.

commit 18af700
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Wed Sep 25 17:25:19 2024 +0200

    Reorder 'inflate_state' struct to improve cache-locality of
    variables needed by inffast (from 6 cachelines to 1).
    Also fill in some unnecessary holes.
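
The reordering commits above (dae668d and 18af700) rely on how C compilers insert padding to satisfy member alignment. The sketch below shows the effect on two made-up structs. Neither is `inflate_state`: all struct and member names are hypothetical, and the exact sizes depend on the target ABI.

```c
#include <stddef.h>
#include <stdint.h>

/* Members declared in an arbitrary order: each small member placed before
 * a larger one forces padding up to the next alignment boundary. */
struct scattered {
    uint8_t  flag_a;
    uint64_t counter;
    uint8_t  flag_b;
    uint32_t length;
    uint8_t  flag_c;
};

/* Same members, ordered from largest to smallest alignment so the small
 * ones share what would otherwise be padding. */
struct ordered_by_size {
    uint64_t counter;
    uint32_t length;
    uint8_t  flag_a;
    uint8_t  flag_b;
    uint8_t  flag_c;
};

/* Bytes saved per object by the reordering on the current target. */
static size_t padding_saved(void) {
    return sizeof(struct scattered) - sizeof(struct ordered_by_size);
}
```

On a typical 64-bit ABI the first layout occupies 32 bytes and the second 16, so `padding_saved()` would return 16 there. Grouping the members a hot loop touches next to each other, as the `inflate_state` commit describes, additionally reduces the number of cache lines that loop pulls in.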

commit a5c20ed
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Wed Sep 25 17:21:28 2024 +0200

    Add variable 'wbufsize' to track window buffer including padding, to
    allow the chunkset code to spill garbage data into the padding area
    if available.

commit 39e9c86
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Wed Sep 25 17:18:49 2024 +0200

    Don't use 'dmax' and 'sane' variables unless their checks have been
    compiled in.

commit 3297953
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Thu Oct 3 17:17:44 2024 -0400

    Compute the "safe" distance properly

    The safe pointer that is computed is an exclusive, not inclusive,
    bound. While we were probably rarely ever bitten by this, if ever,
    it still makes sense to apply the limit properly.

commit 8d10c30
Author: FantasqueX <fantasquex@gmail.com>
Date:   Fri Sep 20 00:53:18 2024 +0800

    Explicitly set CMake policy 0169 to silence warning

    The recommended `FetchContent_MakeAvailable()` was introduced in
    CMake 3.14, which is greater than `cmake_minimum_required()`.

    CMake policy will affect subdirectories. The
    `cmake_minimum_required(VERSION)` command implicitly calls
    `cmake_policy(VERSION)`.

    Closes zlib-ng#1788

commit b80eb4c
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Sun Sep 15 12:23:50 2024 -0400

    Simplify chunking in the copy ladder here

    As it turns out, trying to peel off the remainder with so many
    branches caused the code size to inflate a bit too much that this
    function wouldn't inline without some fairly aggressive optimization
    flags. Only catching vector sized chunks here makes the loop body
    small enough and having the byte by byte copy idiom at the bottom
    gives the compiler some flexibility that it is likely to do
    something there.
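
The shape described in commit b80eb4c, one loop over vector-sized chunks followed by a plain byte loop for the remainder, can be sketched in portable C as below. This is an illustration, not the zlib-ng function: `copy_chunks_then_bytes` and `CHUNK_SIZE` are hypothetical names, a 16-byte chunk stands in for the vector width, and the buffers are assumed not to overlap.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 16  /* assumed vector width in bytes */

/* Copies len bytes from `from` to `out` and returns the advanced output
 * pointer. Assumes the two ranges do not overlap. */
static unsigned char *copy_chunks_then_bytes(unsigned char *out,
                                             const unsigned char *from,
                                             size_t len) {
    /* Only whole chunks are handled here, which keeps the loop body small. */
    while (len >= CHUNK_SIZE) {
        unsigned char chunk[CHUNK_SIZE];
        memcpy(chunk, from, CHUNK_SIZE);  /* stands in for a vector load */
        memcpy(out, chunk, CHUNK_SIZE);   /* stands in for a vector store */
        out += CHUNK_SIZE;
        from += CHUNK_SIZE;
        len -= CHUNK_SIZE;
    }
    /* Byte-by-byte tail: a simple idiom the compiler is free to optimize. */
    while (len--)
        *out++ = *from++;
    return out;
}
```

The end of the valid output range is naturally an exclusive bound here (`out + len` is one past the last byte written), which is the same convention the "safe" distance commit above says the limit has to respect.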

commit 8a1205f
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Wed Sep 25 20:52:26 2024 +0200

    Disable MSVC warning 4324 (struct padded due to alignment)

commit 13d0a89
Author: Mika Lindqvist <postmaster@raasu.org>
Date:   Wed Sep 18 21:55:40 2024 +0300

    Force Visual C++ to treat source files as UTF-8.

commit a689e10
Author: FantasqueX <fantasquex@gmail.com>
Date:   Fri Sep 20 00:05:26 2024 +0800

    Replace non-ascii characters to fix MSVC warning

commit 8e19f15
Author: Mika Lindqvist <postmaster@raasu.org>
Date:   Fri Feb 23 13:21:28 2024 +0200

    [CI] Don't try to use macOS 11 as it's no longer supported.

commit 09f8404
Author: Letu Ren <fantasquex@gmail.com>
Date:   Tue Sep 17 21:49:27 2024 +0800

    Use target include instead of raw include

commit efca012
Author: Vladislav Shchapov <vladislav@shchapov.ru>
Date:   Tue Sep 17 20:10:34 2024 +0500

    Fix override of CMAKE_C_STANDARD, CMAKE_C_STANDARD_REQUIRED,
    CMAKE_C_EXTENSIONS. False value is allowed for
    CMAKE_C_STANDARD_REQUIRED and CMAKE_C_EXTENSIONS.

    Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

commit ce93943
Author: Vladislav Shchapov <vladislav@shchapov.ru>
Date:   Tue Sep 17 20:08:41 2024 +0500

    Allow override of CMAKE_CXX_STANDARD, CMAKE_CXX_STANDARD_REQUIRED,
    CMAKE_CXX_EXTENSIONS variables for tests and benchmarks.

    Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

commit 68e31fa
Author: Bartosz Taudul <wolf@nereid.pl>
Date:   Tue Sep 17 12:46:11 2024 +0200

    Fix build on aarch64 android.

    When building with the CMake toolchain provided by the NDK, the ARCH
    variable is not "aarch64", but "aarch64-none-linux-android26" (or
    similar). The strict string match check causes the WITH_ARMV6 option
    to be enabled in such a case. As a result,
    arch/arm/slide_hash_armv6.c is compiled, which is not intended to be
    used on aarch64, and fails.

    Relax the check and assume aarch64 if the ARCH variable contains
    aarch64.
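
The aarch64 Android fix in commit 68e31fa is made in the build scripts, not in C. The sketch below only restates its logic in C to show why an exact match misses the NDK value while a substring match accepts it. Both function names are hypothetical.

```c
#include <string.h>

/* Strict check: only the exact string "aarch64" is accepted. */
static int arch_is_aarch64_strict(const char *arch) {
    return strcmp(arch, "aarch64") == 0;
}

/* Relaxed check: any ARCH value that contains "aarch64" is accepted. */
static int arch_is_aarch64_relaxed(const char *arch) {
    return strstr(arch, "aarch64") != NULL;
}
```

With the NDK value `"aarch64-none-linux-android26"` from the commit message, the strict check returns 0 and the relaxed check returns 1, while both agree on plain `"aarch64"`.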
Fix misspelling of ensure