2.2.0 Release Candidate
This release contains several larger changes and optimizations. On x86-64, for example, this leads to a compression speedup of ~12% at the default level.
We also have a major reorganization of memory alloc/free so that it always happens during init. This allows applications to do the init early and be finished with the malloc system calls before they need to process latency-sensitive compression/decompression. It also ensures that zlib-ng cannot fail due to memory pressure once the init functions have run successfully. We also now do only a single memory allocation for deflate or inflate, so we make fewer system calls and the allocated buffers live close together in memory.
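The single-allocation idea can be sketched in a few lines of C. This is only an illustration of the general technique, assuming a 64-byte alignment and three made-up sub-buffers; the names `demo_buffers`, `demo_alloc` and `demo_free` are hypothetical and do not reflect zlib-ng's actual layout, sizes or internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names throughout: a sketch of the technique only. */
#define ALIGNMENT 64u /* assumed alignment for every sub-buffer */

typedef struct {
    void    *base;    /* the one pointer that is ever passed to free() */
    uint8_t *window;  /* sub-buffers carved out of the same block */
    uint8_t *pending;
    uint8_t *state;
} demo_buffers;

/* Round n up to the next multiple of ALIGNMENT. */
static size_t round_up(size_t n) {
    return (n + (ALIGNMENT - 1)) & ~(size_t)(ALIGNMENT - 1);
}

/* Perform one malloc() and hand out aligned slices of it.
 * Returns 0 on success, -1 if the single allocation fails. */
static int demo_alloc(demo_buffers *b, size_t window_sz,
                      size_t pending_sz, size_t state_sz) {
    size_t off_pending = round_up(window_sz);
    size_t off_state   = off_pending + round_up(pending_sz);
    size_t total       = off_state + round_up(state_sz);
    uintptr_t start;

    /* The extra ALIGNMENT bytes leave room to align the first slice. */
    b->base = malloc(total + ALIGNMENT);
    if (b->base == NULL)
        return -1;

    start = ((uintptr_t)b->base + (ALIGNMENT - 1)) & ~(uintptr_t)(ALIGNMENT - 1);
    b->window  = (uint8_t *)start;
    b->pending = b->window + off_pending;
    b->state   = b->window + off_state;
    assert((uintptr_t)b->state % ALIGNMENT == 0);
    return 0;
}

/* One free() releases every sub-buffer at once. */
static void demo_free(demo_buffers *b) {
    free(b->base);
    b->base = NULL;
}
```

Because every slice comes from the same block, the only point where allocation can fail is the single `malloc()` at init time, and the slices end up adjacent in memory.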
Compression or decompression of very small buffers will now also be faster due to spending less time doing malloc/free.
The downside to this is that decompression will now always allocate the maximum required memory (~42KB total on 64-bit platforms); previously it would allocate (and potentially free) memory as needed during decompression.
It also means that applications that replace the alloc/free functions with their own can potentially have some issues (Yes, I am looking at you, Nginx).
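Applications that install their own allocator may want to check how the new allocation pattern looks from their side. A minimal sketch of a counting allocator follows, assuming the callback shape used by zlib-style `zalloc`/`zfree` hooks (opaque pointer, item count, item size). The `alloc_stats` struct and the `counting_alloc`/`counting_free` names are hypothetical, and plain C types stand in for the `voidpf` and `uInt` typedefs from `zlib.h` so the sketch compiles without the library headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping struct, passed through the `opaque` pointer. */
typedef struct {
    size_t calls;   /* successful allocation requests seen so far */
    size_t largest; /* largest single request, in bytes */
    size_t live;    /* allocations that have not been freed yet */
} alloc_stats;

/* Same shape as a zlib-style alloc hook: (opaque, items, size) -> pointer.
 * Assumption: plain C types replace voidpf/uInt from zlib.h. */
static void *counting_alloc(void *opaque, unsigned items, unsigned size) {
    alloc_stats *st = (alloc_stats *)opaque;
    size_t bytes = (size_t)items * (size_t)size;
    void *p = malloc(bytes);

    if (p != NULL) {
        st->calls++;
        st->live++;
        if (bytes > st->largest)
            st->largest = bytes;
    }
    return p;
}

/* Same shape as a zlib-style free hook: (opaque, address). */
static void counting_free(void *opaque, void *address) {
    alloc_stats *st = (alloc_stats *)opaque;

    if (address != NULL) {
        assert(st->live > 0);
        st->live--;
    }
    free(address);
}
```

With the real library, such callbacks would be installed by setting the stream's `zalloc`, `zfree` and `opaque` fields before calling the init function (casts may be needed to match the library's typedefs). Comparing `calls` and `largest` before and after upgrading shows whether a custom pool sized for the old allocation pattern still fits.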
Changes
Buildsystem
- Generate CMake package configuration files #1647
- Relocate CMake target export definitions #1657
- Allow overwrite NATIVEFLAG value. #1662 #1684
- Fix xsave intrinsic test for clang, and gcc 8.2 or newer, and icc #1664
- Disable Intel Compiler diagnostic message 10441 #1666
- Add missing checks for 64bit arm/intel with msvc compiler #1667
- Don't export git/github-related files in tar/zip archives #1688
- Cleanup and update NMake Makefiles #1673
- Add more result variables to the cmake package configuration #1671
- Fix building with NVHPC #1698
- CMake: Replace ; by $<SEMICOLON> in generator-expression #1707
- Bump max CMake policy version to 3.29.0 #1709
- Make Darwin cross compilation possible #1714
CI/Test
- Improve code coverage handling #1640 #1642 #1675 #1729
- Add VPCLMULQDQ crc32 tests to Google benchmarks #1651
- Add small compress() benchmark #1721 #1730
- Add back-and-forth inflateCopy() test #1731
- Enable orphaned unit tests for compare256_rle family of functions #1739
- Fix MSAN error in test_dict #1726
- CI workflows
  - Add dependabot for github actions #1687
  - Upgrade ilammy/msvc-dev-cmd to v1.13.0 #1665
  - Upgrade codecov/codecov-action to v4. #1676
  - Upgrade github/codeql-action from 2 to 3 #1691
  - Upgrade actions/upload-artifact from 3 to 4 #1692
  - Upgrade mymindstorm/setup-emsdk to v14. #1677
  - Update dependencies for 32-bit MinGW CI run #1711
  - Use windows-2019 for build with toolset v141 #1725
  - Fix macOS Github Actions #1720
Cleanup
Refactoring and Optimizations
- Move C fallback functions into arch/generic [Part 1] #1630 #1631 #1658 #1668
- Remove unneeded pointer for functable.longest_match in deflate_slow #1633
- Improve x86 intrinsics dependency handling #1643
- Split cpu_features.h by architecture #1644
- Speed up crc32_[v]pclmulqdq on small strings. #1650
- Cleanup of bi_flush() #1660
- Split cpu features and arch functions #1685 #1696
- Inline CHUNKCOPY and CHUNKUNROLL #1669
- Remove update_hash and insert_string implementations from functable #1681
- Disable dynamic function dispatching for native or arch-specific builds #1659 #1701
- Clean up insert_match() in deflate_medium #1682
- Prefer HAVE_ALIGNED_ALLOC when available in zng_alloc #1635
- Rewrite deflate memory allocation #1713 #1736
ARM
- Add test for checking if -march=native needs -mfpu=neon for 32-bit ARM. #1683
- Override Clang x4 NEON intrinsics for Android #1694
- Add AArch64 feature detection support for OpenBSD #1732
- Improved Configure ACLE check #1727
Power
- Fix regression in Power8/9 detection #1649
RVV
- Optimized rvv slide_hash #1704
- arch/riscv/riscv_features.c: fix uclibc build #1700
- Disable CodeCov for RISC-V as the toolchain doesn't support generating code coverage #1679
S390x
- Update s390x dockerfile #1716
- IBM zSystems DFLTCC: Extend sanitizer checks #1717
- IBM zSystems DFLTCC: Inline DFLTCC states into zlib states #1718
- Remove unused function dfltcc_alloc_state #1728
x86
- Fix PCLMULQDQ, AVX512VNNI and VPCLMULQDQ feature tests for Intel LLVM compiler (icx) #1672
- Fix invalid instruction usage in Xeon Phi x200 processors #1723