-
-
Notifications
You must be signed in to change notification settings - Fork 308
Don't build C-fallback functions that never get used on x86_64 #1984
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds a build-time option Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Config as configure / CMake
participant Build as Build system
participant Runtime as init_functable()
note over Config,Build: Configure/CMake decide which sources/flags to compile
Config->>Build: Provide flags: WITH_ALL_FALLBACKS, WITH_OPTIM, arch, HAVE_BUILTIN_CTZ
alt WITH_ALL_FALLBACKS = ON or WITH_OPTIM = OFF
Build-->>Build: Include full fallback source set (ZLIB_ALL_FALLBACK_SRCS)
else WITH_OPTIM = ON and !WITH_ALL_FALLBACKS
Build-->>Build: Include conditional subset of fallback sources (arch/HAVE_BUILTIN_CTZ dependent)
end
note over Build,Runtime: Runtime pointer selection depends on compile-time macros
Runtime->>Runtime: Check __x86_64__, WITH_ALL_FALLBACKS, WITH_OPTIM, HAVE_BUILTIN_CTZ
alt Broad fallbacks enabled (WITH_ALL_FALLBACKS)
Runtime-->>Runtime: Assign full generic C fallbacks (slide_hash, inflate_fast, chunkmemset_safe, longest_match*, compare256, crc/adler)
else Reduced x86_64 fallbacks
Runtime-->>Runtime: Assign limited C fallbacks (select longest_match variants, crc/adler)
opt if !HAVE_BUILTIN_CTZ
Runtime-->>Runtime: Also enable compare256 variant(s)
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
4e24a5f to
18206d9
Compare
Develop 10 OctPR #19843.7% smaller compiled zlib-ng library on x86_64 |
e34656f to
75420bc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
CMakeLists.txt(4 hunks)README.md(2 hunks)functable.c(2 hunks)test/benchmarks/benchmark_slidehash.cc(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
test/benchmarks/benchmark_slidehash.cc (2)
arch/x86/slide_hash_sse2.c (1)
void(54-63)test/benchmarks/benchmark_main.cc (1)
cpu_features(16-16)
functable.c (1)
functable.h (3)
adler(26-41)struct crc32_fold_s(36-36)struct crc32_fold_s(33-33)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
🔇 Additional comments (7)
CMakeLists.txt (4)
57-57: LGTM! Clear documentation and appropriate default.The option is correctly scoped to benchmarks and has clear documentation explaining its purpose.
1172-1175: LGTM! Correct fallback behavior when optimizations disabled.When
WITH_OPTIMis OFF, forcingWITH_ALL_FALLBACKSto ON ensures all generic fallbacks are available, since architecture-specific optimizations won't be built.
1177-1179: LGTM!Standard CMake pattern for propagating build options to compile definitions.
1361-1361: LGTM!Correctly uses the conditionally-populated
ZLIB_GENERIC_SRCSvariable.README.md (1)
207-207: LGTM!Cosmetic formatting change with no semantic impact.
test/benchmarks/benchmark_slidehash.cc (1)
71-73: LGTM! Correct conditional registration.The benchmark is only registered when
slide_hash_cis actually built, which matches the CMakeLists.txt logic that excludesslide_hash_c.cfrom x86_64 builds unlessWITH_ALL_FALLBACKSis enabled.functable.c (1)
51-80: LGTM! Correct fallback strategy for x86_64.The refactored logic correctly leverages the fact that x86_64 always has SSE2:
- On x86_64, SSE2 implementations (
slide_hash_sse2,chunkmemset_safe_sse2,inflate_fast_sse2) serve as the base fallbacks instead of the generic C implementations- C fallbacks are still needed for functions that SSE2 doesn't optimize (adler32, crc32) until higher SIMD levels (AVX2, PCLMULQDQ) are detected
- When
HAVE_BUILTIN_CTZis not available, C fallbacks forlongest_matchandcompare256are also needed since the SSE2 versions require this builtinThis aligns perfectly with the PR objective and the build changes in CMakeLists.txt.
75420bc to
cb959b2
Compare
|
It looks like some CI is failing or needs |
Package checks are expected to fail, since configure does not have this feature, and I am leaning towards not implementing it there either. Using Not quite sure what fails with |
b3d0b04 to
723fac9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
.github/workflows/pkgcheck.yml(2 hunks)CMakeLists.txt(4 hunks)README.md(2 hunks)functable.c(3 hunks)test/benchmarks/benchmark_slidehash.cc(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
test/benchmarks/benchmark_slidehash.cc (2)
arch/x86/slide_hash_sse2.c (1)
void(54-63)arch/x86/slide_hash_avx2.c (1)
void(33-40)
functable.c (1)
functable.h (4)
adler(26-41)struct crc32_fold_s(36-36)struct crc32_fold_s(33-33)struct crc32_fold_s(34-34)
🔇 Additional comments (10)
.github/workflows/pkgcheck.yml (2)
140-140: LGTM! CI configuration correctly enables all fallbacks.The addition of
-DWITH_ALL_FALLBACKS=ONensures package checks exercise the new fallback mechanism across all architectures.
150-150: LGTM! Compat builds also enable all fallbacks.Consistent with the non-compat build configuration, ensuring thorough testing.
README.md (2)
207-207: LGTM! Formatting correction improves table alignment.The spacing adjustment ensures consistent table formatting with other rows.
220-220: LGTM! Documentation accurately reflects the new option.The WITH_ALL_FALLBACKS option is correctly documented with:
- Clear description of purpose (useful for Gbench comparisons)
- Accurate default value (OFF, matching CMakeLists.txt line 57)
- Appropriate placement in the Advanced Build Options table
CMakeLists.txt (4)
57-57: LGTM! Option definition is well-structured.The
cmake_dependent_optioncorrectly:
- Defaults to OFF (appropriate for most builds)
- Makes the option available only when WITH_BENCHMARKS is ON (since it's primarily useful for benchmark comparisons)
- Provides a clear description
1172-1179: LGTM! Fallback forcing logic is correct.When WITH_OPTIM is disabled, forcing WITH_ALL_FALLBACKS to ON ensures all generic implementations are available, which is necessary since no architecture-specific optimizations are built. The subsequent preprocessor definition propagates this to source files.
1358-1358: LGTM! Source list composition correctly includes generic sources.The updated
ZLIB_ALL_SRCScomposition now includesZLIB_GENERIC_SRCS, which contains the conditionally-selected fallback sources. This ensures the right fallback implementations are built based on architecture and build options.
1309-1338: All C fallback functions are implemented
Verified that every function referenced in functable.c has a corresponding definition in the source.functable.c (2)
52-82: LGTM! Fallback initialization logic is well-structured.The new architecture-specific fallback initialization correctly:
- On x86_64 (when NOT WITH_ALL_FALLBACKS): Sets only the C fallbacks that are actually used, since SSE2 implementations serve as fallbacks for chunkmemset_safe, inflate_fast, and slide_hash
- When WITH_ALL_FALLBACKS: Sets all C fallback implementations, enabling comprehensive benchmark comparisons
- Conditionally sets compare256/longest_match fallbacks based on HAVE_BUILTIN_CTZ availability
This aligns with the PR objective to avoid building unused C-fallback functions on x86_64 while maintaining flexibility for benchmarking.
85-322: LGTM! Architecture-specific optimizations are properly guarded.The
#ifdef WITH_OPTIMguard ensures that architecture-specific function table assignments are only performed when optimizations are enabled. This is consistent with the CMakeLists.txt logic that forces WITH_ALL_FALLBACKS when WITH_OPTIM is disabled, ensuring fallbacks are always available.
723fac9 to
bdf1473
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
functable.c (1)
52-83: Uninitialized function pointers when WITH_ALL_FALLBACKS=0 (non‑x86_64) and when WITH_OPTIM=OFF (x86_64).
- For non‑x86_64 with WITH_ALL_FALLBACKS unset, no C fallbacks are assigned at all → ft fields remain uninitialized.
- On x86_64 with WITH_ALL_FALLBACKS unset and WITH_OPTIM=OFF, many ft fields remain uninitialized (chunkmemset_safe, inflate_fast, slide_hash, etc.). Calls into stubs will reenter init_functable and can recurse forever.
Initialize full C fallbacks for non‑x86_64 in this block, and fail fast for the invalid x86_64 config (or ensure build system forces WITH_ALL_FALLBACKS when WITH_OPTIM=OFF).
Apply this diff to fix non‑x86_64 and guard the invalid x86_64 config:
// Set up generic C code fallbacks -#ifndef WITH_ALL_FALLBACKS -# ifdef __x86_64__ +#ifndef WITH_ALL_FALLBACKS +# ifdef __x86_64__ +/* On x86_64 we omit most C fallbacks; SSE2 is used instead. */ ft.adler32 = &adler32_c; ft.adler32_fold_copy = &adler32_fold_copy_c; ft.crc32 = &crc32_c; ft.crc32_fold = &crc32_fold_c; ft.crc32_fold_copy = &crc32_fold_copy_c; ft.crc32_fold_final = &crc32_fold_final_c; ft.crc32_fold_reset = &crc32_fold_reset_c; # ifndef HAVE_BUILTIN_CTZ ft.longest_match = &longest_match_c; ft.longest_match_slow = &longest_match_slow_c; ft.compare256 = &compare256_c; # endif -# endif +# ifndef WITH_OPTIM +# error "Building on x86_64 with WITH_OPTIM=OFF requires WITH_ALL_FALLBACKS=ON to provide complete C fallbacks." +# endif +# else + /* Non-x86_64: provide complete C fallbacks */ + ft.adler32 = &adler32_c; + ft.adler32_fold_copy = &adler32_fold_copy_c; + ft.chunkmemset_safe = &chunkmemset_safe_c; + ft.crc32 = &crc32_c; + ft.crc32_fold = &crc32_fold_c; + ft.crc32_fold_copy = &crc32_fold_copy_c; + ft.crc32_fold_final = &crc32_fold_final_c; + ft.crc32_fold_reset = &crc32_fold_reset_c; + ft.inflate_fast = &inflate_fast_c; + ft.slide_hash = &slide_hash_c; + ft.longest_match = &longest_match_c; + ft.longest_match_slow = &longest_match_slow_c; + ft.compare256 = &compare256_c; +# endif #elseAdditionally ensure the build system sets WITH_ALL_FALLBACKS=ON automatically when WITH_OPTIM=OFF on x86_64, to avoid this configuration entirely. Based on PR objectives.
🧹 Nitpick comments (2)
README.md (1)
220-221: Clarify configure behavior for WITH_ALL_FALLBACKS.The configure script now unconditionally defines WITH_ALL_FALLBACKS, so configure builds always include C fallbacks. Please note this here to avoid confusion (CMake default remains OFF).
For example, append a short note below the table: “Note: configure builds always enable all C fallbacks; the CMake default is OFF unless -DWITH_ALL_FALLBACKS=ON.”
configure (1)
1780-1784: Only define WITH_ALL_FALLBACKS when needed (preserve x86_64 size win).Unconditionally adding -DWITH_ALL_FALLBACKS undermines the PR’s x86_64 size reduction for configure builds. Gate it to cases that require C fallbacks (non‑x86_64 or WITH_OPTIM=OFF).
Apply this diff:
-# CMake can exclude building some of the generic fallback functions, -# configure does not have the detection code to do so. -CFLAGS="${CFLAGS} -DWITH_ALL_FALLBACKS" -SFLAGS="${SFLAGS} -DWITH_ALL_FALLBACKS" +# CMake can exclude some generic fallbacks; configure lacks equivalent detection. +# Keep C fallbacks when not on x86_64, or when optimizations are disabled. +if [ "${ARCH}" != "x86_64" ] || [ ${without_optimizations} -ne 0 ]; then + CFLAGS="${CFLAGS} -DWITH_ALL_FALLBACKS" + SFLAGS="${SFLAGS} -DWITH_ALL_FALLBACKS" +fi
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
.github/workflows/pkgcheck.yml(2 hunks)CMakeLists.txt(4 hunks)README.md(2 hunks)configure(1 hunks)functable.c(3 hunks)test/benchmarks/benchmark_slidehash.cc(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/pkgcheck.yml
🔇 Additional comments (2)
README.md (1)
207-207: LGTM on minor formatting.Entry renders correctly.
test/benchmarks/benchmark_slidehash.cc (1)
71-73: Correct macro guard usage.Using defined(WITH_ALL_FALLBACKS) avoids undefined-macro pitfalls. Condition reads correctly.
cd418e3 to
b55be08
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
functable.c (1)
52-82: LGTM: Fallback initialization correctly restructured.The new fallback initialization logic properly handles three scenarios:
- x86_64 without
WITH_ALL_FALLBACKS(lines 53-67): Minimal C fallbacks, relying on SSE2 implementations forchunkmemset_safe,inflate_fast, andslide_hash(line 93, 97-98)- With
WITH_ALL_FALLBACKSdefined (lines 68-82): All C fallback implementationsThe logic is sound: on x86_64, the SSE2 block (lines 88-104) unconditionally executes, providing the missing initializations. When
WITH_OPTIMis OFF,WITH_ALL_FALLBACKSis automatically enabled (CMakeLists.txt line 1176), ensuring all fallbacks are initialized.Consider adding inline comments to clarify the initialization strategy:
ft.force_init = &force_init_empty; - // Set up generic C code fallbacks + // Set up generic C fallback functions + // On x86_64: minimal set (SSE2 provides chunkmemset_safe, inflate_fast, slide_hash) + // Other platforms: full set (required as ultimate fallbacks) #ifndef WITH_ALL_FALLBACKS # if defined(__x86_64__) || defined(_M_X64) + // x86_64 always has SSE2, so only need fallbacks that SSE2 doesn't provide ft.adler32 = &adler32_c;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
.github/workflows/pkgcheck.yml(2 hunks)CMakeLists.txt(7 hunks)README.md(2 hunks)configure(1 hunks)functable.c(3 hunks)test/benchmarks/benchmark_slidehash.cc(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- test/benchmarks/benchmark_slidehash.cc
- configure
- .github/workflows/pkgcheck.yml
- README.md
🧰 Additional context used
🧬 Code graph analysis (1)
functable.c (1)
functable.h (2)
adler(26-41)struct crc32_fold_s(36-36)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: Windows GCC Native Instructions (AVX)
🔇 Additional comments (7)
CMakeLists.txt (6)
93-93: LGTM: New build option properly declared.The new
WITH_ALL_FALLBACKSoption is correctly declared, marked as advanced, and added to the feature summary.Also applies to: 155-155, 1584-1584
718-718: LGTM: Proper gating definition added.This correctly defines
WITH_OPTIMfor the preprocessor, enabling the optimization gate infunctable.c(line 85).
1307-1316: LGTM: Fallback sources well organized.Defining all generic fallback sources in a single list improves maintainability and makes the conditional inclusion logic clearer.
1318-1338: LGTM: Fallback selection logic correctly implemented.The conditional logic properly handles three cases:
- Explicit
WITH_ALL_FALLBACKS=ON: includes all fallbacks- x86_64 with SSE2 (implicit): includes only necessary C fallbacks, excluding
chunkset_c.c- Other architectures: includes all fallbacks and defines
WITH_ALL_FALLBACKSNote: The past review comment about excluding
chunkset_c.cfrom x86_64 builds has been correctly addressed. On x86_64,chunkmemset_safe_sse2is always used (functable.c line 93), sochunkset_c.cis properly excluded (line 1323-1329).
1358-1358: LGTM: Source composition updated correctly.The change from using separate source lists to consolidating with
ZLIB_GENERIC_SRCSensures the conditionally-selected fallback sources are properly included in the build.
1174-1177: Verify build & tests with WITH_OPTIM=OFF
CMakeLists.txt correctly forces WITH_ALL_FALLBACKS=ON and functable.c enables fallback code under that flag; confirm the project builds and all tests pass with -DWITH_OPTIM=OFF.functable.c (1)
85-85: LGTM: Optimization code properly gated.The
WITH_OPTIMguards correctly wrap all architecture-specific optimization code (lines 85-322), ensuring that when optimizations are disabled, only the C fallback implementations are used. This aligns with the automaticWITH_ALL_FALLBACKS=ONsetting in CMakeLists.txt (line 1176) whenWITH_OPTIM=OFF.Also applies to: 322-322
b55be08 to
8804f69
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
CMakeLists.txt (1)
1318-1339: x86_64 fallback pruning is correct; resolves prior feedback.The x86_64 branch excludes chunkset_c.c and only includes compare256_c.c when NOT HAVE_BUILTIN_CTZ. This matches the “SSE2 as fallback” rationale and removes unused C fallbacks on x86_64.
🧹 Nitpick comments (2)
.github/workflows/pkgcheck.yml (1)
150-151: Consistent with pkgcheck; consider documenting why ABI steps differ.Compat job also enables WITH_ALL_FALLBACKS. If ABI steps are meant to differ, a brief comment in this workflow would prevent regressions.
CMakeLists.txt (1)
1174-1177: Force WITH_ALL_FALLBACKS in cache when WITH_OPTIM=OFF.set(WITH_ALL_FALLBACKS ON) overrides the normal variable, but not the cache entry. Force-setting the cache makes GUIs/feature summaries consistent and avoids user cache overrides lingering.
-else() - # If WITH_OPTIM is disabled, we need all the fallbacks. - set(WITH_ALL_FALLBACKS ON) -endif() +else() + # If WITH_OPTIM is disabled, we need all the fallbacks. + set(WITH_ALL_FALLBACKS ON CACHE BOOL "Build all generic fallback functions (Useful for Gbench)" FORCE) +endif()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
.github/workflows/pkgcheck.yml(2 hunks)CMakeLists.txt(7 hunks)README.md(2 hunks)configure(1 hunks)functable.c(4 hunks)test/benchmarks/benchmark_slidehash.cc(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- test/benchmarks/benchmark_slidehash.cc
- configure
- README.md
- functable.c
🔇 Additional comments (3)
.github/workflows/pkgcheck.yml (1)
140-141: Align ABI steps or confirm intentional divergence.Good to enable WITH_ALL_FALLBACKS for pkgcheck. The ABI check steps still use CMAKE_ARGS without this flag. If parity is desired, mirror the flag there to avoid config drift; otherwise, confirm it’s intentional that ABI checks run without fallbacks.
CMakeLists.txt (2)
93-94: New build toggle looks good.WITH_ALL_FALLBACKS is a sensible, narrowly-scoped option. No issues.
1584-1585: Feature summary entry added appropriately.Exposes WITH_ALL_FALLBACKS in FeatureSummary; matches the new option.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #1984 +/- ##
============================================
+ Coverage 39.28% 82.44% +43.16%
============================================
Files 73 161 +88
Lines 7904 13762 +5858
Branches 1280 3131 +1851
============================================
+ Hits 3105 11346 +8241
+ Misses 4575 1382 -3193
- Partials 224 1034 +810 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
x86_64 always has SSE2, therefore we can use the SSE2 functions as fallbacks instead of
Summary by CodeRabbit
New Features
Refactor
Tests
Documentation
Chores