Fix building without SSE4.2 inside Docker container. #1531

mtl1979 · 2023-07-01T12:38:20Z

codecov · 2023-07-01T12:41:26Z

Codecov Report

Patch coverage has no change and project coverage change: -0.01 ⚠️

Comparison is base (cf89cf3) 83.90% compared to head (30a6204) 83.89%.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #1531      +/-   ##
===========================================
- Coverage    83.90%   83.89%   -0.01%     
===========================================
  Files          132      132              
  Lines        10842    10842              
  Branches      2793     2793              
===========================================
- Hits          9097     9096       -1     
- Misses        1046     1047       +1     
  Partials       699      699

Flag	Coverage Δ
macos_clang	`42.97% <ø> (ø)`
macos_gcc	`73.67% <ø> (ø)`
ubuntu_clang	`82.51% <ø> (ø)`
ubuntu_clang_debug	`82.15% <ø> (ø)`
ubuntu_clang_inflate_allow_invalid_dist	`82.17% <ø> (ø)`
ubuntu_clang_inflate_strict	`82.28% <ø> (-1.26%)`	⬇️
ubuntu_clang_mmap	`82.84% <ø> (ø)`
ubuntu_clang_pigz	`14.13% <ø> (ø)`
ubuntu_clang_pigz_no_optim	`11.51% <ø> (ø)`
ubuntu_clang_pigz_no_threads	`13.95% <ø> (ø)`
ubuntu_clang_reduced_mem	`82.91% <ø> (ø)`
ubuntu_clang_toolchain_riscv	`?`
ubuntu_gcc	`75.31% <ø> (ø)`
ubuntu_gcc_aarch64	`77.41% <ø> (ø)`
ubuntu_gcc_aarch64_compat_no_opt	`75.67% <ø> (-0.02%)`	⬇️
ubuntu_gcc_aarch64_no_acle	`76.22% <ø> (ø)`
ubuntu_gcc_aarch64_no_neon	`76.22% <ø> (ø)`
ubuntu_gcc_armhf	`77.48% <ø> (ø)`
ubuntu_gcc_armhf_compat_no_opt	`75.66% <ø> (ø)`
ubuntu_gcc_armhf_no_acle	`77.43% <ø> (ø)`
ubuntu_gcc_armhf_no_neon	`77.33% <ø> (ø)`
ubuntu_gcc_armsf	`74.79% <ø> (ø)`
ubuntu_gcc_armsf_compat_no_opt	`74.25% <ø> (-0.02%)`	⬇️
ubuntu_gcc_benchmark	`73.70% <ø> (-0.25%)`	⬇️
ubuntu_gcc_compat_no_opt	`76.86% <ø> (ø)`
ubuntu_gcc_compat_sprefix	`73.72% <ø> (ø)`
ubuntu_gcc_m32	`73.36% <ø> (ø)`
ubuntu_gcc_mingw_i686	`73.57% <ø> (-0.57%)`	⬇️
ubuntu_gcc_mingw_x86_64	`73.58% <ø> (-0.57%)`	⬇️
ubuntu_gcc_mips	`75.04% <ø> (ø)`
ubuntu_gcc_mips64	`75.06% <ø> (ø)`
ubuntu_gcc_no_avx2	`74.33% <ø> (ø)`
ubuntu_gcc_no_ctz	`74.78% <ø> (ø)`
ubuntu_gcc_no_ctzll	`74.77% <ø> (ø)`
ubuntu_gcc_no_pclmulqdq	`74.32% <ø> (-0.71%)`	⬇️
ubuntu_gcc_no_sse2	`74.59% <ø> (ø)`
ubuntu_gcc_no_sse42	`74.77% <ø> (ø)`
ubuntu_gcc_o1	`74.16% <ø> (-0.06%)`	⬇️
ubuntu_gcc_osb	`∅ <ø> (∅)`
ubuntu_gcc_pigz	`38.26% <ø> (-0.06%)`	⬇️
ubuntu_gcc_pigz_aarch64	`39.23% <ø> (-0.15%)`	⬇️
ubuntu_gcc_ppc	`74.05% <ø> (ø)`
ubuntu_gcc_ppc64	`74.49% <ø> (ø)`
ubuntu_gcc_ppc64_power9	`74.66% <ø> (ø)`
ubuntu_gcc_ppc64le	`74.50% <ø> (ø)`
ubuntu_gcc_ppc64le_novsx	`74.88% <ø> (ø)`
ubuntu_gcc_ppc64le_power9	`74.44% <ø> (ø)`
ubuntu_gcc_ppc_no_power8	`74.77% <ø> (ø)`
ubuntu_gcc_s390x	`74.93% <ø> (ø)`
ubuntu_gcc_s390x_dfltcc	`72.00% <ø> (ø)`
ubuntu_gcc_s390x_dfltcc_compat	`74.02% <ø> (ø)`
ubuntu_gcc_s390x_no_crc32	`74.73% <ø> (ø)`
ubuntu_gcc_sparc64	`74.87% <ø> (ø)`
ubuntu_gcc_sprefix	`73.37% <ø> (ø)`
win64_gcc	`74.07% <ø> (ø)`
win64_gcc_compat_no_opt	`74.86% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
arch/x86/adler32_sse42.c	`100.00% <ø> (ø)`
functable.c	`74.56% <ø> (ø)`

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Dead2 · 2023-07-15T23:10:05Z

I think I'd prefer it if configure was changed to appending native instead of replacing the instruction set enabling compiler parameters. That would make it behave closer to the same as cmake and at the same time fix this problem.
IMO, that would be a cleaner solution, with the small benefit of still including optimizations for better instruction sets (in case of future cpu upgrades), and only setting a lower bound to equal the compiling cpus instruction sets.

And as an aside to this; The whole native support could be improved to possibly even bypass functable and such at some point, but that is a pretty big and invasive change.

mtl1979 · 2023-07-15T23:27:11Z

I think I'd prefer it if configure was changed to appending native instead of replacing the instruction set enabling compiler parameters. That would make it behave closer to the same as cmake and at the same time fix this problem.

That wouldn't fix this specific issue which is basically caused by oversimplified preprocessor defines for SSE4 tests... Basically compiler can use inline assembly for SSE4, but our code has no inline assembly for SSSE3 instructions or older.

KungFuJesus · 2023-07-16T00:25:09Z

So the code itself only actually uses ssse3 instructions, but it's intentionally gated to use sse4.2 because the checksum + copy routine is actually slower on pre sse4.2 CPUs due to the misalignment penalty prior to nehalem. The ssse3 checksum version is the same but it does aligning scalar sums before doing purely aligned loads. The copy is then handled separately.

Dead2 · 2023-07-16T10:40:31Z

Ah, there are two different issues then.

But I still think this one is a configure/cmake issue. There are no cpus supporting SSE4.2 that do not also support SSSE3, and enabling SSE4.2 (-msse42 in GCC) will also enable all its predecessors. So how can configure then say that SSE4.2 is available, but SSSE3 is not?

I think this part of configure should be changed, as it is kind of backwards today:
https://github.com/zlib-ng/zlib-ng/blob/develop/configure#L1609

            if test ${HAVE_SSE42CRC_INTRIN} -eq 1 || test ${HAVE_SSE42CRC_INLINE_ASM} -eq 1; then
                CFLAGS="${CFLAGS} -DX86_SSE42"
                SFLAGS="${SFLAGS} -DX86_SSE42"

                if test ${HAVE_SSE42CRC_INTRIN} -eq 1; then
                  CFLAGS="${CFLAGS} -DX86_SSE42_CRC_INTRIN"
                  SFLAGS="${SFLAGS} -DX86_SSE42_CRC_INTRIN"
                fi

I think that code snippet should instead set: X86_SSE42 if intrinsics are available, and X86_SSE42_CRC_ASM if inline asm is supported. Perhaps also skip the asm fallback define entirely if intrinsics are supported?
This requires some minor changes to the ifdef checks of course, but IMO this would be the more logical way to do this. The intrinsics are after all the primary, and inline asm is the fallback. This is also how all the other instruction set defines are set (for x86 at least), this one is the odd one out, easily causing these kinds of weird issues and confusions.

Dead2 · 2023-07-16T10:52:28Z

So the code itself only actually uses ssse3 instructions, but it's intentionally gated to use sse4.2 because the checksum + copy routine is actually slower on pre sse4.2 CPUs due to the misalignment penalty prior to nehalem. The ssse3 checksum version is the same but it does aligning scalar sums before doing purely aligned loads. The copy is then handled separately.

This would be a great comment to add to the code actually.

mtl1979 · 2023-07-16T12:16:57Z

Ah, there are two different issues then.

But I still think this one is a configure/cmake issue. There are no cpus supporting SSE4.2 that do not also support SSSE3, and enabling SSE4.2 (-msse42 in GCC) will also enable all its predecessors. So how can configure then say that SSE4.2 is available, but SSSE3 is not?

This isn't about supporting real CPUs... This is only about working around broken cpuid instruction in virtualized Docker containers. As much as I'd like to have best possible patch for everything, I'm really against making huge rewrite if there is shorter solution. The inline assembly was added to work around older compilers that didn't have support for CRC intrinsics but could target SSE4. There is still a lot of systems that are stuck using older compilers due to bugs in newer versions that only affect certain processors that we officially still support, but gcc for example doesn't.

Dead2 · 2023-07-16T13:58:31Z

I know this is not about cpu support, this is about the way we do detection, and for SSE4.2 that is done in a strange way that in this case breaks builds and assumptions. I don't see this as being a huge rewrite, it is rather a small number of defines that need renaming and a few ifdef checks that need to be changed. It would in fact improve code quality.
If you don't want to have a go at it, i'll try to find some time for it in the next couple of days, still have a bunch of RL backlog after 2 weeks on vacation.

mtl1979 · 2023-07-16T14:11:00Z

If you don't want to have a go at it, i'll try to find some time for it in the next couple of days, still have a bunch of RL backlog after 2 weeks on vacation.

I have a real job to focus on currently... When I joined zlib-ng originally, I had been unemployed for over 10 years, so any project was good to keep my skills up-to-date. I still intend to make small patches once in a while, but I leave bigger ones to people who have more spare time.

KungFuJesus · 2023-07-17T13:31:25Z

This would be a great comment to add to the code actually.

I can make a pull request for that if it'd add clarity. I think the commit messages said something of the same effect.

Dead2 · 2023-08-06T08:21:59Z

This is not needed after #1542

Fix building without SSE4.2 inside Docker container.

30a6204

nmoinvaz added the Architecture Architecture specific label Jul 2, 2023

Dead2 closed this Aug 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Fix building without SSE4.2 inside Docker container. #1531

Fix building without SSE4.2 inside Docker container. #1531

Uh oh!

mtl1979 commented Jul 1, 2023

Uh oh!

codecov bot commented Jul 1, 2023 •

edited

Loading

Uh oh!

Dead2 commented Jul 15, 2023

Uh oh!

mtl1979 commented Jul 15, 2023

Uh oh!

KungFuJesus commented Jul 16, 2023 •

edited

Loading

Uh oh!

Dead2 commented Jul 16, 2023

Uh oh!

Dead2 commented Jul 16, 2023

Uh oh!

mtl1979 commented Jul 16, 2023

Uh oh!

Dead2 commented Jul 16, 2023

Uh oh!

mtl1979 commented Jul 16, 2023

Uh oh!

KungFuJesus commented Jul 17, 2023

Uh oh!

Dead2 commented Aug 6, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Fix building without SSE4.2 inside Docker container. #1531

Fix building without SSE4.2 inside Docker container. #1531

Uh oh!

Conversation

mtl1979 commented Jul 1, 2023

Uh oh!

codecov bot commented Jul 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Dead2 commented Jul 15, 2023

Uh oh!

mtl1979 commented Jul 15, 2023

Uh oh!

KungFuJesus commented Jul 16, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Dead2 commented Jul 16, 2023

Uh oh!

Dead2 commented Jul 16, 2023

Uh oh!

mtl1979 commented Jul 16, 2023

Uh oh!

Dead2 commented Jul 16, 2023

Uh oh!

mtl1979 commented Jul 16, 2023

Uh oh!

KungFuJesus commented Jul 17, 2023

Uh oh!

Dead2 commented Aug 6, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

codecov bot commented Jul 1, 2023 •

edited

Loading

KungFuJesus commented Jul 16, 2023 •

edited

Loading