这是indexloc提供的服务,不要输入任何密码
Skip to content

Conversation

@mtl1979
Copy link
Collaborator

@mtl1979 mtl1979 commented Jul 1, 2023

See #1529.

@codecov
Copy link

codecov bot commented Jul 1, 2023

Codecov Report

Patch coverage has no change and project coverage change: -0.01 ⚠️

Comparison is base (cf89cf3) 83.90% compared to head (30a6204) 83.89%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1531      +/-   ##
===========================================
- Coverage    83.90%   83.89%   -0.01%     
===========================================
  Files          132      132              
  Lines        10842    10842              
  Branches      2793     2793              
===========================================
- Hits          9097     9096       -1     
- Misses        1046     1047       +1     
  Partials       699      699              
Flag Coverage Δ
macos_clang 42.97% <ø> (ø)
macos_gcc 73.67% <ø> (ø)
ubuntu_clang 82.51% <ø> (ø)
ubuntu_clang_debug 82.15% <ø> (ø)
ubuntu_clang_inflate_allow_invalid_dist 82.17% <ø> (ø)
ubuntu_clang_inflate_strict 82.28% <ø> (-1.26%) ⬇️
ubuntu_clang_mmap 82.84% <ø> (ø)
ubuntu_clang_pigz 14.13% <ø> (ø)
ubuntu_clang_pigz_no_optim 11.51% <ø> (ø)
ubuntu_clang_pigz_no_threads 13.95% <ø> (ø)
ubuntu_clang_reduced_mem 82.91% <ø> (ø)
ubuntu_clang_toolchain_riscv ?
ubuntu_gcc 75.31% <ø> (ø)
ubuntu_gcc_aarch64 77.41% <ø> (ø)
ubuntu_gcc_aarch64_compat_no_opt 75.67% <ø> (-0.02%) ⬇️
ubuntu_gcc_aarch64_no_acle 76.22% <ø> (ø)
ubuntu_gcc_aarch64_no_neon 76.22% <ø> (ø)
ubuntu_gcc_armhf 77.48% <ø> (ø)
ubuntu_gcc_armhf_compat_no_opt 75.66% <ø> (ø)
ubuntu_gcc_armhf_no_acle 77.43% <ø> (ø)
ubuntu_gcc_armhf_no_neon 77.33% <ø> (ø)
ubuntu_gcc_armsf 74.79% <ø> (ø)
ubuntu_gcc_armsf_compat_no_opt 74.25% <ø> (-0.02%) ⬇️
ubuntu_gcc_benchmark 73.70% <ø> (-0.25%) ⬇️
ubuntu_gcc_compat_no_opt 76.86% <ø> (ø)
ubuntu_gcc_compat_sprefix 73.72% <ø> (ø)
ubuntu_gcc_m32 73.36% <ø> (ø)
ubuntu_gcc_mingw_i686 73.57% <ø> (-0.57%) ⬇️
ubuntu_gcc_mingw_x86_64 73.58% <ø> (-0.57%) ⬇️
ubuntu_gcc_mips 75.04% <ø> (ø)
ubuntu_gcc_mips64 75.06% <ø> (ø)
ubuntu_gcc_no_avx2 74.33% <ø> (ø)
ubuntu_gcc_no_ctz 74.78% <ø> (ø)
ubuntu_gcc_no_ctzll 74.77% <ø> (ø)
ubuntu_gcc_no_pclmulqdq 74.32% <ø> (-0.71%) ⬇️
ubuntu_gcc_no_sse2 74.59% <ø> (ø)
ubuntu_gcc_no_sse42 74.77% <ø> (ø)
ubuntu_gcc_o1 74.16% <ø> (-0.06%) ⬇️
ubuntu_gcc_osb ∅ <ø> (∅)
ubuntu_gcc_pigz 38.26% <ø> (-0.06%) ⬇️
ubuntu_gcc_pigz_aarch64 39.23% <ø> (-0.15%) ⬇️
ubuntu_gcc_ppc 74.05% <ø> (ø)
ubuntu_gcc_ppc64 74.49% <ø> (ø)
ubuntu_gcc_ppc64_power9 74.66% <ø> (ø)
ubuntu_gcc_ppc64le 74.50% <ø> (ø)
ubuntu_gcc_ppc64le_novsx 74.88% <ø> (ø)
ubuntu_gcc_ppc64le_power9 74.44% <ø> (ø)
ubuntu_gcc_ppc_no_power8 74.77% <ø> (ø)
ubuntu_gcc_s390x 74.93% <ø> (ø)
ubuntu_gcc_s390x_dfltcc 72.00% <ø> (ø)
ubuntu_gcc_s390x_dfltcc_compat 74.02% <ø> (ø)
ubuntu_gcc_s390x_no_crc32 74.73% <ø> (ø)
ubuntu_gcc_sparc64 74.87% <ø> (ø)
ubuntu_gcc_sprefix 73.37% <ø> (ø)
win64_gcc 74.07% <ø> (ø)
win64_gcc_compat_no_opt 74.86% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
arch/x86/adler32_sse42.c 100.00% <ø> (ø)
functable.c 74.56% <ø> (ø)

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@nmoinvaz nmoinvaz added the Architecture Architecture specific label Jul 2, 2023
@Dead2
Copy link
Member

Dead2 commented Jul 15, 2023

I think I'd prefer it if configure was changed to appending native instead of replacing the instruction set enabling compiler parameters. That would make it behave closer to the same as cmake and at the same time fix this problem.
IMO, that would be a cleaner solution, with the small benefit of still including optimizations for better instruction sets (in case of future cpu upgrades), and only setting a lower bound to equal the compiling cpus instruction sets.

And as an aside to this; The whole native support could be improved to possibly even bypass functable and such at some point, but that is a pretty big and invasive change.

@mtl1979
Copy link
Collaborator Author

mtl1979 commented Jul 15, 2023

I think I'd prefer it if configure was changed to appending native instead of replacing the instruction set enabling compiler parameters. That would make it behave closer to the same as cmake and at the same time fix this problem.

That wouldn't fix this specific issue which is basically caused by oversimplified preprocessor defines for SSE4 tests... Basically compiler can use inline assembly for SSE4, but our code has no inline assembly for SSSE3 instructions or older.

@KungFuJesus
Copy link
Contributor

KungFuJesus commented Jul 16, 2023

So the code itself only actually uses ssse3 instructions, but it's intentionally gated to use sse4.2 because the checksum + copy routine is actually slower on pre sse4.2 CPUs due to the misalignment penalty prior to nehalem. The ssse3 checksum version is the same but it does aligning scalar sums before doing purely aligned loads. The copy is then handled separately.

@Dead2
Copy link
Member

Dead2 commented Jul 16, 2023

Ah, there are two different issues then.

But I still think this one is a configure/cmake issue. There are no cpus supporting SSE4.2 that do not also support SSSE3, and enabling SSE4.2 (-msse42 in GCC) will also enable all its predecessors. So how can configure then say that SSE4.2 is available, but SSSE3 is not?

I think this part of configure should be changed, as it is kind of backwards today:
https://github.com/zlib-ng/zlib-ng/blob/develop/configure#L1609

            if test ${HAVE_SSE42CRC_INTRIN} -eq 1 || test ${HAVE_SSE42CRC_INLINE_ASM} -eq 1; then
                CFLAGS="${CFLAGS} -DX86_SSE42"
                SFLAGS="${SFLAGS} -DX86_SSE42"

                if test ${HAVE_SSE42CRC_INTRIN} -eq 1; then
                  CFLAGS="${CFLAGS} -DX86_SSE42_CRC_INTRIN"
                  SFLAGS="${SFLAGS} -DX86_SSE42_CRC_INTRIN"
                fi

I think that code snippet should instead set: X86_SSE42 if intrinsics are available, and X86_SSE42_CRC_ASM if inline asm is supported. Perhaps also skip the asm fallback define entirely if intrinsics are supported?
This requires some minor changes to the ifdef checks of course, but IMO this would be the more logical way to do this. The intrinsics are after all the primary, and inline asm is the fallback. This is also how all the other instruction set defines are set (for x86 at least), this one is the odd one out, easily causing these kinds of weird issues and confusions.

@Dead2
Copy link
Member

Dead2 commented Jul 16, 2023

So the code itself only actually uses ssse3 instructions, but it's intentionally gated to use sse4.2 because the checksum + copy routine is actually slower on pre sse4.2 CPUs due to the misalignment penalty prior to nehalem. The ssse3 checksum version is the same but it does aligning scalar sums before doing purely aligned loads. The copy is then handled separately.

This would be a great comment to add to the code actually.

@mtl1979
Copy link
Collaborator Author

mtl1979 commented Jul 16, 2023

Ah, there are two different issues then.

But I still think this one is a configure/cmake issue. There are no cpus supporting SSE4.2 that do not also support SSSE3, and enabling SSE4.2 (-msse42 in GCC) will also enable all its predecessors. So how can configure then say that SSE4.2 is available, but SSSE3 is not?

This isn't about supporting real CPUs... This is only about working around broken cpuid instruction in virtualized Docker containers. As much as I'd like to have best possible patch for everything, I'm really against making huge rewrite if there is shorter solution. The inline assembly was added to work around older compilers that didn't have support for CRC intrinsics but could target SSE4. There is still a lot of systems that are stuck using older compilers due to bugs in newer versions that only affect certain processors that we officially still support, but gcc for example doesn't.

@Dead2
Copy link
Member

Dead2 commented Jul 16, 2023

I know this is not about cpu support, this is about the way we do detection, and for SSE4.2 that is done in a strange way that in this case breaks builds and assumptions. I don't see this as being a huge rewrite, it is rather a small number of defines that need renaming and a few ifdef checks that need to be changed. It would in fact improve code quality.
If you don't want to have a go at it, i'll try to find some time for it in the next couple of days, still have a bunch of RL backlog after 2 weeks on vacation.

@mtl1979
Copy link
Collaborator Author

mtl1979 commented Jul 16, 2023

If you don't want to have a go at it, i'll try to find some time for it in the next couple of days, still have a bunch of RL backlog after 2 weeks on vacation.

I have a real job to focus on currently... When I joined zlib-ng originally, I had been unemployed for over 10 years, so any project was good to keep my skills up-to-date. I still intend to make small patches once in a while, but I leave bigger ones to people who have more spare time.

@KungFuJesus
Copy link
Contributor

This would be a great comment to add to the code actually.

I can make a pull request for that if it'd add clarity. I think the commit messages said something of the same effect.

@Dead2
Copy link
Member

Dead2 commented Aug 6, 2023

This is not needed after #1542

@Dead2 Dead2 closed this Aug 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Architecture Architecture specific

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants