这是indexloc提供的服务,不要输入任何密码
Skip to content

Conversation

@ccawley2011
Copy link
Contributor

@ccawley2011 ccawley2011 commented Jul 13, 2023

This uses the __uqsub16 instruction from newer versions of arm_acle.h, with fallbacks for MSVC and older GCC.

Results from benchmark_zlib on a Raspberry Pi 1B:

-----------------------------------------------------------------
Benchmark                       Time             CPU   Iterations
-----------------------------------------------------------------
slide_hash/c/1024         2351799 ns      1543374 ns          462
slide_hash/c/2048         1666346 ns      1588153 ns          426
slide_hash/c/4096         3016597 ns      1687021 ns          424
slide_hash/c/8192         1952722 ns      1768284 ns          395
slide_hash/c/16384        2239252 ns      1986384 ns          354
slide_hash/c/32768        2786077 ns      2473195 ns          282
slide_hash/armv6/1024     1038111 ns       923860 ns          755
slide_hash/armv6/2048     1104154 ns       949017 ns          746
slide_hash/armv6/4096     1225774 ns       968121 ns          726
slide_hash/armv6/8192     1187857 ns      1051441 ns          667
slide_hash/armv6/16384    1374731 ns      1218741 ns          595
slide_hash/armv6/32768    1614386 ns      1552546 ns          460
slide_hash/neon/1024   ERROR OCCURRED: 'CPU does not support neon'
slide_hash/neon/2048   ERROR OCCURRED: 'CPU does not support neon'
slide_hash/neon/4096   ERROR OCCURRED: 'CPU does not support neon'
slide_hash/neon/8192   ERROR OCCURRED: 'CPU does not support neon'
slide_hash/neon/16384  ERROR OCCURRED: 'CPU does not support neon'
slide_hash/neon/32768  ERROR OCCURRED: 'CPU does not support neon'

@codecov
Copy link

codecov bot commented Jul 13, 2023

Codecov Report

Patch coverage: 92.30% and project coverage change: +0.03% 🎉

Comparison is base (4838f0e) 83.88% compared to head (2e7aa08) 83.91%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1538      +/-   ##
===========================================
+ Coverage    83.88%   83.91%   +0.03%     
===========================================
  Files          132      133       +1     
  Lines        10843    10882      +39     
  Branches      2801     2805       +4     
===========================================
+ Hits          9096     9132      +36     
- Misses        1048     1049       +1     
- Partials       699      701       +2     
Flag Coverage Δ
macos_clang 42.97% <ø> (ø)
macos_gcc 73.61% <ø> (ø)
ubuntu_clang 82.50% <ø> (ø)
ubuntu_clang_debug 82.14% <ø> (ø)
ubuntu_clang_inflate_allow_invalid_dist 82.15% <ø> (+0.22%) ⬆️
ubuntu_clang_inflate_strict 82.49% <ø> (+0.22%) ⬆️
ubuntu_clang_mmap 82.82% <ø> (ø)
ubuntu_clang_pigz 13.91% <ø> (-0.06%) ⬇️
ubuntu_clang_pigz_no_optim 11.51% <ø> (ø)
ubuntu_clang_pigz_no_threads 13.78% <ø> (ø)
ubuntu_clang_reduced_mem 82.90% <ø> (ø)
ubuntu_clang_toolchain_riscv ∅ <ø> (∅)
ubuntu_gcc 75.23% <ø> (-0.08%) ⬇️
ubuntu_gcc_aarch64 77.40% <100.00%> (+<0.01%) ⬆️
ubuntu_gcc_aarch64_compat_no_opt 75.64% <ø> (ø)
ubuntu_gcc_aarch64_no_acle 76.17% <100.00%> (+<0.01%) ⬆️
ubuntu_gcc_aarch64_no_neon 76.16% <100.00%> (+<0.01%) ⬆️
ubuntu_gcc_armhf 77.17% <13.88%> (-0.30%) ⬇️
ubuntu_gcc_armhf_compat_no_opt 75.60% <ø> (ø)
ubuntu_gcc_armhf_no_acle 77.12% <13.88%> (-0.30%) ⬇️
ubuntu_gcc_armhf_no_neon 77.25% <92.10%> (-0.08%) ⬇️
ubuntu_gcc_armsf 74.60% <88.88%> (-0.06%) ⬇️
ubuntu_gcc_armsf_compat_no_opt 74.09% <ø> (ø)
ubuntu_gcc_benchmark 73.57% <ø> (ø)
ubuntu_gcc_compat_no_opt 76.85% <ø> (+0.01%) ⬆️
ubuntu_gcc_compat_sprefix 73.57% <ø> (-0.16%) ⬇️
ubuntu_gcc_m32 73.23% <ø> (ø)
ubuntu_gcc_mingw_i686 73.65% <ø> (+0.15%) ⬆️
ubuntu_gcc_mingw_x86_64 73.51% <ø> (ø)
ubuntu_gcc_mips 74.97% <ø> (ø)
ubuntu_gcc_mips64 74.98% <ø> (ø)
ubuntu_gcc_no_avx2 74.28% <ø> (-0.06%) ⬇️
ubuntu_gcc_no_ctz 74.65% <ø> (ø)
ubuntu_gcc_no_ctzll 74.64% <ø> (ø)
ubuntu_gcc_no_pclmulqdq 74.28% <ø> (ø)
ubuntu_gcc_no_sse2 74.54% <ø> (ø)
ubuntu_gcc_no_sse42 74.73% <ø> (ø)
ubuntu_gcc_o1 74.18% <ø> (ø)
ubuntu_gcc_osb ∅ <ø> (∅)
ubuntu_gcc_pigz 38.15% <ø> (-0.08%) ⬇️
ubuntu_gcc_pigz_aarch64 39.09% <100.00%> (+0.10%) ⬆️
ubuntu_gcc_ppc 73.92% <ø> (ø)
ubuntu_gcc_ppc64 74.36% <ø> (ø)
ubuntu_gcc_ppc64_power9 74.53% <ø> (ø)
ubuntu_gcc_ppc64le 74.43% <ø> (ø)
ubuntu_gcc_ppc64le_novsx 74.75% <ø> (ø)
ubuntu_gcc_ppc64le_power9 74.31% <ø> (ø)
ubuntu_gcc_ppc_no_power8 74.63% <ø> (ø)
ubuntu_gcc_s390x 74.80% <ø> (ø)
ubuntu_gcc_s390x_dfltcc 71.92% <ø> (ø)
ubuntu_gcc_s390x_dfltcc_compat 73.98% <ø> (ø)
ubuntu_gcc_s390x_no_crc32 74.59% <ø> (ø)
ubuntu_gcc_sparc64 74.79% <ø> (ø)
ubuntu_gcc_sprefix 73.24% <ø> (ø)
win64_gcc 73.95% <ø> (ø)
win64_gcc_compat_no_opt 74.71% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
test/benchmarks/benchmark_slidehash.cc 72.00% <ø> (ø)
functable.c 74.28% <50.00%> (-0.29%) ⬇️
arch/arm/arm_features.c 88.23% <71.42%> (-11.77%) ⬇️
arch/arm/slide_hash_armv6.c 100.00% <100.00%> (ø)

... and 4 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Dead2 Dead2 added the Rebase needed Please do a 'git rebase develop yourbranch' label Jul 16, 2023
@Dead2
Copy link
Member

Dead2 commented Jul 16, 2023

This looks pretty good to me, but I have not tested it myself.
But it does need a rebase now.

@ccawley2011 What about your other pending PRs? Should we try to find someone else to come in and finish them up for inclusion?

@mtl1979
Copy link
Collaborator

mtl1979 commented Jul 17, 2023

As I've said earlier, we support all vector instruction sets, no matter if 32-bit wide or wider... As such I have nothing against adding support for optimizations using x2 intrinsics.

After adding -latomic to the test and benchmark binaries, I could compile targeting armv6l.

slide_hash/c/1024          219620 ns       219619 ns         3118
slide_hash/c/2048          220503 ns       220505 ns         3131
slide_hash/c/4096          225817 ns       225809 ns         3060
slide_hash/c/8192          239124 ns       239125 ns         2931
slide_hash/c/16384         264708 ns       264707 ns         2620
slide_hash/c/32768         319575 ns       319575 ns         2194
slide_hash/armv6/1024       64614 ns        64615 ns        10306
slide_hash/armv6/2048       65148 ns        65148 ns        10110
slide_hash/armv6/4096       66986 ns        66985 ns        10262
slide_hash/armv6/8192       70785 ns        70785 ns         9973
slide_hash/armv6/16384      77270 ns        77270 ns         8997
slide_hash/armv6/32768      91714 ns        91714 ns         7564

@nmoinvaz nmoinvaz added optimization Architecture Architecture specific labels Jul 17, 2023
@mtl1979
Copy link
Collaborator

mtl1979 commented Jul 18, 2023

CMake tests seem to fail even though configure tests pass:

CMake:

-- Performing Test HAVE_ARMV6_INLINE_ASM
-- Performing Test HAVE_ARMV6_INLINE_ASM - Failed
-- Performing Test HAVE_ARMV6_INTRIN
-- Performing Test HAVE_ARMV6_INTRIN - Failed

Configure:

Check whether -march=armv6 works ... No.
Checking for ARMv6 inline assembly ... Yes.
Checking for ARMv6 intrinsics ... Yes.

@mtl1979
Copy link
Collaborator

mtl1979 commented Jul 18, 2023

I also tried compiling for 32-bit ARMv8-A and short-circuiting the feature test:

slide_hash/c/1024                           218280 ns       218280 ns         3119
slide_hash/c/2048                           220862 ns       220863 ns         3095
slide_hash/c/4096                           225947 ns       225949 ns         3056
slide_hash/c/8192                           240282 ns       240282 ns         2903
slide_hash/c/16384                          266467 ns       266467 ns         2606
slide_hash/c/32768                          318262 ns       318264 ns         2207
slide_hash/armv6/1024                        61459 ns        61457 ns        10814
slide_hash/armv6/2048                        62265 ns        62265 ns        10918
slide_hash/armv6/4096                        64613 ns        64613 ns        10496
slide_hash/armv6/8192                        67780 ns        67780 ns        10103
slide_hash/armv6/16384                       75325 ns        75325 ns         9235
slide_hash/armv6/32768                       90238 ns        90239 ns         7710
slide_hash/neon/1024                         20416 ns        20416 ns        33979
slide_hash/neon/2048                         20812 ns        20813 ns        33433
slide_hash/neon/4096                         21430 ns        21430 ns        32242
slide_hash/neon/8192                         22770 ns        22770 ns        30518
slide_hash/neon/16384                        25395 ns        25395 ns        27580
slide_hash/neon/32768                        30330 ns        30328 ns        23137

@ccawley2011 ccawley2011 force-pushed the slide-hash-armv6 branch 2 times, most recently from da62d60 to 1ccb42a Compare September 13, 2023 13:14
@ccawley2011
Copy link
Contributor Author

This should be ready to merge now.

@nmoinvaz
Copy link
Member

This PR seems to fail:
https://github.com/zlib-ng/zlib-ng/actions/runs/6172998856/job/16754372490?pr=1538
It might suggest something is not syncing up between configure and CMake outputs.

@ccawley2011
Copy link
Contributor Author

This PR seems to fail: https://github.com/zlib-ng/zlib-ng/actions/runs/6172998856/job/16754372490?pr=1538 It might suggest something is not syncing up between configure and CMake outputs.

This was because configure checked if -march=armv6 works before using it to check for ARMv6 intrinsics, but CMake didn't. That's fixed now.

Copy link
Member

@nmoinvaz nmoinvaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reviewed with one change request, after that I think it is ready for approval.

@nmoinvaz nmoinvaz removed the Rebase needed Please do a 'git rebase develop yourbranch' label Sep 15, 2023
Copy link
Member

@nmoinvaz nmoinvaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Dead2 Dead2 merged commit 16fe1f8 into zlib-ng:develop Sep 16, 2023
@ccawley2011 ccawley2011 deleted the slide-hash-armv6 branch September 26, 2023 19:14
@Dead2 Dead2 mentioned this pull request Oct 13, 2023

static inline uint16x2_t __uqsub16(uint16x2_t __a, uint16x2_t __b) {
uint16x2_t __c;
__asm__ __volatile__("uqsub16\t%0, %1, %2" : "=r" (__c) : "r"(__a), "r"(__b));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ccawley2011 is it intentional that \t follows uqsub16? I noticed that the CMake checks use a space.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Architecture Architecture specific optimization

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants