这是indexloc提供的服务,不要输入任何密码
Skip to content

Conversation

@ccawley2011
Copy link
Contributor

@ccawley2011 ccawley2011 commented Jul 27, 2023

On platforms that don't allow unaligned memory access, calls to memcpy don't always get inlined in cases where they would on platforms with it. Using the may_alias attribute ensures that the code for reading and writing one byte at a time is inlined, and should also handle cases where unaligned memory access is allowed (although checking UNALIGNED_OK works better in that regard).

Summary by CodeRabbit

  • New Features

    • Introduced new memory reading and writing functions (zng_memread_*, zng_memwrite_*) for optimized memory operations.
    • Added a new header file, zmemory.h, which consolidates memory management functions.
  • Bug Fixes

    • Replaced memcpy calls with direct memory operations to enhance performance and reduce overhead.
  • Documentation

    • Updated comments in various files to clarify memory handling and performance implications.
  • Chores

    • Updated object file dependencies across multiple Makefiles to reflect the new header file usage.

@codecov
Copy link

codecov bot commented Jul 27, 2023

Codecov Report

Attention: Patch coverage is 95.34884% with 4 lines in your changes missing coverage. Please review.

Project coverage is 82.41%. Comparing base (04d1b75) to head (2f4631d).
Report is 6 commits behind head on develop.

Files with missing lines Patch % Lines
arch/x86/chunkset_avx512.c 0.00% 3 Missing ⚠️
arch/generic/chunkset_c.c 80.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1548      +/-   ##
===========================================
+ Coverage    82.10%   82.41%   +0.30%     
===========================================
  Files          136      137       +1     
  Lines        10969    10956      -13     
  Branches      2716     2772      +56     
===========================================
+ Hits          9006     9029      +23     
+ Misses        1266     1237      -29     
+ Partials       697      690       -7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@KungFuJesus
Copy link
Contributor

Ah cool, this might actually improve the purely scalar performance I saw testing this on a Sun Fire V240 (depressingly slower than stock zlib). I'll bet the extra function calls here per aliasing load were a big part of that.

@mtl1979
Copy link
Collaborator

mtl1979 commented Jul 28, 2023

Fails a lot of CI tests, need to investigate why...

@ccawley2011 ccawley2011 force-pushed the may-alias branch 2 times, most recently from 8737e10 to 6adb2fc Compare July 28, 2023 08:32
zmemory.h Outdated

static inline uint32_t zng_memread_4(const void *ptr) {
#if defined(UNALIGNED_OK)
return *(const uint32_t *)ptr;
Copy link
Member

@nmoinvaz nmoinvaz Aug 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a ton of work to avoid this kind of thing...
Compiler will convert memcpy to unaligned access if it is supported.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These UNALIGNED_OK branches should be removed. So there is only HAVE_MAY_ALIAS and the memcpy path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, UNALIGNED_OK branch should be removed.
We for sure don't want to default to that, if it was a new default-off setting like FORCE_UNALIGNED or something, then there would at least be room for discussion.

The HAVE_MAY_ALIAS branch is interesting, I wonder what kind of performance that gets compared to memcpy on the various platforms/compilers. If we do want to use that, detection would preferably be in configure/CMake for cleaner and wider compiler support, or at the very least moved to zbuild.h.

@nmoinvaz
Copy link
Member

nmoinvaz commented Aug 5, 2023

On platforms that don't allow unaligned memory access, calls to memcpy don't always get inlined in cases where they would on platforms with it.

Do you have a Compiler Explorer example where this is the case?

@phprus
Copy link
Contributor

phprus commented Aug 6, 2023

Does it revert PR #1309?
What modern and supported architectures and compilers are having this issue?

@ccawley2011
Copy link
Contributor Author

On platforms that don't allow unaligned memory access, calls to memcpy don't always get inlined in cases where they would on platforms with it.

Do you have a Compiler Explorer example where this is the case?

This example showcases the difference on ARM with modern GCC when unaligned memory access is and isn't available. It also covers SPARC as well, where memcpy is inlined for 4 byte reads but not 8 byte reads. It doesn't appear to be an issue on every architecture that requires aligned memory access, but it does occur on the architectures demonstrated. I also haven't investigated how compilers approach 16 and 32 byte reads, which would heavily impact chunkcopy_safe if memcpy doesn't get inlined.

Compiler Explorer link

zmemory.h Outdated

static inline uint32_t zng_memread_4(const void *ptr) {
#if defined(UNALIGNED_OK)
return *(const uint32_t *)ptr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These UNALIGNED_OK branches should be removed. So there is only HAVE_MAY_ALIAS and the memcpy path.

zmemory.h Outdated
}

static inline int32_t zng_memcmp_4(const void *src0, const void *src1) {
#if defined(UNALIGNED_OK) || defined(HAVE_MAY_ALIAS)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should remove the defined(UNALIGNED_OK) from all these. Only HAVE_MAY_ALIAS. We error on the side of "the compiler knows best" about converting memcpy to unaligned access. This is better than undefined behavior that can occur. There are other GitHub PRs related to this including #1100 that might be worth reading.

@ccawley2011
Copy link
Contributor Author

OK, I've rebased the branch and removed the special cases for UNALIGNED_OK and UNALIGNED64_OK.

@Dead2 Dead2 added the Rebase needed Please do a 'git rebase develop yourbranch' label Feb 21, 2024
#if defined(HAVE_MAY_ALIAS)
return zng_memread_2(src0) != zng_memread_2(src1);
#else
return memcmp(src0, src1, 2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original comment suggests it should be using memcpy and not memcmp.

Perhaps it should be: return zng_memread_2(src0) != zng_memread_2(src1); no matter what.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code now does return zng_memread_2(src0) != zng_memread_2(src1); for cases where UNALIGNED_OK is defined, which should cover MSVC.

I'm not sure about using it for all cases since the worst case scenario there would be to emit two memcpy calls instead of one memcmp call, but since this PR covers the most common compilers (GCC >= 4 (and compatible) and MSVC), this should be OK for the time being.

@Dead2 Dead2 removed the Rebase needed Please do a 'git rebase develop yourbranch' label Feb 21, 2024
@Dead2
Copy link
Member

Dead2 commented Feb 23, 2024

I'll have to do some in-depth testing of this, but I am leaving on vacation now with very limited access, so it'll unfortunately have to wait for a while.

@Dead2
Copy link
Member

Dead2 commented Mar 5, 2024

X86-64 i9900K, GCC 13.2

Develop Mar 5 af494fc

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     54.185%      0.104/0.108/0.111/0.002        0.033/0.036/0.037/0.001        8,526,745
 2     43.871%      0.197/0.200/0.202/0.001        0.033/0.036/0.038/0.001        6,903,702
 3     42.388%      0.245/0.248/0.250/0.001        0.032/0.035/0.036/0.002        6,670,239
 4     41.647%      0.272/0.277/0.280/0.002        0.032/0.034/0.035/0.001        6,553,746
 5     41.216%      0.299/0.306/0.309/0.002        0.030/0.034/0.035/0.001        6,485,936
 6     41.038%      0.347/0.354/0.357/0.002        0.030/0.034/0.036/0.002        6,457,827
 7     40.778%      0.493/0.501/0.505/0.003        0.030/0.034/0.035/0.001        6,416,941
 8     40.704%      0.610/0.615/0.620/0.003        0.032/0.034/0.035/0.001        6,405,249
 9     40.409%      0.899/0.907/0.912/0.003        0.030/0.033/0.035/0.002        6,358,951

 avg1  42.915%                        0.391                          0.034
 tot                                105.495                          9.250       60,779,336

   text    data     bss     dec     hex filename
 136798    1312       8  138118   21b86 libz-ng.so.2

PR 1548

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     54.185%      0.105/0.109/0.111/0.001        0.034/0.035/0.037/0.001        8,526,745
 2     43.871%      0.198/0.200/0.201/0.001        0.031/0.036/0.038/0.002        6,903,702
 3     42.388%      0.241/0.247/0.250/0.002        0.032/0.034/0.035/0.001        6,670,239
 4     41.647%      0.274/0.277/0.279/0.001        0.031/0.034/0.035/0.001        6,553,746
 5     41.216%      0.301/0.307/0.311/0.003        0.031/0.034/0.035/0.001        6,485,936
 6     41.038%      0.350/0.354/0.358/0.002        0.029/0.033/0.035/0.001        6,457,827
 7     40.778%      0.493/0.500/0.503/0.002        0.031/0.034/0.036/0.001        6,416,941
 8     40.704%      0.608/0.616/0.621/0.003        0.030/0.033/0.034/0.001        6,405,249
 9     40.409%      0.899/0.907/0.912/0.004        0.030/0.033/0.035/0.001        6,358,951

 avg1  42.915%                        0.391                          0.034
 tot                                105.539                          9.192       60,779,336

   text    data     bss     dec     hex filename
 136798    1312       8  138118   21b86 libz-ng.so.2

No changes noticeable.

Anyone able to do MSVC builds and benchmarks?

@nmoinvaz
Copy link
Member

nmoinvaz commented Mar 5, 2024

This PR must be rebased. There are some conflicts in the nmake files.

@nmoinvaz nmoinvaz added the Rebase needed Please do a 'git rebase develop yourbranch' label Mar 5, 2024
@nmoinvaz
Copy link
Member

nmoinvaz commented Mar 5, 2024

silesia.tar

MSVC 19.38.33134.0

DEVELOP af494fc

OS: Windows 10 10.0.22631 AMD64
CPU: Intel64 Family 6 Model 151 Stepping 2, GenuineIntel
Tool: ../zlib-ng/build-develop/Release/minigzip.exe Size: 105,472 B
Levels: 0-9
Runs: 70         Trim worst: 40

 Level   Comp   Comptime min/avg/max/stddev   Compressed size
 0    100.008%      0.086/0.087/0.088/0.001       211,973,953
 1     44.409%      0.696/0.699/0.700/0.001        94,127,497
 2     35.519%      1.101/1.105/1.108/0.002        75,286,322
 3     33.882%      1.472/1.477/1.481/0.003        71,815,206
 4     33.175%      1.705/1.711/1.714/0.002        70,317,229
 5     32.661%      1.946/1.949/1.952/0.002        69,228,146
 6     32.507%      2.325/2.333/2.337/0.003        68,902,143
 7     32.255%      3.091/3.098/3.106/0.005        68,366,759
 8     32.167%      4.974/5.027/5.094/0.030        68,180,762
 9     31.887%      5.402/5.447/5.522/0.032        67,586,442

 avg1  40.847%                        2.293
 avg2  45.386%                        2.548
 tot                                687.972       865,784,459

PR #1548 rebased on top of DEVELOP

OS: Windows 10 10.0.22631 AMD64
CPU: Intel64 Family 6 Model 151 Stepping 2, GenuineIntel
Tool: ../zlib-ng/build-1548/Release/minigzip.exe Size: 104,960 B
Levels: 0-9
Runs: 70         Trim worst: 40

 Level   Comp   Comptime min/avg/max/stddev   Compressed size
 0    100.008%      0.085/0.087/0.087/0.000       211,973,953
 1     44.409%      0.694/0.698/0.699/0.001        94,127,497
 2     35.519%      1.099/1.102/1.104/0.001        75,286,322
 3     33.882%      1.472/1.476/1.479/0.002        71,815,206
 4     33.175%      1.708/1.711/1.713/0.002        70,317,229
 5     32.661%      1.943/1.947/1.950/0.002        69,228,146
 6     32.507%      2.323/2.329/2.333/0.003        68,902,143
 7     32.255%      3.091/3.097/3.100/0.002        68,366,759
 8     32.167%      4.965/5.018/5.054/0.028        68,180,762
 9     31.887%      5.398/5.436/5.471/0.020        67,586,442

 avg1  40.847%                        2.290
 avg2  45.386%                        2.544
 tot                                686.977       865,784,459

I forgot to turn on decompression in the tests so I will check that out later. It is interesting to see a reduction in code size even though it should compile to the same thing.

@nmoinvaz
Copy link
Member

nmoinvaz commented Mar 5, 2024

With decompression

silesia.tar

MSVC 19.38.33134.0

DEVELOP af494fc

OS: Windows 10 10.0.22631 AMD64
CPU: Intel64 Family 6 Model 151 Stepping 2, GenuineIntel
Tool: ../zlib-ng/build-develop/Release/minigzip.exe Size: 105,472 B
Levels: 0-9
Runs: 70         Trim worst: 40

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 0    100.008%      0.087/0.089/0.089/0.001        0.084/0.085/0.086/0.001      211,973,953
 1     44.409%      0.702/0.706/0.709/0.002        0.341/0.345/0.346/0.001       94,127,497
 2     35.519%      1.111/1.116/1.120/0.002        0.336/0.339/0.341/0.001       75,286,322
 3     33.882%      1.486/1.492/1.500/0.004        0.320/0.324/0.325/0.001       71,815,206
 4     33.175%      1.724/1.733/1.754/0.007        0.314/0.317/0.318/0.001       70,317,229
 5     32.661%      1.957/1.973/1.992/0.009        0.310/0.313/0.314/0.001       69,228,146
 6     32.507%      2.346/2.367/2.405/0.018        0.308/0.312/0.314/0.001       68,902,143
 7     32.255%      3.115/3.142/3.182/0.020        0.307/0.311/0.313/0.001       68,366,759
 8     32.167%      5.091/5.179/5.234/0.041        0.309/0.312/0.313/0.001       68,180,762
 9     31.887%      5.488/5.575/5.633/0.031        0.304/0.307/0.309/0.001       67,586,442

 avg1  40.847%                        2.337                          0.296
 avg2  45.386%                        2.597                          0.329
 tot                                701.133                         88.939      865,784,459

PR #1548 rebased on top of DEVELOP

OS: Windows 10 10.0.22631 AMD64
CPU: Intel64 Family 6 Model 151 Stepping 2, GenuineIntel
Tool: ../zlib-ng/build-1548/Release/minigzip.exe Size: 104,960 B
Levels: 0-9
Runs: 70         Trim worst: 40

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 0    100.008%      0.087/0.089/0.089/0.001        0.084/0.086/0.086/0.001      211,973,953
 1     44.409%      0.697/0.705/0.709/0.004        0.343/0.346/0.348/0.001       94,127,497
 2     35.519%      1.104/1.121/1.139/0.009        0.334/0.339/0.342/0.002       75,286,322
 3     33.882%      1.479/1.498/1.520/0.012        0.321/0.325/0.327/0.002       71,815,206
 4     33.175%      1.726/1.733/1.743/0.005        0.315/0.317/0.319/0.001       70,317,229
 5     32.661%      1.959/1.970/1.985/0.006        0.309/0.313/0.315/0.001       69,228,146
 6     32.507%      2.345/2.362/2.392/0.012        0.309/0.311/0.313/0.001       68,902,143
 7     32.255%      3.117/3.138/3.180/0.018        0.309/0.311/0.313/0.001       68,366,759
 8     32.167%      5.063/5.174/5.256/0.056        0.310/0.311/0.313/0.001       68,180,762
 9     31.887%      5.523/5.589/5.643/0.034        0.304/0.308/0.309/0.001       67,586,442

 avg1  40.847%                        2.338                          0.297
 avg2  45.386%                        2.597                          0.330
 tot                                701.305                         88.968      865,784,459

@ccawley2011 ccawley2011 force-pushed the may-alias branch 2 times, most recently from 820046b to f295cca Compare March 6, 2024 15:05
@nmoinvaz nmoinvaz removed the Rebase needed Please do a 'git rebase develop yourbranch' label Mar 6, 2024
@Dead2
Copy link
Member

Dead2 commented Mar 6, 2024

So, we have established that new gcc and msvc (both on x86-64) do not get regressions with this.
Do we know what compiler versions (and arches) that benefit from this change?

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 21, 2024

Walkthrough

This pull request introduces a comprehensive refactoring of memory reading, writing, and comparison functions across multiple architecture-specific files in the zlib-ng library. The primary change is the introduction of a new header file zmemory.h with optimized inline functions for reading, writing, and comparing memory at 2, 4, and 8-byte granularities. These functions replace existing memcpy calls and provide a more efficient and architecture-aware approach to memory operations, with special handling for unaligned memory access.

Changes

File Change Summary
zmemory.h New header file with inline memory read/write/compare functions for 2, 4, and 8-byte operations
arch/*/chunkset_*.c Replaced memcpy with zng_memread_* functions in chunk memory setting functions
arch/*/compare256_*.c Updated header inclusion from zutil_p.h to zmemory.h
deflate.h, deflate_quick.c Replaced output and memory writing functions with zng_memwrite_*
match_tpl.h, insert_string_tpl.h Optimized memory reading for hash calculations and matching
win32/Makefile.* Updated object file dependencies to include zmemory.h

Suggested labels

cleanup, Architecture

Suggested reviewers

  • KungFuJesus
  • nmoinvaz
  • mtl1979
  • Dead2

Possibly related PRs


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.

Actionable comments posted: 2

🧹 Nitpick comments (14)
zmemory.h (4)

23-32: Review performance impact of zng_memread_4
Using a packed struct can be efficient if the compiler inlines these reads, but in tight loops or for large-scale usage, it may be worth benchmarking against an optimized memcpy or compiler intrinsics. Consider a fallback or specialized approach for non-GCC compilers.


54-61: Possible code duplication in zng_memwrite_4
The logic here is nearly identical to zng_memwrite_2, except for the data size. Consider templating or macro expansion to reduce code duplication for read/write pairs.


75-81: Check UNALIGNED_OK logic in zng_memcmp_2
You’ve maintained a fallback to memcmp if neither HAVE_MAY_ALIAS nor UNALIGNED_OK is defined. However, there is a risk that compilers might still insert unaligned reads. Consider making this logic more explicit.


83-89: Inconsistent usage of UNALIGNED_OK in zng_memcmp_4
Unlike zng_memcmp_2, which allows UNALIGNED_OK, consider whether your platform can handle 4-byte unaligned reads or if separate macros (e.g., UNALIGNED64_OK vs. UNALIGNED32_OK) are needed for future clarity.

deflate_quick.c (1)

21-21: Validate that zmemory.h is necessary in this file
Including "zmemory.h" here suggests usage of the new memory functions. Although your current usage relies on zng_memcmp_2, consider whether inlined definitions from zmemory.h might cause code bloat if included in too many places.

match_tpl.h (2)

78-79: Use of zng_memread_8 for scan_start and scan_end
Reading 8 bytes at a time can be beneficial for searching long matches, but it may be overkill if the typical match length is shorter. Evaluate performance on smaller or embedded platforms.


157-158: Use of zng_memcmp_8 for large comparisons
This is an improvement for performance, but confirm that older toolchains do not miscompile it.

arch/generic/chunkset_c.c (3)

6-6: Consider narrower include scope
You only appear to use zng_memread_* and zng_memwrite_* in a few functions here. If build times matter, consider grouping these functions or limiting includes.


21-21: 8-byte chunk setting
Similarly, using zng_memread_8 can speed up chunk transfers. Evaluate on 32-bit platforms if alignment is suboptimal or if fallback is needed.


25-25: loadchunk usage
This function is straightforward, but ensure it’s not called in performance-critical hotspots that rely on potential compiler auto-vectorization or intrinsics.

compare256_rle.h (1)

78-84: Consider caching the memory read result

The 32-bit comparison could benefit from caching the result of zng_memread_4 if the compiler doesn't optimize it automatically.

Consider this alternative implementation:

-    src0_cmp = zng_memread_2(src0);
-    sv = ((uint32_t)src0_cmp << 16) | src0_cmp;
+    uint32_t src0_val = zng_memread_4(src0);
+    sv = src0_val;
arch/x86/chunkset_avx2.c (1)

23-23: Excellent optimization for AVX2 memory operations

The implementation efficiently combines unaligned memory reads with AVX2 vector operations. The use of zng_memread_* functions with _mm256_set1_epi* is an optimal approach for loading and broadcasting values.

This implementation:

  1. Eliminates temporary variables
  2. Reduces memory operations
  3. Leverages AVX2's efficient broadcast instructions

Also applies to: 27-27, 31-31

inflate_p.h (1)

178-181: Consider tracking the TODO comment as a separate issue.

The comment identifies a potential performance optimization opportunity regarding memcpy inlining on architectures with strict memory alignment.

Would you like me to create a GitHub issue to track this TODO for future optimization?

arch/x86/chunkset_avx512.c (1)

37-45: Consider adding performance benchmarks

While the changes look good from a technical perspective, it would be valuable to have performance benchmarks specifically for these chunked memory operations across different architectures.

Would you like me to help create a benchmark suite for these operations?

🛑 Comments failed to post (2)
zmemory.h (1)

45-52: ⚠️ Potential issue

Check for atomicity needs in zng_memwrite_2
This function writes 16 bits using direct assignment to a packed struct if HAVE_MAY_ALIAS is defined. If the surrounding code relies on atomic 16-bit writes, confirm whether the hardware and compiler guarantee atomicity here.

arch/generic/chunkset_c.c (1)

29-29: ⚠️ Potential issue

storechunk atomicity
zng_memwrite_8 is used for storing chunks. For concurrency or lock-free data structures, confirm that partial writes are not an issue.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
zmemory.h (2)

72-81: zng_memcmp_2 function might benefit from naming clarifications.
The function returns an int32_t but does a boolean check (inequality). The usage is correct, but the name “memcmp_2” could imply returning a memcmp-style result rather than a boolean. Consider clarifying the docs if future maintainers might be confused.


83-89: zng_memcmp_4 function uses the same approach as zng_memcmp_2.
No correctness issues.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d28c71a and 2f4631d.

📒 Files selected for processing (25)
  • arch/arm/chunkset_neon.c (2 hunks)
  • arch/arm/compare256_neon.c (1 hunks)
  • arch/generic/Makefile.in (1 hunks)
  • arch/generic/chunkset_c.c (2 hunks)
  • arch/generic/compare256_c.c (3 hunks)
  • arch/power/chunkset_power8.c (2 hunks)
  • arch/power/compare256_power9.c (1 hunks)
  • arch/riscv/compare256_rvv.c (1 hunks)
  • arch/x86/chunkset_avx2.c (2 hunks)
  • arch/x86/chunkset_avx512.c (2 hunks)
  • arch/x86/chunkset_sse2.c (2 hunks)
  • arch/x86/chunkset_ssse3.c (2 hunks)
  • arch/x86/compare256_avx2.c (1 hunks)
  • arch/x86/compare256_sse2.c (1 hunks)
  • compare256_rle.h (4 hunks)
  • deflate.h (6 hunks)
  • deflate_quick.c (1 hunks)
  • inflate_p.h (3 hunks)
  • insert_string_tpl.h (2 hunks)
  • match_tpl.h (4 hunks)
  • win32/Makefile.a64 (2 hunks)
  • win32/Makefile.arm (2 hunks)
  • win32/Makefile.msc (2 hunks)
  • zmemory.h (1 hunks)
  • zutil_p.h (0 hunks)
💤 Files with no reviewable changes (1)
  • zutil_p.h
🚧 Files skipped from review as they are similar to previous changes (22)
  • inflate_p.h
  • win32/Makefile.msc
  • arch/x86/compare256_sse2.c
  • arch/x86/chunkset_avx512.c
  • insert_string_tpl.h
  • arch/x86/chunkset_sse2.c
  • arch/generic/Makefile.in
  • arch/arm/chunkset_neon.c
  • deflate_quick.c
  • deflate.h
  • arch/power/compare256_power9.c
  • win32/Makefile.a64
  • arch/x86/chunkset_ssse3.c
  • win32/Makefile.arm
  • arch/riscv/compare256_rvv.c
  • arch/arm/compare256_neon.c
  • match_tpl.h
  • arch/x86/chunkset_avx2.c
  • arch/x86/compare256_avx2.c
  • arch/generic/chunkset_c.c
  • compare256_rle.h
  • arch/power/chunkset_power8.c
🔇 Additional comments (14)
zmemory.h (11)

1-3: File header looks solid and provides necessary context.
Great job including copyright and distribution references.


5-7: Guard macros appear correct and well-named.
Using "_ZMEMORY_H" is consistent and helps avoid collisions.


12-21: zng_memread_2 function is clear and concise.
Utilizing a packed, may_alias struct is a standard approach for unaligned loads. The fallback to memcpy for compilers without may_alias ensures portability.


23-32: zng_memread_4 function mirrors zng_memread_2 pattern.
This consistent approach is easy to maintain across multiple data sizes.


34-43: zng_memread_8 function continues the same pattern.
No issues spotted. Adheres to the PR’s objective of optimizing unaligned reads.


45-52: zng_memwrite_2 function is straightforward.
Ensuring two-byte writes handle alignment carefully is beneficial.


54-61: zng_memwrite_4 function.
Again, consistent with the prior approach. No concerns.


63-70: zng_memwrite_8 function mirrors the smaller-size approach.
Everything looks good.


91-97: zng_memcmp_8 function remains consistent with the smaller-size versions.
No issues. Good unaligned handling.


99-99: End of file macro
Everything looks properly closed.


8-11: Conditional check for GCC version.
Defining HAVE_MAY_ALIAS meets the PR objective of utilizing GCC’s may_alias attribute. Ensure that the build system sets this macro consistently.

✅ Verification successful

Let me verify if HAVE_MAY_ALIAS is used consistently across the codebase.


Let me check the context where HAVE_MAY_ALIAS is used in zmemory.h to ensure it's used consistently with the GCC version check.


HAVE_MAY_ALIAS macro is correctly defined and consistently used
The macro is properly defined based on GCC version (>= 4) and is consistently used throughout zmemory.h for optimized memory operations. It enables the use of the __may_alias__ attribute for unaligned memory access when available, with proper fallbacks to memcpy/memcmp when the attribute is not supported. The implementation is clean and follows good practices for compiler-specific optimizations.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Confirm that HAVE_MAY_ALIAS is set only when __GNUC__ >= 4
rg -A 3 '__GNUC__.*>= 4'

Length of output: 140


Script:

#!/bin/bash
# Check for other occurrences of HAVE_MAY_ALIAS
rg "HAVE_MAY_ALIAS"

# Check for other GCC version checks
rg "__GNUC__"

Length of output: 1786


Script:

#!/bin/bash
# Show the content of zmemory.h to see how HAVE_MAY_ALIAS is used
cat zmemory.h

Length of output: 3039

arch/generic/compare256_c.c (3)

7-7: Switch to zmemory.h.
This aligns with the new unaligned memory approach and centralizes memory operations. Good practice to reduce duplication.


110-111: Use of zng_memread_4 in compare256_unaligned_32_static.
This is consistent with the PR’s intent to replace memcpy with direct unaligned reads. Potential performance benefits on platforms without native unaligned access.


154-155: Use of zng_memread_8 in compare256_unaligned_64_static.
Similar pattern for 64-bit reads. Looks correct and consistent.

Copy link
Member

@nmoinvaz nmoinvaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok to me.

@Dead2 Dead2 merged commit d7e121e into zlib-ng:develop Dec 24, 2024
145 of 150 checks passed
@Dead2 Dead2 mentioned this pull request Dec 31, 2024
@ccawley2011 ccawley2011 deleted the may-alias branch July 9, 2025 18:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants