Use GCC's may_alias attribute for unaligned memory access #1548

ccawley2011 · 2023-07-27T20:25:12Z

On platforms that don't allow unaligned memory access, calls to memcpy don't always get inlined in cases where they would on platforms with it. Using the may_alias attribute ensures that the code for reading and writing one byte at a time is inlined, and should also handle cases where unaligned memory access is allowed (although checking UNALIGNED_OK works better in that regard).

Summary by CodeRabbit

New Features
- Introduced new memory reading and writing functions (zng_memread_*, zng_memwrite_*) for optimized memory operations.
- Added a new header file, zmemory.h, which consolidates memory management functions.
Bug Fixes
- Replaced memcpy calls with direct memory operations to enhance performance and reduce overhead.
Documentation
- Updated comments in various files to clarify memory handling and performance implications.
Chores
- Updated object file dependencies across multiple Makefiles to reflect the new header file usage.

codecov · 2023-07-27T20:29:08Z

Codecov Report

Attention: Patch coverage is 95.34884% with 4 lines in your changes missing coverage. Please review.

Project coverage is 82.41%. Comparing base (04d1b75) to head (2f4631d).
Report is 6 commits behind head on develop.

Files with missing lines	Patch %	Lines
arch/x86/chunkset_avx512.c	0.00%	3 Missing ⚠️
arch/generic/chunkset_c.c	80.00%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #1548      +/-   ##
===========================================
+ Coverage    82.10%   82.41%   +0.30%     
===========================================
  Files          136      137       +1     
  Lines        10969    10956      -13     
  Branches      2716     2772      +56     
===========================================
+ Hits          9006     9029      +23     
+ Misses        1266     1237      -29     
+ Partials       697      690       -7

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

KungFuJesus · 2023-07-27T20:41:29Z

Ah cool, this might actually improve the purely scalar performance I saw testing this on a Sun Fire V240 (depressingly slower than stock zlib). I'll bet the extra function calls here per aliasing load were a big part of that.

mtl1979 · 2023-07-28T00:18:31Z

Fails a lot of CI tests, need to investigate why...

nmoinvaz · 2023-08-05T01:14:32Z

zmemory.h

+
+static inline uint32_t zng_memread_4(const void *ptr) {
+#if defined(UNALIGNED_OK)
+    return *(const uint32_t *)ptr;


I did a ton of work to avoid this kind of thing...
Compiler will convert memcpy to unaligned access if it is supported.

These UNALIGNED_OK branches should be removed. So there is only HAVE_MAY_ALIAS and the memcpy path.

I agree, UNALIGNED_OK branch should be removed.
We for sure don't want to default to that, if it was a new default-off setting like FORCE_UNALIGNED or something, then there would at least be room for discussion.

The HAVE_MAY_ALIAS branch is interesting, I wonder what kind of performance that gets compared to memcpy on the various platforms/compilers. If we do want to use that, detection would preferably be in configure/CMake for cleaner and wider compiler support, or at the very least moved to zbuild.h.

nmoinvaz · 2023-08-05T01:17:10Z

On platforms that don't allow unaligned memory access, calls to memcpy don't always get inlined in cases where they would on platforms with it.

Do you have a Compiler Explorer example where this is the case?

phprus · 2023-08-06T09:13:38Z

Does it revert PR #1309?
What modern and supported architectures and compilers are having this issue?

ccawley2011 · 2023-09-13T14:32:39Z

On platforms that don't allow unaligned memory access, calls to memcpy don't always get inlined in cases where they would on platforms with it.

Do you have a Compiler Explorer example where this is the case?

This example showcases the difference on ARM with modern GCC when unaligned memory access is and isn't available. It also covers SPARC as well, where memcpy is inlined for 4 byte reads but not 8 byte reads. It doesn't appear to be an issue on every architecture that requires aligned memory access, but it does occur on the architectures demonstrated. I also haven't investigated how compilers approach 16 and 32 byte reads, which would heavily impact chunkcopy_safe if memcpy doesn't get inlined.

Compiler Explorer link

nmoinvaz · 2023-09-17T22:38:47Z

zmemory.h

+
+static inline uint32_t zng_memread_4(const void *ptr) {
+#if defined(UNALIGNED_OK)
+    return *(const uint32_t *)ptr;


These UNALIGNED_OK branches should be removed. So there is only HAVE_MAY_ALIAS and the memcpy path.

nmoinvaz · 2023-09-17T22:41:03Z

zmemory.h

+}
+
+static inline int32_t zng_memcmp_4(const void *src0, const void *src1) {
+#if defined(UNALIGNED_OK) || defined(HAVE_MAY_ALIAS)


Should remove the defined(UNALIGNED_OK) from all these. Only HAVE_MAY_ALIAS. We error on the side of "the compiler knows best" about converting memcpy to unaligned access. This is better than undefined behavior that can occur. There are other GitHub PRs related to this including #1100 that might be worth reading.

ccawley2011 · 2024-02-16T18:14:14Z

OK, I've rebased the branch and removed the special cases for UNALIGNED_OK and UNALIGNED64_OK.

nmoinvaz · 2024-02-21T18:52:55Z

zmemory.h

+#if defined(HAVE_MAY_ALIAS)
+    return zng_memread_2(src0) != zng_memread_2(src1);
+#else
+    return memcmp(src0, src1, 2);


The original comment suggests it should be using memcpy and not memcmp.

Perhaps it should be: return zng_memread_2(src0) != zng_memread_2(src1); no matter what.

The code now does return zng_memread_2(src0) != zng_memread_2(src1); for cases where UNALIGNED_OK is defined, which should cover MSVC.

I'm not sure about using it for all cases since the worst case scenario there would be to emit two memcpy calls instead of one memcmp call, but since this PR covers the most common compilers (GCC >= 4 (and compatible) and MSVC), this should be OK for the time being.

zmemory.h

Dead2 · 2024-02-23T12:46:29Z

I'll have to do some in-depth testing of this, but I am leaving on vacation now with very limited access, so it'll unfortunately have to wait for a while.

Dead2 · 2024-03-05T13:38:14Z

X86-64 i9900K, GCC 13.2

Develop Mar 5 `af494fc`

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     54.185%      0.104/0.108/0.111/0.002        0.033/0.036/0.037/0.001        8,526,745
 2     43.871%      0.197/0.200/0.202/0.001        0.033/0.036/0.038/0.001        6,903,702
 3     42.388%      0.245/0.248/0.250/0.001        0.032/0.035/0.036/0.002        6,670,239
 4     41.647%      0.272/0.277/0.280/0.002        0.032/0.034/0.035/0.001        6,553,746
 5     41.216%      0.299/0.306/0.309/0.002        0.030/0.034/0.035/0.001        6,485,936
 6     41.038%      0.347/0.354/0.357/0.002        0.030/0.034/0.036/0.002        6,457,827
 7     40.778%      0.493/0.501/0.505/0.003        0.030/0.034/0.035/0.001        6,416,941
 8     40.704%      0.610/0.615/0.620/0.003        0.032/0.034/0.035/0.001        6,405,249
 9     40.409%      0.899/0.907/0.912/0.003        0.030/0.033/0.035/0.002        6,358,951

 avg1  42.915%                        0.391                          0.034
 tot                                105.495                          9.250       60,779,336

   text    data     bss     dec     hex filename
 136798    1312       8  138118   21b86 libz-ng.so.2

PR 1548

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     54.185%      0.105/0.109/0.111/0.001        0.034/0.035/0.037/0.001        8,526,745
 2     43.871%      0.198/0.200/0.201/0.001        0.031/0.036/0.038/0.002        6,903,702
 3     42.388%      0.241/0.247/0.250/0.002        0.032/0.034/0.035/0.001        6,670,239
 4     41.647%      0.274/0.277/0.279/0.001        0.031/0.034/0.035/0.001        6,553,746
 5     41.216%      0.301/0.307/0.311/0.003        0.031/0.034/0.035/0.001        6,485,936
 6     41.038%      0.350/0.354/0.358/0.002        0.029/0.033/0.035/0.001        6,457,827
 7     40.778%      0.493/0.500/0.503/0.002        0.031/0.034/0.036/0.001        6,416,941
 8     40.704%      0.608/0.616/0.621/0.003        0.030/0.033/0.034/0.001        6,405,249
 9     40.409%      0.899/0.907/0.912/0.004        0.030/0.033/0.035/0.001        6,358,951

 avg1  42.915%                        0.391                          0.034
 tot                                105.539                          9.192       60,779,336

   text    data     bss     dec     hex filename
 136798    1312       8  138118   21b86 libz-ng.so.2

No changes noticeable.

Anyone able to do MSVC builds and benchmarks?

nmoinvaz · 2024-03-05T17:39:09Z

This PR must be rebased. There are some conflicts in the nmake files.

nmoinvaz · 2024-03-05T19:52:15Z

silesia.tar

MSVC 19.38.33134.0

DEVELOP `af494fc`

OS: Windows 10 10.0.22631 AMD64
CPU: Intel64 Family 6 Model 151 Stepping 2, GenuineIntel
Tool: ../zlib-ng/build-develop/Release/minigzip.exe Size: 105,472 B
Levels: 0-9
Runs: 70         Trim worst: 40

 Level   Comp   Comptime min/avg/max/stddev   Compressed size
 0    100.008%      0.086/0.087/0.088/0.001       211,973,953
 1     44.409%      0.696/0.699/0.700/0.001        94,127,497
 2     35.519%      1.101/1.105/1.108/0.002        75,286,322
 3     33.882%      1.472/1.477/1.481/0.003        71,815,206
 4     33.175%      1.705/1.711/1.714/0.002        70,317,229
 5     32.661%      1.946/1.949/1.952/0.002        69,228,146
 6     32.507%      2.325/2.333/2.337/0.003        68,902,143
 7     32.255%      3.091/3.098/3.106/0.005        68,366,759
 8     32.167%      4.974/5.027/5.094/0.030        68,180,762
 9     31.887%      5.402/5.447/5.522/0.032        67,586,442

 avg1  40.847%                        2.293
 avg2  45.386%                        2.548
 tot                                687.972       865,784,459

PR #1548 rebased on top of DEVELOP

OS: Windows 10 10.0.22631 AMD64
CPU: Intel64 Family 6 Model 151 Stepping 2, GenuineIntel
Tool: ../zlib-ng/build-1548/Release/minigzip.exe Size: 104,960 B
Levels: 0-9
Runs: 70         Trim worst: 40

 Level   Comp   Comptime min/avg/max/stddev   Compressed size
 0    100.008%      0.085/0.087/0.087/0.000       211,973,953
 1     44.409%      0.694/0.698/0.699/0.001        94,127,497
 2     35.519%      1.099/1.102/1.104/0.001        75,286,322
 3     33.882%      1.472/1.476/1.479/0.002        71,815,206
 4     33.175%      1.708/1.711/1.713/0.002        70,317,229
 5     32.661%      1.943/1.947/1.950/0.002        69,228,146
 6     32.507%      2.323/2.329/2.333/0.003        68,902,143
 7     32.255%      3.091/3.097/3.100/0.002        68,366,759
 8     32.167%      4.965/5.018/5.054/0.028        68,180,762
 9     31.887%      5.398/5.436/5.471/0.020        67,586,442

 avg1  40.847%                        2.290
 avg2  45.386%                        2.544
 tot                                686.977       865,784,459

I forgot to turn on decompression in the tests so I will check that out later. It is interesting to see a reduction in code size even though it should compile to the same thing.

nmoinvaz · 2024-03-05T23:42:23Z

With decompression

silesia.tar

MSVC 19.38.33134.0

DEVELOP `af494fc`

OS: Windows 10 10.0.22631 AMD64
CPU: Intel64 Family 6 Model 151 Stepping 2, GenuineIntel
Tool: ../zlib-ng/build-develop/Release/minigzip.exe Size: 105,472 B
Levels: 0-9
Runs: 70         Trim worst: 40

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 0    100.008%      0.087/0.089/0.089/0.001        0.084/0.085/0.086/0.001      211,973,953
 1     44.409%      0.702/0.706/0.709/0.002        0.341/0.345/0.346/0.001       94,127,497
 2     35.519%      1.111/1.116/1.120/0.002        0.336/0.339/0.341/0.001       75,286,322
 3     33.882%      1.486/1.492/1.500/0.004        0.320/0.324/0.325/0.001       71,815,206
 4     33.175%      1.724/1.733/1.754/0.007        0.314/0.317/0.318/0.001       70,317,229
 5     32.661%      1.957/1.973/1.992/0.009        0.310/0.313/0.314/0.001       69,228,146
 6     32.507%      2.346/2.367/2.405/0.018        0.308/0.312/0.314/0.001       68,902,143
 7     32.255%      3.115/3.142/3.182/0.020        0.307/0.311/0.313/0.001       68,366,759
 8     32.167%      5.091/5.179/5.234/0.041        0.309/0.312/0.313/0.001       68,180,762
 9     31.887%      5.488/5.575/5.633/0.031        0.304/0.307/0.309/0.001       67,586,442

 avg1  40.847%                        2.337                          0.296
 avg2  45.386%                        2.597                          0.329
 tot                                701.133                         88.939      865,784,459

PR #1548 rebased on top of DEVELOP

OS: Windows 10 10.0.22631 AMD64
CPU: Intel64 Family 6 Model 151 Stepping 2, GenuineIntel
Tool: ../zlib-ng/build-1548/Release/minigzip.exe Size: 104,960 B
Levels: 0-9
Runs: 70         Trim worst: 40

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 0    100.008%      0.087/0.089/0.089/0.001        0.084/0.086/0.086/0.001      211,973,953
 1     44.409%      0.697/0.705/0.709/0.004        0.343/0.346/0.348/0.001       94,127,497
 2     35.519%      1.104/1.121/1.139/0.009        0.334/0.339/0.342/0.002       75,286,322
 3     33.882%      1.479/1.498/1.520/0.012        0.321/0.325/0.327/0.002       71,815,206
 4     33.175%      1.726/1.733/1.743/0.005        0.315/0.317/0.319/0.001       70,317,229
 5     32.661%      1.959/1.970/1.985/0.006        0.309/0.313/0.315/0.001       69,228,146
 6     32.507%      2.345/2.362/2.392/0.012        0.309/0.311/0.313/0.001       68,902,143
 7     32.255%      3.117/3.138/3.180/0.018        0.309/0.311/0.313/0.001       68,366,759
 8     32.167%      5.063/5.174/5.256/0.056        0.310/0.311/0.313/0.001       68,180,762
 9     31.887%      5.523/5.589/5.643/0.034        0.304/0.308/0.309/0.001       67,586,442

 avg1  40.847%                        2.338                          0.297
 avg2  45.386%                        2.597                          0.330
 tot                                701.305                         88.968      865,784,459

Dead2 · 2024-03-06T22:31:05Z

So, we have established that new gcc and msvc (both on x86-64) do not get regressions with this.
Do we know what compiler versions (and arches) that benefit from this change?

coderabbitai · 2024-12-21T18:18:15Z

Walkthrough

This pull request introduces a comprehensive refactoring of memory reading, writing, and comparison functions across multiple architecture-specific files in the zlib-ng library. The primary change is the introduction of a new header file zmemory.h with optimized inline functions for reading, writing, and comparing memory at 2, 4, and 8-byte granularities. These functions replace existing memcpy calls and provide a more efficient and architecture-aware approach to memory operations, with special handling for unaligned memory access.

Changes

File	Change Summary
`zmemory.h`	New header file with inline memory read/write/compare functions for 2, 4, and 8-byte operations
`arch//chunkset_.c`	Replaced `memcpy` with `zng_memread_*` functions in chunk memory setting functions
`arch//compare256_.c`	Updated header inclusion from `zutil_p.h` to `zmemory.h`
`deflate.h`, `deflate_quick.c`	Replaced output and memory writing functions with `zng_memwrite_*`
`match_tpl.h`, `insert_string_tpl.h`	Optimized memory reading for hash calculations and matching
`win32/Makefile.*`	Updated object file dependencies to include `zmemory.h`

Suggested labels

cleanup, Architecture

Suggested reviewers

KungFuJesus
nmoinvaz
mtl1979
Dead2

Possibly related PRs

Remove UNALIGNED_OK checks #1828: Removes UNALIGNED_OK checks, related to unaligned memory access handling
Make big endians first class citizens again #1831: Introduces unaligned comparison functions without BYTE_ORDER dependency

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.

Actionable comments posted: 2

🧹 Nitpick comments (14)

zmemory.h (4)

23-32: Review performance impact of zng_memread_4
Using a packed struct can be efficient if the compiler inlines these reads, but in tight loops or for large-scale usage, it may be worth benchmarking against an optimized memcpy or compiler intrinsics. Consider a fallback or specialized approach for non-GCC compilers.

54-61: Possible code duplication in zng_memwrite_4
The logic here is nearly identical to zng_memwrite_2, except for the data size. Consider templating or macro expansion to reduce code duplication for read/write pairs.

75-81: Check UNALIGNED_OK logic in zng_memcmp_2
You’ve maintained a fallback to memcmp if neither HAVE_MAY_ALIAS nor UNALIGNED_OK is defined. However, there is a risk that compilers might still insert unaligned reads. Consider making this logic more explicit.

83-89: Inconsistent usage of UNALIGNED_OK in zng_memcmp_4
Unlike zng_memcmp_2, which allows UNALIGNED_OK, consider whether your platform can handle 4-byte unaligned reads or if separate macros (e.g., UNALIGNED64_OK vs. UNALIGNED32_OK) are needed for future clarity.

deflate_quick.c (1)

21-21: Validate that zmemory.h is necessary in this file
Including "zmemory.h" here suggests usage of the new memory functions. Although your current usage relies on zng_memcmp_2, consider whether inlined definitions from zmemory.h might cause code bloat if included in too many places.

match_tpl.h (2)

78-79: Use of zng_memread_8 for scan_start and scan_end
Reading 8 bytes at a time can be beneficial for searching long matches, but it may be overkill if the typical match length is shorter. Evaluate performance on smaller or embedded platforms.

157-158: Use of zng_memcmp_8 for large comparisons
This is an improvement for performance, but confirm that older toolchains do not miscompile it.

arch/generic/chunkset_c.c (3)

6-6: Consider narrower include scope
You only appear to use zng_memread_* and zng_memwrite_* in a few functions here. If build times matter, consider grouping these functions or limiting includes.

21-21: 8-byte chunk setting
Similarly, using zng_memread_8 can speed up chunk transfers. Evaluate on 32-bit platforms if alignment is suboptimal or if fallback is needed.

25-25: loadchunk usage
This function is straightforward, but ensure it’s not called in performance-critical hotspots that rely on potential compiler auto-vectorization or intrinsics.
compare256_rle.h (1)
78-84: Consider caching the memory read result

The 32-bit comparison could benefit from caching the result of zng_memread_4 if the compiler doesn't optimize it automatically.

Consider this alternative implementation:
-    src0_cmp = zng_memread_2(src0);
-    sv = ((uint32_t)src0_cmp << 16) | src0_cmp;
+    uint32_t src0_val = zng_memread_4(src0);
+    sv = src0_val;
arch/x86/chunkset_avx2.c (1)

23-23: Excellent optimization for AVX2 memory operations

The implementation efficiently combines unaligned memory reads with AVX2 vector operations. The use of zng_memread_* functions with _mm256_set1_epi* is an optimal approach for loading and broadcasting values.

This implementation:

Eliminates temporary variables

Reduces memory operations

Leverages AVX2's efficient broadcast instructions

Also applies to: 27-27, 31-31

inflate_p.h (1)

178-181: Consider tracking the TODO comment as a separate issue.

The comment identifies a potential performance optimization opportunity regarding memcpy inlining on architectures with strict memory alignment.

Would you like me to create a GitHub issue to track this TODO for future optimization?

arch/x86/chunkset_avx512.c (1)

37-45: Consider adding performance benchmarks

While the changes look good from a technical perspective, it would be valuable to have performance benchmarks specifically for these chunked memory operations across different architectures.

Would you like me to help create a benchmark suite for these operations?

🛑 Comments failed to post (2)

zmemory.h (1)

45-52: ⚠️ Potential issue

Check for atomicity needs in zng_memwrite_2
This function writes 16 bits using direct assignment to a packed struct if HAVE_MAY_ALIAS is defined. If the surrounding code relies on atomic 16-bit writes, confirm whether the hardware and compiler guarantee atomicity here.

arch/generic/chunkset_c.c (1)

29-29: ⚠️ Potential issue

storechunk atomicity
zng_memwrite_8 is used for storing chunks. For concurrency or lock-free data structures, confirm that partial writes are not an issue.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

zmemory.h (2)

72-81: zng_memcmp_2 function might benefit from naming clarifications.
The function returns an int32_t but does a boolean check (inequality). The usage is correct, but the name “memcmp_2” could imply returning a memcmp-style result rather than a boolean. Consider clarifying the docs if future maintainers might be confused.

83-89: zng_memcmp_4 function uses the same approach as zng_memcmp_2.
No correctness issues.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d28c71a and 2f4631d.

📒 Files selected for processing (25)

arch/arm/chunkset_neon.c (2 hunks)
arch/arm/compare256_neon.c (1 hunks)
arch/generic/Makefile.in (1 hunks)
arch/generic/chunkset_c.c (2 hunks)
arch/generic/compare256_c.c (3 hunks)
arch/power/chunkset_power8.c (2 hunks)
arch/power/compare256_power9.c (1 hunks)
arch/riscv/compare256_rvv.c (1 hunks)
arch/x86/chunkset_avx2.c (2 hunks)
arch/x86/chunkset_avx512.c (2 hunks)
arch/x86/chunkset_sse2.c (2 hunks)
arch/x86/chunkset_ssse3.c (2 hunks)
arch/x86/compare256_avx2.c (1 hunks)
arch/x86/compare256_sse2.c (1 hunks)
compare256_rle.h (4 hunks)
deflate.h (6 hunks)
deflate_quick.c (1 hunks)
inflate_p.h (3 hunks)
insert_string_tpl.h (2 hunks)
match_tpl.h (4 hunks)
win32/Makefile.a64 (2 hunks)
win32/Makefile.arm (2 hunks)
win32/Makefile.msc (2 hunks)
zmemory.h (1 hunks)
zutil_p.h (0 hunks)

💤 Files with no reviewable changes (1)

zutil_p.h

🚧 Files skipped from review as they are similar to previous changes (22)

inflate_p.h
win32/Makefile.msc
arch/x86/compare256_sse2.c
arch/x86/chunkset_avx512.c
insert_string_tpl.h
arch/x86/chunkset_sse2.c
arch/generic/Makefile.in
arch/arm/chunkset_neon.c
deflate_quick.c
deflate.h
arch/power/compare256_power9.c
win32/Makefile.a64
arch/x86/chunkset_ssse3.c
win32/Makefile.arm
arch/riscv/compare256_rvv.c
arch/arm/compare256_neon.c
match_tpl.h
arch/x86/chunkset_avx2.c
arch/x86/compare256_avx2.c
arch/generic/chunkset_c.c
compare256_rle.h
arch/power/chunkset_power8.c

🔇 Additional comments (14)

zmemory.h (11)

1-3: File header looks solid and provides necessary context.
Great job including copyright and distribution references.

5-7: Guard macros appear correct and well-named.
Using "_ZMEMORY_H" is consistent and helps avoid collisions.

12-21: zng_memread_2 function is clear and concise.
Utilizing a packed, may_alias struct is a standard approach for unaligned loads. The fallback to memcpy for compilers without may_alias ensures portability.

23-32: zng_memread_4 function mirrors zng_memread_2 pattern.
This consistent approach is easy to maintain across multiple data sizes.

34-43: zng_memread_8 function continues the same pattern.
No issues spotted. Adheres to the PR’s objective of optimizing unaligned reads.

45-52: zng_memwrite_2 function is straightforward.
Ensuring two-byte writes handle alignment carefully is beneficial.

54-61: zng_memwrite_4 function.
Again, consistent with the prior approach. No concerns.

63-70: zng_memwrite_8 function mirrors the smaller-size approach.
Everything looks good.

91-97: zng_memcmp_8 function remains consistent with the smaller-size versions.
No issues. Good unaligned handling.

99-99: End of file macro
Everything looks properly closed.

8-11: Conditional check for GCC version.
Defining HAVE_MAY_ALIAS meets the PR objective of utilizing GCC’s may_alias attribute. Ensure that the build system sets this macro consistently.

✅ Verification successful

Let me verify if HAVE_MAY_ALIAS is used consistently across the codebase.

Let me check the context where HAVE_MAY_ALIAS is used in zmemory.h to ensure it's used consistently with the GCC version check.

HAVE_MAY_ALIAS macro is correctly defined and consistently used
The macro is properly defined based on GCC version (>= 4) and is consistently used throughout zmemory.h for optimized memory operations. It enables the use of the __may_alias__ attribute for unaligned memory access when available, with proper fallbacks to memcpy/memcmp when the attribute is not supported. The implementation is clean and follows good practices for compiler-specific optimizations.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Confirm that HAVE_MAY_ALIAS is set only when __GNUC__ >= 4
rg -A 3 '__GNUC__.*>= 4'

Length of output: 140

Script:

#!/bin/bash
# Check for other occurrences of HAVE_MAY_ALIAS
rg "HAVE_MAY_ALIAS"

# Check for other GCC version checks
rg "__GNUC__"

Length of output: 1786

Script:

#!/bin/bash
# Show the content of zmemory.h to see how HAVE_MAY_ALIAS is used
cat zmemory.h

Length of output: 3039

arch/generic/compare256_c.c (3)

7-7: Switch to zmemory.h.
This aligns with the new unaligned memory approach and centralizes memory operations. Good practice to reduce duplication.

110-111: Use of zng_memread_4 in compare256_unaligned_32_static.
This is consistent with the PR’s intent to replace memcpy with direct unaligned reads. Potential performance benefits on platforms without native unaligned access.

154-155: Use of zng_memread_8 in compare256_unaligned_64_static.
Similar pattern for 64-bit reads. Looks correct and consistent.

nmoinvaz

Looks ok to me.

ccawley2011 force-pushed the may-alias branch 2 times, most recently from 8737e10 to 6adb2fc Compare July 28, 2023 08:32

nmoinvaz reviewed Aug 5, 2023

View reviewed changes

Dead2 added the information needed label Aug 13, 2023

nmoinvaz requested changes Sep 17, 2023

View reviewed changes

nmoinvaz added optimization Build Env and removed information needed labels Sep 17, 2023

ccawley2011 mentioned this pull request Sep 25, 2023

General optimized chunkset #1568

Merged

ccawley2011 force-pushed the may-alias branch from 6adb2fc to afdacc1 Compare February 16, 2024 18:12

ccawley2011 force-pushed the may-alias branch from afdacc1 to 5a9d9e3 Compare February 16, 2024 18:20

Dead2 added the Rebase needed Please do a 'git rebase develop yourbranch' label Feb 21, 2024

ccawley2011 force-pushed the may-alias branch from 5a9d9e3 to 28509bd Compare February 21, 2024 18:46

nmoinvaz reviewed Feb 21, 2024

View reviewed changes

ccawley2011 force-pushed the may-alias branch from 28509bd to 5dc5794 Compare February 21, 2024 19:56

Dead2 removed the Rebase needed Please do a 'git rebase develop yourbranch' label Feb 21, 2024

nmoinvaz reviewed Feb 21, 2024

View reviewed changes

zmemory.h Outdated Show resolved Hide resolved

ccawley2011 force-pushed the may-alias branch from 5dc5794 to d8da4c7 Compare February 22, 2024 11:37

nmoinvaz approved these changes Feb 22, 2024

View reviewed changes

nmoinvaz added the Rebase needed Please do a 'git rebase develop yourbranch' label Mar 5, 2024

ccawley2011 force-pushed the may-alias branch 2 times, most recently from 820046b to f295cca Compare March 6, 2024 15:05

nmoinvaz removed the Rebase needed Please do a 'git rebase develop yourbranch' label Mar 6, 2024

ccawley2011 mentioned this pull request Dec 21, 2024

Fix unaligned access in ACLE based crc32 #1833

Merged

ccawley2011 force-pushed the may-alias branch from f295cca to d28c71a Compare December 21, 2024 18:18

Use GCC's may_alias attribute for unaligned memory access

2f4631d

ccawley2011 force-pushed the may-alias branch from d28c71a to 2f4631d Compare December 21, 2024 18:22

coderabbitai bot reviewed Dec 21, 2024

View reviewed changes

Dead2 requested review from KungFuJesus and nmoinvaz December 22, 2024 14:59

nmoinvaz approved these changes Dec 22, 2024

View reviewed changes

Dead2 mentioned this pull request Dec 23, 2024

Continued cleanup of old UNALIGNED_OK checks #1830

Merged

Dead2 approved these changes Dec 23, 2024

View reviewed changes

Dead2 merged commit d7e121e into zlib-ng:develop Dec 24, 2024
145 of 150 checks passed

Dead2 mentioned this pull request Dec 31, 2024

2.2.3 Release #1838

Merged

This was referenced Jan 2, 2025

Chorba #1837

Merged

Use arch-specific deflate_quick functions instead of compare256. #1847

Closed

This was referenced Jan 20, 2025

CI changes for Ubuntu 24 #1843

Merged

Disable CRC32-VX Extention for some Clang versions #1852

Merged

coderabbitai bot mentioned this pull request Feb 21, 2025

Explicit SSE2 vectorization of Chorba CRC method #1872

Merged

ccawley2011 deleted the may-alias branch July 9, 2025 18:43

Uh oh!

Use GCC's may_alias attribute for unaligned memory access #1548

Use GCC's may_alias attribute for unaligned memory access #1548

Uh oh!

Conversation

ccawley2011 commented Jul 27, 2023 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

codecov bot commented Jul 27, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

KungFuJesus commented Jul 27, 2023

Uh oh!

mtl1979 commented Jul 28, 2023

Uh oh!

nmoinvaz Aug 5, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nmoinvaz Sep 17, 2023

Choose a reason for hiding this comment

Uh oh!

Dead2 Nov 7, 2023

Choose a reason for hiding this comment

Uh oh!

nmoinvaz commented Aug 5, 2023

Uh oh!

phprus commented Aug 6, 2023

Uh oh!

ccawley2011 commented Sep 13, 2023

Uh oh!

nmoinvaz Sep 17, 2023

Choose a reason for hiding this comment

Uh oh!

nmoinvaz Sep 17, 2023

Choose a reason for hiding this comment

Uh oh!

ccawley2011 commented Feb 16, 2024

Uh oh!

nmoinvaz Feb 21, 2024

Choose a reason for hiding this comment

Uh oh!

ccawley2011 Feb 21, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Dead2 commented Feb 23, 2024

Uh oh!

Dead2 commented Mar 5, 2024

Develop Mar 5 af494fc

PR 1548

Uh oh!

nmoinvaz commented Mar 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nmoinvaz commented Mar 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

silesia.tar

DEVELOP af494fc

PR #1548 rebased on top of DEVELOP

Uh oh!

nmoinvaz commented Mar 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

silesia.tar

DEVELOP af494fc

PR #1548 rebased on top of DEVELOP

Uh oh!

Dead2 commented Mar 6, 2024

Uh oh!

coderabbitai bot commented Dec 21, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Suggested labels

Suggested reviewers

Possibly related PRs

Chat

CodeRabbit Commands (Invoked using PR comments)

ccawley2011 commented Jul 27, 2023 •

edited by coderabbitai bot

Loading

codecov bot commented Jul 27, 2023 •

edited

Loading

nmoinvaz Aug 5, 2023 •

edited

Loading

Develop Mar 5 `af494fc`

nmoinvaz commented Mar 5, 2024 •

edited

Loading

nmoinvaz commented Mar 5, 2024 •

edited

Loading

DEVELOP `af494fc`

nmoinvaz commented Mar 5, 2024 •

edited

Loading

DEVELOP `af494fc`

coderabbitai bot commented Dec 21, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)