-
-
Notifications
You must be signed in to change notification settings - Fork 308
Fix building without SSE4.2 inside Docker container. #1531
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportPatch coverage has no change and project coverage change:
Additional details and impacted files@@ Coverage Diff @@
## develop #1531 +/- ##
===========================================
- Coverage 83.90% 83.89% -0.01%
===========================================
Files 132 132
Lines 10842 10842
Branches 2793 2793
===========================================
- Hits 9097 9096 -1
- Misses 1046 1047 +1
Partials 699 699
Flags with carried forward coverage won't be shown. Click here to find out more.
☔ View full report in Codecov by Sentry. |
|
I think I'd prefer it if configure was changed to appending native instead of replacing the instruction set enabling compiler parameters. That would make it behave closer to the same as cmake and at the same time fix this problem. And as an aside to this; The whole native support could be improved to possibly even bypass functable and such at some point, but that is a pretty big and invasive change. |
That wouldn't fix this specific issue which is basically caused by oversimplified preprocessor defines for SSE4 tests... Basically compiler can use inline assembly for SSE4, but our code has no inline assembly for SSSE3 instructions or older. |
|
So the code itself only actually uses ssse3 instructions, but it's intentionally gated to use sse4.2 because the checksum + copy routine is actually slower on pre sse4.2 CPUs due to the misalignment penalty prior to nehalem. The ssse3 checksum version is the same but it does aligning scalar sums before doing purely aligned loads. The copy is then handled separately. |
|
Ah, there are two different issues then. But I still think this one is a configure/cmake issue. There are no cpus supporting SSE4.2 that do not also support SSSE3, and enabling SSE4.2 ( I think this part of configure should be changed, as it is kind of backwards today: I think that code snippet should instead set: |
This would be a great comment to add to the code actually. |
This isn't about supporting real CPUs... This is only about working around broken |
|
I know this is not about cpu support, this is about the way we do detection, and for SSE4.2 that is done in a strange way that in this case breaks builds and assumptions. I don't see this as being a huge rewrite, it is rather a small number of defines that need renaming and a few ifdef checks that need to be changed. It would in fact improve code quality. |
I have a real job to focus on currently... When I joined zlib-ng originally, I had been unemployed for over 10 years, so any project was good to keep my skills up-to-date. I still intend to make small patches once in a while, but I leave bigger ones to people who have more spare time. |
I can make a pull request for that if it'd add clarity. I think the commit messages said something of the same effect. |
|
This is not needed after #1542 |
See #1529.