ls: Implement locale-aware sorting #8828
Open
naoNao89 wants to merge 3 commits into uutils:main from naoNao89:feat/ls-locale-sorting-only
+527 −3
Implements locale-aware collation for file name sorting, respecting the LC_COLLATE environment variable. This matches GNU ls behavior.

Changes:
- Add the i18n-collator feature dependency to ls
- Initialize the ICU collator with default options in uumain()
- Replace simple string comparison with locale_cmp() for Sort::Name
- Respect the LC_ALL > LC_COLLATE > LANG priority order
- Fall back to byte comparison for the C locale

Behavior:
- With LC_ALL=C: byte-order sorting (e.g., Zebra before apple)
- With LC_ALL=de_DE.UTF-8: locale-aware sorting (e.g., Äpfel groups with apple)
- With LC_ALL=vi_VN.UTF-8: Vietnamese collation order

Testing:
- All existing unit tests pass
- Manually verified against GNU ls with German and Vietnamese locales
- Output matches GNU ls behavior exactly

This is a critical feature for international users who need proper alphabetical sorting according to their locale's collation rules.
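To make the C-locale behavior above concrete: comparing file names by their UTF-8 bytes puts every uppercase ASCII letter before every lowercase one, and puts non-ASCII letters such as "Ä" (encoded as 0xC3 0x84) after all ASCII letters. The standalone sketch below only illustrates byte-order sorting; the helper name byte_order_sort is illustrative and not taken from the PR.

```rust
/// Sort names the way the C locale does: lexicographically by UTF-8 bytes.
fn byte_order_sort(names: &mut [&str]) {
    names.sort_by(|a, b| a.as_bytes().cmp(b.as_bytes()));
}

fn main() {
    let mut names = vec!["apple", "Äpfel", "zebra", "Zebra"];
    byte_order_sort(&mut names);
    // 'Z' (0x5A) < 'a' (0x61) < 'z' (0x7A) < 0xC3 (first byte of "Ä"),
    // so this prints ["Zebra", "apple", "zebra", "Äpfel"].
    println!("{:?}", names);
}
```

Under a German collation, "Äpfel" would instead sort next to "apple", which is the difference the Behavior list describes.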
GNU testsuite comparison:
@naoNao89 please write the PR comment yourself. An AI-generated comment isn't useful (too much information that isn't relevant). Thanks
Please write a new benchmark for this. It also lacks tests.
- Add locale sorting benchmarks with ASCII and Unicode datasets
- Add tests for German, French, and Spanish locale collation
- Add a test for environment variable precedence (LC_ALL > LC_COLLATE > LANG)
- Add a test for the C/POSIX locale fallback to byte comparison
- Benchmark the performance impact across different locales and file counts
done
- Add locale test words (German, French, Spanish) to the cspell dictionary
- Fix formatting in locale test assertions
44f5eca to d931a9f
Implements locale-aware sorting for ls to match GNU ls behavior and fix incorrect sorting of non-ASCII filenames.

ls was using byte-order comparison, causing non-ASCII characters to sort incorrectly. For example, German umlauts (ä, ö, ü) appeared after 'z' instead of near their base letters.

Solution

Integrates uucore::i18n::collator to respect the locale environment variables (LC_ALL > LC_COLLATE > LANG). Falls back to byte comparison for the C/POSIX locale, preserving performance in ASCII-only environments.
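The precedence and fallback described in the Solution can be sketched as follows. This is not the code in this PR and does not use the uucore::i18n::collator API. The function names are hypothetical, the collate argument stands in for a real locale collator, and treating "C.<codeset>" (e.g. C.UTF-8) as a byte-order locale is an assumption of this sketch. Unset or empty variables are skipped, and "C" is used when none is set.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

/// Hypothetical helper: pick the locale that governs collation using the
/// precedence LC_ALL > LC_COLLATE > LANG, skipping unset or empty values.
/// `lookup` abstracts the environment; a real program could pass
/// `|k| std::env::var(k).ok()`.
fn resolve_collate_locale(lookup: impl Fn(&str) -> Option<String>) -> String {
    ["LC_ALL", "LC_COLLATE", "LANG"]
        .into_iter()
        .filter_map(|var| lookup(var))
        .find(|val| !val.is_empty())
        .unwrap_or_else(|| "C".to_string())
}

/// Assumption: "C", "POSIX" and "C.<codeset>" all use plain byte order.
fn is_byte_order_locale(locale: &str) -> bool {
    locale == "C" || locale == "POSIX" || locale.starts_with("C.")
}

/// Hypothetical comparator: byte order for C/POSIX, otherwise defer to
/// `collate`, a stand-in for a real locale-aware collator.
fn compare_names(
    a: &str,
    b: &str,
    locale: &str,
    collate: &dyn Fn(&str, &str) -> Ordering,
) -> Ordering {
    if is_byte_order_locale(locale) {
        a.as_bytes().cmp(b.as_bytes())
    } else {
        collate(a, b)
    }
}

fn main() {
    // LC_ALL is empty, so LC_COLLATE wins over LANG.
    let env: HashMap<&str, &str> =
        HashMap::from([("LC_ALL", ""), ("LC_COLLATE", "de_DE.UTF-8"), ("LANG", "C")]);
    let locale = resolve_collate_locale(|k| env.get(k).map(|v| v.to_string()));
    println!("effective collation locale: {locale}");

    // Placeholder only: Unicode-lowercased comparison, NOT real collation.
    let placeholder = |a: &str, b: &str| a.to_lowercase().cmp(&b.to_lowercase());

    let mut c_sorted = vec!["apple", "Zebra"];
    c_sorted.sort_by(|a, b| compare_names(a, b, "C", &placeholder));
    let mut de_sorted = vec!["apple", "Zebra"];
    de_sorted.sort_by(|a, b| compare_names(a, b, &locale, &placeholder));
    println!("C: {:?}, {locale}: {:?}", c_sorted, de_sorted);
}
```

Keeping the byte-order path separate from the collator call is what lets the C/POSIX case skip collation entirely, which matches the performance rationale given above.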