Contributor

@naoNao89 naoNao89 commented Oct 6, 2025

Implements locale-aware sorting for ls to match GNU ls behavior and fix incorrect sorting of non-ASCII filenames.

ls was using byte-order comparison, causing non-ASCII characters to sort incorrectly. For example, German umlauts (ä, ö, ü) appeared after 'z' instead of near their base letters.

Solution

Integrates uucore::i18n::collator to respect locale environment variables (LC_ALL > LC_COLLATE > LANG). Falls back to byte comparison for C/POSIX locale, preserving performance in ASCII-only environments.
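
The precedence described above can be sketched with the standard library alone. This is an illustrative sketch, not the code in this PR: `collation_locale_from` and `is_c_locale` are hypothetical names, no ICU collator is involved, and treating an empty variable as unset is an assumption based on the usual POSIX convention.

```rust
use std::env;

/// Hypothetical helper (not part of uucore): returns the value of the first
/// non-empty variable in the order LC_ALL, LC_COLLATE, LANG.
fn collation_locale_from<F>(lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    for name in ["LC_ALL", "LC_COLLATE", "LANG"] {
        if let Some(value) = lookup(name) {
            // Assumption: an empty value counts as unset.
            if !value.is_empty() {
                return Some(value);
            }
        }
    }
    None
}

/// Hypothetical helper: true when plain byte comparison is enough.
/// Assumption: only an unset locale, "C", and "POSIX" take this path.
fn is_c_locale(locale: Option<&str>) -> bool {
    matches!(locale, None | Some("C") | Some("POSIX"))
}

fn main() {
    // Read the real environment; tests can pass a fake lookup instead.
    let locale = collation_locale_from(|name| env::var(name).ok());
    if is_c_locale(locale.as_deref()) {
        println!("byte-order comparison");
    } else {
        println!("locale-aware comparison for {}", locale.unwrap_or_default());
    }
}
```

Passing the lookup as a closure keeps the precedence logic testable without mutating the process environment.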

Implements locale-aware collation for file name sorting, respecting the
LC_COLLATE environment variable. This matches GNU ls behavior.

Changes:
- Added i18n-collator feature dependency to ls
- Initialize ICU collator with default options in uumain()
- Replace simple string comparison with locale_cmp() for Sort::Name
- Respects LC_ALL > LC_COLLATE > LANG priority order
- Falls back to byte comparison for C locale

Behavior:
- With LC_ALL=C: byte-order sorting (e.g., Zebra before apple)
- With LC_ALL=de_DE.UTF-8: locale-aware sorting (e.g., Äpfel groups with apple)
- With LC_ALL=vi_VN.UTF-8: Vietnamese collation order
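
To make the difference concrete, here is a small standard-library sketch of the two orderings. The byte-order half is exactly what Rust's default `str` comparison does. The "locale-like" half is only a toy: `toy_key` and `toy_locale_cmp` are hypothetical helpers that fold case and map German umlauts to their base letters, standing in for the real ICU collator that this PR uses through uucore. It does not implement German or Vietnamese collation rules.

```rust
use std::cmp::Ordering;

/// Toy stand-in for a collation key (NOT real ICU collation):
/// lowercases ASCII letters and maps German umlauts to their base letter.
fn toy_key(name: &str) -> String {
    name.chars()
        .map(|c| match c {
            'ä' | 'Ä' => 'a',
            'ö' | 'Ö' => 'o',
            'ü' | 'Ü' => 'u',
            other => other.to_ascii_lowercase(),
        })
        .collect()
}

/// Compare by the toy key first, then by bytes so the order stays total.
fn toy_locale_cmp(a: &str, b: &str) -> Ordering {
    toy_key(a).cmp(&toy_key(b)).then_with(|| a.cmp(b))
}

fn main() {
    let mut names = vec!["apple", "Zebra", "Äpfel", "zoo"];

    // Byte order: uppercase ASCII sorts before lowercase, and the
    // multi-byte UTF-8 encoding of 'Ä' sorts after every ASCII letter.
    names.sort();
    println!("{:?}", names); // ["Zebra", "apple", "zoo", "Äpfel"]

    // Toy locale-like order: 'Ä' is grouped with 'a', case is ignored.
    names.sort_by(|a, b| toy_locale_cmp(a, b));
    println!("{:?}", names); // ["Äpfel", "apple", "Zebra", "zoo"]
}
```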

Testing:
- All existing unit tests pass
- Manually verified against GNU ls with German and Vietnamese locales
- Output matches GNU ls behavior exactly

This is a critical feature for international users who need proper
alphabetical sorting according to their locale's collation rules.

github-actions bot commented Oct 6, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@sylvestre
Contributor

@naoNao89 please write the PR comment yourself. An AI-generated comment isn't useful (too much information that is not relevant). Thanks

@sylvestre
Contributor

Please write a new benchmark for this.
See how it is done here:
src/uu//benches/

It also lacks tests.

@naoNao89 naoNao89 marked this pull request as draft October 6, 2025 18:58
naoNao89

This comment was marked as off-topic.

- Add locale sorting benchmarks with ASCII and Unicode datasets
- Add tests for German, French, Spanish locale collation
- Add test for environment variable precedence (LC_ALL > LC_COLLATE > LANG)
- Add test for C/POSIX locale fallback to byte comparison
- Benchmark performance impact across different locales and file counts
@naoNao89
Contributor Author

naoNao89 commented Oct 6, 2025

done


github-actions bot commented Oct 6, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/overlay-headers (passes in this run but fails in the 'main' branch)


github-actions bot commented Oct 6, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

- Add locale test words (German, French, Spanish) to cspell dictionary
- Fix formatting in locale test assertions
@naoNao89 naoNao89 force-pushed the feat/ls-locale-sorting-only branch from 44f5eca to d931a9f Compare October 7, 2025 11:40

github-actions bot commented Oct 7, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/overlay-headers (passes in this run but fails in the 'main' branch)

@naoNao89 naoNao89 marked this pull request as ready for review October 7, 2025 18:53

2 participants