+
Skip to content

Update l10n helper script #2699

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 7, 2025
Merged

Conversation

KJhellico
Copy link
Collaborator

Proposed change

Update l10n helper script:

  • use msgid for default languages .po files
  • add option to search for suitable translations in all locales of language
  • add fuzzy flag when several translation alternatives remaining in .po file

Some explanation about new option (--strict-locale/--no-strict-locale): with --no-strict-locale parameter, the search for translations for locale like fr_NE will be performed on all available fr_*.

Type of change

  • New country/market holidays support (thank you!)
  • Supported country/market holidays update (calendar discrepancy fix, localization)
  • Existing code/documentation/test/process quality improvement (best practice, cleanup, refactoring, optimization)
  • Dependency update (version deprecation/pin/upgrade)
  • Bugfix (non-breaking change which fixes an issue)
  • Breaking change (a code change causing existing functionality to break)
  • New feature (new holidays functionality in general)

Checklist

- use `msgid` for default languages .po files
- add option to search for suitable translations in all locales of language
- add fuzzy flag when several translation alternatives remaining in .po file
Copy link
Contributor

coderabbitai bot commented Jul 4, 2025

Summary by CodeRabbit

  • New Features

    • Added a command-line option to control strict or relaxed locale matching when searching for translations.
  • Improvements

    • Enhanced handling of missing translations by providing better fallbacks and improved joining of multiple candidate translations.
    • Improved marking of translation entries as "fuzzy" when appropriate.
    • Optimized processing for the default language and translation file searches.

Walkthrough

A new --strict-locale command-line argument was added to control locale matching during translation file searches. Several methods in the localization helper script were updated to adjust how translation entries are processed, flagged, and merged, including changes to fallback logic and formatting of combined translations.

Changes

File(s) Change Summary
scripts/l10n/l10n_helper.py Added --strict-locale argument; updated locale matching logic, entry processing, fuzzy flag handling, fallback mechanisms, and translation joining character.

Suggested labels

script

Suggested reviewers

  • PPsyrius

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 328c836 and a6e57d3.

📒 Files selected for processing (1)
  • scripts/l10n/l10n_helper.py (6 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: PPsyrius
PR: vacanza/holidays#2608
File: holidays/locale/en_US/LC_MESSAGES/VC.po:76-76
Timestamp: 2025-06-10T05:08:07.939Z
Learning: For the holidays project, localization file formatting issues (like missing terminal periods in .po files) should be automatically fixed by running `make l10n` command (which is included in `make check`). Authors should be directed to use this automated tooling instead of manual formatting fixes.
Learnt from: KJhellico
PR: vacanza/holidays#2676
File: holidays/locale/en_US/LC_MESSAGES/TN.po:17-28
Timestamp: 2025-06-26T15:34:35.476Z
Learning: In the holidays project, .po file header metadata updates (version numbers, revision dates, translator information) are legitimate changes when part of localization work and don't require `make l10n` regeneration. The `make l10n` command is primarily for formatting fixes and missing translator comments, not for intentional metadata updates.
Learnt from: PPsyrius
PR: vacanza/holidays#2608
File: holidays/locale/en_VC/LC_MESSAGES/VC.po:61-66
Timestamp: 2025-06-10T05:07:29.372Z
Learning: For missing translator comments in .po localization files in the holidays repository, direct authors to run `make l10n` or `make check` commands instead of suggesting manual fixes, as these commands automatically handle translator comment generation.
Learnt from: PPsyrius
PR: vacanza/holidays#2537
File: holidays/countries/finland.py:195-199
Timestamp: 2025-05-09T18:34:33.990Z
Learning: In the holidays library, localization (l10n) is handled through separate .po files for different languages rather than combining multiple translations in a single string. The code should use the default language with tr() function, and translations are provided in language-specific .po files.
Learnt from: KJhellico
PR: vacanza/holidays#2607
File: scripts/l10n/l10n_helper.py:86-88
Timestamp: 2025-06-06T15:23:42.337Z
Learning: The scripts/l10n/l10n_helper.py script is an internal localization helper tool intended for use by developers who understand the codebase, so additional safety checks may not be necessary.
Learnt from: KJhellico
PR: vacanza/holidays#2651
File: holidays/locale/nl/LC_MESSAGES/BQ.po:29-95
Timestamp: 2025-06-25T20:55:00.642Z
Learning: In the holidays library, Dutch locale files (.po) with `X-Source-Language: nl` should have empty msgstr entries when the target language is also Dutch. The library uses fallback=True with gettext, which returns the original msgid when msgstr is empty. This is the correct pattern for native language files and does not cause blank holiday names.
Learnt from: KJhellico
PR: vacanza/holidays#2530
File: holidays/locale/ca/LC_MESSAGES/AD.po:31-40
Timestamp: 2025-05-06T15:25:44.333Z
Learning: In the Holidays project, msgid fields in localization files contain strings in the entity's default language (as defined by default_language attribute), not English source strings as in standard gettext implementations.
Learnt from: KJhellico
PR: vacanza/holidays#2388
File: holidays/locale/en_CI/LC_MESSAGES/CI.po:88-88
Timestamp: 2025-03-30T18:25:07.087Z
Learning: In the holidays library, localization files have a specific structure: message comments are in standard English (en_US) describing the holiday, while actual translations (msgstr) should use the locale-specific terminology (e.g., en_CI for Ivory Coast English). For example, "Night of Power" in standard English is translated as "Lailatou-Kadr" in Ivory Coast English.
Learnt from: PPsyrius
PR: vacanza/holidays#2402
File: holidays/locale/en_TT/LC_MESSAGES/TT.po:46-48
Timestamp: 2025-04-05T08:12:19.986Z
Learning: In the holidays library, localization (PO) files follow this convention: comments (#.) always use American English (en_US) spelling, while the msgid content follows the locale-specific spelling standards of the target language.
Learnt from: KJhellico
PR: vacanza/holidays#2631
File: tests/countries/test_sint_maarten.py:62-0
Timestamp: 2025-06-14T21:12:07.224Z
Learning: KJhellico prefers to focus on completing and reviewing the main holiday implementation code before doing detailed reviews of the corresponding test files.
Learnt from: KJhellico
PR: vacanza/holidays#2259
File: holidays/locale/en_IN/LC_MESSAGES/IN.po:285-299
Timestamp: 2025-03-08T11:28:48.652Z
Learning: In the holidays project, message IDs (msgids) in locale files use region-specific naming conventions (e.g., "Muharram", "Id-ul-Fitr" in en_IN locale for India), while translator comments use internationally recognized names from the project's default locale (en_US) such as "Ashura", "Eid al-Fitr". This difference is intentional for proper localization.
Learnt from: KJhellico
PR: vacanza/holidays#2394
File: holidays/locale/pt_PT/LC_MESSAGES/CV.po:31-88
Timestamp: 2025-03-31T19:37:57.691Z
Learning: In the holidays library localization pattern, when a locale file matches a country's default language (e.g., pt_PT for Cape Verde), the msgstr fields should be left empty. Only non-default language locale files should have translations in the msgstr fields.
Learnt from: KJhellico
PR: vacanza/holidays#2653
File: holidays/locale/th/LC_MESSAGES/TW.po:17-21
Timestamp: 2025-06-21T18:06:50.027Z
Learning: KJhellico's username includes a tilde character (~) as part of their nickname (appears as "~Jhellico" in Last-Translator headers), which is intentional formatting and not an error.
scripts/l10n/l10n_helper.py (22)
Learnt from: KJhellico
PR: vacanza/holidays#2607
File: scripts/l10n/l10n_helper.py:86-88
Timestamp: 2025-06-06T15:23:42.337Z
Learning: The scripts/l10n/l10n_helper.py script is an internal localization helper tool intended for use by developers who understand the codebase, so additional safety checks may not be necessary.
Learnt from: KJhellico
PR: vacanza/holidays#2676
File: holidays/locale/en_US/LC_MESSAGES/TN.po:17-28
Timestamp: 2025-06-26T15:34:35.476Z
Learning: In the holidays project, .po file header metadata updates (version numbers, revision dates, translator information) are legitimate changes when part of localization work and don't require `make l10n` regeneration. The `make l10n` command is primarily for formatting fixes and missing translator comments, not for intentional metadata updates.
Learnt from: PPsyrius
PR: vacanza/holidays#2608
File: holidays/locale/en_US/LC_MESSAGES/VC.po:76-76
Timestamp: 2025-06-10T05:08:07.939Z
Learning: For the holidays project, localization file formatting issues (like missing terminal periods in .po files) should be automatically fixed by running `make l10n` command (which is included in `make check`). Authors should be directed to use this automated tooling instead of manual formatting fixes.
Learnt from: PPsyrius
PR: vacanza/holidays#2608
File: holidays/locale/en_VC/LC_MESSAGES/VC.po:61-66
Timestamp: 2025-06-10T05:07:29.372Z
Learning: For missing translator comments in .po localization files in the holidays repository, direct authors to run `make l10n` or `make check` commands instead of suggesting manual fixes, as these commands automatically handle translator comment generation.
Learnt from: PPsyrius
PR: vacanza/holidays#2537
File: holidays/countries/finland.py:195-199
Timestamp: 2025-05-09T18:34:33.990Z
Learning: In the holidays library, localization (l10n) is handled through separate .po files for different languages rather than combining multiple translations in a single string. The code should use the default language with tr() function, and translations are provided in language-specific .po files.
Learnt from: KJhellico
PR: vacanza/holidays#2607
File: scripts/l10n/l10n_helper.py:167-167
Timestamp: 2025-06-07T11:09:09.087Z
Learning: In scripts/l10n/l10n_helper.py, CSV files are structured with exactly 3+ columns: msgid, msgcomment, and at least one language column. The structure is controlled by the export process and guarantees minimum column count.
Learnt from: KJhellico
PR: vacanza/holidays#2651
File: holidays/locale/nl/LC_MESSAGES/BQ.po:29-95
Timestamp: 2025-06-25T20:55:00.642Z
Learning: In the holidays library, Dutch locale files (.po) with `X-Source-Language: nl` should have empty msgstr entries when the target language is also Dutch. The library uses fallback=True with gettext, which returns the original msgid when msgstr is empty. This is the correct pattern for native language files and does not cause blank holiday names.
Learnt from: KJhellico
PR: vacanza/holidays#2530
File: holidays/countries/andorra.py:13-14
Timestamp: 2025-05-06T15:29:31.893Z
Learning: In this codebase, `tr` imported as `from gettext import gettext as tr` is used as a message marker for extracting translatable strings when generating .po files, not just for runtime translation.
Learnt from: KJhellico
PR: vacanza/holidays#2607
File: scripts/l10n/l10n_helper.py:147-150
Timestamp: 2025-06-06T15:22:39.950Z
Learning: In the holidays project localization system, all .po files are generated from the same .pot template for all supported languages, ensuring that all msgids exist for all languages. This makes defensive programming for missing language keys in po_data[msgid] unnecessary in the normal workflow.
Learnt from: KJhellico
PR: vacanza/holidays#2388
File: holidays/locale/en_CI/LC_MESSAGES/CI.po:88-88
Timestamp: 2025-03-30T18:25:07.087Z
Learning: In the holidays library, localization files have a specific structure: message comments are in standard English (en_US) describing the holiday, while actual translations (msgstr) should use the locale-specific terminology (e.g., en_CI for Ivory Coast English). For example, "Night of Power" in standard English is translated as "Lailatou-Kadr" in Ivory Coast English.
Learnt from: KJhellico
PR: vacanza/holidays#2259
File: holidays/locale/en_IN/LC_MESSAGES/IN.po:30-299
Timestamp: 2025-03-05T17:51:00.633Z
Learning: In the Holidays project, .po files for a country's default locale use empty msgstr fields as a standard convention.
Learnt from: KJhellico
PR: vacanza/holidays#2687
File: holidays/locale/en_US/LC_MESSAGES/CF.po:13-28
Timestamp: 2025-06-29T09:37:35.283Z
Learning: In the holidays project, .po files follow a standard formatting convention where there is always a blank line after the metadata header section (after the "X-Source-Language" line). This blank line separates the header from the actual translation content and should not be removed.
Learnt from: KJhellico
PR: vacanza/holidays#2684
File: holidays/locale/it/LC_MESSAGES/SM.po:13-13
Timestamp: 2025-06-28T10:39:19.185Z
Learning: In the holidays project, .po file header comments use the format "# [Country] holidays." for default language files (without trailing hash) and "# [Country] holidays [locale] localization." for non-default language files (also without trailing hash).
Learnt from: PPsyrius
PR: vacanza/holidays#2388
File: holidays/locale/fr/LC_MESSAGES/CI.po:28-101
Timestamp: 2025-03-30T13:33:31.598Z
Learning: In the holidays library, for localization files of the default language (like French for Ivory Coast in fr/LC_MESSAGES/CI.po), the best practice is to leave the message strings (msgstr) empty to avoid possible typos, since the message IDs (msgid) are already in the target language.
Learnt from: KJhellico
PR: vacanza/holidays#2601
File: README.md:108-108
Timestamp: 2025-06-09T19:39:38.818Z
Learning: The correct way to count supported countries in the holidays library is by counting unique country module files in holidays/countries/ (excluding __init__.py) or by counting entries in the COUNTRIES registry, not by counting individual class definitions which include alias classes.
Learnt from: KJhellico
PR: vacanza/holidays#2394
File: holidays/locale/pt_PT/LC_MESSAGES/CV.po:31-88
Timestamp: 2025-03-31T19:37:57.691Z
Learning: In the holidays library, when a locale file matches the country's default language (e.g., pt_PT for Cape Verde), the msgstr fields should be left empty. Only non-default language files should have filled msgstr fields with translations.
Learnt from: PPsyrius
PR: vacanza/holidays#2402
File: holidays/locale/en_TT/LC_MESSAGES/TT.po:46-48
Timestamp: 2025-04-05T08:12:19.986Z
Learning: In the holidays library, localization (PO) files follow this convention: comments (#.) always use American English (en_US) spelling, while the msgid content follows the locale-specific spelling standards of the target language.
Learnt from: ankushhKapoor
PR: vacanza/holidays#2601
File: holidays/locale/en_MN/LC_MESSAGES/MN.po:13-14
Timestamp: 2025-06-11T18:32:25.595Z
Learning: For non-default locale `.po` files, the header comment format is:
`# <Country> holidays <locale> localization.` (no trailing hash).
Learnt from: PPsyrius
PR: vacanza/holidays#2676
File: holidays/locale/ar/LC_MESSAGES/EG.po:0-0
Timestamp: 2025-06-25T10:09:29.029Z
Learning: In the holidays library, msgstr fields can be left empty for source/default_language files when using Lingva, the localization tool used by the project.
Learnt from: KJhellico
PR: vacanza/holidays#2530
File: holidays/locale/ca/LC_MESSAGES/AD.po:31-40
Timestamp: 2025-05-06T15:25:44.333Z
Learning: In the Holidays project, msgid fields in localization files contain strings in the entity's default language (as defined by default_language attribute), not English source strings as in standard gettext implementations.
Learnt from: KJhellico
PR: vacanza/holidays#2465
File: holidays/locale/nl/LC_MESSAGES/SR.po:31-32
Timestamp: 2025-04-13T19:11:32.337Z
Learning: In the holidays library, when a language is set as the default language for a country (like Dutch/nl for Suriname), the `msgid` strings in the PO file are already in that language, and the corresponding `msgstr` fields are intentionally left empty. This is by design, as the system will fall back to using the `msgid` value when no translation is provided.
Learnt from: PPsyrius
PR: vacanza/holidays#2438
File: holidays/locale/ar_IQ/LC_MESSAGES/IQ.po:35-81
Timestamp: 2025-04-17T17:08:48.082Z
Learning: In holiday PO files, when the file represents the default language of an entity (e.g., ar_IQ for Iraq), no translations in `msgstr` are required as the `msgid` values are already in the target language.
🔇 Additional comments (6)
scripts/l10n/l10n_helper.py (6)

54-59: Good addition of flexible locale matching control.

The new --strict-locale argument provides useful control over translation search scope. The default of True maintains backward compatibility while allowing users to opt into broader searches when needed.


101-102: Clearing occurrences is a good cleanup step.

Clearing the occurrences from POT entries makes sense for generated .po files, as the line number references from the original extraction aren't relevant for the final translation files.


121-122: Smart optimization for en_US default language.

Early return when the default language is "en_US" avoids unnecessary processing since the msgstr would just duplicate the msgid. This aligns with the project's localization conventions.


193-194: Proper fuzzy flag implementation for uncertain translations.

Adding the fuzzy flag when translations contain "|" is excellent practice. This correctly marks entries that have multiple translation candidates, indicating they need human review.


211-228: Well-implemented locale filtering logic.

The refactored translation search logic properly handles both strict and loose locale matching. The fallback to msgid when msgstr is empty is correct for the project's gettext implementation.


237-237: Better separator alignment with gettext conventions.

Changing from "/" to " | " as the separator for multiple translations is a good improvement that aligns with standard gettext practices and improves readability.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate Unit Tests
  • Create PR with Unit Tests
  • Post Copyable Unit Tests in a Comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai auto-generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions github-actions bot added the script label Jul 4, 2025
Copy link

sonarqubecloud bot commented Jul 4, 2025

Copy link

codecov bot commented Jul 4, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (328c836) to head (a6e57d3).
Report is 3 commits behind head on dev.

Additional details and impacted files
@@            Coverage Diff            @@
##               dev     #2699   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          254       254           
  Lines        15266     15266           
  Branches      2098      2098           
=========================================
  Hits         15266     15266           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Collaborator

@PPsyrius PPsyrius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🌐

@arkid15r arkid15r added this pull request to the merge queue Jul 7, 2025
Merged via the queue into vacanza:dev with commit 0a4d0d7 Jul 7, 2025
33 checks passed
@KJhellico KJhellico deleted the upd-l10n-helper branch July 7, 2025 12:36
@KJhellico KJhellico mentioned this pull request Jul 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载