Update __init__.py #15396

Rich-AM · 2025-11-09T03:26:29Z

Added code to detect each file's character encoding (via the chardet module) and pass the detected encoding to the appropriate '.read_text' function. Also added 'log.info' statements that report which files are being read, along with their detected character encoding.

Description

Reason for the change.

If a conda channel's repodata.json file contains any non-ASCII characters (for example, UTF-8-encoded punctuation), conda reads the file using the local workstation's regional default encoding. On Linux and macOS this is typically UTF-8; on Windows it is the system regional default (cp1252, Western Latin, on most US-based Windows systems). As a result, a single conda channel with a UTF-8 encoded repodata.json file can cause a Windows installation of conda to fail with an 'invalid character' error. Example: if the Unicode character U+201D (the right double quotation mark) appears in the repodata.json file of a channel added to the conda configuration on a Windows machine, conda will fail even on a simple command such as conda search pandas.
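A minimal sketch of the failure described above, using only the standard library. It does not call conda; it just shows why the UTF-8 bytes of U+201D cannot be decoded as cp1252. The variable names are illustrative.

```python
# U+201D (right double quotation mark) encoded as UTF-8 is the bytes E2 80 9D.
raw = "\u201d".encode("utf-8")

# Decoding with the encoding the file was written in round-trips cleanly.
decoded_utf8 = raw.decode("utf-8")

# cp1252 assigns no character to byte 0x9D, so strict decoding raises.
try:
    raw.decode("cp1252")
    cp1252_error = None
except UnicodeDecodeError as exc:
    cp1252_error = exc

print(repr(decoded_utf8))
print(type(cp1252_error).__name__)
```

This is the same error class a Windows user would see when a text file is opened without an explicit encoding and the locale default is cp1252.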

The changes proposed in this __init__.py file merely use the chardet module to detect the file's character encoding, then pass the detected encoding to the appropriate .read_text function. Additionally, log.info statements have been added to give users more insight into which files are being read (cached or downloaded), as well as their detected character encoding.
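The approach described above might look roughly like the following sketch. This is not the code from the pull request: the helper name `read_text_with_detected_encoding` and the UTF-8 fallback are assumptions made for illustration. It relies on the third-party chardet package, whose `chardet.detect()` function returns a dictionary that includes an "encoding" key.

```python
import logging
from pathlib import Path

try:
    import chardet  # third-party package named in the PR description
except ImportError:  # assumption: fall back to UTF-8 when chardet is missing
    chardet = None

log = logging.getLogger(__name__)


def read_text_with_detected_encoding(path, default="utf-8"):
    """Hypothetical helper: guess a file's encoding, then read it as text."""
    path = Path(path)
    encoding = default
    if chardet is not None:
        # Note: this reads the whole file into memory just to guess the encoding.
        guess = chardet.detect(path.read_bytes())
        encoding = guess["encoding"] or default
    log.info("Reading %s (detected encoding: %s)", path, encoding)
    return path.read_text(encoding=encoding), encoding
```

Detection is a statistical guess, so short or ambiguous inputs can be misidentified; the returned encoding is logged so that a wrong guess is at least visible.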

Lastly, all changes were bracketed by a string of 20 '#' with specific comments added.

Thank you for your time and consideration of this request.

Checklist - did you ...

I did not do any of the following checklist items, as I did not think this change request was significant enough.

~~[ ] Add a file to the news directory (using the template) for the next release's release notes?~~
~~[ ] Add / update necessary tests?~~
~~[ ] Add / update outdated documentation?~~


conda-bot · 2025-11-09T03:26:43Z

We require contributors to sign our Contributor License Agreement and we don't have one on file for @Rich-AM.

In order for us to review and merge your code, please e-sign the Contributor License Agreement PDF. We then need to manually verify your signature, merge the PR (conda/infrastructure#1238), and ping the bot to refresh the PR.

dholth · 2025-11-10T01:25:27Z

json must be utf-8 only.

It might be worth checking that conda works correctly under less-common default encodings.

https://peps.python.org/pep-0686/ makes utf-8 default in Python 3.15

travishathaway · 2025-11-11T10:15:25Z

@conda-bot check

travishathaway · 2025-11-12T08:13:45Z

Some additional context on this from ChatGPT:

On most Windows systems, the default character encoding used by Python is cp1252 (also known as Windows-1252), which is a common Western European encoding.

However, this can vary depending on the system locale settings.
Here’s how it typically works:

sys.getdefaultencoding() → Always returns 'utf-8' in modern Python (since Python 3).
locale.getpreferredencoding() → Returns the system’s default text encoding, which on Windows is often 'cp1252' (but could be 'cp932', 'cp1251', etc., depending on the locale).
When opening files without specifying encoding → Python uses the value from locale.getpreferredencoding(False) as the default encoding.

So, in short:

✅ On most English-language Windows installations, the default encoding is cp1252, though you should always explicitly specify encoding="utf-8" in file operations to ensure portability.
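The points above can be checked on any machine with a short standard-library sketch. The helper name `read_json_text` is hypothetical; the value printed for the locale encoding depends on the system it runs on.

```python
import locale
import sys
from pathlib import Path

# Encoding used for str <-> bytes conversions inside Python itself.
print(sys.getdefaultencoding())

# Encoding used by open() / Path.read_text() when none is given
# (unless Python's UTF-8 mode is enabled); varies by platform and locale.
print(locale.getpreferredencoding(False))


def read_json_text(path):
    """Hypothetical helper: always decode as UTF-8, regardless of locale."""
    return Path(path).read_text(encoding="utf-8")
```

Passing `encoding="utf-8"` explicitly makes the read behave the same on Windows, Linux, and macOS, which matches the note earlier in this thread that JSON must be UTF-8.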

@Rich-AM,

With the above in mind and what you have described, I think you bring up some valid points here, but I don't think we'll accept the solution as proposed, for two reasons. First, it's not as efficient as it could be: you load the entire file into memory, and these files are normally many megabytes, if not hundreds. Second, these errors aren't very widespread, because most repodata.json files from the most popular channels (e.g. conda-forge and Anaconda's main) don't contain non-ASCII characters.
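To illustrate the memory concern, here is one possible way to check whether a large file is valid UTF-8 without holding it all in memory at once. This is only a sketch under assumptions, not something conda does: the function name `is_valid_utf8` and the 1 MiB chunk size are made up for the example.

```python
import codecs


def is_valid_utf8(path, chunk_size=1 << 20):
    """Hypothetical helper: validate UTF-8 by streaming fixed-size chunks."""
    # The incremental decoder keeps state, so a multi-byte character that is
    # split across two chunks is still decoded correctly.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        with open(path, "rb") as fh:
            while chunk := fh.read(chunk_size):
                decoder.decode(chunk)
        # final=True reports a sequence left incomplete at end of file.
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return False
    return True
```

Only one chunk is held at a time, so peak memory stays near `chunk_size` rather than growing with the size of repodata.json.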

If you would like to see better character encoding handling in conda, I suggest opening a separate issue as a "feature" request to clearly define the problem and how we can solve it.

Also, for future pull requests to this repository, it's very important to have tests that validate the solution you have proposed. If you are ever unsure of how to write these tests or where to place them please reach out to us via these pull requests or our chat/message board: https://conda.zulipchat.com

Update __init__.py

c29dea0

Added code to detect (via the chardet module) file type encoding, then pass the detected encoding to the appropriate '.read_text' function. Also added 'log.info' statements to state which files are being read along with their detected file character encoding

Rich-AM requested a review from a team as a code owner November 9, 2025 03:26

conda-bot added this to 🔎 Review Nov 9, 2025

github-project-automation bot moved this to 🆕 New in 🔎 Review Nov 9, 2025

conda-bot mentioned this pull request Nov 9, 2025

Adding CLA signee Rich-AM conda/infrastructure#1238

Merged

conda-bot added the cla-signed [bot] added once the contributor has signed the CLA label Nov 11, 2025
