@Rich-AM Rich-AM commented Nov 9, 2025

Added code to detect (via the chardet module) each file's character encoding, then pass the detected encoding to the appropriate '.read_text' function. Also added 'log.info' statements to state which files are being read, along with their detected character encoding.

Description

Reason for the change.

If a conda channel's repodata.json file contains any non-ASCII characters (e.g. characters encoded as UTF-8), conda decodes it using the local workstation's regional default encoding. On Linux and Mac this is typically UTF-8; on Windows it is the system regional default (cp1252, Western Latin, on most US-based Windows systems). If a single conda channel contains a UTF-8 encoded repodata.json file with such characters, it can cause a Windows installation of conda to fail with an 'invalid character' error. Example: if the Unicode character U+201D (the right double quotation mark) is in the repodata.json file for a channel added to the conda configuration on a Windows machine, conda will fail even on a simple command such as conda search pandas.
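The failure mode described above can be reproduced without conda. This is a minimal sketch, assuming the file is UTF-8 on disk and the reader decodes it as cp1252; the payload and the `try_decode` helper are made up for illustration and are not part of conda.

```python
import json

# Hypothetical repodata-like payload containing U+201D (right double quotation mark).
payload = json.dumps({"summary": "17\u201d display driver"}, ensure_ascii=False)
raw = payload.encode("utf-8")  # the bytes a UTF-8 encoded repodata.json would hold


def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or a short error description if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"failed: {exc.reason} (byte 0x{data[exc.start]:02x})"


print(try_decode(raw, "utf-8"))   # decodes cleanly
print(try_decode(raw, "cp1252"))  # fails on the last byte of U+201D
```

U+201D is encoded in UTF-8 as the bytes E2 80 9D, and 0x9D is one of the few byte values that Python's cp1252 codec leaves unassigned, which is why this particular character raises an error instead of silently producing garbled text.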

The changes proposed in this __init__.py file merely use the chardet module to detect the file's character encoding, then pass the detected encoding to the appropriate .read_text function. Additionally, log.info statements have been added to give users more insight into which files are being read (cached or downloaded), as well as their detected character encoding.
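For readers who have not opened the diff, the approach described above might look roughly like the sketch below. This is not the code from the PR: `read_text_detected` is a hypothetical helper, the UTF-8 fallback is an assumption, and chardet is treated as optional only so the sketch runs where it is not installed.

```python
import logging
from pathlib import Path

try:
    import chardet  # third-party package named in the PR description
except ImportError:  # assumption: fall back to UTF-8 when chardet is unavailable
    chardet = None

log = logging.getLogger(__name__)


def read_text_detected(path: Path, fallback: str = "utf-8") -> str:
    """Hypothetical helper: guess the encoding from the raw bytes, then read the text."""
    raw = path.read_bytes()  # note: this loads the whole file into memory
    guess = chardet.detect(raw)["encoding"] if chardet else None
    encoding = guess or fallback  # chardet may return None for the encoding
    log.info("Reading %s with detected encoding %s", path, encoding)
    return path.read_text(encoding=encoding)
```

Detection of this kind is a statistical guess over the bytes, so the result depends on the content of the file and on the chardet version in use.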

Lastly, all changes were bracketed by a string of 20 '#' with specific comments added.

Thank you for your time and consideration of this request.

Checklist - did you ...

I did not do any of the following checklist items as I did not think this change request was significant enough

  • [ ] Add a file to the news directory (using the template) for the next release's release notes?
  • [ ] Add / update necessary tests?
  • [ ] Add / update outdated documentation?

@Rich-AM Rich-AM requested a review from a team as a code owner November 9, 2025 03:26
@github-project-automation github-project-automation bot moved this to 🆕 New in 🔎 Review Nov 9, 2025
@conda-bot
Contributor

We require contributors to sign our Contributor License Agreement and we don't have one on file for @Rich-AM.

In order for us to review and merge your code, please e-sign the Contributor License Agreement PDF. We then need to manually verify your signature, merge the PR (conda/infrastructure#1238), and ping the bot to refresh the PR.

@dholth
Contributor

dholth commented Nov 10, 2025

JSON must be UTF-8 only.

It might be worth checking that conda works correctly under less-common default encodings.

https://peps.python.org/pep-0686/ makes UTF-8 the default in Python 3.15
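If the file is assumed to be UTF-8, as stated above, the reader can simply name the encoding instead of detecting it. A minimal sketch follows; `load_json_utf8` is a hypothetical helper for illustration, not a function in conda.

```python
import json
from pathlib import Path


def load_json_utf8(path: Path) -> dict:
    """Hypothetical helper: decode as UTF-8 regardless of the platform's locale default."""
    return json.loads(path.read_text(encoding="utf-8"))
```

With the encoding given explicitly, the result no longer depends on the Windows regional settings, and a file that is not valid UTF-8 fails with a `UnicodeDecodeError` rather than being decoded under a guessed encoding.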

@travishathaway
Contributor

@conda-bot check

@conda-bot conda-bot added the cla-signed [bot] added once the contributor has signed the CLA label Nov 11, 2025
@travishathaway
Contributor

Some additional context on this from ChatGPT:

On most Windows systems, the default character encoding used by Python is cp1252 (also known as Windows-1252), which is a common Western European encoding.

However, this can vary depending on the system locale settings.
Here’s how it typically works:

  • sys.getdefaultencoding() → Always returns 'utf-8' in modern Python (since Python 3).
  • locale.getpreferredencoding() → Returns the system’s default text encoding, which on Windows is often 'cp1252' (but could be 'cp932', 'cp1251', etc., depending on the locale).
  • When opening files without specifying encoding → Python uses the value from locale.getpreferredencoding(False) as the default encoding.

So, in short:

✅ On most English-language Windows installations, the default encoding is cp1252, though you should always explicitly specify encoding="utf-8" in file operations to ensure portability.
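The values mentioned in the list above can be checked on any machine with a few standard-library calls. This is a small sketch; `describe_encodings` is a name made up for illustration.

```python
import locale
import sys


def describe_encodings() -> dict:
    """Collect the encoding-related settings of the running interpreter."""
    return {
        # Encoding used for str <-> bytes conversions; 'utf-8' on Python 3.
        "default": sys.getdefaultencoding(),
        # Locale-dependent encoding used by open() when none is given.
        "preferred": locale.getpreferredencoding(False),
        # 1 when Python's UTF-8 mode is enabled, otherwise 0.
        "utf8_mode": sys.flags.utf8_mode,
    }


print(describe_encodings())
```

The "preferred" value is the one that differs between machines, so it is the one worth checking when a file reads fine on Linux but fails on Windows.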

@Rich-AM,

With the above in mind and what you have described, I think you bring up some valid points here, but I don't think we'll accept the solution as proposed. It is not as efficient as it could be (you load the entire file into memory, and these files are normally many megabytes, if not hundreds), and these errors aren't very widespread, because most repodata.json files from the most popular channels (e.g. conda-forge and Anaconda's main) don't contain non-ASCII characters.

If you would like to see better character encoding handling in conda, I suggest opening a separate issue as a "feature" request to clearly define the problem and how we can solve it.

Also, for future pull requests to this repository, it's very important to have tests that validate the solution you have proposed. If you are ever unsure of how to write these tests or where to place them, please reach out to us via these pull requests or our chat/message board: https://conda.zulipchat.com
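On the memory concern raised above, one way to check a large file without holding it all in memory is to decode it in fixed-size chunks. The sketch below is only an illustration of that idea, not something proposed in this thread; `is_valid_utf8` and the 1 MiB chunk size are assumptions.

```python
import codecs
from pathlib import Path


def is_valid_utf8(path: Path, chunk_size: int = 1 << 20) -> bool:
    """Hypothetical helper: report whether a file decodes as UTF-8, one chunk at a time."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        with path.open("rb") as fh:
            while chunk := fh.read(chunk_size):
                decoder.decode(chunk)  # keeps partial multi-byte sequences between chunks
            decoder.decode(b"", final=True)  # raises if the file ends mid-character
    except UnicodeDecodeError:
        return False
    return True
```

The incremental decoder carries incomplete multi-byte sequences over from one chunk to the next, so a character split across a chunk boundary is not reported as an error.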
