fix: add explicit encoding="utf-8" to text-mode open() calls #7095

Open
Ghraven wants to merge 1 commit into chroma-core:main from Ghraven:fix/add-encoding-utf8-to-text-mode-opens

Ghraven commented May 20, 2026

What

Several text-mode open() calls in chromadb load or persist user-facing configuration and credential files without an explicit encoding= argument. In that case Python falls back to the locale's preferred encoding (locale.getpreferredencoding(False)), which is typically a legacy ANSI code page such as cp1252 on Windows and usually utf-8 on Linux/macOS. A file written on one platform can therefore be misdecoded, or fail to decode at all, on another, and any non-ASCII content in user-supplied paths or values can fail to round-trip.
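To illustrate the failure mode, here is a minimal sketch (not code from this PR) that simulates the cross-platform mismatch by passing the two encodings explicitly instead of relying on the locale. The helper name `roundtrip` and the file name are hypothetical.

```python
import tempfile
from pathlib import Path


def roundtrip(text: str, write_enc: str, read_enc: str) -> str:
    """Write text with one encoding, then read it back with another."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "config.txt"  # hypothetical file name
        with open(path, "w", encoding=write_enc) as f:
            f.write(text)
        with open(path, "r", encoding=read_enc) as f:
            return f.read()


value = "café"

# Same encoding on both sides: the value survives.
same = roundtrip(value, "utf-8", "utf-8")

# Written as utf-8 (typical Linux/macOS default), read as cp1252
# (a common Windows default): the two bytes of "é" decode as two characters.
mixed = roundtrip(value, "utf-8", "cp1252")

print(repr(same))   # 'café'
print(repr(mixed))  # 'cafÃ©'
```

The mismatch here is silent mojibake rather than an exception; other byte sequences can instead raise UnicodeDecodeError, depending on the encodings involved.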

Why

  • Reliability across Windows / Linux / macOS / Docker — credential and config files are frequently round-tripped between developer machines.
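  • PEP 597 introduced an opt-in EncodingWarning to flag text-mode opens that omit encoding=, and the Python documentation recommends passing it explicitly.
  • Eliminates EncodingWarning under PYTHONWARNDEFAULTENCODING=1 or -X warn_default_encoding (Python 3.10+).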
  • No behavior change on systems where the default is already utf-8.
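As a rough way to see the EncodingWarning point in practice, the sketch below runs two one-line snippets in a child interpreter with `-X warn_default_encoding` enabled and checks stderr for the warning. It is an illustration only, not part of this PR; the helper name `emits_encoding_warning` is hypothetical, and it assumes Python 3.10+ (older interpreters ignore the option and never emit the warning).

```python
import subprocess
import sys


def emits_encoding_warning(snippet: str) -> bool:
    """Run snippet in a child interpreter with the PEP 597 warning enabled."""
    result = subprocess.run(
        [sys.executable, "-X", "warn_default_encoding", "-c", snippet],
        capture_output=True,
        text=True,
        encoding="utf-8",
        errors="replace",
    )
    return "EncodingWarning" in result.stderr


# os.devnull is used only so the snippets have something safe to open.
implicit = "import os; open(os.devnull).close()"
explicit = "import os; open(os.devnull, encoding='utf-8').close()"

warned_implicit = emits_encoding_warning(implicit)
warned_explicit = emits_encoding_warning(explicit)

print("implicit encoding warns:", warned_implicit)
print("explicit encoding warns:", warned_explicit)
```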

Changes

Added encoding="utf-8" to 7 text-mode open() call sites across 5 files:

  • chromadb/telemetry/product/__init__.py — 2 sites in TelemetryClient user-id read/write
  • chromadb/auth/__init__.py — 2 sites in credential and config file loaders
  • chromadb/cli/utils.py — 1 site in log-config loader
  • chromadb/utils/embedding_functions/schemas/registry.py — 1 site loading embedding-function JSON schemas
  • chromadb/utils/embedding_functions/schemas/schema_utils.py — 1 site loading embedding-function JSON schemas

Total: 5 files, +7/-7 lines.
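For readers without the diff at hand, the shape of each change is roughly as follows. This is a sketch of the pattern only: `load_json_file` and the schema file are hypothetical stand-ins, not the actual chromadb functions or files.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def load_json_file(path: Path) -> Dict[str, Any]:
    """Hypothetical loader showing the pattern applied at each call site."""
    # Before the change the call would have been: open(path, "r")
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Usage: write UTF-8 bytes directly so the result does not depend on the
# locale of the machine running this example.
with tempfile.TemporaryDirectory() as tmp:
    schema_path = Path(tmp) / "schema.json"
    schema_path.write_bytes('{"title": "Größe"}'.encode("utf-8"))
    loaded = load_json_file(schema_path)

print(loaded["title"])
```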

Testing

Pure keyword-argument addition to existing open() calls; no logic touched. Verified all 5 files parse cleanly with python -m py_compile. The files were already being read/written as utf-8 in practice on Linux/macOS CI — this just makes it explicit and portable to Windows.
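The compile check can also be scripted. The sketch below uses the standard-library `py_compile` module on two throwaway files to show what "parses cleanly" means here; the helper name `compiles_cleanly` and the file contents are hypothetical, and the check only catches syntax errors, not runtime behavior.

```python
import py_compile
import tempfile
from pathlib import Path


def compiles_cleanly(path: Path) -> bool:
    """Return True if the file byte-compiles without a syntax error."""
    try:
        py_compile.compile(str(path), doraise=True)
    except py_compile.PyCompileError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.py"
    bad = Path(tmp) / "bad.py"
    # A well-formed call with the added keyword argument.
    good.write_text(
        'f = open(__file__, "r", encoding="utf-8")\nf.close()\n',
        encoding="utf-8",
    )
    # A truncated call, as a botched edit might leave behind.
    bad.write_text('f = open(__file__, "r", encoding=\n', encoding="utf-8")
    good_ok = compiles_cleanly(good)
    bad_ok = compiles_cleanly(bad)

print(good_ok, bad_ok)
```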

@github-actions

Reviewer Checklist

Please use this checklist to ensure your code review is thorough before approving.

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use cases in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (readability, modularity, intuitiveness)?
