
fix(parser): handle bytes input in escape_char using to_unicode #1330

Closed
Alm0stSurely wants to merge 1 commit into collective:main from Alm0stSurely:fix/escape-char-bytes-v4

@Alm0stSurely commented Apr 19, 2026

Problem

escape_char in icalendar.parser.string was annotated to accept str | bytes but used str.replace() operations directly, causing TypeError when bytes input contained characters that needed escaping (comma, semicolon, backslash, newline).

This was flagged by Copilot in the review of #1222.

Analysis

The function signature allowed bytes input, but the implementation used string literals for all replacements. When bytes was passed, Python would raise TypeError: a bytes-like object is required, not str (or similar, depending on the replacement).
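To make the failure mode concrete, here is a minimal reproduction. It does not use the real `escape_char`; `naive_escape` is a hypothetical stand-in that only mirrors the pattern described above (str literals passed to `.replace()` on a value that may be bytes).

```python
def naive_escape(text):
    """Hypothetical stand-in: escapes commas using str literals only."""
    # Works for str input, but bytes.replace() rejects str arguments.
    return text.replace(",", "\\,")


# str input is escaped as expected.
escaped = naive_escape("a,b")

# bytes input raises TypeError because the replacement arguments are str.
try:
    naive_escape(b"a,b")
except TypeError as exc:
    error_message = str(exc)
```

On CPython 3 the captured message reads along the lines of `a bytes-like object is required, not 'str'`.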

The unescape_char function already handles this correctly by branching on isinstance(text, str) vs isinstance(text, bytes). A more consistent approach is to use to_unicode(), which is already available in the codebase and converts bytes to str using the default encoding.
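For comparison, the branching style mentioned above could look roughly like the sketch below. This is an assumption about the general shape, not a copy of `unescape_char`; the name `escape_comma_branching` is hypothetical and only the comma case is shown.

```python
def escape_comma_branching(text):
    """Hypothetical sketch: pick str or bytes literals based on input type."""
    if isinstance(text, str):
        return text.replace(",", "\\,")
    if isinstance(text, bytes):
        # Same replacement, expressed with bytes literals.
        return text.replace(b",", b"\\,")
    raise TypeError(f"expected str or bytes, got {type(text).__name__}")


str_result = escape_comma_branching("a,b")
bytes_result = escape_comma_branching(b"a,b")
```

Branching preserves the input type, at the cost of duplicating every replacement. Normalizing to str once keeps a single code path, which is the trade-off this PR argues for.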

Solution

  • Use to_unicode(text) at the start of escape_char to normalize input to str
  • Update the return type from str | bytes to str
  • Add test_escape_char_bytes verifying all escape patterns work with bytes input
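The steps above can be sketched as follows. This is a self-contained approximation, not the code in the diff: `to_unicode_sketch` is a hypothetical stand-in for the codebase's `to_unicode` (assumed here to decode bytes as UTF-8), and `escape_char_sketch` applies the escapes named in the Problem section in an order chosen for this example.

```python
def to_unicode_sketch(value, encoding="utf-8"):
    """Hypothetical stand-in for to_unicode: decode bytes, pass str through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def escape_char_sketch(text):
    """Normalize to str first, then escape; always returns str."""
    text = to_unicode_sketch(text)
    # Escape backslashes first so backslashes added later are not doubled.
    return (
        text.replace("\\", "\\\\")
        .replace(";", "\\;")
        .replace(",", "\\,")
        .replace("\r\n", "\\n")
        .replace("\n", "\\n")
    )


def test_escape_char_bytes_sketch():
    # bytes and str inputs should produce the same str output.
    raw = b"a,b;c\\d\ne"
    expected = "a\\,b\\;c\\\\d\\ne"
    assert escape_char_sketch(raw) == expected
    assert escape_char_sketch(raw.decode("utf-8")) == expected


test_escape_char_bytes_sketch()
```

Because the helper normalizes up front, the return annotation can be plain `str`, matching the second bullet.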

Benchmarks

No performance impact expected — to_unicode is a fast encoding check/conversion that is already used extensively in the parser.

Notes

  • This aligns escape_char with the rest of the parser, which uses to_unicode for input normalization.
  • The return type change from str | bytes to str is technically a narrowing, but since the function previously crashed on bytes input for most cases, this is a bugfix rather than a breaking change.

Closes #1226


📚 Documentation preview 📚: https://icalendar--1330.org.readthedocs.build/

escape_char was annotated to accept str | bytes but used str.replace()
operations directly, causing TypeError when bytes input contained characters
that needed escaping (comma, semicolon, backslash, newline).

Use to_unicode() to convert bytes to str before applying replacements,
matching the approach used in unescape_char. This also corrects the
return type from str | bytes to str.

Adds test_escape_char_bytes verifying all escape patterns work with
bytes input.

Closes collective#1226
@stevepiercy (Member) commented:

@Alm0stSurely would you please resolve the merge conflicts, so we can review? Thank you!

@stevepiercy added the ai-suspicion label ("This contribution is possibly created with lots of AI help without enough human understanding.") on Apr 24, 2026
@stevepiercy (Member) commented:

Closing due to no response and superseded by #1332


Development

Successfully merging this pull request may close these issues.

use to_unicode in escape_char
