fix(parser): handle bytes input in escape_char using to_unicode#1330
Closed
Alm0stSurely wants to merge 1 commit intocollective:mainfrom
Closed
fix(parser): handle bytes input in escape_char using to_unicode#1330Alm0stSurely wants to merge 1 commit intocollective:mainfrom
Alm0stSurely wants to merge 1 commit intocollective:mainfrom
Conversation
escape_char was annotated to accept str | bytes but used str.replace() operations directly, causing TypeError when bytes input contained characters that needed escaping (comma, semicolon, backslash, newline). Use to_unicode() to convert bytes to str before applying replacements, matching the approach used in unescape_char. This also corrects the return type from str | bytes to str. Adds test_escape_char_bytes verifying all escape patterns work with bytes input. Closes collective#1226
Documentation build overview
115 files changed ·
|
Member
|
@Alm0stSurely would you please resolve the merge conflicts, so we can review? Thank you! |
Member
|
Closing due to no response and superseded by #1332 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
escape_charinicalendar.parser.stringwas annotated to acceptstr | bytesbut usedstr.replace()operations directly, causingTypeErrorwhenbytesinput contained characters that needed escaping (comma, semicolon, backslash, newline).This was flagged by Copilot in the review of #1222.
Analysis
The function signature allowed
bytesinput, but the implementation used string literals for all replacements. Whenbyteswas passed, Python would raiseTypeError: a bytes-like object is required, not str(or similar, depending on the replacement).The
unescape_charfunction already handles this correctly by branching onisinstance(text, str)vsisinstance(text, bytes). A more consistent approach is to useto_unicode(), which is already available in the codebase and convertsbytestostrusing the default encoding.Solution
to_unicode(text)at the start ofescape_charto normalize input tostrstr | bytestostrtest_escape_char_bytesverifying all escape patterns work with bytes inputBenchmarks
No performance impact expected —
to_unicodeis a fast encoding check/conversion that is already used extensively in the parser.Notes
escape_charwith the rest of the parser, which usesto_unicodefor input normalization.str | bytestostris technically a narrowing, but since the function previously crashed onbytesinput for most cases, this is a bugfix rather than a breaking change.Closes #1226
📚 Documentation preview 📚: https://icalendar--1330.org.readthedocs.build/