Fix from_format crash on multi-character literal blocks #972
Open
vineethsaivs wants to merge 1 commit into
Formatter.parse() runs re.escape() on the format string first, so a literal block like [de] arrives as \[de\]. The token regex only suppressed the characters immediately adjacent to the brackets, so token letters in the middle of a multi-character literal (for example, the d and e in [del]) were still tokenized, raising AttributeError or a "redefinition of group name" re.error. Match the whole escaped literal block as a single token in _FROM_FORMAT_RE and unwrap it in _replace_tokens, keeping the inner text as a literal.

Fixes python-pendulum#971
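To make the failure mode concrete, here is a minimal, self-contained sketch using only the standard `re` module. It models the old guard as a negative lookbehind/lookahead pair around the token alternation. The reduced token set `YYYY|MM|DD|d|e` and the helpers `leaked_tokens` and `naive_pattern` are hypothetical stand-ins, not pendulum's real `_FROM_FORMAT_RE` or parser; only the adjacency-based suppression mirrors what is described above.

```python
from __future__ import annotations

import re

# Hypothetical, heavily reduced token set (NOT pendulum's real one).
TOKENS = r"YYYY|MM|DD|d|e"

# Old-style guard: a token is skipped only when it directly follows an
# escaped "[" (lookbehind) or directly precedes an escaped "]" (lookahead).
LOOKAROUND_RE = re.compile(r"(?<!\\\[)(?:" + TOKENS + r")(?!\\\])")


def leaked_tokens(fmt: str) -> list[str]:
    """Return the tokens the guarded regex still finds after re.escape()."""
    return LOOKAROUND_RE.findall(re.escape(fmt))


def naive_pattern(fmt: str) -> re.Pattern[str]:
    """Hypothetical parser step: turn every matched token into a named group."""
    escaped = re.escape(fmt)
    return re.compile(LOOKAROUND_RE.sub(lambda m: f"(?P<{m.group(0)}>\\d+)", escaped))


print(re.escape("[del]"))      # \[del\]
print(leaked_tokens("[de]"))   # [] (both letters touch a bracket)
print(leaked_tokens("[del]"))  # ['e'] (the interior letter leaks)

try:
    # The leaked interior 'e' collides with the real 'e' token outside the block.
    naive_pattern("e[eee]")
except re.error as exc:
    print(exc)  # redefinition of group name 'e' ...
```

With this toy token set, `[de]` is fully protected only because every letter happens to touch a bracket; any literal of three or more characters exposes its interior letters to tokenization.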
Fixes #971
`Formatter.parse()` runs `re.escape()` on the format string, so a literal block such as `[de]` arrives as `\[de\]`. The token regex `_FROM_FORMAT_RE` only suppressed the characters immediately adjacent to the brackets (via the lookbehind/lookahead), so token letters in the middle of a multi-character literal were still tokenized, raising `AttributeError` or a "redefinition of group name" `re.error`.

This PR matches the whole escaped literal block as a single token in `_FROM_FORMAT_RE` and unwraps it in `_replace_tokens`, so the inner text stays literal regardless of which token letters it contains. Single-character literals like `[T]`/`[Z]` keep working, and the formatting path (`format()`) is untouched.

Added a regression test (`test_from_format_with_multi_character_escaped_elements`). The full `from_format` and `formatting` test suites pass.