gh-130273: Fix traceback color output with unicode characters #142529
grayjk wants to merge 5 commits into python:main
|
@serhiy-storchaka: Here is a PR about text width and Unicode characters :-) |
|
Updated to use @serhiy-storchaka's recently added unicodedata.iter_graphemes. |
|
@pablogsal @hauntsaninja as recent reviewers of traceback.py, would you mind taking a look? |
|
There are conflicts, please fix them. |
|
@StanFromIreland conflicts resolved |
Lib/traceback.py
Outdated
| 2 if unicodedata.east_asian_width(char) in _WIDE_CHAR_SPECIFIERS else 1 | ||
| for char in line[:offset] | ||
| ) | ||
| from _pyrepl.utils import wlen |
I would prefer to not depend on _pyrepl in the traceback module. I would prefer to move wlen() here, and modify _pyrepl.utils to get it from traceback.
I've moved wlen/str_width in commit 467656e and made them private (prefixed with _) to avoid putting them in traceback.__all__ but mypy isn't happy about that. Should I make them public?
Are `# type: ignore` comments okay in this case?
Alternatively, I could move wlen to a new support file whose name is prefixed with _.
|
There are conflicts again I'm afraid, and mypy isn't happy either. |
| return 2 | ||
| ANSI_ESCAPE_SEQUENCE = re.compile(r"\x1b\[[ -@]*[A-~]") |
It should also be private.
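As a rough illustration of what the pattern in this hunk is for, the sketch below strips ANSI escape sequences from a colored string so that only the visible text is measured. The underscore-prefixed constant name follows the request above; `visible_text` is a hypothetical helper that is not part of the PR.

```python
import re

# Same pattern as in the hunk, with the private (underscore) name requested
# in review. It matches CSI sequences such as "\x1b[1;31m" or "\x1b[0m".
_ANSI_ESCAPE_SEQUENCE = re.compile(r"\x1b\[[ -@]*[A-~]")


def visible_text(s):
    """Hypothetical helper: return s with ANSI escape sequences removed."""
    return _ANSI_ESCAPE_SEQUENCE.sub("", s)


colored = "\x1b[1;31mError\x1b[0m"
plain = visible_text(colored)
```

Without this step, the escape bytes would be counted as if they occupied terminal columns.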
| import unicodedata | ||
| if ord(c) < 128: | ||
| return 1 |
There is no need to import unicodedata for ASCII characters:
| -     import unicodedata | |
| -     if ord(c) < 128: | |
| -         return 1 | |
| +     if ord(c) < 128: | |
| +         return 1 | |
| +     import unicodedata | |
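A minimal sketch of the reordering suggested here: the ASCII check runs first, and `unicodedata` is imported only when a non-ASCII character is actually seen. The function name `char_width` and the constant `WIDE` are assumptions made for illustration; the real helper in the PR may handle more cases (for example zero-width characters).

```python
# Hypothetical constant: East Asian Width classes rendered as two columns.
WIDE = ("W", "F")


def char_width(c):
    """Return the assumed terminal width (1 or 2 columns) of one character."""
    if ord(c) < 128:
        # ASCII fast path: no import needed.
        return 1
    # Deferred import: only paid for when non-ASCII text shows up.
    import unicodedata
    return 2 if unicodedata.east_asian_width(c) in WIDE else 1
```

After the first call that reaches it, the deferred `import` is only a cheap `sys.modules` lookup.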
| def _zip_display_width(line, carets): | ||
| import unicodedata | ||
| carets = iter(carets) | ||
| for char in unicodedata.iter_graphemes(line): | ||
| char = str(char) | ||
| char_width = _display_width(char) | ||
| yield char, "".join(itertools.islice(carets, char_width)) |
Would it be possible to avoid the heavy unicodedata import for an ASCII line?
| - def _zip_display_width(line, carets): | |
| -     import unicodedata | |
| -     carets = iter(carets) | |
| -     for char in unicodedata.iter_graphemes(line): | |
| -         char = str(char) | |
| -         char_width = _display_width(char) | |
| -         yield char, "".join(itertools.islice(carets, char_width)) | |
| + def _zip_display_width(line, carets): | |
| +     carets = iter(carets) | |
| +     if line.isascii(): | |
| +         for char in line: | |
| +             yield char, next(carets, "") | |
| +     else: | |
| +         import unicodedata | |
| +         for char in unicodedata.iter_graphemes(line): | |
| +             char = str(char) | |
| +             char_width = _display_width(char) | |
| +             yield char, "".join(itertools.islice(carets, char_width)) | |
I'm not sure that my code is correct :-)
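One way to sanity-check the suggestion is a standalone sketch like the one below. It is a simplification under stated assumptions: it iterates per code point instead of calling `unicodedata.iter_graphemes` (so it runs on Python versions that lack that function), and `char_width` is a hypothetical stand-in for `_display_width`.

```python
import itertools
import unicodedata


def char_width(c):
    """Hypothetical stand-in for _display_width: 2 for wide/fullwidth, else 1."""
    if ord(c) < 128:
        return 1
    return 2 if unicodedata.east_asian_width(c) in ("W", "F") else 1


def zip_display_width(line, carets):
    """Pair each character with the caret cells that sit under it."""
    carets = iter(carets)
    if line.isascii():
        # Every ASCII character is one column wide: one caret cell each.
        for char in line:
            yield char, next(carets, "")
        return
    # Simplification: per code point, not per grapheme cluster as in the PR.
    for char in line:
        yield char, "".join(itertools.islice(carets, char_width(char)))


ascii_pairs = list(zip_display_width("ab", "^~"))
wide_pairs = list(zip_display_width("a\u3042b", " ^^ "))
```

For the ASCII branch, `next(carets, "")` and `"".join(itertools.islice(carets, 1))` yield the same string, including when the carets run out, so the two branches agree on pure-ASCII input in this sketch.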
| @@ -0,0 +1 @@ | |||
| Fix traceback color output with unicode characters | |||
| - Fix traceback color output with unicode characters | |
| + Fix traceback color output with Unicode characters. | |
Account for the display width of Unicode characters so that colors and underlining in traceback output are correct.
Closes #130273
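
To illustrate why display width matters here, the sketch below compares a character offset with the terminal column it corresponds to. It mirrors the `east_asian_width`-based sum in the outdated hunk; `display_offset` is a hypothetical name, and the approach is per code point rather than per grapheme.

```python
import unicodedata


def display_offset(line, offset):
    """Return how many terminal columns line[:offset] occupies (assumed model)."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in line[:offset]
    )


line = 'print("\u65e5\u672c\u8a9e" + 1)'
char_offset = line.index("+")                # position counted in characters
column = display_offset(line, char_offset)   # position counted in columns
```

Here the `+` sits at character offset 12 but at column 15, because each of the three CJK characters takes two columns. Padding a caret line by the character offset would therefore underline the wrong spot.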