gh-130057: Pygettext: Support translator comments#130061
gh-130057: Pygettext: Support translator comments#130061serhiy-storchaka merged 13 commits intopython:mainfrom
Conversation
|
(Updated the branch to fix the |
serhiy-storchaka
left a comment
There was a problem hiding this comment.
Very well. But xgettext supports also --add-comments without argument.
Tools/i18n/pygettext.py
Outdated
| comments = {} | ||
| for token in tokenize.tokenize(BytesIO(source).readline): | ||
| if token.type == tokenize.COMMENT: | ||
| comments[token.start[0]] = token.string.removeprefix('#').strip() |
There was a problem hiding this comment.
How does xgettext handle multiple #s?
There was a problem hiding this comment.
xgettext extracts all of the following comments, while babel does not extract any:
## i18n: comment
_('foo')
# # i18n: comment
_('bar')
## # # i18n: comment
_('thud')I think we can be permissive here and follow what xgettext does.
|
I made Would you mind taking another look? |
serhiy-storchaka
left a comment
There was a problem hiding this comment.
Thanks.
Could you add tests for multiple --add-comments with different tags? And test the same file without --add-comments?
Tools/i18n/pygettext.py
Outdated
| if token.type == tokenize.COMMENT: | ||
| comments[token.start[0]] = token.string.removeprefix('#').strip() | ||
| # Remove any leading combination of '#' and whitespace | ||
| comment = re.sub(r'^[#\s]+', '', token.string) |
There was a problem hiding this comment.
Or token.string.lstrip('# \t') if you prefer.
|
I added both tests :) Is this what you had in mind? |
serhiy-storchaka
left a comment
There was a problem hiding this comment.
I actually thought about using the same source for multiple tests: without --add-comments, --add-comments without argument, single --add-comments with tag, multiple --add-comments with different tags. But the current tests are fine too.
Got it! Thanks for the clarification :) |
|
Thank you for your contribution @tomasr8. I did not expect that you implement this so fast. |
|
Thanks for the review @serhiy-storchaka! |
Adds support for translator comments to pygettext.
Now that pygettext is using a parser we don't have access to the comments anymore but we can circumvent that by
first running the tokenizer and getting all comments first. The comments are then matched to their corresponding
gettext calls in the parser using line numbers.
The new CLI option is
-cTAGand--add-comments=TAGwhich is the same option that is used by both xgettext and babel.It is possible to specify the flag multiple times.
The usual translator comment looks like this:
The comment can also be multi-line:
The comment tag only needs to be on the first line:
So far both xgettext and babel agree, but there are some discrepancies when it comes to edge cases.
For instance, babel extracts both comments from this snippet while xgettext only extracts the second one:
xgettext, on the other hand extracts this comment while babel does not:
There are more cases, but this is the gist. For pygettext, I went for an implementation which is identical to xgettext and babel in 99% cases but is simpler and more predictable for the users in the other 1%. In short, the implementation does not allow any gaps between the comments or between the first comment and the gettext call.
This means that in this snippet, only the second comment is extracted (matches xgettext):
and this comment is not extracted (matches babel):