The Wayback Machine - https://web.archive.org/web/20250623125401/https://github.com/python/cpython/pull/1608

bpo-25324: copy tok_name before changing it #1608


Merged · 11 commits · May 31, 2017

Conversation

albertjan (Contributor)

Saw this open bug report, and since I was looking at tokenize.py anyway, I figured I'd address it.

This may catch people off guard though because they may be relying on tok_name containing ENCODING, COMMENT and NL. 🤷‍♂️
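For context on that concern, a quick interactive check (not part of this PR) of where those names show up. The reported bug (bpo-25324) was that importing tokenize mutated token.tok_name in place instead of working on a copy:

```python
import tokenize

# tokenize has always exposed COMMENT, NL and ENCODING; the question is
# whether they also appear in token.tok_name as a side effect of importing
# tokenize, which is what this PR changes.
print(tokenize.tok_name[tokenize.COMMENT])   # COMMENT
print(tokenize.tok_name[tokenize.NL])        # NL
print(tokenize.tok_name[tokenize.ENCODING])  # ENCODING
```

Code that looked these names up via token.tok_name (rather than tokenize.tok_name) was relying on that side effect.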

@mention-bot

@albertjan, thanks for your PR! By analyzing the history of the files in this pull request, we identified @serhiy-storchaka, @1st1 and @tpn to be potential reviewers.

@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately our records indicate you have not signed the CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

Thanks again for your contribution, and we look forward to looking at it!

@serhiy-storchaka serhiy-storchaka requested a review from meadori May 16, 2017 16:22
@@ -1,7 +1,7 @@
from test import support
from tokenize import (tokenize, _tokenize, untokenize, NUMBER, NAME, OP,
STRING, ENDMARKER, ENCODING, tok_name, detect_encoding,
open as tokenize_open, Untokenizer)
open as tokenize_open, Untokenizer, tok_name as tokenize_tok_name)
Member

Too long line.

@vstinner (Member) left a comment

I'm not sure that it's the right way to fix the issue: see the http://bugs.python.org/issue25324 discussion. I also have comments on the change itself, but I will wait until we agree on the way to fix the issue before reviewing the change.

@brettcannon (Member)

Due to a new release of Sphinx, we had to fix the documentation to build on Travis again. Please do a merge to get these changes to help get Travis passing on your PR.

Lib/token.py Outdated
@@ -66,8 +66,11 @@
OP = 53
AWAIT = 54
ASYNC = 55
ERRORTOKEN = 56
N_TOKENS = 57
COMMENT = 56
Member

Please don't change ERRORTOKEN value.

Member

Please copy the token.h comment here.

Include/token.h Outdated
@@ -66,8 +66,12 @@ extern "C" {
#define OP 53
#define AWAIT 54
#define ASYNC 55
#define ERRORTOKEN 56
#define N_TOKENS 57
/* These aren't used by the c tokenizer but are needed for tokenize.py */
Member

Nitpick: replace c with C.

@albertjan (Contributor, author)

Updated the PR. Comments are now copied from token.h to token.py automatically, and I moved ERRORTOKEN back to where it was.

@vstinner (Member) left a comment

LGTM, but I added a new (last, I hope) series of nitpicking comments :-p

@@ -1417,7 +1417,6 @@ def test_pathological_trailing_whitespace(self):
# See http://bugs.python.org/issue16152
self.assertExactTypeEqual('@ ', token.AT)


Member

This change is not PEP 8 compliant :-)

Lib/token.py Outdated
@@ -104,13 +110,23 @@ def _main():
prog = re.compile(
"#define[ \t][ \t]*([A-Z0-9][A-Z0-9_]*)[ \t][ \t]*([0-9][0-9]*)",
re.IGNORECASE)
comment = re.compile(
"^\s*/\*\s*(.+)\s*\*/\s*$",
Member

This string emits:

<stdin>:1: DeprecationWarning: invalid escape sequence \s

I suggest to always use raw strings for regular expressions.
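A minimal sketch of the point (using a non-greedy variant of the pattern under review, so the capture drops trailing whitespace): in a plain string literal, `\s` is an invalid escape sequence and triggers the DeprecationWarning above, while a raw string passes the backslashes through to the regex engine untouched.

```python
import re

# Raw string: the backslashes reach re.compile() as written, with no
# invalid-escape warning from the str literal itself.
comment = re.compile(r"^\s*/\*\s*(.+?)\s*\*/\s*$")

m = comment.match("   /* Comment copied from token.h */  ")
print(m.group(1))  # Comment copied from token.h
```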

Lib/token.py Outdated
val = int(val)
tokens[val] = name # reverse so we can sort them...
prev_val = int(val)
tokens[prev_val] = {'token': name} # reverse so we can sort them...
Member

I would prefer to use val here, but set prev_val after the tokens assignment. Here we use the current value, not the previous value.

Lib/token.py Outdated
tokens[prev_val] = {'token': name} # reverse so we can sort them...
else:
comment_match = comment.match(line)
if comment_match and prev_val:
Member

"prev_val is not None" to support prev_val == 0 (ENDMARKER = 0).
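An illustration of the pitfall the reviewer is pointing at (not the PR's code): because ENDMARKER is token number 0, a bare truthiness test on the previous value silently skips its comment.

```python
ENDMARKER = 0
prev_val = ENDMARKER  # e.g. just parsed "#define ENDMARKER 0"

# Buggy form: `if comment_match and prev_val` is False when prev_val == 0,
# so ENDMARKER's comment would be dropped.
buggy_attaches = bool(prev_val)

# Suggested form: 0 is a valid token number; only None means "no token yet".
fixed_attaches = prev_val is not None

print(buggy_attaches, fixed_attaches)  # False True
```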

Lib/token.py Outdated
else:
comment_match = comment.match(line)
if comment_match and prev_val:
val = comment_match.group(1)
Member

Nitpick: I suggest renaming the variable to "comment".

Lib/token.py Outdated
@@ -128,7 +144,9 @@ def _main():
sys.exit(3)
lines = []
for val in keys:
Member

Nitpick: I suggest renaming "val" to "key" to be more consistent.

@albertjan (Contributor, author)

This should address your comments. Thanks for taking the time to review my PR.

@vstinner (Member) left a comment

LGTM. I will merge the change once tests pass.

@serhiy-storchaka (Member)

Please wait with merging. I'm finishing my patch for generating token.h from token.py.

@vstinner (Member)

> Please wait with merging. I'm finishing my patch for generating token.h from token.py.

Wait? Do you expect conflicts?

@serhiy-storchaka (Member)

Yes, conflicts, and maybe this will lead to redesigning both patches.

Misc/NEWS Outdated
@@ -10,6 +10,10 @@ What's New in Python 3.7.0 alpha 1?
Core and Builtins
-----------------

- bpo-25324: Tokens needed for parsing in python moved to C. ``COMMENT``,
``NL`` AND ``ENCODING``. This way the tokens and tok_names in token.py
Member

and: lower case

in token.py: in the token module

import tokenize.py: import the tokenize module.

@albertjan (Contributor, author)

Should I do a merge or a rebase to resolve the conflicts?

@vstinner (Member)

> Should I do a merge or a rebase to resolve the conflicts?

As you want.

Misc/NEWS Outdated
@@ -10,6 +10,10 @@ What's New in Python 3.7.0 alpha 1?
Core and Builtins
-----------------

- bpo-25324: Tokens needed for parsing in python moved to C. ``COMMENT``,
Member

Python: title case.

N_TOKENS
NT_OFFSET

.. versionchanged:: 3.5
Added :data:`AWAIT` and :data:`ASYNC` tokens. Starting with
Python 3.7, "async" and "await" will be tokenized as :data:`NAME`
tokens, and :data:`AWAIT` and :data:`ASYNC` will be removed.

.. versionchanged:: 3.7
Added :data:`COMMENT`, :data:`NL` and :data:`ENCODING`. To bring
Member

Isn't the period redundant here?

.. versionchanged:: 3.7
Added :data:`COMMENT`, :data:`NL` and :data:`ENCODING`. To bring
the tokens in the C code in line with the tokens needed in
tokenize.py. These tokens aren't used by the C tokenizer.
Member

tokenize.py -> the :mod:`tokenize` module
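The behavior the versionchanged note describes can be seen directly: the tokenize module (unlike the C tokenizer) emits ENCODING, COMMENT and NL tokens, and after this change their numbers are defined in token.py rather than patched in at tokenize import time. A small demonstration:

```python
import io
import tokenize

# Tokenize a one-line source with a comment; the stream starts with an
# ENCODING token and includes a COMMENT token, both of which the C
# tokenizer itself never produces.
src = b"x = 1  # a comment\n"
for tok in tokenize.tokenize(io.BytesIO(src).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))
```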

@vstinner vstinner dismissed serhiy-storchaka’s stale review May 31, 2017 12:26

NEWS and doc were fixed.

@vstinner vstinner merged commit fc354f0 into python:master May 31, 2017
@vstinner (Member)

LGTM. Thanks @albertjan!

@ammaraskar (Member)

Never mind, @albertjan; it looks like the issue I described here was caused by a merge issue. Thanks anyway :)

7 participants