The Wayback Machine - https://web.archive.org/web/20250623125401/https://github.com/python/cpython/pull/1608

bpo-25324: copy tok_name before changing it #1608


Merged · 11 commits · May 31, 2017

Conversation

albertjan (Contributor)

Saw this open bug report, and since I was looking at tokenize.py anyway, I figured I'd address it.

This may catch people off guard though because they may be relying on tok_name containing ENCODING, COMMENT and NL. 🤷‍♂️
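For context on that concern, a quick interactive check (not part of this PR) of where those names show up. The reported bug (bpo-25324) was that importing tokenize mutated token.tok_name in place instead of working on a copy:

```python
import tokenize

# tokenize has always exposed COMMENT, NL and ENCODING; the question is
# whether they also appear in token.tok_name as a side effect of importing
# tokenize, which is what this PR changes.
print(tokenize.tok_name[tokenize.COMMENT])   # COMMENT
print(tokenize.tok_name[tokenize.NL])        # NL
print(tokenize.tok_name[tokenize.ENCODING])  # ENCODING
```

Code that looked these names up via token.tok_name (rather than tokenize.tok_name) was relying on that side effect.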

@mention-bot

@albertjan, thanks for your PR! By analyzing the history of the files in this pull request, we identified @serhiy-storchaka, @1st1 and @tpn to be potential reviewers.

@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately our records indicate you have not signed the CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

Thanks again for your contribution, and we look forward to looking at it!

@serhiy-storchaka serhiy-storchaka requested a review from meadori May 16, 2017 16:22
@@ -1,7 +1,7 @@
from test import support
from tokenize import (tokenize, _tokenize, untokenize, NUMBER, NAME, OP,
STRING, ENDMARKER, ENCODING, tok_name, detect_encoding,
open as tokenize_open, Untokenizer)
open as tokenize_open, Untokenizer, tok_name as tokenize_tok_name)
Member

Too long line.

@vstinner (Member) left a comment

I'm not sure that it's the right way to fix the issue: see the http://bugs.python.org/issue25324 discussion. I also have comments on the change itself, but I will wait until we agree on the way to fix the issue before reviewing the change.

@brettcannon (Member)

Due to a new release of Sphinx, we had to fix the documentation to build on Travis again. Please do a merge to get these changes to help get Travis passing on your PR.

Lib/token.py Outdated
@@ -66,8 +66,11 @@
OP = 53
AWAIT = 54
ASYNC = 55
ERRORTOKEN = 56
N_TOKENS = 57
COMMENT = 56
Member

Please don't change ERRORTOKEN value.

Member

Please copy the token.h comment here.

Include/token.h Outdated
@@ -66,8 +66,12 @@ extern "C" {
#define OP 53
#define AWAIT 54
#define ASYNC 55
#define ERRORTOKEN 56
#define N_TOKENS 57
/* These aren't used by the c tokenizer but are needed for tokenize.py */
Member

Nitpick: replace c with C.

@albertjan (Contributor, author)

Updated the PR. Comments are now copied from token.h to token.py automatically, and I moved ERRORTOKEN back to where it was.

@vstinner (Member) left a comment

LGTM, but I added a new (last, I hope) series of nitpicking comments :-p

@@ -1417,7 +1417,6 @@ def test_pathological_trailing_whitespace(self):
# See http://bugs.python.org/issue16152
self.assertExactTypeEqual('@ ', token.AT)


Member

This change is not PEP 8 compliant :-)

Lib/token.py Outdated
@@ -104,13 +110,23 @@ def _main():
prog = re.compile(
"#define[ \t][ \t]*([A-Z0-9][A-Z0-9_]*)[ \t][ \t]*([0-9][0-9]*)",
re.IGNORECASE)
comment = re.compile(
"^\s*/\*\s*(.+)\s*\*/\s*$",
Member

This string emits:

<stdin>:1: DeprecationWarning: invalid escape sequence \s

I suggest to always use raw strings for regular expressions.
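A minimal sketch of the point (using a non-greedy variant of the pattern under review, so the capture drops trailing whitespace): in a plain string literal, `\s` is an invalid escape sequence and triggers the DeprecationWarning above, while a raw string passes the backslashes through to the regex engine untouched.

```python
import re

# Raw string: the backslashes reach re.compile() as written, with no
# invalid-escape warning from the str literal itself.
comment = re.compile(r"^\s*/\*\s*(.+?)\s*\*/\s*$")

m = comment.match("   /* Comment copied from token.h */  ")
print(m.group(1))  # Comment copied from token.h
```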

Lib/token.py Outdated
val = int(val)
tokens[val] = name # reverse so we can sort them...
prev_val = int(val)
tokens[prev_val] = {'token': name} # reverse so we can sort them...
Member

I would prefer to use val here, but set prev_val after the tokens assignment. Here we use the current value, not the previous value.

Lib/token.py Outdated
tokens[prev_val] = {'token': name} # reverse so we can sort them...
else:
comment_match = comment.match(line)
if comment_match and prev_val:
Member

"prev_val is not None" to support prev_val == 0 (ENDMARKER = 0).
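An illustration of the pitfall the reviewer is pointing at (not the PR's code): because ENDMARKER is token number 0, a bare truthiness test on the previous value silently skips its comment.

```python
ENDMARKER = 0
prev_val = ENDMARKER  # e.g. just parsed "#define ENDMARKER 0"

# Buggy form: `if comment_match and prev_val` is False when prev_val == 0,
# so ENDMARKER's comment would be dropped.
buggy_attaches = bool(prev_val)

# Suggested form: 0 is a valid token number; only None means "no token yet".
fixed_attaches = prev_val is not None

print(buggy_attaches, fixed_attaches)  # False True
```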

Lib/token.py Outdated
else:
comment_match = comment.match(line)
if comment_match and prev_val:
val = comment_match.group(1)
Member

Nitpick: I suggest renaming the variable to "comment".

Lib/token.py Outdated
@@ -128,7 +144,9 @@ def _main():
sys.exit(3)
lines = []
for val in keys:
Member

Nitpick: I suggest renaming "val" to "key" to be more consistent.

@albertjan (Contributor, author)

This should address your comments. Thanks for taking the time to review my PR.

@vstinner (Member) left a comment

LGTM. I will merge the change once tests pass.

@serhiy-storchaka (Member)

Please wait with merging. I'm finishing my patch for generating token.h from token.py.

@vstinner (Member)

> Please wait with merging. I'm finishing my patch for generating token.h from token.py.

Wait? Do you expect conflicts?

@serhiy-storchaka (Member)

Yes, conflicts, and maybe this will lead to redesigning both patches.

Misc/NEWS Outdated
@@ -10,6 +10,10 @@ What's New in Python 3.7.0 alpha 1?
Core and Builtins
-----------------

- bpo-25324: Tokens needed for parsing in python moved to C. ``COMMENT``,
``NL`` AND ``ENCODING``. This way the tokens and tok_names in token.py
Member

and: lower case

in token.py: in the token module

import tokenize.py: import the tokenize module.

@albertjan (Contributor, author)

Should I do a merge or a rebase to resolve the conflicts?

@vstinner (Member)

> Should I do a merge or a rebase to resolve the conflicts?

As you want.

Misc/NEWS Outdated
@@ -10,6 +10,10 @@ What's New in Python 3.7.0 alpha 1?
Core and Builtins
-----------------

- bpo-25324: Tokens needed for parsing in python moved to C. ``COMMENT``,
Member

Python: title case.

N_TOKENS
NT_OFFSET

.. versionchanged:: 3.5
Added :data:`AWAIT` and :data:`ASYNC` tokens. Starting with
Python 3.7, "async" and "await" will be tokenized as :data:`NAME`
tokens, and :data:`AWAIT` and :data:`ASYNC` will be removed.

.. versionchanged:: 3.7
Added :data:`COMMENT`, :data:`NL` and :data:`ENCODING`. To bring
Member

Isn't the period redundant here?

.. versionchanged:: 3.7
Added :data:`COMMENT`, :data:`NL` and :data:`ENCODING`. To bring
the tokens in the C code in line with the tokens needed in
tokenize.py. These tokens aren't used by the C tokenizer.
Member

tokenize.py -> the :mod:`tokenize` module
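The behavior the versionchanged note describes can be seen directly: the tokenize module (unlike the C tokenizer) emits ENCODING, COMMENT and NL tokens, and after this change their numbers are defined in token.py rather than patched in at tokenize import time. A small demonstration:

```python
import io
import tokenize

# Tokenize a one-line source with a comment; the stream starts with an
# ENCODING token and includes a COMMENT token, both of which the C
# tokenizer itself never produces.
src = b"x = 1  # a comment\n"
for tok in tokenize.tokenize(io.BytesIO(src).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))
```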

@vstinner vstinner dismissed serhiy-storchaka’s stale review May 31, 2017 12:26

NEWS and doc were fixed.

@vstinner vstinner merged commit fc354f0 into python:master May 31, 2017
@vstinner (Member)

LGTM. Thanks @albertjan!

@ammaraskar (Member)

Never mind, @albertjan; it looks like the issue I described here was caused by a merge issue. Thanks anyway :)

7 participants