gh-80010: Expand fromisoformat to include most of ISO-8601 #92177

pganssle · 2022-05-02T18:53:01Z

This should cover all of ISO-8601 except for fractional non-second components.

@godlygeek Would you mind taking a look?

Note: Currently the tests are mostly written as hypothesis tests using the stubs from #22863. Before merge we can try to refactor those out into a big matrix of examples, but for now it's useful to know that we have good coverage of the enormous state space here.

#80010

cpython-cla-bot · 2022-05-02T18:53:03Z

The following commit authors need to sign the Contributor License Agreement:

paul@ganssle.io

These are stubs to be used for adding hypothesis (https://hypothesis.readthedocs.io/en/latest/) tests to the standard library. When the tests are run in an environment where `hypothesis` and its various dependencies are not installed, the stubs will turn any tests with examples into simple parameterized tests and any tests without examples are skipped.

pganssle · 2022-05-02T23:13:10Z

I've added an explicit test matrix that doesn't rely on the hypothesis stubs. Feel free to ignore them as part of the review.

JelleZijlstra

A few comments, I'll have to study the C code more closely.

JelleZijlstra · 2022-05-02T23:40:17Z

Lib/datetime.py

+            # YYYYMMDD (8)
+            return 8
+
+
 def _parse_isoformat_date(dtstr):
    # It is assumed that this function will only be called with a
    # string of length exactly 10, and (though this is not used) ASCII-only


Comment is out of date. Might be worth commenting on what the three cases are about.

JelleZijlstra · 2022-05-02T23:40:17Z

Lib/datetime.py

-    # This is equivalent to re.search('[+-]', tstr), but faster
-    tz_pos = (tstr.find('-') + 1 or tstr.find('+') + 1)
+    # This is equivalent to re.search('[+-Z]', tstr), but faster
+    tz_pos = (tstr.find('-') + 1  or tstr.find('+') + 1 or tstr.find('Z') + 1)


Suggested change

-    tz_pos = (tstr.find('-') + 1  or tstr.find('+') + 1 or tstr.find('Z') + 1)
+    tz_pos = (tstr.find('-') + 1 or tstr.find('+') + 1 or tstr.find('Z') + 1)

Only one space
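
The one-liner in this hunk relies on `str.find` returning -1 on a miss, so adding 1 yields a falsy 0 and the `or` chain falls through to the next character. Below is a minimal sketch of that behaviour; `find_tz_start` is a hypothetical name used only for illustration, not a function from the patch. Two caveats: the chain tries the characters in a fixed order rather than returning the leftmost match, and in an actual regex the class `[+-Z]` is read as a range from `+` to `Z`, so a literal equivalent would be spelled `[-+Z]`.

```python
import re


def find_tz_start(tstr: str) -> int:
    # Hypothetical helper mirroring the one-liner in the diff.
    # str.find returns -1 when the character is absent, so "+ 1" gives 0
    # (falsy) and the next alternative in the "or" chain is tried.
    return tstr.find("-") + 1 or tstr.find("+") + 1 or tstr.find("Z") + 1


for tstr in ("12:30:45+05:30", "12:30Z", "12:30"):
    pos = find_tz_start(tstr)
    # pos is one past the index of the marker; 0 means "no offset found".
    time_part = tstr[:pos - 1] if pos else tstr
    print(repr(tstr), pos, repr(time_part))

# '[+-Z]' is a range from '+' to 'Z', which also covers digits and ':'.
# '[-+Z]' matches only the three literal characters.
print(re.search("[+-Z]", "12:30").group())
print(re.search("[-+Z]", "12:30"))
```

The returned value is deliberately 1-based so that 0 can double as the "not found" signal.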

JelleZijlstra · 2022-05-02T23:40:17Z

Lib/datetime.py


-        if len(tzstr) not in (5, 8, 15):
+        if len(tzstr) in (1, 3):


What about 0?

JelleZijlstra · 2022-05-02T23:40:17Z

Lib/datetime.py

+            separator_location = _find_isoformat_separator(date_string)
+            dstr = date_string[0:separator_location]
+            tstr = date_string[(separator_location+1):]
+
            date_components = _parse_isoformat_date(dstr)
        except ValueError:
            raise ValueError(f'Invalid isoformat string: {date_string!r}')


Maybe add `from None`?
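
For context, re-raising with `from None` hides the inner exception from the traceback, so the caller sees only the "Invalid isoformat string" message. A small sketch of the effect, using a hypothetical `parse_year` helper that is not part of the patch:

```python
def parse_year(date_string: str) -> int:
    # Hypothetical helper: only the exception-chaining behaviour matters here.
    try:
        return int(date_string[:4])
    except ValueError:
        # "from None" suppresses the "During handling of the above
        # exception..." section of the traceback. The inner error is still
        # reachable through __context__.
        raise ValueError(f"Invalid isoformat string: {date_string!r}") from None


try:
    parse_year("20x2-05-02")
except ValueError as exc:
    caught = exc  # keep a reference; "exc" is unbound after the except block

print(caught)
print(caught.__cause__, caught.__suppress_context__)
```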

JelleZijlstra · 2022-05-02T23:40:17Z

Lib/test/datetimetester.py

+            ('2022W52', self.theclass(2022, 12, 26)),
+            ('2022-W52', self.theclass(2022, 12, 26)),
+            ('2022W527', self.theclass(2023, 1, 1)),
+            ('2022-W52-7', self.theclass(2023, 1, 1)),


Add a test case involving week 53 in a leap and a non-leap year (since that edge case is explicitly handled in the code).
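
To pick concrete inputs for such a test, the existing `date.fromisocalendar` (Python 3.8+) can serve as a reference. The sketch below is not the new `fromisoformat` code path; the years are my own choices: 2020 is a leap year with 53 ISO weeks, 2015 is a non-leap year with 53, and 2021 has only 52.

```python
from datetime import date

# Monday of ISO week 53 in a leap year and in a non-leap year.
print(date.fromisocalendar(2020, 53, 1))
print(date.fromisocalendar(2015, 53, 1))

# 2021 has only 52 ISO weeks, so week 53 is rejected.
try:
    date.fromisocalendar(2021, 53, 1)
except ValueError as exc:
    print("rejected:", exc)
```

Under that assumption, strings such as `2020-W53-1` and `2015W531` would be natural additions to the matrix, with `2021-W53-1` as an expected failure.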

JelleZijlstra · 2022-05-02T23:40:17Z

Modules/_datetimemodule.c

@@ -680,6 +713,11 @@ set_date_fields(PyDateTime_Date *self, int y, int m, int d)
 * String parsing utilities and helper functions
 */

+static unsigned char
+is_digit(const char c) {
+    return ((unsigned int)(c - '0')) < 10;


What about Unicode digits? Or does the spec you're implementing only allow ASCII?

pganssle · 2022-05-03

The spec only allows ASCII. We go beyond the spec in a few places, but the only place where a Unicode character can be introduced is the separator, and the separator should never really be treated as a "digit", even if it's a Unicode digit.
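
The distinction matters because Python's own `str.isdigit` accepts many non-ASCII digits. The C helper's cast to `unsigned int` makes any character below `'0'` wrap around to a large value, so a single comparison covers both bounds. A rough Python analogue, with `is_ascii_digit` as a hypothetical name:

```python
def is_ascii_digit(c: str) -> bool:
    # Python analogue of the C is_digit helper: true only for '0'..'9'.
    return len(c) == 1 and "0" <= c <= "9"


arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE

# The ASCII-only check rejects the Unicode digit; str.isdigit accepts it.
print(is_ascii_digit("7"), is_ascii_digit(arabic_three))
print("7".isdigit(), arabic_three.isdigit())
```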

isoformatter.py

pganssle requested a review from abalkin as a code owner May 2, 2022

bedevere-bot added the awaiting core review label May 2, 2022

pganssle and others added 20 commits May 2, 2022

Add timezones to hypothestubs

7c96523

TEMP: Add isoformatter test

7687280

Add support for YYYYMMDD

f79f3e8

Expand support for ISO 8601 times

00d3e34

Add support for ISO calendar-style strings

9d613bb

Rework how string sanitization works

778f17f

Rather than attempting to detect where the separator is first, we can take advantage of the fact that it really can only be in one of 3 locations to do the sanitization before any separator detection occurs.
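
A minimal Python sketch of this idea follows. It rests on assumptions the commit message does not state: that the three candidate positions are indices 7, 8 and 10 (the lengths of `YYYYWww`, `YYYYMMDD` / `YYYY-Www` / `YYYYWwwd`, and `YYYY-MM-DD` / `YYYY-Www-d`), that "sanitization" means replacing a surrogate code point with `T`, and the name `sanitize_separator`, which is hypothetical.

```python
# Assumed candidate separator positions; not taken from the patch text.
POTENTIAL_SEPARATOR_POSITIONS = (7, 8, 10)


def sanitize_separator(dtstr: str) -> str:
    # Check each candidate position before any separator detection.
    for pos in POTENTIAL_SEPARATOR_POSITIONS:
        if pos >= len(dtstr):
            break
        # A lone surrogate cannot be encoded as UTF-8, so swap it for 'T'.
        if "\ud800" <= dtstr[pos] <= "\udfff":
            return dtstr[:pos] + "T" + dtstr[pos + 1:]
    return dtstr


print(sanitize_separator("2022-05-02\ud80012:30"))
print(sanitize_separator("2022-05-02 12:30"))
```

Only the first surrogate is replaced here; a string with several would still fail later when the parser requires ASCII, which is the behaviour one would want for invalid input.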

WIP

b6f4337

Move Isoformatter into test helper, add date/time tests

cb05bf4

Final location for isoformatter and strategies

561a36d

Working version of date.isoformat

d5a74c2

Fix failure to set an error

c0e93b7

First version with time parsing allowed

ca73300

Add support for leading T in time formatters

82c840f

Fix pure python separator detection in YYYYWwwd

d837bf4

Version with all tests passing

acd88d5

Migrate fromisoformat tests to their own file

a9c351c

Fix bug in time parsing logic

fdcd37f

s/ssize_t/size_t

b1dddd9

Add fromisoformat example tests

c79e33f

pganssle force-pushed the expand_fromisoformat branch from 55ee790 to c79e33f Compare May 2, 2022

Try to be consistent about use of double quotes in error messages

a0530e6

JelleZijlstra reviewed May 2, 2022

pganssle added 2 commits May 3, 2022

Update documentation

1da2662

Remove isoformatter

bd7f7bc
