New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-80010: Expand fromisoformat to include most of ISO-8601 #92177
base: main
Are you sure you want to change the base?
Conversation
|
The following commit authors need to sign the Contributor License Agreement: |
These are stubs to be used for adding hypothesis (https://hypothesis.readthedocs.io/en/latest/) tests to the standard library. When the tests are run in an environment where `hypothesis` and its various dependencies are not installed, the stubs will turn any tests with examples into simple parameterized tests and any tests without examples are skipped.
Rather than attempting to detect where the separator is first, we can take advantage of the fact that it really can only be in one of 3 locations to do the sanitization before any separator detection occurs.
|
I've added an explicit test matrix that doesn't rely on the hypothesis stubs. Feel free to ignore them as part of the review. |
A few comments, I'll have to study the C code more closely.
| # YYYYMMDD (8) | ||
| return 8 | ||
|
|
||
|
|
||
| def _parse_isoformat_date(dtstr): | ||
| # It is assumed that this function will only be called with a | ||
| # string of length exactly 10, and (though this is not used) ASCII-only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment is out of date. Might be worth commenting on what the three cases are about.
| # This is equivalent to re.search('[+-]', tstr), but faster | ||
| tz_pos = (tstr.find('-') + 1 or tstr.find('+') + 1) | ||
| # This is equivalent to re.search('[+-Z]', tstr), but faster | ||
| tz_pos = (tstr.find('-') + 1 or tstr.find('+') + 1 or tstr.find('Z') + 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| tz_pos = (tstr.find('-') + 1 or tstr.find('+') + 1 or tstr.find('Z') + 1) | |
| tz_pos = (tstr.find('-') + 1 or tstr.find('+') + 1 or tstr.find('Z') + 1) |
Only one space
|
|
||
| if len(tzstr) not in (5, 8, 15): | ||
| if len(tzstr) in (1, 3): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about 0?
| separator_location = _find_isoformat_separator(date_string) | ||
| dstr = date_string[0:separator_location] | ||
| tstr = date_string[(separator_location+1):] | ||
|
|
||
| date_components = _parse_isoformat_date(dstr) | ||
| except ValueError: | ||
| raise ValueError(f'Invalid isoformat string: {date_string!r}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe from None?
| ('2022W52', self.theclass(2022, 12, 26)), | ||
| ('2022-W52', self.theclass(2022, 12, 26)), | ||
| ('2022W527', self.theclass(2023, 1, 1)), | ||
| ('2022-W52-7', self.theclass(2023, 1, 1)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a test case involving week 53 in a leap and a non-leap year (since that edge case is explicitly handled in the code).
| @@ -680,6 +713,11 @@ set_date_fields(PyDateTime_Date *self, int y, int m, int d) | |||
| * String parsing utilities and helper functions | |||
| */ | |||
|
|
|||
| static unsigned char | |||
| is_digit(const char c) { | |||
| return ((unsigned int)(c - '0')) < 10; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about Unicode digits? Or does the spec you're implementing only allow ASCII?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spec only allows ASCII. We go beyond the spec in a few places, but the only place where a unicode character can be introduced is the separator, and the separator should never really be a "digit" even if it's a unicode digit.

Formed in 2009, the Archive Team (not to be confused with the archive.org Archive-It Team) is a rogue archivist collective dedicated to saving copies of rapidly dying or deleted websites for the sake of history and digital heritage. The group is 100% composed of volunteers and interested parties, and has expanded into a large amount of related projects for saving online and digital history.

This should cover all of ISO-8601 except for fractional non-second components.
@godlygeek Would you mind taking a look?
Note: Currently the tests are mostly written as hypothesis tests using the stubs from #22863. Before merge we can try to refactor those out into a big matrix of examples, but for now it's useful to know that we have good coverage of the enormous state space here.
#80010