X Tutup
The Wayback Machine - https://web.archive.org/web/20220503000709/https://github.com/python/cpython/pull/92177
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-80010: Expand fromisoformat to include most of ISO-8601 #92177

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

Copy link
Member

@pganssle pganssle commented May 2, 2022

This should cover all of ISO-8601 except for fractional non-second components.

@godlygeek Would you mind taking a look?

Note: Currently the tests are mostly written as hypothesis tests using the stubs from #22863. Before merge we can try to refactor those out into a big matrix of examples, but for now it's useful to know that we have good coverage of the enormous state space here.

#80010

@pganssle pganssle requested a review from abalkin as a code owner May 2, 2022
@cpython-cla-bot
Copy link

@cpython-cla-bot cpython-cla-bot bot commented May 2, 2022

The following commit authors need to sign the Contributor License Agreement:

Click the button to sign:
CLA not signed

@pganssle
Copy link
Author

@pganssle pganssle commented May 2, 2022

I've added an explicit test matrix that doesn't rely on the hypothesis stubs. Feel free to ignore them as part of the review.

Copy link
Member

@JelleZijlstra JelleZijlstra left a comment

A few comments, I'll have to study the C code more closely.

# YYYYMMDD (8)
return 8


def _parse_isoformat_date(dtstr):
# It is assumed that this function will only be called with a
# string of length exactly 10, and (though this is not used) ASCII-only
Copy link
Member

@JelleZijlstra JelleZijlstra May 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment is out of date. Might be worth commenting on what the three cases are about.

# This is equivalent to re.search('[+-]', tstr), but faster
tz_pos = (tstr.find('-') + 1 or tstr.find('+') + 1)
# This is equivalent to re.search('[+-Z]', tstr), but faster
tz_pos = (tstr.find('-') + 1 or tstr.find('+') + 1 or tstr.find('Z') + 1)
Copy link
Member

@JelleZijlstra JelleZijlstra May 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tz_pos = (tstr.find('-') + 1 or tstr.find('+') + 1 or tstr.find('Z') + 1)
tz_pos = (tstr.find('-') + 1 or tstr.find('+') + 1 or tstr.find('Z') + 1)

Only one space


if len(tzstr) not in (5, 8, 15):
if len(tzstr) in (1, 3):
Copy link
Member

@JelleZijlstra JelleZijlstra May 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about 0?

separator_location = _find_isoformat_separator(date_string)
dstr = date_string[0:separator_location]
tstr = date_string[(separator_location+1):]

date_components = _parse_isoformat_date(dstr)
except ValueError:
raise ValueError(f'Invalid isoformat string: {date_string!r}')
Copy link
Member

@JelleZijlstra JelleZijlstra May 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe from None?

('2022W52', self.theclass(2022, 12, 26)),
('2022-W52', self.theclass(2022, 12, 26)),
('2022W527', self.theclass(2023, 1, 1)),
('2022-W52-7', self.theclass(2023, 1, 1)),
Copy link
Member

@JelleZijlstra JelleZijlstra May 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a test case involving week 53 in a leap and a non-leap year (since that edge case is explicitly handled in the code).

@@ -680,6 +713,11 @@ set_date_fields(PyDateTime_Date *self, int y, int m, int d)
* String parsing utilities and helper functions
*/

static unsigned char
is_digit(const char c) {
return ((unsigned int)(c - '0')) < 10;
Copy link
Member

@JelleZijlstra JelleZijlstra May 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about Unicode digits? Or does the spec you're implementing only allow ASCII?

Copy link
Member Author

@pganssle pganssle May 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec only allows ASCII. We go beyond the spec in a few places, but the only place where a unicode character can be introduced is the separator, and the separator should never really be a "digit" even if it's a unicode digit.

isoformatter.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants
X Tutup