bpo-47040: improve document of checksum functions#31955
Conversation
Since CPython 3.0.0, the checksums are always truncated to `unsigned int`.
gpshead
left a comment
There was a problem hiding this comment.
mostly: keep the versionchanged, just make the suggested minor edit to the text.
Lib/test/test_binascii.py
Outdated
| self.assertRaises(TypeError, binascii.crc32) | ||
|
|
||
| def test_random_crc32(self): | ||
| dat = random.randbytes(1234) |
There was a problem hiding this comment.
Tests that run on random data have a good chance of being flakey when a problem exists rather than reliably reproducing the problem. The specific data or the seed used to generate it needs to be recorded and present in test output to meaningfully debug the problem.
The flakiness aspect can be improved by using a lot of runs over different random data (statistically likely to always trigger the problem being tested for on at least one random input) or by using a smaller set of strategically chosen inputs with desirable output values to test against.
For crc32 I suggest just chosing a few strategic inputs that generate low and high crc32 values that might have illustrated this issue and need for the & mask in Python 2.
There was a problem hiding this comment.
For crc32 I suggest just chosing a few strategic inputs that generate low and high crc32 values that might have illustrated this issue and need for the & mask in Python 2.
I ran an alder32()/crc32() test code, tested nearly 2 billion random 1KiB strings, no one result is greater than UINT32_MAX, although the type of return value is uLong (unsigned long).
There was a problem hiding this comment.
because >UINT32_MAX is impossible. this is a 32bit value.
it isn't clear what the purpose of this test is. what does it reliably prevent from happening?
it will pass because everything is correct today. I'm more interested in when and why would it fail and does it do so is a consistent reliable reproducable manner that can be debugged from the test failure.
the thing I had assumed this test was testing was that a negative value is never returned, as would happen half the time in 32-bit python 2. effectively a regression test for an old situation that py3's implementation can never trigger. if you want it to be a regression test, it should use values that would've triggered the regression: Ensure it includes a hash in the range 0x8000_0000 - 0xffff_ffff. those are what would've been negative long ago. So rather than random data, I'd pick a known input that has a crc in that range to cover that case.
if you want a test that guarantees a crc will never be greater than 0xffff_ffff, there is no input you can give a crc or adler 32 algorithm that will generate such a thing.
For that to happen the underlying implementation itself would need to be fundamentally flawed. If that is what you intend this test to prevent, it could do so; but not reliably by using a single random input. so it doesn't seem worth having.
There was a problem hiding this comment.
I remove the unit-tests, since no code actually changed.
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
|
I modified the NEWS file: -Internal cleanup to :func:`zlib.crc32` / :func:`binascii.crc32` to not use
-an intermediate signed value. No functional change. Clarified the old Python
-versions compatiblity note in the docstrings.
+Clarified the old Python versions compatiblity note of :func:`binascii.crc32` /
+:func:`zlib.adler32` / :func:`zlib.crc32` functions.
This is an internal change, not visiable to users, so I removed it. Now this PR only modifies the doc and polishes the code slightly. |
|
This PR polished the code. |
…H-32002) Clarifies a versionchanged note on crc32 & adler32 docs that the workaround is only needed for Python 2 and earlier. Also cleans up an unnecessary intermediate variable in the implementation. Authored-By: Ma Lin / animalize Co-authored-by: Gregory P. Smith <greg@krypto.org>
) (pythonGH-32002) Clarifies a versionchanged note on crc32 & adler32 docs that the workaround is only needed for Python 2 and earlier. Also cleans up an unnecessary intermediate variable in the implementation. Authored-By: Ma Lin / animalize Co-authored-by: Gregory P. Smith <greg@krypto.org> (cherry picked from commit 6d290d5) Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
…H-32002) Clarifies a versionchanged note on crc32 & adler32 docs that the workaround is only needed for Python 2 and earlier. Also cleans up an unnecessary intermediate variable in the implementation. Authored-By: Ma Lin / animalize Co-authored-by: Gregory P. Smith <greg@krypto.org> (cherry picked from commit 6d290d5) Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
) (pythonGH-32002) Clarifies a versionchanged note on crc32 & adler32 docs that the workaround is only needed for Python 2 and earlier. Also cleans up an unnecessary intermediate variable in the implementation. Authored-By: Ma Lin / animalize Co-authored-by: Gregory P. Smith <greg@krypto.org> (cherry picked from commit 6d290d5) Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
Since CPython 3.0.0, the checksums are always truncated to
unsigned int:zlib.adler32(): https://github.com/python/cpython/blob/v3.0/Modules/zlibmodule.c#L930zlib.crc32(): https://github.com/python/cpython/blob/v3.0/Modules/zlibmodule.c#L950binascii.crc32(): https://github.com/python/cpython/blob/v3.0/Modules/binascii.c#L1035Also polish the code.
https://bugs.python.org/issue47040