bpo-29803: remove some redandunt ops in unicodeobject.c#660
bpo-29803: remove some redandunt ops in unicodeobject.c#660zhangyangyu merged 4 commits intopython:masterfrom
Conversation
Objects/unicodeobject.c
Outdated
There was a problem hiding this comment.
The necessary widen has been done in unicode_decode_call_errorhandler_writer. So I think this is not a must.
There was a problem hiding this comment.
And how about this? How could it lead to crash?
There was a problem hiding this comment.
This is even more important. Since WRITE_CHAR doesn't check the size of the output buffer, we need to allocate the space for writer.min_length = end - s + writer.pos characters past the last written character.
There was a problem hiding this comment.
Of course. But isn't the widen done in unicode_decode_call_errorhandler_writer? When the error handler generates only one character, we don't need more space since we have already got enough space. But when the error handler generates more, unicode_decode_call_errorhandler_writer allocates the spaces for you. You did the change to unicode_decode_call_errorhandler_writer to avoid crash.
There was a problem hiding this comment.
I don't remove all details, but when I wrote this code the call of _PyUnicodeWriter_Prepare() was needed.
Maybe something was changed since that time. I will examine thу code in detail some time later. But now I have no confidence that the removal of this call is safe.
There was a problem hiding this comment.
Just add assert(writer.min_length <= writer.size - writer.pos) and see how Python crashes when run tests.
There was a problem hiding this comment.
I cannot understand. :-( writer.min_length means the least needed space here and it means the least space _PyUnicodeWriter will allocate for you. And writer.size is the actually allocated size. So shouldn't the right assertion here is just assert(writer.min_length <= writer.size). If you minus writer.pos, it means the left space, then should use assert(end - s <= writer.size - writer.pos).
There was a problem hiding this comment.
You are right. The right assert is assert(end - s <= writer.size - writer.pos). Seems _PyUnicodeWriter_Prepare() is incorrectly used here and in unicode_decode_call_errorhandler_writer(). And min_length may be inconsistently used in different decoders.
There was a problem hiding this comment.
I just think it's not necessary but not an error.
serhiy-storchaka
left a comment
There was a problem hiding this comment.
There is one legitimate fix of copy-paste error, one questionable change and few incorrect removes.
| } | ||
|
|
||
| if (PyUnicode_Check(path)) { | ||
| if (PyUnicode_READY(path) == -1) { |
There was a problem hiding this comment.
It's not an error but not a must. After the if ... else ... we get a code path to ready it.
|
|
||
| error: | ||
| endinpos = s-starts; | ||
| writer.min_length = end - s + writer.pos; |
There was a problem hiding this comment.
This code is needed. See also comments on Rietveld: https://bugs.python.org/review/16334/diff/17685/Objects/unicodeobject.c.
There was a problem hiding this comment.
Ohh, it's discussed. I know this may avoid unnecessary reallocation. But honestly I doubt how useful it could be.
| *p++ = (char) ch; | ||
| } | ||
| /* U+0000-U+00ff range: Map 16-bit characters to '\uHHHH' */ | ||
| /* U+0100-U+ffff range: Map 16-bit characters to '\uHHHH' */ |
| } | ||
|
|
||
| if (PyUnicode_Check(path)) { | ||
| if (PyUnicode_READY(path) == -1) { |
Objects/unicodeobject.c
Outdated
There was a problem hiding this comment.
This is even more important. Since WRITE_CHAR doesn't check the size of the output buffer, we need to allocate the space for writer.min_length = end - s + writer.pos characters past the last written character.
vstinner
left a comment
There was a problem hiding this comment.
You removed code to update writer.min_length and to call _PyUnicodeWriter_Prepare(). I don't think that this change is correct. I wrote this code long time ago, and I don't recall the rationale, and the code was carefully written for best performances, and also for correctness. If the buffer is too small, you create a buffer overflow...
|
I admit updating writer.min_length has its effect. It's something I am gonna restore. But I still don't understand why buffer overflow could happen. I'll wait for Serhiy's comment as an answer. |
request another round
|
I dismissed your reviews to request another round. No offense. |
|
@serhiy-storchaka , does this look correct now? Or I still mix things up? |
|
Created separate #5636 for incorrect use of _PyUnicodeWriter_Prepare(). After merging that PR and merging this PR with master the rest of it LGTM. |
|
@zhangyangyu: Please replace |
|
Thanks @serhiy-storchaka ! I was going to rebase it after getting home. :-) |


https://bugs.python.org/issue29803