gh-130567: Remove optimistic allocation in locale.strxfrm() #137143
encukou merged 3 commits into python:main
On modern systems, the result of wcsxfrm() is much longer than the input string (from 4+2*n on Windows to 4+5*n on Linux for simple ASCII strings), so optimistically allocating a buffer the same size as the input never works.
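The alternative to guessing is to ask wcsxfrm() for the required length first. The C standard allows the destination to be a null pointer when the size argument is zero, in which case the function only returns the length the transformed string needs. Below is a minimal sketch of that pattern, not the code merged in this PR: `xfrm_alloc` is a hypothetical name, and errors are detected with the POSIX convention of clearing errno before the call and checking it afterwards (wcsxfrm() reserves no return value for errors; a failure such as EINVAL shows up only in errno).

```c
#include <errno.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper (not CPython's code): transform s with wcsxfrm()
   by querying the required size first, instead of optimistically
   allocating a buffer the size of the input.
   Returns a malloc'd buffer the caller must free, or NULL on error. */
wchar_t *xfrm_alloc(const wchar_t *s)
{
    /* With n == 0, wcsxfrm() writes nothing and returns the length of
       the transformed string, excluding the terminating null. */
    errno = 0;
    size_t needed = wcsxfrm(NULL, s, 0);
    if (errno != 0) {
        return NULL;               /* e.g. EINVAL for unsupported input */
    }

    wchar_t *buf = malloc((needed + 1) * sizeof(wchar_t));
    if (buf == NULL) {
        return NULL;
    }

    /* Second call: the buffer is now exactly large enough. */
    errno = 0;
    size_t written = wcsxfrm(buf, s, needed + 1);
    if (errno != 0 || written > needed) {
        /* Defensive: an error, or a result that unexpectedly grew. */
        free(buf);
        return NULL;
    }
    return buf;
}
```

Every string costs two wcsxfrm() calls with this approach. Since the single-call optimistic path almost never succeeds outside the "C" locale, the optimistic path usually costs two calls anyway, plus a discarded allocation.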
Modules/_localemodule.c

    @@ -409,33 +409,23 @@ _locale_strxfrm_impl(PyObject *module, PyObject *str)
     }

     /* assume no change in size, first */
The comment should be updated to match the changed code.

If this is a bug fix, it needs a NEWS entry. If the bug will be fixed in another way, this is just cleanup and a minor optimization, not worth a NEWS entry.
encukou left a comment:
This does not fix the bug; macOS raises EINVAL in wcsxfrm(NULL, s, 0) on the Czech and Chinese strings.
So it's just cleanup and a minor optimization.
Actually, optimistic allocation works if the locale is unset or set to "C". But why would you use …
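For contrast, the optimistic strategy described by the old `/* assume no change in size, first */` comment can be sketched as below. This is a hypothetical reconstruction for illustration, not CPython's removed code; `xfrm_optimistic` and its `retried` flag are made-up names. It succeeds on the first call only when the transformed string is no longer than the input. In the "C" locale, common C libraries such as glibc and musl copy the string unchanged, so no retry happens there, but the C standard does not guarantee this.

```c
#include <errno.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical sketch of the optimistic strategy: try a buffer the size
   of the input first; if the transformed string does not fit, grow the
   buffer to the reported size and call wcsxfrm() again.
   *retried is set to 1 if the second call was needed.
   Returns a malloc'd buffer the caller must free, or NULL on error. */
wchar_t *xfrm_optimistic(const wchar_t *s, int *retried)
{
    size_t size = wcslen(s) + 1;       /* "assume no change in size" */
    wchar_t *buf = malloc(size * sizeof(wchar_t));
    if (buf == NULL) {
        return NULL;
    }
    *retried = 0;

    errno = 0;
    size_t needed = wcsxfrm(buf, s, size);
    if (errno == 0 && needed >= size) {
        /* Did not fit: the buffer contents are indeterminate, so
           allocate the exact size and transform again. */
        *retried = 1;
        wchar_t *bigger = realloc(buf, (needed + 1) * sizeof(wchar_t));
        if (bigger == NULL) {
            free(buf);
            return NULL;
        }
        buf = bigger;
        errno = 0;
        wcsxfrm(buf, s, needed + 1);
    }
    if (errno != 0) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

In a locale that expands the string (4+2*n or 4+5*n characters, per the figures above), the first call always fails to fit, so this path pays for a wasted allocation and an extra wcsxfrm() call.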
This PR should fix a crash discussed in #130567 (comment), so it is a bug fix. If we are not going to backport it, we need another PR to fix the crash.
Let's backport it [edit: to 3.14.1], even if I can't reproduce the corruption on my system.
Created a simpler PR #138940 for the fix.
Do you want to update this one? |
…thonGH-137143) On modern systems, the result of wcsxfrm() is much longer than the input string (from 4+2*n on Windows to 4+5*n on Linux for simple ASCII strings), so optimistically allocating a buffer the same size as the input never works. The exception is when the locale is "C" (or unset), but in that case the `wcsxfrm` call should be fast (and calling `locale.strxfrm()` doesn't make much sense in the first place).