-
-
Notifications
You must be signed in to change notification settings - Fork 29.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking 鈥淪ign up for GitHub鈥, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-119609: Add PyUnicode_Export() function #123738
base: main
Are you sure you want to change the base?
Conversation
Add PyUnicode_Export(), PyUnicode_GetBufferFormat() and PyUnicode_Import() functions to the limited C API.
|
When you're done making the requested changes, leave the comment: |
|
I have made the requested changes; please review again. |
|
Thanks for making the requested changes! @mdboom: please review the changes made to this pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A final nitpick on my side (sorry but I only skimmed through the implementation since I don't have much energy now...).
A bit off-topic, but do we use the PRI* macros in the code base? I saw that you used the %i for formatting a uint32_t value, which usually works, but I wondered whether you prefer using the platform-dependent ones.
|
A side effect of this change is to add the I had to implement |
Objects/unicodeobject.c
Outdated
| ucs2); | ||
| ucs2[len] = 0; | ||
|
|
||
| return unicode_export(unicode, view, format, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the cases where there is a conversion, does the buffer need to hold a reference to the unicode object? Shouldn't we instead be passing NULL here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to store unicode as view.obj in all cases. So PyBuffer_Release() is able to find the unicode_releasebuffer() function. Technically, you're right that in UCS2 and UCS4 cases, when we allocate a buffer, we don't need to hold a reference to unicode to make sure that data remains valid.
UCS2 can also copy the buffer.
Co-authored-by: Petr Viktorin <encukou@gmail.com>
Use signed int32_t for the format.
|
@serhiy-storchaka: I updated the PR to use _PyUnicode_EncodeUTF16() and _PyUnicode_EncodeUTF32(), and address your other comments. |
|
I had to remove the check "last character in a NUL character" in tests, since _PyUnicode_EncodeUTF16() and _PyUnicode_EncodeUTF32() don't write such last NUL character. |
That's a security vulnerability waiting to happen. Since the internal buffers do have the terminating NUL, and in most cases we expose those, people will expect the NUL even if we'd explicitly document that it's not guaranteed. IMO, we need to add it. |
@serhiy-storchaka: Sorry, I reverted the "Use _PyUnicode_EncodeUTF16() and _PyUnicode_EncodeUTF32()" change to get back the NUL trailing character. |
|
I'm not sure if we should guarantee that the exported buffer ends with a NUL character. I'm not sure that all Python implementations will be able to provide such guarantee in an efficient way (without having to allocate a temporary buffer for that). |
|
We should. As long as the API is used from C, exported strings should be NUL-terminated for safety. |
I suggest to continue this discussion at: capi-workgroup/decisions#33 (comment) |

Formed in 2009, the Archive Team (not to be confused with the archive.org Archive-It Team) is a rogue archivist collective dedicated to saving copies of rapidly dying or deleted websites for the sake of history and digital heritage. The group is 100% composed of volunteers and interested parties, and has expanded into a large amount of related projects for saving online and digital history.

Add PyUnicode_Export(), PyUnicode_GetBufferFormat() and PyUnicode_Import() functions to the limited C API.
馃摎 Documentation preview 馃摎: https://cpython-previews--123738.org.readthedocs.build/