
zipfile.write, arcname should be allowed to be a byte string #54966

Open
connexion2000 mannequin opened this issue Dec 22, 2010 · 7 comments
Labels: stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

Comments

@connexion2000
Mannequin

connexion2000 mannequin commented Dec 22, 2010

BPO 10757
Nosy @loewis, @bitdancer

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2010-12-22.12:44:03.540>
labels = ['type-bug', 'library']
title = 'zipfile.write, arcname should be allowed to be a byte string'
updated_at = <Date 2016-01-02.23:23:30.550>
user = 'https://bugs.python.org/connexion2000'

bugs.python.org fields:

activity = <Date 2016-01-02.23:23:30.550>
actor = 'Patrik Dufresne'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2010-12-22.12:44:03.540>
creator = 'connexion2000'
dependencies = []
files = []
hgrepos = []
issue_num = 10757
keywords = []
message_count = 7.0
messages = ['124499', '124518', '124519', '124641', '124686', '124690', '257385']
nosy_count = 6.0
nosy_names = ['loewis', 'aimacintyre', 'r.david.murray', 'connexion2000', 'ozialien', 'Patrik Dufresne']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'test needed'
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue10757'
versions = ['Python 3.1', 'Python 3.2']

@connexion2000
Mannequin Author

connexion2000 mannequin commented Dec 22, 2010

import zipfile

file = 'somefile.dat'
filename = "ółśąśółąś.dat"
zip = zipfile.ZipFile('archive.zip', 'w', zipfile.ZIP_DEFLATED)
zip.write(file, filename)
zip.close()

The code above produces a badly mangled filename in the zip archive.


import zipfile

file = 'somefile.dat'
filename = "ółśąśółąś.dat"
zip = zipfile.ZipFile('archive.zip', 'w', zipfile.ZIP_DEFLATED)
zip.write(file, filename.encode('cp852'))

This raises TypeError: expected an object with the buffer interface

The documentation says:
There is no official file name encoding for ZIP files. If you have unicode file names, you must convert them to byte strings in your desired encoding before passing them to write().

I converted them to byte strings, but that ends in an error.
If this is a documentation bug, what is the proper way to get filenames like "ółśąśółąś" into a zip archive?

connexion2000 mannequin added the build and stdlib labels Dec 22, 2010
@loewis
Mannequin

loewis mannequin commented Dec 22, 2010

This is not a bug. Your code that produces the "very nasty filename" is the right one: the file name is actually the one you asked for. The second snippet is also behaving correctly: filename already *is* a byte string, so calling .encode on it is meaningless.

loewis mannequin closed this as completed Dec 22, 2010
loewis mannequin added the invalid label Dec 22, 2010
@loewis
Mannequin

loewis mannequin commented Dec 22, 2010

Oops, I take this back: I didn't notice you were using Python 3.1.

In any case, your first snippet is correct. What you get is the best you can ask for.

That the second case fails is indeed a bug.

loewis mannequin reopened this Dec 22, 2010
loewis mannequin removed the invalid label Dec 22, 2010
terryjreedy added the type-bug label and removed the build label Dec 24, 2010
@bitdancer
Member

See also msg79724 of bpo-4871. From looking at the code, it appears that the filename must be a string: if it contains only ASCII characters it is stored as ASCII, while if it contains non-ASCII characters it is encoded to UTF-8 and the appropriate flag bit is set in the archive to indicate this (I know nothing about the archive format, by the way; I'm just looking at the code).
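
The ASCII-versus-UTF-8 behavior described above can be observed with a small round trip through an in-memory archive. This is a sketch against a recent Python 3, not 3.1; the helper name `name_flags` is illustrative, and 0x800 is general-purpose bit 11 (the "language encoding flag") of the ZIP format, which zipfile checks when reading names back.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # general-purpose bit 11 ("language encoding flag")

def name_flags(arcname):
    """Write one member named `arcname` to an in-memory archive, read it back,
    and return the decoded name plus whether the UTF-8 flag was set."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(arcname, b"data")
    with zipfile.ZipFile(buf) as zf:
        info = zf.infolist()[0]
        return info.filename, bool(info.flag_bits & UTF8_NAME_FLAG)

print(name_flags("plain.dat"))      # ASCII-only name: flag not set
print(name_flags("ółśąśółąś.dat"))  # non-ASCII name: stored as UTF-8, flag set
```

Reading the archive back rather than inspecting the in-memory ZipInfo matters here, because zipfile computes the flag when it serializes the header.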

So, in reverse of bpo-4871, it appears that in this case the API should reject bytes input with an appropriate error message.

@loewis
Mannequin

loewis mannequin commented Dec 26, 2010

> So, in reverse of bpo-4871, it appears that in this case the API should reject bytes input with an appropriate error message.

-1. It is quite common to produce ill-formed zipfiles, and other
zip tools interpret them in violation of the format spec.
Python needs to support creation of such broken zipfiles,
even though it may not be able to read them back.

@bitdancer
Member

Well, this is the same treat-strings-and-byte-strings-equivalently-in-the-same-API problem that we've had elsewhere. It'll require a bit of refactoring to make it work.

On read, zipfile decodes filenames using cp437 if the UTF-8 flag isn't set. Logically, then, a byte-string filename should be treated as cp437-encoded. Since cp437 has a character corresponding to each of the 256 bytes, it seems to me it should be enough to decode a binary filename using cp437 and set a flag that _encodeFilenameFlags would respect, re-encoding to cp437 instead of UTF-8. That might produce unexpected results if someone passes in a binary filename encoded in some other character set, but it would be consistent with how zipfiles work and so should be at least as interoperable as zipfiles normally are.
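
The cp437 proposal can be approximated in user code as a sketch. Assumptions: `RawNameZipInfo` is a hypothetical helper, not part of zipfile; it overrides `_encodeFilenameFlags`, the private, undocumented CPython method named in the comment above, which may change between versions; and the raw name contains no NUL byte (zipfile truncates names there) and, on Windows, no backslash (zipfile rewrites os.sep to "/").

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # general-purpose bit 11 ("language encoding flag")

# cp437 assigns a character to every byte value, so decode/encode round-trips.
assert bytes(range(256)).decode("cp437").encode("cp437") == bytes(range(256))

class RawNameZipInfo(zipfile.ZipInfo):
    """Hypothetical helper: store a bytes arcname verbatim, without the UTF-8 flag.

    Relies on overriding the private ZipInfo._encodeFilenameFlags(), a CPython
    implementation detail that may break in other versions.
    """

    def __init__(self, raw_name: bytes, date_time=(1980, 1, 1, 0, 0, 0)):
        # zipfile needs a str filename; decode with cp437 so it matches what
        # zipfile itself uses when reading names whose UTF-8 flag is clear.
        super().__init__(raw_name.decode("cp437"), date_time)

    def _encodeFilenameFlags(self):
        # Re-encode to the original bytes and make sure the UTF-8 flag is clear.
        return self.filename.encode("cp437"), self.flag_bits & ~UTF8_NAME_FLAG

raw = "ółśąś.dat".encode("cp852")  # the encoding used in the original report
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(RawNameZipInfo(raw), b"data")

with zipfile.ZipFile(buf) as zf:
    info = zf.infolist()[0]
    print(info.filename.encode("cp437") == raw)  # original name bytes preserved
    print(zf.read(info))
```

As noted above, a tool that assumes cp437 will display such a name differently from one that assumes cp852; the bytes on disk are exactly what the caller supplied.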

bitdancer changed the title from "zipfile.write, arcname should be bytestring" to "zipfile.write, arcname should be allowed to be a byte string" Dec 27, 2010
@PatrikDufresne
Mannequin

PatrikDufresne mannequin commented Jan 2, 2016

This bug is very old; has there been any development on it? The issue is hitting me while porting my project (rdiffweb) to Python 3. It receives a lot of broken filenames with invalid encodings, and I need to create a meaningful zip archive from them. Currently it just fails, because zipfile doesn't accept arcname as bytes.
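
The failure modes for undecodable names can be reproduced with a small check. This is a sketch for recent CPython versions; `try_arcname` and the Latin-1 sample name are illustrative, and `surrogateescape` is the error handler Python 3 commonly uses to carry undecodable file-system bytes inside a str.

```python
import io
import zipfile

# Illustrative raw name: Latin-1 bytes that are not valid UTF-8.
raw_name = b"caf\xe9.dat"

def try_arcname(arcname):
    """Try to store one member under `arcname`; return the exception type, or None."""
    with zipfile.ZipFile(io.BytesIO(), "w") as zf:
        try:
            zf.writestr(arcname, b"data")
        except (TypeError, UnicodeEncodeError) as exc:
            return type(exc)
    return None

print(try_arcname(raw_name))                                     # bytes arcname rejected
print(try_arcname(raw_name.decode("utf-8", "surrogateescape")))  # lone surrogate is not valid UTF-8
print(try_arcname(raw_name.decode("utf-8", "replace")))          # accepted, but lossy
```

Only the lossy `errors="replace"` variant gets through, and it discards the original bytes, which is why the thread asks for bytes support.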
