X Tutup
The Wayback Machine - https://web.archive.org/web/20220927165840/https://github.com/python/cpython/issues/73974
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security] CVE-2007-4559: tarfile: Add absolute_path option to tarfile, disabled by default #73974

Open
vstinner opened this issue Mar 10, 2017 · 11 comments
Labels
3.11 stdlib Python modules in the Lib dir type-security A security issue

Comments

@vstinner
Copy link
Member

vstinner commented Mar 10, 2017

BPO 29788
Nosy @gustaebel, @vstinner, @jwilk, @berkerpeksag, @vadmium

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2017-03-10.16:13:44.639>
labels = ['type-security', 'library', '3.11']
title = '[Security] tarfile: Add absolute_path option to tarfile, disabled by default'
updated_at = <Date 2021-05-21.23:25:06.720>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2021-05-21.23:25:06.720>
actor = 'ned.deily'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2017-03-10.16:13:44.639>
creator = 'vstinner'
dependencies = []
files = []
hgrepos = []
issue_num = 29788
keywords = []
message_count = 2.0
messages = ['289388', '289437']
nosy_count = 5.0
nosy_names = ['lars.gustaebel', 'vstinner', 'jwilk', 'berker.peksag', 'martin.panter']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'security'
url = 'https://bugs.python.org/issue29788'
versions = ['Python 3.11']

@vstinner
Copy link
Member Author

vstinner commented Mar 10, 2017

I noticed that "python3 -m tarfile -x archive.tar" uses absolute paths by default, whereas the UNIX tar command doesn't by default. The UNIX tar command requires to add explicitly --absolute-paths (-P) option.

I suggest to add a boolean absolute_path option to tarfile, disabled by default.

Example to create such archive. See that tar also removes "/" by default and requires to pass explicitly -P:

$ cd $HOME
# /home/haypo
$ echo TEST > test
$ tar -cf test.tar /home/haypo/test
tar: Removing leading `/' from member names

$ rm -f test.tar
$ tar -P -cf test.tar /home/haypo/test
$ rm -f test

Extracting such archive using tar is safe *by default*:

$ mkdir z
$ cd z
$ tar -xf ~/test.tar
tar: Removing leading `/' from member names
$ find
.
./home
./home/haypo
./home/haypo/test

Extracting such archive using Python is unsafe:

$ python3 -m tarfile -e ~/test.tar
$ cat ~/test
TEST
$ pwd
/home/haypo/z

Python creates files outside the current directory which is unsafe, wheras tar doesn't.

@vstinner vstinner added 3.7 type-security A security issue stdlib Python modules in the Lib dir labels Mar 10, 2017
@vstinner vstinner changed the title Add absolute_path option to tarfile, disabled by default tarfile: Add absolute_path option to tarfile, disabled by default Mar 10, 2017
@vadmium
Copy link
Member

vadmium commented Mar 11, 2017

The CLI was added in bpo-13477. I didn’t see any discussion of traversal attacks there, so maybe it was overlooked. Perhaps there should also be a warning, like with the Tarfile.extract and “extractall” methods.

However I did see one of the goals was to keep the CLI simple, which I agree with. I would suggest that the CLI get this proposed behaviour by default (matching the default behaviour of modern “tar” commands), with no option to restore the current less-robust behaviour.

To implement it, I suggest to fix the module internals first: bpo-21109 and/or bpo-17102.

FWIW BSD calls the option “--absolute-paths” (plural paths) <https://www.freebsd.org/cgi/man.cgi?tar%281%29#OPTIONS\>, while Gnu calls it “--absolute-names” <https://www.gnu.org/software/tar/manual/html_chapter/tar_6.html#SEC121\>. Both these options disable other checks, such as for parent directories (..) and external symbolic link targets, so I think the term “absolute” is too specific. But please use at least replace the underscore with a dash or hyphen: “--absolute-path”, not “--absolute_path”.

@vstinner vstinner changed the title tarfile: Add absolute_path option to tarfile, disabled by default [Security] tarfile: Add absolute_path option to tarfile, disabled by default May 29, 2018
@ned-deily ned-deily added 3.11 and removed 3.7 labels May 21, 2021
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@vstinner vstinner changed the title [Security] tarfile: Add absolute_path option to tarfile, disabled by default [Security] CVE-2007-4559: tarfile: Add absolute_path option to tarfile, disabled by default Sep 23, 2022
@jowagner
Copy link

jowagner commented Sep 23, 2022

Questions:

  1. Can we use https://bugs.python.org/file8339/insecure_pathnames.diff (found in issue tarfile insecure pathname extraction #45385) as a basis for a PR? How do we give credit if @gustaebel is not the contributor but just another user with the same surname?
  2. Can we use the abspath + commonprefix approach from https://gist.github.com/Kasimir123/9bdc38f2eac9a19f83ef6cc52c7ce82e?
  3. Changing behaviour can break existing software, e.g. restoring a backup that uses absolute paths. How much notice should be given in this case? On the one hand, the issue should be fixed soon as it impacts security. On the other hand, what are a few years more after 15 years? What is the process for ensuring the defaults are changed for a specific target version or year?
  4. absolute-path is not a good name if we also test for .. path components.
  5. In the safe mode, should we (a) bail out as soon as an absolute path is encountered or (b) only if the absolute path brings us outside the hierarchy of the target directory (the current working directory by default), or (c) only if the effective target directory is not '/', or (d) only if '/' is not explicitly specified as the target directory (parameter path of the extract functions)? Consider '/etc/passwd' in a tar file. Should this file allowed to be written in safe mode when inside '/etc'? Should it be ok if running from the top-level '/' root folder? Should it only work if path='/' is set when calling an extract function of the tarfile module? The latter is something we could ask developers of restore utilities to do to continue being able to restore backups produced with absolute paths (and this would also be compatible with future backups stripping the leading /).
  6. The patch in issue tarfile insecure pathname extraction #45385 suggests a parameter check_path. While this naming does not make explicit what is being checked, it can cover all future security issues arising from tarfile member names. Or would it be better to have a separate parameter for each check?
  7. Should the idea to allow the user to provide a function with check_path that returns True/False for whether to skip the entry (if an exception is desired the function can raise it - no need to define a return value for that) or even allows to change the path (returning a string) be implemented in the same PR or should this be a separate PR?
  8. Could a specially crafted tar file cause the tarfile module to create a symlink to a directory outside the target tree, e.g. example --> / and then contain a member example/etc/passwd that overwrites /etc/passwd? To protect against this, one could call realpath() but this would also evaluate pre-existing symlinks. Users may expect the old behaviour e.g. when receiving updates via tar files for multiple folders and one of the folders has been moved to a different filesystem on the receiving system. To satisfy the expectation of these users and being safe, we need a version of realpath() that does not consult the actual filesystem but only the folders and symlinks found in the tar file so far. [Edit: Yes, see https://github.com/tarfile: Traversal attack vulnerability #65308#issuecomment-1093650963]

@alittlesir
Copy link

alittlesir commented Sep 24, 2022

Which versions are affected by this CVE?

@nanonyme
Copy link

nanonyme commented Sep 24, 2022

To be honest, whenever path is non-empty, I would expect it to be regarded "safe" to the extent that any final path must be contained inside the path user gives. So essentially passing relative paths would be fine as long as it does not escape the directory given by the user of the API. So eg I give path "/", then tarfile is allowed to unpack anywhere. I give path "", then there is no checking whatsoever (for backwards compat reasons). When I give path "something-else", then trying to unpack outside "something-else" raises an exception.

@nanonyme
Copy link

nanonyme commented Sep 24, 2022

Another possibility would be raising audit event for the path names so you can write an audit hook that prevents unpacking unsafe paths. This implies no API changes.

@nanonyme
Copy link

nanonyme commented Sep 24, 2022

@alittlesir everything everywhere. Documentation says you must check paths to be safe before unpacking.

@rooterkyberian
Copy link

rooterkyberian commented Sep 26, 2022

As a commercial python developer I would greatly appreciate this to become part of CPython.

So while "technically correct" python implementation was deemed "non-security issue" (#45385), the fact that it resoluted in thousands of vulnerable projects seems to indicate that in retrospect, it is an issue. Despite documentation warning. Like great, we have documentation which says to watch out, but here we are. People are not reading it. And all that for "being technically correct" and in support of a feature that hardly every anyone wants or needs.

Alternative non-breaking (and I really wonder which projects would be broken here) would be to have a subclass that does the protecting by default, in similar fashion as tarsafe project (https://github.com/beatsbears/tarsafe , but with different implementation, as the one there seems quite inefficient).

@nanonyme
Copy link

nanonyme commented Sep 26, 2022

I would prefer secure by default. This is implicitly used by various API's like shutil.unpack_archive which does not check TarInfo items are safe paths. Even standard library is not following tarfile module documentation.

@gustaebel
Copy link
Contributor

gustaebel commented Sep 26, 2022

I wrote a comment back in 2014 on the various security aspects of tarfile: #65308 (comment)
I think it is still worth a read.

@nanonyme
Copy link

nanonyme commented Sep 26, 2022

@gustaebel yes. I was mostly talking about other higher level tools in standard library that build on top of tarfile not being remotely secure either. I suggested adding audit event for unpacked TarInfo, that would easily allow gluing security on top without dropping all these nice things standard library has gathered. As said, shutil.unpack_archive will automatically call into insecure code in tarfile behind your back after sniffing that it needs tarfile based on file extension. This is not great design.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 stdlib Python modules in the Lib dir type-security A security issue
Projects
Status: No status
Development

No branches or pull requests

8 participants
X Tutup