New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Security] CVE-2007-4559: tarfile: Add absolute_path option to tarfile, disabled by default #73974
Comments
|
I noticed that "python3 -m tarfile -x archive.tar" uses absolute paths by default, whereas the UNIX tar command doesn't by default. The UNIX tar command requires to add explicitly --absolute-paths (-P) option. I suggest to add a boolean absolute_path option to tarfile, disabled by default. Example to create such archive. See that tar also removes "/" by default and requires to pass explicitly -P: $ cd $HOME
# /home/haypo
$ echo TEST > test
$ tar -cf test.tar /home/haypo/test
tar: Removing leading `/' from member names
$ rm -f test.tar
$ tar -P -cf test.tar /home/haypo/test
$ rm -f testExtracting such archive using tar is safe *by default*: $ mkdir z
$ cd z
$ tar -xf ~/test.tar
tar: Removing leading `/' from member names
$ find
.
./home
./home/haypo
./home/haypo/testExtracting such archive using Python is unsafe: $ python3 -m tarfile -e ~/test.tar
$ cat ~/test
TEST
$ pwd
/home/haypo/zPython creates files outside the current directory which is unsafe, wheras tar doesn't. |
|
The CLI was added in bpo-13477. I didn’t see any discussion of traversal attacks there, so maybe it was overlooked. Perhaps there should also be a warning, like with the Tarfile.extract and “extractall” methods. However I did see one of the goals was to keep the CLI simple, which I agree with. I would suggest that the CLI get this proposed behaviour by default (matching the default behaviour of modern “tar” commands), with no option to restore the current less-robust behaviour. To implement it, I suggest to fix the module internals first: bpo-21109 and/or bpo-17102. FWIW BSD calls the option “--absolute-paths” (plural paths) <https://www.freebsd.org/cgi/man.cgi?tar%281%29#OPTIONS\>, while Gnu calls it “--absolute-names” <https://www.gnu.org/software/tar/manual/html_chapter/tar_6.html#SEC121\>. Both these options disable other checks, such as for parent directories (..) and external symbolic link targets, so I think the term “absolute” is too specific. But please use at least replace the underscore with a dash or hyphen: “--absolute-path”, not “--absolute_path”. |
|
Questions:
|
|
Which versions are affected by this CVE? |
|
To be honest, whenever path is non-empty, I would expect it to be regarded "safe" to the extent that any final path must be contained inside the path user gives. So essentially passing relative paths would be fine as long as it does not escape the directory given by the user of the API. So eg I give path "/", then tarfile is allowed to unpack anywhere. I give path "", then there is no checking whatsoever (for backwards compat reasons). When I give path "something-else", then trying to unpack outside "something-else" raises an exception. |
|
Another possibility would be raising audit event for the path names so you can write an audit hook that prevents unpacking unsafe paths. This implies no API changes. |
|
@alittlesir everything everywhere. Documentation says you must check paths to be safe before unpacking. |
|
As a commercial python developer I would greatly appreciate this to become part of CPython. So while "technically correct" python implementation was deemed "non-security issue" (#45385), the fact that it resoluted in thousands of vulnerable projects seems to indicate that in retrospect, it is an issue. Despite documentation warning. Like great, we have documentation which says to watch out, but here we are. People are not reading it. And all that for "being technically correct" and in support of a feature that hardly every anyone wants or needs. Alternative non-breaking (and I really wonder which projects would be broken here) would be to have a subclass that does the protecting by default, in similar fashion as tarsafe project (https://github.com/beatsbears/tarsafe , but with different implementation, as the one there seems quite inefficient). |
|
I would prefer secure by default. This is implicitly used by various API's like shutil.unpack_archive which does not check TarInfo items are safe paths. Even standard library is not following tarfile module documentation. |
|
I wrote a comment back in 2014 on the various security aspects of tarfile: #65308 (comment) |
|
@gustaebel yes. I was mostly talking about other higher level tools in standard library that build on top of tarfile not being remotely secure either. I suggested adding audit event for unpacked TarInfo, that would easily allow gluing security on top without dropping all these nice things standard library has gathered. As said, shutil.unpack_archive will automatically call into insecure code in tarfile behind your back after sniffing that it needs tarfile based on file extension. This is not great design. |
|
I'd like to point out that PyPI itself is at risk, given that PEP-458 for signed package metadata has still not been implemented a decade after it was proposed. In short, every PyPI package should be considered "untrusted". While |
|
@jorhett the funny thing there is that this issue is about tarfile when in fact zipfile has the exact same vulnerability class considered by the CVE and PyPI has now converged around wheels that are essentially ZIP packages. Back when the CVE was filed tarfile was still used more. |
|
The not-funny thing is that PyPI packing is not cryptographically verifiable, there are no published instructions for validating PyPI packages, and yet Python maintainers consider it the user's responsibility to inspect the code before using it. I do this for a living, and Python is the hardest to evaluate. It has the least verifications/validations of origin possible. So we have to build a VM and diff before/after to be certain of what it does. Yeah, all the people using Python plugins in their tools are responsible for taking this kind of effort |
|
See also issue #94531 "Document zlib, gzip, bz2 and tarfile known vulnerabilities". |
|
it seems to have a big impact. when are you going to fix this cve? @vstinner |
I don't plan to implement my proposed idea soon, but if someone does, I will review the change. Anyone is free to propose a fix! |
|
There are multiple proposals and various issues identified in comments here and in linked issues. Before implementing a solution lets first read through all comments and agree on a proposal. Where there is a trade-off between safety and breaking existing applications, the right balance needs to be found, including the possibility to schedule changes to a later release and printing warnings about future changes in the meantime. |
|
Delete the first slash (“/”) character in the absolute path to make it a relative path. Is it any risks? |
|
That's not a reliable approach. The better is to normalize paths inside extract directory and making sure they don't lead outside it (which you can see with with '..' not being in normalized path). There was a concern of breaking old API assumptions though. |
Yes if risks include breaking existing apps and/or overwriting files that should not have been overwritten. A backup/restore app that assumes all paths are absolute may not have moved to an appropriate working directory nor specified a target path when extracting its own archives. I think the behaviour for absolute path should be changed in two stages:
Edit: Behaviour (d): skip the entry. |
As far as I understand detailed comments linked above, this is also not enough as the archive can first create a symlink I think a flexible approach can be implemented by passing a function or object as an extra parameter to the extract functions of the module. This function (or a function of the object) can raise an exception, return a modified path to use or return None to skip the item silently. As paths are checked on the fly, this should also work when reading an archive from For the convenience of the user of the module, we can provide functions that implement likely wanted behaviours. For detecting attempts to write using a symlink previously created by the archive, we need to keep state between calls, hence the suggestion to support passing a validator object. The validator currently considered the safest could be made available as Edit: To make the later thread safe, a new object would have to be created at each call of an extraction function so that two calls do not share the same validator object. Edit 2: A "make the path safe" behaviour for |
|
<speaking without intimate knowledge of the module and its history, please excuse if I'm talking rubbish> I fail to see why the discussion about lost functionality needs to drag on for so long. The Linux/UNIX/POSIX tar command recognized the danger and changed its default behaviour. Isn't the solution as simple as: emulate the tar command? Just follow upstream changes? See #97663 for an example of what tar now does. |
|
@jowagner I see. Yes, I would expect symlinks to be resolved during normalisation but maybe this is not in general reasonable. There could be multiple levels of symlinks. |
Yes, the validator would have a bit of work to do to keep track of all the symlinks. There can be legitimate uses of archives containing a large number of symlinks so bailing out after an unusual high number of symlinks (millions?) would be problematic. At the same time, memory usage of the validator could grow unacceptedly for such archives. If the target path is an empty folder at the start, we can just use |
This approach was rejected in 2007, see #45385 (comment) Hence the attempt to move this forward as a feature that changes the default behaviour. I further suggest to add value beyond safety, namely allowing the module user to filter and modify paths of archive members on the fly. |
|
The existing interface ( |
|
Is this cve not had that much impact |
|
@alittlesir @jowagner Maybe I can answer that. 2007 Was a long time ago, and things change. What was perfectly acceptable in 2007, is now a CVE with a high severity score. "It's not exploitable" was true in 2007. Now there is a video of (what is deemed) an exploit: https://youtu.be/jqs8S51_ENg I'm not going to debate whether it's really an exploitable vulnerability, or just a trick to make a user do something, not elevating any privileges at all. I suggest everyone watch the video and draws their own conclusions on that. I work for a bank, BTW. In my context, we'd patch the thing that even just smells like a vulnerability quickly, and worry about extra/new functionality later. But all is well with the world. The debate can happen on here, the added features for filtering can be added in due time or not at all. But in the meantime, the CVE is registered and it is scored as a 5.6. LINK The companies which have a high level of security paranoia will have a code scanner in place which alerts on the use of the module, and they can act accordingly. |
|
@Denelvo tarfile API clearly states you must manually sanitize filenames yourself. So does zipfile. What is more problematic is shutil.unpack_archive does not expose an API to do this. So clearly it must sanitize files for you as a high-level API. It probably did not exist back in 2007. |
|
@Denelvo I rather suggest everyone take a look at the actual code in Spyder IDE that is the basis for the exploit (from this blog article from Trellix).
To quote the pickle documentation:
This is a security vulnerability in Spyder IDE. |
|
Kind of yes but also if you have code that doesn't sanitize unpacking paths, then unrelated code cannot trust any path in entire hard disk anymore for safe unpickling. And you could even simpler attack write on Linux .bashrc file into user home for arbitrary code execution afterwards. |
|
Making tarfile safe by default was never expected to solve all software problems such as unsafe use of the pickle module. Can we move on to discussing what the PR should look like? How about the following default behaviour:
At minimum, the tarfile module should provide a |
I don't think that's enough: https://gist.github.com/Kasimir123/9bdc38f2eac9a19f83ef6cc52c7ce82e?permalink_comment_id=4358347#gistcomment-4358347 |
|
Digging deeper in the GNU How does GNU |
|
Note that the symlink trick can be applied also to file symlinks; the tar file created by: will try to write passwd when extracted by (needs some more work, like a test, plus I suppose |


vstinner commentedMar 10, 2017
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: