X Tutup
The Wayback Machine - https://web.archive.org/web/20210202091940/https://github.com/python/peps/issues/1004
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEP 551 discussion #1004

Open
tiran opened this issue Apr 18, 2019 · 14 comments
Open

PEP 551 discussion #1004

tiran opened this issue Apr 18, 2019 · 14 comments
Assignees

Comments

@tiran
Copy link
Member

@tiran tiran commented Apr 18, 2019

@zooba, I'm using this issue as a to be discussed list for PEP 551. Some are just crazy ideas that I wanted to dump into text form before I forget them again.

  • prevent piping code from stdin like curl https://example.com/script.py | spython
  • use -I (isolated mode), which also sets -E, and -s
  • explain why importing from pyc is not supported
  • optionally have further restrictions for open_for_import like O_MAYEXEC and W^X (Python file must not be owned or writeable by current effective uid/gid unless it's root).
  • optionally allow passing arguments to scripts. On Linux it's common to have command line utilities in Python. There must be a way to mark a script for arguments, perhaps # spython: arguments on the second line of the script? Or only allow arguments from scripts in well-known directories like /bin:/sbin:...?
  • optionally prevent interactive consoles (may have to block pdb, cmd, code)
  • discuss if spython should block/limit ctypes
  • block exec() of imports in *.pth files?
  • force compile() + exec() to go through a file on the file system? Maybe something that can be generated on build time and then shipped?
  • optionally block exec() completely?
    • namedtuple uses exec to generate a __new__ function. The function generator could be implemented as tuple._make_new() or so.
    • dataclasses are using exec
    • cgi, trace, doctest, timeit, and a couple of more modules use exec. They are probably not useful in secure mode.
  • SPYTHONLOG breaks with tradition to have all Python env vars prefixed with PYTHON.
@tiran tiran assigned tiran and zooba Apr 18, 2019
@zooba
Copy link
Member

@zooba zooba commented Apr 18, 2019

Awesome! Glad to have you on board here :)

Probably the most important thing to check right now is whether the additional limits people may want to apply are possible with PEP 578 as it is, or do we need additional API (additional events can come later, they're not all meant to be in the initial PEP).

Also, this doc isn't meant to be defining spython, just providing guidance for those who will. So the places where it "should" or "shouldn't" do something really ought to be phrased as "consider this" or "avoid that".

@dutc
Copy link

@dutc dutc commented Apr 18, 2019

@tiran Thanks for your comments!

A couple of thoughts:

prevent piping code from stdin like curl https://example.com/script.py | spython

I think this is already considered in the original spec as part of the alternate entry point that removes interactive modes.

use -I (isolated mode), which also sets -E, and -s

This is a good idea, but it may do too much to hinder the real-world usability of Python with larger code bases with (internal or external) library dependencies in complex deployment environments.

There are definitely environments in which this is a good idea.

However, how much flexibility should an spython have for customisation? In practice, can overly customisable security mechanisms lead to security lapses?

explain why importing from pyc is not supported

In a code-signing environment, .py files must be signed in order to be securely-opened for evaluation. .pyc importing won't work in a code signed environment unless the code-signing mechanism is integrated into the interpreter and signing keys are available to the interpreter. The former may prove tricky to do on platforms where code-signing is tightly integrated into the OS, and the latter would not be possible in environments where access to signing keys is closely controlled.

force compile() + exec() to go through a file on the file system? Maybe something that can be generated on build time and then shipped?

I like this as an idea. Designate some file system location where any dynamically-evaluated code must reside for later auditing.

Since these mechanisms are used for dynamically generated code, and it may not be possible to sign this code, if you're enforcing code-signing, then I think this idea would only generate auditing artefacts. That said, the complexity of finding a reliable place to write these files may also prove difficult. I think that the audit logging mechanism may be a better approach—generate an audit log message on every exec() and let the logging subsystem handle persisting this information for later investigation.

optionally allow passing arguments to scripts. On Linux it's common to have command line utilities in Python. There must be a way to mark a script for arguments, perhaps # spython: arguments on the second line of the script? Or only allow arguments from scripts in well-known directories like /bin:/sbin:...?

This is an interesting technique. It's definitely possible to create security holes using argparse (parser.add_argument('x', type=eval)), but that kind of issue may be out of scope for the interpreter.

What kind of attacks do you see this addressing that wouldn't be handled by the script itself?

optionally prevent interactive consoles (may have to block pdb, cmd, code)

I believe this is covered with a two-pronged approach:

  1. at the deployment level, make it easier to selectively remove certain modules from deployment and code signing
  2. at the security bootstrap code, block import of these modules using known mechanisms (e.g., an ImportHook or setting sys.modules[mod] = imp.new_module(mod), etc.)
@dutc
Copy link

@dutc dutc commented Apr 18, 2019

optionally block exec() completely?

An interesting mitigation to consider is whitelisted code-paths. Certain potentially "unsafe" behavior occurs in the standard library as you note. It might be possible to identify these fixed code paths and restrict certain "unsafe" operations only if they occur within that code path. This may have a performance penalty, but there may be techniques to significantly minimize that penalty.

@tiran
Copy link
Member Author

@tiran tiran commented Apr 18, 2019

Awesome! Glad to have you on board here :)

Probably the most important thing to check right now is whether the additional limits people may want to apply are possible with PEP 578 as it is, or do we need additional API (additional events can come later, they're not all meant to be in the initial PEP).

To be honest, I completely forgot about PEP 551 and was confused when you started to talk about three PEPs...

Also, this doc isn't meant to be defining spython, just providing guidance for those who will. So the places where it "should" or "shouldn't" do something really ought to be phrased as "consider this" or "avoid that".

Yeah, I don't mind the exact wording.

@tiran
Copy link
Member Author

@tiran tiran commented Apr 18, 2019

prevent piping code from stdin like curl example.com/script.py | spython

I think this is already considered in the original spec as part of the alternate entry point that removes interactive modes.

OK, it wasn't obvious to me ,because I don't consider piping a script from stdin as interactive. It's probably my Unix mentality. :) May I suggest to extend the wording to make it obvious for everybody?

use -I (isolated mode), which also sets -E, and -s

This is a good idea, but it may do too much to hinder the real-world usability of Python with larger code bases with (internal or external) library dependencies in complex deployment environments.

There are definitely environments in which this is a good idea.

However, how much flexibility should an spython have for customisation? In practice, can overly customisable security mechanisms lead to security lapses?

https://docs.python.org/3/using/cmdline.html?highlight=isolated

-I implies -E and -s. Additionally it removes the current working directory / script directory from path. I would argue that spython is only ever useful with properly packages software delivered in a virtual env or Python installation with restricted write permission.

$ python3 -I -c "import sys; print(sys.path)"
['/usr/lib64/python37.zip', '/usr/lib64/python3.7', '/usr/lib64/python3.7/lib-dynload', '/usr/lib64/python3.7/site-packages', '/usr/lib/python3.7/site-packages']
$ python3 -Es -c "import sys; print(sys.path)"
['', '/usr/lib64/python37.zip', '/usr/lib64/python3.7', '/usr/lib64/python3.7/lib-dynload', '/usr/lib64/python3.7/site-packages', '/usr/lib/python3.7/site-packages']

Could you give me an example why you consider the local directory useful?

explain why importing from pyc is not supported

In a code-signing environment, .py files must be signed in order to be securely-opened for evaluation. .pyc importing won't work in a code signed environment unless the code-signing mechanism is integrated into the interpreter and signing keys are available to the interpreter. The former may prove tricky to do on platforms where code-signing is tightly integrated into the OS, and the latter would not be possible in environments where access to signing keys is closely controlled.

It's certainly possible to extend the PYC format to allow a user-specific section after the header. Also it would be possible to store the code signing information in an NTFS ADS or Linux xattr. For example IMA Appraisal uses an xattr.

force compile() + exec() to go through a file on the file system? Maybe something that can be generated on build time and then shipped?

I like this as an idea. Designate some file system location where any dynamically-evaluated code must reside for later auditing.

Since these mechanisms are used for dynamically generated code, and it may not be possible to sign this code, if you're enforcing code-signing, then I think this idea would only generate auditing artefacts. That said, the complexity of finding a reliable place to write these files may also prove difficult. I think that the audit logging mechanism may be a better approach—generate an audit log message on every exec() and let the logging subsystem handle persisting this information for later investigation.

Let me think about this for a bit.

optionally allow passing arguments to scripts. On Linux it's common to have command line utilities in Python. There must be a way to mark a script for arguments, perhaps # spython: arguments on the second line of the script? Or only allow arguments from scripts in well-known directories like /bin:/sbin:...?

This is an interesting technique. It's definitely possible to create security holes using argparse (parser.add_argument('x', type=eval)), but that kind of issue may be out of scope for the interpreter.

What kind of attacks do you see this addressing that wouldn't be handled by the script itself?

I'm referring to "The spython entry point requires a script file be passed as the first argument, and does not allow any options to precede it.". It sounds to me that argument passing is not allowed with spython. However this would make spython unusable for command line scripts. If we want to harden Linux, then we need a way to only have spython on board.

optionally prevent interactive consoles (may have to block pdb, cmd, code)

I believe this is covered with a two-pronged approach:

1. at the deployment level, make it easier to selectively remove certain modules from deployment and code signing

2. at the security bootstrap code, block import of these modules using known mechanisms (e.g., an ImportHook or setting `sys.modules[mod] = imp.new_module(mod)`, etc.)

SGTM!

@tiran
Copy link
Member Author

@tiran tiran commented Apr 18, 2019

optionally block exec() completely?

An interesting mitigation to consider is whitelisted code-paths. Certain potentially "unsafe" behavior occurs in the standard library as you note. It might be possible to identify these fixed code paths and restrict certain "unsafe" operations only if they occur within that code path. This may have a performance penalty, but there may be techniques to significantly minimize that penalty.

I was thinking more about replacing namedtuple's exec with a specialized helper in C.

WARNING experimental hack with a bunch of reference leaks:

/*[clinic input]
@classmethod
tuple._mknew

     typename: object(subclass_of="&PyUnicode_Type")
     arg_list: object(subclass_of="&PyTuple_Type")
     /

Return number of occurrences of value.
[clinic start generated code]*/

static PyObject *
tuple__mknew_impl(PyTypeObject *type, PyObject *typename, PyObject *arg_list)
/*[clinic end generated code: output=9e46cb613dd3324c input=1dbb37e58b67fe19]*/
{
    PyObject *result = NULL;
    PyObject *seq = NULL;
    PyObject **fitem;
    PyObject *comma = NULL;
    PyObject *arg_str = NULL;
    PyObject *code_str = NULL;
    const char *str;
    PyObject *ns_dict = NULL;
    Py_ssize_t i, size;
    PyCompilerFlags cf;
    PyObject *tuple_new;

    cf.cf_flags = PyCF_SOURCE_IS_UTF8;
    cf.cf_feature_version = PY_MINOR_VERSION;

    if (!PyUnicode_IsIdentifier(typename)) {
        PyErr_SetString(PyExc_TypeError, "typename is not a valid identifier");
        goto end;
    }
    seq = PySequence_Fast(arg_list, "arg_list must be a tuple");
    if (seq == NULL) {
        goto end;
    }
    size = PySequence_Fast_GET_SIZE(seq);
    if (size == 0) {
        PyErr_SetString(PyExc_ValueError, "arg_list is empty.");
        goto end;
    }
    fitem = PySequence_Fast_ITEMS(seq);
    for (i = 0; i < size; i++) {
        if (!PyUnicode_IsIdentifier(fitem[i])) {
            PyErr_SetString(PyExc_TypeError, "arg_list contains an invalid identifier");
            goto end;
        }
    }
    Py_CLEAR(seq);

    comma = PyUnicode_FromString(", ");
    if (comma == NULL) {
        goto end;
    }
    arg_str = PyUnicode_Join(comma, arg_list);
    Py_CLEAR(comma);
    if (arg_str == NULL) {
        goto end;
    }
    code_str = PyUnicode_FromFormat(
        "def __new__(_cls, %U): return _tuple_new(_cls, (%U))",
        arg_str, arg_str
    );
    if (code_str == NULL)
        goto end;
    ns_dict = Py_BuildValue(
        "{sOsO}",
         "__name__", typename,
         "_tuple_new", PyObject_GetAttrString((PyObject*)&PyTuple_Type, "__new__")
         // "__builtins__", PyEval_GetBuiltins()
    );
    if (ns_dict == NULL)
        goto end;
    str = PyUnicode_AsUTF8(code_str);
    result = PyRun_StringFlags(str, Py_file_input, ns_dict, ns_dict, &cf);
    result = PyDict_GetItemString(ns_dict, "__new__");

  end:
    Py_XDECREF(seq);
    Py_XDECREF(comma);
    Py_XDECREF(code_str);
    Py_XDECREF(ns_dict);
    return result;
}
@zooba
Copy link
Member

@zooba zooba commented Apr 18, 2019

This is where tracing exec is so much more preferable to trying to figure out whether to block it or not. Once you start reviewing logs you'll see the namedtuple calls stick out and can easily mark them as uninteresting, while still treating other exec's as anomalies. And then if someone comes up with a clever exploit via namedtuple you haven't lost the traced code.

Similarly for ignoring .pyc files - yes, you can sign them, but you can't get a transcript of the code that was actually executed. When you hit the point where you discover you've been breached and are trying to figure out who/why/how, having access to their source code is much more useful. This is one of the biggest things that Powershell added for the same reasons.

@zooba
Copy link
Member

@zooba zooba commented Apr 18, 2019

I'm referring to "The spython entry point requires a script file be passed as the first argument, and does not allow any options to precede it.". It sounds to me that argument passing is not allowed with spython. However this would make spython unusable for command line scripts. If we want to harden Linux, then we need a way to only have spython on board.

"Precede" means you can't pass interpreter arguments (e.g. no -O, -X ..., etc.). Passing script arguments after the script name is totally fine.

@zooba
Copy link
Member

@zooba zooba commented Apr 18, 2019

Also, this doc isn't meant to be defining spython, just providing guidance for those who will. So the places where it "should" or "shouldn't" do something really ought to be phrased as "consider this" or "avoid that".

Yeah, I don't mind the exact wording.

It's a little bit more than just the wording, it's the whole intent of the PEP :)

We don't have to design something that will work for any given scenarios here. All we're doing is outlining recommendations and possibly specific details to help people make their own decisions and implementations. So spython is just a convenient codename for "your own locked-down implementation of Python" rather than a real thing.

We can provide sample code (examples) but are not trying to provide a ready-to-use implementation. (Maybe each of us personally will end up doing one for our respective employers anyway? But that's wearing the MSFT/RH hats rather than our python-dev hats, and this PEP is wearing our python-dev hats.)

@dutc
Copy link

@dutc dutc commented Apr 19, 2019

It's certainly possible to extend the PYC format to allow a user-specific section after the header. Also it would be possible to store the code signing information in an NTFS ADS or Linux xattr. For example IMA Appraisal uses an xattr.

Similarly for ignoring .pyc files - yes, you can sign them, but you can't get a transcript of the code that was actually executed. When you hit the point where you discover you've been breached and are trying to figure out who/why/how, having access to their source code is much more useful. This is one of the biggest things that Powershell added for the same reasons.

Critically, I believe that if you want to be able to sign .pyc files, you will need to have signing keys available to the interpreter. Most likely, you would have to pre-generate and pre-sign .pyc files as part of your build/deploy system. In practice, however, it seems easier to just ignore .pyc files with Py_DontWriteBytecodeFlag in the entry point.

@dutc
Copy link

@dutc dutc commented Apr 19, 2019

This is where tracing exec is so much more preferable to trying to figure out whether to block it or not. Once you start reviewing logs you'll see the namedtuple calls stick out and can easily mark them as uninteresting, while still treating other exec's as anomalies. And then if someone comes up with a clever exploit via namedtuple you haven't lost the traced code.

I was thinking more about replacing namedtuple's exec with a specialized helper in C.

We could definitely look into replacing exec throughout the standard library, but it would have to be done upstream in CPython, rather than something maintained in SPython.

I suspect most developers of Python code that will run on an SPython will be developing and testing against CPython, and I suspect that the secure environments in which SPython runs are likely to also be "high-reliability" environments. Many of the mitigations discussed in the PEPs try to be very surgical in nature to try to avoid surface area for subtle bugs specific to SPython to creep in.

Additionally, I assume that many code bases that run on SPython may make use of common third-party libraries, so we should also evaluate how frequently exec() and eval() (or other dynamic code generation mechanisms) are used.

Another possible approach might be to find a way to move dynamic code generation to the build system. A library using pyximport or cffi would have its build system modified to precompile C code (since it would likely also need to be signed.) Similarly, I suspect that most use of namedtuple occurs at module-scope and is not dependent on runtime information. In this case, the old namedtuple(..., verbose=True) approach could be used to eliminate these calls.

@dutc
Copy link

@dutc dutc commented Apr 20, 2019

prevent piping code from stdin like curl example.com/script.py | spython

I think this is already considered in the original spec as part of the alternate entry point that removes interactive modes.

OK, it wasn't obvious to me ,because I don't consider piping a script from stdin as interactive. It's probably my Unix mentality. :) May I suggest to extend the wording to make it obvious for everybody?

There is a related Linux-specific mitigation that has not been considered: shell redirection techniques like python <( echo ). This should be blocked, just like echo | python is blocked by (carefully) validating that the script file providing is an actual file on disk and not a named pipe or other construct.

This logic (plus any code signing mechanisms) would probably be included in a (secure_)open_for_execution function, which I think should be used in upstream CPython in place of raw open(3)/fopen(3).

@tiran
Copy link
Member Author

@tiran tiran commented Apr 23, 2019

There is a related Linux-specific mitigation that has not been considered: shell redirection techniques like python <( echo ). This should be blocked, just like echo | python is blocked by (carefully) validating that the script file providing is an actual file on disk and not a named pipe or other construct.

This logic (plus any code signing mechanisms) would probably be included in a (secure_)open_for_execution function, which I think should be used in upstream CPython in place of raw open(3)/fopen(3).

Excellent point! O_MAYEXEC or other checks for regular file will prevent shell redirects, because it's not a regular file. A shell redirect like python <( echo ) is implemented as a pipe:

  • redirected file looks like /dev/fd/63
  • /dev/fd is a symlink to /proc/self/fd
  • /proc/self/fd/63 is a pipe
@tiran
Copy link
Member Author

@tiran tiran commented Apr 23, 2019

Critically, I believe that if you want to be able to sign .pyc files, you will need to have signing keys available to the interpreter. Most likely, you would have to pre-generate and pre-sign .pyc files as part of your build/deploy system. In practice, however, it seems easier to just ignore .pyc files with Py_DontWriteBytecodeFlag in the entry point.

That's correct. The build system needs access to the private key in order to sign files. It's easily possible to build, sign, and ship .pyc files together .py files. On RPM systems (Centos, Fedora, RHEL) .pyc files are created on package built time and shipped in RPM package. That makes it possible to verify all code files and detect files that have been tampered with.

(There are also other models where the system is booted in signature generation mode and then switched into another policy that makes the signatures read-only (simply speaking).)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants
X Tutup