-
-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Log formatters which use newlines to separate messages should quote newlines for security reasons #94503
Comments
|
#90358 is another injection-into-logs issue, but not quite in the same way. |
|
I get your point, though the examples you give don't appear to be log injection in the way the OWASP link suggests. I would also have expected log injection to relate to treatment of untrusted input, whereas the examples you give seem to be developer "misbehaviour" - so I'm not sure there's a genuine security issue in your examples. In terms of handling untrusted input, I'm not sure a change can be made, unless it's opt-in, due to backwards compatibility concerns. Based on Stack Overflow questions and similar I've come across in the past, I expect there's code out there that deliberately uses newlines to format log messages across multiple lines, which might break if a change is made to e.g. convert A possibility is to add a Of course, even after this change, there are still potential issues caused by simple treatment of untrusted inputs (e.g. denial of service by using large string values or random control characters in user-sourced input), so a "Security considerations" section talking about untrusted inputs is perhaps warranted in the documentation (there is already a section with this title, but it relates to What are your views on the above? I agree with your point about the undesirability of leaving this issue up to individual projects. |
|
Sorry for the long delay. Among other things, I had COVID between then and now :).
I was trying to be brief, so here's a more elaborate example. Log injection attacks are a little hard to talk generally about because their impact is not usually straightforward: sensitive behavior derived from log scraping is often oblique business analytics stuff. So let me construct a really extreme contrived scenario: our scrappy bootstrapped startup doesn't have any automated internal onboarding tooling and so we have some hacky cron job that scrapes the log looking for a special log message to give certain users root access to our systems. This at least gives us a clearly delineated consequence of an attack. Imagine we have a "backend" like this: import logging
logging.basicConfig(format='%(asctime)s %(message)s')
def sensitive_function(new_admin: str) -> None:
logging.warning("hey sysadmin you should give admin access to %s", new_admin)
def process_request(some_user_input: str) -> None:
logging.warning("totally regular thing happened, and it was %s", some_user_input)In this backend, access to But of course Note that we are carefully letting the logging system do our formatting in this example as well, so we are not even doing the typical "injection" mistake of string formatting our inputs manually. Let's see we have a series of "requests" to our system exemplified by the following function calls: This produces the following output: hopefully it's clear why this might present an issue.
Personally my view is that a change must be made; this is a serious, if not always catastrophic, category of security problem. The question is more "how", not "whether". There's a bit of a philosophical question here about the compatibility contract of the So, to avoid any thorny philosophical issues, the cheater way forward would be to create a new class, let's call it
A single attribute like this seems both too flexible (you could easily quote newlines into something garbled from which they could not be recovered) and too rigid (what if you want to quote other characters that aren't newlines later?). My inclination would be to provide callables, say
In Twisted's system, the "text" log formatter is only used if you're logging to a literal TTY, just to help with legibility and familiarity during debugging. So, e.g. if you run Ignoring for the moment our extremely ugly system names (we are discussing a retrofit of all our There's a lot going on here, some of it is pretty silly, and most of it is irrelevant to our discussion but the important bit is the I am not trying to propose that
While a “security considerations” section would definitely be a helpful addition to the docs, I don’t think it should simply be a list of “it’s not our fault” carve-outs for things like large values and control characters. If it’s a known issue with a particular type of data (i.e. a very large string), the logging module ought to handle it in some way which is most likely to provide value for debugging and monitoring. I don't want to pretend to have all the answers here, there's plenty of incredibly nuanced depth to different strategies for handling these questions, but I do think that if there is a type of thing that we know is going to provide a terrible user experience for the user consuming the logs, considering the practical aspects of that experience should be more important than byte-for-byte fidelity, even between python versions.
To break down my simple opinion of the individual issue here:
However, in a broader sense, I do think there’s a bit of a compatibility conundrum here which is “what is the logging module’s job” and “what constitutes a compatible modification of the log module”. There will inevitably be another instance of a similar issue, as you’ve pointed out above (terminal control chars, large values, possibly other vectors for log injection that we have not considered yet) and it would be good to have some mechanism to make similar changes without having to make So I think it's worth answering this a bit more broadly to future-proof whatever fix is made here. (After logjam, I suspect there will be renewed scrutiny on logging libraries and more issues reported than in the fairly stable previous decade.) |


The logging module, in most common configurations, is vulnerable to log injection attacks.
For example:
results in
All available log formatters in the standard library should provide a straightforward way to tell the difference between log message contents and log file format framing. For example, if your output format is newline-delimited, then it cannot allow raw newlines in messages and should "sanitize" by quoting them somehow.
Twisted deals with this by quoting them with trailing tabs, so, for example, the following code:
Produces this output:
I'd suggest that the stdlib do basically the same thing.
One alternate solution is just documenting that no application or framework is ever allowed to log a newlines without doing this manually themselves (and unfortunately this seems to be where the Java world has ended up, see for example spring-projects/spring-framework@e9083d7 ), but putting the responsibility on individual projects to do this themselves means making app and library authors predict all possible Formatters that they might have applied to them, then try to avoid any framing characters that that Formatter might use to indicate a message boundary. Today the most popular default formatter uses newlines. But what if some framework were to try to make parsing easier by using RFC2822? Now every application has to start avoiding colons as well as newlines. CSV? Better make sure you don't use commas. Et cetera, et cetera.
Pushing this up to the app or framework means that every library that wants to log anything derived from user data can't log the data in a straightforward structured way that will be useful to sophisticated application consumers, because they have to mangle their output in a way which won't trigger any log-parsing issues with the naive formats. In practice this really just means newlines, but if we make newlines part of the contract here, that also hems in any future Formatter improvements the stdlib might want to make.
I suspect that the best place to handle this would be logging.Formatter.format; there's even some precedent for this, since it already has a tiny bit of special-cased handling of newlines (albeit only when logging exceptions).
(The right thing to do to avoid logging injection robustly is to emit all your logs as JSON, dump them into Cloudwatch or Honeycomb or something like that and skip this problem entirely. The more that the standard logging framework can encourage users to get onto that happy path quickly, the less Python needs to worry about trying to support scraping stuff out of text files with no schema as an interesting compatibility API surface, but this is a really big problem that I think spans more than one bug report.)
The text was updated successfully, but these errors were encountered: