bpo-43910 Fix handling of quoted values in cgi.parse_header #25519


Updates the logic in cgi.parse_header to scan the string once while tracking the parse state explicitly. This corrects cases where a quoted value ends with a backslash character. This PR also correctly unescapes characters other than the backslash or double quote: inside a quoted string, a backslash followed by any octet is replaced with that octet (although according to the spec, clients should not escape anything other than those two characters).
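To illustrate the unescaping behavior described above, here is a minimal sketch. It is not the code from this PR; the helper name `unquote` is hypothetical, and it assumes the value has already been isolated from the rest of the header.

```python
def unquote(value: str) -> str:
    """Hypothetical helper: strip surrounding double quotes and resolve
    backslash escapes in a single left-to-right pass."""
    if len(value) < 2 or value[0] != '"' or value[-1] != '"':
        return value  # not a quoted string, return unchanged
    out = []
    chars = iter(value[1:-1])
    for ch in chars:
        if ch == "\\":
            # A backslash makes the next character literal, whatever it is.
            # A dangling backslash at the very end is kept as-is.
            ch = next(chars, "\\")
        out.append(ch)
    return "".join(out)
```

For example, a quoted value whose content ends in an escaped backslash, written in Python as `'"a\\\\"'`, unquotes to `a` followed by a single backslash.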
The goal is to recognize the language detailed in https://www.w3.org/Protocols/rfc1341/4_Content-Type.html (with additional details on quoted-string at https://greenbytes.de/tech/webdav/draft-ietf-httpbis-p1-messaging-16.html#rfc.section.3.2.1.p.3 ).
Note that this method does not validate that the header value is well formed and always returns a value. Due to this, we recognize a slightly larger language that looks something like
This also eliminates an unnecessary quadratic loop (which was also the source of the correctness problem).
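A linear scan of this kind might look like the sketch below. This is an illustration under assumptions, not the PR's implementation: the function name `split_params` is hypothetical, and it only splits the header on semicolons that fall outside quoted strings, leaving quotes and escapes in place.

```python
def split_params(header: str) -> list[str]:
    """Hypothetical helper: split a header value on ';' in one pass,
    ignoring semicolons inside quoted strings."""
    parts = []
    current = []
    in_quotes = False
    escaped = False
    for ch in header:
        if escaped:
            # Character after a backslash is always literal.
            current.append(ch)
            escaped = False
        elif in_quotes and ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == ";" and not in_quotes:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    # Malformed input (e.g. an unterminated quote) still yields a result.
    parts.append("".join(current).strip())
    return parts
```

Because each character is visited once and the quote/escape state is carried in two flags, a value ending in `\\"` is closed correctly without rescanning earlier text.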
https://bugs.python.org/issue43910