gh-115801: Only allow sequence of strings as input for difflib.unified_diff#118333
gh-115801: Only allow sequence of strings as input for difflib.unified_diff#118333serhiy-storchaka merged 9 commits intopython:mainfrom
Conversation
Lib/difflib.py
Outdated
| if b and not isinstance(b[0], str): | ||
| raise TypeError('lines to compare must be str, not %s (%r)' % | ||
| (type(b[0]).__name__, b[0])) | ||
| if isinstance(a, str) or isinstance(b, str): |
There was a problem hiding this comment.
Following the existing code, IMO it would be great add the not %s (%r)' phrase.
Lib/test/test_difflib.py
Outdated
| self.assertIn('ımplıcıt', output) | ||
|
|
||
|
|
||
| class TestInputParsing(unittest.TestCase): |
There was a problem hiding this comment.
There are three existing tests for types checking: test_mixed_types_content, test_mixed_types_filenames and test_mixed_types_dates. I think that it is better to extract them in a separate class (they are no longer about mixing 8-bit and Unicode strings) and add a new test in this class.
There was a problem hiding this comment.
Done. I moved the 3 existing tests to a new class and refactored the new test to be in the same style.
Lib/difflib.py
Outdated
| raise TypeError('lines to compare must be str, not %s (%r)' % | ||
| (type(b[0]).__name__, b[0])) | ||
| if isinstance(a, str): | ||
| raise TypeError('input must be a sequence of strings, not %s (%r)' % |
There was a problem hiding this comment.
If the user pass the content of the whole file without splitting it on lines, all this (potentially megabites) will be included in the error message. This is not good. The content of the string does not add anything to clarify the bug, the type is enough.
It is better to include only type name instead of the repr also in all other error messages, but this is not related to this PR.
There was a problem hiding this comment.
I agree we should not print the entire object. I changed it for the new error messages. Let me know whether you want me to also change the other error messages in this PR.
|
Thank you for your contribution @eendebakpt. There were some unrelated random test failures, so I forget to merge this at its time. I should use "auto-merge" instead. |
difflib.unified_diff(anddifflib.context_diff) are documented to have lists of strings as input. When called with strings as input arguments, e.g.difflib.unified_diff('one', 'two')a diff is generated, instead of returning an error.In this PR we disallow input of pure strings to prevent users from accidentally passing a string to
difflib.unified_diff.