bpo-34088: sndhdr.what() doesn't fail on bad input. #8319
bpo-34088: sndhdr.what() doesn't fail on bad input. #8319davidawad wants to merge 16 commits intopython:mainfrom
Conversation
Lib/sndhdr.py
Outdated
| try: | ||
| res = whathdr(filename) | ||
| return res | ||
| except RuntimeError as e: |
There was a problem hiding this comment.
Should we only handle the specific exceptions that are created in these two examples? or is there a cleaner, more pythonic way to handle it?
I think this should follow the pattern set out by sndhdr and have the exception catching be on a per-test basis. Take a look at this block for example:
Lines 75 to 91 in feabae9
You should figure out what exceptions the individual tests code can throw and catch those specifically. As it stands now, its possible for a coding error like an out of bounds exception to be caught accidentally by your exception handler.
There was a problem hiding this comment.
cool, in that case we already catch exactly those two errors, the RuntimeError and the IndexError caused by these bad files and catch them explicitly. I'll add some testcases to Lib/test_sndhdr.py and @you when it's done.
Yes, definitely add test cases. |
|
And please don't make unrelated changes (like adding spaces or newlines). |
|
@ammaraskar @serhiy-storchaka just to be sure, are you referring to catching exceptions at the sndhdr.what() level? Or are you asking to add exception handling to the lower level I've added some tests for the different types on inputs this could get, but I want to point out that something weird that I wanted some clarity on. Python hangs when you pass Python 3.8.0a0 (heads/master:28f07364f0, Jul 17 2018, 11:42:07)
[Clang 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sndhdr
>>> sndhdr.what('NO')
>>> sndhdr.what(None)
>>> sndhdr.what(False)
# process hangsThe same thing happens for Thanks. |
At the lower test_ level.
Don't handle this case explicitly, what's happening here requires a bit of explanation, open() can apparently open file descriptors by their numbers. False is treated as 0 which opens up stdin for reading, that's why it appears to hang. Moreover, your tests aren't really testing the primary thing addressed in the bug, which is random binary data. Please add a test that attempts to call |
|
@ammaraskar That's really interesting! I figured it was something like that because i'd pass other numbers like I added two different files based on the bug ticket with binary data described in the ticket, and added handlers for the exceptions they throw. A few of the builds are still running but let me know if there's anything else i can do. Does this warrant an addition to news.d? |
Lib/sndhdr.py
Outdated
| try: | ||
| res = whathdr(filename) | ||
| return res | ||
| except (FileNotFoundError) as e: |
There was a problem hiding this comment.
Please remove the extra () and unused as e.
There was a problem hiding this comment.
Why silence FileNotFoundError at first point?
There was a problem hiding this comment.
@serhiy-storchaka This was my mistake. This exception really happens inside of whathdr(). I'll catch it there.
There was a problem hiding this comment.
@serhiy-storchaka I also added unit tests specifically checking for bad input on both what() and whathdr()
I should have caught this bug when writing those tests. My bad on that, still trying to learn and get better here :)
Lib/sndhdr.py
Outdated
| return SndHeaders(*res) | ||
| return None | ||
|
|
||
| except(TypeError) as e: |
There was a problem hiding this comment.
@tiran will do. definitely was silly to create e and not use it. thanks
Lib/sndhdr.py
Outdated
| rate = get_short_le(h[2:4]) | ||
| if 4000 <= rate <= 25000: | ||
| return 'sndr', rate, 1, -1, 8 | ||
| except (IndexError) as e: |
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase And if you don't make the requested changes, you will be poked with soft cushions! |
Lib/sndhdr.py
Outdated
| try: | ||
| res = whathdr(filename) | ||
| return res | ||
| except (FileNotFoundError) as e: |
There was a problem hiding this comment.
Why silence FileNotFoundError at first point?
Lib/sndhdr.py
Outdated
| return SndHeaders(*res) | ||
| return None | ||
|
|
||
| except(TypeError) as e: |
There was a problem hiding this comment.
I think that no any exception should be ignored at the top level. Catch appropriate exceptions in corresponding test functions.
For example your change silences the error in sndhdr.what(None).
There was a problem hiding this comment.
yes, definitely. My catch in sndhdr.what should really be happening here. I'll move it. Thanks!
Lib/sndhdr.py
Outdated
|
|
||
| tests = [] | ||
|
|
||
|
|
There was a problem hiding this comment.
This was a pep8 recommendation from a linting tool that I use. I'll remove it.
Lib/sndhdr.py
Outdated
| f.seek(0) | ||
| w = wave.open(f, 'r') | ||
| except (EOFError, wave.Error): | ||
| except (EOFError, |
There was a problem hiding this comment.
Can't it be written in a single line?
There was a problem hiding this comment.
In this case it's just the 4 error types, if you think it makes more sense to put them in one line I can definitely move them.
Just wondering why? Is it more pythonic to do it that way?
There was a problem hiding this comment.
Because
it
is
better
to
not
split
a
line
without
a
need.
|
I have made the requested changes; please review again. ^^ comment for bot |
|
Thanks for making the requested changes! @tiran: please review the changes made to this pull request. |
serhiy-storchaka
left a comment
There was a problem hiding this comment.
Please remove all unrelated changes and catch exceptions in proper places.
Lib/sndhdr.py
Outdated
| return SndHeaders(*res) | ||
| return None | ||
|
|
||
| except (TypeError, |
There was a problem hiding this comment.
Don't catch TypeError here. Catch it in test functions.
There was a problem hiding this comment.
So this TypeError is thrown when passing None to the open() call. That's specific to this function, would you prefer to just not catch it?
let me know whatever you'd prefer and i'll make the change.
When commenting out the try catch this is what happens in whathdr
>>> sndhdr.whathdr(None)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/david/Projects/PYTHON/cpython/Lib/sndhdr.py", line 62, in whathdr
with open(filename, 'rb') as f:
TypeError: expected str, bytes or os.PathLike object, not NoneType
>>>
There was a problem hiding this comment.
I don't think you should catch it, from the traceback its very obvious what went wrong, since the documentation explicitly says what() works on files, it is expected to receive FileNotFound and invalid file name errors.
There was a problem hiding this comment.
wholly agree. I'll make the change.
Lib/sndhdr.py
Outdated
| f.seek(0) | ||
| w = wave.open(f, 'r') | ||
| except (EOFError, wave.Error): | ||
| except (EOFError, |
There was a problem hiding this comment.
Because
it
is
better
to
not
split
a
line
without
a
need.
Lib/sndhdr.py
Outdated
| return (fmt, a.getframerate(), a.getnchannels(), | ||
| a.getnframes(), 8 * a.getsampwidth()) | ||
|
|
||
|
|
There was a problem hiding this comment.
This is not related change.
Lib/sndhdr.py
Outdated
| return None | ||
|
|
||
| except (TypeError, | ||
| FileNotFoundError): |
There was a problem hiding this comment.
Don't catch FileNotFoundError.
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
|
Alright, this was a really good learning experience. Thank you all for sticking with me and helping me make this change the right way. I think I've got all the changes requested at this point, but of course i'm happy to make more. Two quick questions:
and for the bot: |
|
Thanks for making the requested changes! @vstinner, @serhiy-storchaka, @tiran: please review the changes made to this pull request. |
|
Can anyone review this? It's been a while and I'd love to see it go in. |
|
@tiran @serhiy-storchaka @vstinner Re-requesting review to hopefully push this one to be merged. @davidair, you don't have to squash the commits as that is done upon merge. Also, we don't generally accept PRs that are just PEP8 changes as they create more noise in the git blame than benefit. Thank you for asking and wanting to improve Python! 🙂 |
Lib/test/test_sndhdr.py
Outdated
| filename = findfile(filename, subdir="sndhdrdata") | ||
|
|
||
| what = sndhdr.what(filename) | ||
| self.assertEqual(what, None) |
There was a problem hiding this comment.
| self.assertEqual(what, None) | |
| self.assertIsNone(what) |
Lib/test/test_sndhdr.py
Outdated
| self.assertEqual(what, None) | ||
|
|
||
| whathdr = sndhdr.whathdr(filename) | ||
| self.assertEqual(whathdr, None) |
There was a problem hiding this comment.
| self.assertEqual(whathdr, None) | |
| self.assertIsNone(whathdr) |
Lib/test/test_sndhdr.py
Outdated
| for tf in sndhdr.tests: | ||
| with open(filename, 'rb') as f: | ||
| h = f.read(512) | ||
| self.assertEqual(tf(h, f), None) |
There was a problem hiding this comment.
| self.assertEqual(tf(h, f), None) | |
| self.assertIsNone(tf(h, f)) |
Lib/wave.py
Outdated
| chunk.skip() | ||
| try: | ||
| chunk.skip() | ||
| except RuntimeError: |
There was a problem hiding this comment.
Why this was added? Tests are passed without this change.
There was a problem hiding this comment.
I don't think they actually pass without catching this. It's been a while so looking at this again I get the following issue when running these tests without this fix.
david@lithium ~/P/P/c/L/test>pso-34088>../../python.exe test_sndhdr.py 18:22:08
E..
======================================================================
ERROR: test_badinputs (__main__.TestFormats)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test_sndhdr.py", line 35, in test_badinputs
what = sndhdr.what(filename)
File "/Users/david/Projects/PYTHON/cpython/Lib/sndhdr.py", line 54, in what
res = whathdr(filename)
File "/Users/david/Projects/PYTHON/cpython/Lib/sndhdr.py", line 63, in whathdr
res = tf(h, f)
File "/Users/david/Projects/PYTHON/cpython/Lib/sndhdr.py", line 163, in test_wav
w = wave.open(f, 'r')
File "/Users/david/Projects/PYTHON/cpython/Lib/wave.py", line 510, in open
return Wave_read(f)
File "/Users/david/Projects/PYTHON/cpython/Lib/wave.py", line 164, in __init__
self.initfp(f)
File "/Users/david/Projects/PYTHON/cpython/Lib/wave.py", line 153, in initfp
chunk.skip()
File "/Users/david/Projects/PYTHON/cpython/Lib/chunk.py", line 160, in skip
self.file.seek(n, 1)
File "/Users/david/Projects/PYTHON/cpython/Lib/chunk.py", line 113, in seek
raise RuntimeError
RuntimeError
----------------------------------------------------------------------
Ran 3 tests in 0.010s
FAILED (errors=1)
There was a problem hiding this comment.
I believe at the time I thought that this file should have reported the runtime error if it failed to skip through a wav file that was poorly formatted.
Given I'm not very experienced at being a principled programmer; @serhiy-storchaka where do you think is the best place to catch this error or deal with the tests failing on the bad input files?
Probably wrong David - did you mean @davidawad? |
Yes, sorry about that and thank you for including the correct David. |
|
Definitely, thanks for bringing this back to my attention @csabella . I really want to see this somewhat simple PR get merged as it's been a really good learning experience overall and i'd love to try to improve upon this and help out on some of the other easier python issues as well. @serhiy-storchaka If I didn't make it clear last year I really appreciate your patience and advice it's been really helpful. I've made the changes but I want to know what you all think the correct way is to deal with the I've made the other changes for the unit tests. |
|
Both Since all errors are due to truncation, I suggest you to use the following test: def test_truncated(self):
for filename, expected in SAMPLES:
filename = findfile(filename, subdir="sndhdrdata")
with open(filename, 'rb') as f:
data = f.read()
self.addCleanup(unlink, TESTFN)
with self.subTest(filename=filename):
for i in range(len(data)):
with open(TESTFN, 'wb') as f:
f.write(data[:i])
what = sndhdr.what(TESTFN)
if what is not None:
self.assertEqual(what, expected)
for tf in sndhdr.tests:
with open(TESTFN, 'rb') as f:
t = tf(data[:i], f)
if t is not None:
self.assertEqual(t, what)
Currently this test fails. Make it passing. Maybe the simpler test is enough: |
|
I have imported that test and i'm able to correct some of these failures but I don't really understand exactly how these audio formats work. Is there documentation I can read to get a better understanding of how to handle some of the failure cases this test causes? For example he's a fauilure I'm not really sure the right way to fix. I'm not sure how to adjust the Can you show me how you'd do it or what I should read to understand? Thanks for any help |
|
In general you need to catch all expected errors in |
|
Alternatively you can add |
|
Ping, @davidawad! |
1 similar comment
|
Ping, @davidawad! |
|
@taleinat admittedly I gave up on this after a while. I'll try to take another look this weekend and see if I can figure it out. It was tricky just because I didn't have anyone to really ask questions to but I'm sure we can get it done. |
|
@davidawad, please feel free to ask any questions! You can do so here, on the b.p.o. issue, on the Python IRC or Zulip channels, or on the various mailing lists. |
|
Note the sndhdr module is deprecated in 3.11 and set for removal in 3.13. See PEP 594 – Removing dead batteries from the standard library and #91217. |
|
Closing as the module is deprecated. |


What this is
This is a very small fix for
sndhdr.pyso it can take bad input and not croak. This was originally reported by Jussi Judin on BPO here.To reproduce this these crashes you can try the following examples:
Another second crash comes from sndhdr module itself (again base64 encoded data is first decoded on command line):
What this does
Right now this PR just handles some of the Exceptions caused by bad input and returns
Nonebased on @vstinner 's comments in the BPO.How did I test this?
I ran the unit tests and retried the test cases from the bpo post.
Quick questions
So I have two questions about how this should be handled. This is my first python patch so I don't know if there's a better way to do this.
Should we only handle the specific exceptions that are created in these two examples? or is there a cleaner, more pythonic way to handle it?
Should I add test cases for bad inputs to the test_sndhdr.py file?
Thanks!
edit - sorry about the branch name. typo on my part.
https://bugs.python.org/issue34088