X Tutup
The Wayback Machine - https://web.archive.org/web/20260217183206/https://github.com/python/cpython/pull/8319
Skip to content

bpo-34088: sndhdr.what() doesn't fail on bad input. #8319

Closed
davidawad wants to merge 16 commits intopython:mainfrom
davidawad:pso-34088
Closed

bpo-34088: sndhdr.what() doesn't fail on bad input. #8319
davidawad wants to merge 16 commits intopython:mainfrom
davidawad:pso-34088

Conversation

@davidawad
Copy link

@davidawad davidawad commented Jul 17, 2018

What this is

This is a very small fix for sndhdr.py so it can take bad input and not croak. This was originally reported by Jussi Judin on BPO here.

To reproduce this these crashes you can try the following examples:

$ echo UklGRjAwMDBXQVZFZm10IDAwMDABADAwMDAwMDAwMDAwMDAw | python3.7 -mbase64 -d > in.file
$ python3.7 sndhdr/test.py in.file
Traceback (most recent call last):
  File "sndhdr/test.py", line 4, in <module>
    sndhdr.what(sys.argv[1])
  File "/tmp/python-3.7-bin/lib/python3.7/sndhdr.py", line 54, in what
    res = whathdr(filename)
  File "/tmp/python-3.7-bin/lib/python3.7/sndhdr.py", line 63, in whathdr
    res = tf(h, f)
  File "/tmp/python-3.7-bin/lib/python3.7/sndhdr.py", line 163, in test_wav
    w = wave.open(f, 'r')
  File "/tmp/python-3.7-bin/lib/python3.7/wave.py", line 510, in open
    return Wave_read(f)
  File "/tmp/python-3.7-bin/lib/python3.7/wave.py", line 164, in __init__
    self.initfp(f)
  File "/tmp/python-3.7-bin/lib/python3.7/wave.py", line 153, in initfp
    chunk.skip()
  File "/tmp/python-3.7-bin/lib/python3.7/chunk.py", line 160, in skip
    self.file.seek(n, 1)
  File "/tmp/python-3.7-bin/lib/python3.7/chunk.py", line 113, in seek
    raise RuntimeError
RuntimeError

Another second crash comes from sndhdr module itself (again base64 encoded data is first decoded on command line):

$ echo AAA= | python3.7 -mbase64 -d > in.file
$ python3.7 sndhdr/test.py in.fileTraceback (most recent call last):
  File "sndhdr/test.py", line 4, in <module>
    sndhdr.what(sys.argv[1])
  File "/tmp/python-3.7-bin/lib/python3.7/sndhdr.py", line 54, in what
    res = whathdr(filename)
  File "/tmp/python-3.7-bin/lib/python3.7/sndhdr.py", line 63, in whathdr
    res = tf(h, f)
  File "/tmp/python-3.7-bin/lib/python3.7/sndhdr.py", line 192, in test_sndr
    rate = get_short_le(h[2:4])
  File "/tmp/python-3.7-bin/lib/python3.7/sndhdr.py", line 213, in get_short_le
    return (b[1] << 8) | b[0]
IndexError: index out of range

What this does

Right now this PR just handles some of the Exceptions caused by bad input and returns None based on @vstinner 's comments in the BPO.

How did I test this?

I ran the unit tests and retried the test cases from the bpo post.

(cpython-venv) [ david@lithium ]  ~/Projects/PYTHON/cpython
$>coverage run --pylib Lib/test/test_sndhdr.py
..
----------------------------------------------------------------------
Ran 2 tests in 0.005s

OK
(cpython-venv) [ david@lithium ]  ~/Projects/PYTHON/cpython
$>

Quick questions

So I have two questions about how this should be handled. This is my first python patch so I don't know if there's a better way to do this.

  • Should we only handle the specific exceptions that are created in these two examples? or is there a cleaner, more pythonic way to handle it?

  • Should I add test cases for bad inputs to the test_sndhdr.py file?

Thanks!

edit - sorry about the branch name. typo on my part.

https://bugs.python.org/issue34088

Lib/sndhdr.py Outdated
try:
res = whathdr(filename)
return res
except RuntimeError as e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we only handle the specific exceptions that are created in these two examples? or is there a cleaner, more pythonic way to handle it?

I think this should follow the pattern set out by sndhdr and have the exception catching be on a per-test basis. Take a look at this block for example:

cpython/Lib/sndhdr.py

Lines 75 to 91 in feabae9

def test_aifc(h, f):
import aifc
if not h.startswith(b'FORM'):
return None
if h[8:12] == b'AIFC':
fmt = 'aifc'
elif h[8:12] == b'AIFF':
fmt = 'aiff'
else:
return None
f.seek(0)
try:
a = aifc.open(f, 'r')
except (EOFError, aifc.Error):
return None
return (fmt, a.getframerate(), a.getnchannels(),
a.getnframes(), 8 * a.getsampwidth())

You should figure out what exceptions the individual tests code can throw and catch those specifically. As it stands now, its possible for a coding error like an out of bounds exception to be caught accidentally by your exception handler.

Copy link
Author

@davidawad davidawad Jul 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, in that case we already catch exactly those two errors, the RuntimeError and the IndexError caused by these bad files and catch them explicitly. I'll add some testcases to Lib/test_sndhdr.py and @you when it's done.

@ammaraskar
Copy link
Member

Should I add test cases for bad inputs to the test_sndhdr.py file?

Yes, definitely add test cases.

@serhiy-storchaka
Copy link
Member

whathdr() shouldn't raise exceptions too, except OSErrors when open or read a file. I concur with @ammaraskar, it is better to catch exceptions on a per-test basis.

And please don't make unrelated changes (like adding spaces or newlines).

@davidawad
Copy link
Author

davidawad commented Jul 17, 2018

@ammaraskar @serhiy-storchaka just to be sure, are you referring to catching exceptions at the sndhdr.what() level? Or are you asking to add exception handling to the lower level test_<filetype>
functions here?

I've added some tests for the different types on inputs this could get, but I want to point out that something weird that I wanted some clarity on. Python hangs when you pass False here.

Python 3.8.0a0 (heads/master:28f07364f0, Jul 17 2018, 11:42:07)
[Clang 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sndhdr
>>> sndhdr.what('NO')
>>> sndhdr.what(None)
>>> sndhdr.what(False)
# process hangs

The same thing happens for True. What's the right way to handle this? raise exception for boolean inputs?

Thanks.

@ammaraskar
Copy link
Member

ammaraskar commented Jul 17, 2018

are you referring to catching exceptions at the sndhdr.what() level? Or are you asking to add exception handling to the lower level test_ functions here?

At the lower test_ level.

Python hangs when you pass False here. What's the right way to handle this? raise exception for boolean inputs?

Don't handle this case explicitly, what's happening here requires a bit of explanation, open() can apparently open file descriptors by their numbers. False is treated as 0 which opens up stdin for reading, that's why it appears to hang.

Moreover, your tests aren't really testing the primary thing addressed in the bug, which is random binary data. Please add a test that attempts to call sndhdr.what on the payloads provided in the bug ticket.

@davidawad
Copy link
Author

davidawad commented Jul 20, 2018

@ammaraskar That's really interesting! I figured it was something like that because i'd pass other numbers like 5, 888, etc. and I'd get an OSError about a bad file descriptor. Good to know!

I added two different files based on the bug ticket with binary data described in the ticket, and added handlers for the exceptions they throw.

A few of the builds are still running but let me know if there's anything else i can do.

Does this warrant an addition to news.d?

Lib/sndhdr.py Outdated
try:
res = whathdr(filename)
return res
except (FileNotFoundError) as e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the extra () and unused as e.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why silence FileNotFoundError at first point?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@serhiy-storchaka This was my mistake. This exception really happens inside of whathdr(). I'll catch it there.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@serhiy-storchaka I also added unit tests specifically checking for bad input on both what() and whathdr()

I should have caught this bug when writing those tests. My bad on that, still trying to learn and get better here :)

Lib/sndhdr.py Outdated
return SndHeaders(*res)
return None

except(TypeError) as e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, except TypeError

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tiran will do. definitely was silly to create e and not use it. thanks

Lib/sndhdr.py Outdated
rate = get_short_le(h[2:4])
if 4000 <= rate <= 25000:
return 'sndr', rate, 1, -1, 8
except (IndexError) as e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, too

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be poked with soft cushions!

Lib/sndhdr.py Outdated
try:
res = whathdr(filename)
return res
except (FileNotFoundError) as e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why silence FileNotFoundError at first point?

Lib/sndhdr.py Outdated
return SndHeaders(*res)
return None

except(TypeError) as e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that no any exception should be ignored at the top level. Catch appropriate exceptions in corresponding test functions.

For example your change silences the error in sndhdr.what(None).

Copy link
Author

@davidawad davidawad Jul 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, definitely. My catch in sndhdr.what should really be happening here. I'll move it. Thanks!

Lib/sndhdr.py Outdated

tests = []


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a pep8 recommendation from a linting tool that I use. I'll remove it.

Lib/sndhdr.py Outdated
f.seek(0)
w = wave.open(f, 'r')
except (EOFError, wave.Error):
except (EOFError,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't it be written in a single line?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case it's just the 4 error types, if you think it makes more sense to put them in one line I can definitely move them.

Just wondering why? Is it more pythonic to do it that way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because
it
is
better
to
not
split
a
line
without
a
need.

@davidawad
Copy link
Author

I have made the requested changes; please review again.

^^ comment for bot

@bedevere-bot
Copy link

Thanks for making the requested changes!

@tiran: please review the changes made to this pull request.

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove all unrelated changes and catch exceptions in proper places.

Lib/sndhdr.py Outdated
return SndHeaders(*res)
return None

except (TypeError,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't catch TypeError here. Catch it in test functions.

Copy link
Author

@davidawad davidawad Jul 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this TypeError is thrown when passing None to the open() call. That's specific to this function, would you prefer to just not catch it?

let me know whatever you'd prefer and i'll make the change.


When commenting out the try catch this is what happens in whathdr

>>> sndhdr.whathdr(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/david/Projects/PYTHON/cpython/Lib/sndhdr.py", line 62, in whathdr
    with open(filename, 'rb') as f:
TypeError: expected str, bytes or os.PathLike object, not NoneType
>>>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you should catch it, from the traceback its very obvious what went wrong, since the documentation explicitly says what() works on files, it is expected to receive FileNotFound and invalid file name errors.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wholly agree. I'll make the change.

Lib/sndhdr.py Outdated
f.seek(0)
w = wave.open(f, 'r')
except (EOFError, wave.Error):
except (EOFError,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because
it
is
better
to
not
split
a
line
without
a
need.

Lib/sndhdr.py Outdated
return (fmt, a.getframerate(), a.getnchannels(),
a.getnframes(), 8 * a.getsampwidth())


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not related change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider it gone!

Lib/sndhdr.py Outdated
return None

except (TypeError,
FileNotFoundError):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't catch FileNotFoundError.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@davidawad
Copy link
Author

davidawad commented Jul 31, 2018

Alright, this was a really good learning experience. Thank you all for sticking with me and helping me make this change the right way. I think I've got all the changes requested at this point, but of course i'm happy to make more.

Two quick questions:

  • Would you prefer a rebase to compress all these commits into one since there's a lot of commits here?
  • Would it be possible for me to submit a separate style only PR on the sndhdr module to apply those PEP8 changes?

and for the bot:
I have made the requested changes; please review again

@bedevere-bot
Copy link

Thanks for making the requested changes!

@vstinner, @serhiy-storchaka, @tiran: please review the changes made to this pull request.

@davidawad
Copy link
Author

Can anyone review this? It's been a while and I'd love to see it go in.

@csabella
Copy link
Contributor

csabella commented Apr 9, 2019

@tiran @serhiy-storchaka @vstinner Re-requesting review to hopefully push this one to be merged.

@davidair, you don't have to squash the commits as that is done upon merge. Also, we don't generally accept PRs that are just PEP8 changes as they create more noise in the git blame than benefit. Thank you for asking and wanting to improve Python! 🙂

filename = findfile(filename, subdir="sndhdrdata")

what = sndhdr.what(filename)
self.assertEqual(what, None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.assertEqual(what, None)
self.assertIsNone(what)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

self.assertEqual(what, None)

whathdr = sndhdr.whathdr(filename)
self.assertEqual(whathdr, None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.assertEqual(whathdr, None)
self.assertIsNone(whathdr)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

for tf in sndhdr.tests:
with open(filename, 'rb') as f:
h = f.read(512)
self.assertEqual(tf(h, f), None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.assertEqual(tf(h, f), None)
self.assertIsNone(tf(h, f))

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Lib/wave.py Outdated
chunk.skip()
try:
chunk.skip()
except RuntimeError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this was added? Tests are passed without this change.

Copy link
Author

@davidawad davidawad May 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think they actually pass without catching this. It's been a while so looking at this again I get the following issue when running these tests without this fix.

david@lithium ~/P/P/c/L/test>pso-34088>../../python.exe test_sndhdr.py          18:22:08
E..
======================================================================
ERROR: test_badinputs (__main__.TestFormats)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_sndhdr.py", line 35, in test_badinputs
    what = sndhdr.what(filename)
  File "/Users/david/Projects/PYTHON/cpython/Lib/sndhdr.py", line 54, in what
    res = whathdr(filename)
  File "/Users/david/Projects/PYTHON/cpython/Lib/sndhdr.py", line 63, in whathdr
    res = tf(h, f)
  File "/Users/david/Projects/PYTHON/cpython/Lib/sndhdr.py", line 163, in test_wav
    w = wave.open(f, 'r')
  File "/Users/david/Projects/PYTHON/cpython/Lib/wave.py", line 510, in open
    return Wave_read(f)
  File "/Users/david/Projects/PYTHON/cpython/Lib/wave.py", line 164, in __init__
    self.initfp(f)
  File "/Users/david/Projects/PYTHON/cpython/Lib/wave.py", line 153, in initfp
    chunk.skip()
  File "/Users/david/Projects/PYTHON/cpython/Lib/chunk.py", line 160, in skip
    self.file.seek(n, 1)
  File "/Users/david/Projects/PYTHON/cpython/Lib/chunk.py", line 113, in seek
    raise RuntimeError
RuntimeError

----------------------------------------------------------------------
Ran 3 tests in 0.010s

FAILED (errors=1)

Copy link
Author

@davidawad davidawad May 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe at the time I thought that this file should have reported the runtime error if it failed to skip through a wav file that was poorly formatted.

Given I'm not very experienced at being a principled programmer; @serhiy-storchaka where do you think is the best place to catch this error or deal with the tests failing on the bad input files?

@davidair
Copy link
Contributor

davidair commented Apr 9, 2019

@tiran @serhiy-storchaka @vstinner Re-requesting review to hopefully push this one to be merged.

@davidair, you don't have to squash the commits as that is done upon merge. Also, we don't generally accept PRs that are just PEP8 changes as they create more noise in the git blame than benefit. Thank you for asking and wanting to improve Python! 🙂

Probably wrong David - did you mean @davidawad?

@csabella
Copy link
Contributor

csabella commented Apr 9, 2019

Probably wrong David - did you mean @davidawad?

Yes, sorry about that and thank you for including the correct David.

@davidawad
Copy link
Author

davidawad commented May 1, 2019

Definitely, thanks for bringing this back to my attention @csabella . I really want to see this somewhat simple PR get merged as it's been a really good learning experience overall and i'd love to try to improve upon this and help out on some of the other easier python issues as well.

@serhiy-storchaka If I didn't make it clear last year I really appreciate your patience and advice it's been really helpful.

I've made the changes but I want to know what you all think the correct way is to deal with the chunk.skip() issue mentioned here: #8319 (comment)

I've made the other changes for the unit tests.

@csabella csabella requested a review from serhiy-storchaka May 19, 2019 02:12
@serhiy-storchaka
Copy link
Member

serhiy-storchaka commented May 19, 2019

Both input1.bad and input2.bad are empty files.

Since all errors are due to truncation, I suggest you to use the following test:

    def test_truncated(self):
        for filename, expected in SAMPLES:
            filename = findfile(filename, subdir="sndhdrdata")
            with open(filename, 'rb') as f:
                data = f.read()
            self.addCleanup(unlink, TESTFN)
            with self.subTest(filename=filename):
                for i in range(len(data)):
                    with open(TESTFN, 'wb') as f:
                        f.write(data[:i])
                    what = sndhdr.what(TESTFN)
                    if what is not None:
                        self.assertEqual(what, expected)
                    for tf in sndhdr.tests:
                        with open(TESTFN, 'rb') as f:
                            t = tf(data[:i], f)
                            if t is not None:
                                self.assertEqual(t, what)

SAMPLES is the same as in test_data, TESTFN and unlink are imported from test.support.

Currently this test fails. Make it passing.

Maybe the simpler test is enough: self.assertEqual(what[0], expected[0]) instead of self.assertEqual(what, expected), but it would be better to achieve the success of the more strong test.

@davidawad
Copy link
Author

davidawad commented Jun 5, 2019

I have imported that test and i'm able to correct some of these failures but I don't really understand exactly how these audio formats work.

Is there documentation I can read to get a better understanding of how to handle some of the failure cases this test causes?

For example he's a fauilure I'm not really sure the right way to fix.

======================================================================
FAIL: test_truncated (__main__.TestFormats) (filename='/Users/david/Projects/PYTHON/cpython/Lib/test/sndhdrdata/sndhdr.au')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_sndhdr.py", line 69, in test_truncated
    self.assertEqual(what, expected)
AssertionError: SndHeaders(filetype='au', framerate=0, nchannels=0, nframes=-1, sampwidth='?') != ('au', 44100, 2, 5.0, 16)

I'm not sure how to adjust the test_au() function in sndhdr.py in order to get this test to pass.

Can you show me how you'd do it or what I should read to understand?

Thanks for any help

@serhiy-storchaka
Copy link
Member

In general you need to catch all expected errors in test_* functions and return None. You have to remove if len(b) < 4: from get_long_be() and catch an IndexError in functions that use it.

@serhiy-storchaka
Copy link
Member

Alternatively you can add if len(h) < 24: return None in test_au() and similar checks in other functions. Then you could use int.from_bytes() instead of get_long_be() and friends.

@taleinat
Copy link
Contributor

taleinat commented Aug 5, 2019

Ping, @davidawad!

1 similar comment
@taleinat
Copy link
Contributor

Ping, @davidawad!

@davidawad
Copy link
Author

@taleinat admittedly I gave up on this after a while. I'll try to take another look this weekend and see if I can figure it out.

It was tricky just because I didn't have anyone to really ask questions to but I'm sure we can get it done.

@taleinat
Copy link
Contributor

@davidawad, please feel free to ask any questions! You can do so here, on the b.p.o. issue, on the Python IRC or Zulip channels, or on the various mailing lists.

@tiran tiran removed their request for review April 18, 2021 07:18
@hugovk
Copy link
Member

hugovk commented Apr 12, 2022

Note the sndhdr module is deprecated in 3.11 and set for removal in 3.13.

See PEP 594 – Removing dead batteries from the standard library and #91217.

@kumaraditya303
Copy link
Contributor

Closing as the module is deprecated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

X Tutup