gh-117709: Add vectorcall support to str()#117725
gh-117709: Add vectorcall support to str()#117725erlend-aasland wants to merge 10 commits intopython:mainfrom
str()#117725Conversation
|
Some crude This PR# positional-only args
$ ./python.exe -m timeit "str()"
20000000 loops, best of 5: 14.6 nsec per loop
$ ./python.exe -m timeit "str(1)"
5000000 loops, best of 5: 56.8 nsec per loop
$ ./python.exe -m timeit "str(b'a', 'latin1')"
10000000 loops, best of 5: 37.3 nsec per loop
$ ./python.exe -m timeit "str(b'a', 'latin1', 'strict')"
5000000 loops, best of 5: 42.6 nsec per loop
# with kw args
$ ./python.exe -m timeit "str(b'a', 'latin1', errors='strict')"
5000000 loops, best of 5: 48.4 nsec per loop
$ ./python.exe -m timeit "str(b'a', encoding='latin1')"
5000000 loops, best of 5: 42.4 nsec per loop
# fallback to tp_call
$ ./python.exe -m timeit "str(object=b'a', encoding='latin1', errors='strict')"
2000000 loops, best of 5: 182 nsec per loop`main`# positional-only args
$ ./python.exe -m timeit "str()"
10000000 loops, best of 5: 26.2 nsec per loop
$ ./python.exe -m timeit "str(1)"
5000000 loops, best of 5: 56 nsec per loop
$ ./python.exe -m timeit "str(b'a', 'latin1')"¨
5000000 loops, best of 5: 75.4 nsec per loop
$ ./python.exe -m timeit "str(b'a', 'latin1', 'strict')"
5000000 loops, best of 5: 82.2 nsec per loop
# with kw args
$ ./python.exe -m timeit "str(b'a', 'latin1', errors='strict')"
2000000 loops, best of 5: 145 nsec per loop
$ ./python.exe -m timeit "str(b'a', encoding='latin1')"
2000000 loops, best of 5: 137 nsec per loop
$ ./python.exe -m timeit "str(object=b'a', encoding='latin1', errors='strict')"
2000000 loops, best of 5: 182 nsec per loopFootnotes |
| } | ||
| return PyUnicode_FromEncodedObject(object, encoding, errors); | ||
| } | ||
| if (nargs + nkwargs == 3) { |
There was a problem hiding this comment.
I'm not sure that it's worth it to optimize the 3 arguments case using vectorcall.
There was a problem hiding this comment.
Yes, this is the real question: which cases do we want to optimise? It would be nice if we had some usage stats. Perhaps the Faster CPython team has stats. cc. @mdboom
There was a problem hiding this comment.
Using a regular expression, I only found 5 lines in the whole stdlib (excluding tests) which call str() with 3 arguments, and all these matching lines only use positional arguments:
$ git grep '\<str *([^,()]\+,[^,()]\+,'
Lib/email/utils.py: return str(rawbytes, charset, errors)
Lib/encodings/punycode.py: base = str(text[:pos], "ascii", errors)
Lib/pickletools.py: return str(data, 'utf-8', 'surrogatepass')
Lib/pickletools.py: return str(data, 'utf-8', 'surrogatepass')
Lib/pickletools.py: return str(data, 'utf-8', 'surrogatepass')I modified git grep output to ignore tests, documentation and lines which care not str() calls (like Argument Clinic code).
There was a problem hiding this comment.
By the way, I cannot find any str(bytes, encoding=...) or str(bytes, errors=...) call in the stdlib. I used the regex: git grep '\<str *([^,()]\+,[^,()]\+='. There are only results in the documentation and tests.
In short, str() is only called with positional-only arguments. We should maybe strictly focus on these cases?
There was a problem hiding this comment.
Well, optimising only for the positional-only argument cases would definitely make the PR smaller and the resulting code more maintainable.
There was a problem hiding this comment.
A PR for positional-only cases is up for comparison:
There was a problem hiding this comment.
There are only results in the documentation [...]
Well, people often copy the examples from the documentation, so I would expect the examples shown in the docs to be very common in the wild.
|
Thanks for the initial review, Victor; the PR is now ~20 lines shorter ;) |
vstinner
left a comment
There was a problem hiding this comment.
Can you please a PR without keyword arguments? So we can compare the two implementations.
| } | ||
|
|
||
| static PyObject * | ||
| fallback_to_tp_call(PyObject *type, Py_ssize_t nargs, Py_ssize_t nkwargs, |
There was a problem hiding this comment.
Would it be possible to inline this function in unicode_vectorcall(), using goto if needed? The function name is too generic, it doesn't mention unicode_new().
There was a problem hiding this comment.
Sure; that was my original code, but I refactored it out in order to reduce the size of the vectorcall function.
There was a problem hiding this comment.
I think the code is nicer with this refactoring in place though; perhaps a better name would suffice.
Created: |
|
I don't think that the complexity of handling keyword arguments is worth it. I prefer to only optimize positional-only arguments. |
|
Closed in favour of #117746. We can revisit the need to speed up some of the keyword-arg cases later. |
I would be fine if code parsing keyword arguments would be generated by Argument Clinic. But hand written code is more expensive to maintain. Here I don't think that it's worth it. |
Never forget #87613 |
IMO, this is bordering too much complexity, even though the speedup is considerable for some of the cases. We consider adding a simplified variant first.
PyUnicode_Typeneeds vectorcall. #117709