
gh-101444: Optimize bytearray slice assignment for bytes-like object #101445

Open
wants to merge 5 commits into
base: main

Conversation

msoxzw
Contributor

@msoxzw msoxzw commented Jan 31, 2023

In addition to bytearray, any bytes-like object that supports the buffer protocol can also bypass unnecessary data copies, giving a 3x to 4x speedup.

bytes is equivalent to an immutable bytearray, so the unnecessary copy can be safely avoided, giving a speedup of roughly 300%.
In addition to bytearray, any bytes-like object that supports the buffer protocol can bypass unnecessary data copies, giving a speedup of roughly 3x.
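
For readers unfamiliar with the term, the following is a minimal sketch of what "bytes-like object" means for slice assignment: several buffer-protocol sources produce the same result when assigned into a bytearray slice. The helper name `assign_prefix` and the sample payload are illustrative assumptions, not code from the PR.

```python
import array


def assign_prefix(source):
    """Assign `source` over the first four bytes of a fresh 8-byte bytearray."""
    target = bytearray(8)
    target[:4] = source
    return bytes(target)


payload = b"wxyz"

# Each of these objects exposes its data through the buffer protocol.
sources = {
    "bytes": payload,
    "bytearray": bytearray(payload),
    "memoryview": memoryview(payload),
    "array": array.array("B", payload),
}

results = {name: assign_prefix(src) for name, src in sources.items()}

for name, value in results.items():
    print(f"{name:<10} {value!r}")
```

The observable result is the same for every source type; the PR is only concerned with how many intermediate copies are made to get there.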
@msoxzw msoxzw changed the title gh-101444: Optimize bytearray slice assignment for bytes gh-101444: Optimize bytearray slice assignment for bytes-like object Feb 6, 2023
@arhadthedev arhadthedev added performance Performance or resource usage interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Feb 6, 2023
@msoxzw
Contributor Author

msoxzw commented Feb 8, 2023

I used GitHub Actions to measure the effect of the optimization on several systems. Because the before and after runs might not have executed on identical hardware, only the differences between bytearray and bytes/memoryview within a single run are meaningful.
python -m timeit -s "a=bytearray(4096);b=b'x'*1024;" "a[:1024]=b"
python -m timeit -s "a=bytearray(4096);b=memoryview(b'x'*1024);" "a[:1024]=b"
python -m timeit -s "a=bytearray(4096);b=bytearray(b'x'*1024);" "a[:1024]=b"
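
The three commands above can also be run from a single script. This is a rough sketch using the standard `timeit` module; the function name `measure` and the `number`/`repeat` defaults are assumptions chosen for illustration and are not the settings used in the linked GitHub Actions runs.

```python
import timeit

# Setup snippets mirroring the three command lines: same target, different source type.
SOURCES = {
    "bytes": "b = b'x' * 1024",
    "memoryview": "b = memoryview(b'x' * 1024)",
    "bytearray": "b = bytearray(b'x' * 1024)",
}


def measure(number=100_000, repeat=5):
    """Return the best-of-`repeat` time per slice assignment, in nanoseconds."""
    timings = {}
    for name, source_setup in SOURCES.items():
        setup = "a = bytearray(4096); " + source_setup
        runs = timeit.repeat("a[:1024] = b", setup=setup, number=number, repeat=repeat)
        timings[name] = min(runs) / number * 1e9
    return timings


if __name__ == "__main__":
    for name, nanoseconds in measure().items():
        print(f"{name:<10} {nanoseconds:8.1f} ns")
```

Taking the minimum over several repeats reduces noise from other processes, but absolute numbers will still vary by machine.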

Before: https://github.com/msoxzw/cpython/actions/runs/4120204201

| System | bytes | memoryview | bytearray |
| --- | --- | --- | --- |
| Windows (x86) | 392 ns | 389 ns | 111 ns |
| Windows (x64) | 409 ns | 420 ns | 112 ns |
| macOS | 397 ns | 358 ns | 90.2 ns |
| Ubuntu | 292 ns | 284 ns | 91.1 ns |

After: https://github.com/msoxzw/cpython/actions/runs/4120296060

| System | bytes | memoryview | bytearray |
| --- | --- | --- | --- |
| Windows (x86) | 150 ns | 163 ns | 168 ns |
| Windows (x64) | 83.3 ns | 86.2 ns | 81.9 ns |
| macOS | 93.3 ns | 99.7 ns | 98.7 ns |
| Ubuntu | 88.8 ns | 76.2 ns | 71.5 ns |

Therefore, this PR makes bytearray slice assignment from a bytes-like object run about as fast as slice assignment from a bytearray.

@msoxzw
Contributor Author

msoxzw commented Feb 22, 2023

I managed to run the before and after benchmarks on the same machine through GitHub Actions, but only on Windows. Going through the buffer protocol incurs a marginal overhead of about 10% when a bytearray source is treated as a generic buffer object.
https://github.com/msoxzw/cpython/actions/runs/4238205216/jobs/7365033501

| time (ns) | bytes | memoryview | bytearray |
| --- | --- | --- | --- |
| before | 417 | 405 | 113 |
| after | 120 | 130 | 125 |

If data copies are bypassed only for bytes objects, this performance penalty for bytearray is avoided, but memoryview sources see no speedup.
https://github.com/msoxzw/cpython/actions/runs/4238204896/jobs/7365028180

| time (ns) | bytes | memoryview | bytearray |
| --- | --- | --- | --- |
| before | 331 | 332 | 93.4 |
| after | 89.9 | 343 | 91.8 |

So is this performance overhead acceptable?

Member

@serhiy-storchaka serhiy-storchaka left a comment

How does it work for ba[:] = memoryview(b'abcd')[::2]?

@msoxzw
Contributor Author

msoxzw commented Feb 14, 2024

> How does it work for ba[:] = memoryview(b'abcd')[::2]?

Thanks for the wonderful review.

It would behave like bytes or bytearray concatenation: bytearray() + memoryview(b'abcd')[::2]

This behavior is dictated by the PyBUF_SIMPLE flag passed to PyObject_GetBuffer(). Nevertheless, it should be possible to preserve the original behavior and thus maintain API compatibility.
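
To make the compatibility concern concrete, here is a small sketch comparing the two operations on a strided (non-contiguous) memoryview. It reflects my understanding of CPython without this PR: slice assignment accepts the strided view, while concatenation requests a simple contiguous buffer and is expected to be rejected. The helper name `try_concat` is hypothetical, and the exact exception type raised by concatenation is treated as an assumption, so both plausible types are caught.

```python
def try_concat(left, right):
    """Return left + right, or the name of the exception that concatenation raised."""
    try:
        return left + right
    except (TypeError, BufferError) as exc:
        return type(exc).__name__


# Every second byte of b"abcd": a view with stride 2, so not C-contiguous.
strided = memoryview(b"abcd")[::2]
is_contiguous = strided.c_contiguous

# Slice assignment currently copies the view element by element into a new buffer.
ba = bytearray(b"1234")
ba[:] = strided
assigned = bytes(ba)

# Concatenation asks for a simple contiguous buffer instead.
concat_outcome = try_concat(bytearray(), strided)

print("c_contiguous:", is_contiguous)
print("after slice assignment:", assigned)
print("concatenation outcome:", concat_outcome)
```

If the optimized path adopted the concatenation behavior, code that currently assigns strided views into a bytearray would start failing, which is why preserving the original behavior matters.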

Member

@serhiy-storchaka serhiy-storchaka left a comment

Ignoring arbitrary errors in PyObject_GetBuffer() is not good.

How does it work for ba[::-1] = memoryview(ba)? And for ba[:0] = memoryview(ba) and ba[:2] = memoryview(ba)[-2:]?

Please add tests for all cases in which your intermediate code failed.
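
As a starting point for such tests, the sketch below records what the three self-referencing assignments do on CPython without this PR, as far as I understand the existing code path (the source is first copied into a temporary bytearray). The helper and case names are hypothetical. The expectation that inserting a view of `ba` into `ba` raises BufferError rests on the rule that a bytearray with live buffer exports cannot be resized.

```python
def outcome(assign):
    """Apply `assign` to a fresh bytearray; return its bytes or the error name."""
    ba = bytearray(b"abcd")
    try:
        assign(ba)
    except BufferError as exc:
        return type(exc).__name__
    return bytes(ba)


def reverse_from_self(ba):
    # Same length on both sides, so no resize is needed.
    ba[::-1] = memoryview(ba)


def insert_self_at_front(ba):
    # Grows `ba` while a memoryview of it is still alive.
    ba[:0] = memoryview(ba)


def overlap_tail_onto_head(ba):
    # Source and destination are different parts of the same buffer.
    ba[:2] = memoryview(ba)[-2:]


cases = {
    "reverse": reverse_from_self,
    "insert": insert_self_at_front,
    "overlap": overlap_tail_onto_head,
}

results = {name: outcome(assign) for name, assign in cases.items()}

for name, value in results.items():
    print(f"{name:<8} {value!r}")
```

An optimized implementation that reads directly from the source buffer would need to produce the same outcomes, since the source memory is being modified or resized during the copy in all three cases.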

Labels
awaiting review; interpreter-core (Objects, Python, Grammar, and Parser dirs); performance (Performance or resource usage)
4 participants