bpo-38319: Fix shutil._fastcopy_sendfile(): set sendfile() max block size#16491
bpo-38319: Fix shutil._fastcopy_sendfile(): set sendfile() max block size#16491giampaolo merged 6 commits intopython:masterfrom giampaolo:sendfile-32bit-2
Conversation
Lib/shutil.py
Outdated
| # changes while being copied. | ||
| try: | ||
| blocksize = max(os.fstat(infd).st_size, 2 ** 23) # min 8MB | ||
| blocksize = min(blocksize, 2 ** 31 - 2) # max 2GB |
There was a problem hiding this comment.
Why not 2 ** 31 - 1?
Would not be better to use a power of two, 2**30?
There was a problem hiding this comment.
Agreed, used 2**30.
There was a problem hiding this comment.
That sounds inefficient on 64-bit system: files larger than 2 GB will need multiple sendfile() syscalls, rather currently a single one is enough, no? Maybe use sys.maxsize instead?
There was a problem hiding this comment.
On my 64-bit Ubuntu 18.04 box sendfile() always copies up to 2GB per call, never more than that.
There was a problem hiding this comment.
Well ok, but the kernel can be updated to support larger copies. Why should Python put an arbitrary limit on a syscall? Well, you're the expert, it's up to you. It was just a comment ;-)
There was a problem hiding this comment.
OK, I think it makes sense. I added:
+ if sys.maxsize < 2 ** 32: # 32-bit architecture
+ blocksize = min(blocksize, 2 ** 30) # max 1GiBThere was a problem hiding this comment.
...I did that for shutil only. In socket.py I decided to always limit to max 1GiB: since we're dealing with a socket I want to exercise the loop more often (and hit the selector(), which implements the timeout logic).
Lib/shutil.py
Outdated
| try: | ||
| blocksize = max(os.fstat(infd).st_size, 2 ** 23) # min 8MB | ||
| blocksize = min(blocksize, 2 ** 31 - 2) # max 2GB | ||
| except Exception: |
There was a problem hiding this comment.
What exceptions besides OSError can be raised? I do not feel comfortable with silencing a wide range of exceptions.
Lib/socket.py
Outdated
| @@ -357,6 +357,7 @@ def _sendfile_use_sendfile(self, file, offset=0, count=None): | |||
| if not fsize: | |||
| return 0 # empty file | |||
| blocksize = fsize if not count else count | |||
There was a problem hiding this comment.
| blocksize = fsize if not count else count | |
| blocksize = count or fsize |
serhiy-storchaka
left a comment
There was a problem hiding this comment.
Nitpick: I would use binary prefixes for exact powers of two (MiB, GiB).
Misc/NEWS.d/next/Library/2019-09-30-22-06-33.bpo-38319.5QjiDa.rst
Outdated
Show resolved
Hide resolved
Lib/shutil.py
Outdated
| except Exception: | ||
| blocksize = max(os.fstat(infd).st_size, 2 ** 23) # min 8MiB | ||
| if sys.maxsize < 2 ** 32: # 32-bit architecture | ||
| blocksize = min(blocksize, 2 ** 30) # max 1GiB |
There was a problem hiding this comment.
Can you please add a comment explaining this 1 GiB limit?
Lib/socket.py
Outdated
| if not fsize: | ||
| return 0 # empty file | ||
| blocksize = fsize if not count else count | ||
| blocksize = min(count or fsize, 2 ** 30) # max 1GiB |
There was a problem hiding this comment.
Do we have to truncate the size on 64-bit system? Please add a comment. I suggest to write this instruction in two limits, to make the purpose of each line more explicit. Lilke:
blocksize = fsize if not count else count
# bpo-38319: truncate to 1 GiB to prevent OverflowError
blocksize = min(blocksize, 2 ** 30)
1 GiB sounds an arbitrary limit. If the developer wants a fine control on the timing, they can pass a smaller "count", no? The current implementation of _sendfile_use_sendfile() doesn't seem to care about timeout. Can't it be implemented using a non-blocking socket and poll until the socket is ready to send? It should prevent to block in sendfile(), no? asyncio uses sendfile() on non-blocking socket, so it should be possible to do something similar in socket to respect the timeout :-) To fix https://bugs.python.org/issue38319 and only fix this issue: |
|
I think it is better to use a block size which is a power of two. It guaranties that it is divisible by the page size and by the logical or physical disk block size. 1 GiB is the largest power of two which not larger than I think it is worth to convert a count to |
Oh, I see. Sorry, I was lost in all these comments on the PR. That's why I asked for a quick comment added directly in the code to explain that. |
|
Thanks @giampaolo for the PR 🌮🎉.. I'm working now to backport this PR to: 3.8. |
|
I'm having trouble backporting to |
|
Thanks @giampaolo for the PR 🌮🎉.. I'm working now to backport this PR to: 3.8. |
|
GH-16506 is a backport of this pull request to the 3.8 branch. |
|
Thanks for the code review. |
|
Thanks @giampaolo for the fix, but also for your overall work: it's really cool that Python can use transparently sendfile() internally! I like this syscall() :-) |
An attempt to fix:
https://bugs.python.org/issue38319
https://bugs.python.org/issue38319