gh-71936: Fix race condition in multiprocessing.Pool #124973
Merged
encukou merged 6 commits into python:main on Nov 13, 2024
Proxies of shared objects register a Finalizer in BaseProxy._incref(), which calls BaseProxy._decref() when the proxy is garbage-collected. This may cause a race condition with Pool(maxtasksperchild=None) on Windows. In the second or later task, if a GC occurs between _ConnectionBase._check_writable() and _ConnectionBase._send_bytes() in _ConnectionBase.send(), and a new object is allocated that shares the id() of a previously deleted one, the connection is closed and a TypeError is raised. Instead of using the id() of the token (or the proxy), use a unique, non-reusable number. Co-Authored-By: Akinori Hattori <hattya@gmail.com>
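The hazard above rests on a documented property of id(): two objects with non-overlapping lifetimes may receive the same value. The following is a minimal sketch of that property only. It is not taken from the PR, and the Token class and the helper name are hypothetical.

```python
class Token:
    """Hypothetical stand-in for a short-lived object that is tracked by id()."""


def ids_of_short_lived_objects(count):
    """Create `count` objects one at a time and record the id() of each."""
    seen = []
    for _ in range(count):
        obj = Token()
        seen.append(id(obj))
        # Drop the only reference. On CPython the object is freed right away,
        # so a later allocation may be given the same id().
        del obj
    return seen


ids = ids_of_short_lived_objects(1000)
# On CPython the number of distinct values is usually far below 1000 because
# freed memory is reused; the exact count is an implementation detail.
print(f"{len(ids)} objects, {len(set(ids))} distinct id() values")
```

Any bookkeeping keyed on id() therefore has to drop its entry before the object dies, or it can mistake a new object for the old one.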
gpshead (Member) approved these changes on Nov 11, 2024 and left a comment:
A couple possible things to improve, but good regardless.
I'm not worried about a regression test for this. Hard, and clearly better than what came before.
gpshead approved these changes on Nov 12, 2024
As I confirmed in #71936 (comment), this solves the problem for me. Will this be backported?
encukou (Member, Author): I see no issues in our post-merge testing, so, yes :)
Thanks @encukou for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13.
Thanks @encukou for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12.
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request on Nov 15, 2024: …124973), with the same commit message as above (cherry picked from commit ba088c8; Co-authored-by: Petr Viktorin <encukou@gmail.com> and Akinori Hattori <hattya@gmail.com>).
GH-126869 is a backport of this pull request to the 3.13 branch.
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request on Nov 15, 2024: …124973), with the same commit message as above (cherry picked from commit ba088c8; Co-authored-by: Petr Viktorin <encukou@gmail.com> and Akinori Hattori <hattya@gmail.com>).
GH-126870 is a backport of this pull request to the 3.12 branch.
encukou added a commit that referenced this pull request on Nov 15, 2024: … (GH-126870), with the same commit message as above (cherry picked from commit ba088c8; Co-authored-by: Petr Viktorin <encukou@gmail.com> and Akinori Hattori <hattya@gmail.com>).
encukou added a commit that referenced this pull request on Nov 15, 2024: … (GH-126869), with the same commit message as above (cherry picked from commit ba088c8; Co-authored-by: Petr Viktorin <encukou@gmail.com> and Akinori Hattori <hattya@gmail.com>).
picnixz pushed a commit to picnixz/cpython that referenced this pull request on Dec 8, 2024: …124973), with the same commit message as above.
ebonnal pushed a commit to ebonnal/cpython that referenced this pull request on Jan 12, 2025: …124973), with the same commit message as above.
This is based on @hattya's gh-98274.
As far as I can tell, the race condition is caused by using id() as a unique identifier, assuming that it can't be reused.
@hattya's PR supplements the remote object's id with the id of the proxy. But since _decref is called after the proxy is deleted, that id could be reused as well. Note that this is a theoretical concern; I wasn't able to reproduce it.
This PR uses a counter (with locked store & increment) to get unique values for _idset.
Unfortunately, I have no idea how to test this :/