gh-139001: Fix thread-safety issue in pathlib.Path#139066
gh-139001: Fix thread-safety issue in pathlib.Path#139066colesbury merged 3 commits intopython:mainfrom
pathlib.Path#139066Conversation
Don't cache the joined path in `_raw_path` because the caching isn't thread safe.
|
It's a shame to lose the caching. Presumably something like this wouldn't work? @property
def _raw_path(self):
paths = self._raw_paths
if len(paths) == 1:
return paths[0]
elif paths:
# Join path segments from the initializer.
path = self.parser.join(*paths)
# Cache the joined path.
paths[:] = [path]
return path
else:
paths[:] = ['']
return '' |
|
That doesn't seem to work, at least not in the free threaded build. I've added a test case. I think we could probably make the caching thread safe with some extra work, but I'm not sure it's worth it. It looks like |
|
import pyperf
from pathlib import Path # or from pathlib_new which is a patched version
PATH = "/" + "a" * 511
p = Path(PATH)
def bench():
for i in range(100000):
a = p.root
del a
runner = pyperf.Runner()
runner.bench_func('bench', bench)
|
|
By bad, previous bench have some mistake import pyperf
from pathlib import Path # or from pathlib_new which is a patched version
PATH = [chr(ord('a') + (i % 26)) for i in range(20)]
p = Path(*PATH)
def bench():
a = p.root
del a
runner = pyperf.Runner()
runner.bench_func('bench', bench)Here's results |
|
Updated: import pyperf
from pathlib_new import Path
PATH = ["abc123" for _ in range(200)]
def bench():
p = Path(*PATH)
a = p.root
del a
runner = pyperf.Runner()
runner.bench_func('bench', bench)Base on the code, I find that if the path level < 200 , this patch is fater than old way. |
|
@barneygale, are you okay with this change? |
|
As an illustration of the current caching behaviour, consider: p = Path('/usr', 'local', 'foo', 'bar')
q0 = p / 'spam.txt'
q1 = p / 'eggs.txt'
str(p)
str(q0)
str(q1)The current implementation would make these calls: os.path.join('/usr', 'local', 'foo', 'bar')
os.path.join('/usr/local/foo/bar', 'spam.txt')
os.path.join('/usr/local/foo/bar', 'eggs.txt')With this PR: os.path.join('/usr', 'local', 'foo', 'bar')
os.path.join('/usr', 'local', 'foo', 'bar', 'spam.txt')
os.path.join('/usr', 'local', 'foo', 'bar', 'eggs.txt')The extra arguments have a cost IIRC. And maybe that's fine, I just wanted to explain the current behaviour better. |
|
@barneygale, I've added a print statement to os.path.join('/usr', 'local', 'foo', 'bar')
os.path.join('/usr', 'local', 'foo', 'bar', 'spam.txt')
os.path.join('/usr', 'local', 'foo', 'bar', 'eggs.txt') |
|
Oh! If you move |
|
Yes, then you get the behavior you described |
barneygale
left a comment
There was a problem hiding this comment.
If we can bring the optimization back in future that would be great, but for now it's better to have pathlib working in free-threaded python. Thanks for fixing this.
|
Thanks @colesbury for the PR 🌮🎉.. I'm working now to backport this PR to: 3.14. |
|
GH-139926 is a backport of this pull request to the 3.14 branch. |
|
@barneygale thanks for the review |
Don't cache the joined path in
_raw_pathbecause the caching isn't thread safe.pathlibon3.14t#139001