gh-131757: allow lru_cache functions to execute concurrently #131758
vstinner merged 19 commits into python:main
Conversation
The difference is that …
Yes, and it behaves the same way as before free-threading: it can get into the same function with the same args multiple times if the calls arrive at roughly the same time.
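On a default (GIL) build, or with the pure-Python `functools` fallback, this duplicate execution is already observable. The sketch below (a hypothetical demo, not from the PR) uses a barrier to force two threads to both miss the cache for the same argument:

```python
import threading
from functools import lru_cache

calls = []
barrier = threading.Barrier(2, timeout=5)

@lru_cache(maxsize=128)
def slow(x):
    calls.append(x)
    # Neither thread can return (and populate the cache) until both
    # have entered, so both must have missed for the same argument.
    barrier.wait()
    return x * 2

threads = [threading.Thread(target=slow, args=(1,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(calls))  # 2: the wrapped function ran twice for x=1
```

This is the pre-existing behavior the comment above describes; the PR preserves it rather than introducing it.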
+1 in principle. The C version should be at least as good as the pure Python version. In practice, this is tricky to get right.

@colesbury This PR modifies your previous work. Do you want to take a look at it?

@serhiy-storchaka This is mostly your C code. Do you want to look this over?

@tom-pytel I suggest looking at the pure Python version in …
serhiy-storchaka
left a comment
How safe is it to execute self->hits++ or self->misses++ without a critical section?
In …
And in …
Also …
Also, can you share …
It's in the header of this PR.
I double-checked all these cases as you suggested and it's fine. That makes sense, since all this PR really amounts to is releasing the lock during the call to the wrapped function; no other behavior is changed.
ZeroIntensity
left a comment
I'm a little late to the party, but this looks pretty good.
Misc/NEWS.d/next/Library/2025-03-26-10-56-22.gh-issue-131757.pFRdmN.rst
ZeroIntensity
left a comment
LGTM as well, thanks for doing this.
serhiy-storchaka
left a comment
Do not overdo this. It is a simple macro to make the code in that file clearer. We do not need a return value. Other similar macros do not use inline functions. If we need that macro in other places, we can update the implementation.
I suggested implementing a simple increment. value++ returns the old value, if that is important.
Merged, thank you for this new optimization.
This PR changes `functools.lru_cache` to only hold critical sections when it is performing operations on itself, not when it calls the wrapped function being cached.

Example script timing, current code:
This PR:
Explanation: The script is 16 threads doing a long operation. In the current code they run sequentially because they are serialized by `lru_cache`. In this PR they are allowed to run concurrently.

Script:
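The benchmark script itself isn't reproduced in this excerpt. A minimal sketch of the kind of workload described (16 threads, one long cached operation) might look like the following; the sleep duration and `work` function are illustrative, not from the PR:

```python
import threading
import time
from functools import lru_cache

@lru_cache(maxsize=128)
def work(n):
    # Simulate a long operation; sleep releases the GIL, so only the
    # lru_cache lock itself could serialize the callers.
    time.sleep(0.25)
    return n

start = time.perf_counter()
# Distinct arguments so every call is a cache miss.
threads = [threading.Thread(target=work, args=(i,)) for i in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Fully serialized execution would take about 16 * 0.25 = 4 s;
# concurrent execution takes roughly 0.25 s.
print(f"{elapsed:.2f}s")
```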
More detail: there are three caching wrapper functions; currently they all execute while locked (and, by extension, so does the function being cached):

- `uncached_lru_cache_wrapper` - used when `lru_cache(maxsize=0)`. Just a passthrough call; this doesn't need a lock at all.
- `infinite_lru_cache_wrapper` - used when `lru_cache(maxsize=None)`. This can rely on the locking already done for dict operations.
- `bounded_lru_cache_wrapper` - used otherwise. This has been split into a pre-call function and a post-call function, each locked individually, with the critical section released during the actual call to the cached function. This can be done because `lru_cache` is thread-safe and accommodates the possibility of the cache dictionary changing during execution of the cached function.

NOTE: This could be reduced to a single critical section if the locked function is only ever called in its owner thread, but I'm not sure that's necessary and wanted to keep things simple for this PR.
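The pre-call/post-call split for the bounded wrapper can be sketched in Python. This is a hedged illustration, not the actual C implementation: `BoundedCacheSketch`, `_pre_call`, and `_post_call` are hypothetical names, an `RLock` stands in for the per-object critical section, and LRU eviction bookkeeping is omitted:

```python
import threading

class BoundedCacheSketch:
    """Python sketch of the locking split described above."""

    def __init__(self, func):
        self.func = func
        self.cache = {}                  # key -> result (no eviction here)
        self.lock = threading.RLock()
        self.hits = self.misses = 0

    def _pre_call(self, key):
        # Locked: probe the cache and update the hit/miss counters.
        with self.lock:
            if key in self.cache:
                self.hits += 1
                return True, self.cache[key]
            self.misses += 1
            return False, None

    def _post_call(self, key, result):
        # Locked again: insert the result, tolerating another thread
        # having inserted the same key while the lock was released.
        with self.lock:
            return self.cache.setdefault(key, result)

    def __call__(self, *args):
        found, value = self._pre_call(args)
        if found:
            return value
        # The lock is NOT held here, so the wrapped function may run
        # concurrently, possibly even twice for the same key.
        result = self.func(*args)
        return self._post_call(args, result)
```

The `setdefault` in the post-call step is the part that tolerates the cache changing while unlocked: if another thread already stored a result for the same key, the first stored value wins and the duplicate computation is discarded.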