Refine specialization caches and extend binary-op coverage #7386
youknowone merged 14 commits into RustPython:main from
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
📝 Walkthrough
Centralizes per-type specialization into a TypeSpecializationCache, adds BinaryOpExtend external specialization and caching, introduces init-cleanup code and Latin-1 single-char caching in Context (used by str ToPyObject), adds ordering-aware atomic accessors, and refactors frame/with_frame and exact-args frame preparation.
Changes
Sequence Diagram(s)
sequenceDiagram
participant PyType
participant Cache as TypeSpecializationCache
participant Mutex as write_lock (Mutex)
participant GC
PyType->>Cache: invalidate_for_type_modified()
Cache->>Mutex: lock()
Cache->>Cache: swap_init(None) / swap_getitem(None)
Cache->>Cache: retire_old_function(old_fn)
Mutex->>Cache: unlock()
GC->>Cache: traverse()
Cache-->>GC: visit init, getitem, retired funcs
sequenceDiagram
participant FrameExec as FrameExecution
participant ExtCache as ExternalCache
participant Descr as BinaryOpExtendDescr
participant VM
FrameExec->>ExtCache: read_cached_binary_op_extend_descr()
alt cache hit
ExtCache-->>FrameExec: descr
else cache miss
FrameExec->>FrameExec: inspect operand types (compact int/float)
FrameExec->>Descr: select descriptor
FrameExec->>ExtCache: write_cached_binary_op_extend_descr(descr)
Descr-->>FrameExec: descriptor
end
FrameExec->>Descr: execute specialized action
Descr-->>FrameExec: result
FrameExec->>VM: continue execution
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 3
🧹 Nitpick comments (2)
crates/vm/src/vm/context.rs (1)
476-482: The singleton fast path still materializes a temporary PyStr for borrowed inputs.
For call sites like new_str(" "), the s.into() happens before the Latin-1 check, so the cache hit only skips the final object allocation. If you want this path to be allocation-free for single-character borrowed inputs, it probably needs a borrowed-input fast path or a dedicated new_char/latin1_char entry point at the API boundary.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/vm/context.rs` around lines 476 - 482, new_str currently calls s.into() before checking latin1_singleton_index, which materializes a temporary PyStr for borrowed inputs and defeats the allocation-free singleton fast path; change the implementation to avoid calling s.into() until after the Latin-1 single-char check by either adding an overload/new API that accepts a borrowed &str (or &impl AsRef<str>) and runs Self::latin1_singleton_index(&s) first, or add a dedicated API like new_char/latin1_char used at the API boundary for single-character inputs and have new_str route to it only after converting non-borrowed inputs; ensure latin1_singleton_index is invoked on the borrowed string and that you only call s.into() when the singleton fast path fails so no temporary PyStr is allocated for borrowed single-char inputs.
crates/vm/src/frame.rs (1)
4089-4098: Extend this specialized char fast path beyond ASCII if the new cache is Latin-1-wide.
BinaryOpSubscrStrInt only hits ascii_char_cache, so one-character subscripts in '\x80'..='\xff' still bounce to the generic path even after quickening. If the new singleton cache is meant to cover all Latin-1 characters, wiring it in here would avoid that steady-state miss.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 4089 - 4098, The specialized fast path in BinaryOpSubscrStrInt currently only uses vm.ctx.ascii_char_cache for one-char string subscripts, causing Latin-1 (0x80..0xFF) characters to miss the cache; update the path after obtaining ch (and ascii_idx) to branch: if ch.is_ascii() continue using vm.ctx.ascii_char_cache[ascii_idx], else index into the new Latin-1 singleton cache (e.g., vm.ctx.latin1_char_cache[ascii_idx] or vm.ctx.single_char_cache[ascii_idx], whichever your context defines) and push that clone.into(); keep the same guards (specialization_nonnegative_compact_index, getitem_by_index) and ensure ascii_idx is computed from ch.to_u32() and bounds-checked for 0..256 before indexing the Latin-1 cache.
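The cache shape the two comments above describe can be sketched in isolation. This is an illustrative model only, not RustPython's actual Context API: the names latin1_char_cache, latin1_singleton_index, and new_str here stand in for whatever the real code defines. The point is that the single-character check runs on the borrowed &str before any allocation happens.

```rust
/// Returns the code point usable as a cache index when `s` is exactly one
/// Latin-1 character (U+0000..=U+00FF). Runs on the borrowed &str, so no
/// allocation happens before the fast path decides.
fn latin1_singleton_index(s: &str) -> Option<usize> {
    let mut chars = s.chars();
    match (chars.next(), chars.next()) {
        (Some(c), None) if (c as u32) < 0x100 => Some(c as usize),
        _ => None,
    }
}

struct Context {
    // 256 pre-built single-character strings, built once at startup.
    latin1_char_cache: Vec<String>,
}

impl Context {
    fn new() -> Self {
        let cache = (0u32..256)
            .map(|cp| char::from_u32(cp).unwrap().to_string())
            .collect();
        Self { latin1_char_cache: cache }
    }

    /// Singleton fast path first; only allocate on a miss.
    fn new_str(&self, s: &str) -> String {
        if let Some(idx) = latin1_singleton_index(s) {
            // In the real VM this would hand out a shared interned object.
            return self.latin1_char_cache[idx].clone();
        }
        s.to_string()
    }
}

fn main() {
    let ctx = Context::new();
    assert_eq!(ctx.new_str("é"), "é");
    assert_eq!(latin1_singleton_index("é"), Some(0xE9));
    assert_eq!(latin1_singleton_index("ab"), None);
    println!("latin-1 singleton fast path ok");
}
```

The same index function covers both review comments: the subscript fast path can use it for any one-character result in 0x00..=0xFF, not just ASCII.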
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/builtins/type.rs`:
- Around line 275-279: TypeSpecializationCache currently versions only getitem
via getitem_version while init is unversioned, allowing modified() to reset
tp_version_tag and later returns a stale pre-mutation init; fix by adding an
init_version (e.g., AtomicU32) to TypeSpecializationCache and update/clear it
wherever getitem_version is updated/cleared (including modified()), and ensure
any code that reads init (the places that currently check tp_version_tag and
return init) also checks init_version matches the current tp_version_tag before
returning — alternately, if you prefer clearing, explicitly clear init in
modified() wherever getitem is invalidated so init cannot be returned for a new
type version.
- Line 281: The field retired: PyRwLock<Vec<PyObjectRef>> currently holds strong
references to replaced cached functions with no reclamation when vm is None,
causing unbounded growth for live heap types; update invalidation to avoid
storing permanent strong refs by using temporary weak/VM-tied refs or by
enqueueing replacements into a VM-aware reclamation pass instead of pushing into
retired directly, and implement a grace-period reclamation sweep that clears
entries for live types (invoke clear_into() or similar) when a VirtualMachine is
available; changes should touch the retired field usage sites and the logic that
pushes into retired (the places around functions/methods handling
caching/invalidation) to convert to weak refs or VM-driven deferred drop and add
a periodic drain that reclaims entries for active heap types.
In `@crates/vm/src/frame.rs`:
- Around line 7755-7759: The cached __getitem__ specialisation is being read
without validating the type version tag, allowing stale cache use after
rebinding; update the read path (the BinaryOpSubscrGetitem handler /
get_cached_getitem_for_specialization) to mirror
get_cached_init_for_specialization: load cls.tp_version_tag with Acquire and
compare it to the cached_version before using the cached entry, and only use the
cache when cls.tp_version_tag.load(Acquire) == cached_version; otherwise treat
as cache miss. Ensure you use the same Acquire ordering and the same comparison
pattern used by get_cached_init_for_specialization and keep cache_write via
cache_getitem_for_specialization unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 0b85ddd0-26a0-4787-9ee7-d1be5d97f5bd
📒 Files selected for processing (7)
crates/vm/src/builtins/str.rs, crates/vm/src/builtins/type.rs, crates/vm/src/frame.rs, crates/vm/src/object/ext.rs, crates/vm/src/stdlib/builtins.rs, crates/vm/src/types/slot.rs, crates/vm/src/vm/context.rs
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/vm/src/frame.rs (1)
8715-8725: ⚠️ Potential issue | 🟠 Major
extra_bytes is dropped for heap-backed __init__ frames.
When datastack_frame_size_bytes() returns None, lines 8721-8725 return true unconditionally. That makes the new CallAllocAndEnterInit fast path skip the DataStack capacity check for the init-cleanup shim, even though specialization_new_init_cleanup_frame(..., true, vm) still allocates that shim on the DataStack. A heap-backed __init__ can therefore pass this guard and still overflow/panic in the shim allocation path.
Suggested fix
     match func.datastack_frame_size_bytes() {
         Some(frame_size) => frame_size
             .checked_add(extra_bytes)
             .is_some_and(|size| vm.datastack_has_space(size)),
-        None => true,
+        None => extra_bytes == 0 || vm.datastack_has_space(extra_bytes),
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 8715 - 8725, specialization_has_datastack_space_for_func_with_extra currently returns true when func.datastack_frame_size_bytes() is None, which lets CallAllocAndEnterInit skip datastack capacity checks and can overflow when specialization_new_init_cleanup_frame(..., true, vm) still allocates an init-cleanup shim on the datastack; change the None branch to conservatively check vm.datastack_has_space(extra_bytes) (or vm.datastack_has_space(shim_size) if you compute shim size separately) so heap-backed __init__ frames still verify the extra_bytes required for the init-cleanup shim before taking the fast path.
crates/vm/src/builtins/function.rs (1)
1375-1388: 🛠️ Refactor suggestion | 🟠 Major
Require CO_OPTIMIZED before taking this vectorcall fast path.
prepare_exact_args_frame() is explicitly an optimized-only path, but is_simple does not check that flag. A non-optimized function with exact positional args can still reach line 1388 in release builds and bypass the regular argument setup.
Suggested fix
-        let is_simple = !has_kwargs
+        let is_simple = !has_kwargs
+            && code.flags.contains(bytecode::CodeFlags::OPTIMIZED)
             && !code.flags.contains(bytecode::CodeFlags::VARARGS)
             && !code.flags.contains(bytecode::CodeFlags::VARKEYWORDS)
             && code.kwonlyarg_count == 0
             && !code
                 .flags
                 .intersects(bytecode::CodeFlags::GENERATOR | bytecode::CodeFlags::COROUTINE);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/function.rs` around lines 1375 - 1388, The fast-path allows calling prepare_exact_args_frame for functions that aren't marked optimized; add a check for the optimized code flag to the is_simple condition so the vectorcall fast path only triggers for optimized functions: include bytecode::CodeFlags::CO_OPTIMIZED (or the equivalent OPTIMIZED variant) in the conjuncts checked on code.flags (i.e., extend the is_simple boolean) so prepare_exact_args_frame and the fast-path branch are only taken when that optimized flag is present.
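The gating logic in the suggested fix can be shown with a minimal sketch. The u32 bit constants below are stand-ins for bytecode::CodeFlags (the bit positions are invented for illustration); what matters is that the OPTIMIZED conjunct is part of the predicate, so non-optimized code can never qualify for the fast path.

```rust
// Illustrative flag bits, not RustPython's actual CodeFlags values.
const OPTIMIZED: u32 = 1 << 0;
const VARARGS: u32 = 1 << 1;
const VARKEYWORDS: u32 = 1 << 2;
const GENERATOR: u32 = 1 << 3;
const COROUTINE: u32 = 1 << 4;

/// Mirrors the shape of the `is_simple` predicate from the review,
/// including the OPTIMIZED conjunct the comment says was missing.
fn is_simple(flags: u32, has_kwargs: bool, kwonlyarg_count: usize) -> bool {
    !has_kwargs
        && flags & OPTIMIZED != 0 // the added conjunct
        && flags & VARARGS == 0
        && flags & VARKEYWORDS == 0
        && kwonlyarg_count == 0
        && flags & (GENERATOR | COROUTINE) == 0
}

fn main() {
    assert!(is_simple(OPTIMIZED, false, 0));
    assert!(!is_simple(0, false, 0)); // non-optimized code no longer qualifies
    assert!(!is_simple(OPTIMIZED | GENERATOR, false, 0));
    println!("fast-path gating ok");
}
```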
♻️ Duplicate comments (3)
crates/vm/src/frame.rs (1)
7794-7798: ⚠️ Potential issue | 🟠 Major
Cached __getitem__ still needs a type-version guard.
Specializing the opcode here is still unsafe because the execution path at lines 4043-4049 reads get_cached_getitem_for_specialization() without first confirming owner.class().tp_version_tag.load(Acquire) == cached_version. Rebinding __getitem__ on a heap type can keep this fast path calling stale code.
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 7794 - 7798, The cached __getitem__ specialization is used without verifying the class type-version, so add a type-version guard: when installing/using the specialization via cls.cache_getitem_for_specialization and when reading via get_cached_getitem_for_specialization, compare owner.class().tp_version_tag.load(Acquire) against the cached_version before taking the fast path; if they differ, fall back to the slow path and refresh the cache (or invalidate the cached handler) so rebinding __getitem__ on a heap type cannot call stale code.
crates/vm/src/builtins/type.rs (2)
275-281: ⚠️ Potential issue | 🟠 Major
Version the cached __init__ entry, not just the owning type.
tp_version_tag only changes when the type mutates. PyFunction::func_version can change independently via __code__, __defaults__, or __kwdefaults__, but get_cached_init_for_specialization() still reuses the cached function on a type-version match alone. That leaves the CALL_ALLOC_AND_ENTER_INIT fast path free to keep driving prepare_exact_args_frame() with an __init__ that no longer satisfies the cached exact-args contract.
Suggested direction
  pub struct TypeSpecializationCache {
      pub init: PyAtomicRef<Option<PyFunction>>,
+     pub init_version: AtomicU32,
      pub getitem: PyAtomicRef<Option<PyFunction>>,
      pub getitem_version: AtomicU32,
@@
      Self {
          init: PyAtomicRef::from(None::<PyRef<PyFunction>>),
+         init_version: AtomicU32::new(0),
          getitem: PyAtomicRef::from(None::<PyRef<PyFunction>>),
          getitem_version: AtomicU32::new(0),
          write_lock: PyMutex::new(()),
          retired: PyRwLock::new(Vec::new()),
      }
@@
  pub(crate) fn cache_init_for_specialization(
      &self,
      init: PyRef<PyFunction>,
      tp_version: u32,
      vm: &VirtualMachine,
  ) -> bool {
+     let func_version = init.get_version_for_current_state();
+     if func_version == 0 {
+         return false;
+     }
      ...
      ext.specialization_cache.swap_init(Some(init), Some(vm));
+     ext.specialization_cache
+         .init_version
+         .store(func_version, Ordering::Relaxed);
      true
  }
You'll also need the read side and CALL_ALLOC_AND_ENTER_INIT caller to validate
init_version before reusing the cached function.
Also applies to: 893-931
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/type.rs` around lines 275 - 281, The cached __init__ entry must be versioned separately from the owning type: add an init_version (AtomicU32) to TypeSpecializationCache and when caching a PyFunction in get_cached_init_for_specialization() store the function's current PyFunction::func_version into init_version; on read side (and in CALL_ALLOC_AND_ENTER_INIT / prepare_exact_args_frame) validate that both the cached type version (getitem_version or type's tp_version_tag) and init_version match the live PyFunction::func_version before reusing the cached PyFunction; update cache write paths to set init and init_version together and ensure callers reject the cached init if the versions differ so that prepare_exact_args_frame() never runs with a stale __init__.
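The dual-version validation described above can be modeled with plain atomics. This is a hedged sketch, not RustPython's code: an AtomicU32 stands in for the cached PyFunction reference, and the struct and method names are invented. It shows the invariant the review asks for, namely that a cached entry is reused only when both the type version and the function's own version still match.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Stand-in for a TypeSpecializationCache entry: the cached function plus
// both versions it was recorded under.
struct InitCache {
    cached_fn_id: AtomicU32, // stand-in for the cached PyFunction
    cached_type_version: AtomicU32,
    cached_init_version: AtomicU32,
}

impl InitCache {
    fn store(&self, fn_id: u32, tp_version: u32, func_version: u32) {
        self.cached_fn_id.store(fn_id, Ordering::Relaxed);
        self.cached_init_version.store(func_version, Ordering::Relaxed);
        // Publish the type version last; readers load it with Acquire.
        self.cached_type_version.store(tp_version, Ordering::Release);
    }

    /// Returns the cached entry only when BOTH versions still match.
    fn load(&self, current_tp_version: u32, current_func_version: u32) -> Option<u32> {
        if self.cached_type_version.load(Ordering::Acquire) != current_tp_version {
            return None; // type mutated since caching
        }
        if self.cached_init_version.load(Ordering::Relaxed) != current_func_version {
            return None; // __code__/__defaults__ rebound on the function itself
        }
        Some(self.cached_fn_id.load(Ordering::Relaxed))
    }
}

fn main() {
    let cache = InitCache {
        cached_fn_id: AtomicU32::new(0),
        cached_type_version: AtomicU32::new(0),
        cached_init_version: AtomicU32::new(0),
    };
    cache.store(7, 42, 3);
    assert_eq!(cache.load(42, 3), Some(7)); // both versions match
    assert_eq!(cache.load(43, 3), None); // type mutated
    assert_eq!(cache.load(42, 4), None); // function mutated
    println!("versioned init cache ok");
}
```

The same pattern applies to the __getitem__ cache flagged elsewhere in this review: treat any version mismatch as a cache miss and fall back to the slow path.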
303-315: ⚠️ Potential issue | 🟠 Major
retired still grows for every live-type invalidation.
modified() always reaches invalidate_for_type_modified() without a VirtualMachine, so both swap_init(None, None) and swap_getitem(None, None) fall back to pushing the replaced functions into retired. clear_into() is the only drain path here, so a long-lived heap type that alternates specialization and attribute writes will retain those strong refs indefinitely.
Also applies to: 318-336, 352-364
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/type.rs` around lines 303 - 315, The code currently always moves replaced function objects into the global `retired` list when `vm` is None (in `swap_init` and `swap_getitem`), causing `retired` to grow for long‑lived heap types; change both `swap_init` and `swap_getitem` so that when `vm` is None they do NOT push the old value into `retired` but instead release the strong reference immediately (or convert to a non‑retaining/weak form) to avoid accumulation — implement this by extracting the old value via the existing `self.init.swap(new_init)` path but then dropping or turning it into a non‑retaining placeholder via a new helper (e.g. `drop_old_without_retire`) rather than calling `retire_old_function`; keep the current `vm`-present path that uses `self.init.swap_to_temporary_refs(..., vm)` and `retire_old_function` unchanged so CPython-style temporary refs remain for executing frames.
🧹 Nitpick comments (2)
.cspell.json (1)
63-64: Minor: words not in alphabetical order.
"compactlong" and "compactlongs" start with "c" but are placed after "d" words. For consistency with the rest of the list, consider moving them to the correct alphabetical position (after "coro").
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.cspell.json around lines 63 - 64, The two dictionary entries "compactlong" and "compactlongs" are out of alphabetical order; move those two entries so they appear after the "coro" entry (i.e., in the block of "c" words) to maintain consistent alphabetical ordering in the word list.
crates/vm/src/vm/context.rs (1)
424-428: Give the shim its own synthetic code identity.
Using __init__ for source_path, obj_name, and qualname will make tracebacks and code introspection look like user-defined __init__ bytecode even when the frame is just internal cleanup. A dedicated internal label such as <init_cleanup> would avoid that confusion.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/vm/context.rs` around lines 424 - 428, The frame/context currently reuses names.__init__ for source_path, obj_name, and qualname, which makes internal cleanup frames appear as user __init__ code; change those fields (source_path, obj_name, qualname) to a dedicated synthetic identifier like "<init_cleanup>" when constructing this shim frame (where names.__init__ is currently used) so tracebacks and introspection show an internal label instead of a user __init__; keep other fields (e.g. first_line_number, max_stackdepth) as-is or set a fixed sane value if needed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: ab67bac4-b3d1-49c7-852c-cf76da714cba
📒 Files selected for processing (6)
.cspell.json, crates/vm/src/builtins/function.rs, crates/vm/src/builtins/type.rs, crates/vm/src/frame.rs, crates/vm/src/vm/context.rs, crates/vm/src/vm/mod.rs
Force-pushed 1f1dc2e to e84ac3c
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/vm/src/builtins/function.rs (1)
692-733: ⚠️ Potential issue | 🟠 Major
Inconsistent precondition enforcement on invoke_exact_args() call sites.
While most callers of invoke_exact_args() (e.g., lines 4214, 4261) use runtime guards like has_exact_argcount(), the specialization fast-paths at lines 3867 and 4055 rely on debug_assert!() to validate argument counts. In release builds, these assertions vanish, allowing silently incorrect frame construction.
Make the argcount checks at lines 3867 and 4055 runtime guards (like lines 4214 and 4261) instead of debug_assert, or else fall back to invoke() if the check fails.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/function.rs` around lines 692 - 733, The fast-path call sites that currently rely on debug_assert! for argcount must perform a runtime check and fallback instead: replace the debug_assert-based checks at the specialization fast-paths that call invoke_exact_args() with an explicit runtime guard using has_exact_argcount() (or the equivalent exact-argcount check used at other sites) and if the guard fails call the safe invoke() path; ensure both fast-path locations that call invoke_exact_args() (the ones flagged in the review) use this runtime check and fallback to invoke() rather than assuming the precondition.
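The distinction this comment relies on can be demonstrated directly: debug_assert! compiles to nothing in release builds, so a fast path must prove its precondition with a real branch and fall back when it fails. The function names below (invoke_exact_args, invoke_generic, call) are stand-ins for the real API, not RustPython's signatures.

```rust
/// Fast path: assumes args.len() == argcount. The debug_assert documents
/// the contract but enforces nothing in release builds.
fn invoke_exact_args(args: &[i32], argcount: usize) -> i32 {
    debug_assert!(args.len() == argcount); // vanishes under --release
    args.iter().sum()
}

/// Slow but safe path that makes no argcount assumption.
fn invoke_generic(args: &[i32]) -> i32 {
    args.iter().sum::<i32>()
}

/// Call site: a runtime guard decides between the two, so release builds
/// can never enter the fast path with a violated precondition.
fn call(args: &[i32], argcount: usize) -> i32 {
    if args.len() == argcount {
        invoke_exact_args(args, argcount) // precondition proven at runtime
    } else {
        invoke_generic(args) // safe fallback instead of silent misbehavior
    }
}

fn main() {
    assert_eq!(call(&[1, 2, 3], 3), 6); // fast path taken
    assert_eq!(call(&[1, 2, 3], 2), 6); // guard fails, falls back safely
    println!("runtime guard ok");
}
```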
♻️ Duplicate comments (2)
crates/vm/src/frame.rs (1)
4044-4051: ⚠️ Potential issue | 🟠 Major
Guard cached __getitem__ dispatch with the current type version.
This path still trusts get_cached_getitem_for_specialization() without checking owner.class().tp_version_tag. After rebinding __getitem__ on a heap type, the cached function can still pass the func_version() check and dispatch stale code here. Please mirror the __init__ cache pattern and treat a type-version mismatch as a cache miss.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 4044 - 4051, The cached __getitem__ dispatch in Instruction::BinaryOpSubscrGetitem currently trusts get_cached_getitem_for_specialization() and func.func_version() but misses checking the owning class's tp_version_tag; update the guard to also compare owner.class().tp_version_tag against the cached type version (mirror the __init__ cache pattern) and treat a mismatch as a cache miss so the fast-path is skipped and normal lookup occurs; locate the check around self.nth_value(1), specialization_eval_frame_active(vm), get_cached_getitem_for_specialization(), func.func_version(), and specialization_has_datastack_space_for_func(vm, &func) and add the tp_version_tag validation there.
crates/vm/src/builtins/type.rs (1)
281-281: ⚠️ Potential issue | 🟠 Major
retired still has no live reclamation path.
invalidate_for_type_modified() always goes through swap_*(_, None), so every live invalidation keeps the displaced specialization function strongly referenced in retired. Since retired is only drained in clear_into(), long-lived heap types can still grow this list indefinitely under specialize/mutate cycles.
Please route live invalidations through a VM-backed temporary-ref path when a VirtualMachine is available, or add an opportunistic drain for retired.
Also applies to: 303-336, 352-364
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/type.rs` at line 281, The retired field (retired: PyRwLock<Vec<PyObjectRef>>) accumulates strong references because invalidate_for_type_modified() always calls swap_*(_, None) which pushes displaced specialization functions into retired and they are only drained by clear_into(), causing unbounded growth for long-lived heap types; fix by routing live invalidations through a VM-backed temporary-ref path when a VirtualMachine is available (use the VM to create transient/weak refs instead of pushing into retired) or add an opportunistic drain that periodically clears retired (e.g., in invalidate_for_type_modified(), swap_* callers, or type mutation paths) so retired does not retain references indefinitely—update functions invalidate_for_type_modified(), any swap_* call sites that pass None, and clear_into() to implement the chosen reclamation strategy.
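The "opportunistic drain" option can be sketched with standard-library types. This is an assumption-laden model, not the real cache: Mutex<Vec<Arc<String>>> stands in for PyRwLock<Vec<PyObjectRef>>, and the threshold and drain policy are invented. In the real VM, a drain like this would need the grace-period/VM-aware semantics the review describes; here it only shows how the list stays bounded.

```rust
use std::sync::{Arc, Mutex};

// Illustrative threshold; the real value would be a tuning decision.
const RETIRED_DRAIN_THRESHOLD: usize = 16;

struct SpecializationCache {
    // Stand-in for `retired: PyRwLock<Vec<PyObjectRef>>`.
    retired: Mutex<Vec<Arc<String>>>,
}

impl SpecializationCache {
    fn retire_old_function(&self, old: Arc<String>) {
        let mut retired = self.retired.lock().unwrap();
        retired.push(old);
        if retired.len() > RETIRED_DRAIN_THRESHOLD {
            // In the real VM this would be a VM-aware grace-period sweep;
            // here we simply drop the oldest half to bound growth.
            let keep_from = retired.len() / 2;
            retired.drain(..keep_from);
        }
    }

    fn retired_len(&self) -> usize {
        self.retired.lock().unwrap().len()
    }
}

fn main() {
    let cache = SpecializationCache { retired: Mutex::new(Vec::new()) };
    for i in 0..100 {
        cache.retire_old_function(Arc::new(format!("fn{}", i)));
    }
    // Without the drain, len would be 100; with it, growth stays bounded.
    assert!(cache.retired_len() <= RETIRED_DRAIN_THRESHOLD);
    println!("retired stays bounded at {}", cache.retired_len());
}
```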
🧹 Nitpick comments (3)
crates/vm/src/vm/mod.rs (1)
1566-1620: Please pin the untraced contract with a regression test.
with_frame_untraced() now deliberately bypasses dispatch_traced_frame(). A focused test that sys.settrace/sys.setprofile do not observe this path would help prevent accidental hook reintroduction in later frame refactors.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/vm/mod.rs` around lines 1566 - 1620, Add a regression test that verifies with_frame_untraced()/with_frame_impl(..., traced: false, ...) bypasses dispatch_traced_frame: register sys.settrace and sys.setprofile handlers that will fail the test if invoked (or toggle a flag), create a FrameRef and call with_frame_untraced (or call with_frame_impl with traced=false) to execute a dummy closure, then assert the trace/profile handlers were not called and normal frame cleanup occurs; reference with_frame_untraced, with_frame_impl, and dispatch_traced_frame so the test covers the untraced execution path explicitly.
crates/vm/src/vm/context.rs (1)
424-428: Give the synthetic cleanup code a distinct internal identity.
Reusing __init__ for source_path, obj_name, and qualname will make any leaked traceback or debug output look like user __init__ code. A dedicated internal name/path would make cleanup-frame failures much easier to diagnose.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/vm/context.rs` around lines 424 - 428, The synthetic cleanup frame currently reuses names.__init__ for source_path, obj_name, and qualname which can mask real user __init__ frames; change the synthetic frame to use a distinct internal identity (e.g. a constant like INTERNAL_CLEANUP_NAME such as "<vm_cleanup>" or "<internal_cleanup>") and assign that to source_path, obj_name, and qualname instead of names.__init__; update any creation site that sets first_line_number/max_stackdepth on the synthetic cleanup frame (the code assigning source_path, obj_name, qualname in the synthetic frame construction) so all three fields use the new internal name to make leaked tracebacks clearly identifiable.
crates/vm/src/frame.rs (1)
4141-4155: Use the latin-1 singleton path here too.
This fast path still only interns ASCII results. Single-character latin-1 hits like "\xE9"[0] will fall straight back to the generic subscript path, so the new singleton caching work does not apply on this opcode yet.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 4141 - 4155, The BinaryOpSubscrStrInt fast path only handles ASCII single-character results via vm.ctx.ascii_char_cache, so single-character Latin-1 results (e.g. "\xE9"[0]) bypass the new singleton cache; update the Instruction::BinaryOpSubscrStrInt branch to, after computing ch (and before falling back), check for a Latin‑1 single-character (e.g. a method like ch.is_latin1() or checking ch.to_u32() in 0x80..=0xFF), and on hit pop the two operands and push the corresponding Latin‑1 singleton from the VM context (analogous to vm.ctx.ascii_char_cache, e.g. vm.ctx.latin1_char_singleton[index].clone().into()); keep the existing ASCII path and fallback unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/object/ext.rs`:
- Around line 361-370: The helpers deref_ordering, to_owned, and
to_owned_ordering currently forward any std::sync::atomic::Ordering into
self.inner.load(ordering) which will panic for Release/AcqRel; modify them to
only accept or use load-valid orderings by either (1) providing distinct methods
that take explicit valid semantics (e.g., deref_relaxed/deref_acquire and
to_owned_relaxed/to_owned_acquire) and call
inner.load(Ordering::Relaxed/Acquire), or (2) add a runtime check in
deref_ordering and to_owned_ordering that asserts ordering is one of the valid
variants (Relaxed or Acquire) before calling self.inner.load(ordering) and
return None or panic otherwise; update calls to to_owned() to delegate to the
new safe variants and ensure all uses of inner.load in these helpers reference
only load-valid Orderings.
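Both shapes the comment above proposes can be sketched in isolation. This is a hypothetical wrapper, not the real `PyAtomicRef` API; it only demonstrates that `Atomic*::load` panics on `Release`/`AcqRel` and how either fixed-semantics methods or a runtime check avoids that.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical cell mirroring the suggestion: restrict callers to
// load-valid orderings instead of forwarding an arbitrary `Ordering`.
pub struct VersionCell {
    inner: AtomicUsize,
}

impl VersionCell {
    pub fn new(v: usize) -> Self {
        Self { inner: AtomicUsize::new(v) }
    }

    // Option (1): distinct methods with fixed, load-valid semantics.
    pub fn load_relaxed(&self) -> usize {
        self.inner.load(Ordering::Relaxed)
    }

    pub fn load_acquire(&self) -> usize {
        self.inner.load(Ordering::Acquire)
    }

    // Option (2): a runtime check that rejects store-only orderings.
    pub fn load_checked(&self, ordering: Ordering) -> Option<usize> {
        match ordering {
            Ordering::Relaxed | Ordering::Acquire | Ordering::SeqCst => {
                Some(self.inner.load(ordering))
            }
            // `Release` and `AcqRel` would panic inside `load`.
            _ => None,
        }
    }
}
```

Option (1) moves the mistake to compile time, which is why the review prefers dedicated `*_relaxed`/`*_acquire` variants over the runtime check.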
---
Outside diff comments:
In `@crates/vm/src/builtins/function.rs`:
- Around line 692-733: The fast-path call sites that currently rely on
debug_assert! for argcount must perform a runtime check and fallback instead:
replace the debug_assert-based checks at the specialization fast-paths that call
invoke_exact_args() with an explicit runtime guard using has_exact_argcount()
(or the equivalent exact-argcount check used at other sites) and if the guard
fails call the safe invoke() path; ensure both fast-path locations that call
invoke_exact_args() (the ones flagged in the review) use this runtime check and
fallback to invoke() rather than assuming the precondition.
---
Duplicate comments:
In `@crates/vm/src/builtins/type.rs`:
- Line 281: The retired field (retired: PyRwLock<Vec<PyObjectRef>>) accumulates
strong references because invalidate_for_type_modified() always calls swap_*(_,
None) which pushes displaced specialization functions into retired and they are
only drained by clear_into(), causing unbounded growth for long-lived heap
types; fix by routing live invalidations through a VM-backed temporary-ref path
when a VirtualMachine is available (use the VM to create transient/weak refs
instead of pushing into retired) or add an opportunistic drain that periodically
clears retired (e.g., in invalidate_for_type_modified(), swap_* callers, or type
mutation paths) so retired does not retain references indefinitely—update
functions invalidate_for_type_modified(), any swap_* call sites that pass None,
and clear_into() to implement the chosen reclamation strategy.
In `@crates/vm/src/frame.rs`:
- Around line 4044-4051: The cached __getitem__ dispatch in
Instruction::BinaryOpSubscrGetitem currently trusts
get_cached_getitem_for_specialization() and func.func_version() but misses
checking the owning class's tp_version_tag; update the guard to also compare
owner.class().tp_version_tag against the cached type version (mirror the
__init__ cache pattern) and treat a mismatch as a cache miss so the fast-path is
skipped and normal lookup occurs; locate the check around self.nth_value(1),
specialization_eval_frame_active(vm), get_cached_getitem_for_specialization(),
func.func_version(), and specialization_has_datastack_space_for_func(vm, &func)
and add the tp_version_tag validation there.
---
Nitpick comments:
In `@crates/vm/src/frame.rs`:
- Around line 4141-4155: The BinaryOpSubscrStrInt fast path only handles ASCII
single-character results via vm.ctx.ascii_char_cache, so single-character
Latin-1 results (e.g. "\xE9"[0]) bypass the new singleton cache; update the
Instruction::BinaryOpSubscrStrInt branch to, after computing ch (and before
falling back), check for a Latin‑1 single-character (e.g. a method like
ch.is_latin1() or checking ch.to_u32() in 0x80..=0xFF), and on hit pop the two
operands and push the corresponding Latin‑1 singleton from the VM context
(analogous to vm.ctx.ascii_char_cache, e.g.
vm.ctx.latin1_char_singleton[index].clone().into()); keep the existing ASCII
path and fallback unchanged.
In `@crates/vm/src/vm/context.rs`:
- Around line 424-428: The synthetic cleanup frame currently reuses
names.__init__ for source_path, obj_name, and qualname which can mask real user
__init__ frames; change the synthetic frame to use a distinct internal identity
(e.g. a constant like INTERNAL_CLEANUP_NAME such as "<vm_cleanup>" or
"<internal_cleanup>") and assign that to source_path, obj_name, and qualname
instead of names.__init__; update any creation site that sets
first_line_number/max_stackdepth on the synthetic cleanup frame (the code
assigning source_path, obj_name, qualname in the synthetic frame construction)
so all three fields use the new internal name to make leaked tracebacks clearly
identifiable.
In `@crates/vm/src/vm/mod.rs`:
- Around line 1566-1620: Add a regression test that verifies
with_frame_untraced()/with_frame_impl(..., traced: false, ...) bypasses
dispatch_traced_frame: register sys.settrace and sys.setprofile handlers that
will fail the test if invoked (or toggle a flag), create a FrameRef and call
with_frame_untraced (or call with_frame_impl with traced=false) to execute a
dummy closure, then assert the trace/profile handlers were not called and normal
frame cleanup occurs; reference with_frame_untraced, with_frame_impl, and
dispatch_traced_frame so the test covers the untraced execution path explicitly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 1fd15f3c-4fb1-4d91-9904-23652e5fa048
📒 Files selected for processing (10)
.cspell.jsoncrates/vm/src/builtins/function.rscrates/vm/src/builtins/str.rscrates/vm/src/builtins/type.rscrates/vm/src/frame.rscrates/vm/src/object/ext.rscrates/vm/src/stdlib/builtins.rscrates/vm/src/types/slot.rscrates/vm/src/vm/context.rscrates/vm/src/vm/mod.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- crates/vm/src/stdlib/builtins.rs
- .cspell.json
1b61f5d to da09064
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/vm/src/frame.rs (1)
4163-4174: ⚠️ Potential issue | 🟠 Major — Don't redo the string index lookup on non-ASCII hits.
`getitem_by_index()` has already done the bounds check and decoded the character. When `ch` is not ASCII, this branch discards that result and falls back to generic `BINARY_OP`, so specialized `str[int]` sites on Unicode text pay the full lookup twice and never use the new singleton-cache path for Latin-1 characters. Build the one-character result from `ch` directly, or de-specialize after the first non-ASCII miss.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 4163 - 4174, The branch currently calls getitem_by_index and checks ch.is_ascii(), but on non-ASCII it drops ch and falls back to the generic BINARY_OP path, redoing the lookup later; instead, after getting ch from getitem_by_index (and when specialization_nonnegative_compact_index succeeded), build the one-character PyStr result directly from ch (using the decoded character from getitem_by_index) and push it (pop the two operands and push the new PyStr) rather than deferring to the generic path; update the block that references a_str.getitem_by_index, ch, ascii_char_cache, pop_value, and push_value to create and push a one-character PyStr from ch when bounds succeeded (and keep the ascii-char-cache fast path for ch.is_ascii()) so we avoid a second lookup on Unicode characters.
🧹 Nitpick comments (1)
crates/vm/src/frame.rs (1)
7752-7754: Extract the shared `BinaryOpExtend` fallback. The same
`binary_op_extended_specialization()` / `cached_extend_descr` / `Instruction::BinaryOpExtend` block is repeated across multiple arms. Pulling that into one helper or a single trailing fallback will make future opcode coverage changes much less error-prone. As per coding guidelines: "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code."
Also applies to: 7768-7770, 7784-7786, 7792-7794, 7901-7904
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 7752 - 7754, Multiple match arms repeat the same pattern calling binary_op_extended_specialization(op, a, b, vm), setting cached_extend_descr, and returning Instruction::BinaryOpExtend; extract that shared fallback into one helper or a single trailing fallback to remove duplication. Locate uses of binary_op_extended_specialization, cached_extend_descr, and Instruction::BinaryOpExtend in the match (e.g., the arms around the occurrences at 7752, 7768, 7784, 7792, 7901) and refactor so you first compute the differing value(s) (op/a/b or the condition), then call a single helper function (or a final else branch) that runs binary_op_extended_specialization(op, a, b, vm), assigns cached_extend_descr = Some(descr) when present, and returns Instruction::BinaryOpExtend; keep the behavior identical but centralize the common logic.
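The shape of that refactor can be shown with a toy dispatcher. `Op`, `specialized_descr`, and the string descriptors are illustrative stand-ins, not the real opcode or descriptor types:

```rust
// Illustrative stand-ins for the real opcode and descriptor types.
#[derive(Debug, PartialEq)]
enum Op {
    Add,
    Sub,
    Mul,
}

// Hypothetical specialization lookup: only `Add` has a fast-path
// descriptor in this sketch.
fn specialized_descr(op: &Op) -> Option<&'static str> {
    match op {
        Op::Add => Some("float_add_fast"),
        _ => None,
    }
}

// Before the refactor, each match arm repeated the lookup-then-fallback
// dance; after it, arms only supply the differing value (`op`) and the
// shared fallback logic runs in exactly one place.
fn dispatch(op: Op) -> &'static str {
    specialized_descr(&op).unwrap_or("BinaryOpExtend_generic")
}
```

New opcode arms then only need a `specialized_descr` entry rather than a copy of the whole fallback block.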
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/builtins/function.rs`:
- Around line 772-786: The helper datastack_frame_size_bytes_for_code currently
uses code.freevars.len() which can differ from the actual runtime closure length
used by Frame::new(), leading to undersized reservations; change the function to
take the runtime closure length (e.g., add a closure_len: usize parameter or
accept the closure slice) and use that value instead of code.freevars.len(),
keep the generator/coroutine early-return behavior, and update all callers
(including prepare_exact_args_frame and any code that reserves datastack space)
to pass the actual closure.len() so sizing matches Frame::new().
In `@crates/vm/src/frame.rs`:
- Around line 7388-7401: The current specialization installs
Instruction::LoadAttrModule after checking only tp_version_tag, which still
calls PyModule::get_attr() and can duplicate user-visible side effects if the
module dict later changes; fix by either adding a module-dict runtime guard here
(e.g., capture and cache the module.dict() version/identity alongside
tp_version_tag when writing via write_cache_u32 and check it in the fast-path)
or modify the LoadAttrModule handler to read module.dict() directly and perform
a direct dict lookup and only fall back to load_attr_slow() on a miss/absent key
(so it never re-invokes PyModule::get_attr()); update the code paths around
specialize_at, write_cache_u32, and the LoadAttrModule handler accordingly and
ensure load_attr_slow() remains the canonical slow-path.
---
Outside diff comments:
In `@crates/vm/src/frame.rs`:
- Around line 4163-4174: The branch currently calls getitem_by_index and checks
ch.is_ascii(), but on non-ASCII it drops ch and falls back to the generic
BINARY_OP path, redoing the lookup later; instead, after getting ch from
getitem_by_index (and when specialization_nonnegative_compact_index succeeded),
build the one-character PyStr result directly from ch (using the decoded
character from getitem_by_index) and push it (pop the two operands and push the
new PyStr) rather than deferring to the generic path; update the block that
references a_str.getitem_by_index, ch, ascii_char_cache, pop_value, and
push_value to create and push a one-character PyStr from ch when bounds
succeeded (and keep the ascii-char-cache fast path for ch.is_ascii()) so we
avoid a second lookup on Unicode characters.
---
Nitpick comments:
In `@crates/vm/src/frame.rs`:
- Around line 7752-7754: Multiple match arms repeat the same pattern calling
binary_op_extended_specialization(op, a, b, vm), setting cached_extend_descr,
and returning Instruction::BinaryOpExtend; extract that shared fallback into one
helper or a single trailing fallback to remove duplication. Locate uses of
binary_op_extended_specialization, cached_extend_descr, and
Instruction::BinaryOpExtend in the match (e.g., the arms around the occurrences
at 7752, 7768, 7784, 7792, 7901) and refactor so you first compute the differing
value(s) (op/a/b or the condition), then call a single helper function (or a
final else branch) that runs binary_op_extended_specialization(op, a, b, vm),
assigns cached_extend_descr = Some(descr) when present, and returns
Instruction::BinaryOpExtend; keep the behavior identical but centralize the
common logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 73b94538-f3e4-4b89-87ef-d1349d860164
📒 Files selected for processing (4)
crates/vm/src/builtins/function.rscrates/vm/src/frame.rscrates/vm/src/object/core.rscrates/vm/src/protocol/callable.rs
✅ Files skipped from review due to trivial changes (1)
- crates/vm/src/object/core.rs
pub(crate) fn datastack_frame_size_bytes_for_code(code: &Py<PyCode>) -> Option<usize> {
    if code
        .flags
        .intersects(bytecode::CodeFlags::GENERATOR | bytecode::CodeFlags::COROUTINE)
    {
        return None;
    }
    let nlocalsplus = code
        .varnames
        .len()
        .checked_add(code.cellvars.len())?
        .checked_add(code.freevars.len())?;
    let capacity = nlocalsplus.checked_add(code.max_stackdepth as usize)?;
    capacity.checked_mul(core::mem::size_of::<usize>())
}
Use the runtime closure length when sizing datastack frames.
Frame::new() sizes localsplus with closure.len(), but this helper hard-codes code.freevars.len(). Since set___code__ still allows swapping in a code object without revalidating the stored closure, callers that reserve datastack space via this helper can compute a smaller layout than prepare_exact_args_frame() will actually allocate.
Possible direction
pub(crate) fn datastack_frame_size_bytes(&self) -> Option<usize> {
- datastack_frame_size_bytes_for_code(&self.code)
+ datastack_frame_size_bytes_for_layout(
+ &self.code,
+ self.closure.as_ref().map_or(0, |c| c.len()),
+ )
}
-pub(crate) fn datastack_frame_size_bytes_for_code(code: &Py<PyCode>) -> Option<usize> {
+pub(crate) fn datastack_frame_size_bytes_for_layout(
+ code: &Py<PyCode>,
+ nfrees: usize,
+) -> Option<usize> {
if code
.flags
.intersects(bytecode::CodeFlags::GENERATOR | bytecode::CodeFlags::COROUTINE)
{
return None;
}
let nlocalsplus = code
.varnames
.len()
.checked_add(code.cellvars.len())?
- .checked_add(code.freevars.len())?;
+ .checked_add(nfrees)?;
let capacity = nlocalsplus.checked_add(code.max_stackdepth as usize)?;
capacity.checked_mul(core::mem::size_of::<usize>())
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/builtins/function.rs` around lines 772 - 786, The helper
datastack_frame_size_bytes_for_code currently uses code.freevars.len() which can
differ from the actual runtime closure length used by Frame::new(), leading to
undersized reservations; change the function to take the runtime closure length
(e.g., add a closure_len: usize parameter or accept the closure slice) and use
that value instead of code.freevars.len(), keep the generator/coroutine
early-return behavior, and update all callers (including
prepare_exact_args_frame and any code that reserves datastack space) to pass the
actual closure.len() so sizing matches Frame::new().
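A standalone sketch of the sizing arithmetic with the closure length passed in explicitly, as the prompt suggests. The generator/coroutine early return is omitted and `max_stackdepth` is modeled as a plain `usize`; parameter names are illustrative:

```rust
// Simplified model of the frame-size computation: the caller supplies
// the runtime closure length instead of the helper reading
// code.freevars.len(), so sizing matches what Frame::new() allocates.
fn datastack_frame_size_bytes(
    nvarnames: usize,
    ncellvars: usize,
    closure_len: usize, // runtime closure length, per the review
    max_stackdepth: usize,
) -> Option<usize> {
    let nlocalsplus = nvarnames
        .checked_add(ncellvars)?
        .checked_add(closure_len)?;
    let capacity = nlocalsplus.checked_add(max_stackdepth)?;
    // Checked multiply keeps pathological code objects from overflowing.
    capacity.checked_mul(core::mem::size_of::<usize>())
}
```

The checked arithmetic mirrors the original helper: any overflow yields `None` and the caller falls back to the unspecialized path.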
Actionable comments posted: 2
♻️ Duplicate comments (1)
crates/vm/src/frame.rs (1)
3732-3754: ⚠️ Potential issue | 🟠 Major — Guard
`LoadAttrModule` on module-dict state, not just `tp_version_tag`. Line 3743 still goes through
`PyModule::get_attr()`, but the specialization installed at Lines 7388-7400 only guards on the module type version. If the module dict later mutates (`__getattr__` added, the attribute removed, lookup starts raising), the specialized path can perform module resolution and then immediately fall back to `load_attr_slow()`, duplicating user-visible side effects. Cache module-dict state as part of the guard, or make the specialized handler do a direct dict lookup and fall back only on a true miss. Also applies to: 7385-7401
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 3732 - 3754, The fast-path for Instruction::LoadAttrModule uses only tp_version_tag to guard (owner.downcast_ref_if_exact::<PyModule>, type_version from code.instructions.read_cache_u32) but still calls PyModule::get_attr, risking duplicate side effects; change the specialization to also cache and compare the module dict state (e.g. a dict version/tag or identity pointer) and perform a direct dict lookup for attr_name (avoiding PyModule::get_attr) so you only fall back to load_attr_slow(vm, oparg) on a real miss or when the dict/version check fails; apply the same change to the duplicate specialization region referenced (around the other LoadAttrModule handler).
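A minimal sketch of the double guard described above. The struct, field names, and the idea of a monotonically bumped dict version are assumptions for illustration, not the real VM API:

```rust
// Cached state for a specialized LoadAttrModule site: both the type
// version and a module-dict version are recorded at specialization time.
struct ModuleAttrCache {
    type_version: u32,
    dict_version: u64,
}

// The fast path is taken only when both versions still match; any
// mutation of the module dict bumps dict_version and forces the slow
// path, so PyModule::get_attr side effects cannot run twice.
fn cache_hit(cache: &ModuleAttrCache, type_version: u32, dict_version: u64) -> bool {
    cache.type_version == type_version && cache.dict_version == dict_version
}
```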
🧹 Nitpick comments (3)
crates/vm/src/frame.rs (2)
1179-1253: The mixed float/int extension still skips the inplace operators.
`BinaryOpExtend` now covers `Add`/`Subtract`/`Multiply`/`TrueDivide`, but `InplaceAdd`/`InplaceSubtract`/`InplaceMultiply`/`InplaceTrueDivide` still never route through `binary_op_extended_specialization`. For immutable floats that leaves `x += 1`, `x -= 1`, `x *= 1`, and `x /= 1` on the generic path even though they can reuse the same specialization. Also applies to: 7752-7795, 7901-7904
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 1179 - 1253, The float/int mixed specializations omit the inplace operators, so inplace variants never hit binary_op_extended_specialization; update the BinaryOpExtendSpecializationDescr tables (e.g., BINARY_OP_EXTEND_DESCRS) to add entries mapping bytecode::BinaryOperator::InplaceAdd, InplaceSubtract, InplaceMultiply and InplaceTrueDivide to the same guards and actions used for Add/Subtract/Multiply/TrueDivide (e.g., map InplaceAdd -> float_compactlong_guard/float_compactlong_add and InplaceTrueDivide -> nonzero_float_compactlong_guard/float_compactlong_true_div, and similarly for compactlong_float_* entries), and apply the same additions to the other identical descriptor arrays mentioned (the other occurrences around the indicated ranges) so inplace ops reuse the existing specializations.
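The mapping this prompt asks for amounts to letting each inplace variant share a descriptor with its plain counterpart. A sketch with illustrative names (the real tables pair guard and action functions, collapsed here into a single descriptor string):

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum BinOp {
    Add,
    InplaceAdd,
    TrueDivide,
    InplaceTrueDivide,
}

// Hypothetical descriptor table: floats are immutable, so the inplace
// form can reuse exactly the guard/action pair of the plain form.
fn extend_descr(op: BinOp) -> &'static str {
    match op {
        BinOp::Add | BinOp::InplaceAdd => "float_compactlong_add",
        BinOp::TrueDivide | BinOp::InplaceTrueDivide => "float_compactlong_true_div",
    }
}
```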
4160-4174: Use the Latin-1 singleton path in the specialized `str[int]` fast path too. This branch still only interns ASCII characters. Repeated subscripting that returns a single-byte non-ASCII Latin-1 character will keep missing the specialized result path and fall back to generic
`str.__getitem__`, even though this PR adds a Latin-1 singleton cache for that case.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 4160 - 4174, The fast path in Instruction::BinaryOpSubscrStrInt only uses the ascii_char_cache; extend it to also handle Latin-1 single-byte non-ASCII characters by checking the character returned from PyStr::getitem_by_index for a Latin-1 single-byte (e.g., ch.is_latin1() or equivalent) and, when true, pop the two values and push the cached singleton from the Latin-1 cache (e.g., vm.ctx.latin1_char_cache[ch.to_u32() as usize].clone().into()) just like the existing ascii branch; keep the existing ascii branch unchanged and add the Latin-1 branch alongside it after getitem_by_index succeeds.
crates/vm/src/builtins/type.rs (1)
345-350: Consider using `.for_each()` instead of `.map().count()`. The current pattern works, but
`.for_each()` better expresses the intent of executing side effects without producing values.
Suggested change
-        self.retired
-            .read()
-            .iter()
-            .map(|obj| obj.traverse(tracer_fn))
-            .count();
+        self.retired
+            .read()
+            .iter()
+            .for_each(|obj| obj.traverse(tracer_fn));
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/type.rs` around lines 345 - 350, The code iterates over self.retired.read().iter() and calls obj.traverse(tracer_fn) but uses .map(...).count() to execute side effects; replace that with .for_each(...) to convey intent more clearly. Locate the block using self.retired.read().iter().map(|obj| obj.traverse(tracer_fn)).count() and change it to call .for_each(|obj| obj.traverse(tracer_fn)) so traverse(tracer_fn) runs for each retired object without generating/consuming a dummy count value.
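For illustration, a tiny standalone version of the same side-effect-only iteration, with hypothetical names:

```rust
// Side-effect-only iteration: `.for_each` makes the intent explicit,
// whereas `.map(...).count()` only runs the closure because `count`
// happens to consume the iterator.
fn collect_doubled(items: &[i32]) -> Vec<i32> {
    let mut out = Vec::new();
    items.iter().for_each(|x| out.push(x * 2));
    out
}
```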
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/builtins/type.rs`:
- Around line 965-970: The current code stores
ext.specialization_cache.getitem_version with Ordering::Relaxed after calling
ext.specialization_cache.swap_getitem (which uses AcqRel via
PyAtomicRef::swap_to_temporary_refs), creating a race where a reader can see the
new pointer but a stale version; fix by making the version store release-ordered
or by moving the store before the swap: either change the call to
ext.specialization_cache.getitem_version.store(func_version, Ordering::Release)
or perform getitem_version.store(...) immediately before calling
ext.specialization_cache.swap_getitem(Some(getitem), Some(vm)) so the version
publish is covered by Release semantics.
In `@crates/vm/src/vm/context.rs`:
- Around line 401-416: The instructions array in vm::context.rs contains an
unreachable Resume opcode placed after Instruction::ReturnValue; remove the
CodeUnit with Instruction::Resume (or move it before the ReturnValue entry) so
Resume can execute if intended—update the instructions array where CodeUnit
entries are created (the entries with Instruction::ExitInitCheck,
Instruction::ReturnValue, and Instruction::Resume) to either drop the Resume
element or reorder it to precede ReturnValue and run any associated context arg
handling accordingly.
---
Duplicate comments:
In `@crates/vm/src/frame.rs`:
- Around line 3732-3754: The fast-path for Instruction::LoadAttrModule uses only
tp_version_tag to guard (owner.downcast_ref_if_exact::<PyModule>, type_version
from code.instructions.read_cache_u32) but still calls PyModule::get_attr,
risking duplicate side effects; change the specialization to also cache and
compare the module dict state (e.g. a dict version/tag or identity pointer) and
perform a direct dict lookup for attr_name (avoiding PyModule::get_attr) so you
only fall back to load_attr_slow(vm, oparg) on a real miss or when the
dict/version check fails; apply the same change to the duplicate specialization
region referenced (around the other LoadAttrModule handler).
---
Nitpick comments:
In `@crates/vm/src/builtins/type.rs`:
- Around line 345-350: The code iterates over self.retired.read().iter() and
calls obj.traverse(tracer_fn) but uses .map(...).count() to execute side
effects; replace that with .for_each(...) to convey intent more clearly. Locate
the block using self.retired.read().iter().map(|obj|
obj.traverse(tracer_fn)).count() and change it to call .for_each(|obj|
obj.traverse(tracer_fn)) so traverse(tracer_fn) runs for each retired object
without generating/consuming a dummy count value.
In `@crates/vm/src/frame.rs`:
- Around line 1179-1253: The float/int mixed specializations omit the inplace
operators, so inplace variants never hit binary_op_extended_specialization;
update the BinaryOpExtendSpecializationDescr tables (e.g.,
BINARY_OP_EXTEND_DESCRS) to add entries mapping
bytecode::BinaryOperator::InplaceAdd, InplaceSubtract, InplaceMultiply and
InplaceTrueDivide to the same guards and actions used for
Add/Subtract/Multiply/TrueDivide (e.g., map InplaceAdd ->
float_compactlong_guard/float_compactlong_add and InplaceTrueDivide ->
nonzero_float_compactlong_guard/float_compactlong_true_div, and similarly for
compactlong_float_* entries), and apply the same additions to the other
identical descriptor arrays mentioned (the other occurrences around the
indicated ranges) so inplace ops reuse the existing specializations.
- Around line 4160-4174: The fast path in Instruction::BinaryOpSubscrStrInt only
uses the ascii_char_cache; extend it to also handle Latin-1 single-byte
non-ASCII characters by checking the character returned from
PyStr::getitem_by_index for a Latin-1 single-byte (e.g., ch.is_latin1() or
equivalent) and, when true, pop the two values and push the cached singleton
from the Latin-1 cache (e.g., vm.ctx.latin1_char_cache[ch.to_u32() as
usize].clone().into()) just like the existing ascii branch; keep the existing
ascii branch unchanged and add the Latin-1 branch alongside it after
getitem_by_index succeeds.
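The classification that both singleton caches rely on is a plain code-point range check. A sketch (the cache names in the comments follow the review text and are assumptions):

```rust
// Classify a character for the single-char result caches: ASCII would hit
// the ascii_char_cache, U+0080..=U+00FF would hit a Latin-1 singleton
// cache, and anything else falls back to the generic path.
fn singleton_cache_index(ch: char) -> Option<usize> {
    let cp = ch as u32;
    match cp {
        0x00..=0xFF => Some(cp as usize), // ASCII or Latin-1 singleton
        _ => None,                        // generic str.__getitem__ path
    }
}
```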
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 52b4ddc5-216c-4f4c-b820-24eb3bf13220
📒 Files selected for processing (12)
.cspell.jsoncrates/vm/src/builtins/function.rscrates/vm/src/builtins/str.rscrates/vm/src/builtins/type.rscrates/vm/src/frame.rscrates/vm/src/object/core.rscrates/vm/src/object/ext.rscrates/vm/src/protocol/callable.rscrates/vm/src/stdlib/builtins.rscrates/vm/src/types/slot.rscrates/vm/src/vm/context.rscrates/vm/src/vm/mod.rs
🚧 Files skipped from review as they are similar to previous changes (5)
- crates/vm/src/types/slot.rs
- crates/vm/src/stdlib/builtins.rs
- crates/vm/src/builtins/str.rs
- crates/vm/src/object/ext.rs
- crates/vm/src/object/core.rs
let instructions = [
    CodeUnit {
        op: Instruction::ExitInitCheck,
        arg: 0.into(),
    },
    CodeUnit {
        op: Instruction::ReturnValue,
        arg: 0.into(),
    },
    CodeUnit {
        op: Instruction::Resume {
            context: bytecode::Arg::marker(),
        },
        arg: 0.into(),
    },
];
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if other code objects in the codebase have similar patterns with Resume after ReturnValue
ast-grep --pattern $'CodeUnit {
$$$
op: Instruction::ReturnValue,
$$$
}'
Repository: RustPython/RustPython
Length of output: 47
🏁 Script executed:
#!/bin/bash
# Check how Resume instruction is typically used in the codebase
rg -n "Instruction::Resume" --type rust -C 3
Repository: RustPython/RustPython
Length of output: 9063
🏁 Script executed:
# Read the full context of the new_init_cleanup_code function
sed -n '395,440p' crates/vm/src/vm/context.rs
Repository: RustPython/RustPython
Length of output: 1733
🏁 Script executed:
# Search for comments or documentation about this specific function
rg -n "new_init_cleanup_code\|ExitInitCheck\|init_result" crates/vm/src/vm/context.rs -B 2 -A 2
Repository: RustPython/RustPython
Length of output: 47
🏁 Script executed:
# Check if Resume is ever expected to appear after ReturnValue in other code objects
rg -B 5 -A 5 "Instruction::ReturnValue" crates/codegen/src/compile.rs | head -100
Repository: RustPython/RustPython
Length of output: 3730
🏁 Script executed:
# Search for tests or usages of new_init_cleanup_code
rg -n "new_init_cleanup_code" crates/ --type rust
Repository: RustPython/RustPython
Length of output: 289
🏁 Script executed:
# Check git log for this function (recent commits)
git log --oneline -20 -- crates/vm/src/vm/context.rs | head -20
Repository: RustPython/RustPython
Length of output: 122
🏁 Script executed:
# Check if there's any documentation or comments about Resume placement in similar contexts
rg -n "Resume.*context\|context.*Resume" crates/vm/src/vm/ --type rust -B 2 -A 2
Repository: RustPython/RustPython
Length of output: 47
Remove unreachable Resume instruction — it appears after ReturnValue and will never execute.
The Resume instruction at index 2 is placed after ReturnValue, which exits the frame. Since ReturnValue terminates execution, the Resume instruction is unreachable dead code and should be removed or repositioned before ReturnValue if quickening initialization is needed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/vm/context.rs` around lines 401 - 416, The instructions array
in vm::context.rs contains an unreachable Resume opcode placed after
Instruction::ReturnValue; remove the CodeUnit with Instruction::Resume (or move
it before the ReturnValue entry) so Resume can execute if intended—update the
instructions array where CodeUnit entries are created (the entries with
Instruction::ExitInitCheck, Instruction::ReturnValue, and Instruction::Resume)
to either drop the Resume element or reorder it to precede ReturnValue and run
any associated context arg handling accordingly.
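Why the trailing opcode is dead can be seen in a toy dispatch loop. The instruction names mirror the review; the semantics are heavily simplified and hypothetical:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Insn {
    ExitInitCheck,
    ReturnValue,
    Resume,
}

// Toy interpreter loop: ReturnValue exits the frame, so any CodeUnit
// placed after it in the instruction array can never be reached.
fn run(code: &[Insn]) -> Vec<Insn> {
    let mut executed = Vec::new();
    for &insn in code {
        executed.push(insn);
        if insn == Insn::ReturnValue {
            break; // frame exits here
        }
    }
    executed
}
```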
df126ca to b91fc30
…IMIZED in vectorcall fast path
…_cleanup frames, guard trace dispatch
- Extract datastack_frame_size_bytes_for_code as free function, use it to compute init_cleanup stack bytes instead of hardcoded constant
- Add monitoring_disabled_for_code to skip instrumentation for synthetic init_cleanup code object in RESUME and execute_instrumented
- Add is_trace_event guard so profile-only events skip trace_func dispatch
- Reformat core.rs (rustfmt)
ea2469b to dbd8b1a
Summary
_spec_cache (init/getitem) to HeapTypeExt with proper GC traverse/clear and lock-guarded version validation
Test plan
`cargo clippy` clean
`extra_tests/snippets/vm_specialization.py` passes
🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Chores