Harden CALL specialization guards and cache callables #7360

youknowone merged 6 commits into RustPython:main
Conversation
Note: Reviews paused. This branch appears to be under active development, so CodeRabbit has automatically paused this review to avoid an influx of comments on new commits. This behavior can be configured in the CodeRabbit settings.
📝 Walkthrough

Adds a VM-level callable cache and datastack utilities, per-type specialization caches for init/getitem, extended function/frame invocation paths using OPTIMIZED semantics and locals propagation, a tp_alloc slot with generic allocator wiring, and stricter native-method first-arg dropping rules.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant VM as VirtualMachine
    participant Cache as CallableCache
    participant Frame as ExecutingFrame
    participant Type as PyType
    participant Func as PyFunction
    Client->>VM: execute bytecode / CALL
    VM->>Cache: init_callable_cache()
    Cache-->>VM: builtin callables cached
    Frame->>VM: datastack_has_space(size)?
    Frame->>Cache: consult callable cache
    alt specialization viable
        Frame->>Type: get_cached_init_for_specialization(tp_version)
        Type-->>Frame: maybe PyFunction
        Frame->>Func: is_optimized_for_call_specialization / has_exact_argcount
        Frame->>Frame: build frame with locals and use vectorcall/specialized path
    else fallback
        Frame->>Func: invoke_with_locals(func_args, Option<locals>, vm)
        Func-->>Frame: generic invoke (or generator/coroutine path)
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 3 passed
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/vm/src/frame.rs (1)
4519-4561: ⚠️ Potential issue | 🔴 Critical — Add recursion guard to FASTCALL method-descriptor specialized paths.
Lines 4559 and 4692 call `func(vm, args)` without the C-stack guard already added in the NOARGS/O descriptor fastpaths. This can bypass recursion protection in these specialized non-vectorcall paths.

Proposed fix:

```diff
@@
 let args = FuncArgs {
     args: all_args,
     kwargs: Default::default(),
 };
+if vm.reached_c_stack_limit() {
+    return Err(vm.new_recursion_error(String::new()));
+}
 let result = func(vm, args)?;
 self.push_value(result);
 return Ok(None);
@@
 let args = FuncArgs {
     args: all_args,
     kwargs: Default::default(),
 };
+if vm.reached_c_stack_limit() {
+    return Err(vm.new_recursion_error(String::new()));
+}
 let result = func(vm, args)?;
 self.push_value(result);
 return Ok(None);
```

Also applies to: 4650-4694
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 4519 - 4561, The FASTCALL PyMethodDescriptor fast-path calls func(vm, args) (the local variable func representing descr.method.func) without the C-stack/recursion guard used in the NOARGS/O fastpaths, so add the same recursion/C-stack guard wrapper around the call to func(vm, args) (and any other identical calls in the FASTCALL-specialized blocks around the PyMethodDescriptor handling, e.g., the blocks guarded by descr.method.flags.contains(PyMethodFlags::METHOD) && ... == PyMethodFlags::FASTCALL) so the native call executes inside the existing guard helper used elsewhere in frame.rs for NOARGS/O paths (wrap the FuncArgs call the same way the NOARGS/O fastpath wraps descr.method.func).
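The guard-then-fallback shape this comment asks for can be sketched in isolation. Everything below (`Vm`, `reached_c_stack_limit`, `fastpath_call`) is an illustrative stand-in, not RustPython's actual API; the point is only that the depth check must run before the native call:

```rust
// Minimal model of a depth guard in front of a native fast path.
struct Vm {
    depth: usize,
    limit: usize,
}

impl Vm {
    // Stand-in for the VM's C-stack headroom check.
    fn reached_c_stack_limit(&self) -> bool {
        self.depth >= self.limit
    }
}

// A hypothetical native callable: plain function pointer over a slice of args.
fn sum(_vm: &Vm, args: &[i64]) -> i64 {
    args.iter().sum()
}

// The fast path checks the guard BEFORE calling into native code; otherwise a
// deep Python -> native -> Python chain can exhaust the real C stack.
fn fastpath_call(
    vm: &Vm,
    func: fn(&Vm, &[i64]) -> i64,
    args: &[i64],
) -> Result<i64, String> {
    if vm.reached_c_stack_limit() {
        return Err("maximum recursion depth exceeded".to_string());
    }
    Ok(func(vm, args))
}

fn main() {
    let shallow = Vm { depth: 10, limit: 1000 };
    assert_eq!(fastpath_call(&shallow, sum, &[1, 2, 3]), Ok(6));

    // At the limit, the fast path refuses to call native code at all.
    let deep = Vm { depth: 1000, limit: 1000 };
    assert!(fastpath_call(&deep, sum, &[1]).is_err());
    println!("ok");
}
```

In the real PR the error-return branch would instead fall back to the generic vectorcall path, which carries its own recursion accounting.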
🧹 Nitpick comments (1)
crates/vm/src/frame.rs (1)
4418-4455: Consider extracting shared descriptor-args assembly into one helper.

The four `CALL_METHOD_DESCRIPTOR_*` branches duplicate nearly identical stack-pop and `FuncArgs` construction logic. A single helper would reduce drift and make guard consistency easier to maintain. As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code."

Also applies to: 4469-4506, 4519-4557, 4650-4690
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 4418 - 4455, Duplicate logic across the CALL_METHOD_DESCRIPTOR_* branches should be consolidated: extract the shared sequence that pops the callable/positional/self values from the frame (using self.pop_multiple(n), self.pop_value_opt(), self.pop_value()), checks/uses nth_value and localsplus.stack_index, and constructs the FuncArgs (args Vec and kwargs default) into a single helper function (e.g., assemble_descriptor_call_args or similar) that returns the built FuncArgs and the method func to invoke; then replace each branch (the ones checking PyMethodDescriptor, PyMethodFlags and class guard) to compute only the differing guard/value (descr and any flag-dependent selection) and call the new helper to get the FuncArgs before continuing, updating usages of positional_args, all_args, self_or_null, and func to the helper's return values; apply the same refactor at the other listed regions (around lines 4469-4506, 4519-4557, 4650-4690) so the common stack-manipulation and FuncArgs construction is implemented once and reused.
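The guideline quoted in this comment ("extract the differing value first, then call the common logic once") can be illustrated with a toy version of such branches. All names here (`FuncArgs` fields, `assemble_call_args`, `DescriptorKind`) are hypothetical simplifications, not the real frame.rs types:

```rust
#[derive(Debug, PartialEq)]
struct FuncArgs {
    args: Vec<i64>,
    kwargs: Vec<(String, i64)>,
}

// Shared logic, written once: pop `nargs` values off the stack and build args.
fn assemble_call_args(stack: &mut Vec<i64>, nargs: usize) -> FuncArgs {
    let split = stack.len() - nargs;
    let args = stack.split_off(split); // pops the top `nargs` values in order
    FuncArgs { args, kwargs: Vec::new() }
}

enum DescriptorKind {
    NoArgs,
    OneArg,
    Fast,
}

// Each "branch" now only supplies its differing value (the arg count),
// then calls the common logic once.
fn dispatch(kind: DescriptorKind, stack: &mut Vec<i64>) -> FuncArgs {
    let nargs = match kind {
        DescriptorKind::NoArgs => 0,
        DescriptorKind::OneArg => 1,
        DescriptorKind::Fast => 2,
    };
    assemble_call_args(stack, nargs)
}

fn main() {
    let mut stack = vec![10, 20, 30];

    let fa = dispatch(DescriptorKind::OneArg, &mut stack);
    assert_eq!(fa.args, vec![30]);
    assert_eq!(stack, vec![10, 20]);

    let fa2 = dispatch(DescriptorKind::Fast, &mut stack);
    assert_eq!(fa2.args, vec![10, 20]);

    let fa3 = dispatch(DescriptorKind::NoArgs, &mut stack);
    assert!(fa3.args.is_empty());
    println!("ok");
}
```

With this shape, a guard added to `assemble_call_args` (or to `dispatch`) automatically covers every branch, which is exactly the drift the review worries about.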
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 0ecc623d-10ac-48b4-a663-24eae592ed81
📒 Files selected for processing (6)
- crates/derive-impl/src/pyclass.rs
- crates/vm/src/builtins/function.rs
- crates/vm/src/builtins/type.rs
- crates/vm/src/frame.rs
- crates/vm/src/vm/mod.rs
- crates/vm/src/vm/thread.rs
0fc9ffe to 72e91ac
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/vm/src/frame.rs (1)
4555-4601: ⚠️ Potential issue | 🟠 Major — Complete recursion/C-stack guard coverage across remaining direct specialized CALL paths.

`CallMethodDescriptorFast`, `CallMethodDescriptorFastWithKeywords`, and `CallKwBoundMethod` still miss the same guard/fallback pattern already added to sibling specialized paths. This leaves inconsistent safety behavior for deep recursive call chains.

Suggested patch:

```diff
@@ Instruction::CallMethodDescriptorFast => {
     let nargs: u32 = arg.into();
@@
     if total_nargs > 0
         && let Some(descr) = callable.downcast_ref_if_exact::<PyMethodDescriptor>(vm)
@@
     {
+        if vm.reached_c_stack_limit() {
+            return self.execute_call_vectorcall(nargs, vm);
+        }
         let func = descr.method.func;
@@ Instruction::CallMethodDescriptorFastWithKeywords => {
@@
     if total_nargs > 0
         && let Some(descr) = callable.downcast_ref_if_exact::<PyMethodDescriptor>(vm)
@@
     {
+        if vm.reached_c_stack_limit() {
+            return self.execute_call_vectorcall(nargs, vm);
+        }
         let func = descr.method.func;
@@ Instruction::CallKwBoundMethod => {
@@
     if self.specialization_eval_frame_active(vm) {
         return self.execute_call_kw_vectorcall(nargs, vm);
     }
+    if self.specialization_call_recursion_guard(vm) {
+        return self.execute_call_kw_vectorcall(nargs, vm);
+    }
```

Also applies to: 4695-4742, 4865-4909
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 4555 - 4601, The optimized fast-paths for Instruction::CallMethodDescriptorFast (and the sibling cases CallMethodDescriptorFastWithKeywords and CallKwBoundMethod) bypass the recursion/C-stack guard that other specialized CALL paths use; update these cases to perform the same recursion/C-stack safety check and fallback to the generic call path when the guard triggers (i.e., before executing the inlined descriptor fast-path, run the same guard logic used by the other CALL specializations and, on guard failure, call self.execute_call_vectorcall(...) or the existing generic fallback), ensuring the fast-path retains the same conditions (downcast to PyMethodDescriptor, flag checks, self type check) but only proceeds after the guard passes so deep recursive chains fall back consistently.
🧹 Nitpick comments (1)
crates/vm/src/builtins/function.rs (1)
643-663: Reduce predicate drift between specialization helpers.
`is_simple_for_call_specialization` and `can_specialize_call` duplicate most eligibility checks. It would be safer to compose one from the other.

💡 Suggested refactor:

```diff
 pub(crate) fn can_specialize_call(&self, effective_nargs: u32) -> bool {
-    let code: &Py<PyCode> = &self.code;
-    let flags = code.flags;
-    flags.contains(bytecode::CodeFlags::OPTIMIZED)
-        && !flags.intersects(bytecode::CodeFlags::VARARGS | bytecode::CodeFlags::VARKEYWORDS)
-        && code.kwonlyarg_count == 0
-        && code.arg_count == effective_nargs
+    self.is_simple_for_call_specialization() && self.code.arg_count == effective_nargs
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/function.rs` around lines 643 - 663, Both helpers duplicate the same eligibility checks; simplify by composing them: keep is_simple_for_call_specialization as the single predicate for flag/kw-only checks, and change can_specialize_call to call is_simple_for_call_specialization(&self) and then only compare code.arg_count == effective_nargs to decide specialization. Update can_specialize_call to reference self.code.arg_count after delegating the common checks to is_simple_for_call_specialization and remove the duplicated flags/kwonly logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/builtins/type.rs`:
- Around line 790-818: Re-check tp_version_tag while holding the appropriate
cache lock before writing/returning cached __init__: in
cache_init_for_specialization, after obtaining ext.specialization_init.write()
(the write lock), re-load and compare
self.tp_version_tag.load(Ordering::Acquire) with the provided tp_version and
return false if they differ, only then store Some(init); similarly, in
get_cached_init_for_specialization acquire ext.specialization_init.read() (the
read lock) and re-check tp_version_tag under that lock before returning the
cached value (return None if the version changed). Ensure you still keep the
early None/false checks for tp_version == 0 and the initial
heaptype_ext.as_ref()?.
- Around line 272-273: PyType currently holds specialization_init:
PyRwLock<Option<PyRef<PyFunction>>> but its manual GC hooks don’t visit or clear
that strong edge; update PyType’s traverse implementation to acquire a read lock
on specialization_init and, if Some(func), call the visitor on the contained
PyRef<PyFunction>, and update PyType’s clear implementation to acquire a write
lock and set specialization_init to None so the reference is dropped during
attribute clearing. Ensure you use the existing traverse/clear patterns for
other Option<PyRef<...>> fields to match locking/error handling conventions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: d9f04714-53af-4172-8555-919635db788f
📒 Files selected for processing (6)
- crates/derive-impl/src/pyclass.rs
- crates/vm/src/builtins/function.rs
- crates/vm/src/builtins/type.rs
- crates/vm/src/frame.rs
- crates/vm/src/vm/mod.rs
- crates/vm/src/vm/thread.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- crates/vm/src/vm/thread.rs
- crates/derive-impl/src/pyclass.rs
Code has been automatically formatted. The code in this PR has been formatted; pull the updated branch with:

```
git pull origin specialization
```
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/vm/src/frame.rs (1)
4701-4745: ⚠️ Potential issue | 🟠 Major — `CallKwBoundMethod` is missing the recursion guard fallback.

This specialized path has the eval-frame guard, but unlike `CallKwPy` it does not apply `specialization_call_recursion_guard` before taking the non-vectorcall fastpath. Add the same early fallback to `execute_call_kw_vectorcall`.

Suggested patch:

```diff
 Instruction::CallKwBoundMethod => {
     let instr_idx = self.lasti() as usize - 1;
     let cache_base = instr_idx + 1;
     let cached_version = self.code.instructions.read_cache_u32(cache_base + 1);
     let nargs: u32 = arg.into();
     if self.specialization_eval_frame_active(vm) {
         return self.execute_call_kw_vectorcall(nargs, vm);
     }
+    if self.specialization_call_recursion_guard(vm) {
+        return self.execute_call_kw_vectorcall(nargs, vm);
+    }
     // Stack: [callable, self_or_null, arg1, ..., argN, kwarg_names]
```
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 4701 - 4745, The CallKwBoundMethod specialized path needs the same recursion-guard fallback as CallKwPy: before taking the non-vectorcall/bound-method fastpath (i.e., before unpacking kwarg names/all_args and calling vectorcall_function inside the CallKwBoundMethod branch), call specialization_call_recursion_guard(self, vm) and if it indicates recursion, return the generic fallback (the non-specialized execute_call_kw_vectorcall path) instead of continuing the fastpath; update the branch in execute_call_kw_vectorcall/CallKwBoundMethod to perform this early guard check.
♻️ Duplicate comments (2)
crates/vm/src/builtins/type.rs (2)
225-263: ⚠️ Potential issue | 🟠 Major — Track and clear `specialization_init` in `PyType` manual GC hooks.

`HeapTypeExt::specialization_init` introduces a strong `PyRef<PyFunction>` edge, but `PyType::traverse`/`clear` still don't visit or clear it. This can keep type/function cycles alive.

💡 Suggested fix:

```diff
 unsafe impl crate::object::Traverse for PyType {
     fn traverse(&self, tracer_fn: &mut crate::object::TraverseFn<'_>) {
         self.base.traverse(tracer_fn);
         self.bases.traverse(tracer_fn);
         self.mro.traverse(tracer_fn);
         self.subclasses.traverse(tracer_fn);
         self.attributes
             .read_recursive()
             .iter()
             .map(|(_, v)| v.traverse(tracer_fn))
             .count();
+        if let Some(ext) = self.heaptype_ext.as_ref() {
+            ext.specialization_init.read().traverse(tracer_fn);
+        }
     }

     /// type_clear: break reference cycles in type objects
     fn clear(&mut self, out: &mut Vec<crate::PyObjectRef>) {
+        if let Some(ext) = self.heaptype_ext.as_ref()
+            && let Some(mut guard) = ext.specialization_init.try_write()
+            && let Some(init) = guard.take()
+        {
+            out.push(init.into());
+        }
         if let Some(base) = self.base.take() {
             out.push(base.into());
         }
         // ...
     }
 }
```

Verification script:

```bash
#!/bin/bash
set -euo pipefail
file="crates/vm/src/builtins/type.rs"
# Verify specialization_init exists and inspect manual GC hooks.
rg -n "specialization_init" "$file"
rg -n -A25 -B5 "unsafe impl crate::object::Traverse for PyType" "$file"
rg -n -A45 -B5 "fn clear\(&mut self, out: &mut Vec<crate::PyObjectRef>\)" "$file"
```

Also applies to: 267-273
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/type.rs` around lines 225 - 263, The manual GC hooks for PyType are missing handling for HeapTypeExt::specialization_init which holds a strong PyRef<PyFunction>; update PyType::traverse to call tracer_fn on the specialization_init edge (e.g., read the specialization_init field and invoke its traverse/tracer function) and update PyType::clear to take/clear specialization_init (push the held PyObjectRef into out when present) so that the strong reference is visited and broken during GC; locate the fields and handlers via the HeapTypeExt::specialization_init symbol and the unsafe impl crate::object::Traverse for PyType (methods traverse and clear) and add symmetric visit/clear logic.
800-830: ⚠️ Potential issue | 🟠 Major — Re-check `tp_version_tag` while holding the cache lock.

Both cache accessors validate `tp_version_tag` before acquiring the `specialization_init` lock. A concurrent `modified()` can race and allow a stale `__init__` to be written/read across versions.

💡 Suggested fix:

```diff
 pub(crate) fn cache_init_for_specialization(
     &self,
     init: PyRef<PyFunction>,
     tp_version: u32,
 ) -> bool {
     let Some(ext) = self.heaptype_ext.as_ref() else {
         return false;
     };
-    if tp_version == 0 || self.tp_version_tag.load(Ordering::Acquire) != tp_version {
+    if tp_version == 0 {
         return false;
     }
-    *ext.specialization_init.write() = Some(init);
+    let mut guard = ext.specialization_init.write();
+    if self.tp_version_tag.load(Ordering::Acquire) != tp_version {
+        return false;
+    }
+    *guard = Some(init);
     true
 }

 pub(crate) fn get_cached_init_for_specialization(
     &self,
     tp_version: u32,
 ) -> Option<PyRef<PyFunction>> {
     let ext = self.heaptype_ext.as_ref()?;
-    if tp_version == 0 || self.tp_version_tag.load(Ordering::Acquire) != tp_version {
+    if tp_version == 0 {
         return None;
     }
-    ext.specialization_init
-        .read()
-        .as_ref()
-        .map(|init| init.to_owned())
+    let guard = ext.specialization_init.read();
+    if self.tp_version_tag.load(Ordering::Acquire) != tp_version {
+        return None;
+    }
+    guard.as_ref().cloned()
 }
```

Verification script:

```bash
#!/bin/bash
set -euo pipefail
file="crates/vm/src/builtins/type.rs"
# Inspect lock acquisition order vs tp_version_tag checks.
rg -n -A30 -B5 "fn cache_init_for_specialization" "$file"
rg -n -A30 -B5 "fn get_cached_init_for_specialization" "$file"
rg -n -A20 -B5 "pub fn modified\(&self\)" "$file"
```
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/type.rs` around lines 800 - 830, The tp_version_tag check must be re-validated while holding the specialization_init lock to prevent a race with modified(); in cache_init_for_specialization() acquire ext.specialization_init.write() first, then re-check tp_version_tag (and tp_version != 0) before storing Some(init), returning false if the tag changed; similarly in get_cached_init_for_specialization() acquire ext.specialization_init.read() first, then re-check tp_version_tag (and tp_version != 0) while holding the read lock and return None if it changed, otherwise clone and return the cached init. Ensure you use the same ext from heaptype_ext.as_ref() and the existing read()/write() guards to avoid changing locking primitives.
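The race described in this comment is the classic check-then-act pattern, and the fix is to re-validate under the lock. The standalone model below mirrors that ordering; `TypeCache` and the `String` stand-in for `PyRef<PyFunction>` are illustrative, not RustPython's actual types:

```rust
use std::sync::RwLock;
use std::sync::atomic::{AtomicU32, Ordering};

// Hypothetical stand-ins for tp_version_tag and the per-type __init__ cache slot.
struct TypeCache {
    version_tag: AtomicU32,
    cached_init: RwLock<Option<String>>, // stands in for PyRef<PyFunction>
}

impl TypeCache {
    fn new(version: u32) -> Self {
        Self {
            version_tag: AtomicU32::new(version),
            cached_init: RwLock::new(None),
        }
    }

    // Store only if the type version is still the one the caller observed.
    fn cache_init(&self, init: String, seen_version: u32) -> bool {
        if seen_version == 0 {
            return false;
        }
        // Take the lock FIRST, then re-check the version, so a concurrent
        // `modified()` cannot slip in between the check and the store.
        let mut guard = self.cached_init.write().unwrap();
        if self.version_tag.load(Ordering::Acquire) != seen_version {
            return false;
        }
        *guard = Some(init);
        true
    }

    fn get_cached_init(&self, seen_version: u32) -> Option<String> {
        if seen_version == 0 {
            return None;
        }
        let guard = self.cached_init.read().unwrap();
        if self.version_tag.load(Ordering::Acquire) != seen_version {
            return None;
        }
        guard.clone()
    }

    // Mirrors PyType::modified(): invalidate by bumping the tag.
    fn modified(&self) {
        self.version_tag.fetch_add(1, Ordering::Release);
    }
}

fn main() {
    let cache = TypeCache::new(7);
    assert!(cache.cache_init("init_v7".to_string(), 7));
    assert_eq!(cache.get_cached_init(7), Some("init_v7".to_string()));

    cache.modified(); // tag becomes 8; version-7 reads and writes must now miss
    assert_eq!(cache.get_cached_init(7), None);
    assert!(!cache.cache_init("stale".to_string(), 7));
    println!("ok");
}
```

This has the same shape as the suggested fix: the early version check is only an optimization, and the check that actually matters happens while the cache guard is held.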
🧹 Nitpick comments (2)
crates/vm/src/builtins/function.rs (1)
643-651: Deduplicate specialization predicates to reduce drift risk.
`can_specialize_call` and `is_simple_for_call_specialization` share the same shape checks. Reusing the helper keeps future updates consistent.

♻️ Proposed refactor:

```diff
 pub(crate) fn can_specialize_call(&self, effective_nargs: u32) -> bool {
-    let code: &Py<PyCode> = &self.code;
-    let flags = code.flags;
-    flags.contains(bytecode::CodeFlags::OPTIMIZED)
-        && !flags.intersects(bytecode::CodeFlags::VARARGS | bytecode::CodeFlags::VARKEYWORDS)
-        && code.kwonlyarg_count == 0
-        && code.arg_count == effective_nargs
+    self.is_simple_for_call_specialization() && self.code.arg_count == effective_nargs
 }
```

Also applies to: 656-663
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/function.rs` around lines 643 - 651, Both can_specialize_call and is_simple_for_call_specialization duplicate the same "simple function" shape checks; extract the shared predicate into a single helper (e.g., has_simple_call_shape or similar) and call it from both functions. Update is_simple_for_call_specialization to delegate to the helper (using the existing self.code and its flags/kwonlyarg_count logic) and change can_specialize_call to use the same helper so future flag/arg-count changes are made in one place; reference the existing symbols can_specialize_call and is_simple_for_call_specialization when locating where to replace the duplicated logic.

crates/vm/src/builtins/object.rs (1)
108-121: Add a defensive assertion for `_nitems` in `generic_alloc`.

Since `_nitems` is currently ignored, a debug assertion helps catch accidental variable-size allocation usage early.

🛡️ Suggested guard rail:

```diff
 pub(crate) fn generic_alloc(cls: PyTypeRef, _nitems: usize, vm: &VirtualMachine) -> PyResult {
+    debug_assert_eq!(
+        _nitems, 0,
+        "generic_alloc currently supports fixed-size allocations only"
+    );
     // Only create dict if the class has HAS_DICT flag (i.e., __slots__ was not defined
     // or __dict__ is in __slots__)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/object.rs` around lines 108 - 121, Add a defensive debug assertion in the generic_alloc function to ensure the ignored _nitems argument is zero so callers don't accidentally request variable-size allocation; inside generic_alloc (which takes cls: PyTypeRef, _nitems: usize, vm: &VirtualMachine) add a debug_assert_eq!(_nitems, 0, "...") (or debug_assert!(_nitems == 0, "...")) with a brief message indicating variable-size allocation is not supported for PyBaseObject.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/frame.rs`:
- Around line 4490-4527: In CallAllocAndEnterInit's fast-path (the block that
checks specialization_eval_frame_active, cached_version,
cls.get_cached_init_for_specialization and then calls
init_func.invoke_exact_args), add the same recursion-guard check used by other
non-vectorcall specialization CALL paths before invoking init_func: attempt to
acquire the specialization/recursion guard (the same mechanism those paths use)
and if acquiring the guard fails, fall back to calling
self.execute_call_vectorcall(nargs, vm) instead of invoking init_func directly;
ensure the guard is released appropriately around the
init_func.invoke_exact_args call so behavior matches other specialized CALLs
(reference init_func.invoke_exact_args,
self.specialization_has_datastack_space_for_func, and
self.execute_call_vectorcall to locate the code).
- Around line 8270-8272: specialization_call_recursion_guard currently only
compares current_recursion_depth() with recursion_limit and thus misses the VM's
C-stack-aware recursion/C-stack headroom check; replace that depth-only test in
fn specialization_call_recursion_guard(&self, vm: &VirtualMachine) with the same
recursion guard primitive used by normal call paths (i.e., call the VM method
that performs the full C-stack-aware recursion check/guard used for regular
calls, rather than using current_recursion_depth().saturating_add(1) >=
recursion_limit), or if that primitive exposes a conservative API to force
vectorcall, invoke that to ensure the vectorcall fastpath is disabled when
C-stack headroom is low; keep references to VirtualMachine methods
(current_recursion_depth, recursion_limit) only for context and remove the
depth-only comparison.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: ace14bfe-8231-4e06-8403-f4017c4f9cb7
📒 Files selected for processing (6)
- crates/vm/src/builtins/function.rs
- crates/vm/src/builtins/object.rs
- crates/vm/src/builtins/type.rs
- crates/vm/src/frame.rs
- crates/vm/src/types/slot.rs
- crates/vm/src/vm/mod.rs
```diff
 if !self.specialization_eval_frame_active(vm)
     && !self_or_null_is_some
     && cached_version != 0
     && let Some(cls) = callable.downcast_ref::<PyType>()
     && cls.tp_version_tag.load(Acquire) == cached_version
+    && let Some(init_func) = cls.get_cached_init_for_specialization(cached_version)
+    && let (Some(cls_alloc), Some(object_alloc_fn)) =
+        (cls.slots.alloc.load(), object_alloc)
+    && cls_alloc as usize == object_alloc_fn as usize
 {
-    // Look up __init__ (guarded by type_version)
-    if let Some(init) = cls.get_attr(identifier!(vm, __init__))
-        && let Some(init_func) = init.downcast_ref_if_exact::<PyFunction>(vm)
-        && init_func.can_specialize_call(nargs + 1)
-    {
-        // Allocate object directly (tp_new == object.__new__)
-        let dict = if cls
-            .slots
-            .flags
-            .has_feature(crate::types::PyTypeFlags::HAS_DICT)
-        {
-            Some(vm.ctx.new_dict())
-        } else {
-            None
-        };
-        let cls_ref = cls.to_owned();
-        let new_obj: PyObjectRef =
-            PyRef::new_ref(PyBaseObject, cls_ref, dict).into();
-
-        // Build args: [new_obj, arg1, ..., argN]
-        let pos_args: Vec<PyObjectRef> =
-            self.pop_multiple(nargs as usize).collect();
-        let _null = self.pop_value_opt(); // self_or_null (None)
-        let _callable = self.pop_value(); // callable (type)
-
-        let mut all_args = Vec::with_capacity(pos_args.len() + 1);
-        all_args.push(new_obj.clone());
-        all_args.extend(pos_args);
-
-        let init_result = init_func.invoke_exact_args(all_args, vm)?;
-
-        // EXIT_INIT_CHECK: __init__ must return None
-        if !vm.is_none(&init_result) {
-            return Err(
-                vm.new_type_error("__init__() should return None".to_owned())
-            );
-        }
-
-        self.push_value(new_obj);
-        return Ok(None);
+    if !self.specialization_has_datastack_space_for_func(vm, &init_func) {
+        return self.execute_call_vectorcall(nargs, vm);
+    }
+    // Allocate object directly (tp_new == object.__new__, tp_alloc == generic).
+    let cls_ref = cls.to_owned();
+    let new_obj = cls_alloc(cls_ref, 0, vm)?;
+
+    // Build args: [new_obj, arg1, ..., argN]
+    let pos_args: Vec<PyObjectRef> = self.pop_multiple(nargs as usize).collect();
+    let _null = self.pop_value_opt(); // self_or_null (None)
+    let _callable = self.pop_value(); // callable (type)
+
+    let mut all_args = Vec::with_capacity(pos_args.len() + 1);
+    all_args.push(new_obj.clone());
+    all_args.extend(pos_args);
+
+    let init_result = init_func.invoke_exact_args(all_args, vm)?;
+
+    // EXIT_INIT_CHECK: __init__ must return None
+    if !vm.is_none(&init_result) {
+        return Err(vm.new_type_error(format!(
+            "__init__() should return None, not '{}'",
+            init_result.class().name()
+        )));
+    }
+
+    self.push_value(new_obj);
+    return Ok(None);
```
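The hardened shape of this fast path — verify every precondition before touching the data stack, and fall back to the generic vectorcall otherwise — can be sketched in miniature. All names below (`Frame`, `Outcome`, `call_alloc_and_enter_init`) are toy stand-ins, not the frame's real API:

```rust
// Toy sketch: guard first, mutate after. Nothing is popped until the fast
// path is committed, so the vectorcall fallback sees an untouched stack.

struct Frame {
    stack: Vec<i64>,
    capacity: usize,
}

enum Outcome {
    FastPath(i64),
    Vectorcall,
}

impl Frame {
    fn has_space(&self, needed: usize) -> bool {
        self.capacity.saturating_sub(self.stack.len()) >= needed
    }

    fn call_alloc_and_enter_init(&mut self, needed: usize, arg: i64) -> Outcome {
        if !self.has_space(needed) {
            // Bail out before any stack mutation.
            return Outcome::Vectorcall;
        }
        self.stack.push(arg);
        Outcome::FastPath(arg)
    }
}
```

The design point the reordered diff makes is the same: the `specialization_has_datastack_space_for_func` check must run before the pops, so the fallback path still finds the callable and arguments on the stack.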
CallAllocAndEnterInit should apply recursion guard before fast __init__ invoke.
This path directly calls init_func.invoke_exact_args(...) but does not perform the recursion guard fallback used in other non-vectorcall specialized CALL paths. Add the same guard before invoking cached __init__.
Suggested patch

```diff
 let object_alloc = vm.ctx.types.object_type.slots.alloc.load();
 if !self.specialization_eval_frame_active(vm)
     && !self_or_null_is_some
+    && !self.specialization_call_recursion_guard(vm)
     && cached_version != 0
     && let Some(cls) = callable.downcast_ref::<PyType>()
     && cls.tp_version_tag.load(Acquire) == cached_version
     && let Some(init_func) = cls.get_cached_init_for_specialization(cached_version)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/frame.rs` around lines 4490 - 4527, In CallAllocAndEnterInit's
fast-path (the block that checks specialization_eval_frame_active,
cached_version, cls.get_cached_init_for_specialization and then calls
init_func.invoke_exact_args), add the same recursion-guard check used by other
non-vectorcall specialization CALL paths before invoking init_func: attempt to
acquire the specialization/recursion guard (the same mechanism those paths use)
and if acquiring the guard fails, fall back to calling
self.execute_call_vectorcall(nargs, vm) instead of invoking init_func directly;
ensure the guard is released appropriately around the
init_func.invoke_exact_args call so behavior matches other specialized CALLs
(reference init_func.invoke_exact_args,
self.specialization_has_datastack_space_for_func, and
self.execute_call_vectorcall to locate the code).
```rust
fn specialization_call_recursion_guard(&self, vm: &VirtualMachine) -> bool {
    vm.current_recursion_depth().saturating_add(1) >= vm.recursion_limit.get()
}
```
specialization_call_recursion_guard is depth-only, not a C-stack guard.
At Line 8271 this helper only checks logical recursion depth. That does not provide the same protection as the VM’s full recursion/C-stack check, so non-vectorcall fastpaths can still run when C-stack headroom is low.
Please switch this helper to the same guard primitive used by normal call recursion checks (or conservatively force vectorcall via that primitive).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/frame.rs` around lines 8270 - 8272,
specialization_call_recursion_guard currently only compares
current_recursion_depth() with recursion_limit and thus misses the VM's
C-stack-aware recursion/C-stack headroom check; replace that depth-only test in
fn specialization_call_recursion_guard(&self, vm: &VirtualMachine) with the same
recursion guard primitive used by normal call paths (i.e., call the VM method
that performs the full C-stack-aware recursion check/guard used for regular
calls, rather than using current_recursion_depth().saturating_add(1) >=
recursion_limit), or if that primitive exposes a conservative API to force
vectorcall, invoke that to ensure the vectorcall fastpath is disabled when
C-stack headroom is low; keep references to VirtualMachine methods
(current_recursion_depth, recursion_limit) only for context and remove the
depth-only comparison.
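Why a depth counter alone is not enough can be shown with a rough C-stack probe: take the address of a local as a stack-pointer estimate and compare against a recorded base. This is only an illustration of the idea; RustPython's real guard lives in the VM and is not this code:

```rust
// Toy C-stack headroom probe. Stacks grow downward on mainstream
// platforms, so usage is approximately `base - current`.

fn approx_stack_pointer() -> usize {
    let marker = 0u8;
    &marker as *const u8 as usize
}

struct StackGuard {
    base: usize,
    margin: usize,
}

impl StackGuard {
    fn new(margin: usize) -> Self {
        Self {
            base: approx_stack_pointer(),
            margin,
        }
    }

    /// True while estimated stack usage stays under the safety margin.
    /// A logical depth counter cannot observe this: one "depth" step may
    /// consume wildly different amounts of native stack.
    fn has_headroom(&self) -> bool {
        let used = self.base.saturating_sub(approx_stack_pointer());
        used < self.margin
    }
}
```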
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/builtins/type.rs`:
- Around line 855-874: cache_getitem_for_specialization and
get_cached_getitem_for_specialization currently check tp_version_tag before
taking the lock, leaving a race with modified(); mirror the init-cache locking
pattern by re-checking tp_version_tag after acquiring the specialization_getitem
write/read lock (i.e., take ext.specialization_getitem.write()/read() first,
then verify self.tp_version_tag.load(Ordering::Acquire) == tp_version and
func_version != 0, and only then store the getitem / return the cached value and
set specialization_getitem_version with Ordering::Release); reference the
functions cache_getitem_for_specialization,
get_cached_getitem_for_specialization, tp_version_tag, heaptype_ext,
specialization_getitem, and specialization_getitem_version when applying the
change.
- Around line 236-238: The code only traverses/clears
HeapTypeExt.specialization_init but misses the strong PyRef edge
HeapTypeExt.specialization_getitem, so update the GC traversal/clear sites to
include specialization_getitem alongside specialization_init; specifically,
where you call ext.specialization_init.read().traverse(tracer_fn) also call
ext.specialization_getitem.read().traverse(tracer_fn), and where you clear
specialization_init (e.g., ext.specialization_init.write().clear(...) or
similar) also clear ext.specialization_getitem; apply the same change at the
other two related spots referenced (around the other traverse/clear blocks at
the indicated ranges).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 426e4d46-2da2-4c9b-929d-136900f2d6ac
📒 Files selected for processing (1)
crates/vm/src/builtins/type.rs
f352115 to a91dd29 (Compare)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/vm/src/builtins/function.rs (1)
536-565: ⚠️ Potential issue | 🟠 Major — Don’t drop explicit `locals` on the JIT fast path.

`invoke_with_locals()` now accepts a caller-supplied locals mapping, but the early `#[cfg(feature = "jit")]` return still ignores it. A jitted function invoked with explicit locals will run in the wrong namespace.

Proposed fix
```diff
 #[cfg(feature = "jit")]
-if let Some(jitted_code) = self.jitted_code.lock().as_ref() {
-    use crate::convert::ToPyObject;
-    match jit::get_jit_args(self, &func_args, jitted_code, vm) {
-        Ok(args) => {
-            return Ok(args.invoke().to_pyobject(vm));
-        }
-        Err(err) => info!(
-            "jit: function `{}` is falling back to being interpreted because of the \
-             error: {}",
-            self.code.obj_name, err
-        ),
+if locals.is_none() {
+    if let Some(jitted_code) = self.jitted_code.lock().as_ref() {
+        use crate::convert::ToPyObject;
+        match jit::get_jit_args(self, &func_args, jitted_code, vm) {
+            Ok(args) => {
+                return Ok(args.invoke().to_pyobject(vm));
+            }
+            Err(err) => info!(
+                "jit: function `{}` is falling back to being interpreted because of the \
+                 error: {}",
+                self.code.obj_name, err
+            ),
+        }
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/builtins/function.rs` around lines 536 - 565, The JIT fast-path in invoke_with_locals currently ignores the caller-supplied locals; compute the effective locals the same way as the interpreted path before the #[cfg(feature = "jit")] block (i.e. if code.flags contains CodeFlags::NEWLOCALS then effective_locals = None else use the provided locals or Some(ArgMapping::from_dict_exact(self.globals.clone()))), and pass that resolved locals into the JIT call (jit::get_jit_args / jitted invocation) instead of returning early without them so jitted_code, jit::get_jit_args, func_args and the locals parameter are honored.
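The locals-resolution rule this comment asks the JIT path to share with the interpreted path can be sketched with simplified stand-ins (`NEWLOCALS` as a bare bitflag and a `HashMap` in place of `ArgMapping`; neither is RustPython's real type):

```rust
use std::collections::HashMap;

const NEWLOCALS: u32 = 1 << 0;

type Namespace = HashMap<String, i64>;

// Toy version of the rule: function-style code (NEWLOCALS) gets a fresh
// fast-locals frame, so explicit locals are unused; module/class-style
// code uses caller-supplied locals, falling back to globals.
fn effective_locals(
    code_flags: u32,
    supplied: Option<Namespace>,
    globals: &Namespace,
) -> Option<Namespace> {
    if code_flags & NEWLOCALS != 0 {
        None
    } else {
        Some(supplied.unwrap_or_else(|| globals.clone()))
    }
}
```

The point of the fix is that both the interpreted and JIT entry paths should compute this value the same way, instead of the JIT path returning early and silently discarding `supplied`.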
♻️ Duplicate comments (2)
crates/vm/src/builtins/type.rs (2)
236-238:⚠️ Potential issue | 🟠 Major
specialization_getitemalso needs manual traverse/clear.
HeapTypeExtnow owns another strongPyRef<PyFunction>, but onlyspecialization_initparticipates inPyType’s manual GC hooks. That can keep class/function cycles alive aftertype_clear().Proposed fix
```diff
 fn traverse(&self, tracer_fn: &mut crate::object::TraverseFn<'_>) {
     self.base.traverse(tracer_fn);
     self.bases.traverse(tracer_fn);
     self.mro.traverse(tracer_fn);
     self.subclasses.traverse(tracer_fn);
     self.attributes
         .read_recursive()
         .iter()
         .map(|(_, v)| v.traverse(tracer_fn))
         .count();
     if let Some(ext) = self.heaptype_ext.as_ref() {
         ext.specialization_init.read().traverse(tracer_fn);
+        ext.specialization_getitem.read().traverse(tracer_fn);
     }
 }
@@
     if let Some(ext) = self.heaptype_ext.as_ref()
         && let Some(mut guard) = ext.specialization_init.try_write()
         && let Some(init) = guard.take()
     {
         out.push(init.into());
     }
+    if let Some(ext) = self.heaptype_ext.as_ref()
+        && let Some(mut guard) = ext.specialization_getitem.try_write()
+        && let Some(getitem) = guard.take()
+    {
+        out.push(getitem.into());
+        ext.specialization_getitem_version.store(0, Ordering::Release);
+    }
 }
```

Also applies to: 266-283
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/builtins/type.rs` around lines 236 - 238, HeapTypeExt also holds a strong PyRef for specialization_getitem which isn’t included in the manual GC hooks; update the manual traversal and clearing code in the PyType hooks to also visit and drop this field. In the traversal path (where specialization_init is traversed) add a call to traverse the specialization_getitem PyRef (e.g., ext.specialization_getitem.read().traverse(tracer_fn)); in the clear path (where specialization_init is cleared) also clear/take the specialization_getitem PyRef (e.g., ext.specialization_getitem.write().take() or set to None) so both specialization_init and specialization_getitem are handled; update both the block around the existing traverse (the one calling ext.specialization_init.read().traverse(...)) and the corresponding clear block mirrored in the other function region (the section covering lines ~266-283).
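The invariant behind this comment — every strong reference a type holds must be reported to traverse and dropped on clear — can be shown with a toy cycle-collector edge enumeration. `HeapExt` and `Obj` are illustrative stand-ins, with `Rc` in place of the VM's reference-counted handles:

```rust
use std::rc::Rc;

struct Obj(&'static str);

struct HeapExt {
    specialization_init: Option<Rc<Obj>>,
    specialization_getitem: Option<Rc<Obj>>,
}

impl HeapExt {
    // Report every strong edge to the collector. Forgetting one field
    // hides a reference, so cycles through it are never detected.
    fn traverse(&self, visit: &mut dyn FnMut(&Rc<Obj>)) {
        if let Some(init) = &self.specialization_init {
            visit(init);
        }
        if let Some(getitem) = &self.specialization_getitem {
            visit(getitem);
        }
    }

    // Drop every strong edge on clear, returning them so the caller can
    // release them outside any locks.
    fn clear(&mut self) -> Vec<Rc<Obj>> {
        let mut out = Vec::new();
        out.extend(self.specialization_init.take());
        out.extend(self.specialization_getitem.take());
        out
    }
}
```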
855-890: ⚠️ Potential issue | 🟠 Major — Guard the `__getitem__` cache with the type version under the lock.

`cache_getitem_for_specialization()` still validates `tp_version_tag` before taking the write lock, and `get_cached_getitem_for_specialization()` no longer receives the specialization-time type version at all. A concurrent `modified()` can republish or read a stale cached `__getitem__` after the type is re-versioned.

Proposed fix
```diff
 pub(crate) fn cache_getitem_for_specialization(
     &self,
     getitem: PyRef<PyFunction>,
     tp_version: u32,
 ) -> bool {
     let Some(ext) = self.heaptype_ext.as_ref() else {
         return false;
     };
-    if tp_version == 0 || self.tp_version_tag.load(Ordering::Acquire) != tp_version {
+    if tp_version == 0 {
         return false;
     }
     let func_version = getitem.get_version_for_current_state();
     if func_version == 0 {
         return false;
     }
-    *ext.specialization_getitem.write() = Some(getitem);
-    ext.specialization_getitem_version
-        .store(func_version, Ordering::Release);
+    let mut guard = ext.specialization_getitem.write();
+    if self.tp_version_tag.load(Ordering::Acquire) != tp_version {
+        return false;
+    }
+    *guard = Some(getitem);
+    ext.specialization_getitem_version.store(func_version, Ordering::Release);
     true
 }
@@
-pub(crate) fn get_cached_getitem_for_specialization(&self) -> Option<(PyRef<PyFunction>, u32)> {
+pub(crate) fn get_cached_getitem_for_specialization(
+    &self,
+    tp_version: u32,
+) -> Option<(PyRef<PyFunction>, u32)> {
     let ext = self.heaptype_ext.as_ref()?;
-    if self.tp_version_tag.load(Ordering::Acquire) == 0 {
+    if tp_version == 0 {
         return None;
     }
+    let guard = ext.specialization_getitem.read();
+    if self.tp_version_tag.load(Ordering::Acquire) != tp_version {
+        return None;
+    }
     let cached_version = ext.specialization_getitem_version.load(Ordering::Acquire);
     if cached_version == 0 {
         return None;
     }
-    ext.specialization_getitem
-        .read()
-        .as_ref()
-        .map(|getitem| (getitem.to_owned(), cached_version))
+    guard
+        .as_ref()
+        .map(|getitem| (getitem.to_owned(), cached_version))
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/builtins/type.rs` around lines 855 - 890, cache_getitem_for_specialization and get_cached_getitem_for_specialization must validate the type version while holding the specialization_getitem lock to avoid races with modified(); change cache_getitem_for_specialization to acquire ext.specialization_getitem.write() first, then re-check self.tp_version_tag.load(...) == tp_version before storing the getitem and version; change get_cached_getitem_for_specialization to acquire ext.specialization_getitem.read(), then check self.tp_version_tag (or accept a tp_version param and compare) while holding that read lock and return None if the tag/version changed or is zero, only returning the cached (getitem, cached_version) when both the lock-held check and cached_version are valid (use specialization_getitem and specialization_getitem_version fields as now).
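The locking discipline this comment asks for — re-validate the type version *after* taking the cache lock — can be reduced to a small self-contained example. `TypeCache` and its fields are toy stand-ins, not RustPython's types:

```rust
use std::sync::RwLock;
use std::sync::atomic::{AtomicU32, Ordering};

struct TypeCache {
    version: AtomicU32,
    cached: RwLock<Option<&'static str>>,
}

impl TypeCache {
    // Store only if the version recorded at specialization time is still
    // current *while holding the write lock*; a check done before the lock
    // leaves a window where modified() can bump the version.
    fn store(&self, value: &'static str, expected_version: u32) -> bool {
        let mut guard = self.cached.write().unwrap();
        if expected_version == 0 || self.version.load(Ordering::Acquire) != expected_version {
            return false;
        }
        *guard = Some(value);
        true
    }

    // Symmetric read path: take the read lock, then verify the version.
    fn load(&self, expected_version: u32) -> Option<&'static str> {
        let guard = self.cached.read().unwrap();
        if self.version.load(Ordering::Acquire) != expected_version {
            return None;
        }
        *guard
    }
}
```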
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/builtins/object.rs`:
- Around line 100-120: The current implementation of object.__new__ always uses
generic_alloc and ignores the type's allocator slot; update slot_new (the path
that currently calls generic_alloc) to dispatch through cls.slots.alloc when
present (i.e., call the type's alloc slot with the same parameters that tp_alloc
would get) and fall back to generic_alloc only if no alloc slot is installed,
ensuring cls.slots.alloc is used consistently for types that override allocation
rather than bypassing tp_alloc.
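The dispatch rule this comment describes — honor an installed alloc slot and fall back to the generic allocator only when none is set — can be sketched with function pointers. The types below (`Slots`, `slot_new`, `generic_alloc`) are illustrative, not RustPython's slot table:

```rust
// Toy tp_alloc-style dispatch.
type AllocFn = fn(&str) -> String;

fn generic_alloc(name: &str) -> String {
    format!("generic<{name}>")
}

fn custom_alloc(name: &str) -> String {
    format!("custom<{name}>")
}

struct Slots {
    alloc: Option<AllocFn>,
}

// object.__new__-style entry point: dispatch through the type's alloc
// slot when present, so types that override allocation are not bypassed.
fn slot_new(type_name: &str, slots: &Slots) -> String {
    let alloc = slots.alloc.unwrap_or(generic_alloc);
    alloc(type_name)
}
```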
---
Outside diff comments:
In `@crates/vm/src/builtins/function.rs`:
- Around line 536-565: The JIT fast-path in invoke_with_locals currently ignores
the caller-supplied locals; compute the effective locals the same way as the
interpreted path before the #[cfg(feature = "jit")] block (i.e. if code.flags
contains CodeFlags::NEWLOCALS then effective_locals = None else use the provided
locals or Some(ArgMapping::from_dict_exact(self.globals.clone()))), and pass
that resolved locals into the JIT call (jit::get_jit_args / jitted invocation)
instead of returning early without them so jitted_code, jit::get_jit_args,
func_args and the locals parameter are honored.
---
Duplicate comments:
In `@crates/vm/src/builtins/type.rs`:
- Around line 236-238: HeapTypeExt also holds a strong PyRef for
specialization_getitem which isn’t included in the manual GC hooks; update the
manual traversal and clearing code in the PyType hooks to also visit and drop
this field. In the traversal path (where specialization_init is traversed) add a
call to traverse the specialization_getitem PyRef (e.g.,
ext.specialization_getitem.read().traverse(tracer_fn)); in the clear path (where
specialization_init is cleared) also clear/take the specialization_getitem PyRef
(e.g., ext.specialization_getitem.write().take() or set to None) so both
specialization_init and specialization_getitem are handled; update both the
block around the existing traverse (the one calling
ext.specialization_init.read().traverse(...)) and the corresponding clear block
mirrored in the other function region (the section covering lines ~266-283).
- Around line 855-890: cache_getitem_for_specialization and
get_cached_getitem_for_specialization must validate the type version while
holding the specialization_getitem lock to avoid races with modified(); change
cache_getitem_for_specialization to acquire ext.specialization_getitem.write()
first, then re-check self.tp_version_tag.load(...) == tp_version before storing
the getitem and version; change get_cached_getitem_for_specialization to acquire
ext.specialization_getitem.read(), then check self.tp_version_tag (or accept a
tp_version param and compare) while holding that read lock and return None if
the tag/version changed or is zero, only returning the cached (getitem,
cached_version) when both the lock-held check and cached_version are valid (use
specialization_getitem and specialization_getitem_version fields as now).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: d3962f85-b798-4349-bb16-790f6e80b78d
📒 Files selected for processing (8)
- crates/derive-impl/src/pyclass.rs
- crates/vm/src/builtins/function.rs
- crates/vm/src/builtins/object.rs
- crates/vm/src/builtins/type.rs
- crates/vm/src/frame.rs
- crates/vm/src/types/slot.rs
- crates/vm/src/vm/mod.rs
- crates/vm/src/vm/thread.rs
🚧 Files skipped from review as they are similar to previous changes (3)
- crates/derive-impl/src/pyclass.rs
- crates/vm/src/types/slot.rs
- crates/vm/src/vm/thread.rs
63fc837 to e97dab9 (Compare)
37501dc to e65cbc0 (Compare)
Summary
- Cache `len`, `isinstance`, and `list.append` callables in `VirtualMachine::callable_cache` for identity-based CALL guards (replacing per-site pointer caches)
- Cache the `__init__` `PyFunction` in `HeapTypeExt` for `CALL_ALLOC_AND_ENTER_INIT`, invalidated on `type.modified()`
- Add an `objclass` type check to `CALL_METHOD_DESCRIPTOR_*` runtime guards
- Bail out of specialization when the `CO_OPTIMIZED` flag is absent
- Use `drop_first_typed` for `#[pymethod(raw)]` in call-flags inference

Test plan
🤖 Generated with Claude Code
Summary by CodeRabbit
Refactor
Bug Fixes