
Harden CALL specialization guards and cache callables#7360

Merged
youknowone merged 6 commits into RustPython:main from
youknowone:specialization
Mar 8, 2026

Conversation

@youknowone
Member

@youknowone youknowone commented Mar 5, 2026

Summary

  • Cache canonical len, isinstance, and list.append callables in VirtualMachine::callable_cache for identity-based CALL guards (replacing per-site pointer caches)
  • Cache __init__ PyFunction in HeapTypeExt for CALL_ALLOC_AND_ENTER_INIT, invalidated on type.modified()
  • Add objclass type check to CALL_METHOD_DESCRIPTOR_* runtime guards
  • Add C-stack recursion guard on non-vectorcall CALL fastpaths
  • Keep EXIT-style CALL specialized opcodes on guard miss instead of deoptimizing
  • Skip CALL specialization when CO_OPTIMIZED flag is absent
  • Fix drop_first_typed for #[pymethod(raw)] in call-flags inference

Test plan

  • CI passes on all platforms (Ubuntu, macOS, Windows)
  • Specialization guards correctly reject mismatched callables
  • Raw methods infer correct call convention flags

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Refactor

    • Enhanced call specialization, caching, and fast-paths to speed common function/method invocations, reduce deoptimization, and prefer vectorcall-like routes.
    • Added per-type cached init/getitem specializations with invalidation on type changes.
    • Introduced a per-VM callable cache propagated across threads.
    • Added a pluggable allocation hook for object/type allocation and a tp_alloc slot.
  • Bug Fixes

    • Adjusted method argument handling to correct binding behavior in certain edge cases.

@coderabbitai
Contributor

coderabbitai bot commented Mar 5, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds VM-level callable cache and datastack utilities, per-type specialization caches for init/getitem, extended function/frame invocation paths using OPTIMIZED semantics and locals propagation, a tp_alloc slot with generic allocator wiring, and stricter native method first-arg dropping rules.

Changes

  • VM callable cache & datastack (crates/vm/src/vm/mod.rs, crates/vm/src/vm/thread.rs): Add CallableCache to VirtualMachine, initialize it at startup, propagate it to new threads, and add a datastack_has_space helper.
  • Function invocation & specialization (crates/vm/src/builtins/function.rs): Add is_optimized_for_call_specialization, is_simple_for_call_specialization, has_exact_argcount, and datastack_frame_size_bytes; gate specialization on the OPTIMIZED flag; extend invoke_with_locals to accept Option<ArgMapping> and route generators/coroutines to safe invoke paths.
  • Frame-level specialization & call paths (crates/vm/src/frame.rs): Rewrite many opcodes to prefer cache-guided specialization with guarded fallbacks to vectorcall; add specialization guards, datastack checks, and harmonized argument assembly for bound-method/descriptor call paths.
  • Per-type specialization caches & slot propagation (crates/vm/src/builtins/type.rs): Add specialization_init / specialization_getitem caches with versioning, cache accessors, initialize/clear logic, and a set_alloc helper to propagate the alloc slot from base types.
  • Type slots: alloc support (crates/vm/src/types/slot.rs): Introduce an AllocFunc alias and add alloc: AtomicCell<Option<AllocFunc>> to PyTypeSlots to represent tp_alloc.
  • Object allocation plumbing (crates/vm/src/builtins/object.rs): Introduce pub(crate) fn generic_alloc(...) and wire object_type.slots.alloc to it, centralizing instance allocation and conditional dict creation based on HAS_DICT.
  • Method-binding semantics (crates/derive-impl/src/pyclass.rs): Tighten drop_first_typed for AttrName::Method/ClassMethod to require both "no receiver" and raw == false, changing infer_native_call_flags behavior.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant VM as VirtualMachine
    participant Cache as CallableCache
    participant Frame as ExecutingFrame
    participant Type as PyType
    participant Func as PyFunction

    Client->>VM: execute bytecode / CALL
    VM->>Cache: init_callable_cache()
    Cache-->>VM: builtin callables cached

    Frame->>VM: datastack_has_space(size)?
    Frame->>Cache: consult callable cache
    alt specialization viable
        Frame->>Type: get_cached_init_for_specialization(tp_version)
        Type-->>Frame: maybe PyFunction
        Frame->>Func: is_optimized_for_call_specialization / has_exact_argcount
        Frame->>Frame: build frame with locals and use vectorcall/specialized path
    else fallback
        Frame->>Func: invoke_with_locals(func_args, Option<locals>, vm)
        Func-->>Frame: generic invoke (or generator/coroutine path)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Suggested reviewers

  • ShaharNaveh

Poem

🐰 I hop through caches, quick and spry,
I stash len and isinstance nearby.
Init and getitem snug in type's den,
Fastpaths leap — fall back now and then.
A rabbit cheers: optimized calls again!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed (check skipped; CodeRabbit's high-level summary is enabled)
  • Title Check: ✅ Passed (the title 'Harden CALL specialization guards and cache callables' directly and clearly summarizes the primary changes: strengthening CALL specialization guards and implementing callable caching)
  • Docstring Coverage: ✅ Passed (100.00% coverage; required threshold is 80.00%)



Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/vm/src/frame.rs (1)

4519-4561: ⚠️ Potential issue | 🔴 Critical

Add recursion guard to FASTCALL method-descriptor specialized paths.

Line 4559 and Line 4692 call func(vm, args) without the C-stack guard already added in the NOARGS/O descriptor fastpaths. This can bypass recursion protection in these specialized non-vectorcall paths.

Proposed fix
@@
                     let args = FuncArgs {
                         args: all_args,
                         kwargs: Default::default(),
                     };
+                    if vm.reached_c_stack_limit() {
+                        return Err(vm.new_recursion_error(String::new()));
+                    }
                     let result = func(vm, args)?;
                     self.push_value(result);
                     return Ok(None);
@@
                     let args = FuncArgs {
                         args: all_args,
                         kwargs: Default::default(),
                     };
+                    if vm.reached_c_stack_limit() {
+                        return Err(vm.new_recursion_error(String::new()));
+                    }
                     let result = func(vm, args)?;
                     self.push_value(result);
                     return Ok(None);

Also applies to: 4650-4694

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 4519 - 4561, The FASTCALL
PyMethodDescriptor fast-path calls func(vm, args) (the local variable func
representing descr.method.func) without the C-stack/recursion guard used in the
NOARGS/O fastpaths, so add the same recursion/C-stack guard wrapper around the
call to func(vm, args) (and any other identical calls in the
FASTCALL-specialized blocks around the PyMethodDescriptor handling, e.g., the
blocks guarded by descr.method.flags.contains(PyMethodFlags::METHOD) && ... ==
PyMethodFlags::FASTCALL) so the native call executes inside the existing guard
helper used elsewhere in frame.rs for NOARGS/O paths (wrap the FuncArgs call the
same way the NOARGS/O fastpath wraps descr.method.func).
🧹 Nitpick comments (1)
crates/vm/src/frame.rs (1)

4418-4455: Consider extracting shared descriptor-args assembly into one helper.

The four CALL_METHOD_DESCRIPTOR_* branches duplicate nearly identical stack-pop and FuncArgs construction logic. A single helper would reduce drift and make guard consistency easier to maintain.

As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code."

Also applies to: 4469-4506, 4519-4557, 4650-4690

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 4418 - 4455, Duplicate logic across the
CALL_METHOD_DESCRIPTOR_* branches should be consolidated: extract the shared
sequence that pops the callable/positional/self values from the frame (using
self.pop_multiple(n), self.pop_value_opt(), self.pop_value()), checks/uses
nth_value and localsplus.stack_index, and constructs the FuncArgs (args Vec and
kwargs default) into a single helper function (e.g.,
assemble_descriptor_call_args or similar) that returns the built FuncArgs and
the method func to invoke; then replace each branch (the ones checking
PyMethodDescriptor, PyMethodFlags and class guard) to compute only the differing
guard/value (descr and any flag-dependent selection) and call the new helper to
get the FuncArgs before continuing, updating usages of positional_args,
all_args, self_or_null, and func to the helper's return values; apply the same
refactor at the other listed regions (around lines 4469-4506, 4519-4557,
4650-4690) so the common stack-manipulation and FuncArgs construction is
implemented once and reused.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 0ecc623d-10ac-48b4-a663-24eae592ed81

📥 Commits

Reviewing files that changed from the base of the PR and between 3b91466 and 0fc9ffe.

📒 Files selected for processing (6)
  • crates/derive-impl/src/pyclass.rs
  • crates/vm/src/builtins/function.rs
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/vm/mod.rs
  • crates/vm/src/vm/thread.rs

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/vm/src/frame.rs (1)

4555-4601: ⚠️ Potential issue | 🟠 Major

Complete recursion/C-stack guard coverage across remaining direct specialized CALL paths.

CallMethodDescriptorFast, CallMethodDescriptorFastWithKeywords, and CallKwBoundMethod still miss the same guard/fallback pattern already added to sibling specialized paths. This leaves inconsistent safety behavior for deep recursive call chains.

Suggested patch
@@
             Instruction::CallMethodDescriptorFast => {
                 let nargs: u32 = arg.into();
@@
                 if total_nargs > 0
                     && let Some(descr) = callable.downcast_ref_if_exact::<PyMethodDescriptor>(vm)
@@
                 {
+                    if vm.reached_c_stack_limit() {
+                        return self.execute_call_vectorcall(nargs, vm);
+                    }
                     let func = descr.method.func;
@@
             Instruction::CallMethodDescriptorFastWithKeywords => {
@@
                 if total_nargs > 0
                     && let Some(descr) = callable.downcast_ref_if_exact::<PyMethodDescriptor>(vm)
@@
                 {
+                    if vm.reached_c_stack_limit() {
+                        return self.execute_call_vectorcall(nargs, vm);
+                    }
                     let func = descr.method.func;
@@
             Instruction::CallKwBoundMethod => {
@@
                 if self.specialization_eval_frame_active(vm) {
                     return self.execute_call_kw_vectorcall(nargs, vm);
                 }
+                if self.specialization_call_recursion_guard(vm) {
+                    return self.execute_call_kw_vectorcall(nargs, vm);
+                }

Also applies to: 4695-4742, 4865-4909

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 4555 - 4601, The optimized fast-paths
for Instruction::CallMethodDescriptorFast (and the sibling cases
CallMethodDescriptorFastWithKeywords and CallKwBoundMethod) bypass the
recursion/C-stack guard that other specialized CALL paths use; update these
cases to perform the same recursion/C-stack safety check and fallback to the
generic call path when the guard triggers (i.e., before executing the inlined
descriptor fast-path, run the same guard logic used by the other CALL
specializations and, on guard failure, call self.execute_call_vectorcall(...) or
the existing generic fallback), ensuring the fast-path retains the same
conditions (downcast to PyMethodDescriptor, flag checks, self type check) but
only proceeds after the guard passes so deep recursive chains fall back
consistently.
🧹 Nitpick comments (1)
crates/vm/src/builtins/function.rs (1)

643-663: Reduce predicate drift between specialization helpers.

is_simple_for_call_specialization and can_specialize_call duplicate most eligibility checks. It would be safer to compose one from the other.

💡 Suggested refactor
 pub(crate) fn can_specialize_call(&self, effective_nargs: u32) -> bool {
-    let code: &Py<PyCode> = &self.code;
-    let flags = code.flags;
-    flags.contains(bytecode::CodeFlags::OPTIMIZED)
-        && !flags.intersects(bytecode::CodeFlags::VARARGS | bytecode::CodeFlags::VARKEYWORDS)
-        && code.kwonlyarg_count == 0
-        && code.arg_count == effective_nargs
+    self.is_simple_for_call_specialization() && self.code.arg_count == effective_nargs
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/function.rs` around lines 643 - 663, Both helpers
duplicate the same eligibility checks; simplify by composing them: keep
is_simple_for_call_specialization as the single predicate for flag/kw-only
checks, and change can_specialize_call to call
is_simple_for_call_specialization(&self) and then only compare code.arg_count ==
effective_nargs to decide specialization. Update can_specialize_call to
reference self.code.arg_count after delegating the common checks to
is_simple_for_call_specialization and remove the duplicated flags/kwonly logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/builtins/type.rs`:
- Around line 790-818: Re-check tp_version_tag while holding the appropriate
cache lock before writing/returning cached __init__: in
cache_init_for_specialization, after obtaining ext.specialization_init.write()
(the write lock), re-load and compare
self.tp_version_tag.load(Ordering::Acquire) with the provided tp_version and
return false if they differ, only then store Some(init); similarly, in
get_cached_init_for_specialization acquire ext.specialization_init.read() (the
read lock) and re-check tp_version_tag under that lock before returning the
cached value (return None if the version changed). Ensure you still keep the
early None/false checks for tp_version == 0 and the initial
heaptype_ext.as_ref()?.
- Around line 272-273: PyType currently holds specialization_init:
PyRwLock<Option<PyRef<PyFunction>>> but its manual GC hooks don’t visit or clear
that strong edge; update PyType’s traverse implementation to acquire a read lock
on specialization_init and, if Some(func), call the visitor on the contained
PyRef<PyFunction>, and update PyType’s clear implementation to acquire a write
lock and set specialization_init to None so the reference is dropped during
attribute clearing. Ensure you use the existing traverse/clear patterns for
other Option<PyRef<...>> fields to match locking/error handling conventions.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: d9f04714-53af-4172-8555-919635db788f

📥 Commits

Reviewing files that changed from the base of the PR and between 0fc9ffe and 72e91ac.

📒 Files selected for processing (6)
  • crates/derive-impl/src/pyclass.rs
  • crates/vm/src/builtins/function.rs
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/vm/mod.rs
  • crates/vm/src/vm/thread.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/vm/src/vm/thread.rs
  • crates/derive-impl/src/pyclass.rs

@github-actions
Contributor

github-actions bot commented Mar 5, 2026

Code has been automatically formatted

The code in this PR has been formatted using:

  • cargo fmt --all

Please pull the latest changes before pushing again:

git pull origin specialization

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/vm/src/frame.rs (1)

4701-4745: ⚠️ Potential issue | 🟠 Major

CallKwBoundMethod is missing the recursion guard fallback.

This specialized path has the eval-frame guard, but unlike CallKwPy it does not apply specialization_call_recursion_guard before taking the non-vectorcall fastpath. Add the same early fallback to execute_call_kw_vectorcall.

Suggested patch
             Instruction::CallKwBoundMethod => {
                 let instr_idx = self.lasti() as usize - 1;
                 let cache_base = instr_idx + 1;
                 let cached_version = self.code.instructions.read_cache_u32(cache_base + 1);
                 let nargs: u32 = arg.into();
                 if self.specialization_eval_frame_active(vm) {
                     return self.execute_call_kw_vectorcall(nargs, vm);
                 }
+                if self.specialization_call_recursion_guard(vm) {
+                    return self.execute_call_kw_vectorcall(nargs, vm);
+                }
                 // Stack: [callable, self_or_null, arg1, ..., argN, kwarg_names]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 4701 - 4745, The CallKwBoundMethod
specialized path needs the same recursion-guard fallback as CallKwPy: before
taking the non-vectorcall/bound-method fastpath (i.e., before unpacking kwarg
names/all_args and calling vectorcall_function inside the CallKwBoundMethod
branch), call specialization_call_recursion_guard(self, vm) and if it indicates
recursion, return the generic fallback (the non-specialized
execute_call_kw_vectorcall path) instead of continuing the fastpath; update the
branch in execute_call_kw_vectorcall/CallKwBoundMethod to perform this early
guard check.
♻️ Duplicate comments (2)
crates/vm/src/builtins/type.rs (2)

225-263: ⚠️ Potential issue | 🟠 Major

Track and clear specialization_init in PyType manual GC hooks.

HeapTypeExt::specialization_init introduces a strong PyRef<PyFunction> edge, but PyType::traverse/clear still don’t visit or clear it. This can keep type/function cycles alive.

💡 Suggested fix
 unsafe impl crate::object::Traverse for PyType {
     fn traverse(&self, tracer_fn: &mut crate::object::TraverseFn<'_>) {
         self.base.traverse(tracer_fn);
         self.bases.traverse(tracer_fn);
         self.mro.traverse(tracer_fn);
         self.subclasses.traverse(tracer_fn);
         self.attributes
             .read_recursive()
             .iter()
             .map(|(_, v)| v.traverse(tracer_fn))
             .count();
+        if let Some(ext) = self.heaptype_ext.as_ref() {
+            ext.specialization_init.read().traverse(tracer_fn);
+        }
     }

     /// type_clear: break reference cycles in type objects
     fn clear(&mut self, out: &mut Vec<crate::PyObjectRef>) {
+        if let Some(ext) = self.heaptype_ext.as_ref()
+            && let Some(mut guard) = ext.specialization_init.try_write()
+            && let Some(init) = guard.take()
+        {
+            out.push(init.into());
+        }
         if let Some(base) = self.base.take() {
             out.push(base.into());
         }
         // ...
     }
 }
#!/bin/bash
set -euo pipefail
file="crates/vm/src/builtins/type.rs"

# Verify specialization_init exists and inspect manual GC hooks.
rg -n "specialization_init" "$file"
rg -n -A25 -B5 "unsafe impl crate::object::Traverse for PyType" "$file"
rg -n -A45 -B5 "fn clear\(&mut self, out: &mut Vec<crate::PyObjectRef>\)" "$file"

Also applies to: 267-273

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/type.rs` around lines 225 - 263, The manual GC hooks
for PyType are missing handling for HeapTypeExt::specialization_init which holds
a strong PyRef<PyFunction>; update PyType::traverse to call tracer_fn on the
specialization_init edge (e.g., read the specialization_init field and invoke
its traverse/tracer function) and update PyType::clear to take/clear
specialization_init (push the held PyObjectRef into out when present) so that
the strong reference is visited and broken during GC; locate the fields and
handlers via the HeapTypeExt::specialization_init symbol and the unsafe impl
crate::object::Traverse for PyType (methods traverse and clear) and add
symmetric visit/clear logic.

800-830: ⚠️ Potential issue | 🟠 Major

Re-check tp_version_tag while holding the cache lock.

Both cache accessors validate tp_version_tag before acquiring specialization_init lock. A concurrent modified() can race and allow stale __init__ to be written/read across versions.

💡 Suggested fix
 pub(crate) fn cache_init_for_specialization(
     &self,
     init: PyRef<PyFunction>,
     tp_version: u32,
 ) -> bool {
     let Some(ext) = self.heaptype_ext.as_ref() else {
         return false;
     };
-    if tp_version == 0 || self.tp_version_tag.load(Ordering::Acquire) != tp_version {
+    if tp_version == 0 {
         return false;
     }
-    *ext.specialization_init.write() = Some(init);
+    let mut guard = ext.specialization_init.write();
+    if self.tp_version_tag.load(Ordering::Acquire) != tp_version {
+        return false;
+    }
+    *guard = Some(init);
     true
 }

 pub(crate) fn get_cached_init_for_specialization(
     &self,
     tp_version: u32,
 ) -> Option<PyRef<PyFunction>> {
     let ext = self.heaptype_ext.as_ref()?;
-    if tp_version == 0 || self.tp_version_tag.load(Ordering::Acquire) != tp_version {
+    if tp_version == 0 {
         return None;
     }
-    ext.specialization_init
-        .read()
-        .as_ref()
-        .map(|init| init.to_owned())
+    let guard = ext.specialization_init.read();
+    if self.tp_version_tag.load(Ordering::Acquire) != tp_version {
+        return None;
+    }
+    guard.as_ref().cloned()
 }
#!/bin/bash
set -euo pipefail
file="crates/vm/src/builtins/type.rs"

# Inspect lock acquisition order vs tp_version_tag checks.
rg -n -A30 -B5 "fn cache_init_for_specialization" "$file"
rg -n -A30 -B5 "fn get_cached_init_for_specialization" "$file"
rg -n -A20 -B5 "pub fn modified\(&self\)" "$file"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/type.rs` around lines 800 - 830, The tp_version_tag
check must be re-validated while holding the specialization_init lock to prevent
a race with modified(); in cache_init_for_specialization() acquire
ext.specialization_init.write() first, then re-check tp_version_tag (and
tp_version != 0) before storing Some(init), returning false if the tag changed;
similarly in get_cached_init_for_specialization() acquire
ext.specialization_init.read() first, then re-check tp_version_tag (and
tp_version != 0) while holding the read lock and return None if it changed,
otherwise clone and return the cached init. Ensure you use the same ext from
heaptype_ext.as_ref() and the existing read()/write() guards to avoid changing
locking primitives.
🧹 Nitpick comments (2)
crates/vm/src/builtins/function.rs (1)

643-651: Deduplicate specialization predicates to reduce drift risk.

can_specialize_call and is_simple_for_call_specialization share the same shape checks. Reusing the helper keeps future updates consistent.

♻️ Proposed refactor
 pub(crate) fn can_specialize_call(&self, effective_nargs: u32) -> bool {
-    let code: &Py<PyCode> = &self.code;
-    let flags = code.flags;
-    flags.contains(bytecode::CodeFlags::OPTIMIZED)
-        && !flags.intersects(bytecode::CodeFlags::VARARGS | bytecode::CodeFlags::VARKEYWORDS)
-        && code.kwonlyarg_count == 0
-        && code.arg_count == effective_nargs
+    self.is_simple_for_call_specialization() && self.code.arg_count == effective_nargs
 }

Also applies to: 656-663

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/function.rs` around lines 643 - 651, Both
can_specialize_call and is_simple_for_call_specialization duplicate the same
"simple function" shape checks; extract the shared predicate into a single
helper (e.g., has_simple_call_shape or similar) and call it from both functions.
Update is_simple_for_call_specialization to delegate to the helper (using the
existing self.code and its flags/kwonlyarg_count logic) and change
can_specialize_call to use the same helper so future flag/arg-count changes are
made in one place; reference the existing symbols can_specialize_call and
is_simple_for_call_specialization when locating where to replace the duplicated
logic.
crates/vm/src/builtins/object.rs (1)

108-121: Add a defensive assertion for _nitems in generic_alloc.

Since _nitems is currently ignored, a debug assertion helps catch accidental variable-size allocation usage early.

🛡️ Suggested guard rail
 pub(crate) fn generic_alloc(cls: PyTypeRef, _nitems: usize, vm: &VirtualMachine) -> PyResult {
+    debug_assert_eq!(
+        _nitems, 0,
+        "generic_alloc currently supports fixed-size allocations only"
+    );
     // Only create dict if the class has HAS_DICT flag (i.e., __slots__ was not defined
     // or __dict__ is in __slots__)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/object.rs` around lines 108 - 121, Add a defensive
debug assertion in the generic_alloc function to ensure the ignored _nitems
argument is zero so callers don't accidentally request variable-size allocation;
inside generic_alloc (which takes cls: PyTypeRef, _nitems: usize, vm:
&VirtualMachine) add a debug_assert_eq!(_nitems, 0, "...") (or
debug_assert!(_nitems == 0, "...")) with a brief message indicating
variable-size allocation is not supported for PyBaseObject.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@crates/vm/src/frame.rs`:
- Around line 4701-4745: The CallKwBoundMethod specialized path needs the same
recursion-guard fallback as CallKwPy: before taking the
non-vectorcall/bound-method fastpath (i.e., before unpacking kwarg
names/all_args and calling vectorcall_function inside the CallKwBoundMethod
branch), call specialization_call_recursion_guard(self, vm) and if it indicates
recursion, return the generic fallback (the non-specialized
execute_call_kw_vectorcall path) instead of continuing the fastpath; update the
branch in execute_call_kw_vectorcall/CallKwBoundMethod to perform this early
guard check.

---

Duplicate comments:
In `@crates/vm/src/builtins/type.rs`:
- Around line 225-263: The manual GC hooks for PyType are missing handling for
HeapTypeExt::specialization_init which holds a strong PyRef<PyFunction>; update
PyType::traverse to call tracer_fn on the specialization_init edge (e.g., read
the specialization_init field and invoke its traverse/tracer function) and
update PyType::clear to take/clear specialization_init (push the held
PyObjectRef into out when present) so that the strong reference is visited and
broken during GC; locate the fields and handlers via the
HeapTypeExt::specialization_init symbol and the unsafe impl
crate::object::Traverse for PyType (methods traverse and clear) and add
symmetric visit/clear logic.
- Around line 800-830: The tp_version_tag check must be re-validated while
holding the specialization_init lock to prevent a race with modified(); in
cache_init_for_specialization() acquire ext.specialization_init.write() first,
then re-check tp_version_tag (and tp_version != 0) before storing Some(init),
returning false if the tag changed; similarly in
get_cached_init_for_specialization() acquire ext.specialization_init.read()
first, then re-check tp_version_tag (and tp_version != 0) while holding the read
lock and return None if it changed, otherwise clone and return the cached init.
Ensure you use the same ext from heaptype_ext.as_ref() and the existing
read()/write() guards to avoid changing locking primitives.
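The locking discipline the prompt describes (re-validate the version tag after taking the cache lock, so a concurrent `modified()` cannot republish or read a stale entry) can be sketched standalone. The types below are stand-ins, not RustPython's real `PyType`/`PyRef<PyFunction>`:

```rust
// Double-checked version validation: the version tag is re-read *while
// holding the cache lock*, both when publishing and when reading.
use std::sync::RwLock;
use std::sync::atomic::{AtomicU32, Ordering};

struct SpecializationCache {
    version_tag: AtomicU32,
    cached: RwLock<Option<String>>, // stands in for PyRef<PyFunction>
}

impl SpecializationCache {
    // Invalidation path: bump the version, then drop the cached entry.
    fn modified(&self) {
        self.version_tag.fetch_add(1, Ordering::AcqRel);
        *self.cached.write().unwrap() = None;
    }

    fn cache(&self, value: String, seen_version: u32) -> bool {
        let mut guard = self.cached.write().unwrap();
        // Re-check under the lock: refuse to publish if the type changed
        // since the caller sampled the tag.
        if seen_version == 0 || self.version_tag.load(Ordering::Acquire) != seen_version {
            return false;
        }
        *guard = Some(value);
        true
    }

    fn get(&self, seen_version: u32) -> Option<String> {
        let guard = self.cached.read().unwrap();
        if self.version_tag.load(Ordering::Acquire) != seen_version {
            return None;
        }
        guard.as_ref().cloned()
    }
}

fn main() {
    let cache = SpecializationCache {
        version_tag: AtomicU32::new(7),
        cached: RwLock::new(None),
    };
    assert!(cache.cache("init".into(), 7));
    assert_eq!(cache.get(7).as_deref(), Some("init"));
    cache.modified();
    assert_eq!(cache.get(7), None); // stale version is rejected
    assert!(!cache.cache("init".into(), 7)); // republish refused
    println!("ok");
}
```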


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: ace14bfe-8231-4e06-8403-f4017c4f9cb7

📥 Commits

Reviewing files that changed from the base of the PR and between 72e91ac and 2b87143.

📒 Files selected for processing (6)
  • crates/vm/src/builtins/function.rs
  • crates/vm/src/builtins/object.rs
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/types/slot.rs
  • crates/vm/src/vm/mod.rs

Comment on lines +4490 to +4527
if !self.specialization_eval_frame_active(vm)
&& !self_or_null_is_some
&& cached_version != 0
&& let Some(cls) = callable.downcast_ref::<PyType>()
&& cls.tp_version_tag.load(Acquire) == cached_version
&& let Some(init_func) = cls.get_cached_init_for_specialization(cached_version)
&& let (Some(cls_alloc), Some(object_alloc_fn)) =
(cls.slots.alloc.load(), object_alloc)
&& cls_alloc as usize == object_alloc_fn as usize
{
// Look up __init__ (guarded by type_version)
if let Some(init) = cls.get_attr(identifier!(vm, __init__))
&& let Some(init_func) = init.downcast_ref_if_exact::<PyFunction>(vm)
&& init_func.can_specialize_call(nargs + 1)
{
// Allocate object directly (tp_new == object.__new__)
let dict = if cls
.slots
.flags
.has_feature(crate::types::PyTypeFlags::HAS_DICT)
{
Some(vm.ctx.new_dict())
} else {
None
};
let cls_ref = cls.to_owned();
let new_obj: PyObjectRef =
PyRef::new_ref(PyBaseObject, cls_ref, dict).into();

// Build args: [new_obj, arg1, ..., argN]
let pos_args: Vec<PyObjectRef> =
self.pop_multiple(nargs as usize).collect();
let _null = self.pop_value_opt(); // self_or_null (None)
let _callable = self.pop_value(); // callable (type)
if !self.specialization_has_datastack_space_for_func(vm, &init_func) {
return self.execute_call_vectorcall(nargs, vm);
}
// Allocate object directly (tp_new == object.__new__, tp_alloc == generic).
let cls_ref = cls.to_owned();
let new_obj = cls_alloc(cls_ref, 0, vm)?;

let mut all_args = Vec::with_capacity(pos_args.len() + 1);
all_args.push(new_obj.clone());
all_args.extend(pos_args);
// Build args: [new_obj, arg1, ..., argN]
let pos_args: Vec<PyObjectRef> = self.pop_multiple(nargs as usize).collect();
let _null = self.pop_value_opt(); // self_or_null (None)
let _callable = self.pop_value(); // callable (type)

let init_result = init_func.invoke_exact_args(all_args, vm)?;
let mut all_args = Vec::with_capacity(pos_args.len() + 1);
all_args.push(new_obj.clone());
all_args.extend(pos_args);

// EXIT_INIT_CHECK: __init__ must return None
if !vm.is_none(&init_result) {
return Err(
vm.new_type_error("__init__() should return None".to_owned())
);
}
let init_result = init_func.invoke_exact_args(all_args, vm)?;

self.push_value(new_obj);
return Ok(None);
// EXIT_INIT_CHECK: __init__ must return None
if !vm.is_none(&init_result) {
return Err(vm.new_type_error(format!(
"__init__() should return None, not '{}'",
init_result.class().name()
)));
}

self.push_value(new_obj);
return Ok(None);
⚠️ Potential issue | 🟠 Major

CallAllocAndEnterInit should apply recursion guard before fast __init__ invoke.

This path directly calls init_func.invoke_exact_args(...) but does not perform the recursion guard fallback used in other non-vectorcall specialized CALL paths. Add the same guard before invoking cached __init__.

Suggested patch
                 let object_alloc = vm.ctx.types.object_type.slots.alloc.load();
                 if !self.specialization_eval_frame_active(vm)
                     && !self_or_null_is_some
+                    && !self.specialization_call_recursion_guard(vm)
                     && cached_version != 0
                     && let Some(cls) = callable.downcast_ref::<PyType>()
                     && cls.tp_version_tag.load(Acquire) == cached_version
                     && let Some(init_func) = cls.get_cached_init_for_specialization(cached_version)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 4490 - 4527, In CallAllocAndEnterInit's
fast-path (the block that checks specialization_eval_frame_active,
cached_version, cls.get_cached_init_for_specialization and then calls
init_func.invoke_exact_args), add the same recursion-guard check used by other
non-vectorcall specialization CALL paths before invoking init_func: attempt to
acquire the specialization/recursion guard (the same mechanism those paths use)
and if acquiring the guard fails, fall back to calling
self.execute_call_vectorcall(nargs, vm) instead of invoking init_func directly;
ensure the guard is released appropriately around the
init_func.invoke_exact_args call so behavior matches other specialized CALLs
(reference init_func.invoke_exact_args,
self.specialization_has_datastack_space_for_func, and
self.execute_call_vectorcall to locate the code).

Comment on lines +8270 to +8272
fn specialization_call_recursion_guard(&self, vm: &VirtualMachine) -> bool {
vm.current_recursion_depth().saturating_add(1) >= vm.recursion_limit.get()
}
⚠️ Potential issue | 🟠 Major

specialization_call_recursion_guard is depth-only, not a C-stack guard.

At Line 8271 this helper only checks logical recursion depth. That does not provide the same protection as the VM’s full recursion/C-stack check, so non-vectorcall fastpaths can still run when C-stack headroom is low.

Please switch this helper to the same guard primitive used by normal call recursion checks (or conservatively force vectorcall via that primitive).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 8270 - 8272,
specialization_call_recursion_guard currently only compares
current_recursion_depth() with recursion_limit and thus misses the VM's
C-stack-aware recursion/C-stack headroom check; replace that depth-only test in
fn specialization_call_recursion_guard(&self, vm: &VirtualMachine) with the same
recursion guard primitive used by normal call paths (i.e., call the VM method
that performs the full C-stack-aware recursion check/guard used for regular
calls, rather than using current_recursion_depth().saturating_add(1) >=
recursion_limit), or if that primitive exposes a conservative API to force
vectorcall, invoke that to ensure the vectorcall fastpath is disabled when
C-stack headroom is low; keep references to VirtualMachine methods
(current_recursion_depth, recursion_limit) only for context and remove the
depth-only comparison.
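The depth-plus-headroom idea can be sketched in isolation. Everything below is illustrative rather than RustPython's actual guard: the thread-local base pointer, the assumed 8 MiB thread stack, and the 512 KiB headroom constant are all hypothetical:

```rust
// A guard that combines logical recursion depth with an approximate C-stack
// headroom probe, so non-vectorcall fastpaths are skipped when either budget
// is nearly exhausted.
use std::cell::Cell;

thread_local! {
    static STACK_BASE: Cell<usize> = const { Cell::new(0) };
}

const ASSUMED_STACK_SIZE: usize = 8 * 1024 * 1024; // assumption, not measured
const STACK_HEADROOM: usize = 512 * 1024; // bytes we insist on keeping free
const RECURSION_LIMIT: usize = 1000;

fn record_stack_base() {
    let probe = 0u8;
    STACK_BASE.with(|b| b.set(&probe as *const u8 as usize));
}

/// Returns true when the fastpath must fall back to the generic call path.
fn call_recursion_guard(depth: usize) -> bool {
    if depth.saturating_add(1) >= RECURSION_LIMIT {
        return true;
    }
    // Approximate bytes of C stack consumed since the base was recorded.
    let probe = 0u8;
    let here = &probe as *const u8 as usize;
    let base = STACK_BASE.with(|b| b.get());
    base != 0 && base.abs_diff(here) + STACK_HEADROOM >= ASSUMED_STACK_SIZE
}

fn main() {
    record_stack_base();
    assert!(call_recursion_guard(RECURSION_LIMIT)); // depth limit trips
    assert!(!call_recursion_guard(0)); // shallow call with headroom passes
    println!("ok");
}
```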

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/builtins/type.rs`:
- Around line 855-874: cache_getitem_for_specialization and
get_cached_getitem_for_specialization currently check tp_version_tag before
taking the lock, leaving a race with modified(); mirror the init-cache locking
pattern by re-checking tp_version_tag after acquiring the specialization_getitem
write/read lock (i.e., take ext.specialization_getitem.write()/read() first,
then verify self.tp_version_tag.load(Ordering::Acquire) == tp_version and
func_version != 0, and only then store the getitem / return the cached value and
set specialization_getitem_version with Ordering::Release); reference the
functions cache_getitem_for_specialization,
get_cached_getitem_for_specialization, tp_version_tag, heaptype_ext,
specialization_getitem, and specialization_getitem_version when applying the
change.
- Around line 236-238: The code only traverses/clears
HeapTypeExt.specialization_init but misses the strong PyRef edge
HeapTypeExt.specialization_getitem, so update the GC traversal/clear sites to
include specialization_getitem alongside specialization_init; specifically,
where you call ext.specialization_init.read().traverse(tracer_fn) also call
ext.specialization_getitem.read().traverse(tracer_fn), and where you clear
specialization_init (e.g., ext.specialization_init.write().clear(...) or
similar) also clear ext.specialization_getitem; apply the same change at the
other two related spots referenced (around the other traverse/clear blocks at
the indicated ranges).

ℹ️ Review info

Run ID: 426e4d46-2da2-4c9b-929d-136900f2d6ac

📥 Commits

Reviewing files that changed from the base of the PR and between 2b87143 and f352115.

📒 Files selected for processing (1)
  • crates/vm/src/builtins/type.rs

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/vm/src/builtins/function.rs (1)

536-565: ⚠️ Potential issue | 🟠 Major

Don’t drop explicit locals on the JIT fast path.

invoke_with_locals() now accepts a caller-supplied locals mapping, but the early #[cfg(feature = "jit")] return still ignores it. A jitted function invoked with explicit locals will run in the wrong namespace.

Proposed fix
     #[cfg(feature = "jit")]
-    if let Some(jitted_code) = self.jitted_code.lock().as_ref() {
-        use crate::convert::ToPyObject;
-        match jit::get_jit_args(self, &func_args, jitted_code, vm) {
-            Ok(args) => {
-                return Ok(args.invoke().to_pyobject(vm));
-            }
-            Err(err) => info!(
-                "jit: function `{}` is falling back to being interpreted because of the \
-                error: {}",
-                self.code.obj_name, err
-            ),
+    if locals.is_none() {
+        if let Some(jitted_code) = self.jitted_code.lock().as_ref() {
+            use crate::convert::ToPyObject;
+            match jit::get_jit_args(self, &func_args, jitted_code, vm) {
+                Ok(args) => {
+                    return Ok(args.invoke().to_pyobject(vm));
+                }
+                Err(err) => info!(
+                    "jit: function `{}` is falling back to being interpreted because of the \
+                    error: {}",
+                    self.code.obj_name, err
+                ),
+            }
         }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/function.rs` around lines 536 - 565, The JIT fast-path
in invoke_with_locals currently ignores the caller-supplied locals; compute the
effective locals the same way as the interpreted path before the #[cfg(feature =
"jit")] block (i.e. if code.flags contains CodeFlags::NEWLOCALS then
effective_locals = None else use the provided locals or
Some(ArgMapping::from_dict_exact(self.globals.clone()))), and pass that resolved
locals into the JIT call (jit::get_jit_args / jitted invocation) instead of
returning early without them so jitted_code, jit::get_jit_args, func_args and
the locals parameter are honored.
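The locals-resolution rule described above can be sketched standalone. `Namespace` is a stand-in for RustPython's `ArgMapping`, and the `newlocals` flag stands in for `CodeFlags::NEWLOCALS`:

```rust
// NEWLOCALS frames build their own fast-locals namespace (None here);
// otherwise a caller-supplied locals mapping wins, falling back to globals.
#[derive(Clone, Debug, PartialEq)]
enum Namespace {
    Globals,
    Explicit,
}

fn effective_locals(newlocals: bool, supplied: Option<Namespace>) -> Option<Namespace> {
    if newlocals {
        None // the frame allocates a fresh namespace itself
    } else {
        supplied.or(Some(Namespace::Globals))
    }
}

fn main() {
    assert_eq!(effective_locals(true, Some(Namespace::Explicit)), None);
    assert_eq!(
        effective_locals(false, Some(Namespace::Explicit)),
        Some(Namespace::Explicit)
    );
    assert_eq!(effective_locals(false, None), Some(Namespace::Globals));
    println!("ok");
}
```

The point of the fix is that the JIT fast path must consult the same resolved value instead of returning before it is computed.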
♻️ Duplicate comments (2)
crates/vm/src/builtins/type.rs (2)

236-238: ⚠️ Potential issue | 🟠 Major

specialization_getitem also needs manual traverse/clear.

HeapTypeExt now owns another strong PyRef<PyFunction>, but only specialization_init participates in PyType’s manual GC hooks. That can keep class/function cycles alive after type_clear().

Proposed fix
     fn traverse(&self, tracer_fn: &mut crate::object::TraverseFn<'_>) {
         self.base.traverse(tracer_fn);
         self.bases.traverse(tracer_fn);
         self.mro.traverse(tracer_fn);
         self.subclasses.traverse(tracer_fn);
         self.attributes
             .read_recursive()
             .iter()
             .map(|(_, v)| v.traverse(tracer_fn))
             .count();
         if let Some(ext) = self.heaptype_ext.as_ref() {
             ext.specialization_init.read().traverse(tracer_fn);
+            ext.specialization_getitem.read().traverse(tracer_fn);
         }
     }
@@
         if let Some(ext) = self.heaptype_ext.as_ref()
             && let Some(mut guard) = ext.specialization_init.try_write()
             && let Some(init) = guard.take()
         {
             out.push(init.into());
         }
+        if let Some(ext) = self.heaptype_ext.as_ref()
+            && let Some(mut guard) = ext.specialization_getitem.try_write()
+            && let Some(getitem) = guard.take()
+        {
+            out.push(getitem.into());
+            ext.specialization_getitem_version.store(0, Ordering::Release);
+        }
     }

Also applies to: 266-283

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/type.rs` around lines 236 - 238, HeapTypeExt also
holds a strong PyRef for specialization_getitem which isn’t included in the
manual GC hooks; update the manual traversal and clearing code in the PyType
hooks to also visit and drop this field. In the traversal path (where
specialization_init is traversed) add a call to traverse the
specialization_getitem PyRef (e.g.,
ext.specialization_getitem.read().traverse(tracer_fn)); in the clear path (where
specialization_init is cleared) also clear/take the specialization_getitem PyRef
(e.g., ext.specialization_getitem.write().take() or set to None) so both
specialization_init and specialization_getitem are handled; update both the
block around the existing traverse (the one calling
ext.specialization_init.read().traverse(...)) and the corresponding clear block
mirrored in the other function region (the section covering lines ~266-283).
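The symmetric visit/clear pattern the prompt asks for can be illustrated with stand-in types (plain `u32` payloads instead of `PyRef<PyFunction>`; none of this is RustPython's real GC API):

```rust
// Every cache field holding a strong reference must be visited during
// traversal and take()n during clear, so the cycle collector can break it.
use std::sync::RwLock;

struct Ext {
    specialization_init: RwLock<Option<u32>>,    // already handled in the PR
    specialization_getitem: RwLock<Option<u32>>, // the field the review says is missed
}

fn traverse(ext: &Ext, visit: &mut dyn FnMut(u32)) {
    if let Some(v) = *ext.specialization_init.read().unwrap() {
        visit(v);
    }
    if let Some(v) = *ext.specialization_getitem.read().unwrap() {
        visit(v);
    }
}

fn clear(ext: &Ext, out: &mut Vec<u32>) {
    if let Some(v) = ext.specialization_init.write().unwrap().take() {
        out.push(v);
    }
    if let Some(v) = ext.specialization_getitem.write().unwrap().take() {
        out.push(v);
    }
}

fn main() {
    let ext = Ext {
        specialization_init: RwLock::new(Some(1)),
        specialization_getitem: RwLock::new(Some(2)),
    };
    let mut seen = Vec::new();
    traverse(&ext, &mut |v| seen.push(v));
    assert_eq!(seen, vec![1, 2]); // both strong edges are visited
    let mut out = Vec::new();
    clear(&ext, &mut out);
    assert_eq!(out, vec![1, 2]); // both references are dropped into `out`
    assert!(ext.specialization_getitem.read().unwrap().is_none());
    println!("ok");
}
```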

855-890: ⚠️ Potential issue | 🟠 Major

Guard the __getitem__ cache with the type version under the lock.

cache_getitem_for_specialization() still validates tp_version_tag before taking the write lock, and get_cached_getitem_for_specialization() no longer receives the specialization-time type version at all. A concurrent modified() can republish or read a stale cached __getitem__ after the type is re-versioned.

Proposed fix
     pub(crate) fn cache_getitem_for_specialization(
         &self,
         getitem: PyRef<PyFunction>,
         tp_version: u32,
     ) -> bool {
         let Some(ext) = self.heaptype_ext.as_ref() else {
             return false;
         };
-        if tp_version == 0 || self.tp_version_tag.load(Ordering::Acquire) != tp_version {
+        if tp_version == 0 {
             return false;
         }
         let func_version = getitem.get_version_for_current_state();
         if func_version == 0 {
             return false;
         }
-        *ext.specialization_getitem.write() = Some(getitem);
-        ext.specialization_getitem_version
-            .store(func_version, Ordering::Release);
+        let mut guard = ext.specialization_getitem.write();
+        if self.tp_version_tag.load(Ordering::Acquire) != tp_version {
+            return false;
+        }
+        *guard = Some(getitem);
+        ext.specialization_getitem_version.store(func_version, Ordering::Release);
         true
     }
@@
-    pub(crate) fn get_cached_getitem_for_specialization(&self) -> Option<(PyRef<PyFunction>, u32)> {
+    pub(crate) fn get_cached_getitem_for_specialization(
+        &self,
+        tp_version: u32,
+    ) -> Option<(PyRef<PyFunction>, u32)> {
         let ext = self.heaptype_ext.as_ref()?;
-        if self.tp_version_tag.load(Ordering::Acquire) == 0 {
+        if tp_version == 0 {
             return None;
         }
+        let guard = ext.specialization_getitem.read();
+        if self.tp_version_tag.load(Ordering::Acquire) != tp_version {
+            return None;
+        }
         let cached_version = ext.specialization_getitem_version.load(Ordering::Acquire);
         if cached_version == 0 {
             return None;
         }
-        ext.specialization_getitem
-            .read()
-            .as_ref()
-            .map(|getitem| (getitem.to_owned(), cached_version))
+        guard
+            .as_ref()
+            .map(|getitem| (getitem.to_owned(), cached_version))
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/type.rs` around lines 855 - 890,
cache_getitem_for_specialization and get_cached_getitem_for_specialization must
validate the type version while holding the specialization_getitem lock to avoid
races with modified(); change cache_getitem_for_specialization to acquire
ext.specialization_getitem.write() first, then re-check
self.tp_version_tag.load(...) == tp_version before storing the getitem and
version; change get_cached_getitem_for_specialization to acquire
ext.specialization_getitem.read(), then check self.tp_version_tag (or accept a
tp_version param and compare) while holding that read lock and return None if
the tag/version changed or is zero, only returning the cached (getitem,
cached_version) when both the lock-held check and cached_version are valid (use
specialization_getitem and specialization_getitem_version fields as now).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/builtins/object.rs`:
- Around line 100-120: The current implementation of object.__new__ always uses
generic_alloc and ignores the type's allocator slot; update slot_new (the path
that currently calls generic_alloc) to dispatch through cls.slots.alloc when
present (i.e., call the type's alloc slot with the same parameters that tp_alloc
would get) and fall back to generic_alloc only if no alloc slot is installed,
ensuring cls.slots.alloc is used consistently for types that override allocation
rather than bypassing tp_alloc.


ℹ️ Review info

Run ID: d3962f85-b798-4349-bb16-790f6e80b78d

📥 Commits

Reviewing files that changed from the base of the PR and between f352115 and a91dd29.

📒 Files selected for processing (8)
  • crates/derive-impl/src/pyclass.rs
  • crates/vm/src/builtins/function.rs
  • crates/vm/src/builtins/object.rs
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/types/slot.rs
  • crates/vm/src/vm/mod.rs
  • crates/vm/src/vm/thread.rs
🚧 Files skipped from review as they are similar to previous changes (3)
  • crates/derive-impl/src/pyclass.rs
  • crates/vm/src/types/slot.rs
  • crates/vm/src/vm/thread.rs

@youknowone youknowone merged commit 46abff6 into RustPython:main Mar 8, 2026
11 of 13 checks passed
@youknowone youknowone deleted the specialization branch March 8, 2026 12:15