
Refine specialization caches and extend binary-op coverage#7386

Merged
youknowone merged 14 commits into RustPython:main from youknowone:specialization
Mar 10, 2026

Conversation

@youknowone
Member

@youknowone youknowone commented Mar 8, 2026

Summary

  • Align BINARY_OP_EXTEND with the descriptor-cache model: table-driven specialization for int|int bitwise ops and for float|int / int|float mixed arithmetic
  • Add _spec_cache (init/getitem) to HeapTypeExt with proper GC traverse/clear and lock-guarded version validation
  • Tighten CALL_ALLOC_AND_ENTER_INIT stack-space and recursion guards
  • Add latin-1 singleton string interning path for single-char subscript results

Test plan

  • cargo clippy clean
  • extra_tests/snippets/vm_specialization.py passes
  • CI: macOS, Ubuntu, Windows (existing flaky failures only)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Clarified error message for invalid init return values.
  • Chores

    • Performance improvements for mixed-type arithmetic and binary operations via a new extended fast-path.
    • Faster handling and caching for single-character strings.
    • Centralized and more robust type-specialization caching; improved frame dispatch and init/cleanup handling.
    • Vectorcall and exact-args call path tightened for safer fast-path use.
    • Tighter tracing: trace callbacks now run only for actual trace events.

@coderabbitai
Contributor

coderabbitai bot commented Mar 8, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Centralizes per-type specialization into a TypeSpecializationCache, adds BinaryOpExtend external specialization and caching, introduces init‑cleanup code and Latin‑1 single‑char caching in Context (used by str ToPyObject), adds ordering-aware atomic accessors, and refactors frame/with_frame and exact-args frame preparation.

Changes

  • Type Specialization Cache Refactor (crates/vm/src/builtins/type.rs): Replaces per-field specialization_init/specialization_getitem with a unified TypeSpecializationCache; adds lifecycle, write_lock, versioning, and traversal/clear integration; updates PyType accessors to use the cache.
  • BinaryOpExtend specialization & frame helpers (crates/vm/src/frame.rs): Adds BinaryOpExtend specialization descriptors, guards/actions, external cache read/write helpers, a binary_op_extended_specialization dispatch path, recursion-guard/init-cleanup shim logic, and extracted helpers for compact int/float handling and datastack frame sizing.
  • Frame dispatch & exact-args frame prep (crates/vm/src/vm/mod.rs, crates/vm/src/builtins/function.rs): Introduces with_frame_impl and with_frame_untraced to centralize frame linkage; adds prepare_exact_args_frame/invoke_exact_args and funnels the vectorcall exact-args fast path through the new frame prep and datastack sizing helper.
  • Context init-cleanup & Latin-1 single-char cache (crates/vm/src/vm/context.rs, crates/vm/src/builtins/str.rs): Adds init_cleanup_code in Context genesis; adds latin1_singleton_index and latin1_char cached single-byte PyStrs; Context::new_str now returns cached Latin-1 single-char PyStrs, and str ToPyObject paths use direct Latin-1 construction for single bytes.
  • Atomic ref ordering APIs (crates/vm/src/object/ext.rs): Adds ordering-aware accessors deref_ordering() / to_owned_ordering() for PyAtomicRef<Option<T>> and PyAtomicRef<Option<PyObject>>; existing deref()/to_owned() delegate to the relaxed-ordering variants.
  • Minor text/defaults & spell list (crates/vm/src/types/slot.rs, crates/vm/src/stdlib/builtins.rs, .cspell.json): Tweaks __init__() error wording, standardizes print defaults to use vm.ctx.new_str(), and adds descrs to the spelling list.
  • Formatting/core tweaks (crates/vm/src/object/core.rs): Non-semantic reflow/formatting adjustments only.

Sequence Diagram(s)

sequenceDiagram
    participant PyType
    participant Cache as TypeSpecializationCache
    participant Mutex as write_lock (Mutex)
    participant GC

    PyType->>Cache: invalidate_for_type_modified()
    Cache->>Mutex: lock()
    Cache->>Cache: swap_init(None) / swap_getitem(None)
    Cache->>Cache: retire_old_function(old_fn)
    Mutex->>Cache: unlock()
    GC->>Cache: traverse()
    Cache-->>GC: visit init, getitem, retired funcs
sequenceDiagram
    participant FrameExec as FrameExecution
    participant ExtCache as ExternalCache
    participant Descr as BinaryOpExtendDescr
    participant VM

    FrameExec->>ExtCache: read_cached_binary_op_extend_descr()
    alt cache hit
        ExtCache-->>FrameExec: descr
    else cache miss
        FrameExec->>FrameExec: inspect operand types (compact int/float)
        FrameExec->>Descr: select descriptor
        FrameExec->>ExtCache: write_cached_binary_op_extend_descr(descr)
        Descr-->>FrameExec: descriptor
    end
    FrameExec->>Descr: execute specialized action
    Descr-->>FrameExec: result
    FrameExec->>VM: continue execution

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested reviewers

  • ShaharNaveh
  • coolreader18

Poem

🐇
I nibble cached bytes in a quiet, clever tune,
one Latin‑1 hop beneath a silver moon.
Descriptors snug in their cozy little nest,
frames leap faster — I rest, I jest.
Hooray for hops that make execution swoon!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The PR title directly and accurately summarizes the main objectives: refining specialization caches (via the new TypeSpecializationCache and ordering-aware accessors) and extending binary-op coverage (via BinaryOpExtend specialization descriptors and table-driven int/float operations).
  • Docstring Coverage: ✅ Passed. Docstring coverage is 91.86%, which meets the required threshold of 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (2)
crates/vm/src/vm/context.rs (1)

476-482: The singleton fast path still materializes a temporary PyStr for borrowed inputs.

For call sites like new_str(" "), the s.into() happens before the Latin-1 check, so the cache hit only skips the final object allocation. If you want this path to be allocation-free for single-character borrowed inputs, it probably needs a borrowed-input fast path or a dedicated new_char/latin1_char entry point at the API boundary.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/vm/context.rs` around lines 476 - 482, new_str currently calls
s.into() before checking latin1_singleton_index, which materializes a temporary
PyStr for borrowed inputs and defeats the allocation-free singleton fast path;
change the implementation to avoid calling s.into() until after the Latin‑1
single-char check by either adding an overload/new API that accepts a borrowed
&str (or &impl AsRef<str>) and runs Self::latin1_singleton_index(&s) first, or
add a dedicated API like new_char/latin1_char used at the API boundary for
single-character inputs and have new_str route to it only after converting
non-borrowed inputs; ensure latin1_singleton_index is invoked on the borrowed
string and that you only call s.into() when the singleton fast path fails so no
temporary PyStr is allocated for borrowed single-char inputs.
crates/vm/src/frame.rs (1)

4089-4098: Extend this specialized char fast path beyond ASCII if the new cache is Latin-1-wide.

BinaryOpSubscrStrInt only hits ascii_char_cache, so one-character subscripts in '\x80'..='\xff' still bounce to the generic path even after quickening. If the new singleton cache is meant to cover all Latin-1 characters, wiring it in here would avoid that steady-state miss.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 4089 - 4098, The specialized fast path
in BinaryOpSubscrStrInt currently only uses vm.ctx.ascii_char_cache for one-char
string subscripts, causing Latin-1 (0x80..0xFF) characters to miss the cache;
update the path after obtaining ch (and ascii_idx) to branch: if ch.is_ascii()
continue using vm.ctx.ascii_char_cache[ascii_idx], else index into the new
Latin-1 singleton cache (e.g., vm.ctx.latin1_char_cache[ascii_idx] or
vm.ctx.single_char_cache[ascii_idx], whichever your context defines) and push
that clone.into(); keep the same guards
(specialization_nonnegative_compact_index, getitem_by_index) and ensure
ascii_idx is computed from ch.to_u32() and bounds-checked for 0..256 before
indexing the Latin-1 cache.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/builtins/type.rs`:
- Around line 275-279: TypeSpecializationCache currently versions only getitem
via getitem_version while init is unversioned, allowing modified() to reset
tp_version_tag and later returns a stale pre-mutation init; fix by adding an
init_version (e.g., AtomicU32) to TypeSpecializationCache and update/clear it
wherever getitem_version is updated/cleared (including modified()), and ensure
any code that reads init (the places that currently check tp_version_tag and
return init) also checks init_version matches the current tp_version_tag before
returning — alternately, if you prefer clearing, explicitly clear init in
modified() wherever getitem is invalidated so init cannot be returned for a new
type version.
- Line 281: The field retired: PyRwLock<Vec<PyObjectRef>> currently holds strong
references to replaced cached functions with no reclamation when vm is None,
causing unbounded growth for live heap types; update invalidation to avoid
storing permanent strong refs by using temporary weak/VM-tied refs or by
enqueueing replacements into a VM-aware reclamation pass instead of pushing into
retired directly, and implement a grace-period reclamation sweep that clears
entries for live types (invoke clear_into() or similar) when a VirtualMachine is
available; changes should touch the retired field usage sites and the logic that
pushes into retired (the places around functions/methods handling
caching/invalidation) to convert to weak refs or VM-driven deferred drop and add
a periodic drain that reclaims entries for active heap types.

In `@crates/vm/src/frame.rs`:
- Around line 7755-7759: The cached __getitem__ specialisation is being read
without validating the type version tag, allowing stale cache use after
rebinding; update the read path (the BinaryOpSubscrGetitem handler /
get_cached_getitem_for_specialization) to mirror
get_cached_init_for_specialization: load cls.tp_version_tag with Acquire and
compare it to the cached_version before using the cached entry, and only use the
cache when cls.tp_version_tag.load(Acquire) == cached_version; otherwise treat
as cache miss. Ensure you use the same Acquire ordering and the same comparison
pattern used by get_cached_init_for_specialization and keep cache_write via
cache_getitem_for_specialization unchanged.
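The version-guarded read these comments describe can be illustrated with a self-contained stand-in; SpecSlot, the AtomicU32 entry, and the version values below are all illustrative, not RustPython types:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Simplified stand-in for a per-type specialization slot: the cached entry
/// is only valid while `cached_version` matches the type's live version tag.
struct SpecSlot {
    cached_version: AtomicU32,
    cached_entry: AtomicU32, // stands in for the cached function reference
}

impl SpecSlot {
    /// Mirror of the suggested read path: Acquire-load the live tag and
    /// treat any mismatch as a cache miss.
    fn read_if_current(&self, live_version_tag: &AtomicU32) -> Option<u32> {
        let live = live_version_tag.load(Ordering::Acquire);
        if self.cached_version.load(Ordering::Acquire) == live {
            Some(self.cached_entry.load(Ordering::Relaxed))
        } else {
            None // stale: caller falls back to the slow path
        }
    }
}

fn main() {
    let tag = AtomicU32::new(7);
    let slot = SpecSlot {
        cached_version: AtomicU32::new(7),
        cached_entry: AtomicU32::new(42),
    };
    assert_eq!(slot.read_if_current(&tag), Some(42));
    tag.store(8, Ordering::Release); // type mutated: version tag bumped
    assert_eq!(slot.read_if_current(&tag), None);
    println!("ok");
}
```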


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 0b85ddd0-26a0-4787-9ee7-d1be5d97f5bd

📥 Commits

Reviewing files that changed from the base of the PR and between ce8952b and 622f3c6.

📒 Files selected for processing (7)
  • crates/vm/src/builtins/str.rs
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/object/ext.rs
  • crates/vm/src/stdlib/builtins.rs
  • crates/vm/src/types/slot.rs
  • crates/vm/src/vm/context.rs


@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
crates/vm/src/frame.rs (1)

8715-8725: ⚠️ Potential issue | 🟠 Major

extra_bytes is dropped for heap-backed __init__ frames.

When datastack_frame_size_bytes() returns None, Lines 8721-8725 return true unconditionally. That makes the new CallAllocAndEnterInit fast path skip the DataStack capacity check for the init-cleanup shim, even though specialization_new_init_cleanup_frame(..., true, vm) still allocates that shim on the DataStack. A heap-backed __init__ can therefore pass this guard and still overflow/panic in the shim allocation path.

Suggested fix
     match func.datastack_frame_size_bytes() {
         Some(frame_size) => frame_size
             .checked_add(extra_bytes)
             .is_some_and(|size| vm.datastack_has_space(size)),
-        None => true,
+        None => extra_bytes == 0 || vm.datastack_has_space(extra_bytes),
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 8715 - 8725,
specialization_has_datastack_space_for_func_with_extra currently returns true
when func.datastack_frame_size_bytes() is None, which lets CallAllocAndEnterInit
skip datastack capacity checks and can overflow when
specialization_new_init_cleanup_frame(..., true, vm) still allocates an
init-cleanup shim on the datastack; change the None branch to conservatively
check vm.datastack_has_space(extra_bytes) (or vm.datastack_has_space(shim_size)
if you compute shim size separately) so heap-backed (__init__) frames still
verify the extra_bytes required for the init-cleanup shim before taking the fast
path.
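The conservative sizing check suggested above can be sketched with plain integers standing in for the datastack budget; the names mirror the review's vocabulary but the signature is an assumption:

```rust
/// Sketch of the suggested guard: when the frame size is known, require room
/// for frame + shim; when unknown (heap-backed frame), still require room for
/// the extra shim bytes instead of returning `true` unconditionally.
fn has_datastack_space(
    frame_size: Option<usize>,
    extra_bytes: usize,
    available: usize, // stands in for what vm.datastack_has_space() checks
) -> bool {
    match frame_size {
        Some(size) => size
            .checked_add(extra_bytes)
            .is_some_and(|total| total <= available),
        None => extra_bytes == 0 || extra_bytes <= available,
    }
}

fn main() {
    assert!(has_datastack_space(Some(100), 28, 128));
    assert!(!has_datastack_space(Some(100), 29, 128));
    assert!(!has_datastack_space(Some(usize::MAX), 1, 128)); // overflow => reject
    assert!(has_datastack_space(None, 0, 0)); // no shim needed
    assert!(!has_datastack_space(None, 64, 32)); // shim would overflow
    println!("ok");
}
```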
crates/vm/src/builtins/function.rs (1)

1375-1388: 🛠️ Refactor suggestion | 🟠 Major

Require CO_OPTIMIZED before taking this vectorcall fast path.

prepare_exact_args_frame() is explicitly an optimized-only path, but is_simple does not check that flag. A non-optimized function with exact positional args can still reach Line 1388 in release builds and bypass the regular argument setup.

Suggested fix
-    let is_simple = !has_kwargs
+    let is_simple = !has_kwargs
+        && code.flags.contains(bytecode::CodeFlags::OPTIMIZED)
         && !code.flags.contains(bytecode::CodeFlags::VARARGS)
         && !code.flags.contains(bytecode::CodeFlags::VARKEYWORDS)
         && code.kwonlyarg_count == 0
         && !code
             .flags
             .intersects(bytecode::CodeFlags::GENERATOR | bytecode::CodeFlags::COROUTINE);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/function.rs` around lines 1375 - 1388, The fast-path
allows calling prepare_exact_args_frame for functions that aren't marked
optimized; add a check for the optimized code flag to the is_simple condition so
the vectorcall fast path only triggers for optimized functions: include
bytecode::CodeFlags::CO_OPTIMIZED (or the equivalent OPTIMIZED variant) in the
conjuncts checked on code.flags (i.e., extend the is_simple boolean) so
prepare_exact_args_frame and the fast-path branch are only taken when that
optimized flag is present.
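The extended is_simple predicate can be sketched with stand-in flag bits; the constants below are illustrative values, not RustPython's actual CodeFlags encoding:

```rust
/// Stand-in bit flags for the relevant CodeFlags (illustrative values only).
const OPTIMIZED: u32 = 1 << 0;
const VARARGS: u32 = 1 << 1;
const VARKEYWORDS: u32 = 1 << 2;
const GENERATOR: u32 = 1 << 3;
const COROUTINE: u32 = 1 << 4;

/// The suggested predicate: the vectorcall fast path requires an optimized,
/// non-variadic, non-generator function with no kwargs and no kw-only args.
fn is_simple(flags: u32, kwonlyarg_count: u32, has_kwargs: bool) -> bool {
    !has_kwargs
        && flags & OPTIMIZED != 0
        && flags & (VARARGS | VARKEYWORDS) == 0
        && kwonlyarg_count == 0
        && flags & (GENERATOR | COROUTINE) == 0
}

fn main() {
    assert!(is_simple(OPTIMIZED, 0, false));
    assert!(!is_simple(0, 0, false)); // not optimized: must not take fast path
    assert!(!is_simple(OPTIMIZED | VARARGS, 0, false));
    assert!(!is_simple(OPTIMIZED, 1, false));
    assert!(!is_simple(OPTIMIZED, 0, true));
    println!("ok");
}
```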
♻️ Duplicate comments (3)
crates/vm/src/frame.rs (1)

7794-7798: ⚠️ Potential issue | 🟠 Major

Cached __getitem__ still needs a type-version guard.

Specializing the opcode here is still unsafe because the execution path at Lines 4043-4049 reads get_cached_getitem_for_specialization() without first confirming owner.class().tp_version_tag.load(Acquire) == cached_version. Rebinding __getitem__ on a heap type can keep this fast path calling stale code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 7794 - 7798, The cached __getitem__
specialization is used without verifying the class type-version, so add a
type-version guard: when installing/using the specialization via
cls.cache_getitem_for_specialization and when reading via
get_cached_getitem_for_specialization, compare
owner.class().tp_version_tag.load(Acquire) against the cached_version before
taking the fast path; if they differ, fall back to the slow path and refresh the
cache (or invalidate the cached handler) so rebinding __getitem__ on a heap type
cannot call stale code.
crates/vm/src/builtins/type.rs (2)

275-281: ⚠️ Potential issue | 🟠 Major

Version the cached __init__ entry, not just the owning type.

tp_version_tag only changes when the type mutates. PyFunction::func_version can change independently via __code__, __defaults__, or __kwdefaults__, but get_cached_init_for_specialization() still reuses the cached function on a type-version match alone. That leaves the CALL_ALLOC_AND_ENTER_INIT fast path free to keep driving prepare_exact_args_frame() with an __init__ that no longer satisfies the cached exact-args contract.

Suggested direction
 pub struct TypeSpecializationCache {
     pub init: PyAtomicRef<Option<PyFunction>>,
+    pub init_version: AtomicU32,
     pub getitem: PyAtomicRef<Option<PyFunction>>,
     pub getitem_version: AtomicU32,
@@
         Self {
             init: PyAtomicRef::from(None::<PyRef<PyFunction>>),
+            init_version: AtomicU32::new(0),
             getitem: PyAtomicRef::from(None::<PyRef<PyFunction>>),
             getitem_version: AtomicU32::new(0),
             write_lock: PyMutex::new(()),
             retired: PyRwLock::new(Vec::new()),
         }
@@
     pub(crate) fn cache_init_for_specialization(
         &self,
         init: PyRef<PyFunction>,
         tp_version: u32,
         vm: &VirtualMachine,
     ) -> bool {
+        let func_version = init.get_version_for_current_state();
+        if func_version == 0 {
+            return false;
+        }
         ...
         ext.specialization_cache.swap_init(Some(init), Some(vm));
+        ext.specialization_cache
+            .init_version
+            .store(func_version, Ordering::Relaxed);
         true
     }

You'll also need the read side and CALL_ALLOC_AND_ENTER_INIT caller to validate init_version before reusing the cached function.

Also applies to: 893-931

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/type.rs` around lines 275 - 281, The cached __init__
entry must be versioned separately from the owning type: add an init_version
(AtomicU32) to TypeSpecializationCache and when caching a PyFunction in
get_cached_init_for_specialization() store the function's current
PyFunction::func_version into init_version; on read side (and in
CALL_ALLOC_AND_ENTER_INIT / prepare_exact_args_frame) validate that both the
cached type version (getitem_version or type's tp_version_tag) and init_version
match the live PyFunction::func_version before reusing the cached PyFunction;
update cache write paths to set init and init_version together and ensure
callers reject the cached init if the versions differ so that
prepare_exact_args_frame() never runs with a stale __init__.

303-315: ⚠️ Potential issue | 🟠 Major

retired still grows for every live-type invalidation.

modified() always reaches invalidate_for_type_modified() without a VirtualMachine, so both swap_init(None, None) and swap_getitem(None, None) fall back to pushing the replaced functions into retired. clear_into() is the only drain path here, so a long-lived heap type that alternates specialization and attribute writes will retain those strong refs indefinitely.

Also applies to: 318-336, 352-364

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/type.rs` around lines 303 - 315, The code currently
always moves replaced function objects into the global `retired` list when `vm`
is None (in `swap_init` and `swap_getitem`), causing `retired` to grow for
long‑lived heap types; change both `swap_init` and `swap_getitem` so that when
`vm` is None they do NOT push the old value into `retired` but instead release
the strong reference immediately (or convert to a non‑retaining/weak form) to
avoid accumulation — implement this by extracting the old value via the existing
`self.init.swap(new_init)` path but then dropping or turning it into a
non‑retaining placeholder via a new helper (e.g. `drop_old_without_retire`)
rather than calling `retire_old_function`; keep the current `vm`-present path
that uses `self.init.swap_to_temporary_refs(..., vm)` and `retire_old_function`
unchanged so CPython-style temporary refs remain for executing frames.
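One shape the non-retaining variant could take, sketched with Rc/Weak and String standing in for the cached function objects; RustPython's actual refcounting types differ, so this shows only the lifecycle idea:

```rust
use std::rc::{Rc, Weak};

/// Sketch of a non-retaining `retired` list: displaced entries are parked as
/// Weak refs, so they stop pinning the old function once the last strong ref
/// drops, and dead slots are drained opportunistically on each retire.
struct Retired {
    entries: Vec<Weak<String>>, // String stands in for the cached function
}

impl Retired {
    fn retire(&mut self, old: &Rc<String>) {
        self.entries.push(Rc::downgrade(old));
        // Opportunistic drain: remove slots whose strong refs are gone.
        self.entries.retain(|w| w.upgrade().is_some());
    }
}

fn main() {
    let mut retired = Retired { entries: Vec::new() };
    let f1 = Rc::new("old __init__".to_string());
    retired.retire(&f1);
    assert_eq!(retired.entries.len(), 1);
    drop(f1); // last strong ref gone: the parked entry is now dead
    let f2 = Rc::new("old __getitem__".to_string());
    retired.retire(&f2); // the drain discards the dead f1 slot
    assert_eq!(retired.entries.len(), 1);
    println!("ok");
}
```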
🧹 Nitpick comments (2)
.cspell.json (1)

63-64: Minor: words not in alphabetical order.

"compactlong" and "compactlongs" start with "c" but are placed after "d" words. For consistency with the rest of the list, consider moving them to the correct alphabetical position (after "coro").

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cspell.json around lines 63 - 64, The two dictionary entries "compactlong"
and "compactlongs" are out of alphabetical order; move those two entries so they
appear after the "coro" entry (i.e., in the block of "c" words) to maintain
consistent alphabetical ordering in the word list.
crates/vm/src/vm/context.rs (1)

424-428: Give the shim its own synthetic code identity.

Using __init__ for source_path, obj_name, and qualname will make tracebacks and code introspection look like user-defined __init__ bytecode even when the frame is just internal cleanup. A dedicated internal label such as <init_cleanup> would avoid that confusion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/vm/context.rs` around lines 424 - 428, The frame/context
currently reuses names.__init__ for source_path, obj_name, and qualname, which
makes internal cleanup frames appear as user __init__ code; change those fields
(source_path, obj_name, qualname) to a dedicated synthetic identifier like
"<init_cleanup>" when constructing this shim frame (where names.__init__ is
currently used) so tracebacks and introspection show an internal label instead
of a user __init__; keep other fields (e.g. first_line_number, max_stackdepth)
as-is or set a fixed sane value if needed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: ab67bac4-b3d1-49c7-852c-cf76da714cba

📥 Commits

Reviewing files that changed from the base of the PR and between 622f3c6 and e8556b6.

📒 Files selected for processing (6)
  • .cspell.json
  • crates/vm/src/builtins/function.rs
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/vm/context.rs
  • crates/vm/src/vm/mod.rs


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/vm/src/builtins/function.rs (1)

692-733: ⚠️ Potential issue | 🟠 Major

Inconsistent precondition enforcement on invoke_exact_args() call sites.

While most callers of invoke_exact_args() (e.g., lines 4214, 4261) use runtime guards like has_exact_argcount(), the specialization fast-paths at lines 3867 and 4055 rely on debug_assert!() to validate argument counts. In release builds, these assertions vanish, allowing silently incorrect frame construction.

Make the argcount checks at lines 3867 and 4055 runtime guards (like lines 4214 and 4261) instead of debug_assert, or else fallback to invoke() if the check fails.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/function.rs` around lines 692 - 733, The fast-path
call sites that currently rely on debug_assert! for argcount must perform a
runtime check and fallback instead: replace the debug_assert-based checks at the
specialization fast-paths that call invoke_exact_args() with an explicit runtime
guard using has_exact_argcount() (or the equivalent exact-argcount check used at
other sites) and if the guard fails call the safe invoke() path; ensure both
fast-path locations that call invoke_exact_args() (the ones flagged in the
review) use this runtime check and fallback to invoke() rather than assuming the
precondition.
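The runtime-guard-plus-fallback pattern this comment asks for, reduced to a minimal sketch; the return strings mirror the review's vocabulary but the bodies are placeholders:

```rust
/// Sketch of the suggested change: an explicit runtime argcount check that
/// falls back to the generic path, instead of a debug_assert! that vanishes
/// in release builds.
fn dispatch(arg_count: usize, expected_argcount: usize) -> &'static str {
    if arg_count == expected_argcount {
        "invoke_exact_args" // fast path: precondition verified at runtime
    } else {
        "invoke" // safe generic fallback
    }
}

fn main() {
    assert_eq!(dispatch(2, 2), "invoke_exact_args");
    assert_eq!(dispatch(3, 2), "invoke");
    println!("ok");
}
```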
♻️ Duplicate comments (2)
crates/vm/src/frame.rs (1)

4044-4051: ⚠️ Potential issue | 🟠 Major

Guard cached __getitem__ dispatch with the current type version.

This path still trusts get_cached_getitem_for_specialization() without checking owner.class().tp_version_tag. After rebinding __getitem__ on a heap type, the cached function can still pass the func_version() check and dispatch stale code here. Please mirror the __init__ cache pattern and treat a type-version mismatch as a cache miss.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 4044 - 4051, The cached __getitem__
dispatch in Instruction::BinaryOpSubscrGetitem currently trusts
get_cached_getitem_for_specialization() and func.func_version() but misses
checking the owning class's tp_version_tag; update the guard to also compare
owner.class().tp_version_tag against the cached type version (mirror the
__init__ cache pattern) and treat a mismatch as a cache miss so the fast-path is
skipped and normal lookup occurs; locate the check around self.nth_value(1),
specialization_eval_frame_active(vm), get_cached_getitem_for_specialization(),
func.func_version(), and specialization_has_datastack_space_for_func(vm, &func)
and add the tp_version_tag validation there.
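The double guard described above (function version plus owning-type version) can be modeled minimally. `TypeObj`, `CacheEntry`, and `cache_hit` are hypothetical names for illustration; RustPython's real structures differ:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

struct TypeObj {
    tp_version_tag: AtomicU32,
}

struct CacheEntry {
    func_id: u32,
    func_version: u32,
    type_version: u32,
}

// A cached specialization is usable only while BOTH versions still match;
// a type-version mismatch is treated as a cache miss, mirroring the
// __init__ cache pattern the review references.
fn cache_hit(entry: &CacheEntry, live_func_version: u32, owner: &TypeObj) -> Option<u32> {
    if entry.func_version == live_func_version
        && entry.type_version == owner.tp_version_tag.load(Ordering::Acquire)
    {
        Some(entry.func_id)
    } else {
        None
    }
}
```

Rebinding `__getitem__` bumps the type's version tag, so a stale entry that would still pass the function-version check alone now misses.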
crates/vm/src/builtins/type.rs (1)

281-281: ⚠️ Potential issue | 🟠 Major

retired still has no live reclamation path.

invalidate_for_type_modified() always goes through swap_*(_, None), so every live invalidation keeps the displaced specialization function strongly referenced in retired. Since retired is only drained in clear_into(), long-lived heap types can still grow this list indefinitely under specialize/mutate cycles.

Please route live invalidations through a VM-backed temporary-ref path when a VirtualMachine is available, or add an opportunistic drain for retired.

Also applies to: 303-336, 352-364

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/type.rs` at line 281, The retired field (retired:
PyRwLock<Vec<PyObjectRef>>) accumulates strong references because
invalidate_for_type_modified() always calls swap_*(_, None) which pushes
displaced specialization functions into retired and they are only drained by
clear_into(), causing unbounded growth for long-lived heap types; fix by routing
live invalidations through a VM-backed temporary-ref path when a VirtualMachine
is available (use the VM to create transient/weak refs instead of pushing into
retired) or add an opportunistic drain that periodically clears retired (e.g.,
in invalidate_for_type_modified(), swap_* callers, or type mutation paths) so
retired does not retain references indefinitely—update functions
invalidate_for_type_modified(), any swap_* call sites that pass None, and
clear_into() to implement the chosen reclamation strategy.
🧹 Nitpick comments (3)
crates/vm/src/vm/mod.rs (1)

1566-1620: Please pin the untraced contract with a regression test.

with_frame_untraced() now deliberately bypasses dispatch_traced_frame(). A focused test that sys.settrace / sys.setprofile do not observe this path would help prevent accidental hook reintroduction in later frame refactors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/vm/mod.rs` around lines 1566 - 1620, Add a regression test that
verifies with_frame_untraced()/with_frame_impl(..., traced: false, ...) bypasses
dispatch_traced_frame: register sys.settrace and sys.setprofile handlers that
will fail the test if invoked (or toggle a flag), create a FrameRef and call
with_frame_untraced (or call with_frame_impl with traced=false) to execute a
dummy closure, then assert the trace/profile handlers were not called and normal
frame cleanup occurs; reference with_frame_untraced, with_frame_impl, and
dispatch_traced_frame so the test covers the untraced execution path explicitly.
crates/vm/src/vm/context.rs (1)

424-428: Give the synthetic cleanup code a distinct internal identity.

Reusing __init__ for source_path, obj_name, and qualname will make any leaked traceback or debug output look like user __init__ code. A dedicated internal name/path would make cleanup-frame failures much easier to diagnose.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/vm/context.rs` around lines 424 - 428, The synthetic cleanup
frame currently reuses names.__init__ for source_path, obj_name, and qualname
which can mask real user __init__ frames; change the synthetic frame to use a
distinct internal identity (e.g. a constant like INTERNAL_CLEANUP_NAME such as
"<vm_cleanup>" or "<internal_cleanup>") and assign that to source_path,
obj_name, and qualname instead of names.__init__; update any creation site that
sets first_line_number/max_stackdepth on the synthetic cleanup frame (the code
assigning source_path, obj_name, qualname in the synthetic frame construction)
so all three fields use the new internal name to make leaked tracebacks clearly
identifiable.
crates/vm/src/frame.rs (1)

4141-4155: Use the latin-1 singleton path here too.

This fast path still only interns ASCII results. Single-character latin-1 hits like "\xE9"[0] will fall straight back to the generic subscript path, so the new singleton caching work does not apply on this opcode yet.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 4141 - 4155, The BinaryOpSubscrStrInt
fast path only handles ASCII single-character results via
vm.ctx.ascii_char_cache, so single-character Latin-1 results (e.g. "\xE9"[0])
bypass the new singleton cache; update the Instruction::BinaryOpSubscrStrInt
branch to, after computing ch (and before falling back), check for a Latin‑1
single-character (e.g. a method like ch.is_latin1() or checking ch.to_u32() in
0x80..=0xFF), and on hit pop the two operands and push the corresponding Latin‑1
singleton from the VM context (analogous to vm.ctx.ascii_char_cache, e.g.
vm.ctx.latin1_char_singleton[index].clone().into()); keep the existing ASCII
path and fallback unchanged.
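The singleton table itself is straightforward to sketch. This is an assumption-level model, not RustPython's cache (which stores interned `PyStrRef`s): all 256 one-character Latin-1 strings are built up front, and a subscript hit returns the shared value without allocating:

```rust
// Build one-character strings for every code point in 0x00..=0xFF.
// None of these are surrogates, so from_u32 always succeeds here.
fn build_latin1_cache() -> Vec<String> {
    (0u32..=0xFF)
        .map(|cp| char::from_u32(cp).unwrap().to_string())
        .collect()
}

// Covers both the ASCII range and the Latin-1 supplement (0x80..=0xFF);
// anything above falls back to the generic allocation path (None).
fn char_singleton<'a>(cache: &'a [String], ch: char) -> Option<&'a str> {
    let cp = ch as u32;
    if cp <= 0xFF {
        Some(cache[cp as usize].as_str())
    } else {
        None
    }
}
```

A `"\xE9"[0]`-style subscript then hits the table at index 0xE9 instead of taking the generic path.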
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/object/ext.rs`:
- Around line 361-370: The helpers deref_ordering, to_owned, and
to_owned_ordering currently forward any std::sync::atomic::Ordering into
self.inner.load(ordering) which will panic for Release/AcqRel; modify them to
only accept or use load-valid orderings by either (1) providing distinct methods
that take explicit valid semantics (e.g., deref_relaxed/deref_acquire and
to_owned_relaxed/to_owned_acquire) and call
inner.load(Ordering::Relaxed/Acquire), or (2) add a runtime check in
deref_ordering and to_owned_ordering that asserts ordering is one of the valid
variants (Relaxed or Acquire) before calling self.inner.load(ordering) and
return None or panic otherwise; update calls to to_owned() to delegate to the
new safe variants and ensure all uses of inner.load in these helpers reference
only load-valid Orderings.
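Both fixes the comment proposes can be shown on a toy atomic cell; `Slot` and its methods are hypothetical names. `Atomic*::load` panics at runtime on `Release` and `AcqRel`, so either expose only load-valid entry points or coerce up front:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

struct Slot {
    inner: AtomicUsize,
}

impl Slot {
    // Fix (1): distinct methods with explicitly load-valid semantics.
    fn load_relaxed(&self) -> usize {
        self.inner.load(Ordering::Relaxed)
    }

    fn load_acquire(&self) -> usize {
        self.inner.load(Ordering::Acquire)
    }

    // Fix (2): accept any ordering but coerce the store-only variants to
    // the nearest load-valid ordering before calling `load`.
    fn load_checked(&self, ordering: Ordering) -> usize {
        let ordering = match ordering {
            Ordering::Release => Ordering::Relaxed,
            Ordering::AcqRel => Ordering::Acquire,
            other => other,
        };
        self.inner.load(ordering)
    }
}
```

Either variant removes the latent panic from callers that forward an arbitrary `Ordering`.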

---

Outside diff comments:
In `@crates/vm/src/builtins/function.rs`:
- Around line 692-733: The fast-path call sites that currently rely on
debug_assert! for argcount must perform a runtime check and fallback instead:
replace the debug_assert-based checks at the specialization fast-paths that call
invoke_exact_args() with an explicit runtime guard using has_exact_argcount()
(or the equivalent exact-argcount check used at other sites) and if the guard
fails call the safe invoke() path; ensure both fast-path locations that call
invoke_exact_args() (the ones flagged in the review) use this runtime check and
fallback to invoke() rather than assuming the precondition.

---

Duplicate comments:
In `@crates/vm/src/builtins/type.rs`:
- Line 281: The retired field (retired: PyRwLock<Vec<PyObjectRef>>) accumulates
strong references because invalidate_for_type_modified() always calls swap_*(_,
None) which pushes displaced specialization functions into retired and they are
only drained by clear_into(), causing unbounded growth for long-lived heap
types; fix by routing live invalidations through a VM-backed temporary-ref path
when a VirtualMachine is available (use the VM to create transient/weak refs
instead of pushing into retired) or add an opportunistic drain that periodically
clears retired (e.g., in invalidate_for_type_modified(), swap_* callers, or type
mutation paths) so retired does not retain references indefinitely—update
functions invalidate_for_type_modified(), any swap_* call sites that pass None,
and clear_into() to implement the chosen reclamation strategy.

In `@crates/vm/src/frame.rs`:
- Around line 4044-4051: The cached __getitem__ dispatch in
Instruction::BinaryOpSubscrGetitem currently trusts
get_cached_getitem_for_specialization() and func.func_version() but misses
checking the owning class's tp_version_tag; update the guard to also compare
owner.class().tp_version_tag against the cached type version (mirror the
__init__ cache pattern) and treat a mismatch as a cache miss so the fast-path is
skipped and normal lookup occurs; locate the check around self.nth_value(1),
specialization_eval_frame_active(vm), get_cached_getitem_for_specialization(),
func.func_version(), and specialization_has_datastack_space_for_func(vm, &func)
and add the tp_version_tag validation there.

---

Nitpick comments:
In `@crates/vm/src/frame.rs`:
- Around line 4141-4155: The BinaryOpSubscrStrInt fast path only handles ASCII
single-character results via vm.ctx.ascii_char_cache, so single-character
Latin-1 results (e.g. "\xE9"[0]) bypass the new singleton cache; update the
Instruction::BinaryOpSubscrStrInt branch to, after computing ch (and before
falling back), check for a Latin‑1 single-character (e.g. a method like
ch.is_latin1() or checking ch.to_u32() in 0x80..=0xFF), and on hit pop the two
operands and push the corresponding Latin‑1 singleton from the VM context
(analogous to vm.ctx.ascii_char_cache, e.g.
vm.ctx.latin1_char_singleton[index].clone().into()); keep the existing ASCII
path and fallback unchanged.

In `@crates/vm/src/vm/context.rs`:
- Around line 424-428: The synthetic cleanup frame currently reuses
names.__init__ for source_path, obj_name, and qualname which can mask real user
__init__ frames; change the synthetic frame to use a distinct internal identity
(e.g. a constant like INTERNAL_CLEANUP_NAME such as "<vm_cleanup>" or
"<internal_cleanup>") and assign that to source_path, obj_name, and qualname
instead of names.__init__; update any creation site that sets
first_line_number/max_stackdepth on the synthetic cleanup frame (the code
assigning source_path, obj_name, qualname in the synthetic frame construction)
so all three fields use the new internal name to make leaked tracebacks clearly
identifiable.

In `@crates/vm/src/vm/mod.rs`:
- Around line 1566-1620: Add a regression test that verifies
with_frame_untraced()/with_frame_impl(..., traced: false, ...) bypasses
dispatch_traced_frame: register sys.settrace and sys.setprofile handlers that
will fail the test if invoked (or toggle a flag), create a FrameRef and call
with_frame_untraced (or call with_frame_impl with traced=false) to execute a
dummy closure, then assert the trace/profile handlers were not called and normal
frame cleanup occurs; reference with_frame_untraced, with_frame_impl, and
dispatch_traced_frame so the test covers the untraced execution path explicitly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 1fd15f3c-4fb1-4d91-9904-23652e5fa048

📥 Commits

Reviewing files that changed from the base of the PR and between e8556b6 and e84ac3c.

📒 Files selected for processing (10)
  • .cspell.json
  • crates/vm/src/builtins/function.rs
  • crates/vm/src/builtins/str.rs
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/object/ext.rs
  • crates/vm/src/stdlib/builtins.rs
  • crates/vm/src/types/slot.rs
  • crates/vm/src/vm/context.rs
  • crates/vm/src/vm/mod.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/vm/src/stdlib/builtins.rs
  • .cspell.json

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/vm/src/frame.rs (1)

4163-4174: ⚠️ Potential issue | 🟠 Major

Don't redo the string index lookup on non-ASCII hits.

getitem_by_index() has already done the bounds check and decoded the character. When ch is not ASCII, this branch discards that result and falls back to generic BINARY_OP, so specialized str[int] sites on Unicode text pay the full lookup twice and never use the new singleton-cache path for latin-1 characters. Build the one-character result from ch directly, or de-specialize after the first non-ASCII miss.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 4163 - 4174, The branch currently calls
getitem_by_index and checks ch.is_ascii(), but on non-ASCII it drops ch and
falls back to the generic BINARY_OP path, redoing the lookup later; instead,
after getting ch from getitem_by_index (and when
specialization_nonnegative_compact_index succeeded), build the one-character
PyStr result directly from ch (using the decoded character from
getitem_by_index) and push it (pop the two operands and push the new PyStr)
rather than deferring to the generic path; update the block that references
a_str.getitem_by_index, ch, ascii_char_cache, pop_value, and push_value to
create and push a one-character PyStr from ch when bounds succeeded (and keep
the ascii-char-cache fast path for ch.is_ascii()) so we avoid a second lookup on
Unicode characters.
🧹 Nitpick comments (1)
crates/vm/src/frame.rs (1)

7752-7754: Extract the shared BinaryOpExtend fallback.

The same binary_op_extended_specialization() / cached_extend_descr / Instruction::BinaryOpExtend block is repeated across multiple arms. Pulling that into one helper or a single trailing fallback will make future opcode coverage changes much less error-prone.

As per coding guidelines: "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code."

Also applies to: 7768-7770, 7784-7786, 7792-7794, 7901-7904

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 7752 - 7754, Multiple match arms repeat
the same pattern calling binary_op_extended_specialization(op, a, b, vm),
setting cached_extend_descr, and returning Instruction::BinaryOpExtend; extract
that shared fallback into one helper or a single trailing fallback to remove
duplication. Locate uses of binary_op_extended_specialization,
cached_extend_descr, and Instruction::BinaryOpExtend in the match (e.g., the
arms around the occurrences at 7752, 7768, 7784, 7792, 7901) and refactor so you
first compute the differing value(s) (op/a/b or the condition), then call a
single helper function (or a final else branch) that runs
binary_op_extended_specialization(op, a, b, vm), assigns cached_extend_descr =
Some(descr) when present, and returns Instruction::BinaryOpExtend; keep the
behavior identical but centralize the common logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/builtins/function.rs`:
- Around line 772-786: The helper datastack_frame_size_bytes_for_code currently
uses code.freevars.len() which can differ from the actual runtime closure length
used by Frame::new(), leading to undersized reservations; change the function to
take the runtime closure length (e.g., add a closure_len: usize parameter or
accept the closure slice) and use that value instead of code.freevars.len(),
keep the generator/coroutine early-return behavior, and update all callers
(including prepare_exact_args_frame and any code that reserves datastack space)
to pass the actual closure.len() so sizing matches Frame::new().

In `@crates/vm/src/frame.rs`:
- Around line 7388-7401: The current specialization installs
Instruction::LoadAttrModule after checking only tp_version_tag, which still
calls PyModule::get_attr() and can duplicate user-visible side effects if the
module dict later changes; fix by either adding a module-dict runtime guard here
(e.g., capture and cache the module.dict() version/identity alongside
tp_version_tag when writing via write_cache_u32 and check it in the fast-path)
or modify the LoadAttrModule handler to read module.dict() directly and perform
a direct dict lookup and only fall back to load_attr_slow() on a miss/absent key
(so it never re-invokes PyModule::get_attr()); update the code paths around
specialize_at, write_cache_u32, and the LoadAttrModule handler accordingly and
ensure load_attr_slow() remains the canonical slow-path.

---

Outside diff comments:
In `@crates/vm/src/frame.rs`:
- Around line 4163-4174: The branch currently calls getitem_by_index and checks
ch.is_ascii(), but on non-ASCII it drops ch and falls back to the generic
BINARY_OP path, redoing the lookup later; instead, after getting ch from
getitem_by_index (and when specialization_nonnegative_compact_index succeeded),
build the one-character PyStr result directly from ch (using the decoded
character from getitem_by_index) and push it (pop the two operands and push the
new PyStr) rather than deferring to the generic path; update the block that
references a_str.getitem_by_index, ch, ascii_char_cache, pop_value, and
push_value to create and push a one-character PyStr from ch when bounds
succeeded (and keep the ascii-char-cache fast path for ch.is_ascii()) so we
avoid a second lookup on Unicode characters.

---

Nitpick comments:
In `@crates/vm/src/frame.rs`:
- Around line 7752-7754: Multiple match arms repeat the same pattern calling
binary_op_extended_specialization(op, a, b, vm), setting cached_extend_descr,
and returning Instruction::BinaryOpExtend; extract that shared fallback into one
helper or a single trailing fallback to remove duplication. Locate uses of
binary_op_extended_specialization, cached_extend_descr, and
Instruction::BinaryOpExtend in the match (e.g., the arms around the occurrences
at 7752, 7768, 7784, 7792, 7901) and refactor so you first compute the differing
value(s) (op/a/b or the condition), then call a single helper function (or a
final else branch) that runs binary_op_extended_specialization(op, a, b, vm),
assigns cached_extend_descr = Some(descr) when present, and returns
Instruction::BinaryOpExtend; keep the behavior identical but centralize the
common logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 73b94538-f3e4-4b89-87ef-d1349d860164

📥 Commits

Reviewing files that changed from the base of the PR and between e84ac3c and 1b61f5d.

📒 Files selected for processing (4)
  • crates/vm/src/builtins/function.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/object/core.rs
  • crates/vm/src/protocol/callable.rs
✅ Files skipped from review due to trivial changes (1)
  • crates/vm/src/object/core.rs

Comment on lines +772 to +786
pub(crate) fn datastack_frame_size_bytes_for_code(code: &Py<PyCode>) -> Option<usize> {
if code
.flags
.intersects(bytecode::CodeFlags::GENERATOR | bytecode::CodeFlags::COROUTINE)
{
return None;
}
let nlocalsplus = code
.varnames
.len()
.checked_add(code.cellvars.len())?
.checked_add(code.freevars.len())?;
let capacity = nlocalsplus.checked_add(code.max_stackdepth as usize)?;
capacity.checked_mul(core::mem::size_of::<usize>())
}
⚠️ Potential issue | 🟠 Major

Use the runtime closure length when sizing datastack frames.

Frame::new() sizes localsplus with closure.len(), but this helper hard-codes code.freevars.len(). Since set___code__ still allows swapping in a code object without revalidating the stored closure, callers that reserve datastack space via this helper can compute a smaller layout than prepare_exact_args_frame() will actually allocate.

Possible direction
 pub(crate) fn datastack_frame_size_bytes(&self) -> Option<usize> {
-    datastack_frame_size_bytes_for_code(&self.code)
+    datastack_frame_size_bytes_for_layout(
+        &self.code,
+        self.closure.as_ref().map_or(0, |c| c.len()),
+    )
 }
 
-pub(crate) fn datastack_frame_size_bytes_for_code(code: &Py<PyCode>) -> Option<usize> {
+pub(crate) fn datastack_frame_size_bytes_for_layout(
+    code: &Py<PyCode>,
+    nfrees: usize,
+) -> Option<usize> {
     if code
         .flags
         .intersects(bytecode::CodeFlags::GENERATOR | bytecode::CodeFlags::COROUTINE)
     {
         return None;
     }
     let nlocalsplus = code
         .varnames
         .len()
         .checked_add(code.cellvars.len())?
-        .checked_add(code.freevars.len())?;
+        .checked_add(nfrees)?;
     let capacity = nlocalsplus.checked_add(code.max_stackdepth as usize)?;
     capacity.checked_mul(core::mem::size_of::<usize>())
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/function.rs` around lines 772 - 786, The helper
datastack_frame_size_bytes_for_code currently uses code.freevars.len() which can
differ from the actual runtime closure length used by Frame::new(), leading to
undersized reservations; change the function to take the runtime closure length
(e.g., add a closure_len: usize parameter or accept the closure slice) and use
that value instead of code.freevars.len(), keep the generator/coroutine
early-return behavior, and update all callers (including
prepare_exact_args_frame and any code that reserves datastack space) to pass the
actual closure.len() so sizing matches Frame::new().

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
crates/vm/src/frame.rs (1)

3732-3754: ⚠️ Potential issue | 🟠 Major

Guard LoadAttrModule on module-dict state, not just tp_version_tag.

Line 3743 still goes through PyModule::get_attr(), but the specialization installed at Lines 7388-7400 only guards on the module type version. If the module dict later mutates (__getattr__ added, the attribute removed, lookup starts raising), the specialized path can perform module resolution and then immediately fall back to load_attr_slow(), duplicating user-visible side effects. Cache module-dict state as part of the guard, or make the specialized handler do a direct dict lookup and fall back only on a true miss.

Also applies to: 7385-7401

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 3732 - 3754, The fast-path for
Instruction::LoadAttrModule uses only tp_version_tag to guard
(owner.downcast_ref_if_exact::<PyModule>, type_version from
code.instructions.read_cache_u32) but still calls PyModule::get_attr, risking
duplicate side effects; change the specialization to also cache and compare the
module dict state (e.g. a dict version/tag or identity pointer) and perform a
direct dict lookup for attr_name (avoiding PyModule::get_attr) so you only fall
back to load_attr_slow(vm, oparg) on a real miss or when the dict/version check
fails; apply the same change to the duplicate specialization region referenced
(around the other LoadAttrModule handler).
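The dict-state guard plus direct lookup can be modeled like this; `ModuleDict`, `AttrCache`, and `load_attr_fast` are hypothetical names, and the version counter stands in for whatever dict-version or identity tag the real guard would cache:

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

struct ModuleDict {
    version: AtomicU64,
    entries: HashMap<String, i64>, // i64 stands in for attribute values
}

struct AttrCache {
    dict_version: u64, // captured when the site was specialized
}

// Fast path: validate the cached dict version, then do a direct dict
// lookup. Any mismatch or missing key returns None, signalling the caller
// to take load_attr_slow(); full attribute machinery is never re-run here,
// so side effects cannot be duplicated.
fn load_attr_fast(cache: &AttrCache, dict: &ModuleDict, name: &str) -> Option<i64> {
    if cache.dict_version != dict.version.load(Ordering::Acquire) {
        return None;
    }
    dict.entries.get(name).copied()
}
```

Mutating the module dict bumps its version, so stale sites miss cleanly instead of resolving the attribute twice.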
🧹 Nitpick comments (3)
crates/vm/src/frame.rs (2)

1179-1253: The mixed float/int extension still skips the inplace operators.

BinaryOpExtend now covers Add/Subtract/Multiply/TrueDivide, but InplaceAdd/InplaceSubtract/InplaceMultiply/InplaceTrueDivide still never route through binary_op_extended_specialization. For immutable floats that leaves x += 1, x -= 1, x *= 1, and x /= 1 on the generic path even though they can reuse the same specialization.

Also applies to: 7752-7795, 7901-7904

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 1179 - 1253, The float/int mixed
specializations omit the inplace operators, so inplace variants never hit
binary_op_extended_specialization; update the BinaryOpExtendSpecializationDescr
tables (e.g., BINARY_OP_EXTEND_DESCRS) to add entries mapping
bytecode::BinaryOperator::InplaceAdd, InplaceSubtract, InplaceMultiply and
InplaceTrueDivide to the same guards and actions used for
Add/Subtract/Multiply/TrueDivide (e.g., map InplaceAdd ->
float_compactlong_guard/float_compactlong_add and InplaceTrueDivide ->
nonzero_float_compactlong_guard/float_compactlong_true_div, and similarly for
compactlong_float_* entries), and apply the same additions to the other
identical descriptor arrays mentioned (the other occurrences around the
indicated ranges) so inplace ops reuse the existing specializations.
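A table-driven sketch of how inplace entries can reuse the plain operators' guards and actions; the enum, guards, and table are illustrative stand-ins for `bytecode::BinaryOperator` and the real descriptor arrays:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum BinOp {
    Add,
    InplaceAdd,
    TrueDivide,
    InplaceTrueDivide,
}

type Guard = fn(f64, i64) -> bool;
type Action = fn(f64, i64) -> f64;

fn any_guard(_: f64, _: i64) -> bool {
    true
}
fn nonzero_rhs_guard(_: f64, b: i64) -> bool {
    b != 0
}
fn float_long_add(a: f64, b: i64) -> f64 {
    a + b as f64
}
fn float_long_div(a: f64, b: i64) -> f64 {
    a / b as f64
}

// Inplace rows point at the exact same guard/action pairs as their
// plain counterparts, so floats being immutable is irrelevant: x += 1
// hits the same specialization as x + 1.
const DESCRS: &[(BinOp, Guard, Action)] = &[
    (BinOp::Add, any_guard, float_long_add),
    (BinOp::InplaceAdd, any_guard, float_long_add),
    (BinOp::TrueDivide, nonzero_rhs_guard, float_long_div),
    (BinOp::InplaceTrueDivide, nonzero_rhs_guard, float_long_div),
];

fn specialize(op: BinOp, a: f64, b: i64) -> Option<f64> {
    DESCRS
        .iter()
        .find(|(o, guard, _)| *o == op && guard(a, b))
        .map(|(_, _, action)| action(a, b))
}
```

A failed guard (e.g. division by zero) yields `None` and leaves the generic path in charge, matching the existing descriptor model.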

4160-4174: Use the Latin-1 singleton path in the specialized str[int] fast path too.

This branch still only interns ASCII characters. Repeated subscripting that returns a single-byte non-ASCII Latin-1 character will keep missing the specialized result path and fall back to generic str.__getitem__, even though this PR adds a Latin-1 singleton cache for that case.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 4160 - 4174, The fast path in
Instruction::BinaryOpSubscrStrInt only uses the ascii_char_cache; extend it to
also handle Latin-1 single-byte non-ASCII characters by checking the character
returned from PyStr::getitem_by_index for a Latin-1 single-byte (e.g.,
ch.is_latin1() or equivalent) and, when true, pop the two values and push the
cached singleton from the Latin-1 cache (e.g.,
vm.ctx.latin1_char_cache[ch.to_u32() as usize].clone().into()) just like the
existing ascii branch; keep the existing ascii branch unchanged and add the
Latin-1 branch alongside it after getitem_by_index succeeds.
crates/vm/src/builtins/type.rs (1)

345-350: Consider using .for_each() instead of .map().count().

The current pattern works, but .for_each() better expresses the intent of executing side effects without producing values.

Suggested change
-        self.retired
-            .read()
-            .iter()
-            .map(|obj| obj.traverse(tracer_fn))
-            .count();
+        self.retired
+            .read()
+            .iter()
+            .for_each(|obj| obj.traverse(tracer_fn));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/type.rs` around lines 345 - 350, The code iterates
over self.retired.read().iter() and calls obj.traverse(tracer_fn) but uses
.map(...).count() to execute side effects; replace that with .for_each(...) to
convey intent more clearly. Locate the block using
self.retired.read().iter().map(|obj| obj.traverse(tracer_fn)).count() and change
it to call .for_each(|obj| obj.traverse(tracer_fn)) so traverse(tracer_fn) runs
for each retired object without generating/consuming a dummy count value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/builtins/type.rs`:
- Around line 965-970: The current code stores
ext.specialization_cache.getitem_version with Ordering::Relaxed after calling
ext.specialization_cache.swap_getitem (which uses AcqRel via
PyAtomicRef::swap_to_temporary_refs), creating a race where a reader can see the
new pointer but a stale version; fix by making the version store release-ordered
or by moving the store before the swap: either change the call to
ext.specialization_cache.getitem_version.store(func_version, Ordering::Release)
or perform getitem_version.store(...) immediately before calling
ext.specialization_cache.swap_getitem(Some(getitem), Some(vm)) so the version
publish is covered by Release semantics.

In `@crates/vm/src/vm/context.rs`:
- Around line 401-416: The instructions array in vm::context.rs contains an
unreachable Resume opcode placed after Instruction::ReturnValue; remove the
CodeUnit with Instruction::Resume (or move it before the ReturnValue entry) so
Resume can execute if intended—update the instructions array where CodeUnit
entries are created (the entries with Instruction::ExitInitCheck,
Instruction::ReturnValue, and Instruction::Resume) to either drop the Resume
element or reorder it to precede ReturnValue and run any associated context arg
handling accordingly.

---

Duplicate comments:
In `@crates/vm/src/frame.rs`:
- Around line 3732-3754: The fast-path for Instruction::LoadAttrModule uses only
tp_version_tag to guard (owner.downcast_ref_if_exact::<PyModule>, type_version
from code.instructions.read_cache_u32) but still calls PyModule::get_attr,
risking duplicate side effects; change the specialization to also cache and
compare the module dict state (e.g. a dict version/tag or identity pointer) and
perform a direct dict lookup for attr_name (avoiding PyModule::get_attr) so you
only fall back to load_attr_slow(vm, oparg) on a real miss or when the
dict/version check fails; apply the same change to the duplicate specialization
region referenced (around the other LoadAttrModule handler).

---

Nitpick comments:
In `@crates/vm/src/builtins/type.rs`:
- Around line 345-350: The code iterates over self.retired.read().iter() and
calls obj.traverse(tracer_fn) but uses .map(...).count() to execute side
effects; replace that with .for_each(...) to convey intent more clearly. Locate
the block using self.retired.read().iter().map(|obj|
obj.traverse(tracer_fn)).count() and change it to call .for_each(|obj|
obj.traverse(tracer_fn)) so traverse(tracer_fn) runs for each retired object
without generating/consuming a dummy count value.

In `@crates/vm/src/frame.rs`:
- Around line 1179-1253: The float/int mixed specializations omit the inplace
operators, so inplace variants never hit binary_op_extended_specialization;
update the BinaryOpExtendSpecializationDescr tables (e.g.,
BINARY_OP_EXTEND_DESCRS) to add entries mapping
bytecode::BinaryOperator::InplaceAdd, InplaceSubtract, InplaceMultiply and
InplaceTrueDivide to the same guards and actions used for
Add/Subtract/Multiply/TrueDivide (e.g., map InplaceAdd ->
float_compactlong_guard/float_compactlong_add and InplaceTrueDivide ->
nonzero_float_compactlong_guard/float_compactlong_true_div, and similarly for
compactlong_float_* entries), and apply the same additions to the other
identical descriptor arrays mentioned (the other occurrences around the
indicated ranges) so inplace ops reuse the existing specializations.
- Around line 4160-4174: The fast path in Instruction::BinaryOpSubscrStrInt only
uses the ascii_char_cache; extend it to also handle Latin-1 single-byte
non-ASCII characters by checking the character returned from
PyStr::getitem_by_index for a Latin-1 single-byte (e.g., ch.is_latin1() or
equivalent) and, when true, pop the two values and push the cached singleton
from the Latin-1 cache (e.g., vm.ctx.latin1_char_cache[ch.to_u32() as
usize].clone().into()) just like the existing ascii branch; keep the existing
ascii branch unchanged and add the Latin-1 branch alongside it after
getitem_by_index succeeds.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 52b4ddc5-216c-4f4c-b820-24eb3bf13220

📥 Commits

Reviewing files that changed from the base of the PR and between 1b61f5d and da09064.

📒 Files selected for processing (12)
  • .cspell.json
  • crates/vm/src/builtins/function.rs
  • crates/vm/src/builtins/str.rs
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/object/core.rs
  • crates/vm/src/object/ext.rs
  • crates/vm/src/protocol/callable.rs
  • crates/vm/src/stdlib/builtins.rs
  • crates/vm/src/types/slot.rs
  • crates/vm/src/vm/context.rs
  • crates/vm/src/vm/mod.rs
🚧 Files skipped from review as they are similar to previous changes (5)
  • crates/vm/src/types/slot.rs
  • crates/vm/src/stdlib/builtins.rs
  • crates/vm/src/builtins/str.rs
  • crates/vm/src/object/ext.rs
  • crates/vm/src/object/core.rs

Comment on lines +401 to +416
let instructions = [
CodeUnit {
op: Instruction::ExitInitCheck,
arg: 0.into(),
},
CodeUnit {
op: Instruction::ReturnValue,
arg: 0.into(),
},
CodeUnit {
op: Instruction::Resume {
context: bytecode::Arg::marker(),
},
arg: 0.into(),
},
];
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if other code objects in the codebase have similar patterns with Resume after ReturnValue
ast-grep --pattern $'CodeUnit {
  $$$
  op: Instruction::ReturnValue,
  $$$
}'

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

#!/bin/bash
# Check how Resume instruction is typically used in the codebase
rg -n "Instruction::Resume" --type rust -C 3

Repository: RustPython/RustPython

Length of output: 9063


🏁 Script executed:

# Read the full context of the new_init_cleanup_code function
sed -n '395,440p' crates/vm/src/vm/context.rs

Repository: RustPython/RustPython

Length of output: 1733


🏁 Script executed:

# Search for comments or documentation about this specific function
rg -n "new_init_cleanup_code\|ExitInitCheck\|init_result" crates/vm/src/vm/context.rs -B 2 -A 2

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Check if Resume is ever expected to appear after ReturnValue in other code objects
rg -B 5 -A 5 "Instruction::ReturnValue" crates/codegen/src/compile.rs | head -100

Repository: RustPython/RustPython

Length of output: 3730


🏁 Script executed:

# Search for tests or usages of new_init_cleanup_code
rg -n "new_init_cleanup_code" crates/ --type rust

Repository: RustPython/RustPython

Length of output: 289


🏁 Script executed:

# Check git log for this function (recent commits)
git log --oneline -20 -- crates/vm/src/vm/context.rs | head -20

Repository: RustPython/RustPython

Length of output: 122


🏁 Script executed:

# Check if there's any documentation or comments about Resume placement in similar contexts
rg -n "Resume.*context\|context.*Resume" crates/vm/src/vm/ --type rust -B 2 -A 2

Repository: RustPython/RustPython

Length of output: 47


Remove unreachable Resume instruction — it appears after ReturnValue and will never execute.

The Resume instruction at index 2 is placed after ReturnValue, which exits the frame. Since ReturnValue terminates execution, the Resume instruction is unreachable dead code and should be removed or repositioned before ReturnValue if quickening initialization is needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/vm/context.rs` around lines 401 - 416, The instructions array
in vm::context.rs contains an unreachable Resume opcode placed after
Instruction::ReturnValue; remove the CodeUnit with Instruction::Resume (or move
it before the ReturnValue entry) so Resume can execute if intended—update the
instructions array where CodeUnit entries are created (the entries with
Instruction::ExitInitCheck, Instruction::ReturnValue, and Instruction::Resume)
to either drop the Resume element or reorder it to precede ReturnValue and run
any associated context arg handling accordingly.
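The reachability argument behind this finding can be checked mechanically. A toy linear-scan pass over the opcode names (valid here because this three-instruction stub has no jump targets; real bytecode would need control-flow analysis):

```python
# Opcodes that unconditionally leave the frame; anything after the first
# such opcode in straight-line code is dead.
TERMINATORS = {"ReturnValue"}

def unreachable_after_return(ops):
    """Return the opcodes that follow the first unconditional terminator."""
    for i, op in enumerate(ops):
        if op in TERMINATORS:
            return ops[i + 1:]
    return []

# The instruction sequence from the flagged code object:
stub = ["ExitInitCheck", "ReturnValue", "Resume"]
dead = unreachable_after_return(stub)
```

On the flagged sequence this reports exactly the trailing Resume as unreachable, matching the review's conclusion.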

@youknowone youknowone force-pushed the specialization branch 2 times, most recently from df126ca to b91fc30, March 9, 2026 13:52
@youknowone youknowone merged commit d248a04 into RustPython:main Mar 10, 2026
13 checks passed
@youknowone youknowone deleted the specialization branch March 10, 2026 06:39