X Tutup
Skip to content

Reduce per-instruction overhead in interpreter loop#7315

Merged
youknowone merged 2 commits intoRustPython:mainfrom
youknowone:loop-opt
Mar 3, 2026
Merged

Reduce per-instruction overhead in interpreter loop#7315
youknowone merged 2 commits intoRustPython:mainfrom
youknowone:loop-opt

Conversation

@youknowone
Copy link
Member

@youknowone youknowone commented Mar 2, 2026

close #5281

  • Defer prev_line update to when tracing is active
  • Inline skip_caches_if_fallthrough, compute cache_entries once per instruction
  • Unmark test_pdb_set_trace EXPECTED_FAILURE (now passing due to prev_line fix)

Note: Moving check_signals to JumpBackward only was attempted but reverted — it broke signal delivery in non-loop code (test_signal, test_io, test_faulthandler).

Summary by CodeRabbit

  • Refactor
    • Optimized virtual machine instruction cache handling and streamlined tracing logic for improved consistency and performance across execution paths.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 2, 2026

📝 Walkthrough

Walkthrough

Inlines inline-cache handling into the VM frame dispatcher, removing the skip_caches_if_fallthrough helper; refines tracing so prev_line and opcode trace events are updated/emitted only when tracing is enabled and certain opcodes are excluded.

Changes

Cohort / File(s) Summary
Frame dispatch & cache handling
crates/vm/src/frame.rs
Removed skip_caches_if_fallthrough; inlined op.cache_entries() usage after dispatch in original, instrumented-line, and instrumented-instruction paths to advance lasti on fallthrough.
Tracing gating
crates/vm/src/frame.rs
Guarded prev_line updates and opcode trace emission behind vm.use_tracing and trace_opcodes, and excluded opcodes (Resume, InstrumentedResume, ExtendedArg, InstrumentedLine) from trace events.

Sequence Diagram(s)

sequenceDiagram
    participant VM as VM (dispatcher)
    participant Frame as Frame
    participant Op as Opcode
    participant Cache as InlineCache
    participant Trace as Tracer

    VM->>Frame: dispatch instruction
    Frame->>Op: resolve original/instrumented opcode
    Op-->>Frame: execute / may jump
    Frame->>Cache: op.cache_entries()
    alt no jump (fallthrough)
        Frame->>Frame: lasti += cache_count
    end
    alt tracing enabled
        Frame->>Trace: emit opcode trace (filtered)
        Trace-->>Frame: ack
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • instruction CACHE #7251: Modifies crates/vm/src/frame.rs with cache-aware dispatch helpers and relative-jump handling; directly overlaps with this PR's cache/dispatch changes.

Poem

🐰 I nibble at frames with joyful cheer,
Inline caches hop — no helper near.
Tracing watches only when invited,
Opcodes and jumps now neatly sighted.
Hooray — the VM thumps softly, light and clear!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main optimization focus: reducing per-instruction overhead in the interpreter loop through cache handling and tracing logic refinements.
Linked Issues check ✅ Passed The PR addresses the core objective of #5281 by deferring prev_line updates and inlining cache handling, though the attempted check_signals optimization was reverted due to side effects.
Out of Scope Changes check ✅ Passed Changes are scoped to interpreter loop optimization: cache handling, prev_line update deferral, and test unmarking directly support the stated objectives with no extraneous modifications.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@youknowone youknowone force-pushed the loop-opt branch 3 times, most recently from be78ddd to 236c1cb Compare March 3, 2026 00:41
@youknowone youknowone marked this pull request as ready for review March 3, 2026 00:41
@github-actions
Copy link
Contributor

github-actions bot commented Mar 3, 2026

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[x] lib: cpython/Lib/doctest.py
[ ] test: cpython/Lib/test/test_doctest (TODO: 6)

dependencies:

  • doctest

dependent tests: (33 tests)

  • doctest: test_builtin test_cmd test_code test_collections test_ctypes test_decimal test_deque test_descrtut test_difflib test_doctest test_doctest2 test_enum test_extcall test_generators test_getopt test_heapq test_http_cookies test_itertools test_listcomps test_math test_metaclass test_pep646_syntax test_pickle test_pickletools test_setcomps test_statistics test_syntax test_threading_local test_typing test_unpack test_unpack_ex test_weakref test_zipimport

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/vm/src/frame.rs (1)

1193-1193: ⚠️ Potential issue | 🟠 Major

Remove check_signals call from execute_instruction and add it only to backward jump handling.

The PR objective includes "Move check_signals from every instruction to JumpBackward only," but vm.check_signals()?; remains in execute_instruction() (line 1193), which is invoked for every instruction. This call defeats the optimization goal. To complete the PR objective, remove this call and add check_signals only to JumpBackward and JumpBackwardNoInterrupt instruction handlers to reduce per-instruction overhead while maintaining signal handling in loops.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` at line 1193, Remove the per-instruction
vm.check_signals()? call from execute_instruction() so it no longer runs for
every opcode; instead, insert vm.check_signals()? only inside the handlers for
JumpBackward and JumpBackwardNoInterrupt (immediately before performing the
backward jump/pc update and preserving the error propagation with ?). Update
JumpBackward and JumpBackwardNoInterrupt handlers to call vm.check_signals()? at
the point where you detect/perform a backward jump so loop interruptions are
still handled, and ensure you remove the redundant call from execute_instruction
to avoid per-instruction overhead.
🧹 Nitpick comments (1)
crates/vm/src/frame.rs (1)

3419-3422: Consider extracting cache-skip logic into a helper (optional).

The cache-adjustment pattern appears in three places: the main loop (lines 786-789), InstrumentedLine (lines 3419-3422), and InstrumentedInstruction (lines 3454-3457). While the instrumented paths are cold and compute orig_caches differently, a small helper could reduce duplication:

#[inline]
fn skip_caches_if_fallthrough(&mut self, caches: u8, lasti_before: u32) {
    if caches > 0 && self.lasti() == lasti_before {
        self.update_lasti(|i| *i += caches as u32);
    }
}

This is optional given the cold path context and slight differences in how caches is computed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 3419 - 3422, Extract the duplicated
cache-adjustment block into a small helper (e.g., fn
skip_caches_if_fallthrough(&mut self, caches: u8, lasti_before: u32)) and
replace the three occurrences (main loop, InstrumentedLine block using
orig_caches from original_op, and InstrumentedInstruction) with calls to that
helper; ensure the helper checks caches > 0 and self.lasti() == lasti_before and
then calls self.update_lasti(|i| *i += caches as u32), keep it #[inline] if
desired, and adjust call sites to pass the computed caches and lasti_before
values (matching the types used where orig_caches is computed).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@crates/vm/src/frame.rs`:
- Line 1193: Remove the per-instruction vm.check_signals()? call from
execute_instruction() so it no longer runs for every opcode; instead, insert
vm.check_signals()? only inside the handlers for JumpBackward and
JumpBackwardNoInterrupt (immediately before performing the backward jump/pc
update and preserving the error propagation with ?). Update JumpBackward and
JumpBackwardNoInterrupt handlers to call vm.check_signals()? at the point where
you detect/perform a backward jump so loop interruptions are still handled, and
ensure you remove the redundant call from execute_instruction to avoid
per-instruction overhead.

---

Nitpick comments:
In `@crates/vm/src/frame.rs`:
- Around line 3419-3422: Extract the duplicated cache-adjustment block into a
small helper (e.g., fn skip_caches_if_fallthrough(&mut self, caches: u8,
lasti_before: u32)) and replace the three occurrences (main loop,
InstrumentedLine block using orig_caches from original_op, and
InstrumentedInstruction) with calls to that helper; ensure the helper checks
caches > 0 and self.lasti() == lasti_before and then calls self.update_lasti(|i|
*i += caches as u32), keep it #[inline] if desired, and adjust call sites to
pass the computed caches and lasti_before values (matching the types used where
orig_caches is computed).

ℹ️ Review info

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 43e2df3 and 236c1cb.

⛔ Files ignored due to path filters (1)
  • Lib/test/test_doctest/test_doctest.py is excluded by !Lib/**
📒 Files selected for processing (1)
  • crates/vm/src/frame.rs

- Defer prev_line update to when tracing is active
- Inline skip_caches_if_fallthrough, compute cache_entries once per instruction
prev_line deferral fix corrects pdb line tracking, making
these doctests pass.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/frame.rs`:
- Around line 755-763: The prev_line update logic in frame.rs currently skips
Instruction::Resume but not Instruction::InstrumentedResume, which allows
InstrumentedResume to pre-set prev_line and suppress the first real Line trace;
update the conditional inside the vm.use_tracing.get() branch (the match against
op) to also exclude Instruction::InstrumentedResume alongside Resume,
ExtendedArg, and InstrumentedLine so prev_line is not updated for instrumented
resume operations (look for the match that checks Instruction::Resume { .. } |
Instruction::ExtendedArg | Instruction::InstrumentedLine and add
Instruction::InstrumentedResume { .. } to that set).

ℹ️ Review info

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 236c1cb and b078243.

⛔ Files ignored due to path filters (1)
  • Lib/test/test_doctest/test_doctest.py is excluded by !Lib/**
📒 Files selected for processing (1)
  • crates/vm/src/frame.rs

Comment on lines +755 to +763
// Update prev_line only when tracing or monitoring is active.
// When neither is enabled, prev_line is stale but unused.
if vm.use_tracing.get() {
if !matches!(
op,
Instruction::Resume { .. }
| Instruction::InstrumentedResume
| Instruction::ExtendedArg
)
{
vm.trace_event(crate::protocol::TraceEvent::Opcode, None)?;
| Instruction::InstrumentedLine
) && let Some((loc, _)) = self.code.locations.get(idx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Exclude InstrumentedResume when updating prev_line.

Line 758 excludes Resume but not InstrumentedResume. Under tracing + monitoring instrumentation, this can pre-set prev_line on resume and suppress the first real Line trace event.

🔧 Proposed fix
-            // Update prev_line only when tracing or monitoring is active.
-            // When neither is enabled, prev_line is stale but unused.
+            // Update prev_line only when tracing is active.
+            // Keep RESUME variants excluded so the first real instruction
+            // can still trigger a line event.
             if vm.use_tracing.get() {
                 if !matches!(
                     op,
                     Instruction::Resume { .. }
+                        | Instruction::InstrumentedResume
                         | Instruction::ExtendedArg
                         | Instruction::InstrumentedLine
                 ) && let Some((loc, _)) = self.code.locations.get(idx)
                 {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Update prev_line only when tracing or monitoring is active.
// When neither is enabled, prev_line is stale but unused.
if vm.use_tracing.get() {
if !matches!(
op,
Instruction::Resume { .. }
| Instruction::InstrumentedResume
| Instruction::ExtendedArg
)
{
vm.trace_event(crate::protocol::TraceEvent::Opcode, None)?;
| Instruction::InstrumentedLine
) && let Some((loc, _)) = self.code.locations.get(idx)
// Update prev_line only when tracing is active.
// Keep RESUME variants excluded so the first real instruction
// can still trigger a line event.
if vm.use_tracing.get() {
if !matches!(
op,
Instruction::Resume { .. }
| Instruction::InstrumentedResume
| Instruction::ExtendedArg
| Instruction::InstrumentedLine
) && let Some((loc, _)) = self.code.locations.get(idx)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 755 - 763, The prev_line update logic in
frame.rs currently skips Instruction::Resume but not
Instruction::InstrumentedResume, which allows InstrumentedResume to pre-set
prev_line and suppress the first real Line trace; update the conditional inside
the vm.use_tracing.get() branch (the match against op) to also exclude
Instruction::InstrumentedResume alongside Resume, ExtendedArg, and
InstrumentedLine so prev_line is not updated for instrumented resume operations
(look for the match that checks Instruction::Resume { .. } |
Instruction::ExtendedArg | Instruction::InstrumentedLine and add
Instruction::InstrumentedResume { .. } to that set).

@youknowone youknowone merged commit 7727acc into RustPython:main Mar 3, 2026
14 checks passed
@youknowone youknowone deleted the loop-opt branch March 3, 2026 09:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

X Tutup