
More fork safety #7380

Merged
youknowone merged 9 commits into RustPython:main from youknowone:implock
Mar 8, 2026

Conversation

@youknowone
Member

@youknowone youknowone commented Mar 7, 2026

Summary by CodeRabbit

  • New Features

  • Thread API: ThreadHandle adds ident, is_done, _set_done, and join(timeout); thread-start APIs accept a consolidated argument form.
    • Mutexes: lock wait points can run a user-provided wrapper while waiting.
  • Bug Fixes

    • Reduced spurious signal handling and tightened stop-the-world/shutdown synchronization to avoid lost wakeups and races.
    • Improved fork-time warnings when forking from multi-threaded processes.
  • Performance

    • Blocking I/O and OS waits now release the interpreter lock for better concurrency.
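The improved fork-time warning above depends on knowing whether the forking process currently has more than one OS thread. A minimal sketch of the Linux side of such a heuristic; the helper names are hypothetical, and the macOS/fallback paths are omitted:

```rust
use std::fs;

// Hypothetical helper: parse the "Threads:" field of /proc/self/status.
fn parse_thread_count(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("Threads:"))
        .and_then(|rest| rest.trim().parse().ok())
}

// Linux path of the heuristic; other platforms would query the kernel
// differently and fall back to None here.
fn os_thread_count() -> Option<u64> {
    fs::read_to_string("/proc/self/status")
        .ok()
        .and_then(|s| parse_thread_count(&s))
}

fn main() {
    // A synthetic status snippet keeps the example portable.
    let sample = "Name:\tpython\nThreads:\t3\nPid:\t42\n";
    assert_eq!(parse_thread_count(sample), Some(3));

    // On Linux, warn before fork() when more than one OS thread exists.
    if let Some(n) = os_thread_count() {
        if n > 1 {
            eprintln!("warning: fork() called from a process with {n} OS threads");
        }
    }
}
```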

@coderabbitai
Contributor

coderabbitai bot commented Mar 7, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a lock_wrapped API and uses it with vm.allow_threads to release the VM/GIL during blocking waits; refactors stop-the-world state to world_stopped plus a thread_countdown, adds per-thread stop_requested, tightens signal-exception gating, and overhauls thread start/join/shutdown lifecycles.

Changes

Cohort / File(s) / Summary

  • Lock wrapping (crates/common/src/lock/thread_mutex.rs)
    Added lock_wrapped to RawThreadMutex and ThreadMutex to run a user-provided wrapper (e.g., vm.allow_threads) around the blocking lock operation.
  • GIL release for OS waits & multiplexing (crates/stdlib/src/multiprocessing.rs, crates/stdlib/src/select.rs)
    Blocking OS calls (semaphores, select, poll, epoll) now execute inside vm.allow_threads closures so the VM/GIL is released during OS-level waits.
  • I/O locking and blocking behavior (crates/vm/src/stdlib/io.rs)
    Replaced direct mutex locks with lock_wrapped calls so the VM/GIL is released while blocking on I/O locks.
  • Signal handling (crates/vm/src/frame.rs, crates/vm/src/signal.rs)
    Signal-exception handling in ExecutingFrame::run is now gated by eval_breaker_tripped(); added pub(crate) fn is_triggered() and updated the signal-exception handler signature.
  • Stop-the-world redesign (crates/vm/src/vm/mod.rs, crates/vm/src/vm/thread.rs)
    Replaced detach-yields with world_stopped: AtomicBool and thread_countdown: AtomicI64; added a per-thread stop_requested bit, countdown helpers, requester/notify helpers, and simplified attach/detach/suspend/resume flows; changed the detach_thread signature.
  • Thread API & lifecycle overhaul (crates/vm/src/stdlib/thread.rs)
    Refactored start_new_thread/start_joinable_thread to accept FuncArgs, added strict argument validation and audit hook calls, added ThreadHandle methods (ident, is_done, _set_done, join) and join helpers; reworked shutdown and join semantics.
  • Posix fork/thread counting (crates/vm/src/stdlib/posix.rs)
    Added OS-thread-count heuristics (macOS/Linux/fallback) and refactored fork warning logic to use OS data when available and warn after the parent resumes.
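The stop-the-world redesign described above (a world_stopped flag plus a countdown of attached threads that still have to suspend) can be sketched roughly as follows. All names and the condvar-based wakeup are illustrative stand-ins, not the actual RustPython code:

```rust
use std::sync::atomic::{AtomicBool, AtomicI64, Ordering};
use std::sync::{Condvar, Mutex};

// Illustrative stand-in for the VM's stop-the-world state.
struct StopTheWorld {
    world_stopped: AtomicBool,
    thread_countdown: AtomicI64, // attached threads that still must suspend
    lock: Mutex<()>,
    cv: Condvar,
}

impl StopTheWorld {
    fn new(attached_threads: i64) -> Self {
        Self {
            world_stopped: AtomicBool::new(false),
            thread_countdown: AtomicI64::new(attached_threads),
            lock: Mutex::new(()),
            cv: Condvar::new(),
        }
    }

    // Requester: wait until every other attached thread has parked.
    fn stop_the_world(&self) {
        let mut guard = self.lock.lock().unwrap();
        while self.thread_countdown.load(Ordering::Acquire) > 0 {
            guard = self.cv.wait(guard).unwrap();
        }
        self.world_stopped.store(true, Ordering::Release);
    }

    // Each suspending (or detaching/exiting) thread counts itself down
    // and wakes the requester when the countdown reaches zero.
    fn notify_thread_parked(&self) {
        if self.thread_countdown.fetch_sub(1, Ordering::AcqRel) == 1 {
            let _guard = self.lock.lock().unwrap();
            self.cv.notify_all();
        }
    }
}

fn main() {
    let stw = std::sync::Arc::new(StopTheWorld::new(2));
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let stw = stw.clone();
            std::thread::spawn(move || stw.notify_thread_parked())
        })
        .collect();
    stw.stop_the_world();
    for h in handles {
        h.join().unwrap();
    }
    assert!(stw.world_stopped.load(Ordering::Acquire));
}
```

The while loop guards against spurious condvar wakeups, and the notifier takes the mutex before notifying so the requester cannot miss a wakeup between its countdown load and its wait.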

Sequence Diagram(s)

sequenceDiagram
    participant Py as Python code (caller)
    participant VM as VirtualMachine
    participant Mutex as ThreadMutex/RawThreadMutex
    participant OS as OS blocking syscall

    Py->>VM: request lock_wrapped(wrap_fn)
    VM->>Mutex: lock_wrapped(wrap_fn)
    Mutex->>VM: invoke wrap_fn(do_lock)
    VM->>OS: vm.allow_threads(do_lock)  -- releases VM/GIL
    OS-->>Mutex: blocking wait returns
    Mutex-->>VM: set owner and return
    VM-->>Py: return guard / success
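The diagrammed flow can be sketched as a simplified lock_wrapped helper. The types below are stand-ins for RawThreadMutex/ThreadMutex, the signature is invented for illustration, and the wrapper closure plays the role of vm.allow_threads:

```rust
use std::sync::{Mutex, MutexGuard};

// Simplified stand-in for ThreadMutex: the caller-supplied wrapper runs
// *around* the blocking lock call, so it can release the GIL (or do any
// other bookkeeping) for the duration of the wait.
struct WrappableMutex<T> {
    inner: Mutex<T>,
}

impl<T> WrappableMutex<T> {
    fn new(value: T) -> Self {
        Self { inner: Mutex::new(value) }
    }

    // The wrapper receives the blocking acquire as a closure and decides
    // how to run it (RustPython would pass vm.allow_threads here).
    fn lock_wrapped<'a>(
        &'a self,
        wrap: impl FnOnce(Box<dyn FnOnce() -> MutexGuard<'a, T> + 'a>) -> MutexGuard<'a, T>,
    ) -> MutexGuard<'a, T> {
        wrap(Box::new(move || self.inner.lock().unwrap()))
    }
}

fn main() {
    let m = WrappableMutex::new(0);
    let mut wrapper_ran = false;
    {
        let mut guard = m.lock_wrapped(|do_lock| {
            wrapper_ran = true; // e.g. release the GIL here
            do_lock()           // blocking acquire runs inside the wrapper
        });
        *guard += 1;
    }
    assert!(wrapper_ran);
    assert_eq!(*m.inner.lock().unwrap(), 1);
}
```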

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~110 minutes


Suggested reviewers

  • ShaharNaveh
  • fanninpm

Poem

🐇
I nudged the lock and let it sleep,
While other threads could dance and leap.
A breaker hums, a countdown peeps,
I boxed the wait and freed the heap,
Hooray — concurrency’s tiny beep!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Title check: ❓ Inconclusive. The title 'More fork safety' is vague and generic and doesn't clearly convey the specific changes in this pull request. Resolution: use a more descriptive title that captures the main focus, such as 'Add lock_wrapped for GIL-safe mutex operations' or 'Improve thread safety in fork and stop-the-world logic'.

✅ Passed checks (2 passed)

  • Description check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Docstring coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required 80.00% threshold.



@youknowone youknowone force-pushed the implock branch 3 times, most recently from db453fc to b29c3b8 (March 7, 2026 15:36)
@github-actions
Contributor

github-actions bot commented Mar 7, 2026

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[ ] lib: cpython/Lib/asyncio
[ ] test: cpython/Lib/test/test_asyncio (TODO: 37)

dependencies:

  • asyncio (native: _asyncio, _overlapped, _pyrepl.console, _pyrepl.main, _pyrepl.simple_interact, _remote_debugging, _winapi, asyncio.tools, base_events, collections.abc, concurrent.futures, coroutines, errno, events, exceptions, futures, graph, itertools, locks, log, math, msvcrt, protocols, queues, readline, runners, streams, sys, taskgroups, tasks, threads, time, timeouts, transports, unix_events, windows_events)
    • collections (native: _collections, _weakref, itertools, sys)
    • logging (native: atexit, collections.abc, email.message, email.utils, errno, http.client, logging.handlers, multiprocessing.queues, select, sys, time, urllib.parse, win32evtlog, win32evtlogutil)
    • site (native: _io, _pyrepl.main, _pyrepl.pager, _pyrepl.readline, _pyrepl.unix_console, _pyrepl.windows_console, atexit, builtins, errno, readline, sitecustomize, sys, usercustomize)
    • tokenize (native: _tokenize, builtins, itertools, sys)
    • _colorize, argparse, ast, contextlib, contextvars, dataclasses, enum, functools, heapq, inspect, io, linecache, os, reprlib, rlcompleter, selectors, signal, socket, ssl, stat, struct, subprocess, tempfile, threading, traceback, types, warnings, weakref

dependent tests: (7 tests)

  • asyncio: test_asyncio test_contextlib_async test_inspect test_logging test_os test_sys_settrace test_unittest

[x] lib: cpython/Lib/threading.py
[x] lib: cpython/Lib/_threading_local.py
[ ] test: cpython/Lib/test/test_threading.py (TODO: 16)
[ ] test: cpython/Lib/test/test_threadedtempfile.py
[ ] test: cpython/Lib/test/test_threading_local.py (TODO: 3)

dependencies:

  • threading

dependent tests: (148 tests)

  • threading: test_android test_asyncio test_bz2 test_code test_concurrent_futures test_contextlib test_ctypes test_decimal test_docxmlrpc test_email test_enum test_fork1 test_frame test_ftplib test_functools test_gc test_hashlib test_httplib test_httpservers test_imaplib test_importlib test_inspect test_io test_itertools test_largefile test_linecache test_logging test_memoryview test_opcache test_pathlib test_poll test_queue test_robotparser test_sched test_signal test_smtplib test_socket test_socketserver test_sqlite3 test_ssl test_subprocess test_super test_sys test_syslog test_termios test_threadedtempfile test_threading test_threading_local test_time test_urllib2_localnet test_weakref test_winreg test_wsgiref test_xmlrpc test_zstd
    • asyncio: test_asyncio test_contextlib_async test_os test_sys_settrace test_unittest
    • bdb: test_bdb
    • concurrent.futures._base: test_concurrent_futures
    • concurrent.futures.process: test_compileall test_concurrent_futures
    • concurrent.futures.thread: test_genericalias
    • dummy_threading: test_dummy_threading
    • http.cookiejar: test_http_cookiejar test_urllib2
      • urllib.request: test_pathlib test_pydoc test_sax test_site test_urllib test_urllib2net test_urllibnet
    • importlib.util: test_ctypes test_doctest test_importlib test_pkgutil test_py_compile test_reprlib test_runpy test_zipfile test_zipimport
      • py_compile: test_argparse test_cmd_line_script test_importlib test_multiprocessing_main_handling
      • pyclbr: test_pyclbr
      • sysconfig: test_c_locale_coercion test_dtrace test_launcher test_osx_env test_posix test_pyexpat test_regrtest test_support test_sysconfig test_tools test_venv
      • zipfile: test_shutil test_zipapp test_zipfile test_zipfile64
    • logging: test_unittest
      • hashlib: test_hmac test_tarfile test_unicodedata
    • multiprocessing: test_fcntl test_re
    • queue: test_dummy_thread
    • subprocess: test_atexit test_audit test_cmd_line test_ctypes test_faulthandler test_file_eintr test_gzip test_json test_msvcrt test_ntpath test_platform test_plistlib test_quopri test_repl test_script_helper test_select test_tempfile test_traceback test_unittest test_utf8_mode test_wait3 test_webbrowser
      • ctypes.util: test_ctypes
      • ensurepip: test_ensurepip
      • platform: test__locale test__osx_support test_baseexception test_builtin test_cmath test_math test_mimetypes
    • sysconfig:
      • trace: test_trace
    • zipfile:
      • shutil: test_filecmp test_glob test_string_literals test_unicode_file test_zoneinfo

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated

@youknowone youknowone marked this pull request as ready for review March 7, 2026 16:28
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/vm/src/vm/thread.rs (1)

187-209: ⚠️ Potential issue | 🔴 Critical

Detaching a requested thread needs to wake the stop-the-world requester.

When park_detached_threads() marks an ATTACHED thread with stop_requested = true, a subsequent call to detach_thread() (via allow_threads()) can transition it to THREAD_DETACHED without notifying the requester. This leaves the thread countdown outstanding. The blocked stop_the_world() loop will only advance when either the countdown reaches zero or park_detached_threads() is called again to rescan the detached thread, causing unnecessary stalling.
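A sketch of the shape of the requested fix follows; the state constants, slot layout, and helper names are illustrative, not the actual thread.rs code. The point is that a thread detaching after being marked stop_requested must perform the same countdown/notify step a suspending thread would:

```rust
use std::sync::atomic::{AtomicBool, AtomicI64, AtomicU8, Ordering};

const THREAD_ATTACHED: u8 = 0;
const THREAD_DETACHED: u8 = 1;

// Illustrative per-thread slot and global stop-the-world countdown.
struct ThreadSlot {
    state: AtomicU8,
    stop_requested: AtomicBool,
}

struct StopTheWorld {
    thread_countdown: AtomicI64,
}

impl StopTheWorld {
    fn notify_thread_parked(&self) {
        // Real code would also wake the blocked requester here.
        self.thread_countdown.fetch_sub(1, Ordering::AcqRel);
    }
}

fn detach_thread(slot: &ThreadSlot, stw: &StopTheWorld) {
    if slot
        .state
        .compare_exchange(
            THREAD_ATTACHED,
            THREAD_DETACHED,
            Ordering::AcqRel,
            Ordering::Acquire,
        )
        .is_ok()
    {
        // The fix: if a stop-the-world request already counted this thread,
        // detaching must advance the requester's countdown; otherwise the
        // requester stalls until the next park_detached_threads() rescan.
        if slot.stop_requested.swap(false, Ordering::AcqRel) {
            stw.notify_thread_parked();
        }
    }
}

fn main() {
    let slot = ThreadSlot {
        state: AtomicU8::new(THREAD_ATTACHED),
        stop_requested: AtomicBool::new(true), // set by park_detached_threads()
    };
    let stw = StopTheWorld {
        thread_countdown: AtomicI64::new(1),
    };
    detach_thread(&slot, &stw);
    assert_eq!(stw.thread_countdown.load(Ordering::Acquire), 0);
}
```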

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/vm/thread.rs` around lines 187 - 209, detaching an ATTACHED
thread can race with a prior park_detached_threads() that set
stop_requested=true and so detach_thread() must notify the stop-the-world
requester; update detach_thread() (inside CURRENT_THREAD_SLOT handling when
compare_exchange succeeds going from THREAD_ATTACHED to THREAD_DETACHED) to
check the thread's stop_requested flag and, if true, call the same notification
used by park_detached_threads()/stop_the_world (i.e., invoke the stop-the-world
wake/notify function used elsewhere) so the stop_the_world() countdown/loop is
advanced when a requested thread is detached.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/stdlib/io.rs`:
- Line 5339: The code currently uses std::io::Read::read_to_end(), which hides
EINTR and prevents vm.check_signals() from running on the unbounded-read path
(size < 0); replace that call with an explicit chunked loop that uses
crt_fd::read() to fill a Vec<u8> (read fixed-size chunks, e.g., 8KiB), append
each successful chunk to the buffer, break on EOF (0 bytes), and on read errors
return Err so EINTR is propagated to the outer error handler; make sure to
reference and replace the read_to_end() call site and keep vm.check_signals()
semantics by letting crt_fd::read() surface EINTR instead of swallowing it.

In `@crates/vm/src/vm/mod.rs`:
- Around line 334-343: The fast-return path when initial_countdown == 0 leaves
stats_last_wait_ns unchanged; update the code in the branch (inside the block
after self.world_stopped.store(...) and before returning) to reset the VM's
last-wait statistic (the atomic backing field used by
stats_last_wait_ns/stats_snapshot) to zero using the same atomic semantics used
elsewhere (e.g., an atomic store with appropriate Ordering), so stats_snapshot()
does not report the previous stop's wait time after a no-wait stop; locate the
change around init_thread_countdown and world_stopped in stop() in vm::mod.rs.
- Around line 1931-1949: The fast-path in eval_breaker_tripped() performs a
Relaxed load of state.finalizing which can bypass the later Acquire load in
check_signals(), breaking the Release→Acquire synchronization from the shutdown
write; change the atomic load on state.finalizing in eval_breaker_tripped() to
Ordering::Acquire (or otherwise ensure an Acquire-load or appropriate fence is
executed when checking finalizing) so that the shutdown write in interpreter.rs
(Release) becomes visible on other threads; references: eval_breaker_tripped,
state.finalizing, check_signals, and the shutdown write in interpreter.rs.

In `@crates/vm/src/vm/thread.rs`:
- Around line 426-462: The cleanup_current_thread_frames path can call
vm.state.stop_the_world.notify_thread_gone() for threads that were
THREAD_DETACHED before the stop-the-world began; only decrement the requester
countdown for threads that were counted by init_thread_countdown. Update the
final conditional in cleanup_current_thread_frames to also ensure the removed
slot's state is not THREAD_DETACHED (e.g. add slot.state.load(...) !=
THREAD_DETACHED alongside the existing THREAD_SUSPENDED check) before calling
stop_the_world.notify_thread_gone(); reference symbols:
cleanup_current_thread_frames, vm.state.stop_the_world.notify_thread_gone,
init_thread_countdown, THREAD_DETACHED, THREAD_SUSPENDED, slot.state.
- Around line 247-252: Reads of the stop-request bit in suspend_if_needed and
stop_requested_for_current_thread currently use Ordering::Relaxed; change those
.load(Ordering::Relaxed) calls on the stop_requested AtomicBool to
.load(Ordering::Acquire) so they synchronize with park_detached_threads() which
publishes the bit with Ordering::Release (ensuring eval_breaker_tripped() sees
the update and threads suspend correctly).

---

Outside diff comments:
In `@crates/vm/src/vm/thread.rs`:
- Around line 187-209: detaching an ATTACHED thread can race with a prior
park_detached_threads() that set stop_requested=true and so detach_thread() must
notify the stop-the-world requester; update detach_thread() (inside
CURRENT_THREAD_SLOT handling when compare_exchange succeeds going from
THREAD_ATTACHED to THREAD_DETACHED) to check the thread's stop_requested flag
and, if true, call the same notification used by
park_detached_threads()/stop_the_world (i.e., invoke the stop-the-world
wake/notify function used elsewhere) so the stop_the_world() countdown/loop is
advanced when a requested thread is detached.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 9daf3542-35f7-4fa9-8ecd-777f0f7c9a5d

📥 Commits

Reviewing files that changed from the base of the PR and between f680f8a and b29c3b8.

⛔ Files ignored due to path filters (2)
  • Lib/test/test_asyncio/test_events.py is excluded by !Lib/**
  • Lib/test/test_threading.py is excluded by !Lib/**
📒 Files selected for processing (9)
  • crates/common/src/lock/thread_mutex.rs
  • crates/stdlib/src/multiprocessing.rs
  • crates/stdlib/src/select.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/signal.rs
  • crates/vm/src/stdlib/io.rs
  • crates/vm/src/stdlib/thread.rs
  • crates/vm/src/vm/mod.rs
  • crates/vm/src/vm/thread.rs

};
use crossbeam_utils::atomic::AtomicCell;
use std::io::{Read, Write};
use std::io::Read;

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Repository: RustPython/RustPython

🏁 Scripts executed (outputs collapsed):

# Check the file and the lines cited in the review
wc -l crates/vm/src/stdlib/io.rs
sed -n '5335,5345p' crates/vm/src/stdlib/io.rs
sed -n '5760,5800p' crates/vm/src/stdlib/io.rs

# Locate crt_fd and read_to_end usage
rg "crt_fd" crates/vm/src/stdlib/io.rs | head -20
rg "read_to_end" crates/vm/src/stdlib/io.rs
rg "crt_fd::read" crates/vm/src/stdlib/io.rs
rg -B5 -A5 "crt_fd::read\(handle" crates/vm/src/stdlib/io.rs
rg "DEFAULT_BUFFER_SIZE" crates/vm/src/stdlib/io.rs | head -5

# Inspect the crt_fd module itself
fd "crt_fd" crates/
cat crates/common/src/crt_fd.rs

Replace read_to_end() with explicit crt_fd::read() loop to preserve signal checking on EINTR.

The current code uses std::io::Read::read_to_end(), which internally retries on EINTR, preventing the outer error handler from triggering vm.check_signals(). This breaks the unbounded-read path (size < 0) signal handling.

Use a chunked read loop with crt_fd::read() instead, which surfaces EINTR to the error handler:

Replace read_to_end() with chunked crt_fd::read() loop
-                let mut bytes = vec![];
-                // Loop on EINTR (PEP 475)
-                loop {
-                    match vm.allow_threads(|| {
-                        let mut h = handle;
-                        h.read_to_end(&mut bytes)
-                    }) {
-                        Ok(_) => break,
-                        Err(e) if e.raw_os_error() == Some(libc::EINTR) => {
-                            vm.check_signals()?;
-                            continue;
-                        }
-                        // Non-blocking mode: return None if EAGAIN (only if no data read yet)
-                        Err(e) if e.raw_os_error() == Some(libc::EAGAIN) => {
-                            if bytes.is_empty() {
-                                return Ok(None);
-                            }
-                            break;
-                        }
-                        Err(e) => return Err(Self::io_error(zelf, e, vm)),
-                    }
-                }
+                let mut bytes = vec![];
+                loop {
+                    let mut chunk = vec![0; super::DEFAULT_BUFFER_SIZE];
+                    match vm.allow_threads(|| crt_fd::read(handle, &mut chunk)) {
+                        Ok(0) => break,
+                        Ok(n) => bytes.extend_from_slice(&chunk[..n]),
+                        Err(e) if e.raw_os_error() == Some(libc::EINTR) => {
+                            vm.check_signals()?;
+                            continue;
+                        }
+                        Err(e) if e.raw_os_error() == Some(libc::EAGAIN) => {
+                            if bytes.is_empty() {
+                                return Ok(None);
+                            }
+                            break;
+                        }
+                        Err(e) => return Err(Self::io_error(zelf, e, vm)),
+                    }
+                }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/io.rs` at line 5339, The code currently uses
std::io::Read::read_to_end(), which hides EINTR and prevents vm.check_signals()
from running on the unbounded-read path (size < 0); replace that call with an
explicit chunked loop that uses crt_fd::read() to fill a Vec<u8> (read
fixed-size chunks, e.g., 8KiB), append each successful chunk to the buffer,
break on EOF (0 bytes), and on read errors return Err so EINTR is propagated to
the outer error handler; make sure to reference and replace the read_to_end()
call site and keep vm.check_signals() semantics by letting crt_fd::read()
surface EINTR instead of swallowing it.

Comment on lines +334 to +343
let initial_countdown = self.init_thread_countdown(vm);
stw_trace(format_args!("stop begin requester={requester_ident}"));
if initial_countdown == 0 {
self.world_stopped.store(true, Ordering::Release);
#[cfg(debug_assertions)]
self.debug_assert_all_non_requester_suspended(vm);
stw_trace(format_args!(
"stop end requester={requester_ident} wait_ns=0 polls=0"
));
return;

⚠️ Potential issue | 🟡 Minor

Reset last_wait_ns on the zero-thread fast path.

When Lines 336-343 return early, stats_last_wait_ns is left unchanged, so stats_snapshot() can report the previous stop's wait time after a no-wait stop.

Proposed fix
         let initial_countdown = self.init_thread_countdown(vm);
         stw_trace(format_args!("stop begin requester={requester_ident}"));
         if initial_countdown == 0 {
+            self.stats_last_wait_ns.store(0, Ordering::Relaxed);
             self.world_stopped.store(true, Ordering::Release);
             #[cfg(debug_assertions)]
             self.debug_assert_all_non_requester_suspended(vm);
             stw_trace(format_args!(
                 "stop end requester={requester_ident} wait_ns=0 polls=0"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/vm/mod.rs` around lines 334 - 343, The fast-return path when
initial_countdown == 0 leaves stats_last_wait_ns unchanged; update the code in
the branch (inside the block after self.world_stopped.store(...) and before
returning) to reset the VM's last-wait statistic (the atomic backing field used
by stats_last_wait_ns/stats_snapshot) to zero using the same atomic semantics
used elsewhere (e.g., an atomic store with appropriate Ordering), so
stats_snapshot() does not report the previous stop's wait time after a no-wait
stop; locate the change around init_thread_countdown and world_stopped in stop()
in vm::mod.rs.

Comment on lines +1931 to +1949
#[inline]
pub(crate) fn eval_breaker_tripped(&self) -> bool {
#[cfg(feature = "threading")]
if self.state.finalizing.load(Ordering::Relaxed) && !self.is_main_thread() {
return true;
}

#[cfg(all(unix, feature = "threading"))]
if thread::stop_requested_for_current_thread() {
return true;
}

#[cfg(not(target_arch = "wasm32"))]
if crate::signal::is_triggered() {
return true;
}

false
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Scripts executed (outputs collapsed):

# Find eval_breaker_tripped call sites and the atomics involved
rg -n -C2 '\beval_breaker_tripped\s*\(' crates/vm/src
rg -n -C2 'finalizing\.(load|store)\(Ordering::' crates/vm/src
rg -n -C2 'stop_requested\.(load|store)\(Ordering::' crates/vm/src/vm
rg -n -C3 '\bstop_requested_for_current_thread\s*\(' crates/vm/src/vm/thread.rs

# Locate the finalizing field, its writers, and check_signals
grep -n 'finalizing' crates/vm/src/vm/mod.rs | head -15
grep -n 'finalizing\.store\|finalizing\.swap' crates/vm/src --include='*.rs' -r
sed -n '550,620p' crates/vm/src/vm/mod.rs
sed -n '1930,1975p' crates/vm/src/vm/mod.rs
sed -n '1195,1215p' crates/vm/src/frame.rs

# Inspect the shutdown sequence around the Release store
sed -n '400,450p' crates/vm/src/vm/interpreter.rs
sed -n '345,365p' crates/vm/src/vm/thread.rs
grep -n 'finalize\|shutdown' crates/vm/src/stdlib/sys.rs | head -15



Review the memory ordering on the eval_breaker_tripped() fast-path gate.

eval_breaker_tripped() at line 1934 uses Ordering::Relaxed to read finalizing, but this check gates the only path to check_signals() (frame.rs:1202-1203), which uses Ordering::Acquire. Since a Relaxed load returning false prevents the Acquire load from ever executing, the Release→Acquire synchronization pair from the shutdown write (interpreter.rs:433) is broken on weak memory models. A stale Relaxed read can cause non-main threads to miss finalization for an indeterminate period, delaying stop-the-world visibility. Consider using Ordering::Acquire in the hot-path gate or restructuring to ensure the Acquire read always executes after finalization is signaled.
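The ordering pairing at issue can be shown in isolation. This is a minimal illustrative model, not the actual VM code: a Release store only synchronizes with a later Acquire load of the same atomic, so a Relaxed fast-path load that short-circuits the Acquire path provides no visibility guarantee for the shutdown-side writes.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

struct State {
    finalizing: AtomicBool,
}

impl State {
    // Fast-path gate: uses Acquire (not Relaxed) so that observing `true`
    // synchronizes with the Release store on the shutdown path, making
    // shutdown-side writes visible before the slow path runs.
    fn eval_breaker_tripped(&self) -> bool {
        self.finalizing.load(Ordering::Acquire)
    }

    fn begin_finalizing(&self) {
        // Shutdown side: Release pairs with the Acquire load above.
        self.finalizing.store(true, Ordering::Release);
    }
}

fn main() {
    let state = State { finalizing: AtomicBool::new(false) };
    assert!(!state.eval_breaker_tripped());
    state.begin_finalizing();
    assert!(state.eval_breaker_tripped());
}
```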

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/vm/mod.rs` around lines 1931 - 1949, The fast-path in
eval_breaker_tripped() performs a Relaxed load of state.finalizing which can
bypass the later Acquire load in check_signals(), breaking the Release→Acquire
synchronization from the shutdown write; change the atomic load on
state.finalizing in eval_breaker_tripped() to Ordering::Acquire (or otherwise
ensure an Acquire-load or appropriate fence is executed when checking
finalizing) so that the shutdown write in interpreter.rs (Release) becomes
visible on other threads; references: eval_breaker_tripped, state.finalizing,
check_signals, and the shutdown write in interpreter.rs.

Comment on lines 247 to +252
pub fn suspend_if_needed(stw: &super::StopTheWorldState) {
if !stw.requested.load(Ordering::Relaxed) {
let should_suspend = CURRENT_THREAD_SLOT.with(|slot| {
slot.borrow()
.as_ref()
.is_some_and(|s| s.stop_requested.load(Ordering::Relaxed))
});

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n crates/vm/src/vm/thread.rs | sed -n '240,260p'

Repository: RustPython/RustPython

Length of output: 951


🏁 Script executed:

cat -n crates/vm/src/vm/thread.rs | sed -n '340,360p'

Repository: RustPython/RustPython

Length of output: 882


🏁 Script executed:

rg "stop_requested" crates/vm/src/vm/ -A 2 -B 2

Repository: RustPython/RustPython

Length of output: 5985


🏁 Script executed:

cat -n crates/vm/src/vm/mod.rs | sed -n '250,330p'

Repository: RustPython/RustPython

Length of output: 4299


🏁 Script executed:

rg "eval_breaker_tripped" crates/vm/src/vm/ -A 3 -B 1

Repository: RustPython/RustPython

Length of output: 402


🏁 Script executed:

cat -n crates/vm/src/vm/mod.rs | sed -n '470,495p'

Repository: RustPython/RustPython

Length of output: 1066


🏁 Script executed:

rg "eval_breaker_tripped" crates/vm/src/vm/mod.rs -A 10

Repository: RustPython/RustPython

Length of output: 421


🏁 Script executed:

rg "check_signals" crates/vm/src/ -l

Repository: RustPython/RustPython

Length of output: 306


🏁 Script executed:

fd "codeobject\|ceval" crates/vm/src/ --extension rs

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

rg "check_signals" crates/vm/src/vm/mod.rs -B 2 -A 5

Repository: RustPython/RustPython

Length of output: 1238


Use Acquire for the stop-bit loads to ensure proper synchronization.

Lines 251 and 352 read stop_requested with Ordering::Relaxed, but park_detached_threads() publishes that bit with Ordering::Release in crates/vm/src/vm/mod.rs (lines 281, 304, and elsewhere). Since eval_breaker_tripped() gates suspension on these reads, stale loads can leave a requested thread running past stop-the-world.

Minimal ordering fix
-            .is_some_and(|s| s.stop_requested.load(Ordering::Relaxed))
+            .is_some_and(|s| s.stop_requested.load(Ordering::Acquire))

Apply the same change to both suspend_if_needed() (line 251) and stop_requested_for_current_thread() (line 352).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-pub fn suspend_if_needed(stw: &super::StopTheWorldState) {
-    if !stw.requested.load(Ordering::Relaxed) {
-        let should_suspend = CURRENT_THREAD_SLOT.with(|slot| {
-            slot.borrow()
-                .as_ref()
-                .is_some_and(|s| s.stop_requested.load(Ordering::Relaxed))
-        });
+pub fn suspend_if_needed(stw: &super::StopTheWorldState) {
+    let should_suspend = CURRENT_THREAD_SLOT.with(|slot| {
+        slot.borrow()
+            .as_ref()
+            .is_some_and(|s| s.stop_requested.load(Ordering::Acquire))
+    });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/vm/thread.rs` around lines 247 - 252, Reads of the stop-request
bit in suspend_if_needed and stop_requested_for_current_thread currently use
Ordering::Relaxed; change those .load(Ordering::Relaxed) calls on the
stop_requested AtomicBool to .load(Ordering::Acquire) so they synchronize with
park_detached_threads() which publishes the bit with Ordering::Release (ensuring
eval_breaker_tripped() sees the update and threads suspend correctly).

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/stdlib/thread.rs`:
- Around line 1210-1247: The join_internal helper currently enforces public join
validations (calls check_started and checks VM finalization/current-ident) which
breaks callers from _shutdown; update join_internal to accept an internal flag
(e.g., for_shutdown: bool) or a separate private helper so _shutdown can bypass
public-only checks: remove or guard the check_started(inner, vm) call and the
finalization/current-thread checks inside join_internal (references:
join_internal, check_started, _shutdown, ThreadHandle_join, get_ident,
ThreadHandleState::Running, vm.state.finalizing) and ensure the public join()
wrapper still performs check_started and finalization checks before delegating
to join_internal(for_shutdown=false) while _shutdown calls
join_internal(for_shutdown=true) to skip those validations.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 300d9b9c-9e25-44bf-b7b6-e9cd4c933f3a

📥 Commits

Reviewing files that changed from the base of the PR and between b29c3b8 and dc18133.

📒 Files selected for processing (1)
  • crates/vm/src/stdlib/thread.rs

Comment on lines +1210 to +1247
fn join_internal(
inner: &Arc<parking_lot::Mutex<ThreadHandleInner>>,
done_event: &Arc<(parking_lot::Mutex<bool>, parking_lot::Condvar)>,
timeout_duration: Option<Duration>,
vm: &VirtualMachine,
) -> PyResult<()> {
// Convert timeout to Duration (None or negative = infinite wait)
let timeout_duration = match timeout.flatten() {
Some(Either::A(t)) if t >= 0.0 => Some(Duration::from_secs_f64(t)),
Some(Either::B(t)) if t >= 0 => Some(Duration::from_secs(t as u64)),
_ => None,
};
Self::check_started(inner, vm)?;

// Check for self-join first
{
let inner = self.inner.lock();
let current_ident = get_ident();
if inner.ident == current_ident && inner.state == ThreadHandleState::Running {
return Err(vm.new_runtime_error("cannot join current thread"));
}
}
let deadline =
timeout_duration.and_then(|timeout| std::time::Instant::now().checked_add(timeout));

// Wait for thread completion using Condvar (supports timeout)
// Loop to handle spurious wakeups
let (lock, cvar) = &*self.done_event;
let (lock, cvar) = &**done_event;
let mut done = lock.lock();

// ThreadHandle_join semantics: self-join/finalizing checks
// apply only while target thread has not reported it is exiting yet.
if !*done {
let inner_guard = inner.lock();
let current_ident = get_ident();
if inner_guard.ident == current_ident
&& inner_guard.state == ThreadHandleState::Running
{
return Err(vm.new_runtime_error("Cannot join current thread"));
}
if vm
.state
.finalizing
.load(core::sync::atomic::Ordering::Acquire)
{
return Err(vm.new_exception_msg(
vm.ctx.exceptions.python_finalization_error.to_owned(),
"cannot join thread at interpreter shutdown"
.to_owned()
.into(),
));
}

⚠️ Potential issue | 🔴 Critical

Split public join() validation from the internal shutdown wait path.

Because _shutdown() now goes through this helper, the checks at Lines 1216-1247 and Lines 1318-1324 are no longer just public-API validation. _shutdown() can legitimately see Starting handles, and once finalization is active this helper will raise instead of waiting. That sends _shutdown() down the run_unraisable() path and it returns without joining the remaining non-daemon threads. Keep check_started() / finalization validation in the public join() wrapper, or add an internal mode that skips them for shutdown callers.

Also applies to: 1314-1325

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/thread.rs` around lines 1210 - 1247, The join_internal
helper currently enforces public join validations (calls check_started and
checks VM finalization/current-ident) which breaks callers from _shutdown;
update join_internal to accept an internal flag (e.g., for_shutdown: bool) or a
separate private helper so _shutdown can bypass public-only checks: remove or
guard the check_started(inner, vm) call and the finalization/current-thread
checks inside join_internal (references: join_internal, check_started,
_shutdown, ThreadHandle_join, get_ident, ThreadHandleState::Running,
vm.state.finalizing) and ensure the public join() wrapper still performs
check_started and finalization checks before delegating to
join_internal(for_shutdown=false) while _shutdown calls
join_internal(for_shutdown=true) to skip those validations.

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
crates/vm/src/stdlib/thread.rs (1)

1210-1247: ⚠️ Potential issue | 🔴 Critical

Keep _shutdown() off the public join() validation path.

join_internal() still runs check_started() and the finalization guard even when _shutdown() is the caller. That means shutdown can legitimately hit Starting / NotStarted handles from the registry, or trip the finalizing check, and then _shutdown() logs the exception and returns early instead of waiting for the remaining non-daemon threads. The public checks need to stay in join(), or join_internal() needs an internal mode that skips them for shutdown/fork cleanup callers.

Possible direction
-        fn join_internal(
+        fn join_internal(
             inner: &Arc<parking_lot::Mutex<ThreadHandleInner>>,
             done_event: &Arc<(parking_lot::Mutex<bool>, parking_lot::Condvar)>,
             timeout_duration: Option<Duration>,
+            public_checks: bool,
             vm: &VirtualMachine,
         ) -> PyResult<()> {
-            Self::check_started(inner, vm)?;
+            if public_checks {
+                Self::check_started(inner, vm)?;
+            }

             let deadline =
                 timeout_duration.and_then(|timeout| std::time::Instant::now().checked_add(timeout));

             let (lock, cvar) = &**done_event;
             let mut done = lock.lock();

-            if !*done {
+            if public_checks && !*done {
                 let inner_guard = inner.lock();
                 let current_ident = get_ident();
                 if inner_guard.ident == current_ident
                     && inner_guard.state == ThreadHandleState::Running
                 {
                     return Err(vm.new_runtime_error("Cannot join current thread"));
                 }
                 if vm.state.finalizing.load(core::sync::atomic::Ordering::Acquire) {
                     return Err(...);
                 }
             }
-            Self::join_internal(&self.inner, &self.done_event, timeout_duration, vm)
+            Self::join_internal(&self.inner, &self.done_event, timeout_duration, true, vm)
-                    if let Err(exc) = ThreadHandle::join_internal(&inner, &done_event, None, vm) {
+                    if let Err(exc) =
+                        ThreadHandle::join_internal(&inner, &done_event, None, false, vm)
+                    {

Also applies to: 1314-1325

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/thread.rs` around lines 1210 - 1247, The join_internal()
path runs public validations (check_started() and the vm.state.finalizing guard)
even when invoked from internal shutdown/fork cleanup, so change the call
boundary: move the public checks back into the public join() method and/or add
an internal-only mode to join_internal() (e.g., add a bool flag like
skip_public_checks or is_shutdown_call to join_internal(inner, done_event,
timeout_duration, vm, skip_public_checks: bool)), have join() call
join_internal(..., false) and _shutdown()/fork cleanup call join_internal(...,
true), and inside join_internal() skip check_started() and the finalizing check
when skip_public_checks is true; update all callers (including _shutdown()) to
use the new flag and keep ThreadHandleState/ThreadHandle_join semantics intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@crates/vm/src/stdlib/thread.rs`:
- Around line 1210-1247: The join_internal() path runs public validations
(check_started() and the vm.state.finalizing guard) even when invoked from
internal shutdown/fork cleanup, so change the call boundary: move the public
checks back into the public join() method and/or add an internal-only mode to
join_internal() (e.g., add a bool flag like skip_public_checks or
is_shutdown_call to join_internal(inner, done_event, timeout_duration, vm,
skip_public_checks: bool)), have join() call join_internal(..., false) and
_shutdown()/fork cleanup call join_internal(..., true), and inside
join_internal() skip check_started() and the finalizing check when
skip_public_checks is true; update all callers (including _shutdown()) to use
the new flag and keep ThreadHandleState/ThreadHandle_join semantics intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: de154942-d61b-4972-bdb4-43be2e906d06

📥 Commits

Reviewing files that changed from the base of the PR and between dc18133 and d36a0b1.

📒 Files selected for processing (1)
  • crates/vm/src/stdlib/thread.rs

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (3)
crates/vm/src/stdlib/thread.rs (1)

1211-1249: ⚠️ Potential issue | 🔴 Critical

Keep shutdown waits off the public join() validation path.

join_internal() still rejects Starting handles at Line 1217 and still raises PythonFinalizationError at Lines 1237-1248. _shutdown() legitimately hits both states while draining non-daemon threads, so this helper can still abort shutdown early and leave threads unjoined. Move those checks into the public join() wrapper, or add an internal for_shutdown mode that skips them.

Refactor sketch
-        fn join_internal(
+        fn join_internal(
             inner: &Arc<parking_lot::Mutex<ThreadHandleInner>>,
             done_event: &Arc<(parking_lot::Mutex<bool>, parking_lot::Condvar)>,
             timeout_duration: Option<Duration>,
+            for_shutdown: bool,
             vm: &VirtualMachine,
         ) -> PyResult<()> {
-            Self::check_started(inner, vm)?;
+            if !for_shutdown {
+                Self::check_started(inner, vm)?;
+            }

             let deadline =
                 timeout_duration.and_then(|timeout| std::time::Instant::now().checked_add(timeout));

             let (lock, cvar) = &**done_event;
             let mut done = lock.lock();

-            if !*done {
+            if !*done && !for_shutdown {
                 let inner_guard = inner.lock();
                 let current_ident = get_ident();
                 if inner_guard.ident == current_ident
                     && inner_guard.state == ThreadHandleState::Running
                 {
@@
-            Self::join_internal(&self.inner, &self.done_event, timeout_duration, vm)
+            Self::check_started(&self.inner, vm)?;
+            Self::join_internal(&self.inner, &self.done_event, timeout_duration, false, vm)
-                    if let Err(exc) = ThreadHandle::join_internal(&inner, &done_event, None, vm) {
+                    if let Err(exc) =
+                        ThreadHandle::join_internal(&inner, &done_event, None, true, vm)
+                    {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/thread.rs` around lines 1211 - 1249, join_internal
currently enforces "Starting" and interpreter-finalizing checks (via
Self::check_started and the PythonFinalizationError branch) which causes
_shutdown to abort while draining threads; modify join_internal to accept a
for_shutdown: bool parameter (or remove those checks here) and move the
Starting-state verification and the vm.state.finalizing check into the public
join() wrapper so normal joins still validate but calls from _shutdown can call
join_internal(..., for_shutdown=true) (or call a new internal no-validate
helper) to skip those two checks; update all callers (public join() and
_shutdown) to call the appropriate variant and ensure unique symbols referenced:
join_internal, check_started, ThreadHandle_join semantic check, and _shutdown
are adjusted accordingly.
crates/vm/src/vm/mod.rs (2)

1932-1937: ⚠️ Potential issue | 🔴 Critical

Use Acquire for the finalization gate in eval_breaker_tripped().

Line 1935 still does a Relaxed read even though this branch decides whether check_signals() runs at all. A stale false here skips the later Acquire load in check_signals(), so non-main threads can miss finalization on weak memory models.

Proposed fix
         #[cfg(feature = "threading")]
-        if self.state.finalizing.load(Ordering::Relaxed) && !self.is_main_thread() {
+        if self.state.finalizing.load(Ordering::Acquire) && !self.is_main_thread() {
             return true;
         }
#!/bin/bash
set -euo pipefail

printf '\n== eval_breaker_tripped ==\n'
sed -n '1932,1958p' crates/vm/src/vm/mod.rs

printf '\n== bytecode-loop gate ==\n'
sed -n '1198,1206p' crates/vm/src/frame.rs

printf '\n== finalizing loads/stores ==\n'
rg -n -C2 'finalizing\.(load|store)\(' crates/vm/src
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/vm/mod.rs` around lines 1932 - 1937, The read of the
finalization gate in eval_breaker_tripped() uses Ordering::Relaxed which can let
a stale false skip check_signals() on weak memory models; change the load to use
Ordering::Acquire (i.e., replace state.finalizing.load(Ordering::Relaxed) with
state.finalizing.load(Ordering::Acquire)) so that the subsequent
check_signals()'s synchronization is not bypassed, leaving all other logic in
eval_breaker_tripped() and callers like check_signals() unchanged.

332-339: ⚠️ Potential issue | 🟡 Minor

Reset stats_last_wait_ns on the zero-thread fast path.

When Line 332 returns immediately, stats_snapshot() keeps reporting the previous stop's wait time even though this stop waited 0ns. Clear stats_last_wait_ns before returning.

Proposed fix
         if initial_countdown == 0 {
+            self.stats_last_wait_ns.store(0, Ordering::Relaxed);
             self.world_stopped.store(true, Ordering::Release);
             #[cfg(debug_assertions)]
             self.debug_assert_all_non_requester_suspended(vm);
             stw_trace(format_args!(
                 "stop end requester={requester_ident} wait_ns=0 polls=0"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/vm/mod.rs` around lines 332 - 339, The zero-thread fast path
returns without resetting the last-stop duration, so stats_snapshot() reports
stale wait time; in the initial_countdown == 0 branch reset
self.stats_last_wait_ns to zero (use the same atomic store pattern and Ordering
used elsewhere in this module) before the stw_trace/return (after
debug_assert_all_non_requester_suspended(vm) if present) so the reported wait_ns
is correctly 0 for this stop.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/stdlib/posix.rs`:
- Around line 994-997: The code currently drops the result of
crate::stdlib::warnings::warn (using let _ = ...) which prevents
warning-as-error filters from taking effect; change the call site to propagate
the PyResult by removing the discard and using the try operator (i.e. return the
warn() result), change warn_if_multi_threaded to return PyResult<()> (update its
signature and return points) and propagate that result at each call site (e.g.,
where warn_if_multi_threaded("fork", num_os_threads, vm) is invoked), and update
the surrounding comment to state that CPython propagates warning errors rather
than silently ignoring them.

---

Duplicate comments:
In `@crates/vm/src/stdlib/thread.rs`:
- Around line 1211-1249: join_internal currently enforces "Starting" and
interpreter-finalizing checks (via Self::check_started and the
PythonFinalizationError branch) which causes _shutdown to abort while draining
threads; modify join_internal to accept a for_shutdown: bool parameter (or
remove those checks here) and move the Starting-state verification and the
vm.state.finalizing check into the public join() wrapper so normal joins still
validate but calls from _shutdown can call join_internal(..., for_shutdown=true)
(or call a new internal no-validate helper) to skip those two checks; update all
callers (public join() and _shutdown) to call the appropriate variant and ensure
unique symbols referenced: join_internal, check_started, ThreadHandle_join
semantic check, and _shutdown are adjusted accordingly.

In `@crates/vm/src/vm/mod.rs`:
- Around line 1932-1937: The read of the finalization gate in
eval_breaker_tripped() uses Ordering::Relaxed which can let a stale false skip
check_signals() on weak memory models; change the load to use Ordering::Acquire
(i.e., replace state.finalizing.load(Ordering::Relaxed) with
state.finalizing.load(Ordering::Acquire)) so that the subsequent
check_signals()'s synchronization is not bypassed, leaving all other logic in
eval_breaker_tripped() and callers like check_signals() unchanged.
- Around line 332-339: The zero-thread fast path returns without resetting the
last-stop duration, so stats_snapshot() reports stale wait time; in the
initial_countdown == 0 branch reset self.stats_last_wait_ns to zero (use the
same atomic store pattern and Ordering used elsewhere in this module) before the
stw_trace/return (after debug_assert_all_non_requester_suspended(vm) if present)
so the reported wait_ns is correctly 0 for this stop.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: a2aa3b12-c448-410a-8e2d-4b744c66252d

📥 Commits

Reviewing files that changed from the base of the PR and between d36a0b1 and 026eb24.

📒 Files selected for processing (3)
  • crates/vm/src/stdlib/posix.rs
  • crates/vm/src/stdlib/thread.rs
  • crates/vm/src/vm/mod.rs

Comment on lines +994 to +997
// Match PyErr_WarnFormat(..., stacklevel=1) in CPython.
// Best effort: ignore failures like CPython does in this path.
let _ =
crate::stdlib::warnings::warn(vm.ctx.exceptions.deprecation_warning, msg, 1, vm);

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's find and read the relevant section of posix.rs
wc -l crates/vm/src/stdlib/posix.rs

Repository: RustPython/RustPython

Length of output: 99


🏁 Script executed:

# Read the specific lines mentioned in the review
sed -n '990,1010p' crates/vm/src/stdlib/posix.rs

Repository: RustPython/RustPython

Length of output: 866


🏁 Script executed:

# Find the warn_if_multi_threaded function definition to see its current signature
rg -n "fn warn_if_multi_threaded" crates/vm/src/stdlib/posix.rs -A 10

Repository: RustPython/RustPython

Length of output: 689


🏁 Script executed:

# Find the warnings::warn function to verify its return type
rg -n "fn warn" crates/vm/src/stdlib/warnings.rs -A 5

Repository: RustPython/RustPython

Length of output: 820


🏁 Script executed:

# Check where warn_if_multi_threaded is called to understand the context
rg -n "warn_if_multi_threaded" crates/vm/src/stdlib/posix.rs

Repository: RustPython/RustPython

Length of output: 225


🏁 Script executed:

# Search for any related discussions or TODOs about fork warning handling
rg -i "cpython|fork" crates/vm/src/stdlib/posix.rs -B 2 -A 2 | grep -A 5 -B 5 "warn\|error"

Repository: RustPython/RustPython

Length of output: 1656


🏁 Script executed:

# Check if there are any tests for the fork warning behavior
fd -e rs | xargs rg "test.*fork\|fork.*test" -l | head -5

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Let's check the broader context of the fork function to understand the error handling strategy
sed -n '1025,1060p' crates/vm/src/stdlib/posix.rs

Repository: RustPython/RustPython

Length of output: 1241


🏁 Script executed:

# Check if there are any other similar patterns where PyResult is dropped in critical functions
rg "let _ = .*warn" crates/vm/src/stdlib/posix.rs -B 3 -A 3

Repository: RustPython/RustPython

Length of output: 47


🌐 Web query:

CPython fork() warning error handling deprecation posixmodule.c

💡 Result:

In CPython, the “fork() in a multi-threaded process” DeprecationWarning is implemented in Modules/posixmodule.c (helper often named like warn_about_fork_with_threads()), and it’s emitted when CPython can detect more than 1 thread in the current process:

  • It tries to detect the thread count via OS-specific means (e.g., Linux /proc/self/stat thread field), and if that fails it falls back to counting what the threading module knows about (threading._active + threading._limbo). (raw.githubusercontent.com)
  • If num_threads > 1, it calls PyErr_WarnFormat(PyExc_DeprecationWarning, ...) with the message:
    “This process (pid=...) is multi-threaded, use of fork() may lead to deadlocks in the child.” (raw.githubusercontent.com)
  • This behavior is documented as a Python 3.12 change (“os.fork() can now raise a DeprecationWarning…; use a different multiprocessing start method like spawn or forkserver”). (docs.python.org)

The “warning vs error” handling difference (why it sometimes doesn’t fail under -Werror)

There has been a real behavioral difference across versions/branches in how the C code handles the return/exception state from PyErr_WarnFormat:

  • In the 3.12.0 code, after issuing the warning, CPython clears the warning/error indicator unconditionally (PyErr_Clear()), meaning that even if your warnings filter turns DeprecationWarning into an exception (e.g. -W error), the exception state can get swallowed inside posixmodule.c rather than propagating to Python code. (raw.githubusercontent.com)
  • In current main, the helper returns the result of PyErr_WarnFormat(...) to the caller (instead of clearing it), allowing “warnings as errors” configurations to actually behave like errors. (raw.githubusercontent.com)

Practical takeaway

  • If you want this to be fatal in CI: be aware that some released versions (notably early 3.12 code paths) may clear the exception state in C and thus won’t reliably fail with -W error in all scenarios. (raw.githubusercontent.com)
  • If you want to fix the root cause: don’t fork() from a multi-threaded process—switch to multiprocessing start methods (spawn/forkserver) or use subprocess/posix_spawn style process creation where applicable. (docs.python.org)


Propagate warning errors instead of silently dropping them.

The current code uses let _ = to drop the PyResult from warnings::warn(), which prevents warning-as-error filters from working. If a user configures DeprecationWarning as an error via warning filters, os.fork() will still succeed instead of raising an exception. This diverges from CPython's current behavior, which propagates the warning result.

Suggested changes:

  1. Remove the error-dropping pattern and propagate the result:
crate::stdlib::warnings::warn(
    vm.ctx.exceptions.deprecation_warning,
    msg,
    1,
    vm,
)?;
  2. Update warn_if_multi_threaded to return PyResult<()> and propagate at the call site:
fn warn_if_multi_threaded(
    name: &str,
    num_os_threads: isize,
    vm: &VirtualMachine,
) -> PyResult<()> {
    // ...
}

warn_if_multi_threaded("fork", num_os_threads, vm)?;

Update the comment to accurately reflect the current CPython behavior (propagating rather than ignoring).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/posix.rs` around lines 994 - 997, The code currently
drops the result of crate::stdlib::warnings::warn (using let _ = ...) which
prevents warning-as-error filters from taking effect; change the call site to
propagate the PyResult by removing the discard and using the try operator (i.e.
return the warn() result), change warn_if_multi_threaded to return PyResult<()>
(update its signature and return points) and propagate that result at each call
site (e.g., where warn_if_multi_threaded("fork", num_os_threads, vm) is
invoked), and update the surrounding comment to state that CPython propagates
warning errors rather than silently ignoring them.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/vm/src/stdlib/posix.rs (1)

904-923: macOS: Calling task_for_pid on self is unnecessary overhead.

Using task_for_pid(mach_task_self(), getpid(), &mut proc_task) is redundant since mach_task_self() already returns the task port for the current process. Additionally, task_for_pid typically requires special entitlements (taskgated) when used for other processes, and while it works for self, it's an extra syscall.

♻️ Suggested simplification
-            let self_task = unsafe { mach_task_self() };
-            let mut proc_task: MachPortT = 0;
-            if unsafe { task_for_pid(self_task, libc::getpid(), &mut proc_task) } == KERN_SUCCESS {
-                let mut threads: ThreadActArrayT = core::ptr::null_mut();
-                let mut n_threads: MachMsgTypeNumberT = 0;
-                if unsafe { task_threads(proc_task, &mut threads, &mut n_threads) } == KERN_SUCCESS
+            let self_task = unsafe { mach_task_self() };
+            let mut threads: ThreadActArrayT = core::ptr::null_mut();
+            let mut n_threads: MachMsgTypeNumberT = 0;
+            if unsafe { task_threads(self_task, &mut threads, &mut n_threads) } == KERN_SUCCESS
+            {
+                if !threads.is_null() {
+                    let _ = unsafe {
+                        vm_deallocate(
+                            self_task,
+                            threads as libc::uintptr_t,
+                            (n_threads as usize * core::mem::size_of::<MachPortT>())
+                                as libc::uintptr_t,
+                        )
+                    };
+                }
+                return n_threads as isize;
+            }
+            0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/posix.rs` around lines 904 - 923, The code redundantly
calls task_for_pid(self_task, getpid(), &mut proc_task); replace that flow by
assigning proc_task = self_task (from mach_task_self()) and then call
task_threads(proc_task, &mut threads, &mut n_threads) as before; keep the
existing vm_deallocate call using self_task and threads, and return n_threads as
isize, removing the task_for_pid branch and its associated conditional while
preserving error handling around task_threads and vm_deallocate (symbols:
mach_task_self, proc_task, task_for_pid, task_threads, vm_deallocate,
n_threads).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: d8154526-8adb-41ea-84aa-3bb0378c4963

📥 Commits

Reviewing files that changed from the base of the PR and between 026eb24 and 939a1ff.

📒 Files selected for processing (1)
  • crates/vm/src/stdlib/posix.rs

Comment on lines +926 to +946
#[cfg(target_os = "linux")]
{
    use std::io::Read as _;
    let mut file = match std::fs::File::open("/proc/self/stat") {
        Ok(f) => f,
        Err(_) => return 0,
    };
    let mut buf = [0u8; 160];
    let n = match file.read(&mut buf) {
        Ok(n) => n,
        Err(_) => return 0,
    };
    let line = match core::str::from_utf8(&buf[..n]) {
        Ok(s) => s,
        Err(_) => return 0,
    };
    if let Some(field) = line.split_whitespace().nth(19) {
        return field.parse::<isize>().unwrap_or(0);
    }
    0
}

⚠️ Potential issue | 🟡 Minor

Linux: /proc/self/stat parsing is fragile when comm contains spaces.

The comm field (field 2 in /proc/self/stat) is enclosed in parentheses and can contain spaces, parentheses, or newlines. Using split_whitespace().nth(19) will yield incorrect results if the process name contains spaces.

For example, a process named "my app" produces:

123 (my app) S 456 ...

split_whitespace() splits this into ["123", "(my", "app)", "S", ...], making nth(19) point to the wrong field.

CPython handles this by finding the last ) to locate the end of the comm field, then counting fields from there.
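The difference between the two strategies can be reproduced on a synthetic stat line. The helper below is illustrative only (the name num_threads_from_stat is made up, not RustPython code), but it follows the same "find the last ')'" approach:

```rust
// Illustrative helper (not the RustPython code): extract num_threads
// (field 20 of /proc/<pid>/stat) while tolerating spaces in comm.
fn num_threads_from_stat(line: &str) -> Option<isize> {
    // comm (field 2) is wrapped in parentheses and may itself contain
    // spaces or ')' characters, so scan for the *last* ')' to find its end.
    let after_comm = &line[line.rfind(')')? + 1..];
    // Fields 3..=19 are the 17 fields between comm and num_threads,
    // so nth(17) lands on field 20.
    after_comm.split_whitespace().nth(17)?.parse::<isize>().ok()
}
```

On the "my app" example, a naive `split_whitespace().nth(19)` lands one field short because comm contributes two tokens, while the ')'-anchored version still finds the right field.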

🛡️ Suggested fix to handle comm field correctly
         #[cfg(target_os = "linux")]
         {
             use std::io::Read as _;
             let mut file = match std::fs::File::open("/proc/self/stat") {
                 Ok(f) => f,
                 Err(_) => return 0,
             };
             let mut buf = [0u8; 160];
             let n = match file.read(&mut buf) {
                 Ok(n) => n,
                 Err(_) => return 0,
             };
             let line = match core::str::from_utf8(&buf[..n]) {
                 Ok(s) => s,
                 Err(_) => return 0,
             };
-            if let Some(field) = line.split_whitespace().nth(19) {
-                return field.parse::<isize>().unwrap_or(0);
+            // Find end of comm field (last ')') to handle spaces in process name
+            let after_comm = match line.rfind(')') {
+                Some(idx) => &line[idx + 1..],
+                None => return 0,
+            };
+            // Field 20 (num_threads) is the 18th field after comm: nth(17) skips the 17 fields 3-19
+            if let Some(field) = after_comm.split_whitespace().nth(17) {
+                return field.parse::<isize>().unwrap_or(0);
             }
             0
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/posix.rs` around lines 926 - 946, The current parsing of
/proc/self/stat is fragile because split_whitespace() will break when the comm
field (inside parentheses) contains spaces; instead, after reading the file and
converting to string, find the index of the last ')' in the line, take the
substring after that character, then split_whitespace() on that remainder and
parse the desired field (the same field you previously accessed with nth(19))
from the resultant iterator (preserve the existing return behavior using
parse::<isize>().unwrap_or(0)); replace the direct
line.split_whitespace().nth(19) usage with this "find last ')', take substring
after it, then split" approach to reliably locate the field even when comm
contains spaces.

- Fix park_detached_threads: successful CAS no longer sets
  all_suspended=false, avoiding unnecessary polling rounds
- Replace park_timeout(50µs) with park() in wait_while_suspended
- Remove redundant self-suspension in attach_thread and detach_thread;
  the STW controller handles DETACHED→SUSPENDED via park_detached_threads
- Add double-check under mutex before condvar wait to prevent lost wakes
- Remove dead stats_detach_wait_yields field and add_detach_wait_yields
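The "double-check under mutex before condvar wait" pattern mentioned above can be sketched with std primitives. The types and names below are illustrative, not the actual stop-the-world code:

```rust
use std::sync::{Condvar, Mutex};

// Illustrative sketch: a waiter re-checks the predicate while holding
// the mutex, both before and inside the wait loop. Because the
// notifier flips the flag and signals under the same lock, a wakeup
// sent between the waiter's check and its wait cannot be lost.
struct WorldGate {
    stopped: Mutex<bool>,
    resumed: Condvar,
}

impl WorldGate {
    fn new(stopped: bool) -> Self {
        WorldGate {
            stopped: Mutex::new(stopped),
            resumed: Condvar::new(),
        }
    }

    fn wait_while_stopped(&self) {
        let mut stopped = self.stopped.lock().unwrap();
        // Double-check under the mutex: if the world already restarted,
        // return instead of sleeping past the notification.
        while *stopped {
            stopped = self.resumed.wait(stopped).unwrap();
        }
    }

    fn start_the_world(&self) {
        *self.stopped.lock().unwrap() = false;
        self.resumed.notify_all();
    }
}
```

The while loop also absorbs spurious wakeups, which POSIX condition variables permit.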
Like CPython's ThreadHandle_start, set RUNNING state in the parent
thread immediately after spawn() succeeds, rather than in the child.
This eliminates a race where join() could see Starting state if called
before the child thread executes.

Also reverts the macOS skip for test_start_new_thread_failed since the
root cause is fixed.
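The ordering fix can be sketched as follows; the state constants and function names are made up for illustration and are not the actual ThreadHandle fields:

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicU8, Ordering};

const STARTING: u8 = 0;
const RUNNING: u8 = 1;

// Illustrative sketch: the parent publishes RUNNING immediately after
// spawn() succeeds, so any join() issued after start_thread() returns
// can no longer observe STARTING for a successfully spawned thread.
fn start_thread(state: Arc<AtomicU8>) -> std::thread::JoinHandle<()> {
    let handle = std::thread::spawn(|| {
        // Thread body; the child no longer transitions the state itself.
    });
    // Parent-side transition, mirroring CPython's ThreadHandle_start.
    state.store(RUNNING, Ordering::Release);
    handle
}
```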
Add lock_wrapped to ThreadMutex for detaching thread state
while waiting on contended locks. Use it for buffered and
text IO locks. Wrap FileIO read/write in allow_threads via
crt_fd to prevent STW hangs on blocking file operations.
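The shape of such an API can be sketched on top of std's Mutex. The real ThreadMutex and vm.allow_threads are RustPython's own types; everything below is an illustrative stand-in:

```rust
use std::sync::{Mutex, MutexGuard, TryLockError};

// Illustrative sketch of the lock_wrapped idea: take the fast path
// when the lock is free, and only run the wrapper (which would be
// something like vm.allow_threads, detaching the thread state) around
// the blocking wait when the lock is actually contended.
fn lock_wrapped<'a, T>(
    mutex: &'a Mutex<T>,
    wrapper: impl FnOnce(&dyn Fn() -> MutexGuard<'a, T>) -> MutexGuard<'a, T>,
) -> MutexGuard<'a, T> {
    match mutex.try_lock() {
        // Uncontended: no need to detach the thread state.
        Ok(guard) => guard,
        // Contended: let the wrapper run around the blocking lock().
        Err(TryLockError::WouldBlock) => wrapper(&|| mutex.lock().unwrap()),
        Err(TryLockError::Poisoned(e)) => e.into_inner(),
    }
}
```

Keeping the try_lock fast path matters: detaching and re-attaching the thread state on every uncontended acquisition would make these IO locks needlessly expensive.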
Replace parking_lot Mutex/Condvar with std::sync (pthread-based)
for started_event and handle_ready_event. This prevents hangs
in forked children where parking_lot's global HASHTABLE may be
corrupted.
@youknowone youknowone merged commit 45d8129 into RustPython:main Mar 8, 2026
13 of 14 checks passed
@youknowone youknowone deleted the implock branch March 8, 2026 09:06
youknowone added a commit to youknowone/RustPython that referenced this pull request Mar 8, 2026
* apply more allow_threads

* Simplify STW thread state transitions

- Fix park_detached_threads: successful CAS no longer sets
  all_suspended=false, avoiding unnecessary polling rounds
- Replace park_timeout(50µs) with park() in wait_while_suspended
- Remove redundant self-suspension in attach_thread and detach_thread;
  the STW controller handles DETACHED→SUSPENDED via park_detached_threads
- Add double-check under mutex before condvar wait to prevent lost wakes
- Remove dead stats_detach_wait_yields field and add_detach_wait_yields

* Representable for ThreadHandle

* Set ThreadHandle state to Running in parent thread after spawn

Like CPython's ThreadHandle_start, set RUNNING state in the parent
thread immediately after spawn() succeeds, rather than in the child.
This eliminates a race where join() could see Starting state if called
before the child thread executes.

Also reverts the macOS skip for test_start_new_thread_failed since the
root cause is fixed.

* Set ThreadHandle state to Running in parent thread after spawn

* Add debug_assert for thread state in start_the_world

* Unskip now-passing test_get_event_loop_thread and test_start_new_thread_at_finalization

* Wrap IO locks and file ops in allow_threads

Add lock_wrapped to ThreadMutex for detaching thread state
while waiting on contended locks. Use it for buffered and
text IO locks. Wrap FileIO read/write in allow_threads via
crt_fd to prevent STW hangs on blocking file operations.

* Use std::sync for thread start/ready events

Replace parking_lot Mutex/Condvar with std::sync (pthread-based)
for started_event and handle_ready_event. This prevents hangs
in forked children where parking_lot's global HASHTABLE may be
corrupted.