Remove PyStr::as_str by youknowone · Pull Request #7218 · RustPython/RustPython

youknowone · 2026-02-24T12:05:21Z

Summary by CodeRabbit

Improvements
- Wide rollout of UTF‑8/WTF‑8 handling across the VM and standard library for safer, binary‑preserving text processing and improved non‑ASCII rendering.
- Many object reprs and I/O paths now correctly display surrogate and non‑ASCII content.
New Features
- UTF‑8 string types and VM helpers exposed to APIs for explicit UTF‑8 handling.
Bug Fixes
- Stricter string extraction yields earlier, clearer errors and reduces lossy conversions.

coderabbitai · 2026-02-24T12:05:29Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Large cross-crate migration to UTF‑8/WTF‑8 handling: adds Wtf8Concat/Wtf8Buf, PyUtf8Str (+interned), replaces many PyStr/PyStrRef/String usages with PyUtf8StrRef/Wtf8Buf, updates repr/format paths to repr_wtf8, and threads WTF‑8 through error/message, I/O, ctypes, codecs, and stdlib modules.

Changes

Cohort / File(s)	Summary
Core WTF‑8 infra `crates/wtf8/src/concat.rs`, `crates/wtf8/src/lib.rs`	Add Wtf8Concat trait, wtf8_concat! macro, Wtf8Buf utilities (join, case funcs) and export Wtf8Concat.
PyUtf8Str & str core `crates/common/src/str.rs`, `crates/vm/src/builtins/str.rs`, `crates/vm/src/intern.rs`, `crates/vm/src/vm/context.rs`, `crates/vm/src/vm/mod.rs`	Introduce PyUtf8Str and PyUtf8StrInterned, add as_wtf8/as_str/as_bytes accessors and new_utf8_str, change core string ops to use Wtf8/code_points.
Exception/message plumbing `crates/vm/src/vm/vm_new.rs`, `crates/vm/src/vm/buffer.rs`, `crates/vm/src/exceptions.rs`, `crates/vm/src/exception_group.rs`	Change exception constructors to accept Into/Wtf8Buf and update call sites to pass msg.into(); rework SyntaxError and exception rendering to use Wtf8Buf and code‑point logic.
Representations & formatting many files under `crates/vm/src/builtins/`, `crates/vm/src/stdlib/`, `crates/stdlib/src/*` (e.g., list/dict/set/tuple/slice/template/itertools/pystruct/_asyncio/sre/etc.)	Convert many Representable impls from repr_str→repr_wtf8 and return Wtf8Buf; change format entry types to PyUtf8StrRef and build reprs with Wtf8Buf/wtf8_concat instead of String/format!.
Public API param/type migrations `crates/stdlib/src/`, `crates/vm/src/stdlib/`, `crates/vm/src/*` (examples: json.rs, csv.rs, socket.rs, ssl.rs, codecs.rs, io.rs, time.rs, posix.rs, imp.rs)	Migrate numerous function/Arg struct signatures from PyStr/PyStrRef to PyUtf8Str/PyUtf8StrRef; update imports and callers accordingly.
ctypes and wchar handling `crates/vm/src/stdlib/ctypes/*`, `crates/vm/src/stdlib/ctypes/base.rs`	Change str_to_wchar_bytes to accept &Wtf8 and iterate code_points(); switch many downcasts/field-name paths to PyUtf8Str and use as_wtf8/as_bytes where appropriate.
I/O, codecs, encodings, JSON, CSV `crates/vm/src/stdlib/io.rs`, `crates/vm/src/codecs.rs`, `crates/vm/src/stdlib/codecs.rs`, `crates/stdlib/src/json.rs`, `crates/stdlib/src/csv.rs`, `crates/vm/src/bytes_inner.rs`	Thread PyUtf8StrRef and Wtf8Buf through encoder/decoder/TextIO/JSON/C SV parsing, centralize UTF‑8-aware error creation, and replace many byte/char extraction sites with UTF‑8-safe equivalents.
Collections, dict keys & utils `crates/vm/src/dict_inner.rs`, `crates/vm/src/utils.rs`, `crates/vm/src/stdlib/collections.rs`	Add DictKey impl for Py, change collection_repr to return Wtf8Buf, and update container reprs to incremental Wtf8Buf construction.
Platform, Windows, wasm and small fixes `crates/vm/src/stdlib/_winapi.rs`, `crates/vm/src/stdlib/nt.rs`, `crates/vm/src/stdlib/winreg.rs`, `crates/wasm/`, `src/`	Replace many as_str() with expect_str()/as_wtf8(), switch embedded-null checks to byte-level, use vm.fsencode for fs paths, and minor .into() conversions for exception messages.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Convert new_X_error to use a macro #5814 — changes to vm_new/exception-constructor generation and message types; directly overlaps the Wtf8Buf migration of exception APIs.
ctypes overhaul #6450 — large ctypes refactor touching wchar/field-name handling and str_to_wchar_bytes; strong overlap with the ctypes changes here.
Better codecs and fix lots of test_io and other expectedFailures #6997 — updates to core str/code-point handling in common crate; relates to code-point-based to_ascii and Wtf8 handling.

Suggested reviewers

coolreader18

Poem

🐰 I nibble bytes and stitch each rune,
Wtf8 buffers hum beneath the moon.
Surrogates safe, reprs now light,
PyUtf8 hops and strings take flight.
Hoppity-hop — concat delight!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'Remove PyStr::as_str' clearly summarizes the main change: removing the as_str method and replacing it with safer alternatives.
Linked Issues check	✅ Passed	The changes comprehensively address issue `#7196` by replacing PyStr::as_str() calls with PyStr::to_str() and alternatives throughout the codebase, establishing the naming convention for zero-cost vs small-cost conversions.
Out of Scope Changes check	✅ Passed	All code changes are directly related to replacing PyStr::as_str() with safer alternatives (to_str, as_wtf8, expect_str, as_bytes) across the entire codebase, with no unrelated modifications.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-02-24T14:01:03Z

Code has been automatically formatted

The code in this PR has been formatted using:

cargo fmt --all
Please pull the latest changes before pushing again:

git pull origin to-str

coderabbitai

Actionable comments posted: 19

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (12)

crates/vm/src/builtins/classmethod.rs (1)
210-224: ⚠️ Potential issue | 🟠 Major

Switch to repr_wtf8() implementation to properly preserve WTF-8 surrogates.

The current repr_str() implementation uses format! with .as_wtf8() values (lines 214, 218), which invokes Wtf8's Display trait. This silently replaces unpaired surrogates with the Unicode replacement character, defeating the purpose of WTF-8 support. Follow the pattern from other types like PyPartial in functools.rs: implement repr_wtf8() returning Wtf8Buf instead, and use write!() or Wtf8Concat to concatenate without lossy Display conversion.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/classmethod.rs` around lines 210 - 224, The repr
implementation in classmethod.rs currently uses
.__qualname__().downcast_ref::<PyStr>().map(|n| n.as_wtf8()) and
.__module__().downcast_ref::<PyStr>().map(|m| m.as_wtf8()) with format!, which
loses WTF-8 surrogates; change the repr_str() style code to implement
repr_wtf8() that returns a Wtf8Buf (follow the PyPartial pattern in
functools.rs), fetch qualname and module as PyStr and call their
repr_wtf8()/repr_wtf8-equivalents (or convert to Wtf8Buf) instead of as_wtf8(),
and build the final representation using write! into the Wtf8Buf or use
Wtf8Concat so you concatenate module, qualname, class.slot_name() and callable
without going through Display and losing surrogate information.
crates/vm/src/stdlib/_winapi.rs (1)
420-428: ⚠️ Potential issue | 🟠 Major

Avoid byte‑index slicing of Unicode env keys.

k[1..] can panic when the first character is multi‑byte (valid for Unicode env var names). Use a char‑based check to keep this path panic‑free.
Proposed fix
-            if k.is_empty() || k[1..].contains('=') {
+            if k.is_empty() || k.chars().skip(1).any(|c| c == '=') {
                 return Err(vm.new_value_error("illegal environment variable name"));
             }
As per coding guidelines: "Follow Rust best practices for error handling and memory management".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/_winapi.rs` around lines 420 - 428, The code slices the
Unicode string k with k[1..], which can panic on multi‑byte first characters;
replace that byte-index slice with a char-based check such as k.is_empty() ||
k.chars().skip(1).any(|c| c == '=') (locate the branch that currently uses
k[1..].contains('=') after PyStrRef::expect_str()) so the illegal-name check is
unicode-safe and panic-free.
crates/vm/src/builtins/code.rs (1)
547-563: ⚠️ Potential issue | 🟠 Major

Avoid reintroducing surrogate panics in repr_str.
Now that source_path, obj_name, and qualname are interned from WTF-8, repr_str still calls self.source_path().as_str() (Line 421), which can panic on surrogates. Please switch to a non-panicking rendering path (e.g., to_str() with a fallback or formatting the WTF-8 value directly) to keep the #7196 fix intact.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/code.rs` around lines 547 - 563, repr_str currently
calls self.source_path().as_str() (and likely similar calls for
obj_name/qualname) which can panic on surrogate-containing WTF-8 interned
strings; update the repr_str implementation to use a non-panicking rendering
path (e.g., call to_str() and use its Option fallback or format the underlying
WTF-8 value directly with a lossless/fallback representation) for source_path,
obj_name, and qualname so surrogate data won't cause a panic; modify the
repr_str function in the CodeObject impl to use to_str() or a safe formatter and
include a sensible fallback when to_str() returns None.
crates/vm/src/stdlib/posix.rs (1)
1969-1987: ⚠️ Potential issue | 🟡 Minor

Remove the unnecessary inner block in SysconfName::TryFromObject (lines 2374–2383).

The block around let name = s.as_str() can be inlined. The downcast to PyUtf8Str is intentional as part of the UTF-8 migration effort (see commit e77e62f), and as_str() will succeed on normal Python strings; any failure on strings with surrogates is expected and correctly reports "configuration names must be strings or integers".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/posix.rs` around lines 1969 - 1987, The downcast branch
in PathconfName::try_from_object currently introduces an unnecessary inner block
around retrieving the UTF-8 string; inline the call so you directly call
s.as_str().parse::<PathconfVar>() instead of creating a separate inner
block/variable scope. Update the obj.downcast::<PyUtf8Str>() branch in
TryFromObject (for PathconfName::try_from_object) to map_err as before but
immediately call s.as_str().parse::<PathconfVar>() and handle the parse error
with vm.new_value_error("unrecognized configuration name"), returning the parsed
i32 as Self(i).
crates/stdlib/src/_sqlite3.rs (1)
721-728: ⚠️ Potential issue | 🟠 Major

Replace expect_str() with error-checked UTF-8 conversion to avoid panic on surrogate input.

The expect_str() call panics on surrogate-containing strings. Since register_converter is exposed as a @pyfunction where user input flows into the typename parameter as a PyStrRef, a user can trigger a crash by passing a type name with surrogates. Use typename.try_into_utf8(vm)?.as_str().to_uppercase() instead to fail gracefully with a proper error.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/_sqlite3.rs` around lines 721 - 728, The code in
register_converter uses typename.expect_str() which can panic on
surrogate-containing input; replace this with a fallible UTF-8 conversion that
returns a Python error instead of panicking by calling
typename.try_into_utf8(vm)? to obtain a Utf8Str (or String via
as_str().to_uppercase()) and then pass the uppercased name to
converters().set_item; update the assignment of name to use
typename.try_into_utf8(vm)?.as_str().to_uppercase() (or equivalent) so invalid
UTF-8 is handled gracefully.
crates/vm/src/codecs.rs (1)
114-136: ⚠️ Potential issue | 🟠 Major

get_incremental_encoder/get_incremental_decoder should accept Option<PyUtf8StrRef> to match the encode/decode API.

The methods encode and decode accept Option<PyUtf8StrRef> and internally convert it via into_wtf8(), while get_incremental_encoder and get_incremental_decoder accept Option<PyStrRef> directly. For API consistency, the incremental methods should follow the same pattern and accept Option<PyUtf8StrRef>, performing the conversion internally as well.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/codecs.rs` around lines 114 - 136, Change the signatures of
get_incremental_encoder and get_incremental_decoder to accept
Option<PyUtf8StrRef> (not Option<PyStrRef>), and when building args convert the
Utf8 ref to a WTF-8 PyStr using its into_wtf8() before calling vm.call_method;
e.g. match the errors with Some(e) => vec![e.into_wtf8().into()] and None =>
vec![] so the API and conversion match encode/decode (refer to
get_incremental_encoder, get_incremental_decoder and the into_wtf8() helper).
crates/vm/src/builtins/type.rs (1)
1968-1979: ⚠️ Potential issue | 🟡 Minor

Preserve surrogate __qualname__ in __repr__.
__qualname__ can contain surrogates; downcasting to PyUtf8Str drops those cases and falls back to name. Using PyStr + as_wtf8() retains the original value.
🔧 Surrogate-preserving repr
-            Some(module) if module != "builtins" => {
-                let name = zelf.name();
-                format!(
-                    "<class '{}.{}'>",
-                    module,
-                    zelf.__qualname__(vm)
-                        .downcast_ref::<PyUtf8Str>()
-                        .map(|n| n.as_str())
-                        .unwrap_or_else(|| &name)
-                )
-            }
+            Some(module) if module != "builtins" => {
+                let name = zelf.name();
+                let qualname = zelf.__qualname__(vm);
+                if let Some(q) = qualname.downcast_ref::<PyStr>() {
+                    format!("<class '{}.{}'>", module, q.as_wtf8())
+                } else {
+                    format!("<class '{}.{}'>", module, name)
+                }
+            }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/type.rs` around lines 1968 - 1979, The repr
construction drops surrogate-containing __qualname__ by downcasting to
PyUtf8Str; change the downcast to PyStr and use its as_wtf8()
(surrogate-preserving) when formatting the class repr for zelf.__qualname__(vm)
so that surrogate sequences are retained instead of falling back to name; adjust
the map/unwrap_or_else flow around
zelf.__qualname__(vm).downcast_ref::<PyUtf8Str>() to use
.downcast_ref::<PyStr>().map(|n| n.as_wtf8()) and keep the existing fallback to
name.
crates/vm/src/builtins/namespace.rs (1)
141-173: ⚠️ Potential issue | 🟠 Major

Avoid potential panic in SimpleNamespace repr: guard quote-stripping.

&key_str[1..key_str.len() - 1] assumes the key repr is always quoted and length ≥ 2. Users can mutate __dict__ to include non-string keys (e.g., ns.__dict__[1] = 2), whose repr may be "1" (len 1) and will panic here.
Proposed safe guard (keep current behavior for normal string keys)
-                let k = key.repr(vm)?;
-                let key_str = k.as_wtf8();
-                // Strip quotes from key repr
-                result.push_wtf8(&key_str[1..key_str.len() - 1]);
+                let k = key.repr(vm)?;
+                let key_str = k.as_wtf8();
+                // Strip quotes from key repr (only when it actually looks like a quoted string)
+                if key_str.len() >= 2 {
+                    let b = key_str.as_bytes();
+                    let first = b[0];
+                    let last = b[b.len() - 1];
+                    if (first == b'\'' || first == b'"') && last == first {
+                        result.push_wtf8(&key_str[1..key_str.len() - 1]);
+                    } else {
+                        result.push_wtf8(key_str);
+                    }
+                } else {
+                    result.push_wtf8(key_str);
+                }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/namespace.rs` around lines 141 - 173, The repr
implementation for PyNamespace (fn repr_wtf8 in the Representable impl)
currently slices key reprs with &key_str[1..key_str.len()-1], which can panic
for short or unquoted reprs; update the loop that processes dict keys to first
check that key_str.len() >= 2 and that key_str starts and ends with matching
quote characters before stripping, and if those checks fail fall back to using
the full key_str (or key.repr(vm)? as-is) so non-string or short reprs don’t
cause a panic while preserving the current quoted-key behavior for normal string
keys.
crates/vm/src/stdlib/os.rs (1)
452-476: ⚠️ Potential issue | 🟠 Major

expect_str() panics on surrogate-containing strings in Windows putenv/unsetenv.

Both functions call key.expect_str() / value.expect_str() on user-provided PyStrRef arguments, but expect_str() panics unconditionally when the string contains surrogates (via .expect("PyStr contains surrogates")). This violates the PR's goal of preventing unexpected panics on surrogate-containing input.

Use PyUtf8StrRef in the function signature to validate at binding time (raising a Python exception instead), or explicitly call ensure_valid_utf8(vm)? before working with the string as UTF-8.

Also applies to: lines 500–523 (unsetenv)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/os.rs` around lines 452 - 476, The putenv and unsetenv
functions currently call key.expect_str()/value.expect_str(), which can panic on
surrogate-containing PyStrRef; change the bindings to accept validated UTF-8
types (use PyUtf8StrRef for the key and value parameters) or call
ensure_valid_utf8(vm)? on each PyStrRef before converting to native strings so
that invalid surrogates raise a Python exception instead of panicking; update
both putenv and unsetenv to perform this validation (refer to the functions
named putenv and unsetenv and the use of
expect_str()/to_wide_with_nul()/check_env_var_len) so all surrogate-containing
input is handled safely.
crates/vm/src/stdlib/ast/python.rs (1)
129-165: ⚠️ Potential issue | 🟠 Major

Verify PyUtf8Str downcasts may silently skip WTF-8 string keys, causing false keyword argument errors.

PyUtf8Str downcasting requires strict UTF-8 validation (see PyPayload::validate_downcastable_from() and try_downcast_from() in crates/vm/src/builtins/str.rs), which fails silently for Python strings containing unpaired surrogates (valid WTF-8 but not UTF-8).

This creates inconsistent behavior:

Line 155: Keys with surrogates are silently ignored by if let Ok(key) = key.downcast::<PyUtf8Str>(), leaving them in expecting set

Line 172: Non-UTF8 keys in field_types are skipped via let Ok(key) = key.downcast::<PyUtf8Str>() else { continue; }

Line 227: Downcast failures get mapped to misleading "keywords must be strings" error (key is a valid string, just not UTF-8)

Result: Mismatched keys are not properly removed from validation sets, causing false "unexpected keyword argument" or "missing required argument" errors even when the argument was provided with a valid (but non-UTF8) string key.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/ast/python.rs` around lines 129 - 165, The code
incorrectly downcasts dict keys to PyUtf8Str (strict UTF-8) which silently skips
valid WTF-8 Python strings and leaves keys in the expecting set; update the key
handling in the loops using dict.items_vec() and args.kwargs to accept general
PyStr (WTF-8) rather than PyUtf8Str so non-UTF8 keys are recognized and removed
from expecting; specifically, replace downcast::<PyUtf8Str>() checks with
downcast::<PyStr>() (or use the existing PyStrRef/try_downcast_from for PyStr)
in the loops that iterate dict.items_vec() and args.kwargs and adjust the error
path that reports "keywords must be strings" to only trigger when the object is
truly non-string, keeping PyStr-typed keys even if not UTF-8.
crates/vm/src/stdlib/collections.rs (1)
560-584: ⚠️ Potential issue | 🟡 Minor

Fix empty deque repr when maxlen is set.

format!("{class_name}([{closing_part})") yields deque([]], maxlen=...) (extra ]) when maxlen is present. Please construct the empty-case string explicitly.
🔧 Suggested fix
-            if zelf.__len__() == 0 {
-                return Ok(vm.ctx.new_str(format!("{class_name}([{closing_part})")));
-            }
+            if zelf.__len__() == 0 {
+                let s = match zelf.maxlen {
+                    Some(maxlen) => format!("{class_name}([], maxlen={maxlen})"),
+                    None => format!("{class_name}([])"),
+                };
+                return Ok(vm.ctx.new_str(s));
+            }
Also applies to: 587-589
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/collections.rs` around lines 560 - 584, The empty-case
repr for PyDeque in impl Representable::repr incorrectly composes the string
using closing_part, producing an extra ']' when maxlen is set; change the empty
branch in repr to build the two forms explicitly: if zelf.maxlen.is_some()
return Ok(vm.ctx.new_str(format!("{class_name}([], maxlen={})",
zelf.maxlen.unwrap()))), else return
Ok(vm.ctx.new_str(format!("{class_name}([])"))); apply the same explicit
construction for the other empty-case occurrence (the similar code at the later
block referenced).
crates/vm/src/stdlib/io.rs (1)
2904-2965: ⚠️ Potential issue | 🔴 Critical

Fix type mismatch: errors Wtf8Buf vs StatelessIncremental* fields.
errors is converted to Wtf8Buf, but the fallback stateless encoder/decoder structs still store Option<PyStrRef>, which will not type-check. Align the field types (or keep errors as PyUtf8StrRef and avoid into_wtf8() here).
🛠️ Suggested fix (align `errors` to Wtf8Buf)
struct StatelessIncrementalEncoder {
    encode: PyObjectRef,
-   errors: Option<PyStrRef>,
+   errors: Option<Wtf8Buf>,
    name: Option<PyStrRef>,
}

struct StatelessIncrementalDecoder {
    decode: PyObjectRef,
-   errors: Option<PyStrRef>,
+   errors: Option<Wtf8Buf>,
}
-        if let Some(errors) = &self.errors {
-            args.push(errors.to_owned().into());
-        }
+        if let Some(errors) = &self.errors {
+            args.push(vm.ctx.new_str(errors.clone()).into());
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/io.rs` around lines 2904 - 2965, The local variable
errors is converted to a Wtf8Buf but the fallback structs
StatelessIncrementalEncoder and StatelessIncrementalDecoder still expect
Option<PyStrRef>, causing a type mismatch; update those structs to store
Option<Wtf8Buf> (or change their error field types accordingly) and pass
Some(errors.clone()) into their constructors in the get_incremental_encoder and
get_incremental_decoder fallback branches (adjust the decoder branch where it
currently uses Some(errors)). Update any related field uses and clones so types
align (references: errors variable, StatelessIncrementalEncoder,
StatelessIncrementalDecoder, codec.get_incremental_encoder,
codec.get_incremental_decoder, and the encode/decode constructors like
StatelessIncrementalEncoder { encode: ..., errors: Some(errors.clone()), ... }).

♻️ Duplicate comments (2)

crates/stdlib/src/_sqlite3.rs (2)

2916-2923: Same surrogate-safety concern as register_converter.

This expect_str() usage has the same potential panic risk; please apply the same UTF‑8‑safe handling here.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/_sqlite3.rs` around lines 2916 - 2923, The code currently
calls isolation_level.expect_str().bytes(), which can panic on invalid UTF-8 the
same way register_converter did; change the conversion to a UTF-8-safe form
(e.g., use to_string_lossy() and then extend with its as_bytes()) so the string
bytes are produced without panicking, e.g. replace
isolation_level.expect_str().bytes() with
isolation_level.expect_str().to_string_lossy().as_bytes() (and keep the
surrounding s.extend(...) and self._exec(&s, vm) logic).

2192-2202: Same surrogate-safety concern as register_converter.

This expect_str() usage has the same potential panic risk; please apply the same UTF‑8‑safe handling here.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/_sqlite3.rs` around lines 2192 - 2202, The code uses
expect_str() on both name and obj which can panic on invalid UTF-8; replace
these with a UTF-8-safe conversion (use the PyStr->to_str / try-to-str API that
returns Result<&str, _> or an Option) for the variables used in the comparison,
handle the error by skipping/breaking the comparison instead of panicking, and
then use the resulting &str to build a_iter and b_iter as before; locate the
occurrences in the loop that iterates self.description (the block that uses
description.iter().zip(0..), name, obj, a_iter, b_iter and returns
self.data.getitem_by_index(vm, i)) and change the expect_str() calls to the safe
conversion and appropriate error handling.

🧹 Nitpick comments (10)

crates/stdlib/src/binascii.rs (1)

851-853: Optional: tighten the new_binascii_error wrapper signature to match the migrated API

Now that vm.new_exception_msg accepts Into<Wtf8Buf>, the wrapper can take impl Into<Wtf8Buf> (or just Wtf8Buf) directly, which lets every call site drop the unnecessary .to_owned() allocation on string literals.

♻️ Proposed refactor

-fn new_binascii_error(msg: String, vm: &VirtualMachine) -> PyBaseExceptionRef {
-    vm.new_exception_msg(decl::error_type(vm), msg.into())
+fn new_binascii_error(msg: impl Into<rustpython_vm::common::wtf8::Wtf8Buf>, vm: &VirtualMachine) -> PyBaseExceptionRef {
+    vm.new_exception_msg(decl::error_type(vm), msg.into())
 }

Call sites can then pass &str directly:

-    return Err(super::new_binascii_error("Odd-length string".to_owned(), vm));
+    return Err(super::new_binascii_error("Odd-length string", vm));

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/binascii.rs` around lines 851 - 853, Change the signature
of new_binascii_error to accept impl Into<Wtf8Buf> (or Wtf8Buf) instead of
String and forward that into vm.new_exception_msg (which now accepts
Into<Wtf8Buf>); update the function declaration new_binascii_error(msg: impl
Into<Wtf8Buf>, vm: &VirtualMachine) -> PyBaseExceptionRef and call
vm.new_exception_msg(decl::error_type(vm), msg) so call sites can pass &str
without .to_owned().

crates/vm/src/macros.rs (1)

191-199: Extract metavariable expansion outside the unsafe block instead of suppressing the lint.

#[allow(clippy::macro_metavars_in_unsafe)] is a workaround for a lint whose intended fix is to bind the expression before the unsafe block. Doing so keeps the safe-code/unsafe-code boundary explicit and removes the need for the suppression entirely:

♻️ Proposed refactor

     ($as_ctx:expr, $name:ident) => {{
         // Safety: All known identifiers are ascii strings.
-        #[allow(clippy::macro_metavars_in_unsafe)]
-        unsafe {
-            $crate::builtins::PyUtf8StrInterned::from_str_interned_unchecked(
-                $as_ctx.as_ref().names.$name
-            )
-        }
+        let interned = $as_ctx.as_ref().names.$name;
+        unsafe { $crate::builtins::PyUtf8StrInterned::from_str_interned_unchecked(interned) }
     }};

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/macros.rs` around lines 191 - 199, The macro currently silences
clippy::macro_metavars_in_unsafe; instead bind the metavariable expansion
outside the unsafe block: evaluate and store the expression
$as_ctx.as_ref().names.$name into a local let (e.g. names_val) before entering
unsafe, then call
$crate::builtins::PyUtf8StrInterned::from_str_interned_unchecked(names_val)
inside the unsafe block and remove the
#[allow(clippy::macro_metavars_in_unsafe)] attribute; this keeps the safe/unsafe
boundary explicit for the macro arm that uses $as_ctx and $name.

crates/vm/src/stdlib/posix.rs (1)

2374-2383: Unnecessary inner block in SysconfName::TryFromObject.

The { let name = s.as_str(); ... } scope is redundant — s is in scope through the entire Err arm and the lifetime of the borrow doesn't require restriction.

♻️ Proposed simplification

-                    {
-                        let name = s.as_str();
-                        name.parse::<SysconfVar>().or_else(|_| {
-                            if name == "SC_PAGESIZE" {
-                                Ok(SysconfVar::SC_PAGESIZE)
-                            } else {
-                                Err(vm.new_value_error("unrecognized configuration name"))
-                            }
-                        })? as i32
-                    }
+                    let name = s.as_str();
+                    name.parse::<SysconfVar>().or_else(|_| {
+                        if name == "SC_PAGESIZE" {
+                            Ok(SysconfVar::SC_PAGESIZE)
+                        } else {
+                            Err(vm.new_value_error("unrecognized configuration name"))
+                        }
+                    })? as i32

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/posix.rs` around lines 2374 - 2383, In
SysconfName::TryFromObject, remove the unnecessary inner block and local
shadowing of name; directly call s.as_str().parse::<SysconfVar>().or_else(|_| {
if s.as_str() == "SC_PAGESIZE" { Ok(SysconfVar::SC_PAGESIZE) } else {
Err(vm.new_value_error("unrecognized configuration name")) } })? as i32 so the
borrow of s lasts naturally without the extra scope and the code is simplified
while keeping the same logic for SysconfVar and the SC_PAGESIZE special-case.

crates/stdlib/src/unicodedata.rs (1)

128-139: to_str() chain is correct; minor note on the error-message Display for surrogate inputs.

When name.to_str() returns None (lone-surrogate string), the chain falls through to the KeyError at line 134 — which is the correct Python behavior since every valid Unicode character name is ASCII. No correctness concern there.

The let_chains syntax (lines 128–129) is fully supported: the project's MSRV is 1.93.0, which exceeds the Rust 1.88.0 requirement where let_chains stabilized (June 2025), and the project uses the 2024 edition where the feature is available.

One small point: the error message on line 136 formats name via its Display impl:
format!("undefined character name '{name}'")
If name contains lone surrogates, Display for PyStrRef will likely substitute U+FFFD replacement characters, so the error message won't faithfully reproduce the raw surrogate codepoints. CPython would surface the exact code units. This is well within acceptable range for an improvement, but if tighter CPython parity is desired down the line, the error path could use name.as_wtf8() with an escaped representation instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/unicodedata.rs` around lines 128 - 139, The error message
currently formats the PyStrRef `name` via its Display implementation
(format!("undefined character name '{name}'")) which may replace lone surrogates
with U+FFFD; modify the KeyError formatting in the branch that returns
vm.new_key_error so it emits an escaped representation preserving raw code units
(e.g., use name.as_wtf8() and produce a diagnostic-escaped string) instead of
relying on Display; update the call site that builds the message passed into
vm.ctx.new_str(...) so the message shows exact code units for
surrogate-containing inputs while leaving the rest of the logic (name.to_str(),
unicode_names2::character, self.check_age) unchanged.

crates/stdlib/src/ssl.rs (1)

3913-3921: Consider aligning setter arg type with PyUtf8StrRef.

This would make the setter consistent with the updated call sites and remove the extra try_into_utf8 step.

♻️ Suggested adjustment

-        fn set_server_hostname(
-            &self,
-            value: Option<PyStrRef>,
-            vm: &VirtualMachine,
-        ) -> PyResult<()> {
+        fn set_server_hostname(
+            &self,
+            value: Option<PyUtf8StrRef>,
+            vm: &VirtualMachine,
+        ) -> PyResult<()> {
             // Check if handshake is already done
             if *self.handshake_done.lock() {
                 return Err(
                     vm.new_value_error("Cannot set server_hostname on socket after handshake")
                 );
             }

             // Validate hostname
             let hostname_string = value
                 .map(|s| {
-                    let utf8 = s.try_into_utf8(vm)?;
-                    validate_hostname(utf8.as_str(), vm)?;
-                    Ok::<String, _>(utf8.as_str().to_owned())
+                    validate_hostname(s.as_str(), vm)?;
+                    Ok::<String, _>(s.as_str().to_owned())
                 })
                 .transpose()?;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/ssl.rs` around lines 3913 - 3921, The setter currently
accepts an Option<PyObjectRef> (or similar) and calls try_into_utf8 inside the
closure; change the setter signature to accept Option<PyUtf8StrRef> (or
Option<&PyUtf8StrRef> as appropriate) so callers can pass already-validated
UTF-8 refs, then remove the try_into_utf8 call in the closure and use the
PyUtf8StrRef's .as_str() directly for validate_hostname and when creating the
owned String; update uses around server_hostname.write() assignment to take the
Option<PyUtf8StrRef> and set hostname_string = value.map(|s| {
validate_hostname(s.as_str(), vm)?; Ok::<String, _>(s.as_str().to_owned())
}).transpose()? accordingly.

crates/stdlib/src/socket.rs (1)

2911-2916: Simplify by accepting PyUtf8StrRef directly (like sethostname/inet_aton/inet_pton).

gethostbyaddr, gethostbyname, and gethostbyname_ex still take PyStrRef and explicitly call .try_into_utf8(vm).map_err(IoOrPyException::from)?. Since IoOrPyException: From<PyBaseExceptionRef>, the #[pyfunction] macro can propagate a TryFromObject error as IoOrPyException::Py automatically — the same end result as the current explicit conversion.
♻️ Proposed simplification (shown for `gethostbyaddr`; apply identically to `gethostbyname` and `gethostbyname_ex`)
 #[pyfunction]
 fn gethostbyaddr(
-    addr: PyStrRef,
+    addr: PyUtf8StrRef,
     vm: &VirtualMachine,
 ) -> Result<(String, PyListRef, PyListRef), IoOrPyException> {
-    let addr = addr.try_into_utf8(vm).map_err(IoOrPyException::from)?;
     let addr = get_addr(vm, addr, c::AF_UNSPEC)?;
Also applies to: 2928-2930, 2941-2943
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/socket.rs` around lines 2911 - 2916, Change the functions
gethostbyaddr, gethostbyname, and gethostbyname_ex to accept PyUtf8StrRef (like
sethostname/inet_aton/inet_pton) instead of PyStrRef and remove the explicit
.try_into_utf8(vm).map_err(IoOrPyException::from)? conversions; let the
#[pyfunction] macro propagate TryFromObject errors as IoOrPyException::Py
automatically, then use the received PyUtf8StrRef directly in the existing
get_addr calls (e.g., replace the local addr conversion in gethostbyaddr and the
analogous conversions in gethostbyname and gethostbyname_ex).

crates/vm/src/stdlib/ctypes/array.rs (1)

4-5: Run rustfmt & clippy before merge.

Please confirm cargo fmt and cargo clippy were run for these Rust changes.
As per coding guidelines: "Follow the default rustfmt code style by running cargo fmt to format Rust code" and "Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints that are introduced by your changes."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/ctypes/array.rs` around lines 4 - 5, Run rustfmt and
clippy for the VM crate and fix any formatting or lints introduced by these
changes: run cargo fmt for crates/vm (ensure files like
crates/vm/src/stdlib/ctypes/array.rs are formatted) and run cargo clippy --fix
or cargo clippy to surface warnings; address all clippy suggestions and warnings
(including any issues related to imports such as LazyLock and SaturatedSliceIter
or unused/incorrect imports) before merging.

crates/vm/src/stdlib/ctypes/union.rs (1)

620-624: Silent skip when field name isn't valid UTF-8 — verify this is intentional.

If downcast_ref::<PyUtf8Str>() fails (e.g., a field name contains surrogates), the positional argument is silently skipped without error. In practice, process_fields (line 220) already validates field names as PyUtf8Str, so this path shouldn't be reachable with malformed names. However, the structure.rs counterpart at line 751 has the identical pattern — so at least they're consistent.

If you want extra safety, consider logging or returning an error when the downcast fails, rather than silently skipping.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/ctypes/union.rs` around lines 620 - 624, The code
currently silently skips a field when tuple.first().downcast_ref::<PyUtf8Str>()
fails (i.e., non-UTF-8 names); update the handling in the union field-processing
block (the branch using PyTuple and downcast_ref::<PyUtf8Str>()) to treat a
failed downcast as an explicit error instead of silently skipping: return or
propagate a descriptive error (e.g., invalid field name encoding) from the
surrounding function (see process_fields and the analogous handling in
structure.rs) so callers get notified when a field name is not valid UTF-8.

crates/vm/src/utils.rs (1)

1-2: Pre-merge hygiene: please run rustfmt + clippy for this refactor-heavy PR.

These changes touch a lot of core codepaths; it’s worth ensuring no formatting drift and no new clippy warnings slipped in.

As per coding guidelines “Follow the default rustfmt code style by running cargo fmt” and “Always run clippy (cargo clippy) before completing tasks and fix any warnings or lints that are introduced by your changes”.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/utils.rs` around lines 1 - 2, This PR introduced formatting and
potential lints in core code; run rustfmt and clippy and fix any resulting
issues: run cargo fmt (or rustfmt) to apply default formatting to
crates/vm/src/utils.rs and the repo, then run cargo clippy --all-targets
--all-features -- -D warnings to surface lints and address them (fix unused
imports like Wtf8/Wtf8Buf if unused, correct style issues, replace
unwraps/expect where appropriate, and resolve any borrow/ownership or
performance warnings reported for functions in utils.rs). Ensure the file
symbols (Wtf8, Wtf8Buf and any functions in utils.rs) compile cleanly with no
clippy warnings before merging.

crates/vm/src/builtins/memory.rs (1)

808-830: Good: memoryview.cast() now requires UTF‑8-safe format strings (PyUtf8StrRef).

This change removes the previous “may panic on surrogates” footgun by ensuring cast_to_1d() can safely use format.as_str() and still keep BufferDescriptor.format in sync.

If you want, I can help draft a small regression test that passes a surrogate-containing format and asserts an exception (not a panic).

Also applies to: 957-963
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/memory.rs` around lines 808 - 830, Change calls that
accept raw strings to use PyUtf8StrRef and then use format.as_str() so
BufferDescriptor.format stays UTF‑8 safe: in the cast_to_1d implementation
replace any use of a raw format string with a PyUtf8StrRef parameter, call
Self::parse_format(format.as_str(), vm) and set BufferDescriptor.format to
format.as_str().to_owned().into(); ensure the same pattern (use PyUtf8StrRef,
parse_format(format.as_str(), vm), and assign format via
format.as_str().to_owned().into()) is applied to the other memoryview cast
routine(s) that construct BufferDescriptor (the other function that currently
calls parse_format/uses format.as_str()/builds dim_desc), keeping the rest of
the struct fields (buffer.clone(), AtomicCell::new(false), OnceCell::new(),
etc.) unchanged.

ℹ️ Review info

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9f250a0 and e77e62f.

📒 Files selected for processing (120)

crates/common/src/str.rs
crates/stdlib/src/_asyncio.rs
crates/stdlib/src/_sqlite3.rs
crates/stdlib/src/array.rs
crates/stdlib/src/binascii.rs
crates/stdlib/src/contextvars.rs
crates/stdlib/src/csv.rs
crates/stdlib/src/grp.rs
crates/stdlib/src/hashlib.rs
crates/stdlib/src/json.rs
crates/stdlib/src/locale.rs
crates/stdlib/src/lzma.rs
crates/stdlib/src/multiprocessing.rs
crates/stdlib/src/overlapped.rs
crates/stdlib/src/posixshmem.rs
crates/stdlib/src/pyexpat.rs
crates/stdlib/src/pystruct.rs
crates/stdlib/src/socket.rs
crates/stdlib/src/ssl.rs
crates/stdlib/src/unicodedata.rs
crates/stdlib/src/zlib.rs
crates/vm/src/buffer.rs
crates/vm/src/builtins/bool.rs
crates/vm/src/builtins/classmethod.rs
crates/vm/src/builtins/code.rs
crates/vm/src/builtins/complex.rs
crates/vm/src/builtins/dict.rs
crates/vm/src/builtins/float.rs
crates/vm/src/builtins/function.rs
crates/vm/src/builtins/function/jit.rs
crates/vm/src/builtins/genericalias.rs
crates/vm/src/builtins/int.rs
crates/vm/src/builtins/interpolation.rs
crates/vm/src/builtins/list.rs
crates/vm/src/builtins/mappingproxy.rs
crates/vm/src/builtins/memory.rs
crates/vm/src/builtins/mod.rs
crates/vm/src/builtins/module.rs
crates/vm/src/builtins/namespace.rs
crates/vm/src/builtins/object.rs
crates/vm/src/builtins/range.rs
crates/vm/src/builtins/set.rs
crates/vm/src/builtins/slice.rs
crates/vm/src/builtins/staticmethod.rs
crates/vm/src/builtins/str.rs
crates/vm/src/builtins/template.rs
crates/vm/src/builtins/tuple.rs
crates/vm/src/builtins/type.rs
crates/vm/src/builtins/weakproxy.rs
crates/vm/src/bytes_inner.rs
crates/vm/src/codecs.rs
crates/vm/src/coroutine.rs
crates/vm/src/dict_inner.rs
crates/vm/src/exception_group.rs
crates/vm/src/exceptions.rs
crates/vm/src/frame.rs
crates/vm/src/function/buffer.rs
crates/vm/src/function/fspath.rs
crates/vm/src/function/mod.rs
crates/vm/src/import.rs
crates/vm/src/intern.rs
crates/vm/src/macros.rs
crates/vm/src/protocol/object.rs
crates/vm/src/stdlib/_abc.rs
crates/vm/src/stdlib/_winapi.rs
crates/vm/src/stdlib/_wmi.rs
crates/vm/src/stdlib/ast.rs
crates/vm/src/stdlib/ast/basic.rs
crates/vm/src/stdlib/ast/python.rs
crates/vm/src/stdlib/ast/repr.rs
crates/vm/src/stdlib/builtins.rs
crates/vm/src/stdlib/codecs.rs
crates/vm/src/stdlib/collections.rs
crates/vm/src/stdlib/ctypes.rs
crates/vm/src/stdlib/ctypes/array.rs
crates/vm/src/stdlib/ctypes/base.rs
crates/vm/src/stdlib/ctypes/function.rs
crates/vm/src/stdlib/ctypes/pointer.rs
crates/vm/src/stdlib/ctypes/simple.rs
crates/vm/src/stdlib/ctypes/structure.rs
crates/vm/src/stdlib/ctypes/union.rs
crates/vm/src/stdlib/functools.rs
crates/vm/src/stdlib/imp.rs
crates/vm/src/stdlib/io.rs
crates/vm/src/stdlib/itertools.rs
crates/vm/src/stdlib/marshal.rs
crates/vm/src/stdlib/msvcrt.rs
crates/vm/src/stdlib/nt.rs
crates/vm/src/stdlib/operator.rs
crates/vm/src/stdlib/os.rs
crates/vm/src/stdlib/posix.rs
crates/vm/src/stdlib/signal.rs
crates/vm/src/stdlib/sre.rs
crates/vm/src/stdlib/symtable.rs
crates/vm/src/stdlib/sys.rs
crates/vm/src/stdlib/thread.rs
crates/vm/src/stdlib/time.rs
crates/vm/src/stdlib/typevar.rs
crates/vm/src/stdlib/typing.rs
crates/vm/src/stdlib/winreg.rs
crates/vm/src/stdlib/winsound.rs
crates/vm/src/suggestion.rs
crates/vm/src/types/slot.rs
crates/vm/src/types/structseq.rs
crates/vm/src/utils.rs
crates/vm/src/vm/context.rs
crates/vm/src/vm/interpreter.rs
crates/vm/src/vm/mod.rs
crates/vm/src/vm/vm_new.rs
crates/vm/src/warn.rs
crates/wasm/src/browser_module.rs
crates/wasm/src/convert.rs
crates/wasm/src/js_module.rs
crates/wasm/src/wasm_builtins.rs
crates/wtf8/src/concat.rs
crates/wtf8/src/lib.rs
example_projects/wasm32_without_js/rustpython-without-js/src/lib.rs
src/lib.rs
src/shell.rs
src/shell/helper.rs

crates/vm/src/builtins/interpolation.rs

crates/vm/src/builtins/str.rs

crates/vm/src/builtins/template.rs

crates/vm/src/intern.rs

crates/vm/src/stdlib/_winapi.rs

crates/vm/src/vm/mod.rs

coderabbitai · 2026-02-24T15:24:47Z

crates/wasm/src/browser_module.rs

        let response_format = match response_format {
-            Some(s) => FetchResponseFormat::from_str(vm, s.as_str())?,
+            Some(s) => FetchResponseFormat::from_str(vm, s.expect_str())?,
            None => FetchResponseFormat::Text,
        };

        let opts = web_sys::RequestInit::new();

-        match method {
-            Some(s) => opts.set_method(s.as_str()),
-            None => opts.set_method("GET"),
-        };
+        opts.set_method(method.map_or("GET", |s| s.expect_str()));

        if let Some(body) = body {
            opts.set_body(&convert::py_to_js(vm, body));
        }

-        let request = web_sys::Request::new_with_str_and_init(url.as_str(), &opts)
+        let request = web_sys::Request::new_with_str_and_init(url.expect_str(), &opts)
            .map_err(|err| convert::js_py_typeerror(vm, err))?;

        if let Some(headers) = headers {
            let h = request.headers();
            for (key, value) in headers {
                let key = key.str(vm)?;
                let value = value.str(vm)?;
-                h.set(key.as_str(), value.as_str())
+                h.set(key.expect_str(), value.expect_str())
                    .map_err(|err| convert::js_py_typeerror(vm, err))?;
            }
        }

        if let Some(content_type) = content_type {
            request
                .headers()
-                .set("Content-Type", content_type.as_str())
+                .set("Content-Type", content_type.expect_str())
                .map_err(|err| convert::js_py_typeerror(vm, err))?;


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, find the definition of expect_str rg -n "fn expect_str" crates/ -A 5 -B 2

Repository: RustPython/RustPython

Length of output: 584

🏁 Script executed:

# Also find to_str for comparison rg -n "fn to_str" crates/ -A 5 -B 2 | head -100

Repository: RustPython/RustPython

Length of output: 5179

🏁 Script executed:

# Check the PyStr and PyUtf8Str trait implementations rg -n "impl.*PyStr|impl.*PyUtf8Str" crates/ -A 10 | head -150

Repository: RustPython/RustPython

Length of output: 10174

🏁 Script executed:

# Look at the actual code in browser_module.rs at the mentioned lines cat -n crates/wasm/src/browser_module.rs | sed -n '60,100p'

Repository: RustPython/RustPython

Length of output: 1714

🏁 Script executed:

# Also check lines 168-216 and 227-245 cat -n crates/wasm/src/browser_module.rs | sed -n '160,220p'

Repository: RustPython/RustPython

Length of output: 2175

🏁 Script executed:

cat -n crates/wasm/src/browser_module.rs | sed -n '220,250p'

Repository: RustPython/RustPython

Length of output: 1564

Use to_str() or to_string_lossy() instead of expect_str() to prevent surrogate panics.

All uses of expect_str() at these call sites will panic if the Python string contains surrogates, aborting the WASM runtime instead of raising a Python error. These are user-input-facing operations (URLs, headers, selectors, attributes, module paths) where panics break proper error handling. Replace with to_str() + error propagation or to_string_lossy() to handle surrogates safely.

Locations: line 67 (response_format), 73 (method), 79 (URL), 87 (headers), 95 (content-type), 171 (CSS selector), 206 (element attribute get), 215 (element attribute set), 227 (module path), 244 (module name).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/wasm/src/browser_module.rs` around lines 66 - 96, Several call sites (e.g., FetchResponseFormat::from_str(vm, s.expect_str()), opts.set_method(... method.map_or(...)), url.expect_str(), header key/value handling via key.str()/value.str(), and request.headers().set("Content-Type", content_type.expect_str())) use expect_str() which can panic on surrogate-containing Python strings; replace these with safe conversions (use to_str() and propagate a Python/JS error or to_string_lossy() where lossy fallback is acceptable) and ensure the error is converted to the existing error path (e.g., map_err(|e| convert::js_py_typeerror(vm, e)) or return a VM error) so WASM does not abort; apply the same change to the other mentioned locations (CSS selector handling, element attribute get/set, module path/name) using the same safe-to-string pattern and error propagation.

coderabbitai · 2026-02-24T15:24:48Z

crates/wasm/src/wasm_builtins.rs

        move |_self: PyObjectRef, data: PyStrRef, vm: &VirtualMachine| -> PyResult<()> {
-            write_f(data.as_str(), vm)
+            write_f(data.expect_str(), vm)
        },


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Find definitions of expect_str() and to_str() methods rg -n "fn\s+(expect_str|to_str)" --type rust -C 5

Repository: RustPython/RustPython

Length of output: 7601

🏁 Script executed:

#!/bin/bash # Search for PyStrRef definition and its methods fd -e rs | xargs rg -l "PyStrRef" | head -20

Repository: RustPython/RustPython

Length of output: 627

🏁 Script executed:

#!/bin/bash # Look for expect_str implementation details rg -n "expect_str" --type rust -C 3 -g "*.rs"

Repository: RustPython/RustPython

Length of output: 14863

🏁 Script executed:

#!/bin/bash # Check the PR context - look at the file being changed head -50 crates/wasm/src/wasm_builtins.rs tail -20 crates/wasm/src/wasm_builtins.rs

Repository: RustPython/RustPython

Length of output: 2138

Use to_string_lossy() instead of expect_str() to handle surrogates gracefully (Line 36).

expect_str() explicitly panics on surrogate-containing strings, which conflicts with this PR's objective. For WASM console output, use to_string_lossy() to replace surrogates with U+FFFD instead:

Suggested fix

- write_f(data.expect_str(), vm) + write_f(&data.to_string_lossy(), vm)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/wasm/src/wasm_builtins.rs` around lines 35 - 37, The closure handling WASM console output currently calls data.expect_str() which panics on surrogate-containing strings; replace that with data.to_string_lossy() to convert surrogates to U+FFFD and pass the resulting string (or &str from the Cow) into write_f instead. Update the move |_self: PyObjectRef, data: PyStrRef, vm: &VirtualMachine| -> PyResult<()> { write_f(data.expect_str(), vm) } closure to call data.to_string_lossy() and forward the lossy string to write_f so surrogate-containing input is handled gracefully.

coderabbitai · 2026-02-24T15:24:48Z

src/shell.rs

+            Ok(ref s) => s.expect_str(),
            Err(_) => "",


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the actual code at lines 145-146 in src/shell.rs head -150 src/shell.rs | tail -20

Repository: RustPython/RustPython

Length of output: 616

🏁 Script executed:

# Find the definition of expect_str() to understand its actual behavior rg -n "fn\s+expect_str|def\s+expect_str" --type rust -A 5

Repository: RustPython/RustPython

Length of output: 482

🏁 Script executed:

# Also search for to_str() definition to compare rg -n "fn\s+to_str" --type rust -A 3 | head -50

Repository: RustPython/RustPython

Length of output: 2982

🏁 Script executed:

# Check what type 's' is in this context by looking at more surrounding code sed -n '130,160p' src/shell.rs

Repository: RustPython/RustPython

Length of output: 970

Panic reintroduced via expect_str() on user-controlled prompt

expect_str() calls .expect("PyStr contains surrogates") and will panic if the PyStr contains Unicode surrogates. Since sys.ps1 and sys.ps2 are user-controllable, this reintroduces the panic scenario the PR aims to eliminate. Use to_str().unwrap_or("") or to_string_lossy() instead.

Safer prompt extraction

- let prompt = match prompt { - Ok(ref s) => s.expect_str(), - Err(_) => "", - }; + let prompt = match prompt { + Ok(ref s) => s.to_str().unwrap_or(""), + Err(_) => "", + };

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Ok(ref s) => s.expect_str(),

Err(_) => "",

let prompt = match prompt {

Ok(ref s) => s.to_str().unwrap_or(""),

Err(_) => "",

};

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/shell.rs` around lines 145 - 146, The match arm uses PyStr::expect_str() (Ok(ref s) => s.expect_str()), which can panic on user-controlled sys.ps1/sys.ps2 containing Unicode surrogates; replace that call with a non-panicking conversion such as s.to_str().unwrap_or("") or s.to_string_lossy().as_ref() so the prompt extraction never panics—locate the expect_str() usage in src/shell.rs and swap it for one of these safe conversions.

coderabbitai · 2026-02-24T15:24:48Z

src/shell/helper.rs

        Some((
            startpos,
            completions
                .into_iter()
-                .map(|s| s.as_str().to_owned())
+                .map(|s| s.expect_str().to_owned())
                .collect(),


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's understand the repository structure and locate relevant files git ls-files | grep -E "\.(rs|toml)$" | head -20

Repository: RustPython/RustPython

Length of output: 1359

🏁 Script executed:

# Find PyStrRef and its expect_str implementation rg "fn expect_str" -g '*.rs' -A 5

Repository: RustPython/RustPython

Length of output: 458

🏁 Script executed:

# Look at the complete_opt function in the file being reviewed rg -n "fn complete_opt" -g '*.rs' -B 2 -A 50

Repository: RustPython/RustPython

Length of output: 3145

🏁 Script executed:

# Search for PyStrRef definition rg "struct PyStrRef|type PyStrRef" -g '*.rs' -B 2 -A 10

Repository: RustPython/RustPython

Length of output: 741

🏁 Script executed:

# Check what to_str() does on PyStrRef (if it exists) rg "fn to_str\|to_str()" -g '*.rs' -B 2 -A 5 | head -100

Repository: RustPython/RustPython

Length of output: 47

🏁 Script executed:

# Search for to_str method on PyStr rg "fn to_str" crates/vm/src/builtins/str.rs -B 2 -A 5

Repository: RustPython/RustPython

Length of output: 419

🏁 Script executed:

# Look for all methods available on PyStr to understand the API rg "impl.*PyStr|fn (as_|to_)" crates/vm/src/builtins/str.rs -B 1 -A 3 | head -150

Repository: RustPython/RustPython

Length of output: 3118

🏁 Script executed:

# Check the actual type at line 113 - what does get_available_completions return rg -n "get_available_completions" src/shell/helper.rs -B 5 -A 15

Repository: RustPython/RustPython

Length of output: 1896

Avoid panic from expect_str() in completion mapping.

Line 143 calls expect_str() on completions that may contain surrogates, causing a panic with "PyStr contains surrogates." This contradicts the surrounding code's use of as_wtf8() for surrogate-safe sorting. Use a fallible conversion instead:

Suggested fix

- Some(( - startpos, - completions - .into_iter() - .map(|s| s.expect_str().to_owned()) - .collect(), - )) + Some(( + startpos, + completions + .into_iter() + .filter_map(|s| s.to_str().map(str::to_owned)) + .collect(), + ))

This skips invalid UTF-8 entries, which is appropriate for shell completions.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Some((

startpos,

completions

.into_iter()

.map(|s| s.as_str().to_owned())

.map(|s| s.expect_str().to_owned())

.collect(),

Some((

startpos,

completions

.into_iter()

.filter_map(|s| s.to_str().map(str::to_owned))

.collect(),

))

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/shell/helper.rs` around lines 139 - 144, The current map calls expect_str() on completion items which can panic ("PyStr contains surrogates"); instead make the conversion fallible and skip invalid entries: locate the completion mapping where completions.into_iter().map(|s| s.expect_str().to_owned()) is used and replace it with a filter_map that attempts a surrogate-safe conversion (use the same as_wtf8()/fallible API already used elsewhere) and .map(|s| s.to_owned()) only on success, so invalid UTF-8/composite entries are skipped rather than panicking.

coderabbitai

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)

crates/vm/src/stdlib/nt.rs (1)

1230-1256: ⚠️ Potential issue | 🟠 Major

Replace expect_str() with to_str() for proper error handling to prevent panic on surrogate characters.

expect_str() panics if the string contains surrogates, and since PyStr supports WTF-8, surrogates are possible in user input. The argv and env conversions (lines 1232, 1255–1256) should handle this gracefully instead of panicking.

Use to_str().ok_or_else(...) to convert None to a proper error, following the pattern established throughout the codebase:

Corrected fix

         let argv = vm.extract_elements_with(argv.as_ref(), |obj| {
             let arg = PyStrRef::try_from_object(vm, obj)?;
-            make_widestring(arg.expect_str())
+            let arg_str = arg.to_str()
+                .ok_or_else(|| vm.new_value_error("string contains surrogates"))?;
+            make_widestring(arg_str)
         })?;

-            let key_str = key.expect_str();
-            let value_str = value.expect_str();
+            let key_str = key.to_str()
+                .ok_or_else(|| vm.new_value_error("string contains surrogates"))?;
+            let value_str = value.to_str()
+                .ok_or_else(|| vm.new_value_error("string contains surrogates"))?;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/nt.rs` around lines 1230 - 1256, The code currently
calls expect_str() on PyStrRef values when building argv and env (in the argv
extraction map closure and in the env loop handling key/value), which can panic
on surrogate-containing WTF-8 input; replace those expect_str() uses with
to_str().ok_or_else(...) and return a VM error instead of panicking: for the
argv mapping (where PyStrRef::try_from_object is used and the first element is
checked) use arg.to_str().ok_or_else(|| vm.new_value_error("execve: argv first
element contains invalid surrogates")) (or a similar descriptive vm error) and
for the env loop (key and value) replace key.expect_str() and value.expect_str()
with key.to_str().ok_or_else(|| vm.new_value_error("execve: env contains invalid
surrogates")) and value.to_str().ok_or_else(...); keep the rest of the
argv_execve/env_strings construction unchanged.

crates/vm/src/stdlib/ast/python.rs (1)

129-231: ⚠️ Potential issue | 🟠 Major

Fix type inconsistency: use Vec<PyUtf8StrRef> when building payload in __replace__.

Lines 102 and 110 cast fields and attributes to Vec<PyStrRef>, but earlier in the same function (lines 131-138) they are cast to Vec<PyUtf8StrRef>. Keys added via field.as_object() and attr.as_object() at lines 105 and 113 will be PyStr objects. However, line 128 attempts to downcast all payload keys to PyUtf8Str, which will fail if those keys are non-UTF8 PyStr objects, triggering the "keywords must be strings" error.

Change lines 102 and 110 to use Vec<PyUtf8StrRef> to ensure payload keys are consistently UTF-8 strings.

Suggested fix

             if let Some(fields) = fields.clone() {
-                let fields: Vec<PyStrRef> = fields.try_to_value(vm)?;
+                let fields: Vec<PyUtf8StrRef> = fields.try_to_value(vm)?;
                 for field in fields {
                     if let Some(value) = dict.get_item_opt::<Wtf8>(field.as_wtf8(), vm)? {
                         payload.set_item(field.as_object(), value, vm)?;
                     }
                 }
             }
             if let Some(attributes) = attributes.clone() {
-                let attributes: Vec<PyStrRef> = attributes.try_to_value(vm)?;
+                let attributes: Vec<PyUtf8StrRef> = attributes.try_to_value(vm)?;
                 for attr in attributes {
                     if let Some(value) = dict.get_item_opt::<Wtf8>(attr.as_wtf8(), vm)? {
                         payload.set_item(attr.as_object(), value, vm)?;
                     }
                 }
             }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/ast/python.rs` around lines 129 - 231, The payload
construction in __replace__ uses Vec<PyStrRef> for fields and attributes but
earlier the code uses Vec<PyUtf8StrRef>, causing a mismatch when payload keys
are later downcast to PyUtf8Str; change the casts that build payload (the
conversions that create fields and attributes before calling dict.get_item_opt
and payload.set_item) from Vec<PyStrRef> to Vec<PyUtf8StrRef> so keys inserted
via field.as_object()/attr.as_object() are guaranteed UTF‑8 and will downcast
successfully to PyUtf8Str in the final kwargs collection.

crates/vm/src/frame.rs (1)

2635-2660: ⚠️ Potential issue | 🟠 Major

Use safe error handling for PyUtf8Str keyword name downcast to avoid panics on malformed bytecode.

collect_keyword_args uses downcast_ref::<PyUtf8Str>().unwrap() on kwarg names, but the tuple contains PyStr objects created by ctx.new_str(), which can contain WTF-8 surrogates and are not guaranteed to be valid UTF-8. This will panic if keyword argument strings contain invalid UTF-8. In contrast, collect_ex_args correctly handles this with ok_or_else() returning a proper error. Apply consistent error handling using either a safe conversion like try_as_utf8() or return a PyResult to propagate errors.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 2635 - 2660, The current kwarg name
mapping in the function that builds FuncArgs (the block that maps
kwarg_names.as_slice().iter() and calls downcast_ref::<PyUtf8Str>().unwrap())
can panic on invalid UTF-8; change it to propagate a PyResult error like
collect_ex_args does: perform a safe downcast (use
downcast_ref::<PyUtf8Str>().ok_or_else(|| vm.new_type_error("keywords must be
strings"))? or use try_as_utf8()/equivalent) and return a PyResult<FuncArgs> on
failure, then call FuncArgs::with_kwargs_names only after successful conversion
so malformed keyword names produce a Python TypeError instead of panicking.
Ensure you reference the same symbols: kwarg_names, PyUtf8Str,
FuncArgs::with_kwargs_names and the surrounding function signature to adjust the
return type if needed.

crates/stdlib/src/ssl.rs (1)

2019-2039: ⚠️ Potential issue | 🟠 Major

Accept password=None to match CPython behavior.

OptionalArg::Present(None) currently raises a TypeError, but Python allows password=None. This breaks compatibility and makes load_cert_chain(password=None) fail.

🐛 Suggested fix

             match password {
                 OptionalArg::Present(p) => {
+                    if vm.is_none(p) {
+                        return Ok((None, None));
+                    }
                     // Try string first
                     if let Ok(pwd_str) = PyUtf8StrRef::try_from_object(vm, p.clone()) {
                         Ok((Some(pwd_str.as_str().to_owned()), None))
                     }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/ssl.rs` around lines 2019 - 2039, The match on password
currently treats OptionalArg::Present(None) as invalid; change the handling in
the password match (in the load_cert_chain/password parsing block) to accept
OptionalArg::Present(None) and return Ok((None, None)) (i.e., treat explicit
None as no password), while preserving existing branches for Present(Some(obj))
that try string/bytes/callable and for OptionalArg::Missing; update the match
arm that inspects OptionalArg::Present to first check for p.is_none() (or match
OptionalArg::Present(None)) and handle it accordingly so
load_cert_chain/password behavior matches CPython.

♻️ Duplicate comments (17)

crates/vm/src/stdlib/msvcrt.rs (2)

83-90: ⚠️ Potential issue | 🟠 Major

Avoid expect_str() to prevent surrogate-triggered panics in putwch.

PyStrRef can contain surrogates; expect_str() can panic and crash the VM. Use WTF‑8 code points instead.

✅ Suggested surrogate-safe change

-        let c = s
-            .expect_str()
-            .chars()
-            .exactly_one()
-            .map_err(|_| vm.new_type_error("putch() argument must be a string of length 1"))?;
-        unsafe { suppress_iph!(_putwch(c as u16)) };
+        let c = s
+            .as_wtf8()
+            .code_points()
+            .exactly_one()
+            .map_err(|_| vm.new_type_error("putch() argument must be a string of length 1"))?;
+        unsafe { suppress_iph!(_putwch(c.to_u32() as u16)) };

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/msvcrt.rs` around lines 83 - 90, The putwch function
uses PyStrRef::expect_str(), which can panic on surrogate-containing strings and
crash the VM; update putwch to extract the string via a surrogate-safe
WTF‑8/UTF‑8 accessor (do not call expect_str()), iterate or decode WTF‑8 code
points to ensure exactly one Unicode scalar or code unit as intended, and map
errors to vm.new_type_error("putch() argument must be a string of length 1")
instead of panicking; locate the logic in putwch and replace the
expect_str()/chars().exactly_one() path with a surrogate-safe WTF‑8
check/iterator that returns a controlled PyResult error when the length is not
one, then call _putwch with the correct u16 value.

108-114: ⚠️ Potential issue | 🟠 Major

Avoid expect_str() to prevent surrogate-triggered panics in ungetwch.

Same issue as putwch: expect_str() can panic on surrogate-containing strings.

✅ Suggested surrogate-safe change

-        let c =
-            s.expect_str().chars().exactly_one().map_err(|_| {
-                vm.new_type_error("ungetwch() argument must be a string of length 1")
-            })?;
-        let ret = unsafe { suppress_iph!(_ungetwch(c as u32)) };
+        let c = s
+            .as_wtf8()
+            .code_points()
+            .exactly_one()
+            .map_err(|_| vm.new_type_error("ungetwch() argument must be a string of length 1"))?;
+        let ret = unsafe { suppress_iph!(_ungetwch(c.to_u32())) };

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/msvcrt.rs` around lines 108 - 114, In ungetwch, avoid
calling s.expect_str() (which can panic on surrogate-containing strings);
instead obtain the Python string without expect and iterate its characters
safely (as in the putwch fix) so you call .chars().exactly_one().map_err(...) on
the surrogate-safe string view, then pass the resulting code point to the unsafe
suppress_iph!(_ungetwch(c as u32)) call; update the code around ungetwch and the
character-extraction logic to mirror the surrogate-safe approach used for
putwch.

crates/vm/src/stdlib/_wmi.rs (1)

563-565: ⚠️ Potential issue | 🔴 Critical

expect_str() still panics on surrogate-containing strings — PR objective unmet

Replacing as_str() with expect_str() is a rename-only change; expect_str() still calls .expect(…) and will panic on any PyStr that holds lone surrogates. This file needs to complete the migration to to_str() with proper error propagation, which is exactly what the rest of the PR does elsewhere.
🐛 Proposed fix
-        let query_str = query.expect_str();
+        let query_str = query.to_str().map_err(|e| vm.new_value_error(e.to_string()))?;
query_str is only used for the ASCII prefix check (lines 566-569) and the wide_str(query_str) call (line 573), both of which accept &str, so no further changes are needed downstream.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/_wmi.rs` around lines 563 - 565, The code still calls
expect_str() which can panic on surrogate-containing PyStr; change exec_query to
call query.to_str(vm)? (propagating the PyErr) and bind query_str as &str for
the ASCII prefix check and wide_str(query_str) call so errors are returned
instead of panicking; update the usage of query_str accordingly in exec_query.

crates/wasm/src/browser_module.rs (1)

66-68: ⚠️ Potential issue | 🔴 Critical

Replace expect_str() with fallible conversion to avoid WASM panics.

expect_str() will panic on surrogate-containing Python strings; these inputs are user-controlled (URLs, headers, selectors, module names) and will abort the WASM runtime instead of raising a Python error. Please switch to to_str() (propagate a Python error) or to_string_lossy() where a lossy fallback is acceptable, and map errors through the existing convert::js_py_typeerror / py_err_to_js_err paths.
#!/bin/bash
# Inspect PyStrRef conversion APIs and error types to wire correct fallible conversion paths.
rg -n "fn expect_str|fn to_str|fn to_string_lossy" crates -g '*.rs' -C3
rg -n "impl .*PyStrRef|impl .*PyStr" crates -g '*.rs' -C3
Also applies to: 73-73, 79-80, 87-88, 95-96, 171-172, 206-208, 215-216, 227-228, 244-245
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/wasm/src/browser_module.rs` around lines 66 - 68, Replace all uses of
expect_str() that can panic (e.g., in the response_format match where
FetchResponseFormat::from_str(vm, s.expect_str()) is called) with a fallible
conversion and propagate a Python error: call s.to_str() (or s.to_string_lossy()
if a lossy fallback is acceptable) and map the Err through the existing
conversion helpers (convert::js_py_typeerror or py_err_to_js_err) before passing
the resulting &str/String into FetchResponseFormat::from_str and other callers;
apply the same change to the other occurrences flagged (headers, url, selector,
module-name conversions) so no user-controlled PyStrRef can trigger a WASM
panic.

crates/vm/src/stdlib/functools.rs (1)

480-492: Lossy format! on &Wtf8 reference still present — surrogates replaced via Display.

The format!("{module_name}.{qualname_str}") at line 486 (and subsequently Wtf8Buf::from(format!("{qualified_name}(")) at line 492) triggers the Display implementation for Wtf8, which substitutes unpaired surrogates with U+FFFD. This was flagged in a previous review and the proposed fix (build the prefix directly with Wtf8Buf::push_wtf8 / push_str) was not applied.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/functools.rs` around lines 480 - 492, The code uses
format! on Wtf8 values (format!("{module_name}.{qualname_str}") and
format!("{qualified_name}(")) which invokes Display and loses unpaired
surrogates; instead construct the strings with Wtf8-aware APIs: avoid format!
and build a Wtf8Buf directly (use Wtf8Buf::new / Wtf8Buf::from and call
push_wtf8 or push_str) to append module_name, a literal ".", qualname_str, and
finally "(" so surrogates are preserved; update the qualified_name creation and
the result initialization (the variables/match arm around
module.downcast::<crate::builtins::PyStr>(), module_name, qualname_str,
qualified_name, and result) to use these push_* methods rather than format!.

crates/vm/src/intern.rs (1)

160-168: as_str() can still panic on interned strings containing surrogates.

StringPool::intern accepts Wtf8Buf and &Wtf8 (lines 292–305), so interned strings aren't guaranteed to be valid UTF-8. The unwrap_or_else(|| panic!(...)) reintroduces exactly the kind of panic this PR aims to eliminate.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/intern.rs` around lines 160 - 168, The current as_str() on
PyInterned<PyStr> calls self.inner.to_str().unwrap_or_else(...) which can still
panic for WTF-8 data with surrogates; replace this panic-prone API by returning
a non-panicking string view (e.g., change as_str() to return Cow<'_, str> and
use self.inner.to_string_lossy() instead of to_str().unwrap_or_else(...)), or
alternatively add a new as_wtf8() that returns &Wtf8 and keep a lossy
as_str_lossy() that returns Cow<'_, str>; update uses of as_str accordingly to
avoid any unwrap/panic on invalid UTF-8.

crates/vm/src/stdlib/_winapi.rs (3)

298-302: ⚠️ Potential issue | 🔴 Critical

expect_str() panics on surrogate-containing strings — contradicts PR objective.

The wstr closure calls s.expect_str() which panics when the PyStr contains surrogates. On Windows, file paths and command lines can legitimately contain unpaired surrogates (e.g., filenames created via UTF-16 APIs). This reintroduces the exact class of panics the PR aims to eliminate.

Consider using s.as_wtf8().to_wide_with_nul() directly (bypassing the WideCString::from_str path) since the Windows APIs accept UTF-16 including surrogates, or use s.to_str() with proper error propagation.
Suggested approach using WTF-8 → wide conversion
         let wstr = |s: PyStrRef| {
-            let ws = widestring::WideCString::from_str(s.expect_str())
-                .map_err(|err| err.to_pyexception(vm))?;
-            Ok(ws.into_vec_with_nul())
+            Ok(s.as_wtf8().to_wide_with_nul())
         };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/_winapi.rs` around lines 298 - 302, The wstr closure
currently calls s.expect_str(), which can panic on PyStrRef values containing
surrogate code points; replace that path with a surrogate-safe conversion by
using s.as_wtf8().to_wide_with_nul() (or an equivalent WTF-8 → UTF-16
conversion) and avoid WideCString::from_str/from_str error branches that assume
valid UTF-8; update the closure (wstr) to take PyStrRef, call as_wtf8(), convert
to a wide Vec<u16> including NUL with to_wide_with_nul(), and propagate any
conversion errors via map_err to the vm rather than using expect_str().
399-400: ⚠️ Potential issue | 🔴 Critical

expect_str() in CreateJunction will panic on surrogate paths.

Same issue as in the wstr closure. Windows junction paths can contain surrogates. Use to_str() with error handling or convert via WTF-8 → wide string.
Suggested fix using to_str()
-        let src_path = std::path::Path::new(src_path.expect_str());
-        let dest_path = std::path::Path::new(dest_path.expect_str());
+        let src_str = src_path.to_str().ok_or_else(|| {
+            vm.new_value_error("src_path contains surrogates")
+        })?;
+        let dest_str = dest_path.to_str().ok_or_else(|| {
+            vm.new_value_error("dest_path contains surrogates")
+        })?;
+        let src_path = std::path::Path::new(src_str);
+        let dest_path = std::path::Path::new(dest_str);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/_winapi.rs` around lines 399 - 400, The CreateJunction
implementation currently calls expect_str() when building src_path and dest_path
(and similarly uses the wstr closure), which will panic on Windows paths
containing UTF-16 surrogates; change these to avoid expect_str(): call to_str()
and handle the None case (return a proper error) or convert the OsStr via a
WTF-8 → wide-string path conversion before passing to Windows APIs, updating the
CreateJunction function and the wstr helper to accept and handle
surrogate-containing paths safely instead of unwrapping.
419-430: ⚠️ Potential issue | 🔴 Critical

expect_str() in getenvironment will panic on surrogate env vars.

Windows environment variables can contain unpaired surrogates. Both k.expect_str() and v.expect_str() at lines 421 and 423 will panic in that case.

Additionally, line 424 uses k.contains('\0') and v.contains('\0') on &str — this is correct only if expect_str() succeeds, but the code will panic before reaching the null check if surrogates are present.

Consider converting to wide strings via WTF-8 directly rather than going through &str.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/_winapi.rs` around lines 419 - 430, The code in
getenvironment uses PyStrRef::expect_str() for k and v which will panic on
unpaired surrogates; replace the expect_str() calls and subsequent &str
operations with a WTF-8/raw wide-string path (e.g. obtain the WTF-8/wide
representation from the PyStrRef returned by PyStrRef::try_from_object instead
of calling expect_str()), perform the null byte check against the raw/wtf8
bytes, validate the name emptiness and '=' rule using the WTF-8-aware view
(checking the first scalar/code-unit appropriately rather than slicing &str),
and then perform the uppercase/normalization step on a Unicode-safe
representation (or normalize to wide/uppercase after conversion) so no
unpaired-surrogate panics can occur; update references to expect_str(), the
temporary k/v &str usage, and key_upper creation to use the new
WTF-8/wide-string handling.

crates/vm/src/types/slot.rs (1)

1715-1724: ⚠️ Potential issue | 🟠 Major

Break the default repr recursion to avoid stack overflow.
repr_wtf8 and repr_str still call each other when neither is overridden.

Suggested fix (one example)

-    fn repr_str(zelf: &Py<Self>, vm: &VirtualMachine) -> PyResult<String> {
-        Self::repr_wtf8(zelf, vm).map(|w| w.to_string())
-    }
+    fn repr_str(_zelf: &Py<Self>, vm: &VirtualMachine) -> PyResult<String> {
+        Err(vm.new_type_error("repr_str or repr_wtf8 must be implemented"))
+    }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/types/slot.rs` around lines 1715 - 1724, The two methods
repr_wtf8 and repr_str mutually call each other when neither is overridden,
causing infinite recursion; break this by giving repr_wtf8 a non-recursive
default implementation that builds a fallback representation (e.g., using the
object's class/type name and address/ID) instead of calling repr_str; keep
repr_str converting the Wtf8Buf to String, and ensure repr (which currently
calls repr_wtf8) still uses that fallback when no override exists so the
recursion is resolved (update the repr_wtf8 implementation and leave repr_str as
the simple .to_string() conversion).

crates/vm/src/stdlib/builtins.rs (1)

566-575: ⚠️ Potential issue | 🟠 Major

Guard expect_str() to avoid surrogate panics.
This still panics on non‑UTF‑8 prompts; switch to try_into_utf8(vm) and fall back to the std‑I/O path when conversion fails.

🛠️ Suggested safe prompt conversion

-        if use_rustyline {
-            let prompt = prompt.as_ref().map_or("", |s| s.expect_str());
+        let (prompt_utf8, use_rustyline) = match prompt.as_ref() {
+            Some(s) => {
+                let prompt_utf8 = s.try_into_utf8(vm).ok();
+                (prompt_utf8, use_rustyline && prompt_utf8.is_some())
+            }
+            None => (None, use_rustyline),
+        };
+        if use_rustyline {
+            let prompt = prompt_utf8.as_ref().map_or("", |s| s.as_str());
             let mut readline = Readline::new(());
             match readline.readline(prompt) {

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/builtins.rs` around lines 566 - 575, The code
unconditionally calls expect_str() to build the prompt before using Readline
(inside the use_rustyline branch), which panics on non‑UTF‑8 prompts; replace
that call with a safe conversion using try_into_utf8(vm) (or equivalent) and if
the conversion returns an Err, skip the rustyline branch and fall back to the
normal std‑IO path. Specifically, change the prompt extraction around
expect_str() so the prompt is attempted via try_into_utf8(vm) and only proceed
to create Readline and call readline.readline(prompt) when conversion succeeds;
otherwise treat use_rustyline as false (or jump to the existing std I/O
handling).

crates/vm/src/vm/mod.rs (1)

639-660: ⚠️ Potential issue | 🟡 Minor

Add a separator after msg in unraisable output.
Without it, output concatenates; same issue as the earlier review.

✏️ Suggested tweak

-        let msg_str = if let Some(msg) = msg {
-            format!("Exception ignored {}", msg)
+        let msg_str = if let Some(msg) = msg {
+            format!("Exception ignored {msg}: ")
         } else {
             "Exception ignored in: ".to_owned()
         };

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/vm/mod.rs` around lines 639 - 660, The unraisable output
concatenates object repr because the Some(msg) branch of msg_str lacks a
separator; update the construction of msg_str (where msg_str is built from msg
in the let msg_str = if let Some(msg) { ... } else { ... } block) to append a
separator (for example ": " or " in: ") after the interpolated msg so it matches
the None branch's separator before the subsequent write_to_stderr and
repr_result/repr_wtf8 output; ensure the change affects the same msg_str/ msg
handling before write_to_stderr is called.

crates/vm/src/stdlib/ctypes/function.rs (2)

753-759: ⚠️ Potential issue | 🟠 Major

Avoid panic if _type_ is mutated to non‑UTF‑8.

to_str().expect(...) can panic if Python mutates _type_ to a non‑UTF‑8 value; treat invalid UTF‑8 as a non‑match instead.

🛠️ Suggested fix

-        let c = s
-            .to_str()
-            .expect("_type_ is validated as ASCII at type creation");
-        if c.len() == 1 && "sPzUZXO".contains(c) {
-            return true;
-        }
+        if let Some(c) = s.to_str() {
+            if c.len() == 1 && "sPzUZXO".contains(c) {
+                return true;
+            }
+        }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/ctypes/function.rs` around lines 753 - 759, The code
currently calls s.to_str().expect(...) which can panic if the Python-side
`_type_` string contains invalid UTF‑8; change the logic to safely handle
invalid UTF‑8 by attempting to convert with to_str() and only proceeding when it
returns Ok(&str) (treat Err as a non‑match), so replace the expect-based
conversion in the block that reads ctype.get_attr("_type_", vm) and downcasts to
PyStr with a guarded to_str() check before using c (referencing the PyStr value
and the existing conditional that checks c.len() == 1 && "sPzUZXO".contains(c)).

282-289: ⚠️ Potential issue | 🟠 Major

Use platform‑aware wchar_t conversion (UTF‑16 here breaks non‑Windows).

encode_wide() yields UTF‑16; on Unix, libc::wchar_t is typically 32‑bit, so this path will mis-encode wide strings.

🛠️ Suggested fix

-        // Convert to null-terminated UTF-16, preserving lone surrogates
-        let wide: Vec<u16> = s
-            .as_wtf8()
-            .encode_wide()
-            .chain(core::iter::once(0))
-            .collect();
-        let wide_bytes: Vec<u8> = wide.iter().flat_map(|&x| x.to_ne_bytes()).collect();
-        let keep = vm.ctx.new_bytes(wide_bytes);
+        // Convert to platform wchar_t bytes (UTF-16 on Windows, UTF-32 on Unix)
+        let wide_bytes = super::base::str_to_wchar_bytes(s.as_wtf8());
+        let keep = vm.ctx.new_bytes(wide_bytes);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/ctypes/function.rs` around lines 282 - 289, The current
branch in the PyStr handling uses encode_wide() and treats wchar_t as 16-bit
which breaks on platforms where libc::wchar_t is 32-bit; change the conversion
in the value.downcast_ref::<PyStr>() branch (symbols: s, wide, wide_bytes) to be
platform-aware: check std::mem::size_of::<libc::wchar_t>() (or cfg(target_os =
"windows")) and if it's 2 bytes keep the existing UTF‑16 flow
(s.as_wtf8().encode_wide() + 0u16), otherwise produce UTF‑32 code units by
iterating the WTF-8 string into Unicode scalar values (e.g. s.as_wtf8().chars()
or equivalent that preserves lone surrogates semantics), append a 0u32
terminator, and then convert those u32 units to native-endian bytes for
wide_bytes so the produced buffer matches the platform's wchar_t size.

crates/vm/src/builtins/interpolation.rs (1)

205-223: ⚠️ Potential issue | 🟠 Major

Avoid push_str on PyStr reprs to prevent surrogate panics.
expr_repr / spec_repr are PyStr values; push_str(&expr_repr) routes through AsRef<str> and can panic on WTF‑8 surrogates. Use push_wtf8(..) like the other parts.

🐛 Proposed fix

-        result.push_str(&expr_repr);
+        result.push_wtf8(expr_repr.as_wtf8());
         result.push_str(", ");
         if vm.is_none(&zelf.conversion) {
             result.push_str("None");
         } else {
             result.push_wtf8(zelf.conversion.repr(vm)?.as_wtf8());
         }
         result.push_str(", ");
-        result.push_str(&spec_repr);
+        result.push_wtf8(spec_repr.as_wtf8());

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/interpolation.rs` around lines 205 - 223, In repr_wtf8
replace the uses of push_str(&expr_repr) and push_str(&spec_repr) — which call
AsRef<str> and can panic on WTF‑8 surrogates — with push_wtf8 using the
PyStr/WTF‑8 backing (e.g. call result.push_wtf8(expr_repr.as_wtf8()) and
result.push_wtf8(spec_repr.as_wtf8())); keep the other push_wtf8 usages and the
conversion handling in the repr_wtf8 function unchanged.

crates/vm/src/stdlib/ctypes/base.rs (1)

366-373: Clarify the _type_ ASCII validation scope in the comment.
The to_str().expect(...) assumption is safe only for the simple-type _type_ path; consider noting that explicitly so the invariant is clear to future readers.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/ctypes/base.rs` around lines 366 - 373, Clarify that the
ASCII validation referenced by the expect() call applies only to the simple-type
`_type_` path: update the comment near the field_type.get_attr("_type_", vm) /
downcast_ref::<PyStr>() branch to explicitly state that `_type_` was validated
as ASCII at type creation for simple types so that the to_str().expect("_type_
is validated as ASCII at type creation") invariant is clearly scoped to this
code path (the simple-type `_type_` attribute handling in this block).

crates/vm/src/builtins/str.rs (1)

1363-1390: ⚠️ Potential issue | 🟠 Major

Do not swallow non-KeyError exceptions in translate.
Errors other than KeyError from table.__getitem__ should propagate; treating them as “missing mapping” can mask failures.

🐛 Suggested fix

-            match table.get_item(&*cp.to_u32().to_pyobject(vm), vm) {
+            match table.get_item(&*cp.to_u32().to_pyobject(vm), vm) {
                 Ok(value) => {
                     if let Some(text) = value.downcast_ref::<Self>() {
                         translated.push_wtf8(text.as_wtf8());
                     } else if let Some(bigint) = value.downcast_ref::<PyInt>() {
                         let mapped = bigint
                             .as_bigint()
                             .to_u32()
                             .and_then(CodePoint::from_u32)
                             .ok_or_else(|| {
                                 vm.new_value_error("character mapping must be in range(0x110000)")
                             })?;
                         translated.push(mapped);
                     } else if !vm.is_none(&value) {
                         return Err(
                             vm.new_type_error("character mapping must return integer, None or str")
                         );
                     }
                 }
-                _ => translated.push(cp),
+                Err(e) if e.fast_isinstance(vm.ctx.exceptions.key_error) => {
+                    translated.push(cp)
+                }
+                Err(e) => return Err(e),
             }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/str.rs` around lines 1363 - 1390, The translate method
currently treats any Err from table.get_item as a missing mapping; change the
Err arm to capture the exception (e.g., Err(err)) and only swallow it if it is a
KeyError, otherwise propagate it by returning Err(err). Use the VM helper to
test the exception type (e.g., vm.is_instance(&err,
&vm.ctx.exceptions.key_error) or equivalent) so non-KeyError exceptions from
table.__getitem__ are returned to the caller instead of being treated as "no
mapping".

🧹 Nitpick comments (9)

crates/stdlib/src/zlib.rs (1)

468-471: Optional: collapse the two-step into() chain by widening the parameter bound.

The intermediate String binding exists solely to bridge impl Into<String> → Wtf8Buf. Since every call site already passes &str or String — both of which implement Into<Wtf8Buf> after this PR's additions — the signature can be updated directly, eliminating the extra allocation on already-String inputs.
♻️ Proposed simplification
-    fn new_zlib_error(message: impl Into<String>, vm: &VirtualMachine) -> PyBaseExceptionRef {
-        let msg: String = message.into();
-        vm.new_exception_msg(vm.class("zlib", "error"), msg.into())
+    fn new_zlib_error(message: impl Into<Wtf8Buf>, vm: &VirtualMachine) -> PyBaseExceptionRef {
+        vm.new_exception_msg(vm.class("zlib", "error"), message.into())
     }
Wtf8Buf would also need to be brought into scope at the top of the inner zlib module:
use rustpython_vm::common::wtf8::Wtf8Buf; // adjust path as needed
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/zlib.rs` around lines 468 - 471, Change new_zlib_error to
accept message: impl Into<Wtf8Buf> (instead of impl Into<String>), bring Wtf8Buf
into scope in the inner zlib module, and pass message.into() directly to
vm.new_exception_msg to avoid the intermediate String allocation; update the
signature and the call site in fn new_zlib_error accordingly so the function
uses Wtf8Buf conversion once.

crates/vm/src/stdlib/time.rs (1)

103-104: Consider s.as_wtf8().as_bytes() for API consistency

s.as_bytes() is correct here (ASCII is identical under WTF-8 and UTF-8), but the rest of this PR consistently uses as_wtf8() to make the WTF-8 nature explicit. Using the same pattern here keeps the migration surface uniform.
🔧 Suggested change
-               && s.as_bytes() == b"negative duration"
+               && s.as_wtf8().as_bytes() == b"negative duration"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/time.rs` around lines 103 - 104, Replace the UTF-8 byte
conversion call to maintain consistency with the rest of the PR: in the
conditional that extracts the first argument string from e.args() (the closure
using arg.str(vm).ok()), change s.as_bytes() to s.as_wtf8().as_bytes() so the
WTF-8 representation is used consistently; keep the same comparison to
b"negative duration".

crates/vm/src/stdlib/sre.rs (2)

856-856: Prefer .expect() over .unwrap() to document the invariant.

get_slice(0, …) returns None only when regs[0] contains a negative span. Since regs[0] is always set from a usize in Match::new, the unwrap() is safe, but the invariant is silent. Using expect() documents it clearly.

♻️ Suggested change

-                let match_repr = zelf.get_slice(0, str_drive, vm).unwrap().repr(vm)?;
+                let match_repr = zelf
+                    .get_slice(0, str_drive, vm)
+                    .expect("group 0 always has a valid span")
+                    .repr(vm)?;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/sre.rs` at line 856, Replace the silent unwrap on
zelf.get_slice(0, str_drive, vm).unwrap() with expect() to document the
invariant: change to zelf.get_slice(0, str_drive, vm).expect("regs[0] is
non-negative by construction in Match::new").repr(vm)?; this makes it explicit
that get_slice can only be None for a negative span and that regs[0] is always
set from a usize in Match::new.

857-860: Wtf8Buf::from(format!(…)) allocates an intermediate String.

Since only ASCII literals and integers are used, the intermediate String can be avoided by pushing pieces directly—matching the style used elsewhere in this PR (e.g. mappingproxy.rs, template.rs).

♻️ Suggested change

-                let mut result = Wtf8Buf::from(format!(
-                    "<re.Match object; span=({}, {}), match=",
-                    zelf.regs[0].0, zelf.regs[0].1,
-                ));
+                let mut result = Wtf8Buf::from("<re.Match object; span=(");
+                result.push_str(&zelf.regs[0].0.to_string());
+                result.push_str(", ");
+                result.push_str(&zelf.regs[0].1.to_string());
+                result.push_str("), match=");

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/sre.rs` around lines 857 - 860, Replace the allocation
from Wtf8Buf::from(format!(...)) by building the buffer in pieces: create result
with Wtf8Buf::from("<re.Match object; span=(") and then append the integer
bounds and punctuation directly (use result.push_str(...) / result.push(...) or
the same push helpers used elsewhere in mappingproxy.rs/template.rs) for
zelf.regs[0].0, ',', ' ', zelf.regs[0].1, '), match=' to avoid the intermediate
String allocation; update the code around the Wtf8Buf::from(format!(…)) call and
the variable result accordingly.

crates/vm/src/exception_group.rs (2)

217-228: Minor: ArgIterable is heavier than a direct tuple downcast for a known-tuple slot.

The constructor guarantees arg(1) is always a PyTuple, yet slot_repr goes through ArgIterable::try_from_object, which dispatches Python's iterator protocol. A direct downcast_ref::<PyTuple>() (mirroring the pattern in __str__ at line 191 and in get_exceptions_tuple) avoids the extra allocation and dispatch:

♻️ Direct tuple downcast in slot_repr

-            if let Some(exceptions_obj) = zelf.get_arg(1) {
-                let iter: ArgIterable<PyObjectRef> =
-                    ArgIterable::try_from_object(vm, exceptions_obj.clone())?;
-                let mut first = true;
-                for exc in iter.iter(vm)? {
-                    if !first {
-                        result.push_str(", ");
-                    }
-                    first = false;
-                    result.push_wtf8(exc?.repr(vm)?.as_wtf8());
-                }
-            }
+            if let Some(exceptions_obj) = zelf.get_arg(1) {
+                if let Some(tuple) = exceptions_obj.downcast_ref::<PyTuple>() {
+                    let mut first = true;
+                    for exc in tuple.iter() {
+                        if !first {
+                            result.push_str(", ");
+                        }
+                        first = false;
+                        result.push_wtf8(exc.repr(vm)?.as_wtf8());
+                    }
+                }
+            }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/exception_group.rs` around lines 217 - 228, In slot_repr, avoid
using ArgIterable::try_from_object for the known-tuple arg(1); instead downcast
the argument to a PyTuple directly (use zelf.get_arg(1) and
downcast_ref::<PyTuple>()) and iterate its elements, mirroring the pattern used
in __str__ and get_exceptions_tuple to eliminate iterator protocol dispatch and
extra allocation; update the code paths around zelf, get_arg, and the for-loop
that builds result so it uses the PyTuple iteration and repr calls on each
element.

196-201: write! + .unwrap() is idiomatic but note the implicit assumption.

Wtf8Buf::write_str must be infallible (as is String's) for the two .unwrap() calls here and on line 213 to never panic. If the fmt::Write impl for Wtf8Buf can ever return Err (e.g., due to an encoding edge case when non-UTF-8 format arguments are passed), these become silent panics. Consider using .expect("Wtf8Buf write is infallible") to make the assumption explicit, or verify the impl is truly infallible.

💡 Explicit infallibility annotation

-            write!(result, " ({num_excs} sub-exception{suffix})").unwrap();
+            write!(result, " ({num_excs} sub-exception{suffix})").expect("Wtf8Buf write is infallible");

And similarly on line 213:

-            write!(result, "{class_name}(").unwrap();
+            write!(result, "{class_name}(").expect("Wtf8Buf write is infallible");

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/exception_group.rs` around lines 196 - 201, The current use of
write!(result, ...) followed by .unwrap() assumes Wtf8Buf::write_str is
infallible; replace those .unwrap() calls with .expect("Wtf8Buf write is
infallible") (or otherwise assert/handle the error) for the write! invocation
that appends " ({num_excs} sub-exception{suffix})" and the similar write! call
later (the other occurrence around line 213) so the assumption is explicit;
locate the write! usages operating on the Wtf8Buf `result` and change their
error handling to .expect(...) or add an explicit check that the fmt::Write impl
cannot return Err.

crates/vm/src/builtins/bool.rs (1)

104-109: Return type String is inconsistent with peer __format__ implementations.

int.__format__, float.__format__, and complex.__format__ all return PyResult<Wtf8Buf>, but bool.__format__ returns PyResult<String>. While functionally correct (bool formatting only produces ASCII), aligning the return type would be more consistent.

♻️ Optional: align return type with siblings

-    fn __format__(obj: PyObjectRef, spec: PyUtf8StrRef, vm: &VirtualMachine) -> PyResult<String> {
+    fn __format__(obj: PyObjectRef, spec: PyUtf8StrRef, vm: &VirtualMachine) -> PyResult<Wtf8Buf> {
         let new_bool = obj.try_to_bool(vm)?;
         FormatSpec::parse(spec.as_str())
             .and_then(|format_spec| format_spec.format_bool(new_bool))
+            .map(Wtf8Buf::from_string)
             .map_err(|err| err.into_pyexception(vm))
     }

This would also require adding use rustpython_common::wtf8::Wtf8Buf; to the imports.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/bool.rs` around lines 104 - 109, The __format__
implementation for bool returns PyResult<String> which is inconsistent with
int/float/complex that return PyResult<Wtf8Buf>; change bool::__format__
signature to return PyResult<Wtf8Buf>, convert the result of
FormatSpec::parse(...).and_then(|fs| fs.format_bool(new_bool)) into a Wtf8Buf
(e.g. via Wtf8Buf::from_str or equivalent) and propagate errors the same way,
and add use rustpython_common::wtf8::Wtf8Buf; to the imports so the function
matches its peers.

crates/vm/src/import.rs (1)

528-534: Downcasting __package__ to PyUtf8StrRef rejects surrogate-containing package names.

Previously this would have accepted any PyStr (and potentially panicked on as_str()). Now it returns TypeError("package must be a string") for surrogate-containing strings. The error message is slightly misleading — the value is a string, it's just not a valid UTF-8 string. That said, surrogate package names are pathological and this aligns with CPython behavior where import resolution operates on str values.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/import.rs` around lines 528 - 534, The current downcast uses
PyUtf8StrRef which rejects surrogate-containing Python strings; change the
downcast to PyStrRef (i.e., downcast::<PyStrRef>()) so genuine PyStr values
(including surrogates) are accepted, and update any subsequent usage of pkg_str
to work with PyStrRef (use its APIs rather than UTF-8-only methods); keep the
TypeError for non-string types but do not treat surrogate-containing strings as
non-strings.

crates/vm/src/builtins/type.rs (1)

1564-1567: Avoid expect on slot names to keep surrogate paths panic‑free.

Using expect can still panic if upstream validation changes or if a surrogate slips through. Prefer converting to a typed error here.

♻️ Suggested fix

-                let mangled_name = mangle_name(
-                    &class_name,
-                    member.to_str().expect("slot name is always valid UTF-8"),
-                );
+                let member_name = member.to_str().ok_or_else(|| {
+                    vm.new_type_error("__slots__ must be valid UTF-8 identifiers".to_owned())
+                })?;
+                let mangled_name = mangle_name(&class_name, member_name);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/type.rs` around lines 1564 - 1567, The code calls
member.to_str().expect(...) before passing the string to mangle_name, which can
panic; replace the expect with a safe conversion that returns/propagates a typed
error instead (e.g., use member.to_str().ok_or_else(||
Error::SlotNameEncoding(member.to_os_string()))? or map to an existing error
variant), then pass the &str to mangle_name as before; update the surrounding
function (where mangle_name is invoked) to return a Result so the new error can
be propagated rather than panicking.

ℹ️ Review info

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e77e62f and 3c9a023.

📒 Files selected for processing (120)

crates/common/src/str.rs
crates/stdlib/src/_asyncio.rs
crates/stdlib/src/_sqlite3.rs
crates/stdlib/src/array.rs
crates/stdlib/src/binascii.rs
crates/stdlib/src/contextvars.rs
crates/stdlib/src/csv.rs
crates/stdlib/src/grp.rs
crates/stdlib/src/hashlib.rs
crates/stdlib/src/json.rs
crates/stdlib/src/locale.rs
crates/stdlib/src/lzma.rs
crates/stdlib/src/multiprocessing.rs
crates/stdlib/src/overlapped.rs
crates/stdlib/src/posixshmem.rs
crates/stdlib/src/pyexpat.rs
crates/stdlib/src/pystruct.rs
crates/stdlib/src/socket.rs
crates/stdlib/src/ssl.rs
crates/stdlib/src/unicodedata.rs
crates/stdlib/src/zlib.rs
crates/vm/src/buffer.rs
crates/vm/src/builtins/bool.rs
crates/vm/src/builtins/classmethod.rs
crates/vm/src/builtins/code.rs
crates/vm/src/builtins/complex.rs
crates/vm/src/builtins/dict.rs
crates/vm/src/builtins/float.rs
crates/vm/src/builtins/function.rs
crates/vm/src/builtins/function/jit.rs
crates/vm/src/builtins/genericalias.rs
crates/vm/src/builtins/int.rs
crates/vm/src/builtins/interpolation.rs
crates/vm/src/builtins/list.rs
crates/vm/src/builtins/mappingproxy.rs
crates/vm/src/builtins/memory.rs
crates/vm/src/builtins/mod.rs
crates/vm/src/builtins/module.rs
crates/vm/src/builtins/namespace.rs
crates/vm/src/builtins/object.rs
crates/vm/src/builtins/range.rs
crates/vm/src/builtins/set.rs
crates/vm/src/builtins/slice.rs
crates/vm/src/builtins/staticmethod.rs
crates/vm/src/builtins/str.rs
crates/vm/src/builtins/template.rs
crates/vm/src/builtins/tuple.rs
crates/vm/src/builtins/type.rs
crates/vm/src/builtins/weakproxy.rs
crates/vm/src/bytes_inner.rs
crates/vm/src/codecs.rs
crates/vm/src/coroutine.rs
crates/vm/src/dict_inner.rs
crates/vm/src/exception_group.rs
crates/vm/src/exceptions.rs
crates/vm/src/frame.rs
crates/vm/src/function/buffer.rs
crates/vm/src/function/fspath.rs
crates/vm/src/function/mod.rs
crates/vm/src/import.rs
crates/vm/src/intern.rs
crates/vm/src/macros.rs
crates/vm/src/protocol/object.rs
crates/vm/src/stdlib/_abc.rs
crates/vm/src/stdlib/_winapi.rs
crates/vm/src/stdlib/_wmi.rs
crates/vm/src/stdlib/ast.rs
crates/vm/src/stdlib/ast/basic.rs
crates/vm/src/stdlib/ast/python.rs
crates/vm/src/stdlib/ast/repr.rs
crates/vm/src/stdlib/builtins.rs
crates/vm/src/stdlib/codecs.rs
crates/vm/src/stdlib/collections.rs
crates/vm/src/stdlib/ctypes.rs
crates/vm/src/stdlib/ctypes/array.rs
crates/vm/src/stdlib/ctypes/base.rs
crates/vm/src/stdlib/ctypes/function.rs
crates/vm/src/stdlib/ctypes/pointer.rs
crates/vm/src/stdlib/ctypes/simple.rs
crates/vm/src/stdlib/ctypes/structure.rs
crates/vm/src/stdlib/ctypes/union.rs
crates/vm/src/stdlib/functools.rs
crates/vm/src/stdlib/imp.rs
crates/vm/src/stdlib/io.rs
crates/vm/src/stdlib/itertools.rs
crates/vm/src/stdlib/marshal.rs
crates/vm/src/stdlib/msvcrt.rs
crates/vm/src/stdlib/nt.rs
crates/vm/src/stdlib/operator.rs
crates/vm/src/stdlib/os.rs
crates/vm/src/stdlib/posix.rs
crates/vm/src/stdlib/signal.rs
crates/vm/src/stdlib/sre.rs
crates/vm/src/stdlib/symtable.rs
crates/vm/src/stdlib/sys.rs
crates/vm/src/stdlib/thread.rs
crates/vm/src/stdlib/time.rs
crates/vm/src/stdlib/typevar.rs
crates/vm/src/stdlib/typing.rs
crates/vm/src/stdlib/winreg.rs
crates/vm/src/stdlib/winsound.rs
crates/vm/src/suggestion.rs
crates/vm/src/types/slot.rs
crates/vm/src/types/structseq.rs
crates/vm/src/utils.rs
crates/vm/src/vm/context.rs
crates/vm/src/vm/interpreter.rs
crates/vm/src/vm/mod.rs
crates/vm/src/vm/vm_new.rs
crates/vm/src/warn.rs
crates/wasm/src/browser_module.rs
crates/wasm/src/convert.rs
crates/wasm/src/js_module.rs
crates/wasm/src/wasm_builtins.rs
crates/wtf8/src/concat.rs
crates/wtf8/src/lib.rs
example_projects/wasm32_without_js/rustpython-without-js/src/lib.rs
src/lib.rs
src/shell.rs
src/shell/helper.rs

✅ Files skipped from review due to trivial changes (2)

crates/stdlib/src/binascii.rs
crates/vm/src/stdlib/_abc.rs

🚧 Files skipped from review as they are similar to previous changes (41)

crates/stdlib/src/multiprocessing.rs
crates/vm/src/builtins/staticmethod.rs
src/shell.rs
crates/vm/src/function/mod.rs
crates/vm/src/function/fspath.rs
crates/vm/src/stdlib/symtable.rs
crates/vm/src/protocol/object.rs
crates/vm/src/builtins/tuple.rs
crates/vm/src/builtins/function/jit.rs
crates/vm/src/builtins/genericalias.rs
crates/stdlib/src/unicodedata.rs
crates/vm/src/vm/interpreter.rs
crates/stdlib/src/posixshmem.rs
crates/vm/src/stdlib/winreg.rs
crates/wasm/src/wasm_builtins.rs
crates/vm/src/stdlib/signal.rs
crates/stdlib/src/pyexpat.rs
crates/vm/src/stdlib/ast/basic.rs
crates/vm/src/builtins/template.rs
crates/stdlib/src/array.rs
crates/vm/src/stdlib/ast/repr.rs
crates/vm/src/stdlib/ctypes/pointer.rs
crates/stdlib/src/lzma.rs
src/lib.rs
crates/vm/src/builtins/code.rs
crates/wasm/src/convert.rs
crates/vm/src/buffer.rs
crates/vm/src/builtins/complex.rs
example_projects/wasm32_without_js/rustpython-without-js/src/lib.rs
crates/vm/src/coroutine.rs
crates/vm/src/stdlib/collections.rs
crates/vm/src/stdlib/ctypes.rs
crates/vm/src/utils.rs
crates/vm/src/builtins/slice.rs
crates/stdlib/src/_sqlite3.rs
crates/vm/src/stdlib/typevar.rs
crates/stdlib/src/overlapped.rs
crates/vm/src/function/buffer.rs
crates/vm/src/vm/context.rs
crates/vm/src/builtins/memory.rs
src/shell/helper.rs

crates/vm/src/stdlib/codecs.rs

crates/vm/src/stdlib/functools.rs

crates/vm/src/stdlib/nt.rs

crates/vm/src/stdlib/os.rs

crates/wasm/src/js_module.rs

coderabbitai

Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)

crates/vm/src/stdlib/ctypes/union.rs (1)

620-638: ⚠️ Potential issue | 🟡 Minor

Avoid silently skipping invalid field names in positional initialization.

At Line 622/623, a non-PyUtf8Str field name is ignored and the loop continues, which can surface later as a misleading "too many initializers" error. This should fail immediately with a direct type error (consistent with process_fields).

💡 Suggested fix

-                if let Some(tuple) = field.downcast_ref::<PyTuple>()
-                    && let Some(name) = tuple.first()
-                    && let Some(name_str) = name.downcast_ref::<PyUtf8Str>()
-                {
-                    let field_name = name_str.as_str().to_owned();
+                if let Some(tuple) = field.downcast_ref::<PyTuple>()
+                    && let Some(name) = tuple.first()
+                {
+                    let name_str = name
+                        .downcast_ref::<PyUtf8Str>()
+                        .ok_or_else(|| vm.new_type_error("field name must be a string"))?;
+                    let field_name = name_str.as_str().to_owned();
                     // Check for duplicate in kwargs
                     if kwargs.contains_key(&field_name) {
                         return Err(vm.new_type_error(format!(
                             "duplicate values for field {:?}",
                             field_name
                         )));
                     }
                     self_obj.as_object().set_attr(
                         vm.ctx.intern_str(field_name),
                         args[current_index].clone(),
                         vm,
                     )?;
                     current_index += 1;
                 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/ctypes/union.rs` around lines 620 - 638, The
positional-initializer loop in union.rs silently skips field names that are not
PyUtf8Str, causing later misleading "too many initializers" errors; update the
block that matches if let Some(tuple) = field.downcast_ref::<PyTuple>() ... to
instead return an immediate type error when the tuple first element is not a
PyUtf8Str (use vm.new_type_error with a clear message consistent with
process_fields) so invalid field names fail fast; reference the
tuple/name/name_str checks and the kwargs/current_index handling when making the
change.

crates/vm/src/stdlib/posix.rs (1)

1974-1983: ⚠️ Potential issue | 🔴 Critical

Use try_downcast() instead of downcast() to properly propagate UTF-8 validation errors for surrogate-containing strings.

At lines 1974 and 2369, the code uses downcast::<PyUtf8Str>().map_err(...) which converts validation failures to TypeError. However, when a string contains surrogates, Python raises UnicodeEncodeError with the message "surrogates not allowed", not TypeError.

The current approach silently swallows the proper validation error. Use try_downcast::<PyUtf8Str>(vm) instead, which calls try_downcast_from() → ensure_valid_utf8() and generates the correct UnicodeEncodeError for surrogate-containing strings.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/posix.rs` around lines 1974 - 1983, Replace uses of
downcast::<PyUtf8Str>().map_err(...) with try_downcast::<PyUtf8Str>(vm) so UTF-8
validation errors (like surrogate-containing strings) propagate as
UnicodeEncodeError instead of being converted to TypeError; specifically, in the
code that currently calls obj.downcast::<PyUtf8Str>().map_err(|_|
vm.new_type_error(...)) (the block that parses into PathconfVar and any similar
occurrence), call obj.try_downcast::<PyUtf8Str>(vm) and handle its Result the
same way so ensure_valid_utf8() runs and the VM produces the correct error type.

crates/vm/src/builtins/type.rs (1)

1970-1980: ⚠️ Potential issue | 🟡 Minor

type.__repr__ can drop custom non-UTF-8 __qualname__.

Line 1977 narrows to PyUtf8Str and falls back to name, so a valid PyStr __qualname__ with surrogate data is silently ignored in repr output.

💡 Suggested fix

-                format!(
-                    "<class '{}.{}'>",
-                    module,
-                    zelf.__qualname__(vm)
-                        .downcast_ref::<PyUtf8Str>()
-                        .map(|n| n.as_str())
-                        .unwrap_or_else(|| &name)
-                )
+                let qualname = zelf
+                    .__qualname__(vm)
+                    .downcast_ref::<PyStr>()
+                    .map(|n| n.as_wtf8().to_string())
+                    .unwrap_or_else(|| name.to_string());
+                format!("<class '{}.{}'>", module, qualname)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/type.rs` around lines 1970 - 1980, The repr builder
for type.__repr__ currently only accepts PyUtf8Str for __qualname__ and falls
back to name, which drops valid non-UTF-8 PyStr values; update the match that
calls zelf.__qualname__(vm) to attempt downcast to a generic PyStr (not only
PyUtf8Str) and use a loss-tolerant conversion (e.g., a to_string_lossy or
equivalent on PyStr) to produce the qualname text so surrogate/non-UTF-8 data is
preserved in the formatted "<class '{}.{}'>" output.

crates/vm/src/stdlib/ctypes/base.rs (1)

2535-2542: ⚠️ Potential issue | 🟠 Major

Avoid rejecting non-UTF-8 str field names in _fields_ processing.

Line 2535 downcasts to PyUtf8Str, and Line 2541 uses fname.as_str(). That rejects valid Python str values containing surrogates, which is a regression in this WTF-8 migration path. Use PyStr + as_wtf8() here.

🐛 Proposed fix

-        let fname = field_tuple
+        let fname = field_tuple
             .first()
             .expect("len checked")
-            .downcast_ref::<PyUtf8Str>()
+            .downcast_ref::<PyStr>()
             .ok_or_else(|| vm.new_type_error("field name must be a string"))?;
@@
         let fdescr_obj = descr
             .proto
             .as_object()
-            .get_attr(vm.ctx.intern_str(fname.as_str()), vm)?;
+            .get_attr(vm.ctx.intern_str(fname.as_wtf8()), vm)?;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/ctypes/base.rs` around lines 2535 - 2542, The code
currently downcasts field names to PyUtf8Str and calls fname.as_str(), rejecting
valid Python str values with surrogates; change the downcast to PyStr and use
PyStr::as_wtf8() when constructing the attribute key passed into
descr.proto.as_object().get_attr(...) and in any error/context messages (replace
references to fname.as_str() with fname.as_wtf8() usage) so WTF-8 str values are
accepted while keeping the same vm.new_type_error when the object is not a str.

♻️ Duplicate comments (13)

src/shell.rs (1)

145-146: ⚠️ Potential issue | 🔴 Critical

Avoid panic in REPL prompt extraction (Line 145).

expect_str() can panic on surrogate-containing sys.ps1/sys.ps2, which reintroduces the failure mode this PR is meant to remove.

Suggested fix

-        let prompt = match prompt {
-            Ok(ref s) => s.expect_str(),
-            Err(_) => "",
-        };
+        let prompt = match prompt {
+            Ok(ref s) => s.to_str().unwrap_or(""),
+            Err(_) => "",
+        };

#!/bin/bash
# Read-only verification: confirm API behavior and current callsite.
rg -n "fn\\s+expect_str\\s*\\(" --type rust -A4 -B2
rg -n "fn\\s+to_str\\s*\\(" --type rust -A4 -B2
rg -n "expect_str\\(\\)" --type rust src/shell.rs -C2

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/shell.rs` around lines 145 - 146, The code calls expect_str() which can
panic on surrogate-containing prompts; replace that call with a non-panicking
conversion (e.g., call to_str() and handle the Result/Option) so failures return
an empty prompt instead of panicking; specifically update the use of
expect_str() in the REPL prompt extraction to something like
to_str().unwrap_or("") or match on to_str() / to_string_lossy() and return "" on
error, ensuring no panics from expect_str().

crates/vm/src/stdlib/msvcrt.rs (1)

83-91: ⚠️ Potential issue | 🟠 Major

expect_str() still panics on surrogate input — the PR's goal is unmet here.

Replacing as_str() with expect_str() doesn't fix the panic: both methods blow up on PyStr values that contain surrogates (valid Python strings on Windows). putwch / ungetwch are Windows wide-character APIs that are supposed to accept surrogate code units, so the safe path is to walk the WTF-8 code points directly.

🔧 Surrogate-safe fix for putwch (line 83-91) and ungetwch (lines 108-120)

 fn putwch(s: PyStrRef, vm: &VirtualMachine) -> PyResult<()> {
     let c = s
-        .expect_str()
-        .chars()
+        .as_wtf8()
+        .code_points()
         .exactly_one()
         .map_err(|_| vm.new_type_error("putch() argument must be a string of length 1"))?;
-    unsafe { suppress_iph!(_putwch(c as u16)) };
+    unsafe { suppress_iph!(_putwch(c.to_u32() as u16)) };
     Ok(())
 }

 fn ungetwch(s: PyStrRef, vm: &VirtualMachine) -> PyResult<()> {
     let c =
-        s.expect_str().chars().exactly_one().map_err(|_| {
+        s.as_wtf8().code_points().exactly_one().map_err(|_| {
             vm.new_type_error("ungetwch() argument must be a string of length 1")
         })?;
-    let ret = unsafe { suppress_iph!(_ungetwch(c as u32)) };
+    let ret = unsafe { suppress_iph!(_ungetwch(c.to_u32())) };

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/msvcrt.rs` around lines 83 - 91, The current putwch and
ungetwch use expect_str() which panics on surrogate-containing PyStr; instead
iterate the string's WTF-8 code units so surrogate code units are preserved: for
both putwch and ungetwch, replace the expect_str().chars().exactly_one() path
with a WTF-8–aware extraction from the PyStrRef (walk the underlying code
units/encoded units, ensure there is exactly one 16-bit code unit, map that unit
to u16 and call _putwch/_ungetwch), and keep the same vm.new_type_error when the
argument is not exactly one code unit; ensure you reference the functions putwch
and ungetwch and remove any use of expect_str() that assumes valid UTF-8.

crates/vm/src/stdlib/nt.rs (3)

1189-1192: ⚠️ Potential issue | 🔴 Critical

expect_str() still panics on surrogate-containing strings — same defect as as_str()

The change replaced as_str() with expect_str(), but expect_str() also panics when the string contains surrogates. Because argv comes from user-supplied Python objects via extract_elements_with, untrusted data can still trigger an unconditional panic — directly contradicting this PR's objective.

Use to_str() with proper Python-level error propagation:

🐛 Proposed fix

         let argv = vm.extract_elements_with(argv.as_ref(), |obj| {
             let arg = PyStrRef::try_from_object(vm, obj)?;
-            make_widestring(arg.expect_str())
+            make_widestring(arg.to_str().ok_or_else(|| vm.new_unicode_encode_error("surrogates not allowed"))?)
         })?;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/nt.rs` around lines 1189 - 1192, The closure passed to
vm.extract_elements_with currently calls PyStrRef::try_from_object and then
expect_str(), which still panics on surrogate-containing input; change the
closure to call arg.to_str() (or equivalent non-panicking conversion) and
propagate any Unicode decoding error out as a Python error (so the ? from the
closure returns a Python-level error) before passing the resulting &str into
make_widestring; update the closure around PyStrRef::try_from_object(vm, obj)?
to use arg.to_str() and return a proper VM/Python error instead of letting
expect_str() panic.

1230-1233: ⚠️ Potential issue | 🔴 Critical

Same expect_str() panic risk in execve argv extraction

Identical issue as in execv: arg.expect_str() on user-provided argv elements panics on surrogates.

🐛 Proposed fix

         let argv = vm.extract_elements_with(argv.as_ref(), |obj| {
             let arg = PyStrRef::try_from_object(vm, obj)?;
-            make_widestring(arg.expect_str())
+            make_widestring(arg.to_str().ok_or_else(|| vm.new_unicode_encode_error("surrogates not allowed"))?)
         })?;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/nt.rs` around lines 1230 - 1233, The argv extraction in
execve uses arg.expect_str() which can panic on surrogate-containing user
strings; change the closure passed to vm.extract_elements_with in execve to use
the safe string accessor (e.g. arg.as_str()) instead of expect_str(), so call
make_widestring(arg.as_str()) after obtaining arg via
PyStrRef::try_from_object(vm, obj) to avoid panics.

1253-1256: ⚠️ Potential issue | 🔴 Critical

expect_str() panic on surrogate env keys/values in execve

key.expect_str() and value.expect_str() at lines 1255–1256 are called on values extracted from a user-supplied ArgMapping. Surrogate-containing env strings will panic here instead of raising a Python UnicodeEncodeError.

🐛 Proposed fix

         let key = PyStrRef::try_from_object(vm, key)?;
         let value = PyStrRef::try_from_object(vm, value)?;
-        let key_str = key.expect_str();
-        let value_str = value.expect_str();
+        let key_str = key.to_str().ok_or_else(|| vm.new_unicode_encode_error("surrogates not allowed"))?;
+        let value_str = value.to_str().ok_or_else(|| vm.new_unicode_encode_error("surrogates not allowed"))?;

crates/vm/src/stdlib/_winapi.rs (1)

299-301: ⚠️ Potential issue | 🔴 Critical

expect_str() reintroduces panic paths on surrogate-containing str inputs

Line 299, Line 399, and Lines 421/423 still use expect_str(), which can panic and defeats the PR goal of removing panic-based string extraction. Please switch these callsites to fallible conversion (to_str() + error propagation) or a WTF-8-safe path where needed.

Proposed fix

@@
-        let wstr = |s: PyStrRef| {
-            let ws = widestring::WideCString::from_str(s.expect_str())
+        let wstr = |s: PyStrRef| {
+            let s = s.to_str().map_err(|e| e.to_pyexception(vm))?;
+            let ws = widestring::WideCString::from_str(s)
                 .map_err(|err| err.to_pyexception(vm))?;
             Ok(ws.into_vec_with_nul())
         };
@@
-        let src_path = std::path::Path::new(src_path.expect_str());
-        let dest_path = std::path::Path::new(dest_path.expect_str());
+        let src_path = std::path::Path::new(src_path.to_str().map_err(|e| e.to_pyexception(vm))?);
+        let dest_path =
+            std::path::Path::new(dest_path.to_str().map_err(|e| e.to_pyexception(vm))?);
@@
-            let k = k.expect_str();
+            let k = k.to_str().map_err(|e| e.to_pyexception(vm))?;
             let v = PyStrRef::try_from_object(vm, v)?;
-            let v = v.expect_str();
+            let v = v.to_str().map_err(|e| e.to_pyexception(vm))?;

#!/bin/bash
# Verify panic-prone string extraction remains in the touched file.
rg -nP 'expect_str\s*\(' crates/vm/src/stdlib/_winapi.rs -C2

As per coding guidelines: "Follow Rust best practices for error handling and memory management".

Also applies to: 399-400, 421-423

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/_winapi.rs` around lines 299 - 301, The uses of
s.expect_str() (notably where WideCString::from_str(s.expect_str()) is called
before .map_err(|err| err.to_pyexception(vm))? and similar callsites around the
other mentions) reintroduce panic paths; replace each expect_str() with a
fallible extraction: call s.to_str().map_err(|e| /* convert to a VM error via
e.to_pyexception(vm) or a descriptive error */ )? and then pass the &str into
widestring::WideCString::from_str, or if the input may contain WTF-8/surrogates
use a WTF-8-safe conversion path (e.g., obtain the raw bytes and use a WTF-8
aware API) and propagate the error via to_pyexception(vm); ensure the final
Ok(ws.into_vec_with_nul()) remains unchanged and all conversions return Result
to avoid panics in functions like the WideCString conversion sites referenced.

crates/vm/src/types/slot.rs (1)

2077-2077: format!("… {}", repr.as_wtf8()) has the same Wtf8: Display concern as line 237 in structseq.rs.

Since this is inside #[cold] unreachable!, it has no practical impact on correctness, but it is worth tracking alongside the broader question of how Wtf8 formats.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/types/slot.rs` at line 2077, The unreachable! invocation uses
format with repr.as_wtf8() which relies on Wtf8::Display; change the message to
avoid requiring Display (e.g., use debug formatting) by replacing
unreachable!("slot must be overridden for {}", repr.as_wtf8()) with something
like unreachable!("slot must be overridden for {:?}", repr.as_wtf8()), so the
code in slot.rs no longer depends on Wtf8::Display.

crates/vm/src/stdlib/functools.rs (1)

428-430: ⚠️ Potential issue | 🟠 Major

Avoid expect_str() when normalizing stored keyword keys.

Line [430] still uses expect_str(), which can panic for surrogate-containing str keys. Use to_str() and raise TypeError instead so partial calls fail safely.

🛠️ Proposed fix

             for (key, value) in &*keywords {
                 let key_str = key
                     .downcast_ref::<crate::builtins::PyStr>()
                     .ok_or_else(|| vm.new_type_error("keywords must be strings"))?;
-                final_kwargs.insert(key_str.expect_str().to_owned(), value);
+                let key = key_str
+                    .to_str()
+                    .ok_or_else(|| vm.new_type_error("keywords must be strings"))?
+                    .to_owned();
+                final_kwargs.insert(key, value);
             }

#!/bin/bash
# Verify panic-prone API usage and current call site.
rg -n "pub fn expect_str|pub fn to_str" crates/vm/src/builtins/str.rs -A8 -B4
rg -n "final_kwargs.insert|downcast_ref::<crate::builtins::PyStr>|expect_str" crates/vm/src/stdlib/functools.rs -A8 -B4

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/functools.rs` around lines 428 - 430, The code
normalizes keyword keys using expect_str(), which can panic on
surrogate-containing strings; update the normalization in the block that calls
downcast_ref::<crate::builtins::PyStr>() and final_kwargs.insert(...) to call
to_str() instead, check the Result/Option and on failure return
vm.new_type_error("keywords must be strings") so the partial/keyword handling
fails with a TypeError rather than panicking; ensure you replace
expect_str().to_owned() with a safe to_str() usage and propagate the error into
the existing vm.new_type_error path.

crates/vm/src/stdlib/os.rs (1)

455-456: ⚠️ Potential issue | 🟠 Major

Replace panic-prone expect_str() in Windows env APIs.

Line [455], Line [456], and Line [503] use expect_str(), which can panic on surrogate-containing str values and bypass PyResult error handling. Please switch these to to_str() and raise Python errors instead of panicking.

🛠️ Proposed fix

 fn putenv(key: PyStrRef, value: PyStrRef, vm: &VirtualMachine) -> PyResult<()> {
-        let key_str = key.expect_str();
-        let value_str = value.expect_str();
+        let key_str = key
+            .to_str()
+            .ok_or_else(|| vm.new_value_error("illegal environment variable name"))?;
+        let value_str = value
+            .to_str()
+            .ok_or_else(|| vm.new_value_error("illegal environment variable value"))?;
@@
 fn unsetenv(key: PyStrRef, vm: &VirtualMachine) -> PyResult<()> {
-        let key_str = key.expect_str();
+        let key_str = key
+            .to_str()
+            .ok_or_else(|| vm.new_value_error("illegal environment variable name"))?;

#!/bin/bash
# Verify panic-vs-result string APIs and current call sites.
rg -n "pub fn expect_str|pub fn to_str" crates/vm/src/builtins/str.rs -A8 -B4
rg -n "fn putenv|fn unsetenv|expect_str" crates/vm/src/stdlib/os.rs -A10 -B4

Also applies to: 503-503

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/os.rs` around lines 455 - 456, Replace the panic-prone
expect_str() calls in the Windows env API handlers (the places assigning key_str
and value_str and the other site at line ~503) with to_str() and propagate a
proper Python error instead of panicking; specifically, in the functions
handling putenv/unsetenv (and any code that currently calls expect_str()), call
key.to_str() and value.to_str(), map the Err case to an appropriate PyResult
error via the VM (e.g., return Err(vm.new_type_error(...) or
vm.new_value_error(...)) with a clear message about invalid string contents) and
proceed with the unwrapped &str on Ok. Ensure all expect_str() occurrences in
this module are replaced and error paths return a PyResult rather than panic.

crates/vm/src/vm/mod.rs (1)

639-645: ⚠️ Potential issue | 🟡 Minor

Add a separator after the unraisable message prefix.

Line 640 builds "Exception ignored {msg}" without a separator before object repr, so output gets concatenated.

✏️ Suggested fix

-        let msg_str = if let Some(msg) = msg {
-            format!("Exception ignored {}", msg)
+        let msg_str = if let Some(msg) = msg {
+            format!("Exception ignored {msg}: ")
         } else {
             "Exception ignored in: ".to_owned()
         };

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/vm/mod.rs` around lines 639 - 645, The constructed unraisable
message in vm::mod.rs builds msg_str without a separator when Some(msg) is
present, resulting in concatenated output; update the msg_str construction (the
branch that sets msg_str when msg is Some) to include a separator (for example a
colon and space) between the prefix and the object repr so the string passed to
write_to_stderr(&msg_str, &stderr, self) reads like "Exception ignored: {msg}"
instead of "Exception ignored {msg}".

crates/vm/src/stdlib/builtins.rs (1)

572-575: ⚠️ Potential issue | 🟠 Major

Avoid expect_str() in the rustyline prompt path.

Line 573 can still panic when prompt contains surrogate data. Gate rustyline on UTF-8 conversion and fall back to standard I/O when conversion fails.

🛠️ Suggested fix

-        if use_rustyline {
-            let prompt = prompt.as_ref().map_or("", |s| s.expect_str());
+        let prompt_utf8 = prompt.as_ref().and_then(|s| s.try_into_utf8(vm).ok());
+        if use_rustyline && (prompt.as_ref().is_none() || prompt_utf8.is_some()) {
+            let prompt = prompt_utf8.as_ref().map_or("", |s| s.as_str());
             let mut readline = Readline::new(());
             match readline.readline(prompt) {

As per coding guidelines: "Follow Rust best practices for error handling and memory management".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/builtins.rs` around lines 572 - 575, The rustyline
branch currently calls prompt.as_ref().map_or("", |s| s.expect_str()) which can
panic on non-UTF-8/surrogate data; instead try to convert the prompt to a UTF-8
&str and only use rustyline when conversion succeeds—e.g., match
prompt.as_ref().and_then(|s| s.to_str().ok()) or equivalent—and if conversion
fails, disable/use_rustyline = false (or fall back to the standard std::io
prompt path) before constructing Readline::new() and calling readline.readline;
update the logic around use_rustyline, prompt, Readline::new, and
readline.readline to avoid any expect_str() usage and safely handle non-UTF-8
prompts.

crates/vm/src/builtins/interpolation.rs (1)

210-225: ⚠️ Potential issue | 🔴 Critical

push_str(&expr_repr) and push_str(&spec_repr) still panic on WTF-8 surrogates.

Lines 213 and 221 use push_str with &expr_repr/&spec_repr (both PyRef<PyStr>). This resolves via AsRef<str> on PyRef<PyStr>, which calls to_str().expect("str has surrogates") and panics if the expression or format_spec contains surrogates. All other repr parts (value_repr, conversion) already use the surrogate-safe push_wtf8. These two need the same treatment to be consistent with the PR's intent and to avoid panics.
🐛 Proposed fix
-        result.push_str(&expr_repr);
+        result.push_wtf8(expr_repr.as_wtf8());
         result.push_str(", ");
         if vm.is_none(&zelf.conversion) {
             result.push_str("None");
         } else {
             result.push_wtf8(zelf.conversion.repr(vm)?.as_wtf8());
         }
         result.push_str(", ");
-        result.push_str(&spec_repr);
+        result.push_wtf8(spec_repr.as_wtf8());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/interpolation.rs` around lines 210 - 225, The repr
construction in interpolation.rs uses push_str(&expr_repr) and
push_str(&spec_repr) which call AsRef<str> on PyRef<PyStr> and will panic on
WTF-8 surrogates; change those calls to use Wtf8Buf::push_wtf8 so they accept
surrogate-containing strings just like value_repr and conversion do—replace
push_str(&expr_repr) and push_str(&spec_repr) with
result.push_wtf8(expr_repr.as_wtf8()) and result.push_wtf8(spec_repr.as_wtf8())
(referencing the Wtf8Buf type and push_wtf8 method and the PyRef<PyStr> values
expr_repr/spec_repr).

crates/vm/src/stdlib/codecs.rs (1)

52-75: Factor out shared CodeArgs normalization in encode/decode.

Line 52 through Line 75 duplicate the same destructuring and encoding-default logic; only the final registry call differs.

As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/codecs.rs` around lines 52 - 75, Extract the repeated
CodeArgs normalization into a small helper (e.g., a private fn
normalize_code_args(args: CodeArgs) -> (PyObjectRef, &str, Option<String>) or
similar) that performs the destructuring and resolves encoding via
codecs::DEFAULT_ENCODING, then call vm.state.codec_registry.encode(...) and
vm.state.codec_registry.decode(...) with the normalized tuple from encode and
decode; reference the existing symbols CodeArgs, DEFAULT_ENCODING, encode,
decode, and vm.state.codec_registry to locate where to replace the duplicate
logic.

🧹 Nitpick comments (12)

crates/vm/src/warn.rs (2)

410-433: to_str() returning None (surrogate-containing action) silently falls through to the "Unrecognized action" error path

The _ arm in the match action_s block handles both Some(unknown_str) and None (action contains surrogates) identically. While treating a surrogate-containing action as unrecognized is functionally correct, the conflation is non-obvious. A brief comment or explicit None arm would clarify intent, especially since this is the key behavioral change introduced by the migration from as_str() to to_str().

♻️ Proposed clarification

         let action_s = action_str.to_str();
         match action_s {
             Some("once") => { ... }
             Some("module") => { ... }
             Some("default") => false,
+            // None means the action string contains surrogates and is not
+            // a recognized UTF-8 action name; fall through to the error.
             _ => {
                 return Err(vm.new_runtime_error(format!(
                     "Unrecognized action ({action_str}) in warnings.filters:\n {action_str}"
                 )));
             }
         }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/warn.rs` around lines 410 - 433, The match on action_s (result
of action_str.to_str()) currently conflates None (surrogate-containing string)
with unrecognized Some values in the wildcard arm; make this explicit by adding
a separate None => { /* surrogate-containing action: treat as unrecognized or
document behavior */ } arm or a brief comment before the match clarifying that
None indicates surrogate-containing input and should be handled the same as
unknown actions; reference the symbols action_str, action_s, to_str(), and the
match block around update_registry/get_once_registry so reviewers can locate and
apply the change.

395-405: Inconsistent comparison style: mix of as_bytes() and as_wtf8() for equivalent ASCII checks

Lines 395 and 399 use as_bytes() to compare ASCII action strings, but line 405 switches to as_wtf8() for the same pattern. All four strings ("error", "ignore", "always", "all") are pure ASCII, so as_bytes() is sufficient and cheaper throughout. Mixing the two idioms in the same function reduces readability.

♻️ Proposed refactor for consistency

-    if action_str.as_bytes() == b"error" {
+    let action_bytes = action_str.as_bytes();
+    if action_bytes == b"error" {
         let exc = PyBaseExceptionRef::try_from_object(vm, message)?;
         return Err(exc);
     }
-    if action_str.as_bytes() == b"ignore" {
+    if action_bytes == b"ignore" {
         return Ok(());
     }
 
-    let already = if action_str.as_wtf8() != "always" && action_str.as_wtf8() != "all" {
+    let already = if action_bytes != b"always" && action_bytes != b"all" {

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/warn.rs` around lines 395 - 405, The comparisons against the
action string are inconsistent: some use action_str.as_bytes() (for "error" and
"ignore") while others use action_str.as_wtf8() (for "always" and "all"); change
the latter to use action_str.as_bytes() so all four ASCII literals
("error","ignore","always","all") are compared the same and more
efficiently—update the conditional that currently uses as_wtf8() to use
as_bytes() instead (look for the variable action_str and the branching that
checks "always" and "all").

crates/vm/src/builtins/function.rs (1)

1090-1096: Simplify the __qualname__/__name__ fallback chain.

The if let … { Some(qname) } else { … } pattern re-wraps an already-Some value unnecessarily.

♻️ Proposed simplification

-        let func_name = if let Some(qname) =
-            vm.get_attribute_opt(zelf.function.clone(), identifier!(vm, __qualname__))?
-        {
-            Some(qname)
-        } else {
-            vm.get_attribute_opt(zelf.function.clone(), identifier!(vm, __name__))?
-        };
+        let func_name = match vm.get_attribute_opt(zelf.function.clone(), identifier!(vm, __qualname__))? {
+            Some(qname) => Some(qname),
+            None => vm.get_attribute_opt(zelf.function.clone(), identifier!(vm, __name__))?,
+        };

Or more idiomatically using a fallible or_else:

-        let func_name = if let Some(qname) =
-            vm.get_attribute_opt(zelf.function.clone(), identifier!(vm, __qualname__))?
-        {
-            Some(qname)
-        } else {
-            vm.get_attribute_opt(zelf.function.clone(), identifier!(vm, __name__))?
-        };
+        let func_name = vm
+            .get_attribute_opt(zelf.function.clone(), identifier!(vm, __qualname__))?
+            .map(Ok)
+            .unwrap_or_else(|| vm.get_attribute_opt(zelf.function.clone(), identifier!(vm, __name__)))?;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/function.rs` around lines 1090 - 1096, The current
if-let re-wraps an Option from vm.get_attribute_opt unnecessarily; replace the
conditional with an Option chain: first call
vm.get_attribute_opt(zelf.function.clone(), identifier!(vm, __qualname__))? to
get an Option, then call .or(...) with
vm.get_attribute_opt(zelf.function.clone(), identifier!(vm, __name__))? so
func_name is the first Some or the fallback without re-wrapping; update the
expression that assigns func_name accordingly (referencing func_name,
vm.get_attribute_opt, identifier!(vm, __qualname__), identifier!(vm, __name__),
and zelf.function).

crates/stdlib/src/zlib.rs (1)

468-471: Update the generic bound to align with codebase conventions and eliminate the intermediate variable.

The two-step conversion is correct, but it deviates from the established pattern in the stdlib. Other modules (e.g., csv.rs) use impl Into<Wtf8Buf> directly for similar functions, which works because Wtf8Buf implements From<String> and From<&str>.

Update the function to:
Recommended refactoring
-fn new_zlib_error(message: impl Into<String>, vm: &VirtualMachine) -> PyBaseExceptionRef {
-    let msg: String = message.into();
-    vm.new_exception_msg(vm.class("zlib", "error"), msg.into())
+fn new_zlib_error(message: impl Into<Wtf8Buf>, vm: &VirtualMachine) -> PyBaseExceptionRef {
+    vm.new_exception_msg(vm.class("zlib", "error"), message.into())
 }
Add to imports: use rustpython_common::wtf8::Wtf8Buf;
All call sites (string literals, e.to_string(), etc.) are compatible with Into<Wtf8Buf>.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/zlib.rs` around lines 468 - 471, Refactor new_zlib_error to
accept impl Into<Wtf8Buf> instead of impl Into<String>, remove the intermediate
msg variable and pass the converted value directly to vm.new_exception_msg (keep
vm.class("zlib", "error") as the exception class); also add the import use
rustpython_common::wtf8::Wtf8Buf; so call sites (literals and e.to_string())
which implement Into<Wtf8Buf> continue to work without extra allocations.

crates/wasm/src/convert.rs (1)

59-61: Redundant .into() on String binding

js_sys::Error::to_string() returns a Rust String (via Display; JavaScript's toString() is exposed as to_js_string() in js-sys bindings to avoid confusion with Rust's ToString trait and its to_string() method). The .into() here converts String → String (identity) since msg is typed as String; the actual String → Wtf8Buf (or equivalent) conversion happens implicitly at the new_type_error(msg) call site. The intermediate binding with .into() is unnecessary.
♻️ Suggested simplification
-    let msg: String = js_err.unchecked_into::<js_sys::Error>().to_string().into();
-    vm.new_type_error(msg)
+    vm.new_type_error(js_err.unchecked_into::<js_sys::Error>().to_string())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/wasm/src/convert.rs` around lines 59 - 61, Remove the redundant
.into() when converting the JS error to a Rust String: in the binding where you
create msg from js_err (using js_sys::Error::to_string()), just assign the
returned String directly to msg and pass it into vm.new_type_error(msg) — e.g.,
change the binding that currently uses
js_err.unchecked_into::<js_sys::Error>().to_string().into() to use to_string()
without .into() so you avoid the no-op String→String conversion.

crates/vm/src/types/structseq.rs (1)

237-237: Wtf8::fmt() replaces unpaired surrogates with U+FFFD, corrupting module names containing such code points.

The Display impl for Wtf8 iterates through the bytes and explicitly substitutes any unpaired surrogate code points with the replacement character U+FFFD. When formatting module_str.as_wtf8() in the qualified type name, module names containing surrogates (which can occur via the surrogateescape codec handler) will silently lose that information. While this is an improvement over the panic behavior of as_str(), full WTF-8 fidelity would require operating on Wtf8Buf instead of converting to String for repr_str.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/types/structseq.rs` at line 237, The current code calls
module_str.as_wtf8() and formats it (via Wtf8::fmt) into a String, which
replaces unpaired surrogates with U+FFFD; instead preserve full WTF-8 fidelity
by using a Wtf8Buf-based representation when building the qualified type name in
repr_str (referencing module_str.as_wtf8(), Wtf8::fmt(), and Self::NAME). Change
the construction to operate on or append a Wtf8Buf (or otherwise emit WTF-8
bytes directly) instead of converting to a Rust String for formatting, so module
names with unpaired surrogates are preserved end-to-end in the qualified name.

crates/vm/src/stdlib/io.rs (1)

5863-5869: Consider deduplicating repr-fallback logic in _dealloc_warn.

The same fallback pattern appears in both fileio and winconsoleio; extracting it would reduce drift risk.

♻️ Optional extraction sketch

+fn repr_or_file_fallback(source: PyObjectRef, vm: &VirtualMachine) -> Wtf8Buf {
+    source
+        .repr(vm)
+        .map(|s| s.as_wtf8().to_owned())
+        .unwrap_or_else(|_| Wtf8Buf::from("<file>"))
+}
...
- let repr = source
-     .repr(vm)
-     .map(|s| s.as_wtf8().to_owned())
-     .unwrap_or_else(|_| Wtf8Buf::from("<file>"));
+ let repr = repr_or_file_fallback(source, vm);

Also applies to: 6415-6422

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/io.rs` around lines 5863 - 5869, The repr fallback
(calling .repr(vm).map(|s| s.as_wtf8().to_owned()).unwrap_or_else(|_|
Wtf8Buf::from("<file>"))) is duplicated in _dealloc_warn across fileio and
winconsoleio; extract that logic into a small helper (e.g., a function like
format_unclosed_file_repr(vm, obj) or repr_fallback_for_unclosed_file) and
replace the inline chain in both _dealloc_warn implementations to call the
helper, then use its result in the warn call (crate::stdlib::warnings::warn) to
eliminate duplication and keep behavior identical.

crates/vm/src/codecs.rs (1)

114-136: get_incremental_encoder/get_incremental_decoder signatures should be migrated to Option<PyUtf8StrRef> for consistency.

The encode, decode, and all CodecsRegistry wrapper methods now accept Option<PyUtf8StrRef>, but these two methods still accept Option<PyStrRef>. The only caller (in crates/vm/src/stdlib/io.rs) unnecessarily converts a PyUtf8Str to PyStrRef via into_wtf8() just to pass it to these methods. Updating the signatures and using into_wtf8().into() internally—as done in encode and decode—would eliminate this redundant widening and align the API.

♻️ Suggested alignment

     pub fn get_incremental_encoder(
         &self,
-        errors: Option<PyStrRef>,
+        errors: Option<PyUtf8StrRef>,
         vm: &VirtualMachine,
     ) -> PyResult {
         let args = match errors {
-            Some(e) => vec![e.into()],
+            Some(e) => vec![e.into_wtf8().into()],
             None => vec![],
         };
         vm.call_method(self.0.as_object(), "incrementalencoder", args)
     }

     pub fn get_incremental_decoder(
         &self,
-        errors: Option<PyStrRef>,
+        errors: Option<PyUtf8StrRef>,
         vm: &VirtualMachine,
     ) -> PyResult {
         let args = match errors {
-            Some(e) => vec![e.into()],
+            Some(e) => vec![e.into_wtf8().into()],
             None => vec![],
         };
         vm.call_method(self.0.as_object(), "incrementaldecoder", args)
     }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/codecs.rs` around lines 114 - 136, Update the signatures of
get_incremental_encoder and get_incremental_decoder to accept
Option<PyUtf8StrRef> instead of Option<PyStrRef>, and inside each method convert
the optional PyUtf8StrRef to the argument expected by vm.call_method by calling
into_wtf8().into() (mirroring the encode/decode implementations) so callers no
longer need to perform an unnecessary widening conversion; keep method names
get_incremental_encoder and get_incremental_decoder and only change the
parameter types and the creation of args accordingly.

crates/vm/src/stdlib/operator.rs (1)

435-451: repr_wtf8 logic is correct; optional: replace first-flag with iterator combinators.

All three implementations produce the right output. The first: bool flag pattern works but is slightly verbose. A minor readability improvement is possible using intersperse or a separator-before-each approach:

✨ Optional refactor (attrgetter, same pattern applies to itemgetter)

-                let mut first = true;
                 for part in &zelf.attrs {
-                    if !first {
-                        result.push_str(", ");
-                    }
-                    first = false;
+                    if !std::ptr::eq(part, &zelf.attrs[0]) {
+                        result.push_str(", ");
+                    }
                     result.push_wtf8(part.as_object().repr(vm)?.as_wtf8());
                 }

Or, if the nightly iter_intersperse is not available, simply handle the first element before the loop:

-                let mut first = true;
-                for part in &zelf.attrs {
-                    if !first {
-                        result.push_str(", ");
-                    }
-                    first = false;
-                    result.push_wtf8(part.as_object().repr(vm)?.as_wtf8());
-                }
+                let mut iter = zelf.attrs.iter();
+                if let Some(first) = iter.next() {
+                    result.push_wtf8(first.as_object().repr(vm)?.as_wtf8());
+                }
+                for part in iter {
+                    result.push_str(", ");
+                    result.push_wtf8(part.as_object().repr(vm)?.as_wtf8());
+                }

Also applies to: 513-529, 601-622

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/operator.rs` around lines 435 - 451, The repr_wtf8
function in operator.rs uses a mutable first: bool flag to insert ", " between
attributes; replace this flag-based loop with an iterator-based approach for
readability: map each part to its repr wtf8 string (using
part.as_object().repr(vm)?.as_wtf8()), then join or intersperse with ", " when
building the Wtf8Buf in repr_wtf8 (and apply the same change to the analogous
repr_wtf8 implementations for itemgetter and the other attr/item reprs
referenced). Ensure you still respect the ReprGuard early-return branch and
preserve error propagation from repr(vm)? while constructing the joined
representation.

crates/stdlib/src/ssl.rs (1)

3899-3923: set_server_hostname setter inconsistently keeps Option<PyStrRef> vs Option<PyUtf8StrRef> in constructor.

The _wrap_socket path rejects surrogate hostnames at argument-parse time (via WrapSocketArgs.server_hostname: Option<PyUtf8StrRef>), while this setter accepts Option<PyStrRef> and validates lazily via try_into_utf8. Both are safe (no panic), but the error type/timing differs: PyUtf8StrRef raises a UnicodeDecodeError at arg-parsing, whereas try_into_utf8 here raises a different error at runtime. Consider aligning to Option<PyUtf8StrRef> for consistency.

💡 Suggested alignment

-    fn set_server_hostname(
-        &self,
-        value: Option<PyStrRef>,
-        vm: &VirtualMachine,
-    ) -> PyResult<()> {
+    fn set_server_hostname(
+        &self,
+        value: Option<PyUtf8StrRef>,
+        vm: &VirtualMachine,
+    ) -> PyResult<()> {
         if *self.handshake_done.lock() {
             return Err(
                 vm.new_value_error("Cannot set server_hostname on socket after handshake")
             );
         }
-        let hostname_string = value
-            .map(|s| {
-                let utf8 = s.try_into_utf8(vm)?;
-                validate_hostname(utf8.as_str(), vm)?;
-                Ok::<String, _>(utf8.as_str().to_owned())
-            })
-            .transpose()?;
+        let hostname_string = value
+            .map(|s| {
+                validate_hostname(s.as_str(), vm)?;
+                Ok::<String, _>(s.as_str().to_owned())
+            })
+            .transpose()?;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/ssl.rs` around lines 3899 - 3923, The setter
set_server_hostname currently accepts Option<PyStrRef> and calls try_into_utf8,
producing a different error/timing than the WrapSocketArgs.server_hostname path
which uses Option<PyUtf8StrRef>; change the setter signature to accept
Option<PyUtf8StrRef> (matching WrapSocketArgs.server_hostname), remove
try_into_utf8 usage, and adjust the hostname_string construction to directly
clone/into a String from the PyUtf8StrRef before calling validate_hostname and
storing into self.server_hostname so both code paths raise the same
UnicodeDecodeError at argument-parse time.

crates/wtf8/src/concat.rs (1)

72-76: Drop unwrap() in concat formatting helpers to avoid panic paths.

Lines [75] and [84] use write!(...).unwrap(). This is avoidable panic surface; keep this path infallible by discarding the result explicitly.

♻️ Proposed fix

 impl<T: fmt::Display> Wtf8Concat for DisplayAsWtf8<T> {
     #[inline]
     fn fmt_wtf8(&self, buf: &mut Wtf8Buf) {
-        write!(buf, "{}", self.0).unwrap();
+        let _ = write!(buf, "{}", self.0);
     }
 }

 macro_rules! impl_wtf8_concat_for_int {
     ($($t:ty),*) => {
         $(impl Wtf8Concat for $t {
             #[inline]
             fn fmt_wtf8(&self, buf: &mut Wtf8Buf) {
-                write!(buf, "{}", self).unwrap();
+                let _ = write!(buf, "{}", self);
             }
         })*
     };
 }

As per coding guidelines, "Follow Rust best practices for error handling and memory management".

Also applies to: 81-85

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/wtf8/src/concat.rs` around lines 72 - 76, The concat formatting
helpers currently call write!(...).unwrap() inside the Wtf8Concat impls (notably
impl<T: fmt::Display> Wtf8Concat for DisplayAsWtf8 -> fmt_wtf8 and the analogous
impl around lines 81-85), which can panic; change these to discard the
fmt::Result instead of unwrapping (e.g., assign or ignore the return value) so
the formatting path is infallible. Locate the write!(buf, "{}", self.0).unwrap()
calls inside fmt_wtf8 (and the corresponding fmt_wtf8 in the other Display/Debug
wrapper) and replace unwrap() with an explicit ignore of the Result (e.g., let _
= write!(...)) to remove the panic path.

crates/vm/src/builtins/str.rs (1)

93-94: Use expect_str() in all AsRef<str> impls to remove duplication.

You already introduced PyStr::expect_str() on Line 482. Reusing it in these AsRef<str> impls keeps panic behavior consistent and centralizes future changes.

♻️ Proposed refactor

 impl AsRef<str> for PyStr {
     #[track_caller] // <- can remove this once it doesn't panic
     fn as_ref(&self) -> &str {
-        self.to_str().expect("str has surrogates")
+        self.expect_str()
     }
 }
@@
 impl AsRef<str> for Py<PyStr> {
     #[track_caller] // <- can remove this once it doesn't panic
     fn as_ref(&self) -> &str {
-        self.to_str().expect("str has surrogates")
+        self.expect_str()
     }
 }
@@
 impl AsRef<str> for PyStrRef {
     #[track_caller] // <- can remove this once it doesn't panic
     fn as_ref(&self) -> &str {
-        self.to_str().expect("str has surrogates")
+        self.expect_str()
     }
 }
@@
 impl AsRef<str> for PyRefExact<PyStr> {
     #[track_caller]
     fn as_ref(&self) -> &str {
-        self.to_str().expect("str has surrogates")
+        self.expect_str()
     }
 }
@@
 impl AsRef<str> for PyExact<PyStr> {
     #[track_caller]
     fn as_ref(&self) -> &str {
-        self.to_str().expect("str has surrogates")
+        self.expect_str()
     }
 }

Also applies to: 100-101, 107-108, 1893-1903, 482-487

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/str.rs` around lines 93 - 94, Replace the direct calls
to self.to_str().expect("str has surrogates") inside the AsRef<str> impls with
the centralized PyStr::expect_str() helper to remove duplication and keep panic
behavior consistent; locate the AsRef<str> impl blocks (where to_str() is
currently used) and call self.expect_str() instead, ensuring you use the
existing PyStr::expect_str() method rather than duplicating the expect message.

ℹ️ Review info

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3c9a023 and 29ede73.

📒 Files selected for processing (120)

crates/common/src/str.rs
crates/stdlib/src/_asyncio.rs
crates/stdlib/src/_sqlite3.rs
crates/stdlib/src/array.rs
crates/stdlib/src/binascii.rs
crates/stdlib/src/contextvars.rs
crates/stdlib/src/csv.rs
crates/stdlib/src/grp.rs
crates/stdlib/src/hashlib.rs
crates/stdlib/src/json.rs
crates/stdlib/src/locale.rs
crates/stdlib/src/lzma.rs
crates/stdlib/src/multiprocessing.rs
crates/stdlib/src/overlapped.rs
crates/stdlib/src/posixshmem.rs
crates/stdlib/src/pyexpat.rs
crates/stdlib/src/pystruct.rs
crates/stdlib/src/socket.rs
crates/stdlib/src/ssl.rs
crates/stdlib/src/unicodedata.rs
crates/stdlib/src/zlib.rs
crates/vm/src/buffer.rs
crates/vm/src/builtins/bool.rs
crates/vm/src/builtins/classmethod.rs
crates/vm/src/builtins/code.rs
crates/vm/src/builtins/complex.rs
crates/vm/src/builtins/dict.rs
crates/vm/src/builtins/float.rs
crates/vm/src/builtins/function.rs
crates/vm/src/builtins/function/jit.rs
crates/vm/src/builtins/genericalias.rs
crates/vm/src/builtins/int.rs
crates/vm/src/builtins/interpolation.rs
crates/vm/src/builtins/list.rs
crates/vm/src/builtins/mappingproxy.rs
crates/vm/src/builtins/memory.rs
crates/vm/src/builtins/mod.rs
crates/vm/src/builtins/module.rs
crates/vm/src/builtins/namespace.rs
crates/vm/src/builtins/object.rs
crates/vm/src/builtins/range.rs
crates/vm/src/builtins/set.rs
crates/vm/src/builtins/slice.rs
crates/vm/src/builtins/staticmethod.rs
crates/vm/src/builtins/str.rs
crates/vm/src/builtins/template.rs
crates/vm/src/builtins/tuple.rs
crates/vm/src/builtins/type.rs
crates/vm/src/builtins/weakproxy.rs
crates/vm/src/bytes_inner.rs
crates/vm/src/codecs.rs
crates/vm/src/coroutine.rs
crates/vm/src/dict_inner.rs
crates/vm/src/exception_group.rs
crates/vm/src/exceptions.rs
crates/vm/src/frame.rs
crates/vm/src/function/buffer.rs
crates/vm/src/function/fspath.rs
crates/vm/src/function/mod.rs
crates/vm/src/import.rs
crates/vm/src/intern.rs
crates/vm/src/macros.rs
crates/vm/src/protocol/object.rs
crates/vm/src/stdlib/_abc.rs
crates/vm/src/stdlib/_winapi.rs
crates/vm/src/stdlib/_wmi.rs
crates/vm/src/stdlib/ast.rs
crates/vm/src/stdlib/ast/basic.rs
crates/vm/src/stdlib/ast/python.rs
crates/vm/src/stdlib/ast/repr.rs
crates/vm/src/stdlib/builtins.rs
crates/vm/src/stdlib/codecs.rs
crates/vm/src/stdlib/collections.rs
crates/vm/src/stdlib/ctypes.rs
crates/vm/src/stdlib/ctypes/array.rs
crates/vm/src/stdlib/ctypes/base.rs
crates/vm/src/stdlib/ctypes/function.rs
crates/vm/src/stdlib/ctypes/pointer.rs
crates/vm/src/stdlib/ctypes/simple.rs
crates/vm/src/stdlib/ctypes/structure.rs
crates/vm/src/stdlib/ctypes/union.rs
crates/vm/src/stdlib/functools.rs
crates/vm/src/stdlib/imp.rs
crates/vm/src/stdlib/io.rs
crates/vm/src/stdlib/itertools.rs
crates/vm/src/stdlib/marshal.rs
crates/vm/src/stdlib/msvcrt.rs
crates/vm/src/stdlib/nt.rs
crates/vm/src/stdlib/operator.rs
crates/vm/src/stdlib/os.rs
crates/vm/src/stdlib/posix.rs
crates/vm/src/stdlib/signal.rs
crates/vm/src/stdlib/sre.rs
crates/vm/src/stdlib/symtable.rs
crates/vm/src/stdlib/sys.rs
crates/vm/src/stdlib/thread.rs
crates/vm/src/stdlib/time.rs
crates/vm/src/stdlib/typevar.rs
crates/vm/src/stdlib/typing.rs
crates/vm/src/stdlib/winreg.rs
crates/vm/src/stdlib/winsound.rs
crates/vm/src/suggestion.rs
crates/vm/src/types/slot.rs
crates/vm/src/types/structseq.rs
crates/vm/src/utils.rs
crates/vm/src/vm/context.rs
crates/vm/src/vm/interpreter.rs
crates/vm/src/vm/mod.rs
crates/vm/src/vm/vm_new.rs
crates/vm/src/warn.rs
crates/wasm/src/browser_module.rs
crates/wasm/src/convert.rs
crates/wasm/src/js_module.rs
crates/wasm/src/wasm_builtins.rs
crates/wtf8/src/concat.rs
crates/wtf8/src/lib.rs
example_projects/wasm32_without_js/rustpython-without-js/src/lib.rs
src/lib.rs
src/shell.rs
src/shell/helper.rs

🚧 Files skipped from review as they are similar to previous changes (40)

example_projects/wasm32_without_js/rustpython-without-js/src/lib.rs
crates/vm/src/coroutine.rs
crates/vm/src/buffer.rs
crates/vm/src/builtins/genericalias.rs
crates/stdlib/src/lzma.rs
crates/vm/src/builtins/function/jit.rs
crates/vm/src/vm/interpreter.rs
crates/stdlib/src/multiprocessing.rs
crates/vm/src/stdlib/ctypes/pointer.rs
crates/wasm/src/browser_module.rs
crates/vm/src/builtins/mappingproxy.rs
crates/vm/src/builtins/float.rs
crates/vm/src/stdlib/sys.rs
crates/vm/src/builtins/tuple.rs
crates/vm/src/stdlib/ctypes.rs
crates/vm/src/intern.rs
crates/vm/src/stdlib/ctypes/function.rs
crates/stdlib/src/unicodedata.rs
crates/vm/src/dict_inner.rs
crates/vm/src/builtins/classmethod.rs
crates/vm/src/builtins/range.rs
crates/vm/src/builtins/code.rs
crates/vm/src/builtins/mod.rs
crates/vm/src/stdlib/itertools.rs
src/lib.rs
crates/vm/src/vm/context.rs
crates/wasm/src/js_module.rs
crates/vm/src/utils.rs
crates/vm/src/function/fspath.rs
crates/vm/src/stdlib/ast.rs
crates/vm/src/stdlib/signal.rs
crates/vm/src/stdlib/symtable.rs
crates/vm/src/builtins/staticmethod.rs
crates/vm/src/stdlib/_wmi.rs
src/shell/helper.rs
crates/vm/src/stdlib/typevar.rs
crates/wasm/src/wasm_builtins.rs
crates/stdlib/src/posixshmem.rs
crates/stdlib/src/array.rs
crates/vm/src/builtins/bool.rs

crates/stdlib/src/_sqlite3.rs

crates/stdlib/src/csv.rs

crates/stdlib/src/hashlib.rs

coderabbitai · 2026-02-25T08:40:18Z

crates/vm/src/frame.rs

+        let kwarg_names = kwarg_names.as_slice().iter().map(|pyobj| {
+            pyobj
+                .downcast_ref::<PyUtf8Str>()
+                .unwrap()
+                .as_str()
+                .to_owned()
+        });


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "=== Finding CALL_KW and related decoding code ===" rg -n --type rust 'CALL_KW|CallKw|with_kwargs_names|kwarg_names' crates/vm/src --max-count=30 echo "" echo "=== Finding PyCode/CodeType validation and construction ===" rg -n --type rust 'impl.*PyCode|co_kwarg_names|kwarg.*name|new_code' crates/vm/src --max-count=30

Repository: RustPython/RustPython

Length of output: 5951

🏁 Script executed:

#!/bin/bash echo "=== Looking at the specific file and context around lines 2637-2643 ===" wc -l crates/vm/src/frame.rs

Repository: RustPython/RustPython

Length of output: 164

🏁 Script executed:

#!/bin/bash echo "=== Reading frame.rs around lines 2631-2644 ===" sed -n '2625,2650p' crates/vm/src/frame.rs | cat -n echo "" echo "=== Reading the CALL_KW instruction handler context ===" sed -n '935,970p' crates/vm/src/frame.rs | cat -n

Repository: RustPython/RustPython

Length of output: 3193

🏁 Script executed:

#!/bin/bash echo "=== Checking CodeObject structure and construction ===" sed -n '1,100p' crates/vm/src/builtins/code.rs | cat -n echo "" echo "=== Looking for CodeObject or co_kwarg_names definition ===" rg -n 'struct CodeObject|co_kwarg_names|struct PyCode' crates/vm/src/builtins/code.rs | head -20

Repository: RustPython/RustPython

Length of output: 3785

🏁 Script executed:

#!/bin/bash echo "=== Looking for bytecode compilation/validation of CALL_KW metadata ===" rg -n 'CALL_KW|CallKw' crates --type rust | grep -i 'compile\|codegen\|bytecode' echo "" echo "=== Searching for validation of kwarg names in code construction ===" rg -n 'validate|check|assert' crates/vm/src/builtins/code.rs | head -20

Repository: RustPython/RustPython

Length of output: 1912

🏁 Script executed:

#!/bin/bash echo "=== Looking at CALL_KW bytecode generation in compile.rs ===" sed -n '7420,7450p' crates/codegen/src/compile.rs | cat -n echo "" echo "=== Finding where kwarg_names tuple is created from keyword arguments ===" rg -n 'kwarg_names|co_varnames|keyword.*names' crates/codegen/src/compile.rs -B 2 -A 2 | head -60

Repository: RustPython/RustPython

Length of output: 3198

🏁 Script executed:

#!/bin/bash echo "=== Checking how constants/code objects are validated during creation ===" rg -n 'validate|sanitize|check' crates/vm/src/builtins/code.rs -B 1 -A 3 echo "" echo "=== Looking for where the kwarg_names tuple is actually constructed/loaded ===" rg -n 'consts\[|constants\[' crates/vm/src/frame.rs

Repository: RustPython/RustPython

Length of output: 1470

🏁 Script executed:

#!/bin/bash echo "=== Examining the bytecode module to understand how constants are handled ===" find crates -name 'bytecode.rs' -o -name 'codegen.rs' | head -5 echo "" echo "=== Looking at what happens with keyword arg names in codegen ===" rg -n 'self.extend_args|keyword_names|list_comprehension' crates/codegen/src/compile.rs | head -30

Repository: RustPython/RustPython

Length of output: 246

🏁 Script executed:

#!/bin/bash echo "=== Finding the code() constructor implementation ===" sed -n '429,550p' crates/vm/src/builtins/code.rs | cat -n echo "" echo "=== Checking Constructor trait for PyCode ===" rg -n 'impl Constructor for PyCode' crates/vm/src/builtins/code.rs -A 100 | head -120

Repository: RustPython/RustPython

Length of output: 9386

🏁 Script executed:

#!/bin/bash echo "=== Searching for co_consts validation in code constructor ===" rg -n 'co_consts|consts.*validate|constants.*check' crates/vm/src/builtins/code.rs -B 2 -A 2 echo "" echo "=== Checking if code objects can be created with arbitrary consts ===" rg -n 'PyCodeNewArgs' crates/vm/src/builtins/code.rs -A 30 | head -50

Repository: RustPython/RustPython

Length of output: 2761

🏁 Script executed:

#!/bin/bash echo "=== Checking if there are any tests for malformed code objects ===" rg -n 'PyCodeNewArgs|test.*code|invalid.*consts' crates --type rust | head -20 echo "" echo "=== Verifying the exact error handling pattern used elsewhere for tuple elements ===" rg -n 'downcast.*unwrap\|downcast_ref.*unwrap' crates/vm/src/frame.rs | head -10

Repository: RustPython/RustPython

Length of output: 1003

Replace .unwrap() with fallible error handling to avoid VM panic on malformed CALL_KW metadata.

Line 2639 uses downcast_ref::<PyUtf8Str>().unwrap(), which panics if the kwarg_names tuple contains non-string values. Although the compiler ensures kwarg_names are strings at compile time, the code() constructor at lines 517-527 in crates/vm/src/builtins/code.rs accepts arbitrary consts without validation—unlike names, varnames, cellvars, and freevars which are validated as strings. A crafted code object could therefore cause a panic instead of raising a Python exception.

Either validate consts during code object construction (following the pattern of other tuple fields) or replace the unwrap with a fallible path that returns a proper Python exception. This aligns with Rust error handling best practices.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 2637 - 2643, The code in frame.rs maps kwarg_names using downcast_ref::<PyUtf8Str>().unwrap(), which will panic on non-string entries; replace that unwrap with fallible handling that raises a Python exception instead of unwinding the VM. Locate the kwarg_names iterator in frame.rs and change the downcast path to check downcast_ref and, on None, return a PyErr (or convert the mapping to a Result and propagate an Err) with a clear message about malformed CALL_KW metadata; alternatively validate consts at construction in crates/vm/src/builtins/code.rs to prevent non-string entries (mirror the validation used for names/varnames/cellvars/freevars) so downcast_ref::<PyUtf8Str>() cannot fail. Ensure the chosen approach uses the VM's error-return convention (no panics) and references kwarg_names and downcast_ref::<PyUtf8Str>() for locating the fix.

crates/vm/src/import.rs

crates/vm/src/stdlib/ctypes/simple.rs

crates/vm/src/stdlib/operator.rs

crates/vm/src/vm/vm_new.rs

ShaharNaveh · 2026-02-25T08:42:46Z

crates/vm/src/builtins/function.rs

    }
-
-    #[pygetset]
-    fn __qualname__(&self, vm: &VirtualMachine) -> PyResult {


Why was this deleted?

it was redundant

ShaharNaveh · 2026-02-25T08:44:58Z

crates/vm/src/builtins/str.rs

        self.data.as_str()
    }

+    /// Returns `&str`, panicking if the string contains surrogates.


Suggested change

/// Returns `&str`, panicking if the string contains surrogates.

/// Returns [`&str`].

///

/// # Panic

/// If the string contains surrogates.

github-actions · 2026-02-25T08:49:52Z

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[x] test: cpython/Lib/test/test_tstring.py (TODO: 4)

dependencies:

dependent tests: (no tests depend on tstring)

Legend:

[+] path exists in CPython
[x] up-to-date, [ ] outdated

ShaharNaveh

Looks great!

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

crates/vm/src/stdlib/codecs.rs (1)

93-104: ⚠️ Potential issue | 🟡 Minor

Add embedded-NUL validation in register_error for API consistency.

lookup_error and _unregister_error reject names containing '\0', but register_error currently accepts them. That can register names that cannot be looked up/unregistered via this API surface.

Suggested patch

 fn register_error(
     name: PyUtf8StrRef,
     handler: PyObjectRef,
     vm: &VirtualMachine,
 ) -> PyResult<()> {
+    if name.as_str().contains('\0') {
+        return Err(cstring_error(vm));
+    }
     if !handler.is_callable() {
         return Err(vm.new_type_error("handler must be callable".to_owned()));
     }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/codecs.rs` around lines 93 - 104, register_error
currently allows names with embedded NULs, which breaks consistency with
lookup_error and _unregister_error; add the same validation in register_error by
checking name.as_str().contains('\0') and returning a VM ValueError (e.g. via
vm.new_value_error with the same message used by lookup_error/_unregister_error)
before calling register_error on codec_registry, and keep the existing callable
check and rest of the function unchanged.

crates/stdlib/src/ssl.rs (2)

1846-1868: ⚠️ Potential issue | 🟡 Minor

Use validate_hostname() in _wrap_socket to avoid validation drift.

_wrap_socket manually reimplements hostname checks, while _wrap_bio calls validate_hostname(). The current _wrap_socket path misses the max-length check and can diverge further over time.

♻️ Suggested simplification

-                Some(hostname_str) => {
-                    let hostname = hostname_str.as_str();
-
-                    // Validate hostname
-                    if hostname.is_empty() {
-                        return Err(vm.new_value_error("server_hostname cannot be an empty string"));
-                    }
-
-                    // Check if it starts with a dot
-                    if hostname.starts_with('.') {
-                        return Err(vm.new_value_error("server_hostname cannot start with a dot"));
-                    }
-
-                    // IP addresses are allowed
-                    // SNI will not be sent for IP addresses
-
-                    // Check for NULL bytes
-                    if hostname.contains('\0') {
-                        return Err(vm.new_type_error("embedded null character"));
-                    }
-
-                    Some(hostname.to_string())
-                }
+                Some(hostname_str) => {
+                    let hostname = hostname_str.as_str();
+                    validate_hostname(hostname, vm)?;
+                    Some(hostname.to_string())
+                }

As per coding guidelines: "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/ssl.rs` around lines 1846 - 1868, _wrap_socket currently
reimplements hostname validation on args.server_hostname (checking empty,
leading dot, embedded NUL) while _wrap_bio uses validate_hostname(), causing
possible drift and missing the max-length check; update _wrap_socket to call
validate_hostname(vm, args.server_hostname.into_option().flatten()) (or call
validate_hostname with the extracted Option<String>) and use its returned
validated hostname (or propagate its error) instead of the inline checks,
removing the duplicated validation logic so both _wrap_socket and _wrap_bio
share the same validation path.

1089-1103: ⚠️ Potential issue | 🟡 Minor

Preserve Unicode conversion errors instead of falling through to generic type errors.

These blocks use if let Ok(PyUtf8StrRef::try_from_object(...)) and then fall back to bytes/type checks. For Python str values that fail UTF-8 conversion, this can mask the real conversion error and return an incorrect “must be str or bytes”/generic TypeError.

💡 Suggested fix pattern

- if let Ok(pwd_str) = PyUtf8StrRef::try_from_object(vm, p.clone()) {
-     Ok((Some(pwd_str.as_str().to_owned()), None))
+ if let Ok(pwd_str) = PyStrRef::try_from_object(vm, p.clone()) {
+     let pwd_str = pwd_str.try_into_utf8(vm)?;
+     Ok((Some(pwd_str.as_str().to_owned()), None))
  } else if let Ok(pwd_bytes_like) = ArgBytesLike::try_from_object(vm, p.clone()) {
      ...
  }

Apply the same pattern in load_dh_params, set_ecdh_curve, and password-callback handling so str-conversion errors propagate accurately.

Also applies to: 1750-1757, 1805-1812, 2022-2040

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/ssl.rs` around lines 1089 - 1103, The code handling
password_from_callable (and similar blocks in load_dh_params, set_ecdh_curve,
and other password-callback handling sites) currently uses
PyUtf8StrRef::try_from_object and falls through to bytes/type checks, which
swallows UTF-8 conversion errors; change the control flow so that after
attempting PyUtf8StrRef::try_from_object you explicitly propagate any UTF-8
conversion error (return Err with the specific conversion error) instead of
treating it as a generic non-str value—i.e., distinguish between “object is str
but conversion failed” versus “object is not str” and only fall back to
ArgBytesLike::try_from_object when the object is not a str; ensure the same
pattern is applied to the other occurrences (lines referenced around 1750-1757,
1805-1812, 2022-2040) so conversion failures are returned unchanged rather than
masked by a TypeError about expected str/bytes.

♻️ Duplicate comments (3)

crates/vm/src/vm/mod.rs (1)
638-659: Separator after msg now correctly added — LGTM.

The "{msg}: " format string properly separates the message from the subsequent repr output, addressing the prior concern about concatenated output.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/vm/mod.rs` around lines 638 - 659, The msg string formatting
has been fixed by adding a separator; ensure the branch that constructs msg_str
uses format!("{msg}: ") (the variable is msg_str) before calling
write_to_stderr, and that the subsequent repr is obtained via object.repr(self)
and written using write_to_stderr with repr_wtf8, so no further change is
required other than keeping the current code path intact (msg_str,
write_to_stderr, object.repr, repr_wtf8).
crates/vm/src/builtins/str.rs (1)
1392-1394: Non-KeyError exceptions now propagate correctly.

The KeyError guard is in place and the prior review concern is fully resolved.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/str.rs` around lines 1392 - 1394, The match handling
in crates/vm/src/builtins/str.rs correctly guards KeyError with Err(e) if
e.fast_isinstance(vm.ctx.exceptions.key_error) => translated.push(cp) and
ensures all other Err(e) cases propagate via Err(e) => return Err(e); no code
changes are required—leave the existing arms in the match (the KeyError guard
and the generic Err propagation) as-is.
crates/stdlib/src/hashlib.rs (1)
221-229: Surrogate-string path now correctly falls through to UnsupportedDigestmodError.

The refactor from PyUtf8StrRef::try_from_object(vm, digestmod.clone())? to downcast_ref::<PyStr>() && name.to_str() is the right fix: to_str() returns None for surrogate-containing strings, causing them to fall through all branches and hit UnsupportedDigestmodError as intended. The __name__ attribute path (lines 226–231) is consistently updated with the same pattern.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/hashlib.rs` around lines 221 - 229, The
surrogate-containing digestmod handling must consistently use
downcast_ref::<PyStr>() plus name.to_str() so surrogate strings return None and
fall through to UnsupportedDigestmodError; update the __name__-attribute branch
(the code that calls digestmod.get_attr("__name__", vm) and inspects name_obj)
to downcast to PyStr and call to_str() and then strip_prefix("openssl_") exactly
like the primary digestmod branch (referencing digestmod, PyStr, to_str,
get_attr("__name__", vm), strip_prefix and UnsupportedDigestmodError) so both
paths behave identically for surrogate-containing names.

🧹 Nitpick comments (5)

crates/vm/src/stdlib/ctypes/simple.rs (1)

121-121: code_points().count() == 1 scans the whole string; prefer the O(1) iterator pattern already used in value_to_bytes_endian.

Lines 121 and 334 use s.as_wtf8().code_points().count() == 1, which traverses every code point. Line 780 already uses the efficient destructuring check. For consistency and to avoid surprising O(n) behaviour on adversarially long inputs:

♻️ Proposed refactor (lines 120-125 and 333-338)

-            if let Some(s) = value.downcast_ref::<PyStr>() {
-                if s.as_wtf8().code_points().count() == 1 {
-                    Ok(value.to_owned())
-                } else {
-                    Err(vm.new_type_error("one character unicode string expected"))
-                }
-            } else {
+            if let Some(s) = value.downcast_ref::<PyStr>() {
+                let mut cps = s.as_wtf8().code_points();
+                if cps.next().is_some() && cps.next().is_none() {
+                    Ok(value.to_owned())
+                } else {
+                    Err(vm.new_type_error("one character unicode string expected"))
+                }
+            } else {

Same fix applies to line 334:

-            Some("u") => {
-                if let Some(s) = value.downcast_ref::<PyStr>()
-                    && s.as_wtf8().code_points().count() == 1
-                {
+            Some("u") => {
+                if let Some(s) = value.downcast_ref::<PyStr>() {
+                    let mut cps = s.as_wtf8().code_points();
+                    if cps.next().is_some() && cps.next().is_none() {

Also applies to: 334-334

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/ctypes/simple.rs` at line 121, Replace the O(n) check
s.as_wtf8().code_points().count() == 1 with the O(1) iterator-destructuring
pattern used in value_to_bytes_endian (and at the code around line 780): create
a mutable iterator let mut it = s.as_wtf8().code_points(); then check if
it.next().is_some() and it.next().is_none() (or use if let (Some(cp), None) =
(it.next(), it.next()) to capture the single code point), and use that single
code point branch instead of counting all code points; apply this change for
both occurrences of s.as_wtf8().code_points().count() == 1.

crates/vm/src/vm/mod.rs (2)

1149-1149: Redundant use crate::protocol::TraceEvent inside resume_gen_frame.

TraceEvent is already imported at module level (line 37). The local use on line 1149 is unnecessary.

♻️ Remove the redundant import

-        use crate::protocol::TraceEvent;
         match self.trace_event(TraceEvent::Call, None) {

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/vm/mod.rs` at line 1149, The local declaration use
crate::protocol::TraceEvent inside the function resume_gen_frame is redundant
because TraceEvent is already imported at module scope; remove that local use
statement from resume_gen_frame and ensure no other local shadowing occurs so
the module-level TraceEvent is used consistently.

872-889: Inconsistent key usage between Pass 1 and Pass 2 of module_clear_dict.

In Pass 1 (line 875), set_item is called with key_str (the &Py<PyStr> reference), while in Pass 2 (line 888), it's called with key_str.as_wtf8(). Similarly, the __builtins__ guard uses name != "__builtins__" (Wtf8 comparison) in Pass 1 but key_str.as_bytes() != b"__builtins__" (byte comparison) in Pass 2. Both work for ASCII, but the inconsistency reduces clarity.

♻️ Suggestion: unify the two passes

         // Pass 1: names starting with '_' (except __builtins__)
         for (key, value) in dict.into_iter().collect::<Vec<_>>() {
             if vm.is_none(&value) {
                 continue;
             }
             if let Some(key_str) = key.downcast_ref::<PyStr>() {
                 let name = key_str.as_wtf8();
                 if name.starts_with("_") && name != "__builtins__" {
                     let _ = dict.set_item(key_str, none.clone(), vm);
                 }
             }
         }
 
         // Pass 2: all remaining (except __builtins__)
         for (key, value) in dict.into_iter().collect::<Vec<_>>() {
             if vm.is_none(&value) {
                 continue;
             }
             if let Some(key_str) = key.downcast_ref::<PyStr>()
-                && key_str.as_bytes() != b"__builtins__"
+                && key_str.as_wtf8() != "__builtins__"
             {
-                let _ = dict.set_item(key_str.as_wtf8(), none.clone(), vm);
+                let _ = dict.set_item(key_str, none.clone(), vm);
             }
         }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/vm/mod.rs` around lines 872 - 889, module_clear_dict uses
inconsistent key types between the two passes: Pass 1 uses the PyStr reference
(key_str) and compares via Wtf8, while Pass 2 uses byte slices and
key_str.as_wtf8() for set_item; unify them by consistently converting the PyStr
to a single representation (e.g., let name = key_str.as_wtf8()) in both passes,
use name for the "__builtins__" comparison (name != "__builtins__"), and pass
name to dict.set_item in both places so the key handling and comparisons are
identical across Pass 1 and Pass 2.

crates/vm/src/builtins/str.rs (2)

2123-2129: Simplify to match PyUtf8Str::as_str() style.

The unwrap_or_else + debug_assert! + unreachable_unchecked() triple is more complex than the equivalent pattern already used in PyUtf8Str::as_str() at line 2088. Prefer consistency:

♻️ Proposed simplification

-    pub fn as_str(&self) -> &str {
-        self.as_pystr().to_str().unwrap_or_else(|| {
-            debug_assert!(false, "PyUtf8Str invariant violated");
-            // Safety: PyUtf8Str guarantees valid UTF-8
-            unsafe { core::hint::unreachable_unchecked() }
-        })
-    }
+    pub fn as_str(&self) -> &str {
+        debug_assert!(
+            self.as_pystr().is_utf8(),
+            "PyUtf8Str invariant violated: inner string is not valid UTF-8"
+        );
+        // Safety: PyUtf8Str guarantees valid UTF-8
+        unsafe { self.as_pystr().to_str().unwrap_unchecked() }
+    }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/str.rs` around lines 2123 - 2129, The as_str
implementation uses an unwrap_or_else + debug_assert! + unreachable_unchecked()
pattern; change it to the simpler, consistent style used by PyUtf8Str::as_str():
call self.as_pystr().to_str() and unwrap/expect directly with the same invariant
message (or the exact handling used in PyUtf8Str::as_str()) so the code in
as_str matches PyUtf8Str::as_str() behavior and removes the unnecessary closure
and unsafe unreachable_unchecked usage.

2132-2143: Misleading method name: into_wtf8 returns PyStrRef, not a Wtf8 type.

into_wtf8 implies the result is Wtf8Buf or &Wtf8. The actual return type is PyStrRef (PyRef<PyStr>). Rename to into_pystr for consistency with the Py<PyUtf8Str>::as_pystr() method established at line 2114. This requires updates across 13 call sites in 7 files (str.rs, imp.rs, io.rs, codecs.rs, socket.rs, json.rs, type.rs).

♻️ Proposed rename

 impl PyRef<PyUtf8Str> {
-    /// Convert to PyStrRef. Safe because PyUtf8Str is a subtype of PyStr.
-    pub fn into_wtf8(self) -> PyStrRef {
+    /// Convert to PyStrRef. Safe because PyUtf8Str is a subtype of PyStr.
+    pub fn into_pystr(self) -> PyStrRef {
         unsafe { mem::transmute::<Self, PyStrRef>(self) }
     }
 }

 impl From<PyRef<PyUtf8Str>> for PyRef<PyStr> {
     fn from(s: PyRef<PyUtf8Str>) -> Self {
-        s.into_wtf8()
+        s.into_pystr()
     }
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/str.rs` around lines 2132 - 2143, The method name
into_wtf8 on impl PyRef<PyUtf8Str> is misleading because it returns PyStrRef
(PyRef<PyStr>); rename the method to into_pystr to match the existing
Py<PyUtf8Str>::as_pystr() naming and update its definition (impl
PyRef<PyUtf8Str>::into_wtf8 -> into_pystr) and every call site that uses
into_wtf8 across the codebase (str.rs, imp.rs, io.rs, codecs.rs, socket.rs,
json.rs, type.rs) so they call into_pystr instead; ensure the signature still
returns PyStrRef and keep the unsafe mem::transmute implementation unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/stdlib/src/hashlib.rs`:
- Line 82: The argument parsing currently uses PyUtf8StrRef in FromArgs (e.g.,
the parameter with name: PyUtf8StrRef) which calls ensure_valid_utf8 and raises
UnicodeEncodeError for surrogate-containing names; to make behavior consistent
with resolve_digestmod which uses downcast_ref::<PyStr>() and .to_str() and maps
surrogates to UnsupportedDigestmodError/new_value_error, switch the relevant
FromArgs parameter type from PyUtf8StrRef to PyStrRef and defer UTF-8 validation
by calling .to_str() inside the function body where you can map Err to
vm.new_value_error(...) (matching resolve_digestmod and CPython behavior),
ensuring both argument parsing and resolve_digestmod follow the same error path.

In `@crates/vm/src/builtins/namespace.rs`:
- Around line 159-161: The current repr builder in SimpleNamespace drops keys
whose key_str.as_wtf8().is_empty() — remove that early-continue so empty-string
keys are treated like any other key and rendered (producing e.g. namespace(=1)
for a key ""), i.e., delete the empty-key skip in the block around
key_str.as_wtf8() and ensure the code that appends key and value still runs for
empty keys so the empty-name + '=' + value appears in the output.

In `@crates/vm/src/builtins/type.rs`:
- Around line 1564-1567: The slot-name mangling path currently calls
member.to_str().expect(...), which can panic; change it to handle the UTF-8
conversion fallibly and return a Python error instead. Replace
member.to_str().expect(...) with a fallible conversion (e.g.,
member.to_str().map_err(|_| vm.new_type_error("slot name is not valid UTF-8"))?)
and propagate the error (using ?), then pass the obtained &str into mangle_name
(keeps class_name and mangle_name unchanged); ensure the surrounding function
signature returns a PyResult or the appropriate error type so the propagated
error is returned to Python instead of aborting.
- Line 1883: The delete-miss error path still calls attr_name.as_str() which can
panic for surrogate-containing (non-UTF8) names; instead build the error message
using a WTF-8-safe representation (e.g., use the existing attr_name.as_wtf8() /
the interned string from vm.ctx.intern_str(...) or otherwise escape surrogate
codepoints) so formatting never calls as_str(); update the code paths in the
delete-miss handling that currently reference attr_name.as_str() to use the
WTF-8-safe conversion/escaped form (referencing attr_name, vm.ctx.intern_str,
attr_name.as_wtf8(), and the delete-miss error construction) to eliminate the
panic.

In `@crates/vm/src/vm/mod.rs`:
- Around line 810-812: Replace the failing UTF-8-only downcast with a WTF-8-safe
conversion: where you currently call key.downcast_ref::<PyUtf8Str>().map(|s|
s.as_str().to_owned()).unwrap_or_default(), change it to use PyStr
(key.downcast_ref::<PyStr>()) and convert with .as_wtf8().to_string() (or
.to_string_lossy()) so non-UTF-8 names are preserved; update the expression that
assigns name accordingly (keep the unwrap_or_default fallback).

---

Outside diff comments:
In `@crates/stdlib/src/ssl.rs`:
- Around line 1846-1868: _wrap_socket currently reimplements hostname validation
on args.server_hostname (checking empty, leading dot, embedded NUL) while
_wrap_bio uses validate_hostname(), causing possible drift and missing the
max-length check; update _wrap_socket to call validate_hostname(vm,
args.server_hostname.into_option().flatten()) (or call validate_hostname with
the extracted Option<String>) and use its returned validated hostname (or
propagate its error) instead of the inline checks, removing the duplicated
validation logic so both _wrap_socket and _wrap_bio share the same validation
path.
- Around line 1089-1103: The code handling password_from_callable (and similar
blocks in load_dh_params, set_ecdh_curve, and other password-callback handling
sites) currently uses PyUtf8StrRef::try_from_object and falls through to
bytes/type checks, which swallows UTF-8 conversion errors; change the control
flow so that after attempting PyUtf8StrRef::try_from_object you explicitly
propagate any UTF-8 conversion error (return Err with the specific conversion
error) instead of treating it as a generic non-str value—i.e., distinguish
between “object is str but conversion failed” versus “object is not str” and
only fall back to ArgBytesLike::try_from_object when the object is not a str;
ensure the same pattern is applied to the other occurrences (lines referenced
around 1750-1757, 1805-1812, 2022-2040) so conversion failures are returned
unchanged rather than masked by a TypeError about expected str/bytes.

In `@crates/vm/src/stdlib/codecs.rs`:
- Around line 93-104: register_error currently allows names with embedded NULs,
which breaks consistency with lookup_error and _unregister_error; add the same
validation in register_error by checking name.as_str().contains('\0') and
returning a VM ValueError (e.g. via vm.new_value_error with the same message
used by lookup_error/_unregister_error) before calling register_error on
codec_registry, and keep the existing callable check and rest of the function
unchanged.

---

Duplicate comments:
In `@crates/stdlib/src/hashlib.rs`:
- Around line 221-229: The surrogate-containing digestmod handling must
consistently use downcast_ref::<PyStr>() plus name.to_str() so surrogate strings
return None and fall through to UnsupportedDigestmodError; update the
__name__-attribute branch (the code that calls digestmod.get_attr("__name__",
vm) and inspects name_obj) to downcast to PyStr and call to_str() and then
strip_prefix("openssl_") exactly like the primary digestmod branch (referencing
digestmod, PyStr, to_str, get_attr("__name__", vm), strip_prefix and
UnsupportedDigestmodError) so both paths behave identically for
surrogate-containing names.

In `@crates/vm/src/builtins/str.rs`:
- Around line 1392-1394: The match handling in crates/vm/src/builtins/str.rs
correctly guards KeyError with Err(e) if
e.fast_isinstance(vm.ctx.exceptions.key_error) => translated.push(cp) and
ensures all other Err(e) cases propagate via Err(e) => return Err(e); no code
changes are required—leave the existing arms in the match (the KeyError guard
and the generic Err propagation) as-is.

In `@crates/vm/src/vm/mod.rs`:
- Around line 638-659: The msg string formatting has been fixed by adding a
separator; ensure the branch that constructs msg_str uses format!("{msg}: ")
(the variable is msg_str) before calling write_to_stderr, and that the
subsequent repr is obtained via object.repr(self) and written using
write_to_stderr with repr_wtf8, so no further change is required other than
keeping the current code path intact (msg_str, write_to_stderr, object.repr,
repr_wtf8).

---

Nitpick comments:
In `@crates/vm/src/builtins/str.rs`:
- Around line 2123-2129: The as_str implementation uses an unwrap_or_else +
debug_assert! + unreachable_unchecked() pattern; change it to the simpler,
consistent style used by PyUtf8Str::as_str(): call self.as_pystr().to_str() and
unwrap/expect directly with the same invariant message (or the exact handling
used in PyUtf8Str::as_str()) so the code in as_str matches PyUtf8Str::as_str()
behavior and removes the unnecessary closure and unsafe unreachable_unchecked
usage.
- Around line 2132-2143: The method name into_wtf8 on impl PyRef<PyUtf8Str> is
misleading because it returns PyStrRef (PyRef<PyStr>); rename the method to
into_pystr to match the existing Py<PyUtf8Str>::as_pystr() naming and update its
definition (impl PyRef<PyUtf8Str>::into_wtf8 -> into_pystr) and every call site
that uses into_wtf8 across the codebase (str.rs, imp.rs, io.rs, codecs.rs,
socket.rs, json.rs, type.rs) so they call into_pystr instead; ensure the
signature still returns PyStrRef and keep the unsafe mem::transmute
implementation unchanged.

In `@crates/vm/src/stdlib/ctypes/simple.rs`:
- Line 121: Replace the O(n) check s.as_wtf8().code_points().count() == 1 with
the O(1) iterator-destructuring pattern used in value_to_bytes_endian (and at
the code around line 780): create a mutable iterator let mut it =
s.as_wtf8().code_points(); then check if it.next().is_some() and
it.next().is_none() (or use if let (Some(cp), None) = (it.next(), it.next()) to
capture the single code point), and use that single code point branch instead of
counting all code points; apply this change for both occurrences of
s.as_wtf8().code_points().count() == 1.

In `@crates/vm/src/vm/mod.rs`:
- Line 1149: The local declaration use crate::protocol::TraceEvent inside the
function resume_gen_frame is redundant because TraceEvent is already imported at
module scope; remove that local use statement from resume_gen_frame and ensure
no other local shadowing occurs so the module-level TraceEvent is used
consistently.
- Around line 872-889: module_clear_dict uses inconsistent key types between the
two passes: Pass 1 uses the PyStr reference (key_str) and compares via Wtf8,
while Pass 2 uses byte slices and key_str.as_wtf8() for set_item; unify them by
consistently converting the PyStr to a single representation (e.g., let name =
key_str.as_wtf8()) in both passes, use name for the "__builtins__" comparison
(name != "__builtins__"), and pass name to dict.set_item in both places so the
key handling and comparisons are identical across Pass 1 and Pass 2.

ℹ️ Review info

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 29ede73 and 94d8dde.

⛔ Files ignored due to path filters (1)

Lib/test/test_tstring.py is excluded by !Lib/**

📒 Files selected for processing (14)

crates/stdlib/src/hashlib.rs
crates/stdlib/src/socket.rs
crates/stdlib/src/ssl.rs
crates/vm/src/builtins/module.rs
crates/vm/src/builtins/namespace.rs
crates/vm/src/builtins/str.rs
crates/vm/src/builtins/type.rs
crates/vm/src/intern.rs
crates/vm/src/macros.rs
crates/vm/src/stdlib/builtins.rs
crates/vm/src/stdlib/codecs.rs
crates/vm/src/stdlib/ctypes/simple.rs
crates/vm/src/vm/mod.rs
crates/vm/src/vm/vm_new.rs

🚧 Files skipped from review as they are similar to previous changes (2)

crates/vm/src/macros.rs
crates/vm/src/intern.rs

crates/stdlib/src/hashlib.rs

coderabbitai · 2026-02-25T08:58:04Z

crates/vm/src/builtins/namespace.rs

+                if key_str.as_wtf8().is_empty() {
+                    continue;
+                }


⚠️ Potential issue | 🟡 Minor

Do not skip empty string keys in SimpleNamespace.__repr__.

At Line 159, empty string keys are currently dropped, but they are valid string keys and should still appear in repr output (e.g., namespace(=1) in CPython-compatible behavior). This is a small but real behavior regression.

💡 Proposed fix

- if key_str.as_wtf8().is_empty() { - continue; - } if !first { result.push_str(", "); }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/namespace.rs` around lines 159 - 161, The current repr builder in SimpleNamespace drops keys whose key_str.as_wtf8().is_empty() — remove that early-continue so empty-string keys are treated like any other key and rendered (producing e.g. namespace(=1) for a key ""), i.e., delete the empty-key skip in the block around key_str.as_wtf8() and ensure the code that appends key and value still runs for empty keys so the empty-name + '=' + value appears in the output.

crates/vm/src/builtins/type.rs

coderabbitai · 2026-02-25T08:58:04Z

crates/vm/src/vm/mod.rs

+                .downcast_ref::<PyUtf8Str>()
                .map(|s| s.as_str().to_owned())
                .unwrap_or_default();


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Check how PyUtf8Str is defined and whether PyStr keys would downcast to it ast-grep --pattern 'struct PyUtf8Str { $$$ }' echo "---" # Check if PyStr and PyUtf8Str are separate types or if PyStr contains PyUtf8Str rg -n 'pub struct PyUtf8Str' --type=rs -C5 echo "---" # Check downcast_ref implementations rg -n 'impl.*PyPayload.*for.*PyUtf8Str' --type=rs -C3

Repository: RustPython/RustPython

Length of output: 143

🏁 Script executed:

sed -n '800,820p' crates/vm/src/vm/mod.rs

Repository: RustPython/RustPython

Length of output: 757

🏁 Script executed:

rg -n 'pub struct PyStr' --type=rs -C3

Repository: RustPython/RustPython

Length of output: 91

🏁 Script executed:

# Find PyUtf8Str definition rg -n 'pub struct PyUtf8Str' -C5

Repository: RustPython/RustPython

Length of output: 659

🏁 Script executed:

# Find PyStr definition rg -n 'pub struct PyStr' -C5

Repository: RustPython/RustPython

Length of output: 1996

🏁 Script executed:

# Check the impl blocks for these types rg -n 'impl.*PyUtf8Str' -C2

Repository: RustPython/RustPython

Length of output: 8460

🏁 Script executed:

# Search for PyPayload implementations rg -n 'PyPayload.*for.*PyUtf8Str' -C3

Repository: RustPython/RustPython

Length of output: 443

🏁 Script executed:

# Find PyStr methods for string conversion rg -n 'impl PyStr' -A 20 crates/vm/src/builtins/str.rs | head -100

Repository: RustPython/RustPython

Length of output: 3288

🏁 Script executed:

# Look for as_str, as_wtf8, to_str methods on PyStr rg -n 'pub fn as_str|pub fn as_wtf8|pub fn to_str' crates/vm/src/builtins/str.rs | head -20

Repository: RustPython/RustPython

Length of output: 382

🏁 Script executed:

# Check actual usage patterns in the codebase - how is PyStr typically converted to string? rg -n 'downcast_ref::<PyStr>' --max-count=10

Repository: RustPython/RustPython

Length of output: 9458

Use downcast_ref::<PyStr>() with proper WTF-8 handling instead of PyUtf8Str.

Dictionary keys in sys.modules are PyStr, which can contain WTF-8 surrogates. The current downcast_ref::<PyUtf8Str>() will fail for any non-UTF-8 keys and silently convert them to empty strings via unwrap_or_default(), preventing their weakrefs from being recorded during finalization.

Change to:

let name = key .downcast_ref::<PyStr>() .map(|s| s.as_wtf8().to_string()) .unwrap_or_default();

Or use another safe conversion method like to_string_lossy(). This pattern is already used elsewhere in the codebase (e.g., type.rs:1968).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/vm/mod.rs` around lines 810 - 812, Replace the failing UTF-8-only downcast with a WTF-8-safe conversion: where you currently call key.downcast_ref::<PyUtf8Str>().map(|s| s.as_str().to_owned()).unwrap_or_default(), change it to use PyStr (key.downcast_ref::<PyStr>()) and convert with .as_wtf8().to_string() (or .to_string_lossy()) so non-UTF-8 names are preserved; update the expression that assigns name accordingly (keep the unwrap_or_default fallback).

- Remove as_str() from PyStr/Py<PyStr> (was panicking on surrogates) - Add Wtf8Concat trait and concat! macro for WTF-8 formatting - Add impl From<&str> for &Wtf8 conversion - Add AsPyStr/DictKey impls for PyUtf8Str types - Migrate all call sites to as_wtf8(), to_str(), or PyUtf8Str - Fix exception message APIs to accept Wtf8Buf - Deduplicate inner-scope imports across modules

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

crates/vm/src/codecs.rs (1)

114-136: ⚠️ Potential issue | 🟡 Minor

Inconsistent errors parameter type with encode/decode methods.

The get_incremental_encoder and get_incremental_decoder methods use Option<PyStrRef> for the errors parameter, while encode and decode have been updated to Option<PyUtf8StrRef>. Since all three method groups handle the same error handler names (e.g., "strict", "ignore"), they should use the same type for consistency.

Update both methods to use Option<PyUtf8StrRef> with the same conversion pattern as encode and decode:

Suggested fix

     pub fn get_incremental_encoder(
         &self,
-        errors: Option<PyStrRef>,
+        errors: Option<PyUtf8StrRef>,
         vm: &VirtualMachine,
     ) -> PyResult {
         let args = match errors {
-            Some(e) => vec![e.into()],
+            Some(e) => vec![e.into_wtf8().into()],
             None => vec![],
         };
         vm.call_method(self.0.as_object(), "incrementalencoder", args)
     }

     pub fn get_incremental_decoder(
         &self,
-        errors: Option<PyStrRef>,
+        errors: Option<PyUtf8StrRef>,
         vm: &VirtualMachine,
     ) -> PyResult {
         let args = match errors {
-            Some(e) => vec![e.into()],
+            Some(e) => vec![e.into_wtf8().into()],
             None => vec![],
         };
         vm.call_method(self.0.as_object(), "incrementaldecoder", args)
     }

Note: Call sites in crates/vm/src/stdlib/io.rs (lines 2923, 2956) will need type alignment updates when this change is made.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/codecs.rs` around lines 114 - 136, Change the errors parameter
type for get_incremental_encoder and get_incremental_decoder from
Option<PyStrRef> to Option<PyUtf8StrRef> and mirror the same conversion pattern
used by encode/decode: convert the Option<PyUtf8StrRef> into the vm.call_method
args (e.g., map Some(e) to vec![e.into()] or the exact conversion used by
encode/decode) so the error-handler names use the same UTF-8 validated type;
also update call sites in crates/vm/src/stdlib/io.rs that invoke these methods
to pass Option<PyUtf8StrRef>.

crates/stdlib/src/csv.rs (1)

141-146: ⚠️ Potential issue | 🟠 Major

Character validation is byte-based, so valid 1-char Unicode inputs can be rejected.

These branches validate “single character” using byte iteration (as_bytes().exactly_one()), not character count. A single non-ASCII char (multi-byte UTF-8) fails this check incorrectly.

💡 Suggested direction

- s.as_bytes().iter().copied().exactly_one()
+ let ch = s
+     .try_as_utf8(vm)?
+     .chars()
+     .exactly_one()
+     .map_err(|_| vm.new_type_error(r#""<field>" must be a 1-character string"#))?;
+ let b = {
+     let mut tmp = [0u8; 4];
+     let encoded = ch.encode_utf8(&mut tmp);
+     if encoded.len() != 1 {
+         return Err(vm.new_type_error(r#""<field>" must be a single-byte character"#));
+     }
+     encoded.as_bytes()[0]
+ };

Apply the same pattern consistently for delimiter/quotechar/escapechar parsing.

Also applies to: 162-164, 183-188, 582-586, 628-633

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/csv.rs` around lines 141 - 146, The code currently
validates single characters by checking bytes
(s.as_bytes().iter().copied().exactly_one()), which rejects valid multi-byte
Unicode characters; replace those byte-based checks with character-based checks
using s.chars().exactly_one().map_err(...) so you extract a char (not a u8) and
return that char where delimiter/quotechar/escapechar are parsed; apply this
same change to all similar spots (the branches around the snippets at the other
mentioned occurrences) so delimiter, quotechar and escapechar use
s.chars().exactly_one() and the error message remains the same.

crates/stdlib/src/_asyncio.rs (1)

842-851: 🛠️ Refactor suggestion | 🟠 Major

Inconsistency: PyFuture still uses repr_str while PyTask uses repr_wtf8.

The AI summary claims PyFuture was updated to repr_wtf8, but the code still shows repr_str returning String. Meanwhile, PyTask (line 1940) implements repr_wtf8 returning Wtf8Buf. For consistency with the WTF-8 migration and to properly handle surrogate-containing data, PyFuture should also implement repr_wtf8.

♻️ Suggested update for consistency

 impl Representable for PyFuture {
-    fn repr_str(zelf: &Py<Self>, vm: &VirtualMachine) -> PyResult<String> {
+    fn repr_wtf8(zelf: &Py<Self>, vm: &VirtualMachine) -> PyResult<Wtf8Buf> {
         let class_name = zelf.class().name().to_string();
         if let Some(_guard) = ReprGuard::enter(vm, zelf.as_object()) {
             let info = get_future_repr_info(zelf.as_object(), vm)?;
-            Ok(format!("<{} {}>", class_name, info))
+            Ok(wtf8_concat!("<", class_name, " ", info, ">"))
         } else {
-            Ok(format!("<{} ...>", class_name))
+            Ok(Wtf8Buf::from(format!("<{} ...>", class_name)))
         }
     }
 }

,

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/_asyncio.rs` around lines 842 - 851, The PyFuture
Representable impl currently defines repr_str returning String; change it to
implement repr_wtf8 to match PyTask and the WTF-8 migration: replace the
repr_str(&zelf, vm) -> PyResult<String> with repr_wtf8(&zelf, vm) ->
PyResult<Wtf8Buf>, keep the same ReprGuard usage and call to
get_future_repr_info(zelf.as_object(), vm) to produce the inner text, and
convert that text into a Wtf8Buf for the Ok(...) result; remove the old repr_str
implementation and ensure Wtf8Buf is imported/available in the module.

♻️ Duplicate comments (15)

src/shell.rs (1)

145-146: ⚠️ Potential issue | 🔴 Critical

Avoid panic-prone prompt extraction in the REPL loop.

Line 145 still uses expect_str(), which can panic for surrogate-containing sys.ps1/sys.ps2 values and reintroduces the crash this PR is meant to eliminate.

Suggested fix

-        let prompt = match prompt {
-            Ok(ref s) => s.expect_str(),
-            Err(_) => "",
-        };
+        let prompt = match prompt {
+            Ok(ref s) => s.to_str().unwrap_or(""),
+            Err(_) => "",
+        };

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/shell.rs` around lines 145 - 146, The code in the REPL loop uses
expect_str(), which can panic on surrogate-containing sys.ps1/sys.ps2 values;
change the prompt extraction to a non-panicking, lossy conversion (e.g., replace
expect_str() with s.to_string_lossy().into_owned() or equivalent) and return an
owned String (use String::new() instead of "" in the Err arm) so the match on
Ok(ref s) => ... and Err(_) => ... produces the correct safe String without
risking a panic.

src/shell/helper.rs (1)

139-145: ⚠️ Potential issue | 🔴 Critical

expect_str() can still panic on surrogates, contradicting PR objectives.

The PR aims to remove panics from PyStr::as_str(), but expect_str() has the same issue—it panics when the string contains surrogates. This is inconsistent with the surrogate-safe handling (as_bytes(), as_wtf8()) used elsewhere in this function.
Suggested fix using fallible conversion
         Some((
             startpos,
             completions
                 .into_iter()
-                .map(|s| s.expect_str().to_owned())
+                .filter_map(|s| s.to_str().map(str::to_owned))
                 .collect(),
         ))
This skips completions containing surrogates rather than panicking, which is appropriate for shell autocompletion.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/shell/helper.rs` around lines 139 - 145, The code uses expect_str() which
can panic on surrogate-containing strings; change the mapping over completions
to a fallible conversion and skip invalid entries instead of panicking. Replace
the map(|s| s.expect_str().to_owned()) call with a filter_map that calls the
non-panicking conversion (e.g. s.as_wtf8().ok().map(|v| v.to_owned()) or another
fallible API you use elsewhere like as_bytes()/as_wtf8()), so
invalid/surrogate-containing completions are filtered out rather than causing a
panic (update the iterator from map to filter_map accordingly around the
completions.into_iter()).

crates/vm/src/stdlib/msvcrt.rs (1)

85-87: ⚠️ Potential issue | 🔴 Critical

Replace expect_str() with surrogate-safe WTF-8 handling in wide-char paths.

Line 85 and Line 110 reintroduce panic-prone extraction (expect_str()), which can crash on surrogate-containing Python strings. Use WTF-8 code points in both functions instead.

Proposed fix

@@
-        let c = s
-            .expect_str()
-            .chars()
+        let c = s
+            .as_wtf8()
+            .code_points()
             .exactly_one()
             .map_err(|_| vm.new_type_error("putch() argument must be a string of length 1"))?;
-        unsafe { suppress_iph!(_putwch(c as u16)) };
+        unsafe { suppress_iph!(_putwch(c.to_u32() as u16)) };
@@
-        let c =
-            s.expect_str().chars().exactly_one().map_err(|_| {
+        let c = s
+            .as_wtf8()
+            .code_points()
+            .exactly_one()
+            .map_err(|_| {
                 vm.new_type_error("ungetwch() argument must be a string of length 1")
             })?;
-        let ret = unsafe { suppress_iph!(_ungetwch(c as u32)) };
+        let ret = unsafe { suppress_iph!(_ungetwch(c.to_u32())) };

Verification (expected: putwch/ungetwch use as_wtf8().code_points() and no expect_str() in those functions):

#!/bin/bash
rg -nP --type=rust -C4 '\bfn\s+(putwch|ungetwch)\b|expect_str\(\)|as_wtf8\(\)\.code_points\(\)|_putwch\(|_ungetwch\(' crates/vm/src/stdlib/msvcrt.rs

As per coding guidelines: "Follow Rust best practices for error handling and memory management".

Also applies to: 110-110

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/msvcrt.rs` around lines 85 - 87, The calls to
expect_str() in the wide-char path of putwch and ungetwch can panic on
surrogate-containing Python strings; replace the expect_str() extraction with
surrogate-safe iteration by using the Python string's as_wtf8().code_points()
iterator and consume those code points where the current code expects
chars().exactly_one(). Update the implementations of putwch and ungetwch to
iterate over as_wtf8().code_points(), map/convert the WTF-8 code points into the
u16/u32 values used for _putwch/_ungetwch, and remove any expect_str() usage so
no panics occur on invalid UTF-8/surrogates.

crates/wasm/src/browser_module.rs (1)

67-67: ⚠️ Potential issue | 🔴 Critical

expect_str() reintroduces surrogate panics in user-facing paths

These call sites still panic on surrogate-containing Python strings, so the PR objective (“remove unexpected panic from string extraction”) is not met. Please switch these conversions to non-panicking paths (to_str() with Python error propagation, or a deliberate lossy conversion where acceptable).
Suggested verification script (read-only)
#!/bin/bash
set -euo pipefail

# 1) Confirm remaining panic-prone call sites in this module
rg -n '\.expect_str\(' crates/wasm/src/browser_module.rs

# 2) Inspect available safe string-conversion APIs in RustPython string types
rg -n 'fn (to_str|to_string_lossy|as_wtf8)\(' crates vm -g '*.rs'
As per coding guidelines: "Follow Rust best practices for error handling and memory management".

Also applies to: 73-73, 79-79, 87-87, 95-95, 171-171, 206-206, 215-215, 227-227, 244-244
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/wasm/src/browser_module.rs` at line 67, Several call sites (notably
the FetchResponseFormat::from_str(vm, s.expect_str())? usage) still use
expect_str(), which can panic on surrogate-containing Python strings; replace
these with non-panicking conversions by using the safe to_str() and propagate
the Python error (e.g., return Err or use ?), or where lossy conversion is
acceptable use to_string_lossy()/as_wtf8-like API. Update all occurrences
referenced in the comment (lines near FetchResponseFormat::from_str, and the
other noted expect_str() sites) to call the safe conversion on the Python string
type and handle the Result/Error instead of unwrapping, ensuring function
signatures propagate errors as needed.

crates/vm/src/stdlib/operator.rs (1)

583-586: ⚠️ Potential issue | 🟡 Minor

Preserve UTF-8 validation errors instead of collapsing to TypeError.

At Line 584–585, map_err(|_| vm.new_type_error("method name must be a string")) converts all failures into one TypeError. This also masks surrogate-related UTF-8 validation failures that should propagate distinctly.

Proposed fix

-            let name = name
-                .try_into_value(vm)
-                .map_err(|_| vm.new_type_error("method name must be a string"))?;
+            let name = name
+                .downcast::<PyStr>()
+                .map_err(|_| vm.new_type_error("method name must be a string"))?;
+            let name: PyUtf8StrRef = name.try_into_utf8(vm)?;
             Ok(Self { name, args })

#!/bin/bash
set -euo pipefail

# Verify current constructor behavior in operator::methodcaller
cat -n crates/vm/src/stdlib/operator.rs | sed -n '578,590p'

# Verify UTF-8 conversion path and error generation logic
rg -n "fn try_into_utf8|ensure_valid_utf8|new_unicode_encode_error" crates/vm/src/builtins/str.rs -A6 -B6

Expected verification: first command shows the blanket map_err; second command shows UTF-8 validation path capable of raising a Unicode error that should not be flattened.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/operator.rs` around lines 583 - 586, The current
conversion of name via try_into_value(vm) collapses all errors into
vm.new_type_error("method name must be a string"); update that mapping to
preserve UTF-8 validation errors by inspecting the original error returned from
name.try_into_value(vm) and: if it's a UTF-8/unicode validation error, propagate
it (e.g., return vm.new_unicode_encode_error(...) or forward the original
unicode error), otherwise convert non-unicode failures to
vm.new_type_error("method name must be a string"); change the closure used with
map_err (or replace map_err with a match on the Err) so try_into_value(vm) can
return the distinct unicode error instead of always creating a TypeError while
still returning TypeError for genuine non-string cases, leaving the rest of the
constructor (Self { name, args }) unchanged.

crates/vm/src/frame.rs (1)

2666-2672: ⚠️ Potential issue | 🟡 Minor

Replace .unwrap() with fallible error handling to avoid VM panic on malformed CALL_KW metadata.

The downcast_ref::<PyUtf8Str>().unwrap() at line 2669 can panic if the kwarg_names tuple contains non-string values. While the compiler ensures kwarg_names are strings at compile time, crafted code objects could bypass this validation. Compare with line 2688 which properly uses ok_or_else for the same pattern.
🛡️ Suggested fix with proper error handling
 let kwarg_names = kwarg_names.as_slice().iter().map(|pyobj| {
-    pyobj
+    Ok::<_, PyBaseExceptionRef>(pyobj
         .downcast_ref::<PyUtf8Str>()
-        .unwrap()
+        .ok_or_else(|| vm.new_type_error("keyword argument names must be strings"))?
         .as_str()
-        .to_owned()
+        .to_owned())
 });
Note: This would require updating the call site to handle the Result.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 2666 - 2672, The code currently calls
downcast_ref::<PyUtf8Str>().unwrap() when iterating kwarg_names, which can panic
on malformed CALL_KW metadata; change this to fallible handling like using
downcast_ref::<PyUtf8Str>().ok_or_else(...) and propagate a proper Err (matching
the error type used elsewhere in this file) instead of unwrapping, update the
iterator/map to return Results (or collect into a Result<Vec<_>>) and adjust the
surrounding code/call site to handle the Result; refer to the kwarg_names
variable and mirror the pattern used at the similar site around line 2688 that
uses ok_or_else for guidance.

crates/vm/src/stdlib/nt.rs (1)

1191-1194: ⚠️ Potential issue | 🔴 Critical

Replace expect_str() with fallible conversion in exec argument/env paths.

Line 1193, Line 1234, and Lines 1257-1258 still use expect_str(), which can panic on surrogate-containing input. In these user-facing conversion paths, this should return a Python exception instead of panicking.

Suggested fix

         let argv = vm.extract_elements_with(argv.as_ref(), |obj| {
             let arg = PyStrRef::try_from_object(vm, obj)?;
-            make_widestring(arg.expect_str())
+            let arg = arg
+                .to_str()
+                .ok_or_else(|| vm.new_unicode_encode_error("surrogates not allowed"))?;
+            make_widestring(arg)
         })?;

         let argv = vm.extract_elements_with(argv.as_ref(), |obj| {
             let arg = PyStrRef::try_from_object(vm, obj)?;
-            make_widestring(arg.expect_str())
+            let arg = arg
+                .to_str()
+                .ok_or_else(|| vm.new_unicode_encode_error("surrogates not allowed"))?;
+            make_widestring(arg)
         })?;

         for (key, value) in env.into_iter() {
             let key = PyStrRef::try_from_object(vm, key)?;
             let value = PyStrRef::try_from_object(vm, value)?;
-            let key_str = key.expect_str();
-            let value_str = value.expect_str();
+            let key_str = key
+                .to_str()
+                .ok_or_else(|| vm.new_unicode_encode_error("surrogates not allowed"))?;
+            let value_str = value
+                .to_str()
+                .ok_or_else(|| vm.new_unicode_encode_error("surrogates not allowed"))?;

#!/bin/bash
set -euo pipefail

# Verify there are no panic-prone string extractions left in execv/execve conversion paths.
sed -n '1178,1275p' crates/vm/src/stdlib/nt.rs
rg -n -C2 'expect_str\(\)' crates/vm/src/stdlib/nt.rs

Expected result: no expect_str() matches within the execv/execve argv/env conversion blocks.

As per coding guidelines "Follow Rust best practices for error handling and memory management".

Also applies to: 1232-1235, 1255-1258

crates/vm/src/stdlib/os.rs (1)

495-497: ⚠️ Potential issue | 🔴 Critical

Replace expect_str() in Windows env APIs; this can panic on surrogate input.

putenv/unsetenv should return a Python exception here, not panic the VM.

Proposed fix

-        let key_str = key.expect_str();
-        let value_str = value.expect_str();
+        let key_str = key
+            .to_str()
+            .ok_or_else(|| vm.new_value_error("illegal environment variable name"))?;
+        let value_str = value
+            .to_str()
+            .ok_or_else(|| vm.new_value_error("illegal environment variable value"))?;
@@
-        let key_str = key.expect_str();
+        let key_str = key
+            .to_str()
+            .ok_or_else(|| vm.new_value_error("illegal environment variable name"))?;

#!/bin/bash
# Verify panic-prone usage in os module
rg -n -C2 "expect_str\\(" crates/vm/src/stdlib/os.rs

# Verify conversion semantics in PyStr implementation
rg -n -C3 "fn expect_str|fn to_str|str has surrogates" crates/vm/src/builtins/str.rs

Also applies to: 543-545

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/os.rs` around lines 495 - 497, The use of
PyStrRef::expect_str() in the Windows env APIs (e.g., putenv and unsetenv) can
panic on surrogate-containing input; replace calls to expect_str() with a
non-panicking conversion (e.g., using the PyStrRef method that returns a
Result/&str) and on error return an appropriate Python exception via the
VirtualMachine (e.g., vm.new_type_error or
vm.new_value_error/new_unicode_decode_error) instead of panicking so the
functions (putenv, unsetenv) return a PyResult::Err when given invalid/surrogate
input.

crates/vm/src/stdlib/functools.rs (1)

426-431: ⚠️ Potential issue | 🔴 Critical

Replace panic-prone expect_str() in keyword merge path.

Line 430 can panic on surrogate-containing PyStr keys. This is a runtime crash path in partial.__call__; use fallible conversion and raise TypeError instead.

🛠️ Proposed fix

-                let key_str = key
-                    .downcast_ref::<crate::builtins::PyStr>()
-                    .ok_or_else(|| vm.new_type_error("keywords must be strings"))?;
-                final_kwargs.insert(key_str.expect_str().to_owned(), value);
+                let key_str = key
+                    .downcast_ref::<crate::builtins::PyStr>()
+                    .and_then(|s| s.to_str())
+                    .ok_or_else(|| vm.new_type_error("keywords must be strings"))?;
+                final_kwargs.insert(key_str.to_owned(), value);

The expect_str() method explicitly documents panic behavior for surrogate-containing strings. The safe alternative to_str() is available and returns Option<&str>.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/functools.rs` around lines 426 - 431, Replace the
panic-prone expect_str() call when merging keyword keys: instead of calling
key_str.expect_str().to_owned(), use the fallible to_str() on the PyStr returned
by downcast_ref::<crate::builtins::PyStr>() and convert that Option<&str> into a
TypeError via vm.new_type_error if it is None; then insert the owned string
(e.g., key_str.to_str().ok_or_else(|| vm.new_type_error("keywords must be
strings"))?.to_owned()) into final_kwargs so surrogate-containing strings raise
a TypeError instead of panicking.

crates/stdlib/src/csv.rs (1)

297-302: ⚠️ Potential issue | 🟡 Minor

Use TypeError for non-string dialect names in lookup/remove paths.

At Line 297 and Line 317, invalid name types are mapped to _csv.Error. This should be TypeError to stay consistent with argument-type failures elsewhere (including register_dialect).

💡 Suggested fix

-        let name = name.downcast::<PyStr>().map_err(|obj| {
-            new_csv_error(
-                vm,
-                format!("argument 0 must be a string, not '{}'", obj.class()),
-            )
-        })?;
+        let name = name.downcast::<PyStr>().map_err(|obj| {
+            vm.new_type_error(format!("argument 0 must be a string, not '{}'", obj.class()))
+        })?;

Also applies to: 317-322

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/csv.rs` around lines 297 - 302, The code currently maps
non-string `name` arguments to `_csv.Error` by calling `new_csv_error` in the
`name.downcast::<PyStr>().map_err(...)` closures in the dialect lookup/remove
paths; change those closures to raise a `TypeError` instead (use the same
TypeError construction used by `register_dialect`/other arg-type failures) so
invalid `name` types produce a TypeError rather than `_csv.Error`; update both
occurrences around the `name.downcast::<PyStr>()` call (the block shown and the
similar block at the 317–322 region).

crates/vm/src/stdlib/ctypes.rs (1)

1119-1122: ⚠️ Potential issue | 🟠 Major

Validate PyStr as UTF-8 before passing raw pointer to FFI.

At Line 1120, s.as_bytes() can still forward non-UTF-8/WTF-8 data directly as char*. This makes C-side string interpretation ambiguous and can corrupt symbol/argument semantics. Gate this branch with UTF-8 validation first.

💡 Suggested fix

-            } else if let Some(s) = arg.downcast_ref::<crate::builtins::PyStr>() {
-                let ptr = s.as_bytes().as_ptr() as isize;
+            } else if let Some(s) = arg.downcast_ref::<crate::builtins::PyStr>() {
+                let ptr = s.try_as_utf8(vm)?.as_bytes().as_ptr() as isize;
                 arg_values.push(ptr);
                 arg_types.push(Type::pointer());
             } else {

Use this read-only check to confirm API behavior and call-site usage:

#!/bin/bash
set -euo pipefail

echo "== PyStr UTF-8/bytes APIs =="
rg -nP --type rust 'fn\s+as_bytes|fn\s+try_as_utf8|fn\s+as_wtf8' crates/vm/src/builtins/str.rs -A6 -B4

echo
echo "== ctypes FFI string-pointer path =="
rg -nP --type rust 'downcast_ref::<crate::builtins::PyStr>\(\)|as_bytes\(\)\.as_ptr\(\)|try_as_utf8\(' crates/vm/src/stdlib/ctypes.rs -A6 -B6

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/ctypes.rs` around lines 1119 - 1122, Gate the PyStr
branch with a UTF-8 validation: replace the raw s.as_bytes().as_ptr() usage with
a call to s.try_as_utf8() (or equivalent) and only push the pointer and
Type::pointer() when try_as_utf8() returns Ok(slice). If try_as_utf8() returns
Err, return or raise a Python/FFI error (e.g., TypeError/ValueError) instead of
passing a raw pointer so callers using
arg.downcast_ref::<crate::builtins::PyStr>() can't receive non-UTF-8 data via
arg_values/arg_types.

crates/vm/src/builtins/namespace.rs (1)

158-160: ⚠️ Potential issue | 🟡 Minor

Do not skip empty-string keys in SimpleNamespace.__repr__.

Line 158 drops "" keys, but they are still valid string keys and should be rendered in repr output.

💡 Proposed fix

-                if key_str.as_wtf8().is_empty() {
-                    continue;
-                }
                 if !first {
                     result.push_str(", ");
                 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/namespace.rs` around lines 158 - 160, The repr for
SimpleNamespace currently skips keys whose key_str.as_wtf8().is_empty(), which
incorrectly omits valid "" keys; in the SimpleNamespace.__repr__ implementation
remove the conditional that continues on empty key (i.e., stop filtering out
key_str.as_wtf8().is_empty()) so empty-string keys are included in the repr, and
ensure the existing key formatting logic (the same code paths that quote/escape
other keys) is used for the "" key so it renders as an empty quoted string in
the output.

crates/vm/src/stdlib/ctypes/function.rs (2)

756-759: ⚠️ Potential issue | 🟠 Major

Avoid expect() in _type_ probing to prevent VM aborts.

Line 758 can panic on a non-UTF8 _type_ value; this path should fail closed (return false), not abort.

💡 Proposed fix

-        let c = s
-            .to_str()
-            .expect("_type_ is validated as ASCII at type creation");
-        if c.len() == 1 && "sPzUZXO".contains(c) {
-            return true;
+        if let Some(c) = s.to_str() {
+            if c.len() == 1 && "sPzUZXO".contains(c) {
+                return true;
+            }
         }

#!/bin/bash
# Verify remaining panic-prone `_type_` conversion in this module.
rg -n "to_str\\(\\)\\s*\\.expect|_type_" --type rs crates/vm/src/stdlib/ctypes/function.rs -C 2

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/ctypes/function.rs` around lines 756 - 759, The code
currently calls s.to_str().expect(...) when probing `_type_` (variable s) which
can panic on non-UTF8 and abort the VM; instead, change the conversion to handle
invalid UTF-8 gracefully (e.g., use s.to_str().ok() and return false if None) so
the `_type_` probe fails closed; update the branch that uses c (the resulting
&str) — the check if c.len() == 1 && "sPzUZXO".contains(c) — to only run when
the to_str() succeeded and otherwise return false from the probe function.

283-289: ⚠️ Potential issue | 🟠 Major

Use wchar_t-width conversion instead of hardcoded UTF-16 in conv_param.

Line 284 builds a Vec<u16> unconditionally. On non-Windows targets where wchar_t is 32-bit, this produces incorrectly encoded wide strings.

💡 Proposed fix

-        let wide: Vec<u16> = s
-            .as_wtf8()
-            .encode_wide()
-            .chain(core::iter::once(0))
-            .collect();
-        let wide_bytes: Vec<u8> = wide.iter().flat_map(|&x| x.to_ne_bytes()).collect();
+        let wide_bytes = super::base::str_to_wchar_bytes(s.as_wtf8());

#!/bin/bash
# Verify current hardcoded UTF-16 conversion and platform-aware helper availability.
rg -n "encode_wide\\(|Vec<u16>|str_to_wchar_bytes|wchar_t" --type rs crates/vm/src/stdlib/ctypes/function.rs crates/vm/src/stdlib/ctypes/base.rs

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/ctypes/function.rs` around lines 283 - 289, The
conversion in conv_param currently builds a Vec<u16> via
s.as_wtf8().encode_wide(), which hardcodes UTF-16 and breaks on platforms where
wchar_t is 32-bit; replace this with a platform-aware conversion that produces
bytes sized to wchar_t (use the existing str_to_wchar_bytes helper in
ctypes::base or add one that uses cty::wchar_t / cfg(target_family = "windows")
to pick u16 vs u32), and then assign wide_bytes from that helper instead of from
wide.iter().flat_map(...); update the code paths referencing wide_bytes in
conv_param to use the new helper so wide/terminator encoding matches the target
wchar_t width.

crates/vm/src/vm/mod.rs (1)

809-812: ⚠️ Potential issue | 🟡 Minor

Use PyStr instead of PyUtf8Str for sys.modules keys.

Dictionary keys in sys.modules are PyStr, which can contain WTF-8 surrogates. The current downcast_ref::<PyUtf8Str>() will fail for any non-UTF-8 keys and silently convert them to empty strings via unwrap_or_default(), preventing their weakrefs from being properly recorded during finalization.

Suggested fix

         for (key, value) in items {
             let name = key
-                .downcast_ref::<PyUtf8Str>()
-                .map(|s| s.as_str().to_owned())
+                .downcast_ref::<PyStr>()
+                .map(|s| s.as_wtf8().to_string())
                 .unwrap_or_default();

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/vm/mod.rs` around lines 809 - 812, The code extracts module key
names using key.downcast_ref::<PyUtf8Str>() which fails for non-UTF-8 (WTF-8)
keys and falls back to empty string; change the downcast to use PyStr instead
(e.g. key.downcast_ref::<PyStr>()) and obtain the string representation that
preserves WTF-8 content (use PyStr's appropriate to-owned/string conversion) so
names from sys.modules are recorded correctly instead of becoming empty via
unwrap_or_default(); update the expression that builds name and remove the
unsafe fallback to empty string.

🧹 Nitpick comments (5)

crates/common/src/str.rs (1)

261-267: Consider tracking this TODO or addressing it in this PR.

Since the PR objective is specifically about establishing the "as" vs "to" naming convention (zero-cost vs small-cost conversions), it would be consistent to rename as_str to to_str as part of this PR rather than leaving it as a TODO. If intentionally deferred, consider creating a follow-up issue to track this.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/common/src/str.rs` around lines 261 - 267, Rename the method as_str to
to_str to follow the repo's "as" (zero-cost) vs "to" (small-cost) naming
convention: update the function signature (pub fn to_str(&self) -> Option<&str>)
replacing all uses of as_str, update any doc comments/tests/benchmarks
referencing as_str, and remove the TODO comment; if there are external usages
you can't change in this PR, add a deprecated shim pub fn as_str(&self) ->
Option<&str> that forwards to to_str and mark it with #[deprecated] so callers
get migrated in follow-ups.

crates/vm/src/utils.rs (1)

58-62: Avoid hard panic in shared repr helper.

Line 61 uses expect(...); a future empty-iterator caller would panic in repr code. Prefer graceful handling.

♻️ Suggested change

-        let first = parts_iter
-            .next()
-            .transpose()?
-            .expect("this is not called for empty collection");
-        repr.push_wtf8(first.as_wtf8());
-        for part in parts_iter {
-            repr.push_str(", ");
-            repr.push_wtf8(part?.as_wtf8());
-        }
+        if let Some(first) = parts_iter.next().transpose()? {
+            repr.push_wtf8(first.as_wtf8());
+            for part in parts_iter {
+                repr.push_str(", ");
+                repr.push_wtf8(part?.as_wtf8());
+            }
+        }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/utils.rs` around lines 58 - 62, The code currently calls
parts_iter.next().transpose()?.expect(...) which will panic on an empty
iterator; replace the expect-driven panic with graceful handling: match or if
let on parts_iter.next().transpose()? and when it yields None return an
appropriate Err (using the function's Result/error type) with a clear message
instead of panicking, otherwise bind to first and continue to call
repr.push_wtf8(first.as_wtf8()); update any signatures if needed to propagate
the error rather than panicking.

crates/stdlib/src/_asyncio.rs (1)

936-944: Inconsistent error handling compared to get_future_repr_info.

Line 942 uses x.str(vm)? which propagates errors, while get_future_repr_info (lines 902-904) uses if let Ok(s) = x.str(vm) to silently skip failures. This inconsistency could cause repr() to fail for tasks but succeed for futures when string conversion fails.

For repr functions, silent failure handling is typically preferred to avoid cascading errors.
♻️ Suggested fix for consistency
 let mut result = Wtf8Buf::new();
 let parts = list.borrow_vec();
 for (i, x) in parts.iter().enumerate() {
     if i > 0 {
         result.push_str(" ");
     }
-    result.push_wtf8(x.str(vm)?.as_wtf8());
+    if let Ok(s) = x.str(vm) {
+        result.push_wtf8(s.as_wtf8());
+    }
 }
 Ok(result)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/_asyncio.rs` around lines 936 - 944, The repr builder
currently propagates string-conversion errors via x.str(vm)? causing repr() to
fail; make it consistent with get_future_repr_info by silently skipping items
that fail to convert: replace the x.str(vm)? usage in the loop that builds
result (the parts.iter().enumerate() loop) with an if let Ok(s) = x.str(vm) { if
i > 0 { result.push_str(" "); } result.push_wtf8(s.as_wtf8()); } so failures are
ignored and repr returns Ok(result) as before.

crates/stdlib/src/ssl.rs (1)

811-825: Unify hostname validation across all server_hostname entry points.

The newer server_hostname surfaces are now centralized on UTF-8 types, but _wrap_socket still has its own manual checks and does not apply the same full validation path as set_server_hostname/_wrap_bio (notably the max-length rule in validate_hostname). Reusing the helper would keep behavior consistent.

♻️ Suggested change

-            let hostname = match args.server_hostname.into_option().flatten() {
-                Some(hostname_str) => {
-                    let hostname = hostname_str.as_str();
-
-                    // Validate hostname
-                    if hostname.is_empty() {
-                        return Err(vm.new_value_error("server_hostname cannot be an empty string"));
-                    }
-
-                    // Check if it starts with a dot
-                    if hostname.starts_with('.') {
-                        return Err(vm.new_value_error("server_hostname cannot start with a dot"));
-                    }
-
-                    // IP addresses are allowed
-                    // SNI will not be sent for IP addresses
-
-                    // Check for NULL bytes
-                    if hostname.contains('\0') {
-                        return Err(vm.new_type_error("embedded null character"));
-                    }
-
-                    Some(hostname.to_string())
-                }
-                None => None,
-            };
+            let hostname = match args.server_hostname.into_option().flatten() {
+                Some(hostname_str) => {
+                    let hostname = hostname_str.as_str();
+                    validate_hostname(hostname, vm)?;
+                    Some(hostname.to_string())
+                }
+                None => None,
+            };

Also applies to: 3902-3920

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/ssl.rs` around lines 811 - 825, _unify hostname validation
by routing all server_hostname entry points through the existing
validate_hostname helper (the same used by set_server_hostname and _wrap_bio)
instead of doing ad-hoc checks; specifically update _wrap_socket handling and
the WrapBioArgs/WrapSocketArgs server_hostname parsing to accept the UTF-8
variant and call validate_hostname (including the max-length rule) before using
the hostname, ensuring you reuse the same validation path and error reporting as
validate_hostname so behavior is consistent across set_server_hostname,
_wrap_bio, and _wrap_socket.

crates/vm/src/builtins/str.rs (1)

2120-2130: Verify safety of unreachable_unchecked in Py<PyUtf8Str>::as_str().

The unreachable_unchecked() in the else branch relies on the PyUtf8Str invariant (always valid UTF-8). While the debug_assert! catches violations in debug builds, in release builds a corrupted PyUtf8Str could cause undefined behavior. Consider whether a simpler approach without unsafe would suffice:

🔒 Safer alternative

     pub fn as_str(&self) -> &str {
-        self.as_pystr().to_str().unwrap_or_else(|| {
-            debug_assert!(false, "PyUtf8Str invariant violated");
-            // Safety: PyUtf8Str guarantees valid UTF-8
-            unsafe { core::hint::unreachable_unchecked() }
-        })
+        // Safety: PyUtf8Str invariant guarantees valid UTF-8
+        // The type can only be constructed from valid UTF-8 sources
+        unsafe { self.as_pystr().to_str().unwrap_unchecked() }
     }

This is slightly cleaner and the safety comment is more direct. The current implementation is functionally equivalent but the unwrap_or_else with unreachable_unchecked is a bit convoluted.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/str.rs` around lines 2120 - 2130, The use of unsafe
unreachable_unchecked in Py<PyUtf8Str>::as_str() is unnecessary and can cause UB
in release builds if the invariant is violated; replace the unwrap_or_else +
debug_assert + unreachable_unchecked pattern by a safe unwrap/expect on
as_pystr().to_str() (e.g., call to_str().expect("PyUtf8Str invariant violated"))
so any invariant break panics rather than invoking undefined behavior, keeping
the error message for debugging while removing unsafe code; locate the as_str
method and change the fallback from unreachable_unchecked to a safe
expect/unwrap on the result of as_pystr().to_str().

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/stdlib/src/hashlib.rs`:
- Around line 228-231: The __name__ path preserves case causing mixed-case
algorithm names to slip through; update the branch that binds name_str and algo
(the let Some(name_str) / let Some(algo) = name_str.strip_prefix("openssl_")
branch) to normalize algo to lowercase before returning (e.g., replace returning
algo.to_owned() with returning a lowercased version such as
algo.to_ascii_lowercase()) so it matches the behavior of the direct str branch.

In `@crates/stdlib/src/socket.rs`:
- Around line 3024-3027: The code currently uses .filter_map(Result::ok) and
.next().unwrap() on the iterator returned by dns_lookup::getaddrinfo, which
silently drops Errs and panics if there are no Ok entries; update the logic in
the function that returns Result<(String, String), IoOrPyException> to
explicitly handle iterator errors: iterate the dns_lookup::getaddrinfo(...)
iterator (or collect into a Vec<Result<_,_>>), look for the first Ok value to
use as ainfo, and if none exist return an appropriate IoOrPyException by
converting the first/last Err via convert_socket_error(vm, err,
SocketError::GaiError) instead of panicking; remove .filter_map(Result::ok) and
.unwrap() and reference the iterator variable (res) and ainfo when implementing
this error-aware selection.

---

Outside diff comments:
In `@crates/stdlib/src/_asyncio.rs`:
- Around line 842-851: The PyFuture Representable impl currently defines
repr_str returning String; change it to implement repr_wtf8 to match PyTask and
the WTF-8 migration: replace the repr_str(&zelf, vm) -> PyResult<String> with
repr_wtf8(&zelf, vm) -> PyResult<Wtf8Buf>, keep the same ReprGuard usage and
call to get_future_repr_info(zelf.as_object(), vm) to produce the inner text,
and convert that text into a Wtf8Buf for the Ok(...) result; remove the old
repr_str implementation and ensure Wtf8Buf is imported/available in the module.

In `@crates/stdlib/src/csv.rs`:
- Around line 141-146: The code currently validates single characters by
checking bytes (s.as_bytes().iter().copied().exactly_one()), which rejects valid
multi-byte Unicode characters; replace those byte-based checks with
character-based checks using s.chars().exactly_one().map_err(...) so you extract
a char (not a u8) and return that char where delimiter/quotechar/escapechar are
parsed; apply this same change to all similar spots (the branches around the
snippets at the other mentioned occurrences) so delimiter, quotechar and
escapechar use s.chars().exactly_one() and the error message remains the same.

In `@crates/vm/src/codecs.rs`:
- Around line 114-136: Change the errors parameter type for
get_incremental_encoder and get_incremental_decoder from Option<PyStrRef> to
Option<PyUtf8StrRef> and mirror the same conversion pattern used by
encode/decode: convert the Option<PyUtf8StrRef> into the vm.call_method args
(e.g., map Some(e) to vec![e.into()] or the exact conversion used by
encode/decode) so the error-handler names use the same UTF-8 validated type;
also update call sites in crates/vm/src/stdlib/io.rs that invoke these methods
to pass Option<PyUtf8StrRef>.

---

Duplicate comments:
In `@crates/stdlib/src/csv.rs`:
- Around line 297-302: The code currently maps non-string `name` arguments to
`_csv.Error` by calling `new_csv_error` in the
`name.downcast::<PyStr>().map_err(...)` closures in the dialect lookup/remove
paths; change those closures to raise a `TypeError` instead (use the same
TypeError construction used by `register_dialect`/other arg-type failures) so
invalid `name` types produce a TypeError rather than `_csv.Error`; update both
occurrences around the `name.downcast::<PyStr>()` call (the block shown and the
similar block at the 317–322 region).

In `@crates/vm/src/builtins/namespace.rs`:
- Around line 158-160: The repr for SimpleNamespace currently skips keys whose
key_str.as_wtf8().is_empty(), which incorrectly omits valid "" keys; in the
SimpleNamespace.__repr__ implementation remove the conditional that continues on
empty key (i.e., stop filtering out key_str.as_wtf8().is_empty()) so
empty-string keys are included in the repr, and ensure the existing key
formatting logic (the same code paths that quote/escape other keys) is used for
the "" key so it renders as an empty quoted string in the output.

In `@crates/vm/src/frame.rs`:
- Around line 2666-2672: The code currently calls
downcast_ref::<PyUtf8Str>().unwrap() when iterating kwarg_names, which can panic
on malformed CALL_KW metadata; change this to fallible handling like using
downcast_ref::<PyUtf8Str>().ok_or_else(...) and propagate a proper Err (matching
the error type used elsewhere in this file) instead of unwrapping, update the
iterator/map to return Results (or collect into a Result<Vec<_>>) and adjust the
surrounding code/call site to handle the Result; refer to the kwarg_names
variable and mirror the pattern used at the similar site around line 2688 that
uses ok_or_else for guidance.

In `@crates/vm/src/stdlib/ctypes.rs`:
- Around line 1119-1122: Gate the PyStr branch with a UTF-8 validation: replace
the raw s.as_bytes().as_ptr() usage with a call to s.try_as_utf8() (or
equivalent) and only push the pointer and Type::pointer() when try_as_utf8()
returns Ok(slice). If try_as_utf8() returns Err, return or raise a Python/FFI
error (e.g., TypeError/ValueError) instead of passing a raw pointer so callers
using arg.downcast_ref::<crate::builtins::PyStr>() can't receive non-UTF-8 data
via arg_values/arg_types.

In `@crates/vm/src/stdlib/ctypes/function.rs`:
- Around line 756-759: The code currently calls s.to_str().expect(...) when
probing `_type_` (variable s) which can panic on non-UTF8 and abort the VM;
instead, change the conversion to handle invalid UTF-8 gracefully (e.g., use
s.to_str().ok() and return false if None) so the `_type_` probe fails closed;
update the branch that uses c (the resulting &str) — the check if c.len() == 1
&& "sPzUZXO".contains(c) — to only run when the to_str() succeeded and otherwise
return false from the probe function.
- Around line 283-289: The conversion in conv_param currently builds a Vec<u16>
via s.as_wtf8().encode_wide(), which hardcodes UTF-16 and breaks on platforms
where wchar_t is 32-bit; replace this with a platform-aware conversion that
produces bytes sized to wchar_t (use the existing str_to_wchar_bytes helper in
ctypes::base or add one that uses cty::wchar_t / cfg(target_family = "windows")
to pick u16 vs u32), and then assign wide_bytes from that helper instead of from
wide.iter().flat_map(...); update the code paths referencing wide_bytes in
conv_param to use the new helper so wide/terminator encoding matches the target
wchar_t width.

In `@crates/vm/src/stdlib/functools.rs`:
- Around line 426-431: Replace the panic-prone expect_str() call when merging
keyword keys: instead of calling key_str.expect_str().to_owned(), use the
fallible to_str() on the PyStr returned by
downcast_ref::<crate::builtins::PyStr>() and convert that Option<&str> into a
TypeError via vm.new_type_error if it is None; then insert the owned string
(e.g., key_str.to_str().ok_or_else(|| vm.new_type_error("keywords must be
strings"))?.to_owned()) into final_kwargs so surrogate-containing strings raise
a TypeError instead of panicking.

In `@crates/vm/src/stdlib/msvcrt.rs`:
- Around line 85-87: The calls to expect_str() in the wide-char path of putwch
and ungetwch can panic on surrogate-containing Python strings; replace the
expect_str() extraction with surrogate-safe iteration by using the Python
string's as_wtf8().code_points() iterator and consume those code points where
the current code expects chars().exactly_one(). Update the implementations of
putwch and ungetwch to iterate over as_wtf8().code_points(), map/convert the
WTF-8 code points into the u16/u32 values used for _putwch/_ungetwch, and remove
any expect_str() usage so no panics occur on invalid UTF-8/surrogates.

In `@crates/vm/src/stdlib/operator.rs`:
- Around line 583-586: The current conversion of name via try_into_value(vm)
collapses all errors into vm.new_type_error("method name must be a string");
update that mapping to preserve UTF-8 validation errors by inspecting the
original error returned from name.try_into_value(vm) and: if it's a
UTF-8/unicode validation error, propagate it (e.g., return
vm.new_unicode_encode_error(...) or forward the original unicode error),
otherwise convert non-unicode failures to vm.new_type_error("method name must be
a string"); change the closure used with map_err (or replace map_err with a
match on the Err) so try_into_value(vm) can return the distinct unicode error
instead of always creating a TypeError while still returning TypeError for
genuine non-string cases, leaving the rest of the constructor (Self { name, args
}) unchanged.

In `@crates/vm/src/stdlib/os.rs`:
- Around line 495-497: The use of PyStrRef::expect_str() in the Windows env APIs
(e.g., putenv and unsetenv) can panic on surrogate-containing input; replace
calls to expect_str() with a non-panicking conversion (e.g., using the PyStrRef
method that returns a Result/&str) and on error return an appropriate Python
exception via the VirtualMachine (e.g., vm.new_type_error or
vm.new_value_error/new_unicode_decode_error) instead of panicking so the
functions (putenv, unsetenv) return a PyResult::Err when given invalid/surrogate
input.

In `@crates/vm/src/vm/mod.rs`:
- Around line 809-812: The code extracts module key names using
key.downcast_ref::<PyUtf8Str>() which fails for non-UTF-8 (WTF-8) keys and falls
back to empty string; change the downcast to use PyStr instead (e.g.
key.downcast_ref::<PyStr>()) and obtain the string representation that preserves
WTF-8 content (use PyStr's appropriate to-owned/string conversion) so names from
sys.modules are recorded correctly instead of becoming empty via
unwrap_or_default(); update the expression that builds name and remove the
unsafe fallback to empty string.

In `@crates/wasm/src/browser_module.rs`:
- Line 67: Several call sites (notably the FetchResponseFormat::from_str(vm,
s.expect_str())? usage) still use expect_str(), which can panic on
surrogate-containing Python strings; replace these with non-panicking
conversions by using the safe to_str() and propagate the Python error (e.g.,
return Err or use ?), or where lossy conversion is acceptable use
to_string_lossy()/as_wtf8-like API. Update all occurrences referenced in the
comment (lines near FetchResponseFormat::from_str, and the other noted
expect_str() sites) to call the safe conversion on the Python string type and
handle the Result/Error instead of unwrapping, ensuring function signatures
propagate errors as needed.

In `@src/shell.rs`:
- Around line 145-146: The code in the REPL loop uses expect_str(), which can
panic on surrogate-containing sys.ps1/sys.ps2 values; change the prompt
extraction to a non-panicking, lossy conversion (e.g., replace expect_str() with
s.to_string_lossy().into_owned() or equivalent) and return an owned String (use
String::new() instead of "" in the Err arm) so the match on Ok(ref s) => ... and
Err(_) => ... produces the correct safe String without risking a panic.

In `@src/shell/helper.rs`:
- Around line 139-145: The code uses expect_str() which can panic on
surrogate-containing strings; change the mapping over completions to a fallible
conversion and skip invalid entries instead of panicking. Replace the map(|s|
s.expect_str().to_owned()) call with a filter_map that calls the non-panicking
conversion (e.g. s.as_wtf8().ok().map(|v| v.to_owned()) or another fallible API
you use elsewhere like as_bytes()/as_wtf8()), so invalid/surrogate-containing
completions are filtered out rather than causing a panic (update the iterator
from map to filter_map accordingly around the completions.into_iter()).

---

Nitpick comments:
In `@crates/common/src/str.rs`:
- Around line 261-267: Rename the method as_str to to_str to follow the repo's
"as" (zero-cost) vs "to" (small-cost) naming convention: update the function
signature (pub fn to_str(&self) -> Option<&str>) replacing all uses of as_str,
update any doc comments/tests/benchmarks referencing as_str, and remove the TODO
comment; if there are external usages you can't change in this PR, add a
deprecated shim pub fn as_str(&self) -> Option<&str> that forwards to to_str and
mark it with #[deprecated] so callers get migrated in follow-ups.

In `@crates/stdlib/src/_asyncio.rs`:
- Around line 936-944: The repr builder currently propagates string-conversion
errors via x.str(vm)? causing repr() to fail; make it consistent with
get_future_repr_info by silently skipping items that fail to convert: replace
the x.str(vm)? usage in the loop that builds result (the
parts.iter().enumerate() loop) with an if let Ok(s) = x.str(vm) { if i > 0 {
result.push_str(" "); } result.push_wtf8(s.as_wtf8()); } so failures are ignored
and repr returns Ok(result) as before.

In `@crates/stdlib/src/ssl.rs`:
- Around line 811-825: _unify hostname validation by routing all server_hostname
entry points through the existing validate_hostname helper (the same used by
set_server_hostname and _wrap_bio) instead of doing ad-hoc checks; specifically
update _wrap_socket handling and the WrapBioArgs/WrapSocketArgs server_hostname
parsing to accept the UTF-8 variant and call validate_hostname (including the
max-length rule) before using the hostname, ensuring you reuse the same
validation path and error reporting as validate_hostname so behavior is
consistent across set_server_hostname, _wrap_bio, and _wrap_socket.

In `@crates/vm/src/builtins/str.rs`:
- Around line 2120-2130: The use of unsafe unreachable_unchecked in
Py<PyUtf8Str>::as_str() is unnecessary and can cause UB in release builds if the
invariant is violated; replace the unwrap_or_else + debug_assert +
unreachable_unchecked pattern by a safe unwrap/expect on as_pystr().to_str()
(e.g., call to_str().expect("PyUtf8Str invariant violated")) so any invariant
break panics rather than invoking undefined behavior, keeping the error message
for debugging while removing unsafe code; locate the as_str method and change
the fallback from unreachable_unchecked to a safe expect/unwrap on the result of
as_pystr().to_str().

In `@crates/vm/src/utils.rs`:
- Around line 58-62: The code currently calls
parts_iter.next().transpose()?.expect(...) which will panic on an empty
iterator; replace the expect-driven panic with graceful handling: match or if
let on parts_iter.next().transpose()? and when it yields None return an
appropriate Err (using the function's Result/error type) with a clear message
instead of panicking, otherwise bind to first and continue to call
repr.push_wtf8(first.as_wtf8()); update any signatures if needed to propagate
the error rather than panicking.

ℹ️ Review info

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 94d8dde and 58caf31.

⛔ Files ignored due to path filters (1)

Lib/test/test_tstring.py is excluded by !Lib/**

📒 Files selected for processing (120)

crates/common/src/str.rs
crates/stdlib/src/_asyncio.rs
crates/stdlib/src/_sqlite3.rs
crates/stdlib/src/array.rs
crates/stdlib/src/binascii.rs
crates/stdlib/src/contextvars.rs
crates/stdlib/src/csv.rs
crates/stdlib/src/grp.rs
crates/stdlib/src/hashlib.rs
crates/stdlib/src/json.rs
crates/stdlib/src/locale.rs
crates/stdlib/src/lzma.rs
crates/stdlib/src/multiprocessing.rs
crates/stdlib/src/overlapped.rs
crates/stdlib/src/posixshmem.rs
crates/stdlib/src/pyexpat.rs
crates/stdlib/src/pystruct.rs
crates/stdlib/src/socket.rs
crates/stdlib/src/ssl.rs
crates/stdlib/src/unicodedata.rs
crates/stdlib/src/zlib.rs
crates/vm/src/buffer.rs
crates/vm/src/builtins/bool.rs
crates/vm/src/builtins/classmethod.rs
crates/vm/src/builtins/code.rs
crates/vm/src/builtins/complex.rs
crates/vm/src/builtins/dict.rs
crates/vm/src/builtins/float.rs
crates/vm/src/builtins/function.rs
crates/vm/src/builtins/function/jit.rs
crates/vm/src/builtins/genericalias.rs
crates/vm/src/builtins/int.rs
crates/vm/src/builtins/interpolation.rs
crates/vm/src/builtins/list.rs
crates/vm/src/builtins/mappingproxy.rs
crates/vm/src/builtins/memory.rs
crates/vm/src/builtins/mod.rs
crates/vm/src/builtins/module.rs
crates/vm/src/builtins/namespace.rs
crates/vm/src/builtins/object.rs
crates/vm/src/builtins/range.rs
crates/vm/src/builtins/set.rs
crates/vm/src/builtins/slice.rs
crates/vm/src/builtins/staticmethod.rs
crates/vm/src/builtins/str.rs
crates/vm/src/builtins/template.rs
crates/vm/src/builtins/tuple.rs
crates/vm/src/builtins/type.rs
crates/vm/src/builtins/weakproxy.rs
crates/vm/src/bytes_inner.rs
crates/vm/src/codecs.rs
crates/vm/src/coroutine.rs
crates/vm/src/dict_inner.rs
crates/vm/src/exception_group.rs
crates/vm/src/exceptions.rs
crates/vm/src/frame.rs
crates/vm/src/function/buffer.rs
crates/vm/src/function/fspath.rs
crates/vm/src/function/mod.rs
crates/vm/src/import.rs
crates/vm/src/intern.rs
crates/vm/src/macros.rs
crates/vm/src/protocol/object.rs
crates/vm/src/stdlib/_abc.rs
crates/vm/src/stdlib/_winapi.rs
crates/vm/src/stdlib/_wmi.rs
crates/vm/src/stdlib/ast.rs
crates/vm/src/stdlib/ast/basic.rs
crates/vm/src/stdlib/ast/python.rs
crates/vm/src/stdlib/ast/repr.rs
crates/vm/src/stdlib/builtins.rs
crates/vm/src/stdlib/codecs.rs
crates/vm/src/stdlib/collections.rs
crates/vm/src/stdlib/ctypes.rs
crates/vm/src/stdlib/ctypes/array.rs
crates/vm/src/stdlib/ctypes/base.rs
crates/vm/src/stdlib/ctypes/function.rs
crates/vm/src/stdlib/ctypes/pointer.rs
crates/vm/src/stdlib/ctypes/simple.rs
crates/vm/src/stdlib/ctypes/structure.rs
crates/vm/src/stdlib/ctypes/union.rs
crates/vm/src/stdlib/functools.rs
crates/vm/src/stdlib/imp.rs
crates/vm/src/stdlib/io.rs
crates/vm/src/stdlib/itertools.rs
crates/vm/src/stdlib/marshal.rs
crates/vm/src/stdlib/msvcrt.rs
crates/vm/src/stdlib/nt.rs
crates/vm/src/stdlib/operator.rs
crates/vm/src/stdlib/os.rs
crates/vm/src/stdlib/posix.rs
crates/vm/src/stdlib/signal.rs
crates/vm/src/stdlib/sre.rs
crates/vm/src/stdlib/symtable.rs
crates/vm/src/stdlib/sys.rs
crates/vm/src/stdlib/thread.rs
crates/vm/src/stdlib/time.rs
crates/vm/src/stdlib/typevar.rs
crates/vm/src/stdlib/typing.rs
crates/vm/src/stdlib/winreg.rs
crates/vm/src/stdlib/winsound.rs
crates/vm/src/suggestion.rs
crates/vm/src/types/slot.rs
crates/vm/src/types/structseq.rs
crates/vm/src/utils.rs
crates/vm/src/vm/context.rs
crates/vm/src/vm/interpreter.rs
crates/vm/src/vm/mod.rs
crates/vm/src/vm/vm_new.rs
crates/vm/src/warn.rs
crates/wasm/src/browser_module.rs
crates/wasm/src/convert.rs
crates/wasm/src/js_module.rs
crates/wasm/src/wasm_builtins.rs
crates/wtf8/src/concat.rs
crates/wtf8/src/lib.rs
example_projects/wasm32_without_js/rustpython-without-js/src/lib.rs
src/lib.rs
src/shell.rs
src/shell/helper.rs

✅ Files skipped from review due to trivial changes (1)

crates/wtf8/src/concat.rs

🚧 Files skipped from review as they are similar to previous changes (33)

crates/stdlib/src/lzma.rs
crates/vm/src/function/fspath.rs
crates/wasm/src/convert.rs
crates/stdlib/src/unicodedata.rs
crates/wasm/src/js_module.rs
crates/vm/src/stdlib/collections.rs
crates/vm/src/builtins/weakproxy.rs
crates/vm/src/builtins/range.rs
crates/vm/src/vm/context.rs
crates/vm/src/builtins/classmethod.rs
crates/vm/src/builtins/dict.rs
crates/vm/src/stdlib/winsound.rs
crates/vm/src/macros.rs
crates/vm/src/stdlib/ast/basic.rs
src/lib.rs
crates/vm/src/stdlib/ast.rs
crates/stdlib/src/posixshmem.rs
crates/vm/src/builtins/memory.rs
crates/stdlib/src/overlapped.rs
crates/vm/src/stdlib/ctypes/pointer.rs
crates/vm/src/types/structseq.rs
crates/stdlib/src/zlib.rs
crates/vm/src/builtins/mappingproxy.rs
crates/vm/src/function/buffer.rs
crates/vm/src/stdlib/ctypes/base.rs
example_projects/wasm32_without_js/rustpython-without-js/src/lib.rs
crates/vm/src/stdlib/sys.rs
crates/vm/src/builtins/module.rs
crates/vm/src/stdlib/_wmi.rs
crates/stdlib/src/_sqlite3.rs
crates/vm/src/builtins/staticmethod.rs
crates/vm/src/stdlib/typevar.rs
crates/vm/src/builtins/code.rs

coderabbitai · 2026-02-26T18:22:12Z

crates/stdlib/src/hashlib.rs

+            && let Some(name_str) = name.to_str()
+            && let Some(algo) = name_str.strip_prefix("openssl_")
        {
            return Ok(algo.to_owned());


⚠️ Potential issue | 🟡 Minor

Normalize __name__-derived algorithm names to lowercase as well.

The direct str branch lowercases names, but the __name__ branch preserves case. Mixed-case values like openssl_SHA256 will fail later even though equivalent lowercase input succeeds.

Suggested patch

- if let Ok(name_obj) = digestmod.get_attr("__name__", vm) + if let Ok(name_obj) = digestmod.get_attr("__name__", vm) && let Some(name) = name_obj.downcast_ref::<PyStr>() && let Some(name_str) = name.to_str() && let Some(algo) = name_str.strip_prefix("openssl_") { - return Ok(algo.to_owned()); + return Ok(algo.to_lowercase()); }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/hashlib.rs` around lines 228 - 231, The __name__ path preserves case causing mixed-case algorithm names to slip through; update the branch that binds name_str and algo (the let Some(name_str) / let Some(algo) = name_str.strip_prefix("openssl_") branch) to normalize algo to lowercase before returning (e.g., replace returning algo.to_owned() with returning a lowercased version such as algo.to_ascii_lowercase()) so it matches the behavior of the direct str branch.

coderabbitai · 2026-02-26T18:22:13Z

crates/stdlib/src/socket.rs

+        let mut res = dns_lookup::getaddrinfo(Some(host_str), Some(&service), Some(hints))
+            .map_err(|e| convert_socket_error(vm, e, SocketError::GaiError))?
+            .filter_map(Result::ok);
        let mut ainfo = res.next().unwrap();


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail # First, let's look at the getnameinfo function context echo "=== Checking getnameinfo function around lines 3024-3027 ===" sed -n '3010,3040p' crates/stdlib/src/socket.rs echo "" echo "=== Full function context starting from getnameinfo ===" rg -B 20 -A 30 'fn getnameinfo' crates/stdlib/src/socket.rs | head -80

Repository: RustPython/RustPython

Length of output: 3331

🏁 Script executed:

#!/bin/bash set -euo pipefail # Check the entire getnameinfo function to understand the error handling flow echo "=== Searching for getnameinfo function definition ===" rg -n 'fn getnameinfo' crates/stdlib/src/socket.rs echo "" echo "=== Get line count for context ===" wc -l crates/stdlib/src/socket.rs

Repository: RustPython/RustPython

Length of output: 212

Avoid panic and silent error loss in getnameinfo addrinfo iteration.

At lines 3024-3027, .filter_map(Result::ok) silently discards iterator errors, and .unwrap() on res.next() will panic if no successful entry exists. The function's return type Result<(String, String), IoOrPyException> expects proper error handling, not panics.

🔧 Suggested fix

- let mut res = dns_lookup::getaddrinfo(Some(host_str), Some(&service), Some(hints)) - .map_err(|e| convert_socket_error(vm, e, SocketError::GaiError))? - .filter_map(Result::ok); - let mut ainfo = res.next().unwrap(); - if res.next().is_some() { + let mut res = dns_lookup::getaddrinfo(Some(host_str), Some(&service), Some(hints)) + .map_err(|e| convert_socket_error(vm, e, SocketError::GaiError))?; + let mut ainfo = res + .next() + .ok_or_else(|| vm.new_os_error("sockaddr resolved to no addresses".to_owned()))? + .map_err(IoOrPyException::from)?; + if res.next().transpose().map_err(IoOrPyException::from)?.is_some() { return Err(vm .new_os_error("sockaddr resolved to multiple addresses".to_owned()) .into()); }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/socket.rs` around lines 3024 - 3027, The code currently uses .filter_map(Result::ok) and .next().unwrap() on the iterator returned by dns_lookup::getaddrinfo, which silently drops Errs and panics if there are no Ok entries; update the logic in the function that returns Result<(String, String), IoOrPyException> to explicitly handle iterator errors: iterate the dns_lookup::getaddrinfo(...) iterator (or collect into a Vec<Result<_,_>>), look for the first Ok value to use as ainfo, and if none exist return an appropriate IoOrPyException by converting the first/last Err via convert_socket_error(vm, err, SocketError::GaiError) instead of panicking; remove .filter_map(Result::ok) and .unwrap() and reference the iterator variable (res) and ainfo when implementing this error-aware selection.

youknowone force-pushed the to-str branch 5 times, most recently from 1155af3 to 43d129f Compare February 24, 2026 14:00

youknowone force-pushed the to-str branch 2 times, most recently from 53c851b to e77e62f Compare February 24, 2026 14:21

youknowone marked this pull request as ready for review February 24, 2026 14:58

youknowone force-pushed the to-str branch from e77e62f to 3c9a023 Compare February 24, 2026 15:10

youknowone requested review from ShaharNaveh and coolreader18 February 24, 2026 15:21

coderabbitai bot reviewed Feb 24, 2026

View reviewed changes

youknowone force-pushed the to-str branch from 3c9a023 to 29ede73 Compare February 25, 2026 08:26

coderabbitai bot reviewed Feb 25, 2026

View reviewed changes

ShaharNaveh reviewed Feb 25, 2026

View reviewed changes

ShaharNaveh approved these changes Feb 25, 2026

View reviewed changes

coderabbitai bot reviewed Feb 25, 2026

View reviewed changes

youknowone force-pushed the to-str branch from 94d8dde to 9853cc7 Compare February 26, 2026 17:18

youknowone force-pushed the to-str branch from 9853cc7 to 58caf31 Compare February 26, 2026 18:07

coderabbitai bot reviewed Feb 26, 2026

View reviewed changes

youknowone merged commit 93099e3 into RustPython:main Feb 26, 2026
13 of 14 checks passed

youknowone deleted the to-str branch February 26, 2026 19:20

-    /// Returns `&str`, panicking if the string contains surrogates.
+    /// Returns [`&str`].
+    ///
+    /// # Panic
+    /// If the string contains surrogates.

Conversation

youknowone commented Feb 24, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Uh oh!

github-actions bot commented Feb 24, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ShaharNaveh Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

youknowone Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

ShaharNaveh Feb 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Feb 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

📦 Library Dependencies

Uh oh!

ShaharNaveh left a comment

Choose a reason for hiding this comment

Uh oh!

youknowone commented Feb 24, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 24, 2026 •

edited

Loading

ShaharNaveh Feb 25, 2026 •

edited

Loading

github-actions bot commented Feb 25, 2026 •

edited

Loading