Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughLarge cross-crate migration to UTF‑8/WTF‑8 handling: adds Wtf8Concat/Wtf8Buf, PyUtf8Str (+interned), replaces many PyStr/PyStrRef/String usages with PyUtf8StrRef/Wtf8Buf, updates repr/format paths to repr_wtf8, and threads WTF‑8 through error/message, I/O, ctypes, codecs, and stdlib modules. Changes
Sequence Diagram(s)(omitted) Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
1155af3 to
43d129f
Compare
|
Code has been automatically formatted The code in this PR has been formatted using:
git pull origin to-str |
53c851b to
e77e62f
Compare
There was a problem hiding this comment.
Actionable comments posted: 19
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (12)
crates/vm/src/builtins/classmethod.rs (1)
210-224:⚠️ Potential issue | 🟠 MajorSwitch to
repr_wtf8()implementation to properly preserve WTF-8 surrogates.The current
repr_str()implementation usesformat!with.as_wtf8()values (lines 214, 218), which invokesWtf8'sDisplaytrait. This silently replaces unpaired surrogates with the Unicode replacement character, defeating the purpose of WTF-8 support. Follow the pattern from other types likePyPartialin functools.rs: implementrepr_wtf8()returningWtf8Bufinstead, and usewrite!()orWtf8Concatto concatenate without lossy Display conversion.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/classmethod.rs` around lines 210 - 224, The repr implementation in classmethod.rs currently uses .__qualname__().downcast_ref::<PyStr>().map(|n| n.as_wtf8()) and .__module__().downcast_ref::<PyStr>().map(|m| m.as_wtf8()) with format!, which loses WTF-8 surrogates; change the repr_str() style code to implement repr_wtf8() that returns a Wtf8Buf (follow the PyPartial pattern in functools.rs), fetch qualname and module as PyStr and call their repr_wtf8()/repr_wtf8-equivalents (or convert to Wtf8Buf) instead of as_wtf8(), and build the final representation using write! into the Wtf8Buf or use Wtf8Concat so you concatenate module, qualname, class.slot_name() and callable without going through Display and losing surrogate information.crates/vm/src/stdlib/_winapi.rs (1)
420-428:⚠️ Potential issue | 🟠 MajorAvoid byte‑index slicing of Unicode env keys.
k[1..]can panic when the first character is multi‑byte (valid for Unicode env var names). Use a char‑based check to keep this path panic‑free.Proposed fix
- if k.is_empty() || k[1..].contains('=') { + if k.is_empty() || k.chars().skip(1).any(|c| c == '=') { return Err(vm.new_value_error("illegal environment variable name")); }As per coding guidelines: "Follow Rust best practices for error handling and memory management".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/_winapi.rs` around lines 420 - 428, The code slices the Unicode string k with k[1..], which can panic on multi‑byte first characters; replace that byte-index slice with a char-based check such as k.is_empty() || k.chars().skip(1).any(|c| c == '=') (locate the branch that currently uses k[1..].contains('=') after PyStrRef::expect_str()) so the illegal-name check is unicode-safe and panic-free.crates/vm/src/builtins/code.rs (1)
547-563:⚠️ Potential issue | 🟠 MajorAvoid reintroducing surrogate panics in
repr_str.
Now thatsource_path,obj_name, andqualnameare interned from WTF-8,repr_strstill callsself.source_path().as_str()(Line 421), which can panic on surrogates. Please switch to a non-panicking rendering path (e.g.,to_str()with a fallback or formatting the WTF-8 value directly) to keep the#7196fix intact.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/code.rs` around lines 547 - 563, repr_str currently calls self.source_path().as_str() (and likely similar calls for obj_name/qualname) which can panic on surrogate-containing WTF-8 interned strings; update the repr_str implementation to use a non-panicking rendering path (e.g., call to_str() and use its Option fallback or format the underlying WTF-8 value directly with a lossless/fallback representation) for source_path, obj_name, and qualname so surrogate data won't cause a panic; modify the repr_str function in the CodeObject impl to use to_str() or a safe formatter and include a sensible fallback when to_str() returns None.crates/vm/src/stdlib/posix.rs (1)
1969-1987:⚠️ Potential issue | 🟡 MinorRemove the unnecessary inner block in
SysconfName::TryFromObject(lines 2374–2383).The block around
let name = s.as_str()can be inlined. The downcast toPyUtf8Stris intentional as part of the UTF-8 migration effort (see commit e77e62f), andas_str()will succeed on normal Python strings; any failure on strings with surrogates is expected and correctly reports "configuration names must be strings or integers".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/posix.rs` around lines 1969 - 1987, The downcast branch in PathconfName::try_from_object currently introduces an unnecessary inner block around retrieving the UTF-8 string; inline the call so you directly call s.as_str().parse::<PathconfVar>() instead of creating a separate inner block/variable scope. Update the obj.downcast::<PyUtf8Str>() branch in TryFromObject (for PathconfName::try_from_object) to map_err as before but immediately call s.as_str().parse::<PathconfVar>() and handle the parse error with vm.new_value_error("unrecognized configuration name"), returning the parsed i32 as Self(i).crates/stdlib/src/_sqlite3.rs (1)
721-728:⚠️ Potential issue | 🟠 MajorReplace
expect_str()with error-checked UTF-8 conversion to avoid panic on surrogate input.The
expect_str()call panics on surrogate-containing strings. Sinceregister_converteris exposed as a@pyfunctionwhere user input flows into thetypenameparameter as aPyStrRef, a user can trigger a crash by passing a type name with surrogates. Usetypename.try_into_utf8(vm)?.as_str().to_uppercase()instead to fail gracefully with a proper error.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/_sqlite3.rs` around lines 721 - 728, The code in register_converter uses typename.expect_str() which can panic on surrogate-containing input; replace this with a fallible UTF-8 conversion that returns a Python error instead of panicking by calling typename.try_into_utf8(vm)? to obtain a Utf8Str (or String via as_str().to_uppercase()) and then pass the uppercased name to converters().set_item; update the assignment of name to use typename.try_into_utf8(vm)?.as_str().to_uppercase() (or equivalent) so invalid UTF-8 is handled gracefully.crates/vm/src/codecs.rs (1)
114-136:⚠️ Potential issue | 🟠 Major
get_incremental_encoder/get_incremental_decodershould acceptOption<PyUtf8StrRef>to match theencode/decodeAPI.The methods
encodeanddecodeacceptOption<PyUtf8StrRef>and internally convert it viainto_wtf8(), whileget_incremental_encoderandget_incremental_decoderacceptOption<PyStrRef>directly. For API consistency, the incremental methods should follow the same pattern and acceptOption<PyUtf8StrRef>, performing the conversion internally as well.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/codecs.rs` around lines 114 - 136, Change the signatures of get_incremental_encoder and get_incremental_decoder to accept Option<PyUtf8StrRef> (not Option<PyStrRef>), and when building args convert the Utf8 ref to a WTF-8 PyStr using its into_wtf8() before calling vm.call_method; e.g. match the errors with Some(e) => vec![e.into_wtf8().into()] and None => vec![] so the API and conversion match encode/decode (refer to get_incremental_encoder, get_incremental_decoder and the into_wtf8() helper).crates/vm/src/builtins/type.rs (1)
1968-1979:⚠️ Potential issue | 🟡 MinorPreserve surrogate
__qualname__in__repr__.
__qualname__can contain surrogates; downcasting toPyUtf8Strdrops those cases and falls back toname. UsingPyStr+as_wtf8()retains the original value.🔧 Surrogate-preserving repr
- Some(module) if module != "builtins" => { - let name = zelf.name(); - format!( - "<class '{}.{}'>", - module, - zelf.__qualname__(vm) - .downcast_ref::<PyUtf8Str>() - .map(|n| n.as_str()) - .unwrap_or_else(|| &name) - ) - } + Some(module) if module != "builtins" => { + let name = zelf.name(); + let qualname = zelf.__qualname__(vm); + if let Some(q) = qualname.downcast_ref::<PyStr>() { + format!("<class '{}.{}'>", module, q.as_wtf8()) + } else { + format!("<class '{}.{}'>", module, name) + } + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/type.rs` around lines 1968 - 1979, The repr construction drops surrogate-containing __qualname__ by downcasting to PyUtf8Str; change the downcast to PyStr and use its as_wtf8() (surrogate-preserving) when formatting the class repr for zelf.__qualname__(vm) so that surrogate sequences are retained instead of falling back to name; adjust the map/unwrap_or_else flow around zelf.__qualname__(vm).downcast_ref::<PyUtf8Str>() to use .downcast_ref::<PyStr>().map(|n| n.as_wtf8()) and keep the existing fallback to name.crates/vm/src/builtins/namespace.rs (1)
141-173:⚠️ Potential issue | 🟠 MajorAvoid potential panic in
SimpleNamespacerepr: guard quote-stripping.
&key_str[1..key_str.len() - 1]assumes the key repr is always quoted and length ≥ 2. Users can mutate__dict__to include non-string keys (e.g.,ns.__dict__[1] = 2), whose repr may be"1"(len 1) and will panic here.Proposed safe guard (keep current behavior for normal string keys)
- let k = key.repr(vm)?; - let key_str = k.as_wtf8(); - // Strip quotes from key repr - result.push_wtf8(&key_str[1..key_str.len() - 1]); + let k = key.repr(vm)?; + let key_str = k.as_wtf8(); + // Strip quotes from key repr (only when it actually looks like a quoted string) + if key_str.len() >= 2 { + let b = key_str.as_bytes(); + let first = b[0]; + let last = b[b.len() - 1]; + if (first == b'\'' || first == b'"') && last == first { + result.push_wtf8(&key_str[1..key_str.len() - 1]); + } else { + result.push_wtf8(key_str); + } + } else { + result.push_wtf8(key_str); + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/namespace.rs` around lines 141 - 173, The repr implementation for PyNamespace (fn repr_wtf8 in the Representable impl) currently slices key reprs with &key_str[1..key_str.len()-1], which can panic for short or unquoted reprs; update the loop that processes dict keys to first check that key_str.len() >= 2 and that key_str starts and ends with matching quote characters before stripping, and if those checks fail fall back to using the full key_str (or key.repr(vm)? as-is) so non-string or short reprs don’t cause a panic while preserving the current quoted-key behavior for normal string keys.crates/vm/src/stdlib/os.rs (1)
452-476:⚠️ Potential issue | 🟠 Major
expect_str()panics on surrogate-containing strings in Windowsputenv/unsetenv.Both functions call
key.expect_str()/value.expect_str()on user-providedPyStrRefarguments, butexpect_str()panics unconditionally when the string contains surrogates (via.expect("PyStr contains surrogates")). This violates the PR's goal of preventing unexpected panics on surrogate-containing input.Use
PyUtf8StrRefin the function signature to validate at binding time (raising a Python exception instead), or explicitly callensure_valid_utf8(vm)?before working with the string as UTF-8.Also applies to: lines 500–523 (
unsetenv)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/os.rs` around lines 452 - 476, The putenv and unsetenv functions currently call key.expect_str()/value.expect_str(), which can panic on surrogate-containing PyStrRef; change the bindings to accept validated UTF-8 types (use PyUtf8StrRef for the key and value parameters) or call ensure_valid_utf8(vm)? on each PyStrRef before converting to native strings so that invalid surrogates raise a Python exception instead of panicking; update both putenv and unsetenv to perform this validation (refer to the functions named putenv and unsetenv and the use of expect_str()/to_wide_with_nul()/check_env_var_len) so all surrogate-containing input is handled safely.crates/vm/src/stdlib/ast/python.rs (1)
129-165:⚠️ Potential issue | 🟠 MajorVerify
PyUtf8Strdowncasts may silently skip WTF-8 string keys, causing false keyword argument errors.
PyUtf8Strdowncasting requires strict UTF-8 validation (seePyPayload::validate_downcastable_from()andtry_downcast_from()incrates/vm/src/builtins/str.rs), which fails silently for Python strings containing unpaired surrogates (valid WTF-8 but not UTF-8).This creates inconsistent behavior:
- Line 155: Keys with surrogates are silently ignored by
if let Ok(key) = key.downcast::<PyUtf8Str>(), leaving them inexpectingset- Line 172: Non-UTF8 keys in
field_typesare skipped vialet Ok(key) = key.downcast::<PyUtf8Str>() else { continue; }- Line 227: Downcast failures get mapped to misleading "keywords must be strings" error (key is a valid string, just not UTF-8)
Result: Mismatched keys are not properly removed from validation sets, causing false "unexpected keyword argument" or "missing required argument" errors even when the argument was provided with a valid (but non-UTF8) string key.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/ast/python.rs` around lines 129 - 165, The code incorrectly downcasts dict keys to PyUtf8Str (strict UTF-8) which silently skips valid WTF-8 Python strings and leaves keys in the expecting set; update the key handling in the loops using dict.items_vec() and args.kwargs to accept general PyStr (WTF-8) rather than PyUtf8Str so non-UTF8 keys are recognized and removed from expecting; specifically, replace downcast::<PyUtf8Str>() checks with downcast::<PyStr>() (or use the existing PyStrRef/try_downcast_from for PyStr) in the loops that iterate dict.items_vec() and args.kwargs and adjust the error path that reports "keywords must be strings" to only trigger when the object is truly non-string, keeping PyStr-typed keys even if not UTF-8.crates/vm/src/stdlib/collections.rs (1)
560-584:⚠️ Potential issue | 🟡 MinorFix empty deque repr when maxlen is set.
format!("{class_name}([{closing_part})")yieldsdeque([]], maxlen=...)(extra]) whenmaxlenis present. Please construct the empty-case string explicitly.🔧 Suggested fix
- if zelf.__len__() == 0 { - return Ok(vm.ctx.new_str(format!("{class_name}([{closing_part})"))); - } + if zelf.__len__() == 0 { + let s = match zelf.maxlen { + Some(maxlen) => format!("{class_name}([], maxlen={maxlen})"), + None => format!("{class_name}([])"), + }; + return Ok(vm.ctx.new_str(s)); + }Also applies to: 587-589
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/collections.rs` around lines 560 - 584, The empty-case repr for PyDeque in impl Representable::repr incorrectly composes the string using closing_part, producing an extra ']' when maxlen is set; change the empty branch in repr to build the two forms explicitly: if zelf.maxlen.is_some() return Ok(vm.ctx.new_str(format!("{class_name}([], maxlen={})", zelf.maxlen.unwrap()))), else return Ok(vm.ctx.new_str(format!("{class_name}([])"))); apply the same explicit construction for the other empty-case occurrence (the similar code at the later block referenced).crates/vm/src/stdlib/io.rs (1)
2904-2965:⚠️ Potential issue | 🔴 CriticalFix type mismatch:
errorsWtf8Buf vsStatelessIncremental*fields.
errorsis converted toWtf8Buf, but the fallback stateless encoder/decoder structs still storeOption<PyStrRef>, which will not type-check. Align the field types (or keeperrorsasPyUtf8StrRefand avoidinto_wtf8()here).🛠️ Suggested fix (align `errors` to Wtf8Buf)
struct StatelessIncrementalEncoder { encode: PyObjectRef, - errors: Option<PyStrRef>, + errors: Option<Wtf8Buf>, name: Option<PyStrRef>, } struct StatelessIncrementalDecoder { decode: PyObjectRef, - errors: Option<PyStrRef>, + errors: Option<Wtf8Buf>, }- if let Some(errors) = &self.errors { - args.push(errors.to_owned().into()); - } + if let Some(errors) = &self.errors { + args.push(vm.ctx.new_str(errors.clone()).into()); + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/io.rs` around lines 2904 - 2965, The local variable errors is converted to a Wtf8Buf but the fallback structs StatelessIncrementalEncoder and StatelessIncrementalDecoder still expect Option<PyStrRef>, causing a type mismatch; update those structs to store Option<Wtf8Buf> (or change their error field types accordingly) and pass Some(errors.clone()) into their constructors in the get_incremental_encoder and get_incremental_decoder fallback branches (adjust the decoder branch where it currently uses Some(errors)). Update any related field uses and clones so types align (references: errors variable, StatelessIncrementalEncoder, StatelessIncrementalDecoder, codec.get_incremental_encoder, codec.get_incremental_decoder, and the encode/decode constructors like StatelessIncrementalEncoder { encode: ..., errors: Some(errors.clone()), ... }).
♻️ Duplicate comments (2)
crates/stdlib/src/_sqlite3.rs (2)
2916-2923: Same surrogate-safety concern asregister_converter.This
expect_str()usage has the same potential panic risk; please apply the same UTF‑8‑safe handling here.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/_sqlite3.rs` around lines 2916 - 2923, The code currently calls isolation_level.expect_str().bytes(), which can panic on invalid UTF-8 the same way register_converter did; change the conversion to a UTF-8-safe form (e.g., use to_string_lossy() and then extend with its as_bytes()) so the string bytes are produced without panicking, e.g. replace isolation_level.expect_str().bytes() with isolation_level.expect_str().to_string_lossy().as_bytes() (and keep the surrounding s.extend(...) and self._exec(&s, vm) logic).
2192-2202: Same surrogate-safety concern asregister_converter.This
expect_str()usage has the same potential panic risk; please apply the same UTF‑8‑safe handling here.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/_sqlite3.rs` around lines 2192 - 2202, The code uses expect_str() on both name and obj which can panic on invalid UTF-8; replace these with a UTF-8-safe conversion (use the PyStr->to_str / try-to-str API that returns Result<&str, _> or an Option) for the variables used in the comparison, handle the error by skipping/breaking the comparison instead of panicking, and then use the resulting &str to build a_iter and b_iter as before; locate the occurrences in the loop that iterates self.description (the block that uses description.iter().zip(0..), name, obj, a_iter, b_iter and returns self.data.getitem_by_index(vm, i)) and change the expect_str() calls to the safe conversion and appropriate error handling.
🧹 Nitpick comments (10)
crates/stdlib/src/binascii.rs (1)
851-853: Optional: tighten thenew_binascii_errorwrapper signature to match the migrated APINow that
vm.new_exception_msgacceptsInto<Wtf8Buf>, the wrapper can takeimpl Into<Wtf8Buf>(or justWtf8Buf) directly, which lets every call site drop the unnecessary.to_owned()allocation on string literals.♻️ Proposed refactor
-fn new_binascii_error(msg: String, vm: &VirtualMachine) -> PyBaseExceptionRef { - vm.new_exception_msg(decl::error_type(vm), msg.into()) +fn new_binascii_error(msg: impl Into<rustpython_vm::common::wtf8::Wtf8Buf>, vm: &VirtualMachine) -> PyBaseExceptionRef { + vm.new_exception_msg(decl::error_type(vm), msg.into()) }Call sites can then pass
&strdirectly:- return Err(super::new_binascii_error("Odd-length string".to_owned(), vm)); + return Err(super::new_binascii_error("Odd-length string", vm));🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/binascii.rs` around lines 851 - 853, Change the signature of new_binascii_error to accept impl Into<Wtf8Buf> (or Wtf8Buf) instead of String and forward that into vm.new_exception_msg (which now accepts Into<Wtf8Buf>); update the function declaration new_binascii_error(msg: impl Into<Wtf8Buf>, vm: &VirtualMachine) -> PyBaseExceptionRef and call vm.new_exception_msg(decl::error_type(vm), msg) so call sites can pass &str without .to_owned().crates/vm/src/macros.rs (1)
191-199: Extract metavariable expansion outside theunsafeblock instead of suppressing the lint.
#[allow(clippy::macro_metavars_in_unsafe)]is a workaround for a lint whose intended fix is to bind the expression before theunsafeblock. Doing so keeps the safe-code/unsafe-code boundary explicit and removes the need for the suppression entirely:♻️ Proposed refactor
($as_ctx:expr, $name:ident) => {{ // Safety: All known identifiers are ascii strings. - #[allow(clippy::macro_metavars_in_unsafe)] - unsafe { - $crate::builtins::PyUtf8StrInterned::from_str_interned_unchecked( - $as_ctx.as_ref().names.$name - ) - } + let interned = $as_ctx.as_ref().names.$name; + unsafe { $crate::builtins::PyUtf8StrInterned::from_str_interned_unchecked(interned) } }};🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/macros.rs` around lines 191 - 199, The macro currently silences clippy::macro_metavars_in_unsafe; instead bind the metavariable expansion outside the unsafe block: evaluate and store the expression $as_ctx.as_ref().names.$name into a local let (e.g. names_val) before entering unsafe, then call $crate::builtins::PyUtf8StrInterned::from_str_interned_unchecked(names_val) inside the unsafe block and remove the #[allow(clippy::macro_metavars_in_unsafe)] attribute; this keeps the safe/unsafe boundary explicit for the macro arm that uses $as_ctx and $name.crates/vm/src/stdlib/posix.rs (1)
2374-2383: Unnecessary inner block inSysconfName::TryFromObject.The
{ let name = s.as_str(); ... }scope is redundant —sis in scope through the entireErrarm and the lifetime of the borrow doesn't require restriction.♻️ Proposed simplification
- { - let name = s.as_str(); - name.parse::<SysconfVar>().or_else(|_| { - if name == "SC_PAGESIZE" { - Ok(SysconfVar::SC_PAGESIZE) - } else { - Err(vm.new_value_error("unrecognized configuration name")) - } - })? as i32 - } + let name = s.as_str(); + name.parse::<SysconfVar>().or_else(|_| { + if name == "SC_PAGESIZE" { + Ok(SysconfVar::SC_PAGESIZE) + } else { + Err(vm.new_value_error("unrecognized configuration name")) + } + })? as i32🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/posix.rs` around lines 2374 - 2383, In SysconfName::TryFromObject, remove the unnecessary inner block and local shadowing of name; directly call s.as_str().parse::<SysconfVar>().or_else(|_| { if s.as_str() == "SC_PAGESIZE" { Ok(SysconfVar::SC_PAGESIZE) } else { Err(vm.new_value_error("unrecognized configuration name")) } })? as i32 so the borrow of s lasts naturally without the extra scope and the code is simplified while keeping the same logic for SysconfVar and the SC_PAGESIZE special-case.crates/stdlib/src/unicodedata.rs (1)
128-139:to_str()chain is correct; minor note on the error-message Display for surrogate inputs.When
name.to_str()returnsNone(lone-surrogate string), the chain falls through to theKeyErrorat line 134 — which is the correct Python behavior since every valid Unicode character name is ASCII. No correctness concern there.The
let_chainssyntax (lines 128–129) is fully supported: the project's MSRV is 1.93.0, which exceeds the Rust 1.88.0 requirement wherelet_chainsstabilized (June 2025), and the project uses the 2024 edition where the feature is available.One small point: the error message on line 136 formats
namevia itsDisplayimpl:format!("undefined character name '{name}'")If
namecontains lone surrogates,DisplayforPyStrRefwill likely substituteU+FFFDreplacement characters, so the error message won't faithfully reproduce the raw surrogate codepoints. CPython would surface the exact code units. This is well within acceptable range for an improvement, but if tighter CPython parity is desired down the line, the error path could usename.as_wtf8()with an escaped representation instead.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/unicodedata.rs` around lines 128 - 139, The error message currently formats the PyStrRef `name` via its Display implementation (format!("undefined character name '{name}'")) which may replace lone surrogates with U+FFFD; modify the KeyError formatting in the branch that returns vm.new_key_error so it emits an escaped representation preserving raw code units (e.g., use name.as_wtf8() and produce a diagnostic-escaped string) instead of relying on Display; update the call site that builds the message passed into vm.ctx.new_str(...) so the message shows exact code units for surrogate-containing inputs while leaving the rest of the logic (name.to_str(), unicode_names2::character, self.check_age) unchanged.crates/stdlib/src/ssl.rs (1)
3913-3921: Consider aligning setter arg type withPyUtf8StrRef.This would make the setter consistent with the updated call sites and remove the extra
try_into_utf8step.♻️ Suggested adjustment
- fn set_server_hostname( - &self, - value: Option<PyStrRef>, - vm: &VirtualMachine, - ) -> PyResult<()> { + fn set_server_hostname( + &self, + value: Option<PyUtf8StrRef>, + vm: &VirtualMachine, + ) -> PyResult<()> { // Check if handshake is already done if *self.handshake_done.lock() { return Err( vm.new_value_error("Cannot set server_hostname on socket after handshake") ); } // Validate hostname let hostname_string = value .map(|s| { - let utf8 = s.try_into_utf8(vm)?; - validate_hostname(utf8.as_str(), vm)?; - Ok::<String, _>(utf8.as_str().to_owned()) + validate_hostname(s.as_str(), vm)?; + Ok::<String, _>(s.as_str().to_owned()) }) .transpose()?;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/ssl.rs` around lines 3913 - 3921, The setter currently accepts an Option<PyObjectRef> (or similar) and calls try_into_utf8 inside the closure; change the setter signature to accept Option<PyUtf8StrRef> (or Option<&PyUtf8StrRef> as appropriate) so callers can pass already-validated UTF-8 refs, then remove the try_into_utf8 call in the closure and use the PyUtf8StrRef's .as_str() directly for validate_hostname and when creating the owned String; update uses around server_hostname.write() assignment to take the Option<PyUtf8StrRef> and set hostname_string = value.map(|s| { validate_hostname(s.as_str(), vm)?; Ok::<String, _>(s.as_str().to_owned()) }).transpose()? accordingly.crates/stdlib/src/socket.rs (1)
2911-2916: Simplify by acceptingPyUtf8StrRefdirectly (likesethostname/inet_aton/inet_pton).
gethostbyaddr,gethostbyname, andgethostbyname_exstill takePyStrRefand explicitly call.try_into_utf8(vm).map_err(IoOrPyException::from)?. SinceIoOrPyException: From<PyBaseExceptionRef>, the#[pyfunction]macro can propagate aTryFromObjecterror asIoOrPyException::Pyautomatically — the same end result as the current explicit conversion.♻️ Proposed simplification (shown for `gethostbyaddr`; apply identically to `gethostbyname` and `gethostbyname_ex`)
#[pyfunction] fn gethostbyaddr( - addr: PyStrRef, + addr: PyUtf8StrRef, vm: &VirtualMachine, ) -> Result<(String, PyListRef, PyListRef), IoOrPyException> { - let addr = addr.try_into_utf8(vm).map_err(IoOrPyException::from)?; let addr = get_addr(vm, addr, c::AF_UNSPEC)?;Also applies to: 2928-2930, 2941-2943
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/socket.rs` around lines 2911 - 2916, Change the functions gethostbyaddr, gethostbyname, and gethostbyname_ex to accept PyUtf8StrRef (like sethostname/inet_aton/inet_pton) instead of PyStrRef and remove the explicit .try_into_utf8(vm).map_err(IoOrPyException::from)? conversions; let the #[pyfunction] macro propagate TryFromObject errors as IoOrPyException::Py automatically, then use the received PyUtf8StrRef directly in the existing get_addr calls (e.g., replace the local addr conversion in gethostbyaddr and the analogous conversions in gethostbyname and gethostbyname_ex).crates/vm/src/stdlib/ctypes/array.rs (1)
4-5: Run rustfmt & clippy before merge.Please confirm
cargo fmtandcargo clippywere run for these Rust changes.
As per coding guidelines: "Follow the default rustfmt code style by running cargo fmt to format Rust code" and "Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints that are introduced by your changes."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/ctypes/array.rs` around lines 4 - 5, Run rustfmt and clippy for the VM crate and fix any formatting or lints introduced by these changes: run cargo fmt for crates/vm (ensure files like crates/vm/src/stdlib/ctypes/array.rs are formatted) and run cargo clippy --fix or cargo clippy to surface warnings; address all clippy suggestions and warnings (including any issues related to imports such as LazyLock and SaturatedSliceIter or unused/incorrect imports) before merging.crates/vm/src/stdlib/ctypes/union.rs (1)
620-624: Silent skip when field name isn't valid UTF-8 — verify this is intentional.If
downcast_ref::<PyUtf8Str>()fails (e.g., a field name contains surrogates), the positional argument is silently skipped without error. In practice,process_fields(line 220) already validates field names asPyUtf8Str, so this path shouldn't be reachable with malformed names. However, the structure.rs counterpart at line 751 has the identical pattern — so at least they're consistent.If you want extra safety, consider logging or returning an error when the downcast fails, rather than silently skipping.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/ctypes/union.rs` around lines 620 - 624, The code currently silently skips a field when tuple.first().downcast_ref::<PyUtf8Str>() fails (i.e., non-UTF-8 names); update the handling in the union field-processing block (the branch using PyTuple and downcast_ref::<PyUtf8Str>()) to treat a failed downcast as an explicit error instead of silently skipping: return or propagate a descriptive error (e.g., invalid field name encoding) from the surrounding function (see process_fields and the analogous handling in structure.rs) so callers get notified when a field name is not valid UTF-8.crates/vm/src/utils.rs (1)
1-2: Pre-merge hygiene: please run rustfmt + clippy for this refactor-heavy PR.These changes touch a lot of core codepaths; it’s worth ensuring no formatting drift and no new clippy warnings slipped in.
As per coding guidelines “Follow the default rustfmt code style by running
cargo fmt” and “Always run clippy (cargo clippy) before completing tasks and fix any warnings or lints that are introduced by your changes”.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/utils.rs` around lines 1 - 2, This PR introduced formatting and potential lints in core code; run rustfmt and clippy and fix any resulting issues: run cargo fmt (or rustfmt) to apply default formatting to crates/vm/src/utils.rs and the repo, then run cargo clippy --all-targets --all-features -- -D warnings to surface lints and address them (fix unused imports like Wtf8/Wtf8Buf if unused, correct style issues, replace unwraps/expect where appropriate, and resolve any borrow/ownership or performance warnings reported for functions in utils.rs). Ensure the file symbols (Wtf8, Wtf8Buf and any functions in utils.rs) compile cleanly with no clippy warnings before merging.crates/vm/src/builtins/memory.rs (1)
808-830: Good:memoryview.cast()now requires UTF‑8-safe format strings (PyUtf8StrRef).This change removes the previous “may panic on surrogates” footgun by ensuring
cast_to_1d()can safely useformat.as_str()and still keepBufferDescriptor.formatin sync.If you want, I can help draft a small regression test that passes a surrogate-containing format and asserts an exception (not a panic).
Also applies to: 957-963
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/memory.rs` around lines 808 - 830, Change calls that accept raw strings to use PyUtf8StrRef and then use format.as_str() so BufferDescriptor.format stays UTF‑8 safe: in the cast_to_1d implementation replace any use of a raw format string with a PyUtf8StrRef parameter, call Self::parse_format(format.as_str(), vm) and set BufferDescriptor.format to format.as_str().to_owned().into(); ensure the same pattern (use PyUtf8StrRef, parse_format(format.as_str(), vm), and assign format via format.as_str().to_owned().into()) is applied to the other memoryview cast routine(s) that construct BufferDescriptor (the other function that currently calls parse_format/uses format.as_str()/builds dim_desc), keeping the rest of the struct fields (buffer.clone(), AtomicCell::new(false), OnceCell::new(), etc.) unchanged.
ℹ️ Review info
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (120)
crates/common/src/str.rscrates/stdlib/src/_asyncio.rscrates/stdlib/src/_sqlite3.rscrates/stdlib/src/array.rscrates/stdlib/src/binascii.rscrates/stdlib/src/contextvars.rscrates/stdlib/src/csv.rscrates/stdlib/src/grp.rscrates/stdlib/src/hashlib.rscrates/stdlib/src/json.rscrates/stdlib/src/locale.rscrates/stdlib/src/lzma.rscrates/stdlib/src/multiprocessing.rscrates/stdlib/src/overlapped.rscrates/stdlib/src/posixshmem.rscrates/stdlib/src/pyexpat.rscrates/stdlib/src/pystruct.rscrates/stdlib/src/socket.rscrates/stdlib/src/ssl.rscrates/stdlib/src/unicodedata.rscrates/stdlib/src/zlib.rscrates/vm/src/buffer.rscrates/vm/src/builtins/bool.rscrates/vm/src/builtins/classmethod.rscrates/vm/src/builtins/code.rscrates/vm/src/builtins/complex.rscrates/vm/src/builtins/dict.rscrates/vm/src/builtins/float.rscrates/vm/src/builtins/function.rscrates/vm/src/builtins/function/jit.rscrates/vm/src/builtins/genericalias.rscrates/vm/src/builtins/int.rscrates/vm/src/builtins/interpolation.rscrates/vm/src/builtins/list.rscrates/vm/src/builtins/mappingproxy.rscrates/vm/src/builtins/memory.rscrates/vm/src/builtins/mod.rscrates/vm/src/builtins/module.rscrates/vm/src/builtins/namespace.rscrates/vm/src/builtins/object.rscrates/vm/src/builtins/range.rscrates/vm/src/builtins/set.rscrates/vm/src/builtins/slice.rscrates/vm/src/builtins/staticmethod.rscrates/vm/src/builtins/str.rscrates/vm/src/builtins/template.rscrates/vm/src/builtins/tuple.rscrates/vm/src/builtins/type.rscrates/vm/src/builtins/weakproxy.rscrates/vm/src/bytes_inner.rscrates/vm/src/codecs.rscrates/vm/src/coroutine.rscrates/vm/src/dict_inner.rscrates/vm/src/exception_group.rscrates/vm/src/exceptions.rscrates/vm/src/frame.rscrates/vm/src/function/buffer.rscrates/vm/src/function/fspath.rscrates/vm/src/function/mod.rscrates/vm/src/import.rscrates/vm/src/intern.rscrates/vm/src/macros.rscrates/vm/src/protocol/object.rscrates/vm/src/stdlib/_abc.rscrates/vm/src/stdlib/_winapi.rscrates/vm/src/stdlib/_wmi.rscrates/vm/src/stdlib/ast.rscrates/vm/src/stdlib/ast/basic.rscrates/vm/src/stdlib/ast/python.rscrates/vm/src/stdlib/ast/repr.rscrates/vm/src/stdlib/builtins.rscrates/vm/src/stdlib/codecs.rscrates/vm/src/stdlib/collections.rscrates/vm/src/stdlib/ctypes.rscrates/vm/src/stdlib/ctypes/array.rscrates/vm/src/stdlib/ctypes/base.rscrates/vm/src/stdlib/ctypes/function.rscrates/vm/src/stdlib/ctypes/pointer.rscrates/vm/src/stdlib/ctypes/simple.rscrates/vm/src/stdlib/ctypes/structure.rscrates/vm/src/stdlib/ctypes/union.rscrates/vm/src/stdlib/functools.rscrates/vm/src/stdlib/imp.rscrates/vm/src/stdlib/io.rscrates/vm/src/stdlib/itertools.rscrates/vm/src/stdlib/marshal.rscrates/vm/src/stdlib/msvcrt.rscrates/vm/src/stdlib/nt.rscrates/vm/src/stdlib/operator.rscrates/vm/src/stdlib/os.rscrates/vm/src/stdlib/posix.rscrates/vm/src/stdlib/signal.rscrates/vm/src/stdlib/sre.rscrates/vm/src/stdlib/symtable.rscrates/vm/src/stdlib/sys.rscrates/vm/src/stdlib/thread.rscrates/vm/src/stdlib/time.rscrates/vm/src/stdlib/typevar.rscrates/vm/src/stdlib/typing.rscrates/vm/src/stdlib/winreg.rscrates/vm/src/stdlib/winsound.rscrates/vm/src/suggestion.rscrates/vm/src/types/slot.rscrates/vm/src/types/structseq.rscrates/vm/src/utils.rscrates/vm/src/vm/context.rscrates/vm/src/vm/interpreter.rscrates/vm/src/vm/mod.rscrates/vm/src/vm/vm_new.rscrates/vm/src/warn.rscrates/wasm/src/browser_module.rscrates/wasm/src/convert.rscrates/wasm/src/js_module.rscrates/wasm/src/wasm_builtins.rscrates/wtf8/src/concat.rscrates/wtf8/src/lib.rsexample_projects/wasm32_without_js/rustpython-without-js/src/lib.rssrc/lib.rssrc/shell.rssrc/shell/helper.rs
| let response_format = match response_format { | ||
| Some(s) => FetchResponseFormat::from_str(vm, s.as_str())?, | ||
| Some(s) => FetchResponseFormat::from_str(vm, s.expect_str())?, | ||
| None => FetchResponseFormat::Text, | ||
| }; | ||
|
|
||
| let opts = web_sys::RequestInit::new(); | ||
|
|
||
| match method { | ||
| Some(s) => opts.set_method(s.as_str()), | ||
| None => opts.set_method("GET"), | ||
| }; | ||
| opts.set_method(method.map_or("GET", |s| s.expect_str())); | ||
|
|
||
| if let Some(body) = body { | ||
| opts.set_body(&convert::py_to_js(vm, body)); | ||
| } | ||
|
|
||
| let request = web_sys::Request::new_with_str_and_init(url.as_str(), &opts) | ||
| let request = web_sys::Request::new_with_str_and_init(url.expect_str(), &opts) | ||
| .map_err(|err| convert::js_py_typeerror(vm, err))?; | ||
|
|
||
| if let Some(headers) = headers { | ||
| let h = request.headers(); | ||
| for (key, value) in headers { | ||
| let key = key.str(vm)?; | ||
| let value = value.str(vm)?; | ||
| h.set(key.as_str(), value.as_str()) | ||
| h.set(key.expect_str(), value.expect_str()) | ||
| .map_err(|err| convert::js_py_typeerror(vm, err))?; | ||
| } | ||
| } | ||
|
|
||
| if let Some(content_type) = content_type { | ||
| request | ||
| .headers() | ||
| .set("Content-Type", content_type.as_str()) | ||
| .set("Content-Type", content_type.expect_str()) | ||
| .map_err(|err| convert::js_py_typeerror(vm, err))?; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, find the definition of expect_str
rg -n "fn expect_str" crates/ -A 5 -B 2Repository: RustPython/RustPython
Length of output: 584
🏁 Script executed:
# Also find to_str for comparison
rg -n "fn to_str" crates/ -A 5 -B 2 | head -100Repository: RustPython/RustPython
Length of output: 5179
🏁 Script executed:
# Check the PyStr and PyUtf8Str trait implementations
rg -n "impl.*PyStr|impl.*PyUtf8Str" crates/ -A 10 | head -150Repository: RustPython/RustPython
Length of output: 10174
🏁 Script executed:
# Look at the actual code in browser_module.rs at the mentioned lines
cat -n crates/wasm/src/browser_module.rs | sed -n '60,100p'Repository: RustPython/RustPython
Length of output: 1714
🏁 Script executed:
# Also check lines 168-216 and 227-245
cat -n crates/wasm/src/browser_module.rs | sed -n '160,220p'Repository: RustPython/RustPython
Length of output: 2175
🏁 Script executed:
cat -n crates/wasm/src/browser_module.rs | sed -n '220,250p'Repository: RustPython/RustPython
Length of output: 1564
Use to_str() or to_string_lossy() instead of expect_str() to prevent surrogate panics.
All uses of expect_str() at these call sites will panic if the Python string contains surrogates, aborting the WASM runtime instead of raising a Python error. These are user-input-facing operations (URLs, headers, selectors, attributes, module paths) where panics break proper error handling. Replace with to_str() + error propagation or to_string_lossy() to handle surrogates safely.
Locations: line 67 (response_format), 73 (method), 79 (URL), 87 (headers), 95 (content-type), 171 (CSS selector), 206 (element attribute get), 215 (element attribute set), 227 (module path), 244 (module name).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/wasm/src/browser_module.rs` around lines 66 - 96, Several call sites
(e.g., FetchResponseFormat::from_str(vm, s.expect_str()), opts.set_method(...
method.map_or(...)), url.expect_str(), header key/value handling via
key.str()/value.str(), and request.headers().set("Content-Type",
content_type.expect_str())) use expect_str() which can panic on
surrogate-containing Python strings; replace these with safe conversions (use
to_str() and propagate a Python/JS error or to_string_lossy() where lossy
fallback is acceptable) and ensure the error is converted to the existing error
path (e.g., map_err(|e| convert::js_py_typeerror(vm, e)) or return a VM error)
so WASM does not abort; apply the same change to the other mentioned locations
(CSS selector handling, element attribute get/set, module path/name) using the
same safe-to-string pattern and error propagation.
| move |_self: PyObjectRef, data: PyStrRef, vm: &VirtualMachine| -> PyResult<()> { | ||
| write_f(data.as_str(), vm) | ||
| write_f(data.expect_str(), vm) | ||
| }, |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find definitions of expect_str() and to_str() methods
rg -n "fn\s+(expect_str|to_str)" --type rust -C 5Repository: RustPython/RustPython
Length of output: 7601
🏁 Script executed:
#!/bin/bash
# Search for PyStrRef definition and its methods
fd -e rs | xargs rg -l "PyStrRef" | head -20Repository: RustPython/RustPython
Length of output: 627
🏁 Script executed:
#!/bin/bash
# Look for expect_str implementation details
rg -n "expect_str" --type rust -C 3 -g "*.rs"Repository: RustPython/RustPython
Length of output: 14863
🏁 Script executed:
#!/bin/bash
# Check the PR context - look at the file being changed
head -50 crates/wasm/src/wasm_builtins.rs
tail -20 crates/wasm/src/wasm_builtins.rsRepository: RustPython/RustPython
Length of output: 2138
Use to_string_lossy() instead of expect_str() to handle surrogates gracefully (Line 36).
expect_str() explicitly panics on surrogate-containing strings, which conflicts with this PR's objective. For WASM console output, use to_string_lossy() to replace surrogates with U+FFFD instead:
Suggested fix
- write_f(data.expect_str(), vm)
+ write_f(&data.to_string_lossy(), vm)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/wasm/src/wasm_builtins.rs` around lines 35 - 37, The closure handling
WASM console output currently calls data.expect_str() which panics on
surrogate-containing strings; replace that with data.to_string_lossy() to
convert surrogates to U+FFFD and pass the resulting string (or &str from the
Cow) into write_f instead. Update the move |_self: PyObjectRef, data: PyStrRef,
vm: &VirtualMachine| -> PyResult<()> { write_f(data.expect_str(), vm) } closure
to call data.to_string_lossy() and forward the lossy string to write_f so
surrogate-containing input is handled gracefully.
| Ok(ref s) => s.expect_str(), | ||
| Err(_) => "", |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's look at the actual code at lines 145-146 in src/shell.rs
head -150 src/shell.rs | tail -20Repository: RustPython/RustPython
Length of output: 616
🏁 Script executed:
# Find the definition of expect_str() to understand its actual behavior
rg -n "fn\s+expect_str|def\s+expect_str" --type rust -A 5Repository: RustPython/RustPython
Length of output: 482
🏁 Script executed:
# Also search for to_str() definition to compare
rg -n "fn\s+to_str" --type rust -A 3 | head -50Repository: RustPython/RustPython
Length of output: 2982
🏁 Script executed:
# Check what type 's' is in this context by looking at more surrounding code
sed -n '130,160p' src/shell.rsRepository: RustPython/RustPython
Length of output: 970
Panic reintroduced via expect_str() on user-controlled prompt
expect_str() calls .expect("PyStr contains surrogates") and will panic if the PyStr contains Unicode surrogates. Since sys.ps1 and sys.ps2 are user-controllable, this reintroduces the panic scenario the PR aims to eliminate. Use to_str().unwrap_or("") or to_string_lossy() instead.
Safer prompt extraction
- let prompt = match prompt {
- Ok(ref s) => s.expect_str(),
- Err(_) => "",
- };
+ let prompt = match prompt {
+ Ok(ref s) => s.to_str().unwrap_or(""),
+ Err(_) => "",
+ };📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Ok(ref s) => s.expect_str(), | |
| Err(_) => "", | |
| let prompt = match prompt { | |
| Ok(ref s) => s.to_str().unwrap_or(""), | |
| Err(_) => "", | |
| }; |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/shell.rs` around lines 145 - 146, The match arm uses PyStr::expect_str()
(Ok(ref s) => s.expect_str()), which can panic on user-controlled
sys.ps1/sys.ps2 containing Unicode surrogates; replace that call with a
non-panicking conversion such as s.to_str().unwrap_or("") or
s.to_string_lossy().as_ref() so the prompt extraction never panics—locate the
expect_str() usage in src/shell.rs and swap it for one of these safe
conversions.
| Some(( | ||
| startpos, | ||
| completions | ||
| .into_iter() | ||
| .map(|s| s.as_str().to_owned()) | ||
| .map(|s| s.expect_str().to_owned()) | ||
| .collect(), |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's understand the repository structure and locate relevant files
git ls-files | grep -E "\.(rs|toml)$" | head -20Repository: RustPython/RustPython
Length of output: 1359
🏁 Script executed:
# Find PyStrRef and its expect_str implementation
rg "fn expect_str" -g '*.rs' -A 5Repository: RustPython/RustPython
Length of output: 458
🏁 Script executed:
# Look at the complete_opt function in the file being reviewed
rg -n "fn complete_opt" -g '*.rs' -B 2 -A 50Repository: RustPython/RustPython
Length of output: 3145
🏁 Script executed:
# Search for PyStrRef definition
rg "struct PyStrRef|type PyStrRef" -g '*.rs' -B 2 -A 10Repository: RustPython/RustPython
Length of output: 741
🏁 Script executed:
# Check what to_str() does on PyStrRef (if it exists)
rg "fn to_str\|to_str()" -g '*.rs' -B 2 -A 5 | head -100Repository: RustPython/RustPython
Length of output: 47
🏁 Script executed:
# Search for to_str method on PyStr
rg "fn to_str" crates/vm/src/builtins/str.rs -B 2 -A 5Repository: RustPython/RustPython
Length of output: 419
🏁 Script executed:
# Look for all methods available on PyStr to understand the API
rg "impl.*PyStr|fn (as_|to_)" crates/vm/src/builtins/str.rs -B 1 -A 3 | head -150Repository: RustPython/RustPython
Length of output: 3118
🏁 Script executed:
# Check the actual type at line 113 - what does get_available_completions return
rg -n "get_available_completions" src/shell/helper.rs -B 5 -A 15Repository: RustPython/RustPython
Length of output: 1896
Avoid panic from expect_str() in completion mapping.
Line 143 calls expect_str() on completions that may contain surrogates, causing a panic with "PyStr contains surrogates." This contradicts the surrounding code's use of as_wtf8() for surrogate-safe sorting. Use a fallible conversion instead:
Suggested fix
- Some((
- startpos,
- completions
- .into_iter()
- .map(|s| s.expect_str().to_owned())
- .collect(),
- ))
+ Some((
+ startpos,
+ completions
+ .into_iter()
+ .filter_map(|s| s.to_str().map(str::to_owned))
+ .collect(),
+ ))This skips invalid UTF-8 entries, which is appropriate for shell completions.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Some(( | |
| startpos, | |
| completions | |
| .into_iter() | |
| .map(|s| s.as_str().to_owned()) | |
| .map(|s| s.expect_str().to_owned()) | |
| .collect(), | |
| Some(( | |
| startpos, | |
| completions | |
| .into_iter() | |
| .filter_map(|s| s.to_str().map(str::to_owned)) | |
| .collect(), | |
| )) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/shell/helper.rs` around lines 139 - 144, The current map calls
expect_str() on completion items which can panic ("PyStr contains surrogates");
instead make the conversion fallible and skip invalid entries: locate the
completion mapping where completions.into_iter().map(|s|
s.expect_str().to_owned()) is used and replace it with a filter_map that
attempts a surrogate-safe conversion (use the same as_wtf8()/fallible API
already used elsewhere) and .map(|s| s.to_owned()) only on success, so invalid
UTF-8/composite entries are skipped rather than panicking.
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
crates/vm/src/stdlib/nt.rs (1)
1230-1256:⚠️ Potential issue | 🟠 MajorReplace
expect_str()withto_str()for proper error handling to prevent panic on surrogate characters.
expect_str()panics if the string contains surrogates, and since PyStr supports WTF-8, surrogates are possible in user input. The argv and env conversions (lines 1232, 1255–1256) should handle this gracefully instead of panicking.Use
to_str().ok_or_else(...)to convertNoneto a proper error, following the pattern established throughout the codebase:Corrected fix
let argv = vm.extract_elements_with(argv.as_ref(), |obj| { let arg = PyStrRef::try_from_object(vm, obj)?; - make_widestring(arg.expect_str()) + let arg_str = arg.to_str() + .ok_or_else(|| vm.new_value_error("string contains surrogates"))?; + make_widestring(arg_str) })?;- let key_str = key.expect_str(); - let value_str = value.expect_str(); + let key_str = key.to_str() + .ok_or_else(|| vm.new_value_error("string contains surrogates"))?; + let value_str = value.to_str() + .ok_or_else(|| vm.new_value_error("string contains surrogates"))?;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/nt.rs` around lines 1230 - 1256, The code currently calls expect_str() on PyStrRef values when building argv and env (in the argv extraction map closure and in the env loop handling key/value), which can panic on surrogate-containing WTF-8 input; replace those expect_str() uses with to_str().ok_or_else(...) and return a VM error instead of panicking: for the argv mapping (where PyStrRef::try_from_object is used and the first element is checked) use arg.to_str().ok_or_else(|| vm.new_value_error("execve: argv first element contains invalid surrogates")) (or a similar descriptive vm error) and for the env loop (key and value) replace key.expect_str() and value.expect_str() with key.to_str().ok_or_else(|| vm.new_value_error("execve: env contains invalid surrogates")) and value.to_str().ok_or_else(...); keep the rest of the argv_execve/env_strings construction unchanged.crates/vm/src/stdlib/ast/python.rs (1)
129-231:⚠️ Potential issue | 🟠 MajorFix type inconsistency: use
Vec<PyUtf8StrRef>when buildingpayloadin__replace__.Lines 102 and 110 cast
fieldsandattributestoVec<PyStrRef>, but earlier in the same function (lines 131-138) they are cast toVec<PyUtf8StrRef>. Keys added viafield.as_object()andattr.as_object()at lines 105 and 113 will bePyStrobjects. However, line 128 attempts to downcast all payload keys toPyUtf8Str, which will fail if those keys are non-UTF8PyStrobjects, triggering the "keywords must be strings" error.Change lines 102 and 110 to use
Vec<PyUtf8StrRef>to ensure payload keys are consistently UTF-8 strings.Suggested fix
if let Some(fields) = fields.clone() { - let fields: Vec<PyStrRef> = fields.try_to_value(vm)?; + let fields: Vec<PyUtf8StrRef> = fields.try_to_value(vm)?; for field in fields { if let Some(value) = dict.get_item_opt::<Wtf8>(field.as_wtf8(), vm)? { payload.set_item(field.as_object(), value, vm)?; } } } if let Some(attributes) = attributes.clone() { - let attributes: Vec<PyStrRef> = attributes.try_to_value(vm)?; + let attributes: Vec<PyUtf8StrRef> = attributes.try_to_value(vm)?; for attr in attributes { if let Some(value) = dict.get_item_opt::<Wtf8>(attr.as_wtf8(), vm)? { payload.set_item(attr.as_object(), value, vm)?; } } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/ast/python.rs` around lines 129 - 231, The payload construction in __replace__ uses Vec<PyStrRef> for fields and attributes but earlier the code uses Vec<PyUtf8StrRef>, causing a mismatch when payload keys are later downcast to PyUtf8Str; change the casts that build payload (the conversions that create fields and attributes before calling dict.get_item_opt and payload.set_item) from Vec<PyStrRef> to Vec<PyUtf8StrRef> so keys inserted via field.as_object()/attr.as_object() are guaranteed UTF‑8 and will downcast successfully to PyUtf8Str in the final kwargs collection.crates/vm/src/frame.rs (1)
2635-2660:⚠️ Potential issue | 🟠 MajorUse safe error handling for PyUtf8Str keyword name downcast to avoid panics on malformed bytecode.
collect_keyword_argsusesdowncast_ref::<PyUtf8Str>().unwrap()on kwarg names, but the tuple containsPyStrobjects created byctx.new_str(), which can contain WTF-8 surrogates and are not guaranteed to be valid UTF-8. This will panic if keyword argument strings contain invalid UTF-8. In contrast,collect_ex_argscorrectly handles this withok_or_else()returning a proper error. Apply consistent error handling using either a safe conversion liketry_as_utf8()or return aPyResultto propagate errors.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 2635 - 2660, The current kwarg name mapping in the function that builds FuncArgs (the block that maps kwarg_names.as_slice().iter() and calls downcast_ref::<PyUtf8Str>().unwrap()) can panic on invalid UTF-8; change it to propagate a PyResult error like collect_ex_args does: perform a safe downcast (use downcast_ref::<PyUtf8Str>().ok_or_else(|| vm.new_type_error("keywords must be strings"))? or use try_as_utf8()/equivalent) and return a PyResult<FuncArgs> on failure, then call FuncArgs::with_kwargs_names only after successful conversion so malformed keyword names produce a Python TypeError instead of panicking. Ensure you reference the same symbols: kwarg_names, PyUtf8Str, FuncArgs::with_kwargs_names and the surrounding function signature to adjust the return type if needed.crates/stdlib/src/ssl.rs (1)
2019-2039:⚠️ Potential issue | 🟠 MajorAccept
password=Noneto match CPython behavior.
OptionalArg::Present(None)currently raises aTypeError, but Python allowspassword=None. This breaks compatibility and makesload_cert_chain(password=None)fail.🐛 Suggested fix
match password { OptionalArg::Present(p) => { + if vm.is_none(p) { + return Ok((None, None)); + } // Try string first if let Ok(pwd_str) = PyUtf8StrRef::try_from_object(vm, p.clone()) { Ok((Some(pwd_str.as_str().to_owned()), None)) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/ssl.rs` around lines 2019 - 2039, The match on password currently treats OptionalArg::Present(None) as invalid; change the handling in the password match (in the load_cert_chain/password parsing block) to accept OptionalArg::Present(None) and return Ok((None, None)) (i.e., treat explicit None as no password), while preserving existing branches for Present(Some(obj)) that try string/bytes/callable and for OptionalArg::Missing; update the match arm that inspects OptionalArg::Present to first check for p.is_none() (or match OptionalArg::Present(None)) and handle it accordingly so load_cert_chain/password behavior matches CPython.
♻️ Duplicate comments (17)
crates/vm/src/stdlib/msvcrt.rs (2)
83-90:⚠️ Potential issue | 🟠 MajorAvoid
expect_str()to prevent surrogate-triggered panics inputwch.
PyStrRefcan contain surrogates;expect_str()can panic and crash the VM. Use WTF‑8 code points instead.✅ Suggested surrogate-safe change
- let c = s - .expect_str() - .chars() - .exactly_one() - .map_err(|_| vm.new_type_error("putch() argument must be a string of length 1"))?; - unsafe { suppress_iph!(_putwch(c as u16)) }; + let c = s + .as_wtf8() + .code_points() + .exactly_one() + .map_err(|_| vm.new_type_error("putch() argument must be a string of length 1"))?; + unsafe { suppress_iph!(_putwch(c.to_u32() as u16)) };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/msvcrt.rs` around lines 83 - 90, The putwch function uses PyStrRef::expect_str(), which can panic on surrogate-containing strings and crash the VM; update putwch to extract the string via a surrogate-safe WTF‑8/UTF‑8 accessor (do not call expect_str()), iterate or decode WTF‑8 code points to ensure exactly one Unicode scalar or code unit as intended, and map errors to vm.new_type_error("putch() argument must be a string of length 1") instead of panicking; locate the logic in putwch and replace the expect_str()/chars().exactly_one() path with a surrogate-safe WTF‑8 check/iterator that returns a controlled PyResult error when the length is not one, then call _putwch with the correct u16 value.
108-114:⚠️ Potential issue | 🟠 MajorAvoid
expect_str()to prevent surrogate-triggered panics inungetwch.Same issue as
putwch:expect_str()can panic on surrogate-containing strings.✅ Suggested surrogate-safe change
- let c = - s.expect_str().chars().exactly_one().map_err(|_| { - vm.new_type_error("ungetwch() argument must be a string of length 1") - })?; - let ret = unsafe { suppress_iph!(_ungetwch(c as u32)) }; + let c = s + .as_wtf8() + .code_points() + .exactly_one() + .map_err(|_| vm.new_type_error("ungetwch() argument must be a string of length 1"))?; + let ret = unsafe { suppress_iph!(_ungetwch(c.to_u32())) };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/msvcrt.rs` around lines 108 - 114, In ungetwch, avoid calling s.expect_str() (which can panic on surrogate-containing strings); instead obtain the Python string without expect and iterate its characters safely (as in the putwch fix) so you call .chars().exactly_one().map_err(...) on the surrogate-safe string view, then pass the resulting code point to the unsafe suppress_iph!(_ungetwch(c as u32)) call; update the code around ungetwch and the character-extraction logic to mirror the surrogate-safe approach used for putwch.crates/vm/src/stdlib/_wmi.rs (1)
563-565:⚠️ Potential issue | 🔴 Critical
expect_str()still panics on surrogate-containing strings — PR objective unmetReplacing
as_str()withexpect_str()is a rename-only change;expect_str()still calls.expect(…)and will panic on anyPyStrthat holds lone surrogates. This file needs to complete the migration toto_str()with proper error propagation, which is exactly what the rest of the PR does elsewhere.🐛 Proposed fix
- let query_str = query.expect_str(); + let query_str = query.to_str().map_err(|e| vm.new_value_error(e.to_string()))?;
query_stris only used for the ASCII prefix check (lines 566-569) and thewide_str(query_str)call (line 573), both of which accept&str, so no further changes are needed downstream.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/_wmi.rs` around lines 563 - 565, The code still calls expect_str() which can panic on surrogate-containing PyStr; change exec_query to call query.to_str(vm)? (propagating the PyErr) and bind query_str as &str for the ASCII prefix check and wide_str(query_str) call so errors are returned instead of panicking; update the usage of query_str accordingly in exec_query.crates/wasm/src/browser_module.rs (1)
66-68:⚠️ Potential issue | 🔴 CriticalReplace
expect_str()with fallible conversion to avoid WASM panics.
expect_str()will panic on surrogate-containing Python strings; these inputs are user-controlled (URLs, headers, selectors, module names) and will abort the WASM runtime instead of raising a Python error. Please switch toto_str()(propagate a Python error) orto_string_lossy()where a lossy fallback is acceptable, and map errors through the existingconvert::js_py_typeerror/py_err_to_js_errpaths.#!/bin/bash # Inspect PyStrRef conversion APIs and error types to wire correct fallible conversion paths. rg -n "fn expect_str|fn to_str|fn to_string_lossy" crates -g '*.rs' -C3 rg -n "impl .*PyStrRef|impl .*PyStr" crates -g '*.rs' -C3Also applies to: 73-73, 79-80, 87-88, 95-96, 171-172, 206-208, 215-216, 227-228, 244-245
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/wasm/src/browser_module.rs` around lines 66 - 68, Replace all uses of expect_str() that can panic (e.g., in the response_format match where FetchResponseFormat::from_str(vm, s.expect_str()) is called) with a fallible conversion and propagate a Python error: call s.to_str() (or s.to_string_lossy() if a lossy fallback is acceptable) and map the Err through the existing conversion helpers (convert::js_py_typeerror or py_err_to_js_err) before passing the resulting &str/String into FetchResponseFormat::from_str and other callers; apply the same change to the other occurrences flagged (headers, url, selector, module-name conversions) so no user-controlled PyStrRef can trigger a WASM panic.crates/vm/src/stdlib/functools.rs (1)
480-492: Lossyformat!on&Wtf8reference still present — surrogates replaced viaDisplay.The
format!("{module_name}.{qualname_str}")at line 486 (and subsequentlyWtf8Buf::from(format!("{qualified_name}("))at line 492) triggers theDisplayimplementation forWtf8, which substitutes unpaired surrogates with U+FFFD. This was flagged in a previous review and the proposed fix (build the prefix directly withWtf8Buf::push_wtf8/push_str) was not applied.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/functools.rs` around lines 480 - 492, The code uses format! on Wtf8 values (format!("{module_name}.{qualname_str}") and format!("{qualified_name}(")) which invokes Display and loses unpaired surrogates; instead construct the strings with Wtf8-aware APIs: avoid format! and build a Wtf8Buf directly (use Wtf8Buf::new / Wtf8Buf::from and call push_wtf8 or push_str) to append module_name, a literal ".", qualname_str, and finally "(" so surrogates are preserved; update the qualified_name creation and the result initialization (the variables/match arm around module.downcast::<crate::builtins::PyStr>(), module_name, qualname_str, qualified_name, and result) to use these push_* methods rather than format!.crates/vm/src/intern.rs (1)
160-168:as_str()can still panic on interned strings containing surrogates.
StringPool::internacceptsWtf8Bufand&Wtf8(lines 292–305), so interned strings aren't guaranteed to be valid UTF-8. Theunwrap_or_else(|| panic!(...))reintroduces exactly the kind of panic this PR aims to eliminate.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/intern.rs` around lines 160 - 168, The current as_str() on PyInterned<PyStr> calls self.inner.to_str().unwrap_or_else(...) which can still panic for WTF-8 data with surrogates; replace this panic-prone API by returning a non-panicking string view (e.g., change as_str() to return Cow<'_, str> and use self.inner.to_string_lossy() instead of to_str().unwrap_or_else(...)), or alternatively add a new as_wtf8() that returns &Wtf8 and keep a lossy as_str_lossy() that returns Cow<'_, str>; update uses of as_str accordingly to avoid any unwrap/panic on invalid UTF-8.crates/vm/src/stdlib/_winapi.rs (3)
298-302:⚠️ Potential issue | 🔴 Critical
expect_str()panics on surrogate-containing strings — contradicts PR objective.The
wstrclosure callss.expect_str()which panics when thePyStrcontains surrogates. On Windows, file paths and command lines can legitimately contain unpaired surrogates (e.g., filenames created via UTF-16 APIs). This reintroduces the exact class of panics the PR aims to eliminate.Consider using
s.as_wtf8().to_wide_with_nul()directly (bypassing theWideCString::from_strpath) since the Windows APIs accept UTF-16 including surrogates, or uses.to_str()with proper error propagation.Suggested approach using WTF-8 → wide conversion
let wstr = |s: PyStrRef| { - let ws = widestring::WideCString::from_str(s.expect_str()) - .map_err(|err| err.to_pyexception(vm))?; - Ok(ws.into_vec_with_nul()) + Ok(s.as_wtf8().to_wide_with_nul()) };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/_winapi.rs` around lines 298 - 302, The wstr closure currently calls s.expect_str(), which can panic on PyStrRef values containing surrogate code points; replace that path with a surrogate-safe conversion by using s.as_wtf8().to_wide_with_nul() (or an equivalent WTF-8 → UTF-16 conversion) and avoid WideCString::from_str/from_str error branches that assume valid UTF-8; update the closure (wstr) to take PyStrRef, call as_wtf8(), convert to a wide Vec<u16> including NUL with to_wide_with_nul(), and propagate any conversion errors via map_err to the vm rather than using expect_str().
399-400:⚠️ Potential issue | 🔴 Critical
expect_str()inCreateJunctionwill panic on surrogate paths.Same issue as in the
wstrclosure. Windows junction paths can contain surrogates. Useto_str()with error handling or convert via WTF-8 → wide string.Suggested fix using to_str()
- let src_path = std::path::Path::new(src_path.expect_str()); - let dest_path = std::path::Path::new(dest_path.expect_str()); + let src_str = src_path.to_str().ok_or_else(|| { + vm.new_value_error("src_path contains surrogates") + })?; + let dest_str = dest_path.to_str().ok_or_else(|| { + vm.new_value_error("dest_path contains surrogates") + })?; + let src_path = std::path::Path::new(src_str); + let dest_path = std::path::Path::new(dest_str);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/_winapi.rs` around lines 399 - 400, The CreateJunction implementation currently calls expect_str() when building src_path and dest_path (and similarly uses the wstr closure), which will panic on Windows paths containing UTF-16 surrogates; change these to avoid expect_str(): call to_str() and handle the None case (return a proper error) or convert the OsStr via a WTF-8 → wide-string path conversion before passing to Windows APIs, updating the CreateJunction function and the wstr helper to accept and handle surrogate-containing paths safely instead of unwrapping.
419-430:⚠️ Potential issue | 🔴 Critical
expect_str()ingetenvironmentwill panic on surrogate env vars.Windows environment variables can contain unpaired surrogates. Both
k.expect_str()andv.expect_str()at lines 421 and 423 will panic in that case.Additionally, line 424 uses
k.contains('\0')andv.contains('\0')on&str— this is correct only ifexpect_str()succeeds, but the code will panic before reaching the null check if surrogates are present.Consider converting to wide strings via WTF-8 directly rather than going through
&str.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/_winapi.rs` around lines 419 - 430, The code in getenvironment uses PyStrRef::expect_str() for k and v which will panic on unpaired surrogates; replace the expect_str() calls and subsequent &str operations with a WTF-8/raw wide-string path (e.g. obtain the WTF-8/wide representation from the PyStrRef returned by PyStrRef::try_from_object instead of calling expect_str()), perform the null byte check against the raw/wtf8 bytes, validate the name emptiness and '=' rule using the WTF-8-aware view (checking the first scalar/code-unit appropriately rather than slicing &str), and then perform the uppercase/normalization step on a Unicode-safe representation (or normalize to wide/uppercase after conversion) so no unpaired-surrogate panics can occur; update references to expect_str(), the temporary k/v &str usage, and key_upper creation to use the new WTF-8/wide-string handling.crates/vm/src/types/slot.rs (1)
1715-1724:⚠️ Potential issue | 🟠 MajorBreak the default repr recursion to avoid stack overflow.
repr_wtf8andrepr_strstill call each other when neither is overridden.Suggested fix (one example)
- fn repr_str(zelf: &Py<Self>, vm: &VirtualMachine) -> PyResult<String> { - Self::repr_wtf8(zelf, vm).map(|w| w.to_string()) - } + fn repr_str(_zelf: &Py<Self>, vm: &VirtualMachine) -> PyResult<String> { + Err(vm.new_type_error("repr_str or repr_wtf8 must be implemented")) + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/types/slot.rs` around lines 1715 - 1724, The two methods repr_wtf8 and repr_str mutually call each other when neither is overridden, causing infinite recursion; break this by giving repr_wtf8 a non-recursive default implementation that builds a fallback representation (e.g., using the object's class/type name and address/ID) instead of calling repr_str; keep repr_str converting the Wtf8Buf to String, and ensure repr (which currently calls repr_wtf8) still uses that fallback when no override exists so the recursion is resolved (update the repr_wtf8 implementation and leave repr_str as the simple .to_string() conversion).crates/vm/src/stdlib/builtins.rs (1)
566-575:⚠️ Potential issue | 🟠 MajorGuard
expect_str()to avoid surrogate panics.
This still panics on non‑UTF‑8 prompts; switch totry_into_utf8(vm)and fall back to the std‑I/O path when conversion fails.🛠️ Suggested safe prompt conversion
- if use_rustyline { - let prompt = prompt.as_ref().map_or("", |s| s.expect_str()); + let (prompt_utf8, use_rustyline) = match prompt.as_ref() { + Some(s) => { + let prompt_utf8 = s.try_into_utf8(vm).ok(); + (prompt_utf8, use_rustyline && prompt_utf8.is_some()) + } + None => (None, use_rustyline), + }; + if use_rustyline { + let prompt = prompt_utf8.as_ref().map_or("", |s| s.as_str()); let mut readline = Readline::new(()); match readline.readline(prompt) {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/builtins.rs` around lines 566 - 575, The code unconditionally calls expect_str() to build the prompt before using Readline (inside the use_rustyline branch), which panics on non‑UTF‑8 prompts; replace that call with a safe conversion using try_into_utf8(vm) (or equivalent) and if the conversion returns an Err, skip the rustyline branch and fall back to the normal std‑IO path. Specifically, change the prompt extraction around expect_str() so the prompt is attempted via try_into_utf8(vm) and only proceed to create Readline and call readline.readline(prompt) when conversion succeeds; otherwise treat use_rustyline as false (or jump to the existing std I/O handling).crates/vm/src/vm/mod.rs (1)
639-660:⚠️ Potential issue | 🟡 MinorAdd a separator after
msgin unraisable output.
Without it, output concatenates; same issue as the earlier review.✏️ Suggested tweak
- let msg_str = if let Some(msg) = msg { - format!("Exception ignored {}", msg) + let msg_str = if let Some(msg) = msg { + format!("Exception ignored {msg}: ") } else { "Exception ignored in: ".to_owned() };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/vm/mod.rs` around lines 639 - 660, The unraisable output concatenates object repr because the Some(msg) branch of msg_str lacks a separator; update the construction of msg_str (where msg_str is built from msg in the let msg_str = if let Some(msg) { ... } else { ... } block) to append a separator (for example ": " or " in: ") after the interpolated msg so it matches the None branch's separator before the subsequent write_to_stderr and repr_result/repr_wtf8 output; ensure the change affects the same msg_str/ msg handling before write_to_stderr is called.crates/vm/src/stdlib/ctypes/function.rs (2)
753-759:⚠️ Potential issue | 🟠 MajorAvoid panic if
_type_is mutated to non‑UTF‑8.
to_str().expect(...)can panic if Python mutates_type_to a non‑UTF‑8 value; treat invalid UTF‑8 as a non‑match instead.🛠️ Suggested fix
- let c = s - .to_str() - .expect("_type_ is validated as ASCII at type creation"); - if c.len() == 1 && "sPzUZXO".contains(c) { - return true; - } + if let Some(c) = s.to_str() { + if c.len() == 1 && "sPzUZXO".contains(c) { + return true; + } + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/ctypes/function.rs` around lines 753 - 759, The code currently calls s.to_str().expect(...) which can panic if the Python-side `_type_` string contains invalid UTF‑8; change the logic to safely handle invalid UTF‑8 by attempting to convert with to_str() and only proceeding when it returns Ok(&str) (treat Err as a non‑match), so replace the expect-based conversion in the block that reads ctype.get_attr("_type_", vm) and downcasts to PyStr with a guarded to_str() check before using c (referencing the PyStr value and the existing conditional that checks c.len() == 1 && "sPzUZXO".contains(c)).
282-289:⚠️ Potential issue | 🟠 MajorUse platform‑aware wchar_t conversion (UTF‑16 here breaks non‑Windows).
encode_wide()yields UTF‑16; on Unix,libc::wchar_tis typically 32‑bit, so this path will mis-encode wide strings.🛠️ Suggested fix
- // Convert to null-terminated UTF-16, preserving lone surrogates - let wide: Vec<u16> = s - .as_wtf8() - .encode_wide() - .chain(core::iter::once(0)) - .collect(); - let wide_bytes: Vec<u8> = wide.iter().flat_map(|&x| x.to_ne_bytes()).collect(); - let keep = vm.ctx.new_bytes(wide_bytes); + // Convert to platform wchar_t bytes (UTF-16 on Windows, UTF-32 on Unix) + let wide_bytes = super::base::str_to_wchar_bytes(s.as_wtf8()); + let keep = vm.ctx.new_bytes(wide_bytes);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/ctypes/function.rs` around lines 282 - 289, The current branch in the PyStr handling uses encode_wide() and treats wchar_t as 16-bit which breaks on platforms where libc::wchar_t is 32-bit; change the conversion in the value.downcast_ref::<PyStr>() branch (symbols: s, wide, wide_bytes) to be platform-aware: check std::mem::size_of::<libc::wchar_t>() (or cfg(target_os = "windows")) and if it's 2 bytes keep the existing UTF‑16 flow (s.as_wtf8().encode_wide() + 0u16), otherwise produce UTF‑32 code units by iterating the WTF-8 string into Unicode scalar values (e.g. s.as_wtf8().chars() or equivalent that preserves lone surrogates semantics), append a 0u32 terminator, and then convert those u32 units to native-endian bytes for wide_bytes so the produced buffer matches the platform's wchar_t size.crates/vm/src/builtins/interpolation.rs (1)
205-223:⚠️ Potential issue | 🟠 MajorAvoid
push_stronPyStrreprs to prevent surrogate panics.
expr_repr/spec_reprarePyStrvalues;push_str(&expr_repr)routes throughAsRef<str>and can panic on WTF‑8 surrogates. Usepush_wtf8(..)like the other parts.🐛 Proposed fix
- result.push_str(&expr_repr); + result.push_wtf8(expr_repr.as_wtf8()); result.push_str(", "); if vm.is_none(&zelf.conversion) { result.push_str("None"); } else { result.push_wtf8(zelf.conversion.repr(vm)?.as_wtf8()); } result.push_str(", "); - result.push_str(&spec_repr); + result.push_wtf8(spec_repr.as_wtf8());🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/interpolation.rs` around lines 205 - 223, In repr_wtf8 replace the uses of push_str(&expr_repr) and push_str(&spec_repr) — which call AsRef<str> and can panic on WTF‑8 surrogates — with push_wtf8 using the PyStr/WTF‑8 backing (e.g. call result.push_wtf8(expr_repr.as_wtf8()) and result.push_wtf8(spec_repr.as_wtf8())); keep the other push_wtf8 usages and the conversion handling in the repr_wtf8 function unchanged.crates/vm/src/stdlib/ctypes/base.rs (1)
366-373: Clarify the_type_ASCII validation scope in the comment.
Theto_str().expect(...)assumption is safe only for the simple-type_type_path; consider noting that explicitly so the invariant is clear to future readers.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/ctypes/base.rs` around lines 366 - 373, Clarify that the ASCII validation referenced by the expect() call applies only to the simple-type `_type_` path: update the comment near the field_type.get_attr("_type_", vm) / downcast_ref::<PyStr>() branch to explicitly state that `_type_` was validated as ASCII at type creation for simple types so that the to_str().expect("_type_ is validated as ASCII at type creation") invariant is clearly scoped to this code path (the simple-type `_type_` attribute handling in this block).crates/vm/src/builtins/str.rs (1)
1363-1390:⚠️ Potential issue | 🟠 MajorDo not swallow non-
KeyErrorexceptions intranslate.
Errors other thanKeyErrorfromtable.__getitem__should propagate; treating them as “missing mapping” can mask failures.🐛 Suggested fix
- match table.get_item(&*cp.to_u32().to_pyobject(vm), vm) { + match table.get_item(&*cp.to_u32().to_pyobject(vm), vm) { Ok(value) => { if let Some(text) = value.downcast_ref::<Self>() { translated.push_wtf8(text.as_wtf8()); } else if let Some(bigint) = value.downcast_ref::<PyInt>() { let mapped = bigint .as_bigint() .to_u32() .and_then(CodePoint::from_u32) .ok_or_else(|| { vm.new_value_error("character mapping must be in range(0x110000)") })?; translated.push(mapped); } else if !vm.is_none(&value) { return Err( vm.new_type_error("character mapping must return integer, None or str") ); } } - _ => translated.push(cp), + Err(e) if e.fast_isinstance(vm.ctx.exceptions.key_error) => { + translated.push(cp) + } + Err(e) => return Err(e), }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/str.rs` around lines 1363 - 1390, The translate method currently treats any Err from table.get_item as a missing mapping; change the Err arm to capture the exception (e.g., Err(err)) and only swallow it if it is a KeyError, otherwise propagate it by returning Err(err). Use the VM helper to test the exception type (e.g., vm.is_instance(&err, &vm.ctx.exceptions.key_error) or equivalent) so non-KeyError exceptions from table.__getitem__ are returned to the caller instead of being treated as "no mapping".
🧹 Nitpick comments (9)
crates/stdlib/src/zlib.rs (1)
468-471: Optional: collapse the two-stepinto()chain by widening the parameter bound.The intermediate
Stringbinding exists solely to bridgeimpl Into<String>→Wtf8Buf. Since every call site already passes&strorString— both of which implementInto<Wtf8Buf>after this PR's additions — the signature can be updated directly, eliminating the extra allocation on already-Stringinputs.♻️ Proposed simplification
- fn new_zlib_error(message: impl Into<String>, vm: &VirtualMachine) -> PyBaseExceptionRef { - let msg: String = message.into(); - vm.new_exception_msg(vm.class("zlib", "error"), msg.into()) + fn new_zlib_error(message: impl Into<Wtf8Buf>, vm: &VirtualMachine) -> PyBaseExceptionRef { + vm.new_exception_msg(vm.class("zlib", "error"), message.into()) }
Wtf8Bufwould also need to be brought into scope at the top of the innerzlibmodule:use rustpython_vm::common::wtf8::Wtf8Buf; // adjust path as needed🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/zlib.rs` around lines 468 - 471, Change new_zlib_error to accept message: impl Into<Wtf8Buf> (instead of impl Into<String>), bring Wtf8Buf into scope in the inner zlib module, and pass message.into() directly to vm.new_exception_msg to avoid the intermediate String allocation; update the signature and the call site in fn new_zlib_error accordingly so the function uses Wtf8Buf conversion once.crates/vm/src/stdlib/time.rs (1)
103-104: Considers.as_wtf8().as_bytes()for API consistency
s.as_bytes()is correct here (ASCII is identical under WTF-8 and UTF-8), but the rest of this PR consistently usesas_wtf8()to make the WTF-8 nature explicit. Using the same pattern here keeps the migration surface uniform.🔧 Suggested change
- && s.as_bytes() == b"negative duration" + && s.as_wtf8().as_bytes() == b"negative duration"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/time.rs` around lines 103 - 104, Replace the UTF-8 byte conversion call to maintain consistency with the rest of the PR: in the conditional that extracts the first argument string from e.args() (the closure using arg.str(vm).ok()), change s.as_bytes() to s.as_wtf8().as_bytes() so the WTF-8 representation is used consistently; keep the same comparison to b"negative duration".crates/vm/src/stdlib/sre.rs (2)
856-856: Prefer.expect()over.unwrap()to document the invariant.
get_slice(0, …)returnsNoneonly whenregs[0]contains a negative span. Sinceregs[0]is always set from ausizeinMatch::new, theunwrap()is safe, but the invariant is silent. Usingexpect()documents it clearly.♻️ Suggested change
- let match_repr = zelf.get_slice(0, str_drive, vm).unwrap().repr(vm)?; + let match_repr = zelf + .get_slice(0, str_drive, vm) + .expect("group 0 always has a valid span") + .repr(vm)?;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/sre.rs` at line 856, Replace the silent unwrap on zelf.get_slice(0, str_drive, vm).unwrap() with expect() to document the invariant: change to zelf.get_slice(0, str_drive, vm).expect("regs[0] is non-negative by construction in Match::new").repr(vm)?; this makes it explicit that get_slice can only be None for a negative span and that regs[0] is always set from a usize in Match::new.
857-860:Wtf8Buf::from(format!(…))allocates an intermediateString.Since only ASCII literals and integers are used, the intermediate
Stringcan be avoided by pushing pieces directly—matching the style used elsewhere in this PR (e.g.mappingproxy.rs,template.rs).♻️ Suggested change
- let mut result = Wtf8Buf::from(format!( - "<re.Match object; span=({}, {}), match=", - zelf.regs[0].0, zelf.regs[0].1, - )); + let mut result = Wtf8Buf::from("<re.Match object; span=("); + result.push_str(&zelf.regs[0].0.to_string()); + result.push_str(", "); + result.push_str(&zelf.regs[0].1.to_string()); + result.push_str("), match=");🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/sre.rs` around lines 857 - 860, Replace the allocation from Wtf8Buf::from(format!(...)) by building the buffer in pieces: create result with Wtf8Buf::from("<re.Match object; span=(") and then append the integer bounds and punctuation directly (use result.push_str(...) / result.push(...) or the same push helpers used elsewhere in mappingproxy.rs/template.rs) for zelf.regs[0].0, ',', ' ', zelf.regs[0].1, '), match=' to avoid the intermediate String allocation; update the code around the Wtf8Buf::from(format!(…)) call and the variable result accordingly.crates/vm/src/exception_group.rs (2)
217-228: Minor:ArgIterableis heavier than a direct tuple downcast for a known-tuple slot.The constructor guarantees arg(1) is always a
PyTuple, yetslot_reprgoes throughArgIterable::try_from_object, which dispatches Python's iterator protocol. A directdowncast_ref::<PyTuple>()(mirroring the pattern in__str__at line 191 and inget_exceptions_tuple) avoids the extra allocation and dispatch:♻️ Direct tuple downcast in slot_repr
- if let Some(exceptions_obj) = zelf.get_arg(1) { - let iter: ArgIterable<PyObjectRef> = - ArgIterable::try_from_object(vm, exceptions_obj.clone())?; - let mut first = true; - for exc in iter.iter(vm)? { - if !first { - result.push_str(", "); - } - first = false; - result.push_wtf8(exc?.repr(vm)?.as_wtf8()); - } - } + if let Some(exceptions_obj) = zelf.get_arg(1) { + if let Some(tuple) = exceptions_obj.downcast_ref::<PyTuple>() { + let mut first = true; + for exc in tuple.iter() { + if !first { + result.push_str(", "); + } + first = false; + result.push_wtf8(exc.repr(vm)?.as_wtf8()); + } + } + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/exception_group.rs` around lines 217 - 228, In slot_repr, avoid using ArgIterable::try_from_object for the known-tuple arg(1); instead downcast the argument to a PyTuple directly (use zelf.get_arg(1) and downcast_ref::<PyTuple>()) and iterate its elements, mirroring the pattern used in __str__ and get_exceptions_tuple to eliminate iterator protocol dispatch and extra allocation; update the code paths around zelf, get_arg, and the for-loop that builds result so it uses the PyTuple iteration and repr calls on each element.
196-201:write!+.unwrap()is idiomatic but note the implicit assumption.
Wtf8Buf::write_strmust be infallible (as isString's) for the two.unwrap()calls here and on line 213 to never panic. If thefmt::Writeimpl forWtf8Bufcan ever returnErr(e.g., due to an encoding edge case when non-UTF-8 format arguments are passed), these become silent panics. Consider using.expect("Wtf8Buf write is infallible")to make the assumption explicit, or verify the impl is truly infallible.💡 Explicit infallibility annotation
- write!(result, " ({num_excs} sub-exception{suffix})").unwrap(); + write!(result, " ({num_excs} sub-exception{suffix})").expect("Wtf8Buf write is infallible");And similarly on line 213:
- write!(result, "{class_name}(").unwrap(); + write!(result, "{class_name}(").expect("Wtf8Buf write is infallible");🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/exception_group.rs` around lines 196 - 201, The current use of write!(result, ...) followed by .unwrap() assumes Wtf8Buf::write_str is infallible; replace those .unwrap() calls with .expect("Wtf8Buf write is infallible") (or otherwise assert/handle the error) for the write! invocation that appends " ({num_excs} sub-exception{suffix})" and the similar write! call later (the other occurrence around line 213) so the assumption is explicit; locate the write! usages operating on the Wtf8Buf `result` and change their error handling to .expect(...) or add an explicit check that the fmt::Write impl cannot return Err.crates/vm/src/builtins/bool.rs (1)
104-109: Return typeStringis inconsistent with peer__format__implementations.
int.__format__,float.__format__, andcomplex.__format__all returnPyResult<Wtf8Buf>, butbool.__format__returnsPyResult<String>. While functionally correct (bool formatting only produces ASCII), aligning the return type would be more consistent.♻️ Optional: align return type with siblings
- fn __format__(obj: PyObjectRef, spec: PyUtf8StrRef, vm: &VirtualMachine) -> PyResult<String> { + fn __format__(obj: PyObjectRef, spec: PyUtf8StrRef, vm: &VirtualMachine) -> PyResult<Wtf8Buf> { let new_bool = obj.try_to_bool(vm)?; FormatSpec::parse(spec.as_str()) .and_then(|format_spec| format_spec.format_bool(new_bool)) + .map(Wtf8Buf::from_string) .map_err(|err| err.into_pyexception(vm)) }This would also require adding
use rustpython_common::wtf8::Wtf8Buf;to the imports.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/bool.rs` around lines 104 - 109, The __format__ implementation for bool returns PyResult<String> which is inconsistent with int/float/complex that return PyResult<Wtf8Buf>; change bool::__format__ signature to return PyResult<Wtf8Buf>, convert the result of FormatSpec::parse(...).and_then(|fs| fs.format_bool(new_bool)) into a Wtf8Buf (e.g. via Wtf8Buf::from_str or equivalent) and propagate errors the same way, and add use rustpython_common::wtf8::Wtf8Buf; to the imports so the function matches its peers.crates/vm/src/import.rs (1)
528-534: Downcasting__package__toPyUtf8StrRefrejects surrogate-containing package names.Previously this would have accepted any
PyStr(and potentially panicked onas_str()). Now it returnsTypeError("package must be a string")for surrogate-containing strings. The error message is slightly misleading — the value is a string, it's just not a valid UTF-8 string. That said, surrogate package names are pathological and this aligns with CPython behavior where import resolution operates onstrvalues.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/import.rs` around lines 528 - 534, The current downcast uses PyUtf8StrRef which rejects surrogate-containing Python strings; change the downcast to PyStrRef (i.e., downcast::<PyStrRef>()) so genuine PyStr values (including surrogates) are accepted, and update any subsequent usage of pkg_str to work with PyStrRef (use its APIs rather than UTF-8-only methods); keep the TypeError for non-string types but do not treat surrogate-containing strings as non-strings.crates/vm/src/builtins/type.rs (1)
1564-1567: Avoidexpecton slot names to keep surrogate paths panic‑free.Using
expectcan still panic if upstream validation changes or if a surrogate slips through. Prefer converting to a typed error here.♻️ Suggested fix
- let mangled_name = mangle_name( - &class_name, - member.to_str().expect("slot name is always valid UTF-8"), - ); + let member_name = member.to_str().ok_or_else(|| { + vm.new_type_error("__slots__ must be valid UTF-8 identifiers".to_owned()) + })?; + let mangled_name = mangle_name(&class_name, member_name);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/type.rs` around lines 1564 - 1567, The code calls member.to_str().expect(...) before passing the string to mangle_name, which can panic; replace the expect with a safe conversion that returns/propagates a typed error instead (e.g., use member.to_str().ok_or_else(|| Error::SlotNameEncoding(member.to_os_string()))? or map to an existing error variant), then pass the &str to mangle_name as before; update the surrounding function (where mangle_name is invoked) to return a Result so the new error can be propagated rather than panicking.
ℹ️ Review info
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (120)
crates/common/src/str.rscrates/stdlib/src/_asyncio.rscrates/stdlib/src/_sqlite3.rscrates/stdlib/src/array.rscrates/stdlib/src/binascii.rscrates/stdlib/src/contextvars.rscrates/stdlib/src/csv.rscrates/stdlib/src/grp.rscrates/stdlib/src/hashlib.rscrates/stdlib/src/json.rscrates/stdlib/src/locale.rscrates/stdlib/src/lzma.rscrates/stdlib/src/multiprocessing.rscrates/stdlib/src/overlapped.rscrates/stdlib/src/posixshmem.rscrates/stdlib/src/pyexpat.rscrates/stdlib/src/pystruct.rscrates/stdlib/src/socket.rscrates/stdlib/src/ssl.rscrates/stdlib/src/unicodedata.rscrates/stdlib/src/zlib.rscrates/vm/src/buffer.rscrates/vm/src/builtins/bool.rscrates/vm/src/builtins/classmethod.rscrates/vm/src/builtins/code.rscrates/vm/src/builtins/complex.rscrates/vm/src/builtins/dict.rscrates/vm/src/builtins/float.rscrates/vm/src/builtins/function.rscrates/vm/src/builtins/function/jit.rscrates/vm/src/builtins/genericalias.rscrates/vm/src/builtins/int.rscrates/vm/src/builtins/interpolation.rscrates/vm/src/builtins/list.rscrates/vm/src/builtins/mappingproxy.rscrates/vm/src/builtins/memory.rscrates/vm/src/builtins/mod.rscrates/vm/src/builtins/module.rscrates/vm/src/builtins/namespace.rscrates/vm/src/builtins/object.rscrates/vm/src/builtins/range.rscrates/vm/src/builtins/set.rscrates/vm/src/builtins/slice.rscrates/vm/src/builtins/staticmethod.rscrates/vm/src/builtins/str.rscrates/vm/src/builtins/template.rscrates/vm/src/builtins/tuple.rscrates/vm/src/builtins/type.rscrates/vm/src/builtins/weakproxy.rscrates/vm/src/bytes_inner.rscrates/vm/src/codecs.rscrates/vm/src/coroutine.rscrates/vm/src/dict_inner.rscrates/vm/src/exception_group.rscrates/vm/src/exceptions.rscrates/vm/src/frame.rscrates/vm/src/function/buffer.rscrates/vm/src/function/fspath.rscrates/vm/src/function/mod.rscrates/vm/src/import.rscrates/vm/src/intern.rscrates/vm/src/macros.rscrates/vm/src/protocol/object.rscrates/vm/src/stdlib/_abc.rscrates/vm/src/stdlib/_winapi.rscrates/vm/src/stdlib/_wmi.rscrates/vm/src/stdlib/ast.rscrates/vm/src/stdlib/ast/basic.rscrates/vm/src/stdlib/ast/python.rscrates/vm/src/stdlib/ast/repr.rscrates/vm/src/stdlib/builtins.rscrates/vm/src/stdlib/codecs.rscrates/vm/src/stdlib/collections.rscrates/vm/src/stdlib/ctypes.rscrates/vm/src/stdlib/ctypes/array.rscrates/vm/src/stdlib/ctypes/base.rscrates/vm/src/stdlib/ctypes/function.rscrates/vm/src/stdlib/ctypes/pointer.rscrates/vm/src/stdlib/ctypes/simple.rscrates/vm/src/stdlib/ctypes/structure.rscrates/vm/src/stdlib/ctypes/union.rscrates/vm/src/stdlib/functools.rscrates/vm/src/stdlib/imp.rscrates/vm/src/stdlib/io.rscrates/vm/src/stdlib/itertools.rscrates/vm/src/stdlib/marshal.rscrates/vm/src/stdlib/msvcrt.rscrates/vm/src/stdlib/nt.rscrates/vm/src/stdlib/operator.rscrates/vm/src/stdlib/os.rscrates/vm/src/stdlib/posix.rscrates/vm/src/stdlib/signal.rscrates/vm/src/stdlib/sre.rscrates/vm/src/stdlib/symtable.rscrates/vm/src/stdlib/sys.rscrates/vm/src/stdlib/thread.rscrates/vm/src/stdlib/time.rscrates/vm/src/stdlib/typevar.rscrates/vm/src/stdlib/typing.rscrates/vm/src/stdlib/winreg.rscrates/vm/src/stdlib/winsound.rscrates/vm/src/suggestion.rscrates/vm/src/types/slot.rscrates/vm/src/types/structseq.rscrates/vm/src/utils.rscrates/vm/src/vm/context.rscrates/vm/src/vm/interpreter.rscrates/vm/src/vm/mod.rscrates/vm/src/vm/vm_new.rscrates/vm/src/warn.rscrates/wasm/src/browser_module.rscrates/wasm/src/convert.rscrates/wasm/src/js_module.rscrates/wasm/src/wasm_builtins.rscrates/wtf8/src/concat.rscrates/wtf8/src/lib.rsexample_projects/wasm32_without_js/rustpython-without-js/src/lib.rssrc/lib.rssrc/shell.rssrc/shell/helper.rs
✅ Files skipped from review due to trivial changes (2)
- crates/stdlib/src/binascii.rs
- crates/vm/src/stdlib/_abc.rs
🚧 Files skipped from review as they are similar to previous changes (41)
- crates/stdlib/src/multiprocessing.rs
- crates/vm/src/builtins/staticmethod.rs
- src/shell.rs
- crates/vm/src/function/mod.rs
- crates/vm/src/function/fspath.rs
- crates/vm/src/stdlib/symtable.rs
- crates/vm/src/protocol/object.rs
- crates/vm/src/builtins/tuple.rs
- crates/vm/src/builtins/function/jit.rs
- crates/vm/src/builtins/genericalias.rs
- crates/stdlib/src/unicodedata.rs
- crates/vm/src/vm/interpreter.rs
- crates/stdlib/src/posixshmem.rs
- crates/vm/src/stdlib/winreg.rs
- crates/wasm/src/wasm_builtins.rs
- crates/vm/src/stdlib/signal.rs
- crates/stdlib/src/pyexpat.rs
- crates/vm/src/stdlib/ast/basic.rs
- crates/vm/src/builtins/template.rs
- crates/stdlib/src/array.rs
- crates/vm/src/stdlib/ast/repr.rs
- crates/vm/src/stdlib/ctypes/pointer.rs
- crates/stdlib/src/lzma.rs
- src/lib.rs
- crates/vm/src/builtins/code.rs
- crates/wasm/src/convert.rs
- crates/vm/src/buffer.rs
- crates/vm/src/builtins/complex.rs
- example_projects/wasm32_without_js/rustpython-without-js/src/lib.rs
- crates/vm/src/coroutine.rs
- crates/vm/src/stdlib/collections.rs
- crates/vm/src/stdlib/ctypes.rs
- crates/vm/src/utils.rs
- crates/vm/src/builtins/slice.rs
- crates/stdlib/src/_sqlite3.rs
- crates/vm/src/stdlib/typevar.rs
- crates/stdlib/src/overlapped.rs
- crates/vm/src/function/buffer.rs
- crates/vm/src/vm/context.rs
- crates/vm/src/builtins/memory.rs
- src/shell/helper.rs
There was a problem hiding this comment.
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
crates/vm/src/stdlib/ctypes/union.rs (1)
620-638:⚠️ Potential issue | 🟡 MinorAvoid silently skipping invalid field names in positional initialization.
At Line 622/623, a non-
PyUtf8Strfield name is ignored and the loop continues, which can surface later as a misleading"too many initializers"error. This should fail immediately with a direct type error (consistent withprocess_fields).💡 Suggested fix
- if let Some(tuple) = field.downcast_ref::<PyTuple>() - && let Some(name) = tuple.first() - && let Some(name_str) = name.downcast_ref::<PyUtf8Str>() - { - let field_name = name_str.as_str().to_owned(); + if let Some(tuple) = field.downcast_ref::<PyTuple>() + && let Some(name) = tuple.first() + { + let name_str = name + .downcast_ref::<PyUtf8Str>() + .ok_or_else(|| vm.new_type_error("field name must be a string"))?; + let field_name = name_str.as_str().to_owned(); // Check for duplicate in kwargs if kwargs.contains_key(&field_name) { return Err(vm.new_type_error(format!( "duplicate values for field {:?}", field_name ))); } self_obj.as_object().set_attr( vm.ctx.intern_str(field_name), args[current_index].clone(), vm, )?; current_index += 1; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/ctypes/union.rs` around lines 620 - 638, The positional-initializer loop in union.rs silently skips field names that are not PyUtf8Str, causing later misleading "too many initializers" errors; update the block that matches if let Some(tuple) = field.downcast_ref::<PyTuple>() ... to instead return an immediate type error when the tuple first element is not a PyUtf8Str (use vm.new_type_error with a clear message consistent with process_fields) so invalid field names fail fast; reference the tuple/name/name_str checks and the kwargs/current_index handling when making the change.crates/vm/src/stdlib/posix.rs (1)
1974-1983:⚠️ Potential issue | 🔴 CriticalUse
try_downcast()instead ofdowncast()to properly propagate UTF-8 validation errors for surrogate-containing strings.At lines 1974 and 2369, the code uses
downcast::<PyUtf8Str>().map_err(...)which converts validation failures toTypeError. However, when a string contains surrogates, Python raisesUnicodeEncodeErrorwith the message "surrogates not allowed", notTypeError.The current approach silently swallows the proper validation error. Use
try_downcast::<PyUtf8Str>(vm)instead, which callstry_downcast_from()→ensure_valid_utf8()and generates the correctUnicodeEncodeErrorfor surrogate-containing strings.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/posix.rs` around lines 1974 - 1983, Replace uses of downcast::<PyUtf8Str>().map_err(...) with try_downcast::<PyUtf8Str>(vm) so UTF-8 validation errors (like surrogate-containing strings) propagate as UnicodeEncodeError instead of being converted to TypeError; specifically, in the code that currently calls obj.downcast::<PyUtf8Str>().map_err(|_| vm.new_type_error(...)) (the block that parses into PathconfVar and any similar occurrence), call obj.try_downcast::<PyUtf8Str>(vm) and handle its Result the same way so ensure_valid_utf8() runs and the VM produces the correct error type.crates/vm/src/builtins/type.rs (1)
1970-1980:⚠️ Potential issue | 🟡 Minor
type.__repr__can drop custom non-UTF-8__qualname__.Line 1977 narrows to
PyUtf8Strand falls back toname, so a validPyStr__qualname__with surrogate data is silently ignored in repr output.💡 Suggested fix
- format!( - "<class '{}.{}'>", - module, - zelf.__qualname__(vm) - .downcast_ref::<PyUtf8Str>() - .map(|n| n.as_str()) - .unwrap_or_else(|| &name) - ) + let qualname = zelf + .__qualname__(vm) + .downcast_ref::<PyStr>() + .map(|n| n.as_wtf8().to_string()) + .unwrap_or_else(|| name.to_string()); + format!("<class '{}.{}'>", module, qualname)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/type.rs` around lines 1970 - 1980, The repr builder for type.__repr__ currently only accepts PyUtf8Str for __qualname__ and falls back to name, which drops valid non-UTF-8 PyStr values; update the match that calls zelf.__qualname__(vm) to attempt downcast to a generic PyStr (not only PyUtf8Str) and use a loss-tolerant conversion (e.g., a to_string_lossy or equivalent on PyStr) to produce the qualname text so surrogate/non-UTF-8 data is preserved in the formatted "<class '{}.{}'>" output.crates/vm/src/stdlib/ctypes/base.rs (1)
2535-2542:⚠️ Potential issue | 🟠 MajorAvoid rejecting non-UTF-8
strfield names in_fields_processing.Line 2535 downcasts to
PyUtf8Str, and Line 2541 usesfname.as_str(). That rejects valid Pythonstrvalues containing surrogates, which is a regression in this WTF-8 migration path. UsePyStr+as_wtf8()here.🐛 Proposed fix
- let fname = field_tuple + let fname = field_tuple .first() .expect("len checked") - .downcast_ref::<PyUtf8Str>() + .downcast_ref::<PyStr>() .ok_or_else(|| vm.new_type_error("field name must be a string"))?; @@ let fdescr_obj = descr .proto .as_object() - .get_attr(vm.ctx.intern_str(fname.as_str()), vm)?; + .get_attr(vm.ctx.intern_str(fname.as_wtf8()), vm)?;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/ctypes/base.rs` around lines 2535 - 2542, The code currently downcasts field names to PyUtf8Str and calls fname.as_str(), rejecting valid Python str values with surrogates; change the downcast to PyStr and use PyStr::as_wtf8() when constructing the attribute key passed into descr.proto.as_object().get_attr(...) and in any error/context messages (replace references to fname.as_str() with fname.as_wtf8() usage) so WTF-8 str values are accepted while keeping the same vm.new_type_error when the object is not a str.
♻️ Duplicate comments (13)
src/shell.rs (1)
145-146:⚠️ Potential issue | 🔴 CriticalAvoid panic in REPL prompt extraction (Line 145).
expect_str()can panic on surrogate-containingsys.ps1/sys.ps2, which reintroduces the failure mode this PR is meant to remove.Suggested fix
- let prompt = match prompt { - Ok(ref s) => s.expect_str(), - Err(_) => "", - }; + let prompt = match prompt { + Ok(ref s) => s.to_str().unwrap_or(""), + Err(_) => "", + };#!/bin/bash # Read-only verification: confirm API behavior and current callsite. rg -n "fn\\s+expect_str\\s*\\(" --type rust -A4 -B2 rg -n "fn\\s+to_str\\s*\\(" --type rust -A4 -B2 rg -n "expect_str\\(\\)" --type rust src/shell.rs -C2🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/shell.rs` around lines 145 - 146, The code calls expect_str() which can panic on surrogate-containing prompts; replace that call with a non-panicking conversion (e.g., call to_str() and handle the Result/Option) so failures return an empty prompt instead of panicking; specifically update the use of expect_str() in the REPL prompt extraction to something like to_str().unwrap_or("") or match on to_str() / to_string_lossy() and return "" on error, ensuring no panics from expect_str().crates/vm/src/stdlib/msvcrt.rs (1)
83-91:⚠️ Potential issue | 🟠 Major
expect_str()still panics on surrogate input — the PR's goal is unmet here.Replacing
as_str()withexpect_str()doesn't fix the panic: both methods blow up onPyStrvalues that contain surrogates (valid Python strings on Windows).putwch/ungetwchare Windows wide-character APIs that are supposed to accept surrogate code units, so the safe path is to walk the WTF-8 code points directly.🔧 Surrogate-safe fix for
putwch(line 83-91) andungetwch(lines 108-120)fn putwch(s: PyStrRef, vm: &VirtualMachine) -> PyResult<()> { let c = s - .expect_str() - .chars() + .as_wtf8() + .code_points() .exactly_one() .map_err(|_| vm.new_type_error("putch() argument must be a string of length 1"))?; - unsafe { suppress_iph!(_putwch(c as u16)) }; + unsafe { suppress_iph!(_putwch(c.to_u32() as u16)) }; Ok(()) }fn ungetwch(s: PyStrRef, vm: &VirtualMachine) -> PyResult<()> { let c = - s.expect_str().chars().exactly_one().map_err(|_| { + s.as_wtf8().code_points().exactly_one().map_err(|_| { vm.new_type_error("ungetwch() argument must be a string of length 1") })?; - let ret = unsafe { suppress_iph!(_ungetwch(c as u32)) }; + let ret = unsafe { suppress_iph!(_ungetwch(c.to_u32())) };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/msvcrt.rs` around lines 83 - 91, The current putwch and ungetwch use expect_str() which panics on surrogate-containing PyStr; instead iterate the string's WTF-8 code units so surrogate code units are preserved: for both putwch and ungetwch, replace the expect_str().chars().exactly_one() path with a WTF-8–aware extraction from the PyStrRef (walk the underlying code units/encoded units, ensure there is exactly one 16-bit code unit, map that unit to u16 and call _putwch/_ungetwch), and keep the same vm.new_type_error when the argument is not exactly one code unit; ensure you reference the functions putwch and ungetwch and remove any use of expect_str() that assumes valid UTF-8.crates/vm/src/stdlib/nt.rs (3)
1189-1192:⚠️ Potential issue | 🔴 Critical
expect_str()still panics on surrogate-containing strings — same defect asas_str()The change replaced
as_str()withexpect_str(), butexpect_str()also panics when the string contains surrogates. Becauseargvcomes from user-supplied Python objects viaextract_elements_with, untrusted data can still trigger an unconditional panic — directly contradicting this PR's objective.Use
to_str()with proper Python-level error propagation:🐛 Proposed fix
let argv = vm.extract_elements_with(argv.as_ref(), |obj| { let arg = PyStrRef::try_from_object(vm, obj)?; - make_widestring(arg.expect_str()) + make_widestring(arg.to_str().ok_or_else(|| vm.new_unicode_encode_error("surrogates not allowed"))?) })?;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/nt.rs` around lines 1189 - 1192, The closure passed to vm.extract_elements_with currently calls PyStrRef::try_from_object and then expect_str(), which still panics on surrogate-containing input; change the closure to call arg.to_str() (or equivalent non-panicking conversion) and propagate any Unicode decoding error out as a Python error (so the ? from the closure returns a Python-level error) before passing the resulting &str into make_widestring; update the closure around PyStrRef::try_from_object(vm, obj)? to use arg.to_str() and return a proper VM/Python error instead of letting expect_str() panic.
1230-1233:⚠️ Potential issue | 🔴 CriticalSame
expect_str()panic risk inexecveargv extractionIdentical issue as in
execv:arg.expect_str()on user-provided argv elements panics on surrogates.🐛 Proposed fix
let argv = vm.extract_elements_with(argv.as_ref(), |obj| { let arg = PyStrRef::try_from_object(vm, obj)?; - make_widestring(arg.expect_str()) + make_widestring(arg.to_str().ok_or_else(|| vm.new_unicode_encode_error("surrogates not allowed"))?) })?;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/nt.rs` around lines 1230 - 1233, The argv extraction in execve uses arg.expect_str() which can panic on surrogate-containing user strings; change the closure passed to vm.extract_elements_with in execve to use the safe string accessor (e.g. arg.as_str()) instead of expect_str(), so call make_widestring(arg.as_str()) after obtaining arg via PyStrRef::try_from_object(vm, obj) to avoid panics.
1253-1256:⚠️ Potential issue | 🔴 Critical
expect_str()panic on surrogate env keys/values inexecve
key.expect_str()andvalue.expect_str()at lines 1255–1256 are called on values extracted from a user-suppliedArgMapping. Surrogate-containing env strings will panic here instead of raising a PythonUnicodeEncodeError.🐛 Proposed fix
let key = PyStrRef::try_from_object(vm, key)?; let value = PyStrRef::try_from_object(vm, value)?; - let key_str = key.expect_str(); - let value_str = value.expect_str(); + let key_str = key.to_str().ok_or_else(|| vm.new_unicode_encode_error("surrogates not allowed"))?; + let value_str = value.to_str().ok_or_else(|| vm.new_unicode_encode_error("surrogates not allowed"))?;crates/vm/src/stdlib/_winapi.rs (1)
299-301:⚠️ Potential issue | 🔴 Critical
expect_str()reintroduces panic paths on surrogate-containingstrinputsLine 299, Line 399, and Lines 421/423 still use
expect_str(), which can panic and defeats the PR goal of removing panic-based string extraction. Please switch these callsites to fallible conversion (to_str()+ error propagation) or a WTF-8-safe path where needed.Proposed fix
@@ - let wstr = |s: PyStrRef| { - let ws = widestring::WideCString::from_str(s.expect_str()) + let wstr = |s: PyStrRef| { + let s = s.to_str().map_err(|e| e.to_pyexception(vm))?; + let ws = widestring::WideCString::from_str(s) .map_err(|err| err.to_pyexception(vm))?; Ok(ws.into_vec_with_nul()) }; @@ - let src_path = std::path::Path::new(src_path.expect_str()); - let dest_path = std::path::Path::new(dest_path.expect_str()); + let src_path = std::path::Path::new(src_path.to_str().map_err(|e| e.to_pyexception(vm))?); + let dest_path = + std::path::Path::new(dest_path.to_str().map_err(|e| e.to_pyexception(vm))?); @@ - let k = k.expect_str(); + let k = k.to_str().map_err(|e| e.to_pyexception(vm))?; let v = PyStrRef::try_from_object(vm, v)?; - let v = v.expect_str(); + let v = v.to_str().map_err(|e| e.to_pyexception(vm))?;#!/bin/bash # Verify panic-prone string extraction remains in the touched file. rg -nP 'expect_str\s*\(' crates/vm/src/stdlib/_winapi.rs -C2As per coding guidelines: "Follow Rust best practices for error handling and memory management".
Also applies to: 399-400, 421-423
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/_winapi.rs` around lines 299 - 301, The uses of s.expect_str() (notably where WideCString::from_str(s.expect_str()) is called before .map_err(|err| err.to_pyexception(vm))? and similar callsites around the other mentions) reintroduce panic paths; replace each expect_str() with a fallible extraction: call s.to_str().map_err(|e| /* convert to a VM error via e.to_pyexception(vm) or a descriptive error */ )? and then pass the &str into widestring::WideCString::from_str, or if the input may contain WTF-8/surrogates use a WTF-8-safe conversion path (e.g., obtain the raw bytes and use a WTF-8 aware API) and propagate the error via to_pyexception(vm); ensure the final Ok(ws.into_vec_with_nul()) remains unchanged and all conversions return Result to avoid panics in functions like the WideCString conversion sites referenced.crates/vm/src/types/slot.rs (1)
2077-2077:format!("… {}", repr.as_wtf8())has the sameWtf8: Displayconcern as line 237 instructseq.rs.Since this is inside
#[cold] unreachable!, it has no practical impact on correctness, but it is worth tracking alongside the broader question of howWtf8formats.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/types/slot.rs` at line 2077, The unreachable! invocation uses format with repr.as_wtf8() which relies on Wtf8::Display; change the message to avoid requiring Display (e.g., use debug formatting) by replacing unreachable!("slot must be overridden for {}", repr.as_wtf8()) with something like unreachable!("slot must be overridden for {:?}", repr.as_wtf8()), so the code in slot.rs no longer depends on Wtf8::Display.crates/vm/src/stdlib/functools.rs (1)
428-430:⚠️ Potential issue | 🟠 MajorAvoid
expect_str()when normalizing stored keyword keys.Line [430] still uses
expect_str(), which can panic for surrogate-containingstrkeys. Useto_str()and raiseTypeErrorinstead sopartialcalls fail safely.🛠️ Proposed fix
for (key, value) in &*keywords { let key_str = key .downcast_ref::<crate::builtins::PyStr>() .ok_or_else(|| vm.new_type_error("keywords must be strings"))?; - final_kwargs.insert(key_str.expect_str().to_owned(), value); + let key = key_str + .to_str() + .ok_or_else(|| vm.new_type_error("keywords must be strings"))? + .to_owned(); + final_kwargs.insert(key, value); }#!/bin/bash # Verify panic-prone API usage and current call site. rg -n "pub fn expect_str|pub fn to_str" crates/vm/src/builtins/str.rs -A8 -B4 rg -n "final_kwargs.insert|downcast_ref::<crate::builtins::PyStr>|expect_str" crates/vm/src/stdlib/functools.rs -A8 -B4🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/functools.rs` around lines 428 - 430, The code normalizes keyword keys using expect_str(), which can panic on surrogate-containing strings; update the normalization in the block that calls downcast_ref::<crate::builtins::PyStr>() and final_kwargs.insert(...) to call to_str() instead, check the Result/Option and on failure return vm.new_type_error("keywords must be strings") so the partial/keyword handling fails with a TypeError rather than panicking; ensure you replace expect_str().to_owned() with a safe to_str() usage and propagate the error into the existing vm.new_type_error path.crates/vm/src/stdlib/os.rs (1)
455-456:⚠️ Potential issue | 🟠 MajorReplace panic-prone
expect_str()in Windows env APIs.Line [455], Line [456], and Line [503] use
expect_str(), which can panic on surrogate-containingstrvalues and bypassPyResulterror handling. Please switch these toto_str()and raise Python errors instead of panicking.🛠️ Proposed fix
fn putenv(key: PyStrRef, value: PyStrRef, vm: &VirtualMachine) -> PyResult<()> { - let key_str = key.expect_str(); - let value_str = value.expect_str(); + let key_str = key + .to_str() + .ok_or_else(|| vm.new_value_error("illegal environment variable name"))?; + let value_str = value + .to_str() + .ok_or_else(|| vm.new_value_error("illegal environment variable value"))?; @@ fn unsetenv(key: PyStrRef, vm: &VirtualMachine) -> PyResult<()> { - let key_str = key.expect_str(); + let key_str = key + .to_str() + .ok_or_else(|| vm.new_value_error("illegal environment variable name"))?;#!/bin/bash # Verify panic-vs-result string APIs and current call sites. rg -n "pub fn expect_str|pub fn to_str" crates/vm/src/builtins/str.rs -A8 -B4 rg -n "fn putenv|fn unsetenv|expect_str" crates/vm/src/stdlib/os.rs -A10 -B4Also applies to: 503-503
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/os.rs` around lines 455 - 456, Replace the panic-prone expect_str() calls in the Windows env API handlers (the places assigning key_str and value_str and the other site at line ~503) with to_str() and propagate a proper Python error instead of panicking; specifically, in the functions handling putenv/unsetenv (and any code that currently calls expect_str()), call key.to_str() and value.to_str(), map the Err case to an appropriate PyResult error via the VM (e.g., return Err(vm.new_type_error(...) or vm.new_value_error(...)) with a clear message about invalid string contents) and proceed with the unwrapped &str on Ok. Ensure all expect_str() occurrences in this module are replaced and error paths return a PyResult rather than panic.crates/vm/src/vm/mod.rs (1)
639-645:⚠️ Potential issue | 🟡 MinorAdd a separator after the unraisable message prefix.
Line 640builds"Exception ignored {msg}"without a separator before object repr, so output gets concatenated.✏️ Suggested fix
- let msg_str = if let Some(msg) = msg { - format!("Exception ignored {}", msg) + let msg_str = if let Some(msg) = msg { + format!("Exception ignored {msg}: ") } else { "Exception ignored in: ".to_owned() };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/vm/mod.rs` around lines 639 - 645, The constructed unraisable message in vm::mod.rs builds msg_str without a separator when Some(msg) is present, resulting in concatenated output; update the msg_str construction (the branch that sets msg_str when msg is Some) to include a separator (for example a colon and space) between the prefix and the object repr so the string passed to write_to_stderr(&msg_str, &stderr, self) reads like "Exception ignored: {msg}" instead of "Exception ignored {msg}".crates/vm/src/stdlib/builtins.rs (1)
572-575:⚠️ Potential issue | 🟠 MajorAvoid
expect_str()in the rustyline prompt path.
Line 573can still panic when prompt contains surrogate data. Gate rustyline on UTF-8 conversion and fall back to standard I/O when conversion fails.🛠️ Suggested fix
- if use_rustyline { - let prompt = prompt.as_ref().map_or("", |s| s.expect_str()); + let prompt_utf8 = prompt.as_ref().and_then(|s| s.try_into_utf8(vm).ok()); + if use_rustyline && (prompt.as_ref().is_none() || prompt_utf8.is_some()) { + let prompt = prompt_utf8.as_ref().map_or("", |s| s.as_str()); let mut readline = Readline::new(()); match readline.readline(prompt) {As per coding guidelines: "Follow Rust best practices for error handling and memory management".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/builtins.rs` around lines 572 - 575, The rustyline branch currently calls prompt.as_ref().map_or("", |s| s.expect_str()) which can panic on non-UTF-8/surrogate data; instead try to convert the prompt to a UTF-8 &str and only use rustyline when conversion succeeds—e.g., match prompt.as_ref().and_then(|s| s.to_str().ok()) or equivalent—and if conversion fails, disable/use_rustyline = false (or fall back to the standard std::io prompt path) before constructing Readline::new() and calling readline.readline; update the logic around use_rustyline, prompt, Readline::new, and readline.readline to avoid any expect_str() usage and safely handle non-UTF-8 prompts.crates/vm/src/builtins/interpolation.rs (1)
210-225:⚠️ Potential issue | 🔴 Critical
push_str(&expr_repr)andpush_str(&spec_repr)still panic on WTF-8 surrogates.Lines 213 and 221 use
push_strwith&expr_repr/&spec_repr(bothPyRef<PyStr>). This resolves viaAsRef<str>onPyRef<PyStr>, which callsto_str().expect("str has surrogates")and panics if the expression or format_spec contains surrogates. All other repr parts (value_repr,conversion) already use the surrogate-safepush_wtf8. These two need the same treatment to be consistent with the PR's intent and to avoid panics.🐛 Proposed fix
- result.push_str(&expr_repr); + result.push_wtf8(expr_repr.as_wtf8()); result.push_str(", "); if vm.is_none(&zelf.conversion) { result.push_str("None"); } else { result.push_wtf8(zelf.conversion.repr(vm)?.as_wtf8()); } result.push_str(", "); - result.push_str(&spec_repr); + result.push_wtf8(spec_repr.as_wtf8());🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/interpolation.rs` around lines 210 - 225, The repr construction in interpolation.rs uses push_str(&expr_repr) and push_str(&spec_repr) which call AsRef<str> on PyRef<PyStr> and will panic on WTF-8 surrogates; change those calls to use Wtf8Buf::push_wtf8 so they accept surrogate-containing strings just like value_repr and conversion do—replace push_str(&expr_repr) and push_str(&spec_repr) with result.push_wtf8(expr_repr.as_wtf8()) and result.push_wtf8(spec_repr.as_wtf8()) (referencing the Wtf8Buf type and push_wtf8 method and the PyRef<PyStr> values expr_repr/spec_repr).crates/vm/src/stdlib/codecs.rs (1)
52-75: Factor out sharedCodeArgsnormalization inencode/decode.Line 52 through Line 75 duplicate the same destructuring and encoding-default logic; only the final registry call differs.
As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/codecs.rs` around lines 52 - 75, Extract the repeated CodeArgs normalization into a small helper (e.g., a private fn normalize_code_args(args: CodeArgs) -> (PyObjectRef, &str, Option<String>) or similar) that performs the destructuring and resolves encoding via codecs::DEFAULT_ENCODING, then call vm.state.codec_registry.encode(...) and vm.state.codec_registry.decode(...) with the normalized tuple from encode and decode; reference the existing symbols CodeArgs, DEFAULT_ENCODING, encode, decode, and vm.state.codec_registry to locate where to replace the duplicate logic.
🧹 Nitpick comments (12)
crates/vm/src/warn.rs (2)
410-433:to_str()returningNone(surrogate-containing action) silently falls through to the "Unrecognized action" error pathThe
_arm in thematch action_sblock handles bothSome(unknown_str)andNone(action contains surrogates) identically. While treating a surrogate-containing action as unrecognized is functionally correct, the conflation is non-obvious. A brief comment or explicitNonearm would clarify intent, especially since this is the key behavioral change introduced by the migration fromas_str()toto_str().♻️ Proposed clarification
let action_s = action_str.to_str(); match action_s { Some("once") => { ... } Some("module") => { ... } Some("default") => false, + // None means the action string contains surrogates and is not + // a recognized UTF-8 action name; fall through to the error. _ => { return Err(vm.new_runtime_error(format!( "Unrecognized action ({action_str}) in warnings.filters:\n {action_str}" ))); } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/warn.rs` around lines 410 - 433, The match on action_s (result of action_str.to_str()) currently conflates None (surrogate-containing string) with unrecognized Some values in the wildcard arm; make this explicit by adding a separate None => { /* surrogate-containing action: treat as unrecognized or document behavior */ } arm or a brief comment before the match clarifying that None indicates surrogate-containing input and should be handled the same as unknown actions; reference the symbols action_str, action_s, to_str(), and the match block around update_registry/get_once_registry so reviewers can locate and apply the change.
395-405: Inconsistent comparison style: mix ofas_bytes()andas_wtf8()for equivalent ASCII checksLines 395 and 399 use
as_bytes()to compare ASCII action strings, but line 405 switches toas_wtf8()for the same pattern. All four strings ("error", "ignore", "always", "all") are pure ASCII, soas_bytes()is sufficient and cheaper throughout. Mixing the two idioms in the same function reduces readability.♻️ Proposed refactor for consistency
- if action_str.as_bytes() == b"error" { + let action_bytes = action_str.as_bytes(); + if action_bytes == b"error" { let exc = PyBaseExceptionRef::try_from_object(vm, message)?; return Err(exc); } - if action_str.as_bytes() == b"ignore" { + if action_bytes == b"ignore" { return Ok(()); } - let already = if action_str.as_wtf8() != "always" && action_str.as_wtf8() != "all" { + let already = if action_bytes != b"always" && action_bytes != b"all" {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/warn.rs` around lines 395 - 405, The comparisons against the action string are inconsistent: some use action_str.as_bytes() (for "error" and "ignore") while others use action_str.as_wtf8() (for "always" and "all"); change the latter to use action_str.as_bytes() so all four ASCII literals ("error","ignore","always","all") are compared the same and more efficiently—update the conditional that currently uses as_wtf8() to use as_bytes() instead (look for the variable action_str and the branching that checks "always" and "all").crates/vm/src/builtins/function.rs (1)
1090-1096: Simplify the__qualname__/__name__fallback chain.The
if let … { Some(qname) } else { … }pattern re-wraps an already-Somevalue unnecessarily.♻️ Proposed simplification
- let func_name = if let Some(qname) = - vm.get_attribute_opt(zelf.function.clone(), identifier!(vm, __qualname__))? - { - Some(qname) - } else { - vm.get_attribute_opt(zelf.function.clone(), identifier!(vm, __name__))? - }; + let func_name = match vm.get_attribute_opt(zelf.function.clone(), identifier!(vm, __qualname__))? { + Some(qname) => Some(qname), + None => vm.get_attribute_opt(zelf.function.clone(), identifier!(vm, __name__))?, + };Or more idiomatically using a fallible
or_else:- let func_name = if let Some(qname) = - vm.get_attribute_opt(zelf.function.clone(), identifier!(vm, __qualname__))? - { - Some(qname) - } else { - vm.get_attribute_opt(zelf.function.clone(), identifier!(vm, __name__))? - }; + let func_name = vm + .get_attribute_opt(zelf.function.clone(), identifier!(vm, __qualname__))? + .map(Ok) + .unwrap_or_else(|| vm.get_attribute_opt(zelf.function.clone(), identifier!(vm, __name__)))?;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/function.rs` around lines 1090 - 1096, The current if-let re-wraps an Option from vm.get_attribute_opt unnecessarily; replace the conditional with an Option chain: first call vm.get_attribute_opt(zelf.function.clone(), identifier!(vm, __qualname__))? to get an Option, then call .or(...) with vm.get_attribute_opt(zelf.function.clone(), identifier!(vm, __name__))? so func_name is the first Some or the fallback without re-wrapping; update the expression that assigns func_name accordingly (referencing func_name, vm.get_attribute_opt, identifier!(vm, __qualname__), identifier!(vm, __name__), and zelf.function).crates/stdlib/src/zlib.rs (1)
468-471: Update the generic bound to align with codebase conventions and eliminate the intermediate variable.The two-step conversion is correct, but it deviates from the established pattern in the stdlib. Other modules (e.g.,
csv.rs) useimpl Into<Wtf8Buf>directly for similar functions, which works becauseWtf8BufimplementsFrom<String>andFrom<&str>.Update the function to:
Recommended refactoring
-fn new_zlib_error(message: impl Into<String>, vm: &VirtualMachine) -> PyBaseExceptionRef { - let msg: String = message.into(); - vm.new_exception_msg(vm.class("zlib", "error"), msg.into()) +fn new_zlib_error(message: impl Into<Wtf8Buf>, vm: &VirtualMachine) -> PyBaseExceptionRef { + vm.new_exception_msg(vm.class("zlib", "error"), message.into()) }Add to imports:
use rustpython_common::wtf8::Wtf8Buf;All call sites (string literals,
e.to_string(), etc.) are compatible withInto<Wtf8Buf>.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/zlib.rs` around lines 468 - 471, Refactor new_zlib_error to accept impl Into<Wtf8Buf> instead of impl Into<String>, remove the intermediate msg variable and pass the converted value directly to vm.new_exception_msg (keep vm.class("zlib", "error") as the exception class); also add the import use rustpython_common::wtf8::Wtf8Buf; so call sites (literals and e.to_string()) which implement Into<Wtf8Buf> continue to work without extra allocations.crates/wasm/src/convert.rs (1)
59-61: Redundant.into()onStringbinding
js_sys::Error::to_string()returns a RustString(viaDisplay; JavaScript'stoString()is exposed asto_js_string()injs-sysbindings to avoid confusion with Rust'sToStringtrait and itsto_string()method). The.into()here convertsString → String(identity) sincemsgis typed asString; the actualString → Wtf8Buf(or equivalent) conversion happens implicitly at thenew_type_error(msg)call site. The intermediate binding with.into()is unnecessary.♻️ Suggested simplification
- let msg: String = js_err.unchecked_into::<js_sys::Error>().to_string().into(); - vm.new_type_error(msg) + vm.new_type_error(js_err.unchecked_into::<js_sys::Error>().to_string())🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/wasm/src/convert.rs` around lines 59 - 61, Remove the redundant .into() when converting the JS error to a Rust String: in the binding where you create msg from js_err (using js_sys::Error::to_string()), just assign the returned String directly to msg and pass it into vm.new_type_error(msg) — e.g., change the binding that currently uses js_err.unchecked_into::<js_sys::Error>().to_string().into() to use to_string() without .into() so you avoid the no-op String→String conversion.crates/vm/src/types/structseq.rs (1)
237-237:Wtf8::fmt()replaces unpaired surrogates with U+FFFD, corrupting module names containing such code points.The
Displayimpl forWtf8iterates through the bytes and explicitly substitutes any unpaired surrogate code points with the replacement character U+FFFD. When formattingmodule_str.as_wtf8()in the qualified type name, module names containing surrogates (which can occur via thesurrogateescapecodec handler) will silently lose that information. While this is an improvement over the panic behavior ofas_str(), full WTF-8 fidelity would require operating onWtf8Bufinstead of converting toStringforrepr_str.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/types/structseq.rs` at line 237, The current code calls module_str.as_wtf8() and formats it (via Wtf8::fmt) into a String, which replaces unpaired surrogates with U+FFFD; instead preserve full WTF-8 fidelity by using a Wtf8Buf-based representation when building the qualified type name in repr_str (referencing module_str.as_wtf8(), Wtf8::fmt(), and Self::NAME). Change the construction to operate on or append a Wtf8Buf (or otherwise emit WTF-8 bytes directly) instead of converting to a Rust String for formatting, so module names with unpaired surrogates are preserved end-to-end in the qualified name.crates/vm/src/stdlib/io.rs (1)
5863-5869: Consider deduplicating repr-fallback logic in_dealloc_warn.The same fallback pattern appears in both
fileioandwinconsoleio; extracting it would reduce drift risk.♻️ Optional extraction sketch
+fn repr_or_file_fallback(source: PyObjectRef, vm: &VirtualMachine) -> Wtf8Buf { + source + .repr(vm) + .map(|s| s.as_wtf8().to_owned()) + .unwrap_or_else(|_| Wtf8Buf::from("<file>")) +} ... - let repr = source - .repr(vm) - .map(|s| s.as_wtf8().to_owned()) - .unwrap_or_else(|_| Wtf8Buf::from("<file>")); + let repr = repr_or_file_fallback(source, vm);Also applies to: 6415-6422
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/io.rs` around lines 5863 - 5869, The repr fallback (calling .repr(vm).map(|s| s.as_wtf8().to_owned()).unwrap_or_else(|_| Wtf8Buf::from("<file>"))) is duplicated in _dealloc_warn across fileio and winconsoleio; extract that logic into a small helper (e.g., a function like format_unclosed_file_repr(vm, obj) or repr_fallback_for_unclosed_file) and replace the inline chain in both _dealloc_warn implementations to call the helper, then use its result in the warn call (crate::stdlib::warnings::warn) to eliminate duplication and keep behavior identical.crates/vm/src/codecs.rs (1)
114-136:get_incremental_encoder/get_incremental_decodersignatures should be migrated toOption<PyUtf8StrRef>for consistency.The
encode,decode, and allCodecsRegistrywrapper methods now acceptOption<PyUtf8StrRef>, but these two methods still acceptOption<PyStrRef>. The only caller (incrates/vm/src/stdlib/io.rs) unnecessarily converts aPyUtf8StrtoPyStrRefviainto_wtf8()just to pass it to these methods. Updating the signatures and usinginto_wtf8().into()internally—as done inencodeanddecode—would eliminate this redundant widening and align the API.♻️ Suggested alignment
pub fn get_incremental_encoder( &self, - errors: Option<PyStrRef>, + errors: Option<PyUtf8StrRef>, vm: &VirtualMachine, ) -> PyResult { let args = match errors { - Some(e) => vec![e.into()], + Some(e) => vec![e.into_wtf8().into()], None => vec![], }; vm.call_method(self.0.as_object(), "incrementalencoder", args) } pub fn get_incremental_decoder( &self, - errors: Option<PyStrRef>, + errors: Option<PyUtf8StrRef>, vm: &VirtualMachine, ) -> PyResult { let args = match errors { - Some(e) => vec![e.into()], + Some(e) => vec![e.into_wtf8().into()], None => vec![], }; vm.call_method(self.0.as_object(), "incrementaldecoder", args) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/codecs.rs` around lines 114 - 136, Update the signatures of get_incremental_encoder and get_incremental_decoder to accept Option<PyUtf8StrRef> instead of Option<PyStrRef>, and inside each method convert the optional PyUtf8StrRef to the argument expected by vm.call_method by calling into_wtf8().into() (mirroring the encode/decode implementations) so callers no longer need to perform an unnecessary widening conversion; keep method names get_incremental_encoder and get_incremental_decoder and only change the parameter types and the creation of args accordingly.crates/vm/src/stdlib/operator.rs (1)
435-451:repr_wtf8logic is correct; optional: replacefirst-flag with iterator combinators.All three implementations produce the right output. The
first: boolflag pattern works but is slightly verbose. A minor readability improvement is possible usingintersperseor a separator-before-each approach:✨ Optional refactor (attrgetter, same pattern applies to itemgetter)
- let mut first = true; for part in &zelf.attrs { - if !first { - result.push_str(", "); - } - first = false; + if !std::ptr::eq(part, &zelf.attrs[0]) { + result.push_str(", "); + } result.push_wtf8(part.as_object().repr(vm)?.as_wtf8()); }Or, if the nightly
iter_intersperseis not available, simply handle the first element before the loop:- let mut first = true; - for part in &zelf.attrs { - if !first { - result.push_str(", "); - } - first = false; - result.push_wtf8(part.as_object().repr(vm)?.as_wtf8()); - } + let mut iter = zelf.attrs.iter(); + if let Some(first) = iter.next() { + result.push_wtf8(first.as_object().repr(vm)?.as_wtf8()); + } + for part in iter { + result.push_str(", "); + result.push_wtf8(part.as_object().repr(vm)?.as_wtf8()); + }Also applies to: 513-529, 601-622
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/operator.rs` around lines 435 - 451, The repr_wtf8 function in operator.rs uses a mutable first: bool flag to insert ", " between attributes; replace this flag-based loop with an iterator-based approach for readability: map each part to its repr wtf8 string (using part.as_object().repr(vm)?.as_wtf8()), then join or intersperse with ", " when building the Wtf8Buf in repr_wtf8 (and apply the same change to the analogous repr_wtf8 implementations for itemgetter and the other attr/item reprs referenced). Ensure you still respect the ReprGuard early-return branch and preserve error propagation from repr(vm)? while constructing the joined representation.crates/stdlib/src/ssl.rs (1)
3899-3923:set_server_hostnamesetter inconsistently keepsOption<PyStrRef>vsOption<PyUtf8StrRef>in constructor.The
_wrap_socketpath rejects surrogate hostnames at argument-parse time (viaWrapSocketArgs.server_hostname: Option<PyUtf8StrRef>), while this setter acceptsOption<PyStrRef>and validates lazily viatry_into_utf8. Both are safe (no panic), but the error type/timing differs:PyUtf8StrRefraises aUnicodeDecodeErrorat arg-parsing, whereastry_into_utf8here raises a different error at runtime. Consider aligning toOption<PyUtf8StrRef>for consistency.💡 Suggested alignment
- fn set_server_hostname( - &self, - value: Option<PyStrRef>, - vm: &VirtualMachine, - ) -> PyResult<()> { + fn set_server_hostname( + &self, + value: Option<PyUtf8StrRef>, + vm: &VirtualMachine, + ) -> PyResult<()> { if *self.handshake_done.lock() { return Err( vm.new_value_error("Cannot set server_hostname on socket after handshake") ); } - let hostname_string = value - .map(|s| { - let utf8 = s.try_into_utf8(vm)?; - validate_hostname(utf8.as_str(), vm)?; - Ok::<String, _>(utf8.as_str().to_owned()) - }) - .transpose()?; + let hostname_string = value + .map(|s| { + validate_hostname(s.as_str(), vm)?; + Ok::<String, _>(s.as_str().to_owned()) + }) + .transpose()?;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/ssl.rs` around lines 3899 - 3923, The setter set_server_hostname currently accepts Option<PyStrRef> and calls try_into_utf8, producing a different error/timing than the WrapSocketArgs.server_hostname path which uses Option<PyUtf8StrRef>; change the setter signature to accept Option<PyUtf8StrRef> (matching WrapSocketArgs.server_hostname), remove try_into_utf8 usage, and adjust the hostname_string construction to directly clone/into a String from the PyUtf8StrRef before calling validate_hostname and storing into self.server_hostname so both code paths raise the same UnicodeDecodeError at argument-parse time.crates/wtf8/src/concat.rs (1)
72-76: Dropunwrap()in concat formatting helpers to avoid panic paths.Lines [75] and [84] use
write!(...).unwrap(). This is avoidable panic surface; keep this path infallible by discarding the result explicitly.♻️ Proposed fix
impl<T: fmt::Display> Wtf8Concat for DisplayAsWtf8<T> { #[inline] fn fmt_wtf8(&self, buf: &mut Wtf8Buf) { - write!(buf, "{}", self.0).unwrap(); + let _ = write!(buf, "{}", self.0); } } macro_rules! impl_wtf8_concat_for_int { ($($t:ty),*) => { $(impl Wtf8Concat for $t { #[inline] fn fmt_wtf8(&self, buf: &mut Wtf8Buf) { - write!(buf, "{}", self).unwrap(); + let _ = write!(buf, "{}", self); } })* }; }As per coding guidelines, "Follow Rust best practices for error handling and memory management".
Also applies to: 81-85
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/wtf8/src/concat.rs` around lines 72 - 76, The concat formatting helpers currently call write!(...).unwrap() inside the Wtf8Concat impls (notably impl<T: fmt::Display> Wtf8Concat for DisplayAsWtf8 -> fmt_wtf8 and the analogous impl around lines 81-85), which can panic; change these to discard the fmt::Result instead of unwrapping (e.g., assign or ignore the return value) so the formatting path is infallible. Locate the write!(buf, "{}", self.0).unwrap() calls inside fmt_wtf8 (and the corresponding fmt_wtf8 in the other Display/Debug wrapper) and replace unwrap() with an explicit ignore of the Result (e.g., let _ = write!(...)) to remove the panic path.crates/vm/src/builtins/str.rs (1)
93-94: Useexpect_str()in allAsRef<str>impls to remove duplication.You already introduced
PyStr::expect_str()on Line 482. Reusing it in theseAsRef<str>impls keeps panic behavior consistent and centralizes future changes.♻️ Proposed refactor
impl AsRef<str> for PyStr { #[track_caller] // <- can remove this once it doesn't panic fn as_ref(&self) -> &str { - self.to_str().expect("str has surrogates") + self.expect_str() } } @@ impl AsRef<str> for Py<PyStr> { #[track_caller] // <- can remove this once it doesn't panic fn as_ref(&self) -> &str { - self.to_str().expect("str has surrogates") + self.expect_str() } } @@ impl AsRef<str> for PyStrRef { #[track_caller] // <- can remove this once it doesn't panic fn as_ref(&self) -> &str { - self.to_str().expect("str has surrogates") + self.expect_str() } } @@ impl AsRef<str> for PyRefExact<PyStr> { #[track_caller] fn as_ref(&self) -> &str { - self.to_str().expect("str has surrogates") + self.expect_str() } } @@ impl AsRef<str> for PyExact<PyStr> { #[track_caller] fn as_ref(&self) -> &str { - self.to_str().expect("str has surrogates") + self.expect_str() } }Also applies to: 100-101, 107-108, 1893-1903, 482-487
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/str.rs` around lines 93 - 94, Replace the direct calls to self.to_str().expect("str has surrogates") inside the AsRef<str> impls with the centralized PyStr::expect_str() helper to remove duplication and keep panic behavior consistent; locate the AsRef<str> impl blocks (where to_str() is currently used) and call self.expect_str() instead, ensuring you use the existing PyStr::expect_str() method rather than duplicating the expect message.
ℹ️ Review info
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (120)
crates/common/src/str.rscrates/stdlib/src/_asyncio.rscrates/stdlib/src/_sqlite3.rscrates/stdlib/src/array.rscrates/stdlib/src/binascii.rscrates/stdlib/src/contextvars.rscrates/stdlib/src/csv.rscrates/stdlib/src/grp.rscrates/stdlib/src/hashlib.rscrates/stdlib/src/json.rscrates/stdlib/src/locale.rscrates/stdlib/src/lzma.rscrates/stdlib/src/multiprocessing.rscrates/stdlib/src/overlapped.rscrates/stdlib/src/posixshmem.rscrates/stdlib/src/pyexpat.rscrates/stdlib/src/pystruct.rscrates/stdlib/src/socket.rscrates/stdlib/src/ssl.rscrates/stdlib/src/unicodedata.rscrates/stdlib/src/zlib.rscrates/vm/src/buffer.rscrates/vm/src/builtins/bool.rscrates/vm/src/builtins/classmethod.rscrates/vm/src/builtins/code.rscrates/vm/src/builtins/complex.rscrates/vm/src/builtins/dict.rscrates/vm/src/builtins/float.rscrates/vm/src/builtins/function.rscrates/vm/src/builtins/function/jit.rscrates/vm/src/builtins/genericalias.rscrates/vm/src/builtins/int.rscrates/vm/src/builtins/interpolation.rscrates/vm/src/builtins/list.rscrates/vm/src/builtins/mappingproxy.rscrates/vm/src/builtins/memory.rscrates/vm/src/builtins/mod.rscrates/vm/src/builtins/module.rscrates/vm/src/builtins/namespace.rscrates/vm/src/builtins/object.rscrates/vm/src/builtins/range.rscrates/vm/src/builtins/set.rscrates/vm/src/builtins/slice.rscrates/vm/src/builtins/staticmethod.rscrates/vm/src/builtins/str.rscrates/vm/src/builtins/template.rscrates/vm/src/builtins/tuple.rscrates/vm/src/builtins/type.rscrates/vm/src/builtins/weakproxy.rscrates/vm/src/bytes_inner.rscrates/vm/src/codecs.rscrates/vm/src/coroutine.rscrates/vm/src/dict_inner.rscrates/vm/src/exception_group.rscrates/vm/src/exceptions.rscrates/vm/src/frame.rscrates/vm/src/function/buffer.rscrates/vm/src/function/fspath.rscrates/vm/src/function/mod.rscrates/vm/src/import.rscrates/vm/src/intern.rscrates/vm/src/macros.rscrates/vm/src/protocol/object.rscrates/vm/src/stdlib/_abc.rscrates/vm/src/stdlib/_winapi.rscrates/vm/src/stdlib/_wmi.rscrates/vm/src/stdlib/ast.rscrates/vm/src/stdlib/ast/basic.rscrates/vm/src/stdlib/ast/python.rscrates/vm/src/stdlib/ast/repr.rscrates/vm/src/stdlib/builtins.rscrates/vm/src/stdlib/codecs.rscrates/vm/src/stdlib/collections.rscrates/vm/src/stdlib/ctypes.rscrates/vm/src/stdlib/ctypes/array.rscrates/vm/src/stdlib/ctypes/base.rscrates/vm/src/stdlib/ctypes/function.rscrates/vm/src/stdlib/ctypes/pointer.rscrates/vm/src/stdlib/ctypes/simple.rscrates/vm/src/stdlib/ctypes/structure.rscrates/vm/src/stdlib/ctypes/union.rscrates/vm/src/stdlib/functools.rscrates/vm/src/stdlib/imp.rscrates/vm/src/stdlib/io.rscrates/vm/src/stdlib/itertools.rscrates/vm/src/stdlib/marshal.rscrates/vm/src/stdlib/msvcrt.rscrates/vm/src/stdlib/nt.rscrates/vm/src/stdlib/operator.rscrates/vm/src/stdlib/os.rscrates/vm/src/stdlib/posix.rscrates/vm/src/stdlib/signal.rscrates/vm/src/stdlib/sre.rscrates/vm/src/stdlib/symtable.rscrates/vm/src/stdlib/sys.rscrates/vm/src/stdlib/thread.rscrates/vm/src/stdlib/time.rscrates/vm/src/stdlib/typevar.rscrates/vm/src/stdlib/typing.rscrates/vm/src/stdlib/winreg.rscrates/vm/src/stdlib/winsound.rscrates/vm/src/suggestion.rscrates/vm/src/types/slot.rscrates/vm/src/types/structseq.rscrates/vm/src/utils.rscrates/vm/src/vm/context.rscrates/vm/src/vm/interpreter.rscrates/vm/src/vm/mod.rscrates/vm/src/vm/vm_new.rscrates/vm/src/warn.rscrates/wasm/src/browser_module.rscrates/wasm/src/convert.rscrates/wasm/src/js_module.rscrates/wasm/src/wasm_builtins.rscrates/wtf8/src/concat.rscrates/wtf8/src/lib.rsexample_projects/wasm32_without_js/rustpython-without-js/src/lib.rssrc/lib.rssrc/shell.rssrc/shell/helper.rs
🚧 Files skipped from review as they are similar to previous changes (40)
- example_projects/wasm32_without_js/rustpython-without-js/src/lib.rs
- crates/vm/src/coroutine.rs
- crates/vm/src/buffer.rs
- crates/vm/src/builtins/genericalias.rs
- crates/stdlib/src/lzma.rs
- crates/vm/src/builtins/function/jit.rs
- crates/vm/src/vm/interpreter.rs
- crates/stdlib/src/multiprocessing.rs
- crates/vm/src/stdlib/ctypes/pointer.rs
- crates/wasm/src/browser_module.rs
- crates/vm/src/builtins/mappingproxy.rs
- crates/vm/src/builtins/float.rs
- crates/vm/src/stdlib/sys.rs
- crates/vm/src/builtins/tuple.rs
- crates/vm/src/stdlib/ctypes.rs
- crates/vm/src/intern.rs
- crates/vm/src/stdlib/ctypes/function.rs
- crates/stdlib/src/unicodedata.rs
- crates/vm/src/dict_inner.rs
- crates/vm/src/builtins/classmethod.rs
- crates/vm/src/builtins/range.rs
- crates/vm/src/builtins/code.rs
- crates/vm/src/builtins/mod.rs
- crates/vm/src/stdlib/itertools.rs
- src/lib.rs
- crates/vm/src/vm/context.rs
- crates/wasm/src/js_module.rs
- crates/vm/src/utils.rs
- crates/vm/src/function/fspath.rs
- crates/vm/src/stdlib/ast.rs
- crates/vm/src/stdlib/signal.rs
- crates/vm/src/stdlib/symtable.rs
- crates/vm/src/builtins/staticmethod.rs
- crates/vm/src/stdlib/_wmi.rs
- src/shell/helper.rs
- crates/vm/src/stdlib/typevar.rs
- crates/wasm/src/wasm_builtins.rs
- crates/stdlib/src/posixshmem.rs
- crates/stdlib/src/array.rs
- crates/vm/src/builtins/bool.rs
| let kwarg_names = kwarg_names.as_slice().iter().map(|pyobj| { | ||
| pyobj | ||
| .downcast_ref::<PyUtf8Str>() | ||
| .unwrap() | ||
| .as_str() | ||
| .to_owned() | ||
| }); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Finding CALL_KW and related decoding code ==="
rg -n --type rust 'CALL_KW|CallKw|with_kwargs_names|kwarg_names' crates/vm/src --max-count=30
echo ""
echo "=== Finding PyCode/CodeType validation and construction ==="
rg -n --type rust 'impl.*PyCode|co_kwarg_names|kwarg.*name|new_code' crates/vm/src --max-count=30Repository: RustPython/RustPython
Length of output: 5951
🏁 Script executed:
#!/bin/bash
echo "=== Looking at the specific file and context around lines 2637-2643 ==="
wc -l crates/vm/src/frame.rsRepository: RustPython/RustPython
Length of output: 164
🏁 Script executed:
#!/bin/bash
echo "=== Reading frame.rs around lines 2631-2644 ==="
sed -n '2625,2650p' crates/vm/src/frame.rs | cat -n
echo ""
echo "=== Reading the CALL_KW instruction handler context ==="
sed -n '935,970p' crates/vm/src/frame.rs | cat -nRepository: RustPython/RustPython
Length of output: 3193
🏁 Script executed:
#!/bin/bash
echo "=== Checking CodeObject structure and construction ==="
sed -n '1,100p' crates/vm/src/builtins/code.rs | cat -n
echo ""
echo "=== Looking for CodeObject or co_kwarg_names definition ==="
rg -n 'struct CodeObject|co_kwarg_names|struct PyCode' crates/vm/src/builtins/code.rs | head -20Repository: RustPython/RustPython
Length of output: 3785
🏁 Script executed:
#!/bin/bash
echo "=== Looking for bytecode compilation/validation of CALL_KW metadata ==="
rg -n 'CALL_KW|CallKw' crates --type rust | grep -i 'compile\|codegen\|bytecode'
echo ""
echo "=== Searching for validation of kwarg names in code construction ==="
rg -n 'validate|check|assert' crates/vm/src/builtins/code.rs | head -20Repository: RustPython/RustPython
Length of output: 1912
🏁 Script executed:
#!/bin/bash
echo "=== Looking at CALL_KW bytecode generation in compile.rs ==="
sed -n '7420,7450p' crates/codegen/src/compile.rs | cat -n
echo ""
echo "=== Finding where kwarg_names tuple is created from keyword arguments ==="
rg -n 'kwarg_names|co_varnames|keyword.*names' crates/codegen/src/compile.rs -B 2 -A 2 | head -60Repository: RustPython/RustPython
Length of output: 3198
🏁 Script executed:
#!/bin/bash
echo "=== Checking how constants/code objects are validated during creation ==="
rg -n 'validate|sanitize|check' crates/vm/src/builtins/code.rs -B 1 -A 3
echo ""
echo "=== Looking for where the kwarg_names tuple is actually constructed/loaded ==="
rg -n 'consts\[|constants\[' crates/vm/src/frame.rsRepository: RustPython/RustPython
Length of output: 1470
🏁 Script executed:
#!/bin/bash
echo "=== Examining the bytecode module to understand how constants are handled ==="
find crates -name 'bytecode.rs' -o -name 'codegen.rs' | head -5
echo ""
echo "=== Looking at what happens with keyword arg names in codegen ==="
rg -n 'self.extend_args|keyword_names|list_comprehension' crates/codegen/src/compile.rs | head -30Repository: RustPython/RustPython
Length of output: 246
🏁 Script executed:
#!/bin/bash
echo "=== Finding the code() constructor implementation ==="
sed -n '429,550p' crates/vm/src/builtins/code.rs | cat -n
echo ""
echo "=== Checking Constructor trait for PyCode ==="
rg -n 'impl Constructor for PyCode' crates/vm/src/builtins/code.rs -A 100 | head -120Repository: RustPython/RustPython
Length of output: 9386
🏁 Script executed:
#!/bin/bash
echo "=== Searching for co_consts validation in code constructor ==="
rg -n 'co_consts|consts.*validate|constants.*check' crates/vm/src/builtins/code.rs -B 2 -A 2
echo ""
echo "=== Checking if code objects can be created with arbitrary consts ==="
rg -n 'PyCodeNewArgs' crates/vm/src/builtins/code.rs -A 30 | head -50Repository: RustPython/RustPython
Length of output: 2761
🏁 Script executed:
#!/bin/bash
echo "=== Checking if there are any tests for malformed code objects ==="
rg -n 'PyCodeNewArgs|test.*code|invalid.*consts' crates --type rust | head -20
echo ""
echo "=== Verifying the exact error handling pattern used elsewhere for tuple elements ==="
rg -n 'downcast.*unwrap\|downcast_ref.*unwrap' crates/vm/src/frame.rs | head -10Repository: RustPython/RustPython
Length of output: 1003
Replace .unwrap() with fallible error handling to avoid VM panic on malformed CALL_KW metadata.
Line 2639 uses downcast_ref::<PyUtf8Str>().unwrap(), which panics if the kwarg_names tuple contains non-string values. Although the compiler ensures kwarg_names are strings at compile time, the code() constructor at lines 517-527 in crates/vm/src/builtins/code.rs accepts arbitrary consts without validation—unlike names, varnames, cellvars, and freevars which are validated as strings. A crafted code object could therefore cause a panic instead of raising a Python exception.
Either validate consts during code object construction (following the pattern of other tuple fields) or replace the unwrap with a fallible path that returns a proper Python exception. This aligns with Rust error handling best practices.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/frame.rs` around lines 2637 - 2643, The code in frame.rs maps
kwarg_names using downcast_ref::<PyUtf8Str>().unwrap(), which will panic on
non-string entries; replace that unwrap with fallible handling that raises a
Python exception instead of unwinding the VM. Locate the kwarg_names iterator in
frame.rs and change the downcast path to check downcast_ref and, on None, return
a PyErr (or convert the mapping to a Result and propagate an Err) with a clear
message about malformed CALL_KW metadata; alternatively validate consts at
construction in crates/vm/src/builtins/code.rs to prevent non-string entries
(mirror the validation used for names/varnames/cellvars/freevars) so
downcast_ref::<PyUtf8Str>() cannot fail. Ensure the chosen approach uses the
VM's error-return convention (no panics) and references kwarg_names and
downcast_ref::<PyUtf8Str>() for locating the fix.
| } | ||
|
|
||
| #[pygetset] | ||
| fn __qualname__(&self, vm: &VirtualMachine) -> PyResult { |
crates/vm/src/builtins/str.rs
Outdated
| self.data.as_str() | ||
| } | ||
|
|
||
| /// Returns `&str`, panicking if the string contains surrogates. |
There was a problem hiding this comment.
| /// Returns `&str`, panicking if the string contains surrogates. | |
| /// Returns [`&str`]. | |
| /// | |
| /// # Panic | |
| /// If the string contains surrogates. |
📦 Library DependenciesThe following Lib/ modules were modified. Here are their dependencies: [x] test: cpython/Lib/test/test_tstring.py (TODO: 4) dependencies: dependent tests: (no tests depend on tstring) Legend:
|
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
crates/vm/src/stdlib/codecs.rs (1)
93-104:⚠️ Potential issue | 🟡 MinorAdd embedded-NUL validation in
register_errorfor API consistency.
lookup_errorand_unregister_errorreject names containing'\0', butregister_errorcurrently accepts them. That can register names that cannot be looked up/unregistered via this API surface.Suggested patch
fn register_error( name: PyUtf8StrRef, handler: PyObjectRef, vm: &VirtualMachine, ) -> PyResult<()> { + if name.as_str().contains('\0') { + return Err(cstring_error(vm)); + } if !handler.is_callable() { return Err(vm.new_type_error("handler must be callable".to_owned())); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/codecs.rs` around lines 93 - 104, register_error currently allows names with embedded NULs, which breaks consistency with lookup_error and _unregister_error; add the same validation in register_error by checking name.as_str().contains('\0') and returning a VM ValueError (e.g. via vm.new_value_error with the same message used by lookup_error/_unregister_error) before calling register_error on codec_registry, and keep the existing callable check and rest of the function unchanged.crates/stdlib/src/ssl.rs (2)
1846-1868:⚠️ Potential issue | 🟡 MinorUse
validate_hostname()in_wrap_socketto avoid validation drift.
_wrap_socketmanually reimplements hostname checks, while_wrap_biocallsvalidate_hostname(). The current_wrap_socketpath misses the max-length check and can diverge further over time.♻️ Suggested simplification
- Some(hostname_str) => { - let hostname = hostname_str.as_str(); - - // Validate hostname - if hostname.is_empty() { - return Err(vm.new_value_error("server_hostname cannot be an empty string")); - } - - // Check if it starts with a dot - if hostname.starts_with('.') { - return Err(vm.new_value_error("server_hostname cannot start with a dot")); - } - - // IP addresses are allowed - // SNI will not be sent for IP addresses - - // Check for NULL bytes - if hostname.contains('\0') { - return Err(vm.new_type_error("embedded null character")); - } - - Some(hostname.to_string()) - } + Some(hostname_str) => { + let hostname = hostname_str.as_str(); + validate_hostname(hostname, vm)?; + Some(hostname.to_string()) + }As per coding guidelines: "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/ssl.rs` around lines 1846 - 1868, _wrap_socket currently reimplements hostname validation on args.server_hostname (checking empty, leading dot, embedded NUL) while _wrap_bio uses validate_hostname(), causing possible drift and missing the max-length check; update _wrap_socket to call validate_hostname(vm, args.server_hostname.into_option().flatten()) (or call validate_hostname with the extracted Option<String>) and use its returned validated hostname (or propagate its error) instead of the inline checks, removing the duplicated validation logic so both _wrap_socket and _wrap_bio share the same validation path.
1089-1103:⚠️ Potential issue | 🟡 MinorPreserve Unicode conversion errors instead of falling through to generic type errors.
These blocks use
if let Ok(PyUtf8StrRef::try_from_object(...))and then fall back to bytes/type checks. For Pythonstrvalues that fail UTF-8 conversion, this can mask the real conversion error and return an incorrect “must be str or bytes”/genericTypeError.💡 Suggested fix pattern
- if let Ok(pwd_str) = PyUtf8StrRef::try_from_object(vm, p.clone()) { - Ok((Some(pwd_str.as_str().to_owned()), None)) + if let Ok(pwd_str) = PyStrRef::try_from_object(vm, p.clone()) { + let pwd_str = pwd_str.try_into_utf8(vm)?; + Ok((Some(pwd_str.as_str().to_owned()), None)) } else if let Ok(pwd_bytes_like) = ArgBytesLike::try_from_object(vm, p.clone()) { ... }Apply the same pattern in
load_dh_params,set_ecdh_curve, and password-callback handling so str-conversion errors propagate accurately.Also applies to: 1750-1757, 1805-1812, 2022-2040
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/ssl.rs` around lines 1089 - 1103, The code handling password_from_callable (and similar blocks in load_dh_params, set_ecdh_curve, and other password-callback handling sites) currently uses PyUtf8StrRef::try_from_object and falls through to bytes/type checks, which swallows UTF-8 conversion errors; change the control flow so that after attempting PyUtf8StrRef::try_from_object you explicitly propagate any UTF-8 conversion error (return Err with the specific conversion error) instead of treating it as a generic non-str value—i.e., distinguish between “object is str but conversion failed” versus “object is not str” and only fall back to ArgBytesLike::try_from_object when the object is not a str; ensure the same pattern is applied to the other occurrences (lines referenced around 1750-1757, 1805-1812, 2022-2040) so conversion failures are returned unchanged rather than masked by a TypeError about expected str/bytes.
♻️ Duplicate comments (3)
crates/vm/src/vm/mod.rs (1)
638-659: Separator aftermsgnow correctly added — LGTM.The
"{msg}: "format string properly separates the message from the subsequent repr output, addressing the prior concern about concatenated output.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/vm/mod.rs` around lines 638 - 659, The msg string formatting has been fixed by adding a separator; ensure the branch that constructs msg_str uses format!("{msg}: ") (the variable is msg_str) before calling write_to_stderr, and that the subsequent repr is obtained via object.repr(self) and written using write_to_stderr with repr_wtf8, so no further change is required other than keeping the current code path intact (msg_str, write_to_stderr, object.repr, repr_wtf8).crates/vm/src/builtins/str.rs (1)
1392-1394: Non-KeyErrorexceptions now propagate correctly.The
KeyErrorguard is in place and the prior review concern is fully resolved.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/str.rs` around lines 1392 - 1394, The match handling in crates/vm/src/builtins/str.rs correctly guards KeyError with Err(e) if e.fast_isinstance(vm.ctx.exceptions.key_error) => translated.push(cp) and ensures all other Err(e) cases propagate via Err(e) => return Err(e); no code changes are required—leave the existing arms in the match (the KeyError guard and the generic Err propagation) as-is.crates/stdlib/src/hashlib.rs (1)
221-229: Surrogate-string path now correctly falls through toUnsupportedDigestmodError.The refactor from
PyUtf8StrRef::try_from_object(vm, digestmod.clone())?todowncast_ref::<PyStr>() && name.to_str()is the right fix:to_str()returnsNonefor surrogate-containing strings, causing them to fall through all branches and hitUnsupportedDigestmodErroras intended. The__name__attribute path (lines 226–231) is consistently updated with the same pattern.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/hashlib.rs` around lines 221 - 229, The surrogate-containing digestmod handling must consistently use downcast_ref::<PyStr>() plus name.to_str() so surrogate strings return None and fall through to UnsupportedDigestmodError; update the __name__-attribute branch (the code that calls digestmod.get_attr("__name__", vm) and inspects name_obj) to downcast to PyStr and call to_str() and then strip_prefix("openssl_") exactly like the primary digestmod branch (referencing digestmod, PyStr, to_str, get_attr("__name__", vm), strip_prefix and UnsupportedDigestmodError) so both paths behave identically for surrogate-containing names.
🧹 Nitpick comments (5)
crates/vm/src/stdlib/ctypes/simple.rs (1)
121-121:code_points().count() == 1scans the whole string; prefer the O(1) iterator pattern already used invalue_to_bytes_endian.Lines 121 and 334 use
s.as_wtf8().code_points().count() == 1, which traverses every code point. Line 780 already uses the efficient destructuring check. For consistency and to avoid surprising O(n) behaviour on adversarially long inputs:♻️ Proposed refactor (lines 120-125 and 333-338)
- if let Some(s) = value.downcast_ref::<PyStr>() { - if s.as_wtf8().code_points().count() == 1 { - Ok(value.to_owned()) - } else { - Err(vm.new_type_error("one character unicode string expected")) - } - } else { + if let Some(s) = value.downcast_ref::<PyStr>() { + let mut cps = s.as_wtf8().code_points(); + if cps.next().is_some() && cps.next().is_none() { + Ok(value.to_owned()) + } else { + Err(vm.new_type_error("one character unicode string expected")) + } + } else {Same fix applies to line 334:
- Some("u") => { - if let Some(s) = value.downcast_ref::<PyStr>() - && s.as_wtf8().code_points().count() == 1 - { + Some("u") => { + if let Some(s) = value.downcast_ref::<PyStr>() { + let mut cps = s.as_wtf8().code_points(); + if cps.next().is_some() && cps.next().is_none() {Also applies to: 334-334
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/ctypes/simple.rs` at line 121, Replace the O(n) check s.as_wtf8().code_points().count() == 1 with the O(1) iterator-destructuring pattern used in value_to_bytes_endian (and at the code around line 780): create a mutable iterator let mut it = s.as_wtf8().code_points(); then check if it.next().is_some() and it.next().is_none() (or use if let (Some(cp), None) = (it.next(), it.next()) to capture the single code point), and use that single code point branch instead of counting all code points; apply this change for both occurrences of s.as_wtf8().code_points().count() == 1.crates/vm/src/vm/mod.rs (2)
1149-1149: Redundantuse crate::protocol::TraceEventinsideresume_gen_frame.
TraceEventis already imported at module level (line 37). The localuseon line 1149 is unnecessary.♻️ Remove the redundant import
- use crate::protocol::TraceEvent; match self.trace_event(TraceEvent::Call, None) {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/vm/mod.rs` at line 1149, The local declaration use crate::protocol::TraceEvent inside the function resume_gen_frame is redundant because TraceEvent is already imported at module scope; remove that local use statement from resume_gen_frame and ensure no other local shadowing occurs so the module-level TraceEvent is used consistently.
872-889: Inconsistent key usage between Pass 1 and Pass 2 ofmodule_clear_dict.In Pass 1 (line 875),
set_itemis called withkey_str(the&Py<PyStr>reference), while in Pass 2 (line 888), it's called withkey_str.as_wtf8(). Similarly, the__builtins__guard usesname != "__builtins__"(Wtf8 comparison) in Pass 1 butkey_str.as_bytes() != b"__builtins__"(byte comparison) in Pass 2. Both work for ASCII, but the inconsistency reduces clarity.♻️ Suggestion: unify the two passes
// Pass 1: names starting with '_' (except __builtins__) for (key, value) in dict.into_iter().collect::<Vec<_>>() { if vm.is_none(&value) { continue; } if let Some(key_str) = key.downcast_ref::<PyStr>() { let name = key_str.as_wtf8(); if name.starts_with("_") && name != "__builtins__" { let _ = dict.set_item(key_str, none.clone(), vm); } } } // Pass 2: all remaining (except __builtins__) for (key, value) in dict.into_iter().collect::<Vec<_>>() { if vm.is_none(&value) { continue; } if let Some(key_str) = key.downcast_ref::<PyStr>() - && key_str.as_bytes() != b"__builtins__" + && key_str.as_wtf8() != "__builtins__" { - let _ = dict.set_item(key_str.as_wtf8(), none.clone(), vm); + let _ = dict.set_item(key_str, none.clone(), vm); } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/vm/mod.rs` around lines 872 - 889, module_clear_dict uses inconsistent key types between the two passes: Pass 1 uses the PyStr reference (key_str) and compares via Wtf8, while Pass 2 uses byte slices and key_str.as_wtf8() for set_item; unify them by consistently converting the PyStr to a single representation (e.g., let name = key_str.as_wtf8()) in both passes, use name for the "__builtins__" comparison (name != "__builtins__"), and pass name to dict.set_item in both places so the key handling and comparisons are identical across Pass 1 and Pass 2.crates/vm/src/builtins/str.rs (2)
2123-2129: Simplify to matchPyUtf8Str::as_str()style.The
unwrap_or_else+debug_assert!+unreachable_unchecked()triple is more complex than the equivalent pattern already used inPyUtf8Str::as_str()at line 2088. Prefer consistency:♻️ Proposed simplification
- pub fn as_str(&self) -> &str { - self.as_pystr().to_str().unwrap_or_else(|| { - debug_assert!(false, "PyUtf8Str invariant violated"); - // Safety: PyUtf8Str guarantees valid UTF-8 - unsafe { core::hint::unreachable_unchecked() } - }) - } + pub fn as_str(&self) -> &str { + debug_assert!( + self.as_pystr().is_utf8(), + "PyUtf8Str invariant violated: inner string is not valid UTF-8" + ); + // Safety: PyUtf8Str guarantees valid UTF-8 + unsafe { self.as_pystr().to_str().unwrap_unchecked() } + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/str.rs` around lines 2123 - 2129, The as_str implementation uses an unwrap_or_else + debug_assert! + unreachable_unchecked() pattern; change it to the simpler, consistent style used by PyUtf8Str::as_str(): call self.as_pystr().to_str() and unwrap/expect directly with the same invariant message (or the exact handling used in PyUtf8Str::as_str()) so the code in as_str matches PyUtf8Str::as_str() behavior and removes the unnecessary closure and unsafe unreachable_unchecked usage.
2132-2143: Misleading method name:into_wtf8returnsPyStrRef, not aWtf8type.
into_wtf8implies the result isWtf8Bufor&Wtf8. The actual return type isPyStrRef(PyRef<PyStr>). Rename tointo_pystrfor consistency with thePy<PyUtf8Str>::as_pystr()method established at line 2114. This requires updates across 13 call sites in 7 files (str.rs, imp.rs, io.rs, codecs.rs, socket.rs, json.rs, type.rs).♻️ Proposed rename
impl PyRef<PyUtf8Str> { - /// Convert to PyStrRef. Safe because PyUtf8Str is a subtype of PyStr. - pub fn into_wtf8(self) -> PyStrRef { + /// Convert to PyStrRef. Safe because PyUtf8Str is a subtype of PyStr. + pub fn into_pystr(self) -> PyStrRef { unsafe { mem::transmute::<Self, PyStrRef>(self) } } } impl From<PyRef<PyUtf8Str>> for PyRef<PyStr> { fn from(s: PyRef<PyUtf8Str>) -> Self { - s.into_wtf8() + s.into_pystr() } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/str.rs` around lines 2132 - 2143, The method name into_wtf8 on impl PyRef<PyUtf8Str> is misleading because it returns PyStrRef (PyRef<PyStr>); rename the method to into_pystr to match the existing Py<PyUtf8Str>::as_pystr() naming and update its definition (impl PyRef<PyUtf8Str>::into_wtf8 -> into_pystr) and every call site that uses into_wtf8 across the codebase (str.rs, imp.rs, io.rs, codecs.rs, socket.rs, json.rs, type.rs) so they call into_pystr instead; ensure the signature still returns PyStrRef and keep the unsafe mem::transmute implementation unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/stdlib/src/hashlib.rs`:
- Line 82: The argument parsing currently uses PyUtf8StrRef in FromArgs (e.g.,
the parameter with name: PyUtf8StrRef) which calls ensure_valid_utf8 and raises
UnicodeEncodeError for surrogate-containing names; to make behavior consistent
with resolve_digestmod which uses downcast_ref::<PyStr>() and .to_str() and maps
surrogates to UnsupportedDigestmodError/new_value_error, switch the relevant
FromArgs parameter type from PyUtf8StrRef to PyStrRef and defer UTF-8 validation
by calling .to_str() inside the function body where you can map Err to
vm.new_value_error(...) (matching resolve_digestmod and CPython behavior),
ensuring both argument parsing and resolve_digestmod follow the same error path.
In `@crates/vm/src/builtins/namespace.rs`:
- Around line 159-161: The current repr builder in SimpleNamespace drops keys
whose key_str.as_wtf8().is_empty() — remove that early-continue so empty-string
keys are treated like any other key and rendered (producing e.g. namespace(=1)
for a key ""), i.e., delete the empty-key skip in the block around
key_str.as_wtf8() and ensure the code that appends key and value still runs for
empty keys so the empty-name + '=' + value appears in the output.
In `@crates/vm/src/builtins/type.rs`:
- Around line 1564-1567: The slot-name mangling path currently calls
member.to_str().expect(...), which can panic; change it to handle the UTF-8
conversion fallibly and return a Python error instead. Replace
member.to_str().expect(...) with a fallible conversion (e.g.,
member.to_str().map_err(|_| vm.new_type_error("slot name is not valid UTF-8"))?)
and propagate the error (using ?), then pass the obtained &str into mangle_name
(keeps class_name and mangle_name unchanged); ensure the surrounding function
signature returns a PyResult or the appropriate error type so the propagated
error is returned to Python instead of aborting.
- Line 1883: The delete-miss error path still calls attr_name.as_str() which can
panic for surrogate-containing (non-UTF8) names; instead build the error message
using a WTF-8-safe representation (e.g., use the existing attr_name.as_wtf8() /
the interned string from vm.ctx.intern_str(...) or otherwise escape surrogate
codepoints) so formatting never calls as_str(); update the code paths in the
delete-miss handling that currently reference attr_name.as_str() to use the
WTF-8-safe conversion/escaped form (referencing attr_name, vm.ctx.intern_str,
attr_name.as_wtf8(), and the delete-miss error construction) to eliminate the
panic.
In `@crates/vm/src/vm/mod.rs`:
- Around line 810-812: Replace the failing UTF-8-only downcast with a WTF-8-safe
conversion: where you currently call key.downcast_ref::<PyUtf8Str>().map(|s|
s.as_str().to_owned()).unwrap_or_default(), change it to use PyStr
(key.downcast_ref::<PyStr>()) and convert with .as_wtf8().to_string() (or
.to_string_lossy()) so non-UTF-8 names are preserved; update the expression that
assigns name accordingly (keep the unwrap_or_default fallback).
---
Outside diff comments:
In `@crates/stdlib/src/ssl.rs`:
- Around line 1846-1868: _wrap_socket currently reimplements hostname validation
on args.server_hostname (checking empty, leading dot, embedded NUL) while
_wrap_bio uses validate_hostname(), causing possible drift and missing the
max-length check; update _wrap_socket to call validate_hostname(vm,
args.server_hostname.into_option().flatten()) (or call validate_hostname with
the extracted Option<String>) and use its returned validated hostname (or
propagate its error) instead of the inline checks, removing the duplicated
validation logic so both _wrap_socket and _wrap_bio share the same validation
path.
- Around line 1089-1103: The code handling password_from_callable (and similar
blocks in load_dh_params, set_ecdh_curve, and other password-callback handling
sites) currently uses PyUtf8StrRef::try_from_object and falls through to
bytes/type checks, which swallows UTF-8 conversion errors; change the control
flow so that after attempting PyUtf8StrRef::try_from_object you explicitly
propagate any UTF-8 conversion error (return Err with the specific conversion
error) instead of treating it as a generic non-str value—i.e., distinguish
between “object is str but conversion failed” versus “object is not str” and
only fall back to ArgBytesLike::try_from_object when the object is not a str;
ensure the same pattern is applied to the other occurrences (lines referenced
around 1750-1757, 1805-1812, 2022-2040) so conversion failures are returned
unchanged rather than masked by a TypeError about expected str/bytes.
In `@crates/vm/src/stdlib/codecs.rs`:
- Around line 93-104: register_error currently allows names with embedded NULs,
which breaks consistency with lookup_error and _unregister_error; add the same
validation in register_error by checking name.as_str().contains('\0') and
returning a VM ValueError (e.g. via vm.new_value_error with the same message
used by lookup_error/_unregister_error) before calling register_error on
codec_registry, and keep the existing callable check and rest of the function
unchanged.
---
Duplicate comments:
In `@crates/stdlib/src/hashlib.rs`:
- Around line 221-229: The surrogate-containing digestmod handling must
consistently use downcast_ref::<PyStr>() plus name.to_str() so surrogate strings
return None and fall through to UnsupportedDigestmodError; update the
__name__-attribute branch (the code that calls digestmod.get_attr("__name__",
vm) and inspects name_obj) to downcast to PyStr and call to_str() and then
strip_prefix("openssl_") exactly like the primary digestmod branch (referencing
digestmod, PyStr, to_str, get_attr("__name__", vm), strip_prefix and
UnsupportedDigestmodError) so both paths behave identically for
surrogate-containing names.
In `@crates/vm/src/builtins/str.rs`:
- Around line 1392-1394: The match handling in crates/vm/src/builtins/str.rs
correctly guards KeyError with Err(e) if
e.fast_isinstance(vm.ctx.exceptions.key_error) => translated.push(cp) and
ensures all other Err(e) cases propagate via Err(e) => return Err(e); no code
changes are required—leave the existing arms in the match (the KeyError guard
and the generic Err propagation) as-is.
In `@crates/vm/src/vm/mod.rs`:
- Around line 638-659: The msg string formatting has been fixed by adding a
separator; ensure the branch that constructs msg_str uses format!("{msg}: ")
(the variable is msg_str) before calling write_to_stderr, and that the
subsequent repr is obtained via object.repr(self) and written using
write_to_stderr with repr_wtf8, so no further change is required other than
keeping the current code path intact (msg_str, write_to_stderr, object.repr,
repr_wtf8).
---
Nitpick comments:
In `@crates/vm/src/builtins/str.rs`:
- Around line 2123-2129: The as_str implementation uses an unwrap_or_else +
debug_assert! + unreachable_unchecked() pattern; change it to the simpler,
consistent style used by PyUtf8Str::as_str(): call self.as_pystr().to_str() and
unwrap/expect directly with the same invariant message (or the exact handling
used in PyUtf8Str::as_str()) so the code in as_str matches PyUtf8Str::as_str()
behavior and removes the unnecessary closure and unsafe unreachable_unchecked
usage.
- Around line 2132-2143: The method name into_wtf8 on impl PyRef<PyUtf8Str> is
misleading because it returns PyStrRef (PyRef<PyStr>); rename the method to
into_pystr to match the existing Py<PyUtf8Str>::as_pystr() naming and update its
definition (impl PyRef<PyUtf8Str>::into_wtf8 -> into_pystr) and every call site
that uses into_wtf8 across the codebase (str.rs, imp.rs, io.rs, codecs.rs,
socket.rs, json.rs, type.rs) so they call into_pystr instead; ensure the
signature still returns PyStrRef and keep the unsafe mem::transmute
implementation unchanged.
In `@crates/vm/src/stdlib/ctypes/simple.rs`:
- Line 121: Replace the O(n) check s.as_wtf8().code_points().count() == 1 with
the O(1) iterator-destructuring pattern used in value_to_bytes_endian (and at
the code around line 780): create a mutable iterator let mut it =
s.as_wtf8().code_points(); then check if it.next().is_some() and
it.next().is_none() (or use if let (Some(cp), None) = (it.next(), it.next()) to
capture the single code point), and use that single code point branch instead of
counting all code points; apply this change for both occurrences of
s.as_wtf8().code_points().count() == 1.
In `@crates/vm/src/vm/mod.rs`:
- Line 1149: The local declaration use crate::protocol::TraceEvent inside the
function resume_gen_frame is redundant because TraceEvent is already imported at
module scope; remove that local use statement from resume_gen_frame and ensure
no other local shadowing occurs so the module-level TraceEvent is used
consistently.
- Around line 872-889: module_clear_dict uses inconsistent key types between the
two passes: Pass 1 uses the PyStr reference (key_str) and compares via Wtf8,
while Pass 2 uses byte slices and key_str.as_wtf8() for set_item; unify them by
consistently converting the PyStr to a single representation (e.g., let name =
key_str.as_wtf8()) in both passes, use name for the "__builtins__" comparison
(name != "__builtins__"), and pass name to dict.set_item in both places so the
key handling and comparisons are identical across Pass 1 and Pass 2.
ℹ️ Review info
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Lib/test/test_tstring.pyis excluded by!Lib/**
📒 Files selected for processing (14)
crates/stdlib/src/hashlib.rscrates/stdlib/src/socket.rscrates/stdlib/src/ssl.rscrates/vm/src/builtins/module.rscrates/vm/src/builtins/namespace.rscrates/vm/src/builtins/str.rscrates/vm/src/builtins/type.rscrates/vm/src/intern.rscrates/vm/src/macros.rscrates/vm/src/stdlib/builtins.rscrates/vm/src/stdlib/codecs.rscrates/vm/src/stdlib/ctypes/simple.rscrates/vm/src/vm/mod.rscrates/vm/src/vm/vm_new.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- crates/vm/src/macros.rs
- crates/vm/src/intern.rs
| if key_str.as_wtf8().is_empty() { | ||
| continue; | ||
| } |
There was a problem hiding this comment.
Do not skip empty string keys in SimpleNamespace.__repr__.
At Line 159, empty string keys are currently dropped, but they are valid string keys and should still appear in repr output (e.g., namespace(=1) in CPython-compatible behavior). This is a small but real behavior regression.
💡 Proposed fix
- if key_str.as_wtf8().is_empty() {
- continue;
- }
if !first {
result.push_str(", ");
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/builtins/namespace.rs` around lines 159 - 161, The current repr
builder in SimpleNamespace drops keys whose key_str.as_wtf8().is_empty() —
remove that early-continue so empty-string keys are treated like any other key
and rendered (producing e.g. namespace(=1) for a key ""), i.e., delete the
empty-key skip in the block around key_str.as_wtf8() and ensure the code that
appends key and value still runs for empty keys so the empty-name + '=' + value
appears in the output.
| .downcast_ref::<PyUtf8Str>() | ||
| .map(|s| s.as_str().to_owned()) | ||
| .unwrap_or_default(); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check how PyUtf8Str is defined and whether PyStr keys would downcast to it
ast-grep --pattern 'struct PyUtf8Str { $$$ }'
echo "---"
# Check if PyStr and PyUtf8Str are separate types or if PyStr contains PyUtf8Str
rg -n 'pub struct PyUtf8Str' --type=rs -C5
echo "---"
# Check downcast_ref implementations
rg -n 'impl.*PyPayload.*for.*PyUtf8Str' --type=rs -C3Repository: RustPython/RustPython
Length of output: 143
🏁 Script executed:
sed -n '800,820p' crates/vm/src/vm/mod.rsRepository: RustPython/RustPython
Length of output: 757
🏁 Script executed:
rg -n 'pub struct PyStr' --type=rs -C3Repository: RustPython/RustPython
Length of output: 91
🏁 Script executed:
# Find PyUtf8Str definition
rg -n 'pub struct PyUtf8Str' -C5Repository: RustPython/RustPython
Length of output: 659
🏁 Script executed:
# Find PyStr definition
rg -n 'pub struct PyStr' -C5Repository: RustPython/RustPython
Length of output: 1996
🏁 Script executed:
# Check the impl blocks for these types
rg -n 'impl.*PyUtf8Str' -C2Repository: RustPython/RustPython
Length of output: 8460
🏁 Script executed:
# Search for PyPayload implementations
rg -n 'PyPayload.*for.*PyUtf8Str' -C3Repository: RustPython/RustPython
Length of output: 443
🏁 Script executed:
# Find PyStr methods for string conversion
rg -n 'impl PyStr' -A 20 crates/vm/src/builtins/str.rs | head -100Repository: RustPython/RustPython
Length of output: 3288
🏁 Script executed:
# Look for as_str, as_wtf8, to_str methods on PyStr
rg -n 'pub fn as_str|pub fn as_wtf8|pub fn to_str' crates/vm/src/builtins/str.rs | head -20Repository: RustPython/RustPython
Length of output: 382
🏁 Script executed:
# Check actual usage patterns in the codebase - how is PyStr typically converted to string?
rg -n 'downcast_ref::<PyStr>' --max-count=10Repository: RustPython/RustPython
Length of output: 9458
Use downcast_ref::<PyStr>() with proper WTF-8 handling instead of PyUtf8Str.
Dictionary keys in sys.modules are PyStr, which can contain WTF-8 surrogates. The current downcast_ref::<PyUtf8Str>() will fail for any non-UTF-8 keys and silently convert them to empty strings via unwrap_or_default(), preventing their weakrefs from being recorded during finalization.
Change to:
let name = key
.downcast_ref::<PyStr>()
.map(|s| s.as_wtf8().to_string())
.unwrap_or_default();Or use another safe conversion method like to_string_lossy(). This pattern is already used elsewhere in the codebase (e.g., type.rs:1968).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/vm/mod.rs` around lines 810 - 812, Replace the failing
UTF-8-only downcast with a WTF-8-safe conversion: where you currently call
key.downcast_ref::<PyUtf8Str>().map(|s|
s.as_str().to_owned()).unwrap_or_default(), change it to use PyStr
(key.downcast_ref::<PyStr>()) and convert with .as_wtf8().to_string() (or
.to_string_lossy()) so non-UTF-8 names are preserved; update the expression that
assigns name accordingly (keep the unwrap_or_default fallback).
- Remove as_str() from PyStr/Py<PyStr> (was panicking on surrogates) - Add Wtf8Concat trait and concat! macro for WTF-8 formatting - Add impl From<&str> for &Wtf8 conversion - Add AsPyStr/DictKey impls for PyUtf8Str types - Migrate all call sites to as_wtf8(), to_str(), or PyUtf8Str - Fix exception message APIs to accept Wtf8Buf - Deduplicate inner-scope imports across modules
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
crates/vm/src/codecs.rs (1)
114-136:⚠️ Potential issue | 🟡 MinorInconsistent
errorsparameter type withencode/decodemethods.The
get_incremental_encoderandget_incremental_decodermethods useOption<PyStrRef>for theerrorsparameter, whileencodeanddecodehave been updated toOption<PyUtf8StrRef>. Since all three method groups handle the same error handler names (e.g., "strict", "ignore"), they should use the same type for consistency.Update both methods to use
Option<PyUtf8StrRef>with the same conversion pattern asencodeanddecode:Suggested fix
pub fn get_incremental_encoder( &self, - errors: Option<PyStrRef>, + errors: Option<PyUtf8StrRef>, vm: &VirtualMachine, ) -> PyResult { let args = match errors { - Some(e) => vec![e.into()], + Some(e) => vec![e.into_wtf8().into()], None => vec![], }; vm.call_method(self.0.as_object(), "incrementalencoder", args) } pub fn get_incremental_decoder( &self, - errors: Option<PyStrRef>, + errors: Option<PyUtf8StrRef>, vm: &VirtualMachine, ) -> PyResult { let args = match errors { - Some(e) => vec![e.into()], + Some(e) => vec![e.into_wtf8().into()], None => vec![], }; vm.call_method(self.0.as_object(), "incrementaldecoder", args) }Note: Call sites in
crates/vm/src/stdlib/io.rs(lines 2923, 2956) will need type alignment updates when this change is made.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/codecs.rs` around lines 114 - 136, Change the errors parameter type for get_incremental_encoder and get_incremental_decoder from Option<PyStrRef> to Option<PyUtf8StrRef> and mirror the same conversion pattern used by encode/decode: convert the Option<PyUtf8StrRef> into the vm.call_method args (e.g., map Some(e) to vec![e.into()] or the exact conversion used by encode/decode) so the error-handler names use the same UTF-8 validated type; also update call sites in crates/vm/src/stdlib/io.rs that invoke these methods to pass Option<PyUtf8StrRef>.crates/stdlib/src/csv.rs (1)
141-146:⚠️ Potential issue | 🟠 MajorCharacter validation is byte-based, so valid 1-char Unicode inputs can be rejected.
These branches validate “single character” using byte iteration (
as_bytes().exactly_one()), not character count. A single non-ASCII char (multi-byte UTF-8) fails this check incorrectly.💡 Suggested direction
- s.as_bytes().iter().copied().exactly_one() + let ch = s + .try_as_utf8(vm)? + .chars() + .exactly_one() + .map_err(|_| vm.new_type_error(r#""<field>" must be a 1-character string"#))?; + let b = { + let mut tmp = [0u8; 4]; + let encoded = ch.encode_utf8(&mut tmp); + if encoded.len() != 1 { + return Err(vm.new_type_error(r#""<field>" must be a single-byte character"#)); + } + encoded.as_bytes()[0] + };Apply the same pattern consistently for delimiter/quotechar/escapechar parsing.
Also applies to: 162-164, 183-188, 582-586, 628-633
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/csv.rs` around lines 141 - 146, The code currently validates single characters by checking bytes (s.as_bytes().iter().copied().exactly_one()), which rejects valid multi-byte Unicode characters; replace those byte-based checks with character-based checks using s.chars().exactly_one().map_err(...) so you extract a char (not a u8) and return that char where delimiter/quotechar/escapechar are parsed; apply this same change to all similar spots (the branches around the snippets at the other mentioned occurrences) so delimiter, quotechar and escapechar use s.chars().exactly_one() and the error message remains the same.crates/stdlib/src/_asyncio.rs (1)
842-851: 🛠️ Refactor suggestion | 🟠 MajorInconsistency:
PyFuturestill usesrepr_strwhilePyTaskusesrepr_wtf8.The AI summary claims
PyFuturewas updated torepr_wtf8, but the code still showsrepr_strreturningString. Meanwhile,PyTask(line 1940) implementsrepr_wtf8returningWtf8Buf. For consistency with the WTF-8 migration and to properly handle surrogate-containing data,PyFutureshould also implementrepr_wtf8.♻️ Suggested update for consistency
impl Representable for PyFuture { - fn repr_str(zelf: &Py<Self>, vm: &VirtualMachine) -> PyResult<String> { + fn repr_wtf8(zelf: &Py<Self>, vm: &VirtualMachine) -> PyResult<Wtf8Buf> { let class_name = zelf.class().name().to_string(); if let Some(_guard) = ReprGuard::enter(vm, zelf.as_object()) { let info = get_future_repr_info(zelf.as_object(), vm)?; - Ok(format!("<{} {}>", class_name, info)) + Ok(wtf8_concat!("<", class_name, " ", info, ">")) } else { - Ok(format!("<{} ...>", class_name)) + Ok(Wtf8Buf::from(format!("<{} ...>", class_name))) } } },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/_asyncio.rs` around lines 842 - 851, The PyFuture Representable impl currently defines repr_str returning String; change it to implement repr_wtf8 to match PyTask and the WTF-8 migration: replace the repr_str(&zelf, vm) -> PyResult<String> with repr_wtf8(&zelf, vm) -> PyResult<Wtf8Buf>, keep the same ReprGuard usage and call to get_future_repr_info(zelf.as_object(), vm) to produce the inner text, and convert that text into a Wtf8Buf for the Ok(...) result; remove the old repr_str implementation and ensure Wtf8Buf is imported/available in the module.
♻️ Duplicate comments (15)
src/shell.rs (1)
145-146:⚠️ Potential issue | 🔴 CriticalAvoid panic-prone prompt extraction in the REPL loop.
Line 145 still uses
expect_str(), which can panic for surrogate-containingsys.ps1/sys.ps2values and reintroduces the crash this PR is meant to eliminate.Suggested fix
- let prompt = match prompt { - Ok(ref s) => s.expect_str(), - Err(_) => "", - }; + let prompt = match prompt { + Ok(ref s) => s.to_str().unwrap_or(""), + Err(_) => "", + };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/shell.rs` around lines 145 - 146, The code in the REPL loop uses expect_str(), which can panic on surrogate-containing sys.ps1/sys.ps2 values; change the prompt extraction to a non-panicking, lossy conversion (e.g., replace expect_str() with s.to_string_lossy().into_owned() or equivalent) and return an owned String (use String::new() instead of "" in the Err arm) so the match on Ok(ref s) => ... and Err(_) => ... produces the correct safe String without risking a panic.src/shell/helper.rs (1)
139-145:⚠️ Potential issue | 🔴 Critical
expect_str()can still panic on surrogates, contradicting PR objectives.The PR aims to remove panics from
PyStr::as_str(), butexpect_str()has the same issue—it panics when the string contains surrogates. This is inconsistent with the surrogate-safe handling (as_bytes(),as_wtf8()) used elsewhere in this function.Suggested fix using fallible conversion
Some(( startpos, completions .into_iter() - .map(|s| s.expect_str().to_owned()) + .filter_map(|s| s.to_str().map(str::to_owned)) .collect(), ))This skips completions containing surrogates rather than panicking, which is appropriate for shell autocompletion.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/shell/helper.rs` around lines 139 - 145, The code uses expect_str() which can panic on surrogate-containing strings; change the mapping over completions to a fallible conversion and skip invalid entries instead of panicking. Replace the map(|s| s.expect_str().to_owned()) call with a filter_map that calls the non-panicking conversion (e.g. s.as_wtf8().ok().map(|v| v.to_owned()) or another fallible API you use elsewhere like as_bytes()/as_wtf8()), so invalid/surrogate-containing completions are filtered out rather than causing a panic (update the iterator from map to filter_map accordingly around the completions.into_iter()).crates/vm/src/stdlib/msvcrt.rs (1)
85-87:⚠️ Potential issue | 🔴 CriticalReplace
expect_str()with surrogate-safe WTF-8 handling in wide-char paths.Line 85 and Line 110 reintroduce panic-prone extraction (
expect_str()), which can crash on surrogate-containing Python strings. Use WTF-8 code points in both functions instead.Proposed fix
@@ - let c = s - .expect_str() - .chars() + let c = s + .as_wtf8() + .code_points() .exactly_one() .map_err(|_| vm.new_type_error("putch() argument must be a string of length 1"))?; - unsafe { suppress_iph!(_putwch(c as u16)) }; + unsafe { suppress_iph!(_putwch(c.to_u32() as u16)) }; @@ - let c = - s.expect_str().chars().exactly_one().map_err(|_| { + let c = s + .as_wtf8() + .code_points() + .exactly_one() + .map_err(|_| { vm.new_type_error("ungetwch() argument must be a string of length 1") })?; - let ret = unsafe { suppress_iph!(_ungetwch(c as u32)) }; + let ret = unsafe { suppress_iph!(_ungetwch(c.to_u32())) };Verification (expected:
putwch/ungetwchuseas_wtf8().code_points()and noexpect_str()in those functions):#!/bin/bash rg -nP --type=rust -C4 '\bfn\s+(putwch|ungetwch)\b|expect_str\(\)|as_wtf8\(\)\.code_points\(\)|_putwch\(|_ungetwch\(' crates/vm/src/stdlib/msvcrt.rsAs per coding guidelines: "Follow Rust best practices for error handling and memory management".
Also applies to: 110-110
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/msvcrt.rs` around lines 85 - 87, The calls to expect_str() in the wide-char path of putwch and ungetwch can panic on surrogate-containing Python strings; replace the expect_str() extraction with surrogate-safe iteration by using the Python string's as_wtf8().code_points() iterator and consume those code points where the current code expects chars().exactly_one(). Update the implementations of putwch and ungetwch to iterate over as_wtf8().code_points(), map/convert the WTF-8 code points into the u16/u32 values used for _putwch/_ungetwch, and remove any expect_str() usage so no panics occur on invalid UTF-8/surrogates.crates/wasm/src/browser_module.rs (1)
67-67:⚠️ Potential issue | 🔴 Critical
expect_str()reintroduces surrogate panics in user-facing pathsThese call sites still panic on surrogate-containing Python strings, so the PR objective (“remove unexpected panic from string extraction”) is not met. Please switch these conversions to non-panicking paths (
to_str()with Python error propagation, or a deliberate lossy conversion where acceptable).Suggested verification script (read-only)
#!/bin/bash set -euo pipefail # 1) Confirm remaining panic-prone call sites in this module rg -n '\.expect_str\(' crates/wasm/src/browser_module.rs # 2) Inspect available safe string-conversion APIs in RustPython string types rg -n 'fn (to_str|to_string_lossy|as_wtf8)\(' crates vm -g '*.rs'As per coding guidelines: "Follow Rust best practices for error handling and memory management".
Also applies to: 73-73, 79-79, 87-87, 95-95, 171-171, 206-206, 215-215, 227-227, 244-244
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/wasm/src/browser_module.rs` at line 67, Several call sites (notably the FetchResponseFormat::from_str(vm, s.expect_str())? usage) still use expect_str(), which can panic on surrogate-containing Python strings; replace these with non-panicking conversions by using the safe to_str() and propagate the Python error (e.g., return Err or use ?), or where lossy conversion is acceptable use to_string_lossy()/as_wtf8-like API. Update all occurrences referenced in the comment (lines near FetchResponseFormat::from_str, and the other noted expect_str() sites) to call the safe conversion on the Python string type and handle the Result/Error instead of unwrapping, ensuring function signatures propagate errors as needed.crates/vm/src/stdlib/operator.rs (1)
583-586:⚠️ Potential issue | 🟡 MinorPreserve UTF-8 validation errors instead of collapsing to
TypeError.At Line 584–585,
map_err(|_| vm.new_type_error("method name must be a string"))converts all failures into oneTypeError. This also masks surrogate-related UTF-8 validation failures that should propagate distinctly.Proposed fix
- let name = name - .try_into_value(vm) - .map_err(|_| vm.new_type_error("method name must be a string"))?; + let name = name + .downcast::<PyStr>() + .map_err(|_| vm.new_type_error("method name must be a string"))?; + let name: PyUtf8StrRef = name.try_into_utf8(vm)?; Ok(Self { name, args })#!/bin/bash set -euo pipefail # Verify current constructor behavior in operator::methodcaller cat -n crates/vm/src/stdlib/operator.rs | sed -n '578,590p' # Verify UTF-8 conversion path and error generation logic rg -n "fn try_into_utf8|ensure_valid_utf8|new_unicode_encode_error" crates/vm/src/builtins/str.rs -A6 -B6Expected verification: first command shows the blanket
map_err; second command shows UTF-8 validation path capable of raising a Unicode error that should not be flattened.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/operator.rs` around lines 583 - 586, The current conversion of name via try_into_value(vm) collapses all errors into vm.new_type_error("method name must be a string"); update that mapping to preserve UTF-8 validation errors by inspecting the original error returned from name.try_into_value(vm) and: if it's a UTF-8/unicode validation error, propagate it (e.g., return vm.new_unicode_encode_error(...) or forward the original unicode error), otherwise convert non-unicode failures to vm.new_type_error("method name must be a string"); change the closure used with map_err (or replace map_err with a match on the Err) so try_into_value(vm) can return the distinct unicode error instead of always creating a TypeError while still returning TypeError for genuine non-string cases, leaving the rest of the constructor (Self { name, args }) unchanged.crates/vm/src/frame.rs (1)
2666-2672:⚠️ Potential issue | 🟡 MinorReplace
.unwrap()with fallible error handling to avoid VM panic on malformed CALL_KW metadata.The
downcast_ref::<PyUtf8Str>().unwrap()at line 2669 can panic if thekwarg_namestuple contains non-string values. While the compiler ensures kwarg_names are strings at compile time, crafted code objects could bypass this validation. Compare with line 2688 which properly usesok_or_elsefor the same pattern.🛡️ Suggested fix with proper error handling
let kwarg_names = kwarg_names.as_slice().iter().map(|pyobj| { - pyobj + Ok::<_, PyBaseExceptionRef>(pyobj .downcast_ref::<PyUtf8Str>() - .unwrap() + .ok_or_else(|| vm.new_type_error("keyword argument names must be strings"))? .as_str() - .to_owned() + .to_owned()) });Note: This would require updating the call site to handle the Result.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 2666 - 2672, The code currently calls downcast_ref::<PyUtf8Str>().unwrap() when iterating kwarg_names, which can panic on malformed CALL_KW metadata; change this to fallible handling like using downcast_ref::<PyUtf8Str>().ok_or_else(...) and propagate a proper Err (matching the error type used elsewhere in this file) instead of unwrapping, update the iterator/map to return Results (or collect into a Result<Vec<_>>) and adjust the surrounding code/call site to handle the Result; refer to the kwarg_names variable and mirror the pattern used at the similar site around line 2688 that uses ok_or_else for guidance.crates/vm/src/stdlib/nt.rs (1)
1191-1194:⚠️ Potential issue | 🔴 CriticalReplace
expect_str()with fallible conversion in exec argument/env paths.Line 1193, Line 1234, and Lines 1257-1258 still use
expect_str(), which can panic on surrogate-containing input. In these user-facing conversion paths, this should return a Python exception instead of panicking.Suggested fix
let argv = vm.extract_elements_with(argv.as_ref(), |obj| { let arg = PyStrRef::try_from_object(vm, obj)?; - make_widestring(arg.expect_str()) + let arg = arg + .to_str() + .ok_or_else(|| vm.new_unicode_encode_error("surrogates not allowed"))?; + make_widestring(arg) })?;let argv = vm.extract_elements_with(argv.as_ref(), |obj| { let arg = PyStrRef::try_from_object(vm, obj)?; - make_widestring(arg.expect_str()) + let arg = arg + .to_str() + .ok_or_else(|| vm.new_unicode_encode_error("surrogates not allowed"))?; + make_widestring(arg) })?;for (key, value) in env.into_iter() { let key = PyStrRef::try_from_object(vm, key)?; let value = PyStrRef::try_from_object(vm, value)?; - let key_str = key.expect_str(); - let value_str = value.expect_str(); + let key_str = key + .to_str() + .ok_or_else(|| vm.new_unicode_encode_error("surrogates not allowed"))?; + let value_str = value + .to_str() + .ok_or_else(|| vm.new_unicode_encode_error("surrogates not allowed"))?;#!/bin/bash set -euo pipefail # Verify there are no panic-prone string extractions left in execv/execve conversion paths. sed -n '1178,1275p' crates/vm/src/stdlib/nt.rs rg -n -C2 'expect_str\(\)' crates/vm/src/stdlib/nt.rsExpected result: no
expect_str()matches within theexecv/execveargv/env conversion blocks.As per coding guidelines "Follow Rust best practices for error handling and memory management".
Also applies to: 1232-1235, 1255-1258
crates/vm/src/stdlib/os.rs (1)
495-497:⚠️ Potential issue | 🔴 CriticalReplace
expect_str()in Windows env APIs; this can panic on surrogate input.
putenv/unsetenvshould return a Python exception here, not panic the VM.Proposed fix
- let key_str = key.expect_str(); - let value_str = value.expect_str(); + let key_str = key + .to_str() + .ok_or_else(|| vm.new_value_error("illegal environment variable name"))?; + let value_str = value + .to_str() + .ok_or_else(|| vm.new_value_error("illegal environment variable value"))?; @@ - let key_str = key.expect_str(); + let key_str = key + .to_str() + .ok_or_else(|| vm.new_value_error("illegal environment variable name"))?;#!/bin/bash # Verify panic-prone usage in os module rg -n -C2 "expect_str\\(" crates/vm/src/stdlib/os.rs # Verify conversion semantics in PyStr implementation rg -n -C3 "fn expect_str|fn to_str|str has surrogates" crates/vm/src/builtins/str.rsAlso applies to: 543-545
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/os.rs` around lines 495 - 497, The use of PyStrRef::expect_str() in the Windows env APIs (e.g., putenv and unsetenv) can panic on surrogate-containing input; replace calls to expect_str() with a non-panicking conversion (e.g., using the PyStrRef method that returns a Result/&str) and on error return an appropriate Python exception via the VirtualMachine (e.g., vm.new_type_error or vm.new_value_error/new_unicode_decode_error) instead of panicking so the functions (putenv, unsetenv) return a PyResult::Err when given invalid/surrogate input.crates/vm/src/stdlib/functools.rs (1)
426-431:⚠️ Potential issue | 🔴 CriticalReplace panic-prone
expect_str()in keyword merge path.Line 430 can panic on surrogate-containing
PyStrkeys. This is a runtime crash path inpartial.__call__; use fallible conversion and raiseTypeErrorinstead.🛠️ Proposed fix
- let key_str = key - .downcast_ref::<crate::builtins::PyStr>() - .ok_or_else(|| vm.new_type_error("keywords must be strings"))?; - final_kwargs.insert(key_str.expect_str().to_owned(), value); + let key_str = key + .downcast_ref::<crate::builtins::PyStr>() + .and_then(|s| s.to_str()) + .ok_or_else(|| vm.new_type_error("keywords must be strings"))?; + final_kwargs.insert(key_str.to_owned(), value);The
expect_str()method explicitly documents panic behavior for surrogate-containing strings. The safe alternativeto_str()is available and returnsOption<&str>.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/functools.rs` around lines 426 - 431, Replace the panic-prone expect_str() call when merging keyword keys: instead of calling key_str.expect_str().to_owned(), use the fallible to_str() on the PyStr returned by downcast_ref::<crate::builtins::PyStr>() and convert that Option<&str> into a TypeError via vm.new_type_error if it is None; then insert the owned string (e.g., key_str.to_str().ok_or_else(|| vm.new_type_error("keywords must be strings"))?.to_owned()) into final_kwargs so surrogate-containing strings raise a TypeError instead of panicking.crates/stdlib/src/csv.rs (1)
297-302:⚠️ Potential issue | 🟡 MinorUse
TypeErrorfor non-string dialect names in lookup/remove paths.At Line 297 and Line 317, invalid
nametypes are mapped to_csv.Error. This should beTypeErrorto stay consistent with argument-type failures elsewhere (includingregister_dialect).💡 Suggested fix
- let name = name.downcast::<PyStr>().map_err(|obj| { - new_csv_error( - vm, - format!("argument 0 must be a string, not '{}'", obj.class()), - ) - })?; + let name = name.downcast::<PyStr>().map_err(|obj| { + vm.new_type_error(format!("argument 0 must be a string, not '{}'", obj.class())) + })?;Also applies to: 317-322
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/csv.rs` around lines 297 - 302, The code currently maps non-string `name` arguments to `_csv.Error` by calling `new_csv_error` in the `name.downcast::<PyStr>().map_err(...)` closures in the dialect lookup/remove paths; change those closures to raise a `TypeError` instead (use the same TypeError construction used by `register_dialect`/other arg-type failures) so invalid `name` types produce a TypeError rather than `_csv.Error`; update both occurrences around the `name.downcast::<PyStr>()` call (the block shown and the similar block at the 317–322 region).crates/vm/src/stdlib/ctypes.rs (1)
1119-1122:⚠️ Potential issue | 🟠 MajorValidate
PyStras UTF-8 before passing raw pointer to FFI.At Line 1120,
s.as_bytes()can still forward non-UTF-8/WTF-8 data directly aschar*. This makes C-side string interpretation ambiguous and can corrupt symbol/argument semantics. Gate this branch with UTF-8 validation first.💡 Suggested fix
- } else if let Some(s) = arg.downcast_ref::<crate::builtins::PyStr>() { - let ptr = s.as_bytes().as_ptr() as isize; + } else if let Some(s) = arg.downcast_ref::<crate::builtins::PyStr>() { + let ptr = s.try_as_utf8(vm)?.as_bytes().as_ptr() as isize; arg_values.push(ptr); arg_types.push(Type::pointer()); } else {Use this read-only check to confirm API behavior and call-site usage:
#!/bin/bash set -euo pipefail echo "== PyStr UTF-8/bytes APIs ==" rg -nP --type rust 'fn\s+as_bytes|fn\s+try_as_utf8|fn\s+as_wtf8' crates/vm/src/builtins/str.rs -A6 -B4 echo echo "== ctypes FFI string-pointer path ==" rg -nP --type rust 'downcast_ref::<crate::builtins::PyStr>\(\)|as_bytes\(\)\.as_ptr\(\)|try_as_utf8\(' crates/vm/src/stdlib/ctypes.rs -A6 -B6🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/ctypes.rs` around lines 1119 - 1122, Gate the PyStr branch with a UTF-8 validation: replace the raw s.as_bytes().as_ptr() usage with a call to s.try_as_utf8() (or equivalent) and only push the pointer and Type::pointer() when try_as_utf8() returns Ok(slice). If try_as_utf8() returns Err, return or raise a Python/FFI error (e.g., TypeError/ValueError) instead of passing a raw pointer so callers using arg.downcast_ref::<crate::builtins::PyStr>() can't receive non-UTF-8 data via arg_values/arg_types.crates/vm/src/builtins/namespace.rs (1)
158-160:⚠️ Potential issue | 🟡 MinorDo not skip empty-string keys in
SimpleNamespace.__repr__.Line 158 drops
""keys, but they are still valid string keys and should be rendered in repr output.💡 Proposed fix
- if key_str.as_wtf8().is_empty() { - continue; - } if !first { result.push_str(", "); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/namespace.rs` around lines 158 - 160, The repr for SimpleNamespace currently skips keys whose key_str.as_wtf8().is_empty(), which incorrectly omits valid "" keys; in the SimpleNamespace.__repr__ implementation remove the conditional that continues on empty key (i.e., stop filtering out key_str.as_wtf8().is_empty()) so empty-string keys are included in the repr, and ensure the existing key formatting logic (the same code paths that quote/escape other keys) is used for the "" key so it renders as an empty quoted string in the output.crates/vm/src/stdlib/ctypes/function.rs (2)
756-759:⚠️ Potential issue | 🟠 MajorAvoid
expect()in_type_probing to prevent VM aborts.Line 758 can panic on a non-UTF8
_type_value; this path should fail closed (return false), not abort.💡 Proposed fix
- let c = s - .to_str() - .expect("_type_ is validated as ASCII at type creation"); - if c.len() == 1 && "sPzUZXO".contains(c) { - return true; + if let Some(c) = s.to_str() { + if c.len() == 1 && "sPzUZXO".contains(c) { + return true; + } }#!/bin/bash # Verify remaining panic-prone `_type_` conversion in this module. rg -n "to_str\\(\\)\\s*\\.expect|_type_" --type rs crates/vm/src/stdlib/ctypes/function.rs -C 2🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/ctypes/function.rs` around lines 756 - 759, The code currently calls s.to_str().expect(...) when probing `_type_` (variable s) which can panic on non-UTF8 and abort the VM; instead, change the conversion to handle invalid UTF-8 gracefully (e.g., use s.to_str().ok() and return false if None) so the `_type_` probe fails closed; update the branch that uses c (the resulting &str) — the check if c.len() == 1 && "sPzUZXO".contains(c) — to only run when the to_str() succeeded and otherwise return false from the probe function.
283-289:⚠️ Potential issue | 🟠 MajorUse
wchar_t-width conversion instead of hardcoded UTF-16 inconv_param.Line 284 builds a
Vec<u16>unconditionally. On non-Windows targets wherewchar_tis 32-bit, this produces incorrectly encoded wide strings.💡 Proposed fix
- let wide: Vec<u16> = s - .as_wtf8() - .encode_wide() - .chain(core::iter::once(0)) - .collect(); - let wide_bytes: Vec<u8> = wide.iter().flat_map(|&x| x.to_ne_bytes()).collect(); + let wide_bytes = super::base::str_to_wchar_bytes(s.as_wtf8());#!/bin/bash # Verify current hardcoded UTF-16 conversion and platform-aware helper availability. rg -n "encode_wide\\(|Vec<u16>|str_to_wchar_bytes|wchar_t" --type rs crates/vm/src/stdlib/ctypes/function.rs crates/vm/src/stdlib/ctypes/base.rs🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/ctypes/function.rs` around lines 283 - 289, The conversion in conv_param currently builds a Vec<u16> via s.as_wtf8().encode_wide(), which hardcodes UTF-16 and breaks on platforms where wchar_t is 32-bit; replace this with a platform-aware conversion that produces bytes sized to wchar_t (use the existing str_to_wchar_bytes helper in ctypes::base or add one that uses cty::wchar_t / cfg(target_family = "windows") to pick u16 vs u32), and then assign wide_bytes from that helper instead of from wide.iter().flat_map(...); update the code paths referencing wide_bytes in conv_param to use the new helper so wide/terminator encoding matches the target wchar_t width.crates/vm/src/vm/mod.rs (1)
809-812:⚠️ Potential issue | 🟡 MinorUse
PyStrinstead ofPyUtf8Strfor sys.modules keys.Dictionary keys in
sys.modulesarePyStr, which can contain WTF-8 surrogates. The currentdowncast_ref::<PyUtf8Str>()will fail for any non-UTF-8 keys and silently convert them to empty strings viaunwrap_or_default(), preventing their weakrefs from being properly recorded during finalization.Suggested fix
for (key, value) in items { let name = key - .downcast_ref::<PyUtf8Str>() - .map(|s| s.as_str().to_owned()) + .downcast_ref::<PyStr>() + .map(|s| s.as_wtf8().to_string()) .unwrap_or_default();🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/vm/mod.rs` around lines 809 - 812, The code extracts module key names using key.downcast_ref::<PyUtf8Str>() which fails for non-UTF-8 (WTF-8) keys and falls back to empty string; change the downcast to use PyStr instead (e.g. key.downcast_ref::<PyStr>()) and obtain the string representation that preserves WTF-8 content (use PyStr's appropriate to-owned/string conversion) so names from sys.modules are recorded correctly instead of becoming empty via unwrap_or_default(); update the expression that builds name and remove the unsafe fallback to empty string.
🧹 Nitpick comments (5)
crates/common/src/str.rs (1)
261-267: Consider tracking this TODO or addressing it in this PR.Since the PR objective is specifically about establishing the "as" vs "to" naming convention (zero-cost vs small-cost conversions), it would be consistent to rename
as_strtoto_stras part of this PR rather than leaving it as a TODO. If intentionally deferred, consider creating a follow-up issue to track this.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/common/src/str.rs` around lines 261 - 267, Rename the method as_str to to_str to follow the repo's "as" (zero-cost) vs "to" (small-cost) naming convention: update the function signature (pub fn to_str(&self) -> Option<&str>) replacing all uses of as_str, update any doc comments/tests/benchmarks referencing as_str, and remove the TODO comment; if there are external usages you can't change in this PR, add a deprecated shim pub fn as_str(&self) -> Option<&str> that forwards to to_str and mark it with #[deprecated] so callers get migrated in follow-ups.crates/vm/src/utils.rs (1)
58-62: Avoid hard panic in shared repr helper.Line 61 uses
expect(...); a future empty-iterator caller would panic in repr code. Prefer graceful handling.♻️ Suggested change
- let first = parts_iter - .next() - .transpose()? - .expect("this is not called for empty collection"); - repr.push_wtf8(first.as_wtf8()); - for part in parts_iter { - repr.push_str(", "); - repr.push_wtf8(part?.as_wtf8()); - } + if let Some(first) = parts_iter.next().transpose()? { + repr.push_wtf8(first.as_wtf8()); + for part in parts_iter { + repr.push_str(", "); + repr.push_wtf8(part?.as_wtf8()); + } + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/utils.rs` around lines 58 - 62, The code currently calls parts_iter.next().transpose()?.expect(...) which will panic on an empty iterator; replace the expect-driven panic with graceful handling: match or if let on parts_iter.next().transpose()? and when it yields None return an appropriate Err (using the function's Result/error type) with a clear message instead of panicking, otherwise bind to first and continue to call repr.push_wtf8(first.as_wtf8()); update any signatures if needed to propagate the error rather than panicking.crates/stdlib/src/_asyncio.rs (1)
936-944: Inconsistent error handling compared toget_future_repr_info.Line 942 uses
x.str(vm)?which propagates errors, whileget_future_repr_info(lines 902-904) usesif let Ok(s) = x.str(vm)to silently skip failures. This inconsistency could causerepr()to fail for tasks but succeed for futures when string conversion fails.For repr functions, silent failure handling is typically preferred to avoid cascading errors.
♻️ Suggested fix for consistency
let mut result = Wtf8Buf::new(); let parts = list.borrow_vec(); for (i, x) in parts.iter().enumerate() { if i > 0 { result.push_str(" "); } - result.push_wtf8(x.str(vm)?.as_wtf8()); + if let Ok(s) = x.str(vm) { + result.push_wtf8(s.as_wtf8()); + } } Ok(result)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/_asyncio.rs` around lines 936 - 944, The repr builder currently propagates string-conversion errors via x.str(vm)? causing repr() to fail; make it consistent with get_future_repr_info by silently skipping items that fail to convert: replace the x.str(vm)? usage in the loop that builds result (the parts.iter().enumerate() loop) with an if let Ok(s) = x.str(vm) { if i > 0 { result.push_str(" "); } result.push_wtf8(s.as_wtf8()); } so failures are ignored and repr returns Ok(result) as before.crates/stdlib/src/ssl.rs (1)
811-825: Unify hostname validation across all server_hostname entry points.The newer
server_hostnamesurfaces are now centralized on UTF-8 types, but_wrap_socketstill has its own manual checks and does not apply the same full validation path asset_server_hostname/_wrap_bio(notably the max-length rule invalidate_hostname). Reusing the helper would keep behavior consistent.♻️ Suggested change
- let hostname = match args.server_hostname.into_option().flatten() { - Some(hostname_str) => { - let hostname = hostname_str.as_str(); - - // Validate hostname - if hostname.is_empty() { - return Err(vm.new_value_error("server_hostname cannot be an empty string")); - } - - // Check if it starts with a dot - if hostname.starts_with('.') { - return Err(vm.new_value_error("server_hostname cannot start with a dot")); - } - - // IP addresses are allowed - // SNI will not be sent for IP addresses - - // Check for NULL bytes - if hostname.contains('\0') { - return Err(vm.new_type_error("embedded null character")); - } - - Some(hostname.to_string()) - } - None => None, - }; + let hostname = match args.server_hostname.into_option().flatten() { + Some(hostname_str) => { + let hostname = hostname_str.as_str(); + validate_hostname(hostname, vm)?; + Some(hostname.to_string()) + } + None => None, + };Also applies to: 3902-3920
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/ssl.rs` around lines 811 - 825, _unify hostname validation by routing all server_hostname entry points through the existing validate_hostname helper (the same used by set_server_hostname and _wrap_bio) instead of doing ad-hoc checks; specifically update _wrap_socket handling and the WrapBioArgs/WrapSocketArgs server_hostname parsing to accept the UTF-8 variant and call validate_hostname (including the max-length rule) before using the hostname, ensuring you reuse the same validation path and error reporting as validate_hostname so behavior is consistent across set_server_hostname, _wrap_bio, and _wrap_socket.crates/vm/src/builtins/str.rs (1)
2120-2130: Verify safety ofunreachable_uncheckedinPy<PyUtf8Str>::as_str().The
unreachable_unchecked()in the else branch relies on thePyUtf8Strinvariant (always valid UTF-8). While thedebug_assert!catches violations in debug builds, in release builds a corruptedPyUtf8Strcould cause undefined behavior. Consider whether a simpler approach without unsafe would suffice:🔒 Safer alternative
pub fn as_str(&self) -> &str { - self.as_pystr().to_str().unwrap_or_else(|| { - debug_assert!(false, "PyUtf8Str invariant violated"); - // Safety: PyUtf8Str guarantees valid UTF-8 - unsafe { core::hint::unreachable_unchecked() } - }) + // Safety: PyUtf8Str invariant guarantees valid UTF-8 + // The type can only be constructed from valid UTF-8 sources + unsafe { self.as_pystr().to_str().unwrap_unchecked() } }This is slightly cleaner and the safety comment is more direct. The current implementation is functionally equivalent but the
unwrap_or_elsewithunreachable_uncheckedis a bit convoluted.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/str.rs` around lines 2120 - 2130, The use of unsafe unreachable_unchecked in Py<PyUtf8Str>::as_str() is unnecessary and can cause UB in release builds if the invariant is violated; replace the unwrap_or_else + debug_assert + unreachable_unchecked pattern by a safe unwrap/expect on as_pystr().to_str() (e.g., call to_str().expect("PyUtf8Str invariant violated")) so any invariant break panics rather than invoking undefined behavior, keeping the error message for debugging while removing unsafe code; locate the as_str method and change the fallback from unreachable_unchecked to a safe expect/unwrap on the result of as_pystr().to_str().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/stdlib/src/hashlib.rs`:
- Around line 228-231: The __name__ path preserves case causing mixed-case
algorithm names to slip through; update the branch that binds name_str and algo
(the let Some(name_str) / let Some(algo) = name_str.strip_prefix("openssl_")
branch) to normalize algo to lowercase before returning (e.g., replace returning
algo.to_owned() with returning a lowercased version such as
algo.to_ascii_lowercase()) so it matches the behavior of the direct str branch.
In `@crates/stdlib/src/socket.rs`:
- Around line 3024-3027: The code currently uses .filter_map(Result::ok) and
.next().unwrap() on the iterator returned by dns_lookup::getaddrinfo, which
silently drops Errs and panics if there are no Ok entries; update the logic in
the function that returns Result<(String, String), IoOrPyException> to
explicitly handle iterator errors: iterate the dns_lookup::getaddrinfo(...)
iterator (or collect into a Vec<Result<_,_>>), look for the first Ok value to
use as ainfo, and if none exist return an appropriate IoOrPyException by
converting the first/last Err via convert_socket_error(vm, err,
SocketError::GaiError) instead of panicking; remove .filter_map(Result::ok) and
.unwrap() and reference the iterator variable (res) and ainfo when implementing
this error-aware selection.
---
Outside diff comments:
In `@crates/stdlib/src/_asyncio.rs`:
- Around line 842-851: The PyFuture Representable impl currently defines
repr_str returning String; change it to implement repr_wtf8 to match PyTask and
the WTF-8 migration: replace the repr_str(&zelf, vm) -> PyResult<String> with
repr_wtf8(&zelf, vm) -> PyResult<Wtf8Buf>, keep the same ReprGuard usage and
call to get_future_repr_info(zelf.as_object(), vm) to produce the inner text,
and convert that text into a Wtf8Buf for the Ok(...) result; remove the old
repr_str implementation and ensure Wtf8Buf is imported/available in the module.
In `@crates/stdlib/src/csv.rs`:
- Around line 141-146: The code currently validates single characters by
checking bytes (s.as_bytes().iter().copied().exactly_one()), which rejects valid
multi-byte Unicode characters; replace those byte-based checks with
character-based checks using s.chars().exactly_one().map_err(...) so you extract
a char (not a u8) and return that char where delimiter/quotechar/escapechar are
parsed; apply this same change to all similar spots (the branches around the
snippets at the other mentioned occurrences) so delimiter, quotechar and
escapechar use s.chars().exactly_one() and the error message remains the same.
In `@crates/vm/src/codecs.rs`:
- Around line 114-136: Change the errors parameter type for
get_incremental_encoder and get_incremental_decoder from Option<PyStrRef> to
Option<PyUtf8StrRef> and mirror the same conversion pattern used by
encode/decode: convert the Option<PyUtf8StrRef> into the vm.call_method args
(e.g., map Some(e) to vec![e.into()] or the exact conversion used by
encode/decode) so the error-handler names use the same UTF-8 validated type;
also update call sites in crates/vm/src/stdlib/io.rs that invoke these methods
to pass Option<PyUtf8StrRef>.
---
Duplicate comments:
In `@crates/stdlib/src/csv.rs`:
- Around line 297-302: The code currently maps non-string `name` arguments to
`_csv.Error` by calling `new_csv_error` in the
`name.downcast::<PyStr>().map_err(...)` closures in the dialect lookup/remove
paths; change those closures to raise a `TypeError` instead (use the same
TypeError construction used by `register_dialect`/other arg-type failures) so
invalid `name` types produce a TypeError rather than `_csv.Error`; update both
occurrences around the `name.downcast::<PyStr>()` call (the block shown and the
similar block at the 317–322 region).
In `@crates/vm/src/builtins/namespace.rs`:
- Around line 158-160: The repr for SimpleNamespace currently skips keys whose
key_str.as_wtf8().is_empty(), which incorrectly omits valid "" keys; in the
SimpleNamespace.__repr__ implementation remove the conditional that continues on
empty key (i.e., stop filtering out key_str.as_wtf8().is_empty()) so
empty-string keys are included in the repr, and ensure the existing key
formatting logic (the same code paths that quote/escape other keys) is used for
the "" key so it renders as an empty quoted string in the output.
In `@crates/vm/src/frame.rs`:
- Around line 2666-2672: The code currently calls
downcast_ref::<PyUtf8Str>().unwrap() when iterating kwarg_names, which can panic
on malformed CALL_KW metadata; change this to fallible handling like using
downcast_ref::<PyUtf8Str>().ok_or_else(...) and propagate a proper Err (matching
the error type used elsewhere in this file) instead of unwrapping, update the
iterator/map to return Results (or collect into a Result<Vec<_>>) and adjust the
surrounding code/call site to handle the Result; refer to the kwarg_names
variable and mirror the pattern used at the similar site around line 2688 that
uses ok_or_else for guidance.
In `@crates/vm/src/stdlib/ctypes.rs`:
- Around line 1119-1122: Gate the PyStr branch with a UTF-8 validation: replace
the raw s.as_bytes().as_ptr() usage with a call to s.try_as_utf8() (or
equivalent) and only push the pointer and Type::pointer() when try_as_utf8()
returns Ok(slice). If try_as_utf8() returns Err, return or raise a Python/FFI
error (e.g., TypeError/ValueError) instead of passing a raw pointer so callers
using arg.downcast_ref::<crate::builtins::PyStr>() can't receive non-UTF-8 data
via arg_values/arg_types.
In `@crates/vm/src/stdlib/ctypes/function.rs`:
- Around line 756-759: The code currently calls s.to_str().expect(...) when
probing `_type_` (variable s) which can panic on non-UTF8 and abort the VM;
instead, change the conversion to handle invalid UTF-8 gracefully (e.g., use
s.to_str().ok() and return false if None) so the `_type_` probe fails closed;
update the branch that uses c (the resulting &str) — the check if c.len() == 1
&& "sPzUZXO".contains(c) — to only run when the to_str() succeeded and otherwise
return false from the probe function.
- Around line 283-289: The conversion in conv_param currently builds a Vec<u16>
via s.as_wtf8().encode_wide(), which hardcodes UTF-16 and breaks on platforms
where wchar_t is 32-bit; replace this with a platform-aware conversion that
produces bytes sized to wchar_t (use the existing str_to_wchar_bytes helper in
ctypes::base or add one that uses cty::wchar_t / cfg(target_family = "windows")
to pick u16 vs u32), and then assign wide_bytes from that helper instead of from
wide.iter().flat_map(...); update the code paths referencing wide_bytes in
conv_param to use the new helper so wide/terminator encoding matches the target
wchar_t width.
In `@crates/vm/src/stdlib/functools.rs`:
- Around line 426-431: Replace the panic-prone expect_str() call when merging
keyword keys: instead of calling key_str.expect_str().to_owned(), use the
fallible to_str() on the PyStr returned by
downcast_ref::<crate::builtins::PyStr>() and convert that Option<&str> into a
TypeError via vm.new_type_error if it is None; then insert the owned string
(e.g., key_str.to_str().ok_or_else(|| vm.new_type_error("keywords must be
strings"))?.to_owned()) into final_kwargs so surrogate-containing strings raise
a TypeError instead of panicking.
In `@crates/vm/src/stdlib/msvcrt.rs`:
- Around line 85-87: The calls to expect_str() in the wide-char path of putwch
and ungetwch can panic on surrogate-containing Python strings; replace the
expect_str() extraction with surrogate-safe iteration by using the Python
string's as_wtf8().code_points() iterator and consume those code points where
the current code expects chars().exactly_one(). Update the implementations of
putwch and ungetwch to iterate over as_wtf8().code_points(), map/convert the
WTF-8 code points into the u16/u32 values used for _putwch/_ungetwch, and remove
any expect_str() usage so no panics occur on invalid UTF-8/surrogates.
In `@crates/vm/src/stdlib/operator.rs`:
- Around line 583-586: The current conversion of name via try_into_value(vm)
collapses all errors into vm.new_type_error("method name must be a string");
update that mapping to preserve UTF-8 validation errors by inspecting the
original error returned from name.try_into_value(vm) and: if it's a
UTF-8/unicode validation error, propagate it (e.g., return
vm.new_unicode_encode_error(...) or forward the original unicode error),
otherwise convert non-unicode failures to vm.new_type_error("method name must be
a string"); change the closure used with map_err (or replace map_err with a
match on the Err) so try_into_value(vm) can return the distinct unicode error
instead of always creating a TypeError while still returning TypeError for
genuine non-string cases, leaving the rest of the constructor (Self { name, args
}) unchanged.
In `@crates/vm/src/stdlib/os.rs`:
- Around line 495-497: The use of PyStrRef::expect_str() in the Windows env APIs
(e.g., putenv and unsetenv) can panic on surrogate-containing input; replace
calls to expect_str() with a non-panicking conversion (e.g., using the PyStrRef
method that returns a Result/&str) and on error return an appropriate Python
exception via the VirtualMachine (e.g., vm.new_type_error or
vm.new_value_error/new_unicode_decode_error) instead of panicking so the
functions (putenv, unsetenv) return a PyResult::Err when given invalid/surrogate
input.
In `@crates/vm/src/vm/mod.rs`:
- Around line 809-812: The code extracts module key names using
key.downcast_ref::<PyUtf8Str>() which fails for non-UTF-8 (WTF-8) keys and falls
back to empty string; change the downcast to use PyStr instead (e.g.
key.downcast_ref::<PyStr>()) and obtain the string representation that preserves
WTF-8 content (use PyStr's appropriate to-owned/string conversion) so names from
sys.modules are recorded correctly instead of becoming empty via
unwrap_or_default(); update the expression that builds name and remove the
unsafe fallback to empty string.
In `@crates/wasm/src/browser_module.rs`:
- Line 67: Several call sites (notably the FetchResponseFormat::from_str(vm,
s.expect_str())? usage) still use expect_str(), which can panic on
surrogate-containing Python strings; replace these with non-panicking
conversions by using the safe to_str() and propagate the Python error (e.g.,
return Err or use ?), or where lossy conversion is acceptable use
to_string_lossy()/as_wtf8-like API. Update all occurrences referenced in the
comment (lines near FetchResponseFormat::from_str, and the other noted
expect_str() sites) to call the safe conversion on the Python string type and
handle the Result/Error instead of unwrapping, ensuring function signatures
propagate errors as needed.
In `@src/shell.rs`:
- Around line 145-146: The code in the REPL loop uses expect_str(), which can
panic on surrogate-containing sys.ps1/sys.ps2 values; change the prompt
extraction to a non-panicking, lossy conversion (e.g., replace expect_str() with
s.to_string_lossy().into_owned() or equivalent) and return an owned String (use
String::new() instead of "" in the Err arm) so the match on Ok(ref s) => ... and
Err(_) => ... produces the correct safe String without risking a panic.
In `@src/shell/helper.rs`:
- Around line 139-145: The code uses expect_str() which can panic on
surrogate-containing strings; change the mapping over completions to a fallible
conversion and skip invalid entries instead of panicking. Replace the map(|s|
s.expect_str().to_owned()) call with a filter_map that calls the non-panicking
conversion (e.g. s.as_wtf8().ok().map(|v| v.to_owned()) or another fallible API
you use elsewhere like as_bytes()/as_wtf8()), so invalid/surrogate-containing
completions are filtered out rather than causing a panic (update the iterator
from map to filter_map accordingly around the completions.into_iter()).
---
Nitpick comments:
In `@crates/common/src/str.rs`:
- Around line 261-267: Rename the method as_str to to_str to follow the repo's
"as" (zero-cost) vs "to" (small-cost) naming convention: update the function
signature (pub fn to_str(&self) -> Option<&str>) replacing all uses of as_str,
update any doc comments/tests/benchmarks referencing as_str, and remove the TODO
comment; if there are external usages you can't change in this PR, add a
deprecated shim pub fn as_str(&self) -> Option<&str> that forwards to to_str and
mark it with #[deprecated] so callers get migrated in follow-ups.
In `@crates/stdlib/src/_asyncio.rs`:
- Around line 936-944: The repr builder currently propagates string-conversion
errors via x.str(vm)? causing repr() to fail; make it consistent with
get_future_repr_info by silently skipping items that fail to convert: replace
the x.str(vm)? usage in the loop that builds result (the
parts.iter().enumerate() loop) with an if let Ok(s) = x.str(vm) { if i > 0 {
result.push_str(" "); } result.push_wtf8(s.as_wtf8()); } so failures are ignored
and repr returns Ok(result) as before.
In `@crates/stdlib/src/ssl.rs`:
- Around line 811-825: _unify hostname validation by routing all server_hostname
entry points through the existing validate_hostname helper (the same used by
set_server_hostname and _wrap_bio) instead of doing ad-hoc checks; specifically
update _wrap_socket handling and the WrapBioArgs/WrapSocketArgs server_hostname
parsing to accept the UTF-8 variant and call validate_hostname (including the
max-length rule) before using the hostname, ensuring you reuse the same
validation path and error reporting as validate_hostname so behavior is
consistent across set_server_hostname, _wrap_bio, and _wrap_socket.
In `@crates/vm/src/builtins/str.rs`:
- Around line 2120-2130: The use of unsafe unreachable_unchecked in
Py<PyUtf8Str>::as_str() is unnecessary and can cause UB in release builds if the
invariant is violated; replace the unwrap_or_else + debug_assert +
unreachable_unchecked pattern by a safe unwrap/expect on as_pystr().to_str()
(e.g., call to_str().expect("PyUtf8Str invariant violated")) so any invariant
break panics rather than invoking undefined behavior, keeping the error message
for debugging while removing unsafe code; locate the as_str method and change
the fallback from unreachable_unchecked to a safe expect/unwrap on the result of
as_pystr().to_str().
In `@crates/vm/src/utils.rs`:
- Around line 58-62: The code currently calls
parts_iter.next().transpose()?.expect(...) which will panic on an empty
iterator; replace the expect-driven panic with graceful handling: match or if
let on parts_iter.next().transpose()? and when it yields None return an
appropriate Err (using the function's Result/error type) with a clear message
instead of panicking, otherwise bind to first and continue to call
repr.push_wtf8(first.as_wtf8()); update any signatures if needed to propagate
the error rather than panicking.
ℹ️ Review info
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Lib/test/test_tstring.pyis excluded by!Lib/**
📒 Files selected for processing (120)
crates/common/src/str.rscrates/stdlib/src/_asyncio.rscrates/stdlib/src/_sqlite3.rscrates/stdlib/src/array.rscrates/stdlib/src/binascii.rscrates/stdlib/src/contextvars.rscrates/stdlib/src/csv.rscrates/stdlib/src/grp.rscrates/stdlib/src/hashlib.rscrates/stdlib/src/json.rscrates/stdlib/src/locale.rscrates/stdlib/src/lzma.rscrates/stdlib/src/multiprocessing.rscrates/stdlib/src/overlapped.rscrates/stdlib/src/posixshmem.rscrates/stdlib/src/pyexpat.rscrates/stdlib/src/pystruct.rscrates/stdlib/src/socket.rscrates/stdlib/src/ssl.rscrates/stdlib/src/unicodedata.rscrates/stdlib/src/zlib.rscrates/vm/src/buffer.rscrates/vm/src/builtins/bool.rscrates/vm/src/builtins/classmethod.rscrates/vm/src/builtins/code.rscrates/vm/src/builtins/complex.rscrates/vm/src/builtins/dict.rscrates/vm/src/builtins/float.rscrates/vm/src/builtins/function.rscrates/vm/src/builtins/function/jit.rscrates/vm/src/builtins/genericalias.rscrates/vm/src/builtins/int.rscrates/vm/src/builtins/interpolation.rscrates/vm/src/builtins/list.rscrates/vm/src/builtins/mappingproxy.rscrates/vm/src/builtins/memory.rscrates/vm/src/builtins/mod.rscrates/vm/src/builtins/module.rscrates/vm/src/builtins/namespace.rscrates/vm/src/builtins/object.rscrates/vm/src/builtins/range.rscrates/vm/src/builtins/set.rscrates/vm/src/builtins/slice.rscrates/vm/src/builtins/staticmethod.rscrates/vm/src/builtins/str.rscrates/vm/src/builtins/template.rscrates/vm/src/builtins/tuple.rscrates/vm/src/builtins/type.rscrates/vm/src/builtins/weakproxy.rscrates/vm/src/bytes_inner.rscrates/vm/src/codecs.rscrates/vm/src/coroutine.rscrates/vm/src/dict_inner.rscrates/vm/src/exception_group.rscrates/vm/src/exceptions.rscrates/vm/src/frame.rscrates/vm/src/function/buffer.rscrates/vm/src/function/fspath.rscrates/vm/src/function/mod.rscrates/vm/src/import.rscrates/vm/src/intern.rscrates/vm/src/macros.rscrates/vm/src/protocol/object.rscrates/vm/src/stdlib/_abc.rscrates/vm/src/stdlib/_winapi.rscrates/vm/src/stdlib/_wmi.rscrates/vm/src/stdlib/ast.rscrates/vm/src/stdlib/ast/basic.rscrates/vm/src/stdlib/ast/python.rscrates/vm/src/stdlib/ast/repr.rscrates/vm/src/stdlib/builtins.rscrates/vm/src/stdlib/codecs.rscrates/vm/src/stdlib/collections.rscrates/vm/src/stdlib/ctypes.rscrates/vm/src/stdlib/ctypes/array.rscrates/vm/src/stdlib/ctypes/base.rscrates/vm/src/stdlib/ctypes/function.rscrates/vm/src/stdlib/ctypes/pointer.rscrates/vm/src/stdlib/ctypes/simple.rscrates/vm/src/stdlib/ctypes/structure.rscrates/vm/src/stdlib/ctypes/union.rscrates/vm/src/stdlib/functools.rscrates/vm/src/stdlib/imp.rscrates/vm/src/stdlib/io.rscrates/vm/src/stdlib/itertools.rscrates/vm/src/stdlib/marshal.rscrates/vm/src/stdlib/msvcrt.rscrates/vm/src/stdlib/nt.rscrates/vm/src/stdlib/operator.rscrates/vm/src/stdlib/os.rscrates/vm/src/stdlib/posix.rscrates/vm/src/stdlib/signal.rscrates/vm/src/stdlib/sre.rscrates/vm/src/stdlib/symtable.rscrates/vm/src/stdlib/sys.rscrates/vm/src/stdlib/thread.rscrates/vm/src/stdlib/time.rscrates/vm/src/stdlib/typevar.rscrates/vm/src/stdlib/typing.rscrates/vm/src/stdlib/winreg.rscrates/vm/src/stdlib/winsound.rscrates/vm/src/suggestion.rscrates/vm/src/types/slot.rscrates/vm/src/types/structseq.rscrates/vm/src/utils.rscrates/vm/src/vm/context.rscrates/vm/src/vm/interpreter.rscrates/vm/src/vm/mod.rscrates/vm/src/vm/vm_new.rscrates/vm/src/warn.rscrates/wasm/src/browser_module.rscrates/wasm/src/convert.rscrates/wasm/src/js_module.rscrates/wasm/src/wasm_builtins.rscrates/wtf8/src/concat.rscrates/wtf8/src/lib.rsexample_projects/wasm32_without_js/rustpython-without-js/src/lib.rssrc/lib.rssrc/shell.rssrc/shell/helper.rs
✅ Files skipped from review due to trivial changes (1)
- crates/wtf8/src/concat.rs
🚧 Files skipped from review as they are similar to previous changes (33)
- crates/stdlib/src/lzma.rs
- crates/vm/src/function/fspath.rs
- crates/wasm/src/convert.rs
- crates/stdlib/src/unicodedata.rs
- crates/wasm/src/js_module.rs
- crates/vm/src/stdlib/collections.rs
- crates/vm/src/builtins/weakproxy.rs
- crates/vm/src/builtins/range.rs
- crates/vm/src/vm/context.rs
- crates/vm/src/builtins/classmethod.rs
- crates/vm/src/builtins/dict.rs
- crates/vm/src/stdlib/winsound.rs
- crates/vm/src/macros.rs
- crates/vm/src/stdlib/ast/basic.rs
- src/lib.rs
- crates/vm/src/stdlib/ast.rs
- crates/stdlib/src/posixshmem.rs
- crates/vm/src/builtins/memory.rs
- crates/stdlib/src/overlapped.rs
- crates/vm/src/stdlib/ctypes/pointer.rs
- crates/vm/src/types/structseq.rs
- crates/stdlib/src/zlib.rs
- crates/vm/src/builtins/mappingproxy.rs
- crates/vm/src/function/buffer.rs
- crates/vm/src/stdlib/ctypes/base.rs
- example_projects/wasm32_without_js/rustpython-without-js/src/lib.rs
- crates/vm/src/stdlib/sys.rs
- crates/vm/src/builtins/module.rs
- crates/vm/src/stdlib/_wmi.rs
- crates/stdlib/src/_sqlite3.rs
- crates/vm/src/builtins/staticmethod.rs
- crates/vm/src/stdlib/typevar.rs
- crates/vm/src/builtins/code.rs
| && let Some(name_str) = name.to_str() | ||
| && let Some(algo) = name_str.strip_prefix("openssl_") | ||
| { | ||
| return Ok(algo.to_owned()); |
There was a problem hiding this comment.
Normalize __name__-derived algorithm names to lowercase as well.
The direct str branch lowercases names, but the __name__ branch preserves case. Mixed-case values like openssl_SHA256 will fail later even though equivalent lowercase input succeeds.
Suggested patch
- if let Ok(name_obj) = digestmod.get_attr("__name__", vm)
+ if let Ok(name_obj) = digestmod.get_attr("__name__", vm)
&& let Some(name) = name_obj.downcast_ref::<PyStr>()
&& let Some(name_str) = name.to_str()
&& let Some(algo) = name_str.strip_prefix("openssl_")
{
- return Ok(algo.to_owned());
+ return Ok(algo.to_lowercase());
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/stdlib/src/hashlib.rs` around lines 228 - 231, The __name__ path
preserves case causing mixed-case algorithm names to slip through; update the
branch that binds name_str and algo (the let Some(name_str) / let Some(algo) =
name_str.strip_prefix("openssl_") branch) to normalize algo to lowercase before
returning (e.g., replace returning algo.to_owned() with returning a lowercased
version such as algo.to_ascii_lowercase()) so it matches the behavior of the
direct str branch.
| let mut res = dns_lookup::getaddrinfo(Some(host_str), Some(&service), Some(hints)) | ||
| .map_err(|e| convert_socket_error(vm, e, SocketError::GaiError))? | ||
| .filter_map(Result::ok); | ||
| let mut ainfo = res.next().unwrap(); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# First, let's look at the getnameinfo function context
echo "=== Checking getnameinfo function around lines 3024-3027 ==="
sed -n '3010,3040p' crates/stdlib/src/socket.rs
echo ""
echo "=== Full function context starting from getnameinfo ==="
rg -B 20 -A 30 'fn getnameinfo' crates/stdlib/src/socket.rs | head -80Repository: RustPython/RustPython
Length of output: 3331
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check the entire getnameinfo function to understand the error handling flow
echo "=== Searching for getnameinfo function definition ==="
rg -n 'fn getnameinfo' crates/stdlib/src/socket.rs
echo ""
echo "=== Get line count for context ==="
wc -l crates/stdlib/src/socket.rsRepository: RustPython/RustPython
Length of output: 212
Avoid panic and silent error loss in getnameinfo addrinfo iteration.
At lines 3024-3027, .filter_map(Result::ok) silently discards iterator errors, and .unwrap() on res.next() will panic if no successful entry exists. The function's return type Result<(String, String), IoOrPyException> expects proper error handling, not panics.
🔧 Suggested fix
- let mut res = dns_lookup::getaddrinfo(Some(host_str), Some(&service), Some(hints))
- .map_err(|e| convert_socket_error(vm, e, SocketError::GaiError))?
- .filter_map(Result::ok);
- let mut ainfo = res.next().unwrap();
- if res.next().is_some() {
+ let mut res = dns_lookup::getaddrinfo(Some(host_str), Some(&service), Some(hints))
+ .map_err(|e| convert_socket_error(vm, e, SocketError::GaiError))?;
+ let mut ainfo = res
+ .next()
+ .ok_or_else(|| vm.new_os_error("sockaddr resolved to no addresses".to_owned()))?
+ .map_err(IoOrPyException::from)?;
+ if res.next().transpose().map_err(IoOrPyException::from)?.is_some() {
return Err(vm
.new_os_error("sockaddr resolved to multiple addresses".to_owned())
.into());
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/stdlib/src/socket.rs` around lines 3024 - 3027, The code currently
uses .filter_map(Result::ok) and .next().unwrap() on the iterator returned by
dns_lookup::getaddrinfo, which silently drops Errs and panics if there are no Ok
entries; update the logic in the function that returns Result<(String, String),
IoOrPyException> to explicitly handle iterator errors: iterate the
dns_lookup::getaddrinfo(...) iterator (or collect into a Vec<Result<_,_>>), look
for the first Ok value to use as ainfo, and if none exist return an appropriate
IoOrPyException by converting the first/last Err via convert_socket_error(vm,
err, SocketError::GaiError) instead of panicking; remove .filter_map(Result::ok)
and .unwrap() and reference the iterator variable (res) and ainfo when
implementing this error-aware selection.
Fix #7196
Summary by CodeRabbit
Improvements
New Features
Bug Fixes