【Need Update csv.rs】Update csv.py and test_csv.py from CPython v3.12 by Blues-star · Pull Request #5176 · RustPython/RustPython

Blues-star · 2024-02-20T12:25:36Z

I've updated csv.py and test_csv.py, but version 3.12 changed the imports: __version__ now has to be imported from _csv.c (which is csv.rs in RustPython). It seems that csv.rs does not yet match the behavior of upstream CPython. Should we update csv.rs or postpone this commit? Because of this, I still cannot run test_csv.

fanninpm · 2024-02-20T16:13:44Z

Please, go ahead and update the behavior of csv.rs.

Blues-star · 2024-02-22T02:13:03Z

Does the current csv implementation not incorporate the dialect feature?

youknowone · 2024-02-23T16:02:11Z

I don't remember the exact state. Since test_csv didn't exist, I guess our csv implementation was very immature.

Blues-star · 2024-02-25T17:44:58Z

I have done some work to make the csv module compatible with test_csv.py, but because I am unfamiliar with the API, the current version contains too much boilerplate and too many redundant operations, so the implementation is a bit messy. I will try to clean up the code and submit a new commit. I will close the current commit because I have a few more commits to make. Here are the test results.

# cargo run -- -m unittest -v Lib.test.test_csv
----------------------------------------------------------------------
Ran 123 tests in 2.669s

FAILED (failures=4, errors=10, skipped=6, expected failures=11)

Blues-star · 2024-02-27T02:50:17Z

Currently, I have implemented most of the behavior specified by Python's csv module, but 18 test cases still fail, which I have marked as expected failures. The functionality of the module should otherwise work as expected. By the way, I have used std types in the csv module, such as HashMap and Mutex, initialized through OnceCell. I'm not sure if this is feasible, or if I should switch to using PyMutex instead?
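For readers following along, the kind of setup described in this comment might look roughly like the sketch below. It is only an illustration built on std: `DialectEntry`, `register_dialect`, and `get_dialect` are hypothetical names, not the ones used in csv.rs, and `std::sync::OnceLock` stands in for the OnceCell mentioned above.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Hypothetical, simplified dialect record; the real type in csv.rs has more fields.
#[derive(Clone, Debug, PartialEq)]
pub struct DialectEntry {
    pub delimiter: u8,
    pub quotechar: Option<u8>,
}

/// Process-wide registry, created lazily on first access.
fn registry() -> &'static Mutex<HashMap<String, DialectEntry>> {
    static REGISTRY: OnceLock<Mutex<HashMap<String, DialectEntry>>> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Adds or replaces a named dialect.
pub fn register_dialect(name: &str, dialect: DialectEntry) {
    // std's Mutex::lock returns a Result because of lock poisoning.
    let mut map = registry().lock().unwrap();
    map.insert(name.to_owned(), dialect);
}

/// Looks up a dialect by name, returning a copy of the stored entry.
pub fn get_dialect(name: &str) -> Option<DialectEntry> {
    registry().lock().unwrap().get(name).cloned()
}
```

One practical difference from parking_lot: `parking_lot::Mutex::lock` returns the guard directly, so the `unwrap()` calls above would disappear after a switch.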

Blues-star · 2024-02-27T09:51:24Z

@youknowone could you take a look?

youknowone

Thank you for all the effort you put into adding a new test and implementing so many features in the Rust native module!

I'm not sure if this is feasible, or if I should switch to using PyMutex instead?
PyMutex is a helper that supports both single-threaded and multi-threaded builds. You can easily turn a parking_lot::Mutex into a PyMutex (but not a std Mutex). I think PyMutex fits here. Otherwise, we prefer parking_lot::Mutex over std Mutex.

Most of the changes look really good, but I have some concerns about the use of intern_str. Could you check whether those strings are actually meant to be interned?

youknowone · 2024-02-27T10:57:52Z

stdlib/src/csv.rs

+        fn doublequote(&self, vm: &VirtualMachine) -> PyRef<PyInt> {
+            vm.ctx.new_bool(self.doublequote).to_owned()
+        }


The return type can be converted into a PyObjectRef automatically, so this is possible:

Suggested change

-        fn doublequote(&self, vm: &VirtualMachine) -> PyRef<PyInt> {
-            vm.ctx.new_bool(self.doublequote).to_owned()
-        }
+        fn doublequote(&self, vm: &VirtualMachine) -> bool {
+            self.doublequote
+        }
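The idea behind this suggestion can be sketched outside RustPython: a getter returns a plain Rust value, and a glue layer converts it into an object. Everything here (`Object`, `IntoObject`, `call_getter`, `DialectSettings`) is a hypothetical stand-in invented for illustration; RustPython's actual conversion machinery is not shown.

```rust
/// Hypothetical stand-in for a Python object reference.
#[derive(Debug, PartialEq)]
pub enum Object {
    Bool(bool),
    Str(String),
}

/// Hypothetical conversion trait: anything a getter is allowed to return.
pub trait IntoObject {
    fn into_object(self) -> Object;
}

impl IntoObject for bool {
    fn into_object(self) -> Object {
        Object::Bool(self)
    }
}

impl IntoObject for String {
    fn into_object(self) -> Object {
        Object::Str(self)
    }
}

pub struct DialectSettings {
    pub doublequote: bool,
    pub delimiter: u8,
}

impl DialectSettings {
    /// Returns a plain bool; no manual wrapping is needed in the getter.
    pub fn doublequote(&self) -> bool {
        self.doublequote
    }

    /// Returns an owned String built at call time.
    pub fn delimiter_text(&self) -> String {
        (self.delimiter as char).to_string()
    }
}

/// Glue layer: calls the getter and converts whatever it returns.
pub fn call_getter<T, R: IntoObject>(target: &T, getter: impl Fn(&T) -> R) -> Object {
    getter(target).into_object()
}
```

With this shape, `call_getter(&settings, DialectSettings::doublequote)` yields `Object::Bool(..)` without the getter ever touching the object type.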

youknowone · 2024-02-27T11:00:06Z

stdlib/src/csv.rs

+        fn lineterminator(&self, vm: &VirtualMachine) -> PyRef<PyStr> {
+            match self.lineterminator {
+                Terminator::CRLF => vm.ctx.intern_str("\r\n".to_string()).to_owned(),
+                Terminator::Any(t) => vm.ctx.intern_str(format!("{}", t as char)).to_owned(),


Is this t a static value or a dynamic value?

intern_str interns static strings. If this is a dynamic value, new_str is the better fit:

Suggested change

-                Terminator::Any(t) => vm.ctx.intern_str(format!("{}", t as char)).to_owned(),
+                Terminator::Any(t) => vm.ctx.new_str(format!("{}", t as char)).to_owned(),

Haha, I didn't notice this API when I was looking through the API documentation. I was still wondering why there wasn't a basic API for creating dynamic strings.
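To make the concern about interning dynamic values concrete, here is a toy interner. It is an assumption-laden sketch, not RustPython's implementation: `Interner`, `intern`, and `pool_size` are hypothetical names, and the pool here simply never evicts anything.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Toy intern pool: each distinct string is stored once and never evicted.
#[derive(Default)]
pub struct Interner {
    pool: HashSet<Rc<str>>,
}

impl Interner {
    /// Returns the shared copy of `s`, adding it to the pool on first sight.
    pub fn intern(&mut self, s: &str) -> Rc<str> {
        if let Some(existing) = self.pool.get(s) {
            return Rc::clone(existing);
        }
        let shared: Rc<str> = Rc::from(s);
        self.pool.insert(Rc::clone(&shared));
        shared
    }

    /// Number of distinct strings currently held by the pool.
    pub fn pool_size(&self) -> usize {
        self.pool.len()
    }
}
```

Interning a fixed literal such as "\r\n" adds one entry no matter how often it is requested, whereas interning a value computed at run time adds a pool entry for every distinct value that ever appears. That is the usual reason to create run-time strings as ordinary objects instead.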

youknowone · 2024-02-27T11:10:46Z

stdlib/src/csv.rs

+            s @ PyStr => {
+                Ok(if s.as_str().bytes().eq(b"\r\n".iter().copied()) {
+                    csv_core::Terminator::CRLF
+                } else if let Some(t) = s.as_str().bytes().next() {


Suggested change

-                } else if let Some(t) = s.as_str().bytes().next() {
+                } else if let Some(t) = s.as_str().as_bytes().first() {

Does this check need to ensure s.len() == 1?

I think this modification may be unnecessary. When s.len()<1, this function will throw an error, propagating the error to the upper layer, which conforms to the expected behavior in Python.

Here I need to get ownership of the first character, and because it is of type u8 I believe that copying it is inexpensive.

Oh, I'm sorry for the confusion. The suggestion and the question were unrelated. For the suggestion, I thought creating an iterator was unnecessary here. For the comment, I was worried that an error should be raised when s.len() > 1 and that the check is missing. *t will be a copy for Copy types.

Oh, I see. Due to limitations in the current implementation within csv_core, the support for multiple characters in lineterminator is not complete.

Therefore, I intentionally ignored the case where s.len() > 1. I will add comments to explain this and thank you for your suggestion. I have already optimized the iterator part.
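The two behaviors discussed in this exchange can be compared in a small standalone sketch. The `Terminator` enum below is a local mirror of the two variants that appear in the diff, and `parse_lenient` / `parse_strict` are hypothetical names with made-up error messages. The lenient version follows the approach described here (only the first byte is used), while the strict one shows what a length check might look like.

```rust
/// Local mirror of the two variants seen in the diff; the real type lives in csv_core.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Terminator {
    CRLF,
    Any(u8),
}

/// Lenient parsing: anything after the first byte is silently ignored.
pub fn parse_lenient(s: &str) -> Result<Terminator, String> {
    let bytes = s.as_bytes();
    if bytes == b"\r\n" {
        Ok(Terminator::CRLF)
    } else if let Some(&t) = bytes.first() {
        // `&t` copies the byte out of the slice; u8 is Copy, so this is cheap.
        Ok(Terminator::Any(t))
    } else {
        Err("lineterminator must not be empty".to_owned())
    }
}

/// Strict parsing (hypothetical): rejects everything except "\r\n" or one byte.
pub fn parse_strict(s: &str) -> Result<Terminator, String> {
    match s.as_bytes() {
        b"\r\n" => Ok(Terminator::CRLF),
        &[t] => Ok(Terminator::Any(t)),
        [] => Err("lineterminator must not be empty".to_owned()),
        _ => Err("multi-byte lineterminator is not supported".to_owned()),
    }
}
```

Note that `first()` avoids building an iterator at all, which was the point of the suggestion; whether extra bytes should raise an error is the separate question answered above.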

youknowone · 2024-02-27T11:16:09Z

stdlib/src/csv.rs

            })?;
            let input = string.as_str().as_bytes();
-
+            if input.is_empty() || input.starts_with(&[b'\n']) {


Suggested change

-            if input.is_empty() || input.starts_with(&[b'\n']) {
+            if input.is_empty() || input.starts_with(b"\n") {

Thank you for your review
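As a quick check that the suggested form is purely cosmetic, the two spellings can be put side by side. The function names are hypothetical; only the condition itself comes from the diff.

```rust
/// Condition from the diff, written with a byte-string literal.
pub fn is_blank_or_newline(input: &[u8]) -> bool {
    // b"\n" is a &[u8; 1], which coerces to the &[u8] that starts_with expects.
    input.is_empty() || input.starts_with(b"\n")
}

/// Same condition, written with an explicit one-element array.
pub fn is_blank_or_newline_verbose(input: &[u8]) -> bool {
    input.is_empty() || input.starts_with(&[b'\n'])
}
```

Both functions return the same result for every input; the literal form is just shorter to read.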

Blues-star · 2024-02-27T12:22:03Z

Thank you for all the effort you put into adding a new test and implementing so many features in the Rust native module!

I'm not sure if this is feasible, or if I should switch to using PyMutex instead?
PyMutex is a helper that supports both single-threaded and multi-threaded builds. You can easily turn a parking_lot::Mutex into a PyMutex (but not a std Mutex). I think PyMutex fits here. Otherwise, we prefer parking_lot::Mutex over std Mutex.

Most of the changes look really good, but I have some concerns about the use of intern_str. Could you check whether those strings are actually meant to be interned?

Haha, I didn't notice this API when I was looking through the API documentation. I was still wondering why there wasn't a basic API for creating strings.

I have modified part of the code, but kept the old code where a function returns more than one type or operates on a single byte.

Now I have another question: is std::hashmap feasible here, or do I need to replace it with another no-std library such as hashbrown?

Blues-star · 2024-02-27T13:02:11Z

PyMutex doesn't seem to implement Sync, so I switched back to parking_lot.

Blues-star · 2024-02-27T14:04:36Z

@youknowone

youknowone · 2024-02-28T07:20:31Z

Now I have another question: is std::hashmap feasible here, or do I need to replace it with another no-std library such as hashbrown?

We don't currently support no_std, so std's HashMap is fine, although preparing for no_std is still a good idea.

Blues-star · 2024-02-29T03:37:22Z

Are there any requested changes that I need to address?

youknowone

Thank you, I attached a commit with minor fixes

Lib/test/test_csv.py

Blues-star · 2024-03-01T07:59:51Z

Thank you, I attached a commit with minor fixes

I accidentally made changes to csv.py earlier, but now I have reverted it back to the version for Python 3.12.

Co-authored-by: Jeong, YunWon <jeong@youknowone.org>

youknowone · 2024-03-05T13:37:16Z

Thank you so much!

Blues-star · 2024-03-05T16:00:20Z

You're welcome.

Actually, there's one more thing I'd like to discuss. csv.rs is built on csv_core, and because of limitations in that upstream implementation, fully conforming to the behavior specified in csv.py is quite challenging. For example, its line terminator enum is limited to a single u8 and does not support arbitrary-length UTF-8 strings. Was there a particular reason for choosing csv_core as the core implementation, such as no_std or wasm support? Or should we consider other third-party libraries to extend the current functionality as needed?

youknowone · 2024-03-05T16:24:24Z

We probably chose it to pick the low-hanging fruit. The original authors of the csv module might not have known about those limitations. Any reasonable suggestion will be appreciated.

youknowone mentioned this pull request Feb 21, 2024

Update Python libraries and tests from CPython 3.12 #5104

Closed

youknowone requested changes Feb 27, 2024

Blues-star requested a review from youknowone February 28, 2024 02:15

youknowone force-pushed the csv branch 2 times, most recently from e8c6c61 to 05b4ec4 Compare February 29, 2024 13:18

youknowone approved these changes Feb 29, 2024

youknowone reviewed Feb 29, 2024

Blues-star and others added 4 commits March 5, 2024 15:10

Update csv.py from CPython v3.12.0

88ee64d

Update test_csv.py from CPython v3.12.0

d2bf69e

Mark failing tests as expectedFailure

e4be47a

implement more csv features

54247df

Co-authored-by: Jeong, YunWon <jeong@youknowone.org>

youknowone force-pushed the csv branch from bf1a606 to 54247df Compare March 5, 2024 06:10

youknowone approved these changes Mar 5, 2024

youknowone merged commit 4c8cd67 into RustPython:main Mar 5, 2024
