【Need Update csv.rs】Update csv.py and test_csv.py from CPython v3.12 #5176
【Need Update csv.rs】Update csv.py and test_csv.py from CPython v3.12 #5176youknowone merged 4 commits intoRustPython:mainfrom
Conversation
|
Please, go ahead and update the behavior of |
|
Does the current csv implementation not incorporate the dialect feature? |
|
I don't remember the exact state. Since test_csv didn't exist, I guess our csv implementation was very immature. |
|
I have done some work to make the csv module compatible with test_csv.py, but due to my unfamiliarity with the API, the current version contains too much template code and redundant operations. It looks like the implementation is a bit messy. I will try to optimize the code and submit a new commit. I will close the current commit because I have a few more commits to make. Here are the test results. |
|
Currently, I have adapted most of the behaviors specified by the Python CSV module, but there are still 18 test cases failing, which I have marked as 'failed.' The functionality of the module should be working as expected at the moment. By the way, I have used modules from std in the CSV module, such as Hashmap and Mutex, and initialized using Onecell. I'm not sure if this is feasible, or if I should switch to using PyMutex instead? |
|
@youknowone have a review? |
youknowone
left a comment
There was a problem hiding this comment.
Thank you for many efforts to add a new test and implementing a many features in rust native module!
I'm not sure if this is feasible, or if I should switch to using PyMutex instead?
PyMutex is single/multi thread support helpers. You can easilty turns parking_lot::Mutex to PyMutex. (but not std Mutex) I thinkPyMutexfits here. Otherwise we prefer to useparking_lot::PyMutexrather than std Mutex
The most of changes looks really good, but I have a bit of concerns about using intern_str. Could you check if they are intended to be actually interned or not?
stdlib/src/csv.rs
Outdated
| fn doublequote(&self, vm: &VirtualMachine) -> PyRef<PyInt> { | ||
| vm.ctx.new_bool(self.doublequote).to_owned() | ||
| } |
There was a problem hiding this comment.
The return type automatically be able to be turned into PyObjectRef.
So this is possible:
| fn doublequote(&self, vm: &VirtualMachine) -> PyRef<PyInt> { | |
| vm.ctx.new_bool(self.doublequote).to_owned() | |
| } | |
| fn doublequote(&self, vm: &VirtualMachine) -> bool { | |
| self.doublequote | |
| } |
stdlib/src/csv.rs
Outdated
| fn lineterminator(&self, vm: &VirtualMachine) -> PyRef<PyStr> { | ||
| match self.lineterminator { | ||
| Terminator::CRLF => vm.ctx.intern_str("\r\n".to_string()).to_owned(), | ||
| Terminator::Any(t) => vm.ctx.intern_str(format!("{}", t as char)).to_owned(), |
There was a problem hiding this comment.
is this t static value or dynamic value?
intern_str interns static strings. If this is a dynamic value,
| Terminator::Any(t) => vm.ctx.intern_str(format!("{}", t as char)).to_owned(), | |
| Terminator::Any(t) => vm.ctx.new_str(format!("{}", t as char)).to_owned(), |
There was a problem hiding this comment.
Haha, I didn't notice this API when I was looking through the API documentation. I was still wondering why there wasn't a basic API for creating dynamic strings.
stdlib/src/csv.rs
Outdated
| s @ PyStr => { | ||
| Ok(if s.as_str().bytes().eq(b"\r\n".iter().copied()) { | ||
| csv_core::Terminator::CRLF | ||
| } else if let Some(t) = s.as_str().bytes().next() { |
There was a problem hiding this comment.
| } else if let Some(t) = s.as_str().bytes().next() { | |
| } else if let Some(t) = s.as_str().as_bytes().first() { |
Does this check require to ensure s.len() == 1?
There was a problem hiding this comment.
I think this modification may be unnecessary. When s.len()<1, this function will throw an error, propagating the error to the upper layer, which conforms to the expected behavior in Python.
There was a problem hiding this comment.
Here I need to get ownership of the first character, and because it is of type u8 I believe that copying it is inexpensive.
There was a problem hiding this comment.
Oh, I am sorry to make confusion. The suggestion and question were not related. I thought creating an iterator was not necessary here (for suggestion), and worried if it requires to raise error when s.len() > 1 but it is missed (comment). *t will be a copy for Copy types.
There was a problem hiding this comment.
Oh, I am sorry to make confusion. The suggestion and question were not related. I thought creating an iterator was not necessary here (for suggestion), and worried if it requires to raise error when s.len() > 1 but it is missed (comment).
*twill be a copy for Copy types.
Oh, I see. Due to limitations in the current implementation within csv_core, the support for multiple characters in lineterminator is not complete.
Therefore, I intentionally ignored the case where s.len() > 1. I will add comments to explain this and thank you for your suggestion. I have already optimized the iterator part.
stdlib/src/csv.rs
Outdated
| })?; | ||
| let input = string.as_str().as_bytes(); | ||
|
|
||
| if input.is_empty() || input.starts_with(&[b'\n']) { |
There was a problem hiding this comment.
| if input.is_empty() || input.starts_with(&[b'\n']) { | |
| if input.is_empty() || input.starts_with(b"\n") { |
There was a problem hiding this comment.
Thank you for your review
Haha, I didn't notice this API when I was looking through the API documentation. I was still wondering why there wasn't a basic API for creating strings. I have modified part of the code, keeping the old code for multiple type returns and operations on one of the bytes. Now I have another question: is |
|
PyMutex doesn't seem to implement sync, so I switched back to parking_lot. |
Currently we don't support no-std. So that is ok while preparing it is still a good idea. |
|
Are there any requested changes that I need to address? |
e8c6c61 to
05b4ec4
Compare
youknowone
left a comment
There was a problem hiding this comment.
Thank you, I attached a commit with minor fixes
I accidentally made changes to |
Co-authored-by: Jeong, YunWon <jeong@youknowone.org>
|
Thank you so much! |
|
You're welcome. Actually, there's one more thing I'd like to discuss. Regarding |
|
We probably chose it as harvesting low-hanging fruit. The first csv module writers might not know about those limitation. Any reasonable suggestion will be appreciated. |
I've updated
csv.pyandtest_csv.py, but version 3.12 has changed the import options, and__version__now needs to be imported from_csv.c(which is nowcsv.rs). It seems thatcsv.rshas not yet aligned with the behavior ofcpythonupstream. Should we consider updatingcsv.rsor postponing this commit? Due to this reason, I'm still unable to testtest_csv.