-
-
Notifications
You must be signed in to change notification settings - Fork 33.6k
Open
Labels
3.13bugs and security fixesbugs and security fixesperformancePerformance or resource usagePerformance or resource usagestdlibStandard Library Python modules in the Lib/ directoryStandard Library Python modules in the Lib/ directorytype-bugAn unexpected behavior, bug, or errorAn unexpected behavior, bug, or error
Description
Bug report
Bug description:
You can pass strings to csv.Sniffer that can generate significant Regex backtracing and processing time. For example
import csv
import time
NUM_ITERATIONS=200
# Example test str "","",""0""0
test_str = '"",'*NUM_ITERATIONS + '"'*NUM_ITERATIONS + '0' + '"'*NUM_ITERATIONS + '0'
print(test_str)
t0 = time.time()
dialect = csv.Sniffer().sniff(test_str)
t1 = time.time()
print(f"{t1-t0}")Some example runs
NUM_ITERATIONS Running Time (seconds)
1 0.0008
10 0.0030
100 254.32
I've checked against different versions of Python and they all return similar results.
For input NUM_ITERATIONS 100 above
Version : Running Time (Seconds)
Python 3.8.16 : 254
Python 3.9.16 : 250
Python 3.10.9 : 319
Python 3.11.1 : 236
This issue lies in this Regex for finding double quoted format
Line 274 in b303d3a
| r"((%(delim)s)|^)\W*%(quote)s[^%(delim)s\n]*%(quote)s[^%(delim)s\n]*%(quote)s\W*((%(delim)s)|$)" % \ |
I've done some testing and a zero length lookahead assertion (or atomic group) you can get a significant performance improvement
r"((%(delim)s)|^)\W*%(quote)s(?=(?P<zero>[^%(delim)s%(quote)s\n]*))(?P=zero)%(quote)s[^%(delim)s\n]*%(quote)s\W*((%(delim)s)|$)" % \CPython versions tested on:
3.8, 3.9, 3.10, 3.11
Operating systems tested on:
Linux
Linked PRs
Metadata
Metadata
Assignees
Labels
3.13bugs and security fixesbugs and security fixesperformancePerformance or resource usagePerformance or resource usagestdlibStandard Library Python modules in the Lib/ directoryStandard Library Python modules in the Lib/ directorytype-bugAn unexpected behavior, bug, or errorAn unexpected behavior, bug, or error
Projects
Status
No status


