gpshead commented Aug 8, 2022 (edited)
Problem
A Denial of Service (DoS) issue was identified in CPython because we use binary bignums for our `int` implementation. A huge integer will always consume a near-quadratic amount of CPU time in conversion to or from a base 10 (decimal) string with a large number of digits. No efficient algorithm exists to do otherwise.

It is quite common for Python code implementing network protocols and data serialization to do `int(untrusted_string_or_bytes_value)` on input to get a numeric value without having limited the input length, or to do `log("processing thing id %s", unknowingly_huge_integer)` or any similar operation that converts an `int` to a string without first checking its magnitude (`http`, `json`, `xmlrpc`, `logging`, loading large values into integers via linear-time conversions such as hexadecimal stored in `yaml`, or anything computing larger values based on user-controlled inputs… which then wind up being output as decimal later on). All of these can suffer a CPU-consuming DoS in the face of untrusted data.

Everyone auditing all existing code for this, adding length guards, and maintaining that practice everywhere is not feasible, nor is it what we believe the vast majority of our users want to do.
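As a hypothetical illustration (not part of the original report), a small benchmark makes the near-quadratic cost visible; doubling the digit count roughly quadruples the parsing time:

```python
import time

# Parse attacker-sized decimal strings. Converting a decimal string to a
# binary bignum is ~O(n^2) in the number of digits, so each doubling of
# the input length roughly quadruples the runtime.
for digits in (100_000, 200_000, 400_000):
    s = "9" * digits
    start = time.perf_counter()
    int(s)
    print(f"{digits:>7} digits: {time.perf_counter() - start:.2f}s")
```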
This issue has been reported to the Python Security Response Team multiple times by a few different people since early 2020, most recently a few weeks ago while I was in the middle of polishing up the PR so it’d be ready before 3.11.0rc2.
Mitigation
After discussion on the Python Security Response Team mailing list, the conclusion was that we needed to limit the size of integer-to-string conversions for non-linear-time conversions (anything not a power-of-2 base) by default, and to offer the ability to configure or disable this limit.
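The issue text does not name the final API, but for reference, the mechanism that shipped in CPython 3.11 exposes the limit via `sys.get_int_max_str_digits()` / `sys.set_int_max_str_digits()` (also configurable with the `PYTHONINTMAXSTRDIGITS` environment variable or `-X int_max_str_digits`), with a default of 4300 digits. A minimal sketch of its behavior:

```python
import sys

print(sys.get_int_max_str_digits())  # default limit: 4300 digits

try:
    int("9" * 5000)  # decimal (non-linear-time) conversion above the limit
except ValueError as exc:
    print(exc)  # "Exceeds the limit (4300 digits) for integer string conversion..."

# Power-of-2 bases are linear-time and remain unlimited:
value = int("ff" * 5000, 16)
print(len(hex(value)))

sys.set_int_max_str_digits(0)  # 0 disables the limit entirely
print(len(str(int("9" * 5000))))  # now succeeds
```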
The Python Steering Council is aware of this change and accepts it as necessary.