-
Meta: HTML check only if needed, suppress warnings
This restores 9ce8d47 to run the HTML checker by the network API using curl but refines it such that:

* the checker is only run on index.html output if encoding.bs has changed
* the checker is only run on all other HTML output if the visualize.py file has changed or if any .txt sources have been changed or added
* if curl hits a timeout, it will retry each request up to 5 times.

From the curl(1) man page:

> --retry <num>
> If a transient error is returned when curl tries to perform a
> transfer, it will retry this number of times before giving up.
> Setting the number to 0 makes curl do no retries (which is the
> default). Transient error means either: a timeout, an FTP 4xx
> response code or an HTTP 5xx response code.
>
> When curl is about to retry a transfer, it will first wait one
> second and then for all forthcoming retries it will double the
> waiting time until it reaches 10 minutes which then will be the
> delay between the rest of the retries.

Addresses #97
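
The backoff schedule described in the quoted man page text can be sketched as follows. This is only an illustration of that description, not curl's actual implementation; the function name `retry_delays` and its parameters are hypothetical.

```python
def retry_delays(retries, first=1, cap=600):
    """Return the waits (in seconds) before each retry, per the quoted text:
    start at one second, double each time, and stop growing at 10 minutes."""
    delays = []
    delay = first
    for _ in range(retries):
        delays.append(min(delay, cap))
        delay = min(delay * 2, cap)
    return delays


# Waits before each of the 5 retries mentioned in the commit message.
print(retry_delays(5))
```

Under these assumptions, five retries add up to 31 seconds of waiting between attempts, on top of whatever time each timed-out request takes.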
-
gb18030 decoder: unwind from fourth byte when it's not a digit
Instead of always unwinding if there’s no code point when consuming the fourth byte, only unwind when the fourth byte is not an ASCII digit. This does mean that ASCII digits can be masked, but since ASCII digits are not used as a delimiter in any format, this is highly unlikely to be used in any attacks (and it also matches existing implementations better). Fixes #110.
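
A minimal sketch of the fourth-byte behavior described above, assuming the usual four-byte pointer arithmetic. The function name, the return shape, and the caller-supplied `ranges_code_point` lookup are hypothetical and stand in for the real index lookup; this is not the specification text.

```python
def consume_fourth_byte(first, second, third, byte, ranges_code_point):
    """Return ("error", bytes_to_unwind) or ("code point", code_point).

    ranges_code_point is a hypothetical callable mapping a pointer to a
    code point, or None when there is no code point for that pointer.
    """
    if not 0x30 <= byte <= 0x39:
        # Fourth byte is not an ASCII digit: unwind so that the second,
        # third, and fourth bytes are processed again.
        return ("error", [second, third, byte])

    pointer = ((first - 0x81) * 10 * 126 * 10
               + (second - 0x30) * 10 * 126
               + (third - 0x81) * 10
               + (byte - 0x30))
    code_point = ranges_code_point(pointer)
    if code_point is None:
        # ASCII digit but no code point: error without unwinding, so the
        # digits are consumed ("masked").
        return ("error", [])
    return ("code point", code_point)
```

With a stub lookup that always returns `None`, a digit in fourth position yields an error with nothing unwound, while a non-digit yields an error with three bytes to reprocess.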
-
-
Editorial: check non-null before null
In particular, in all places with two null checks in a row, and for EUC-JP, as it would stand out otherwise.
-
EUC-JP decoder: only unwind ASCII bytes
See #59 (comment) for context.
-
Revert "Meta: use HTML checker network API & suppress warnings"
This reverts commit 9ce8d47. Way too many timeouts in the build process.
-
Meta: use HTML checker network API & suppress warnings
Since we don't make regular changes, having a long build time seems acceptable. We might want to consider tweaking this once we start actively working on JavaScript transform streams. Fixes #97.
-
-
Document minimal implementation requirements
This makes it clearer that alternative implementation strategies are possible. Fixes #44.
-
Add visualizations for the indexes
PUA and NFC validation warnings are expected. Exposing the PUA code points as PUA in HTML is useful for seeing what those code points map to (if anything) in system fonts, which may give insights about their usage. Exposing singletons (compatibility ideographs and scientific units) without normalizing them is useful for having browser "Find" functionality match whatever you get as output of a converter you might be developing. Closes #78.
-
Editorial: mark certain references non-normative
Also remove the DOM reference as it’s not actually needed for anything. Fixes #86.
-
Note >8835 pointers in index jis0208 cannot be reached
In EUC-JP and ISO-2022-JP encoders, when getting a pointer for a code point in index jis0208, it’s important to note that the pointer is always less than 8836. This is because the code points are duplicated throughout the table and the index pointer returns the first. Per a suggestion in #47.
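
The "returns the first" behavior can be illustrated with a small sketch. The `toy_index` below is made-up data, not the real index jis0208, and `index_pointer` is a hypothetical helper name.

```python
def index_pointer(code_point, index):
    """Return the first pointer whose entry equals code_point, else None."""
    for pointer, entry in enumerate(index):
        if entry == code_point:
            return pointer
    return None


# Made-up index in which U+3001 appears twice (pointers 1 and 4).
toy_index = [0x3000, 0x3001, None, 0x2252, 0x3001]

# The later duplicate at pointer 4 is never returned.
print(index_pointer(0x3001, toy_index))
```

Because the search stops at the first match, any pointer that only holds a duplicate of an earlier entry can never be produced by an encoder.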
-
-
Editorial: convert to Bikeshed
I followed the steps set out in whatwg/fullscreen@f9df3ea and ended up with a nearly identical copy. For diffing the words I used git diff --word-diff -w per whatwg/fetch#399 (comment).
annevk committed Nov 10, 2016
-
Editorial: rename resources before Bikeshed conversion
This gives the best git blame results.
annevk committed Nov 10, 2016
-
Editorial: prepare for conversion to Bikeshed
annevk committed Nov 10, 2016
-
windows-1255 map 0xCA to U+05BA
Microsoft Windows has had this mapping for over fifteen years. Despite it not being universally adopted, it seems best to align with Windows here. Fixes #73.
annevk committed Oct 24, 2016

