Overhaul repo rule storage #3999
I agree! Do you mean storage in the repo, or the XML format, or the SQLite format, or ...? :-) I was thinking recently that it's probably time to move the rulesets to a new repo that gets included into the main one. Submodules would be one way, but I would also like to have the property that if you check out the latest https-everywhere and build, you get the latest rulesets, without needing to constantly update the submodule head in the code repo. If we do this, I think it makes sense to start the new repo as a fork of https-everywhere, deleting everything but the XML files, and moving the rulesets higher in the tree. That way we retain the commit history for the rulesets, which is important. We would also need to make sure that the ruleset tests get run in the ruleset repo in addition to the main repo. It may also be time to split up the rulesets directory by first letter.
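For illustration, splitting the rulesets directory by first letter could be planned with something like the sketch below. The function names and the `_` fallback directory are hypothetical, not anything that exists in the repo; the sketch only computes target paths and moves nothing, since preserving history would need `git mv` rather than a plain rename.

```python
from pathlib import PurePosixPath


def shard_for(filename: str) -> str:
    """Return the subdirectory for a ruleset file, keyed on its first character."""
    if not filename:
        raise ValueError("empty filename")
    first = filename[0].lower()
    # Assumption: anything that is not an ASCII letter or digit goes into "_".
    return first if first.isascii() and first.isalnum() else "_"


def plan_moves(filenames):
    """Map each ruleset filename to its sharded path, e.g. 'e/Example.com.xml'."""
    return {name: str(PurePosixPath(shard_for(name)) / name) for name in filenames}
```

Calling `plan_moves` on a directory listing gives a mapping that can be reviewed before any files are actually moved.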
I was pondering ... all of the above! But primarily the repo storage / XML format. I think we should:
I've been pondering some YAML-style rules, or perhaps simple, empty files for non-complex rules. What if someone could
semenko changed the title from "Overhaul rule storage" to "Overhaul repo rule storage" on Jan 27, 2016
I disagree on this one. I've thought about it a number of times, but I think there's nothing fundamental about the XML format that makes it unsuitable, and programmatically reformatting all our XML into YAML or JSON would lose all history. I am not a huge fan of XML, but it's not bad enough to kill. That said, I'm totally into improving our build-time transforms of the XML to improve stored size, in-memory size, and speed of access.
Yes. Caveat: This will greatly increase the total volume of submissions. We need to do two things first:
1. Decide on a scope for ruleset inclusion in HTTPS Everywhere. Let's Encrypt has issued certificates for ~450,000 unique domains. Even if we assume all those domains work perfectly, it's clearly not suitable to include them all in HTTPS Everywhere. It's just too much data.
2. Improve our fetch testing so that it runs at the time a PR is submitted, and checks for mixed content both on the root page of a domain and on a small set of pages crawled from the root.
I'm not sure what you mean here.
e.g. #3983 -- we sometimes
vgturtle127 commented Feb 24, 2016
Only one small part of it: Now we ship the extension with a .json file containing XML rulesets, instead of a .sqlite file containing XML rulesets.
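As a rough picture of what "a .json file containing XML rulesets" can look like, here is a minimal sketch that bundles a directory of XML files into one JSON object. The layout (filename mapped to raw XML text) and the name `bundle_rulesets` are assumptions for illustration; the thread does not describe the actual build step or file layout.

```python
import json
from pathlib import Path


def bundle_rulesets(src_dir, out_file):
    """Write every *.xml file in src_dir into one JSON file; return the count."""
    # Assumed layout: {"Example.xml": "<ruleset ...>...</ruleset>", ...}
    bundle = {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(src_dir).glob("*.xml"))
    }
    Path(out_file).write_text(json.dumps(bundle), encoding="utf-8")
    return len(bundle)
```

The XML is stored verbatim as strings, so the ruleset files themselves, and their history, stay untouched; only the shipped container changes.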
vgturtle127 commented Mar 14, 2016
@jsha That is a step in the right direction, I would say.


semenko commented Jan 25, 2016
Our ruleset storage is chaotic. Let's ponder overhauling it. (cc @fuglede)