Implement an early-decrypting check #5171
| verify_data=False, save_space=False): | ||
| """Perform a set of checks on 'repository' | ||
| """A checker of repository archives |
| def verify_data(self): | ||
| def do_verify_data(self): |
kdmurray91
May 4, 2020
Author
As we now have a boolean flag, we need to save it to self:
def __init__(self, ..., verify_data=False):
    ...
    self.verify_data = verify_data
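A minimal sketch of why the method had to be renamed once the flag is stored on self. The class names here are hypothetical stand-ins, not Borg's real ArchiveChecker: an instance attribute named verify_data hides a method of the same name, so the method gets a do_ prefix.

```python
class ArchiveCheckerSketch:
    """Toy stand-in, not Borg's real ArchiveChecker."""

    def __init__(self, verify_data=False):
        # The boolean flag now occupies the name `verify_data` on the instance.
        self.verify_data = verify_data

    def do_verify_data(self):
        # Renamed from `verify_data` so the flag above cannot shadow it.
        return "verifying chunk data"

    def check(self):
        # Only run the expensive verification when the flag is set.
        if self.verify_data:
            return self.do_verify_data()
        return "skipped data verification"


class ShadowedSketch:
    """Shows the clash the rename avoids (hypothetical class)."""

    def __init__(self, verify_data=False):
        # This instance attribute hides the method defined below.
        self.verify_data = verify_data

    def verify_data(self):
        return "verifying chunk data"
```

With ShadowedSketch, looking up verify_data on an instance returns the boolean, so calling it would raise a TypeError.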
| checker = ArchiveChecker(repository, repair=args.repair, archive=args.location.archive, first=args.first, | ||
| last=args.last, sort_by=args.sort_by or 'ts', glob=args.glob_archives, | ||
| verify_data=args.verify_data, save_space=args.save_space) | ||
| except Exception as exc: | ||
| self.print_error(str(exc)) | ||
| return EXIT_WARNING | ||
| if not args.archives_only: | ||
| if not repository.check(repair=args.repair, save_space=args.save_space, max_duration=args.max_duration): |
ThomasWaldmann
May 4, 2020
Member
the potential fundamental problem with first accessing the repo inside ArchiveChecker.__init__ before doing repository.check is that we do not know whether we are working on valid data.
The repo check / repair makes sure that the fundamental data is ok (in memory and in repair mode also on disk).
The archives check / repair builds upon that (and assumes that lower level structures are ok).
So I guess this needs quite a lot of review, since it touches repo data before the repo has been checked.
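To illustrate the ordering concern only (this is not a proposed fix), here is a rough sketch with hypothetical toy classes. It records the order in which the repository is read and checked when the checker reads data in its constructor versus when it defers reads until after the repo check.

```python
class ToyRepository:
    """Hypothetical stand-in that logs the order of operations."""

    def __init__(self):
        self.log = []

    def check(self):
        self.log.append("check")
        return True

    def get(self, chunk_id):
        self.log.append("get")
        return b"data"


class EagerChecker:
    """Reads from the repository in __init__, like the diff under review."""

    def __init__(self, repository):
        self.manifest_data = repository.get(0)


class DeferredChecker:
    """Defers every repository read until check() is called."""

    def __init__(self, repository):
        self.repository = repository
        self.manifest_data = None

    def check(self):
        self.manifest_data = self.repository.get(0)
        return True


def run_eager(repository):
    checker = EagerChecker(repository)  # reads unchecked data here
    repository.check()
    return checker


def run_deferred(repository):
    checker = DeferredChecker(repository)  # no repository access yet
    if repository.check():
        checker.check()
    return checker
```

In the eager variant the log reads get, then check, which is the situation described above: data is consumed before anything has confirmed it is valid.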
kdmurray91
May 4, 2020
Author
OK, that's what I feared. Is there some way to verify only the chunks that contain the metadata needed to decrypt in ArchiveChecker.__init__(), so we know that the crypto, the manifest and whatever other metadata are kosher, but don't spend 5 hours checking all of repo/data/*?
ThomasWaldmann
Jun 4, 2020
Member
yes, a "miniature version" of the borg repo check, that only makes sure that 1 chunk is ok (which could be the manifest (id 0) or any other chunk). And then detect crypto based on that chunk.
that might work, needs checking...
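A rough sketch of what such a miniature check could look like, under loud assumptions: the entry layout (a 4-byte CRC32 followed by the payload), the key-type tag values and all function names below are hypothetical and are not Borg's actual on-disk format or API. The idea is just to validate one chunk and only then read the key type from its first byte.

```python
import zlib

# Hypothetical key-type tags; Borg's real values and layout may differ.
KEY_TYPES = {0x00: "encrypted", 0x02: "plaintext"}


def store(payload: bytes) -> bytes:
    """Build a toy entry: 4-byte big-endian CRC32 followed by the payload."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def check_one_chunk(entry: bytes) -> bytes:
    """Return the payload if its CRC32 matches, else raise ValueError."""
    if len(entry) < 5:
        raise ValueError("entry too short")
    crc = int.from_bytes(entry[:4], "big")
    payload = entry[4:]
    if zlib.crc32(payload) != crc:
        raise ValueError("CRC mismatch")
    return payload


def detect_key_type(entry: bytes) -> str:
    """Check a single chunk first, then read the key type from its first byte."""
    payload = check_one_chunk(entry)
    try:
        return KEY_TYPES[payload[0]]
    except KeyError:
        raise ValueError(f"unknown key type byte {payload[0]:#04x}") from None
```

A corrupted entry fails in check_one_chunk before any key detection is attempted, which is the property the full repo check would otherwise provide for this one chunk.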
| self.init_chunks() | ||
| self.key = self.identify_key(repository) |
ThomasWaldmann
May 4, 2020
Member
guess these 2 need checking.
do these have any unwanted consequences when run before the repo check has happened?
Hmm, what are we doing with this PR?
@ThomasWaldmann I don't think I have the confidence to verify that my changes here have no terrible side effects, so this PR has somewhat stalled. As far as I can tell it's OK, but I don't have anywhere near the knowledge of Borg's internals I'd need to be confident about that. I guess I'd need someone with good knowledge of Borg's innards to check my change and see that I'm not doing anything stupid.

Hello,
(NB: this is definitely a draft)
This is my quick attempt at implementing the early-decrypting behaviour asked for in #5170. I'm not sure it behaves exactly as I wish, as I haven't trusted it to run on a real repo until it passes review, in case I've screwed something up. The tests that relate to borg check or archive checking seem to pass fine, but a couple of tests are failing for reasons I can't easily fathom (hoping these failures go away in CI).
Cheers,
Kevin