Experimental compression remote - third attempt #4566
Force-pushed from 51c7bf6 to 8532139.
That seems like a great idea to me!
Is it 100% gzip-compatible?
Nice - yes that seems like a good solution. You might want to upstream your changes to pgzip? Did you discuss with Klaus?
Yipee!
Promote it on the forum?
Yes! The compression creates 100% standard gzip files no special sauce.
Not yet. The changes are not completely final; I consider them beta right now. More importantly, I don't believe these changes are upstreamable. Pgzip has a very clear goal: being a faster (multithreaded) drop-in replacement for compress/gzip. While the changes technically retain this ability, and the compression part barely changed at all, they still introduce some overhead that I'd consider contrary to the goal of pgzip.
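The "100% standard gzip" claim above means the output can be read back by any stock gzip reader. A minimal sketch of that property using only Go's standard library (no rclone or pgzip code involved; the helper name `roundTrip` is invented for illustration):

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// roundTrip compresses data with the stdlib gzip writer and decompresses
// it again with the stdlib reader, mirroring the claim that the remote
// emits plain gzip streams readable by any standard tool.
func roundTrip(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	zr, err := gzip.NewReader(&buf)
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	in := []byte("hello, compression remote")
	out, err := roundTrip(in)
	if err != nil || !bytes.Equal(in, out) {
		panic("round trip failed")
	}
	fmt.Println("ok")
}
```

Because pgzip is a drop-in replacement for compress/gzip, swapping its writer in here would leave the decompression side unchanged.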
Yes! As soon as I've done a final cleanup pass on this. One thing I forgot to mention in the first post is that this implementation obviously still retains one very ugly limitation: the uncompressed file sizes are stored in the names of the compressed files. I really, really hate this, but it's the only way to have accurate file sizes without requiring List to download and parse metadata files.
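To illustrate the limitation being discussed: encoding the uncompressed size into the stored name lets List report accurate sizes from the directory listing alone. The naming scheme below (`name.<size>.gz`) and both helper names are purely illustrative assumptions, not the actual on-remote format used by this branch:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// encodeName appends the uncompressed size to the stored object name.
// Hypothetical scheme: "file.txt" of 1024 bytes -> "file.txt.1024.gz".
func encodeName(name string, size int64) string {
	return fmt.Sprintf("%s.%d.gz", name, size)
}

// decodeName recovers the original name and uncompressed size from a
// stored name, so List never has to open the object itself.
func decodeName(stored string) (string, int64, error) {
	trimmed := strings.TrimSuffix(stored, ".gz")
	i := strings.LastIndex(trimmed, ".")
	if i < 0 {
		return "", 0, fmt.Errorf("malformed stored name %q", stored)
	}
	size, err := strconv.ParseInt(trimmed[i+1:], 10, 64)
	if err != nil {
		return "", 0, err
	}
	return trimmed[:i], size, nil
}

func main() {
	stored := encodeName("file.txt", 1024)
	name, size, err := decodeName(stored)
	fmt.Println(stored, name, size, err)
}
```

The ugliness is exactly what the comment says: the size leaks into user-visible names, which a general metadata service would avoid.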
Yes, I see what you mean. It would be possible (but a long way from ideal!) to report the size as unknown. Hmm, unless we implement a general metadata service for rclone... which is certainly something I've been putting background cycles into thinking about. Just about all the backends can store metadata of some type (even if it is only a description field).
Force-pushed from 8532139 to b5c0372.
I'm pretty much done with this for now (of course I still consider this experimental). What do you say, should I make a beta and post in the forum?
Yes, let's get some real world testing - if you push it to a branch on the main repo then it will make a beta. I was thinking of making all pull requests by rclone team members produce a beta automatically. If I can figure that out, then that would be quite useful, I think. So if an rclone team member sends PR #4566, this will make a beta with a tag. I'll give your code a look now.
I've given that a first pass review - see comments inline.
I think it is looking nice!
Next thing is for me to play with it some more :-)
```go
	return dst
}

// Sum returns the specified hash from the multihasher
```
Great idea - should have done this ages ago! This commit needs to stay on its own, but the others can be squashed.
The title of this commit should probably be `hash: add MultiHasher.Sum() to retrieve a single specific hash`.
Force-pushed from e24e890 to 6b40a53.
@ncw I've addressed all your comments now and also fixed a few more things I noticed while looking over the code again.
Force-pushed from 6b40a53 to 39d5ed9.
Force-pushed from 38784e1 to c96c4cf.
This feature is pretty cool!! And I would love to see the metadata service @ncw mentioned. This metadata service would also help future chunking backends, allowing inline dedup/snapshot and other cool features.
I think this is ready to be merged - what do you think?
Are you happy with the metadata format? Is it extensible enough for future needs?
If you are ready, can you squash it into 2 commits (one for hash one for press) and rebase?
I'm very shortly going to add more context.Context into the backend interface (into NewFs and ConfigFs) so depending on the order of things it might need an update.
Force-pushed from c96c4cf to 86e91bd.
Force-pushed from d720886 to 1f2acfd.
Hopefully final rebase done.

That looks good to me - well done! I'll leave it to you to press the merge button :-)
, rclone#1356, rclone#675. This remote implements transparent compression using gzip. Uses JSON for storing metadata. Co-authored-by: id01 <gaviniboom@gmail.com>
Force-pushed from 1f2acfd to 4b71a1d.
Congratulations!
Interesting work; shame you guys have gone for gzip instead of the way more efficient zstd algorithm.
We're always open to improvements! I did consider using zstd, but the performance of the Go implementation was still somewhat lacking compared to the gain in compression ratio, so I decided against it for now.


Let's give this another try. In the last few days I've once again reworked the experimental compression remote, started by id01 and continued by me a while ago. I've never been quite happy with the original - it just seemed too complicated and tried to do too many things. As a result, this third attempt is much smaller in scale.
Differences from the previous attempts.
Like the previous attempts, this is still a work in progress, but I'm opening it now in case someone is willing to test it.