borg2: enhance compact stats #8410
Comments
Comment about what's interesting for practical usage: #122 (comment) |
This would definitely be useful: #122 (comment) Question: if compression and/or obfuscation is enabled, would the size stats be given for the native file, pre-compression etc? |
Something else, not sure if it relates specifically to this: let's say I have a specific file backed up. I know this because it appears in a contents listing of the most recent archive. If I wanted to eliminate this file from the whole repo, how would I do that? Would I simply delete the first instance of it being backed up, and would that automatically eliminate all deduplicated copies? If so, how would I find it? fusermount? |
@awgcooper No, it does not work like that. But you can use |
About "what do we win?" (see top post): I guess the only thing would be the "deduplication factor", computed as:
The first value is just the sum of all plaintext chunk sizes. To do that in a memory-efficient way together with the already present stats (which need the compressed chunk sizes), we need to store the plaintext size AND the compressed size in the in-memory ChunkIndex we build. So, in the end, we could show deduplication and compression factors. |
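A minimal sketch of how such factors might be computed while walking chunk references. This is not borg's actual ChunkIndex or API: `build_stats`, the tuple layout and both factor definitions (referenced plaintext / unique plaintext, unique plaintext / unique compressed) are assumptions made for illustration only.

```python
def build_stats(references):
    """Compute illustrative dedup/compression factors.

    references: iterable of (chunk_id, plaintext_size, compressed_size),
    one entry per chunk reference found in any archive (hypothetical input).
    """
    index = {}            # chunk_id -> (plaintext_size, compressed_size)
    total_referenced = 0  # sum of plaintext sizes over ALL references

    for chunk_id, size, csize in references:
        total_referenced += size
        # Store both sizes once per unique chunk.
        index.setdefault(chunk_id, (size, csize))

    unique_plain = sum(size for size, _ in index.values())
    unique_stored = sum(csize for _, csize in index.values())

    # Assumed definitions, guarded against empty input.
    dedup = total_referenced / unique_plain if unique_plain else None
    compression = unique_plain / unique_stored if unique_stored else None
    return {"dedup_factor": dedup, "compression_factor": compression}


refs = [("a", 100, 50), ("b", 200, 100), ("a", 100, 50)]
stats = build_stats(refs)
```

Here chunk "a" is referenced twice but stored once, so the referenced plaintext (400) exceeds the unique plaintext (300), while the stored size (150) reflects compression.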
When building a ChunkIndex it currently starts from refcount=0 and then sets refcount=MAX_VALUE if a chunk is used.
That's how most of borg2 works now: it doesn't do refcounting anymore, just a boolean "do we have chunk X".
For better deduplication stats in borg compact, we could deviate from that in just borg compact and do precise refcounting without any additional effort.
Before persisting the ChunkIndex, we then need to set refcounts to MAX_VALUE, similar to how we clean up the size values.
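The idea above could look roughly like this. It is a sketch only: the dict-based index, the function names and the concrete `MAX_VALUE` are hypothetical stand-ins, not borg's real ChunkIndex, and keeping unused chunks at 0 when persisting is an assumption.

```python
MAX_VALUE = 2**32 - 1  # assumed sentinel meaning "chunk is used"


def count_references(all_chunk_ids, used_chunk_ids):
    """Start every chunk at refcount 0, then count each reference precisely."""
    index = {chunk_id: 0 for chunk_id in all_chunk_ids}
    for chunk_id in used_chunk_ids:
        if chunk_id in index:
            index[chunk_id] += 1
    return index


def normalize_for_persist(index):
    """Collapse precise refcounts back to the boolean-style representation."""
    return {
        chunk_id: (MAX_VALUE if refcount > 0 else 0)
        for chunk_id, refcount in index.items()
    }


index = count_references(["a", "b", "c"], ["a", "b", "a"])
persisted = normalize_for_persist(index)
```

The precise counts exist only in memory for the stats; the persisted form carries no more information than "do we have chunk X".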
To consider: