Idea: deflate compatible file format #38

Open
orent opened this issue May 8, 2022 · 4 comments

Comments

orent commented May 8, 2022

One thing that limits the applicability of streaming verification is the use of a non-standard format. I believe it could help adoption if the same unmodified file or URL could be used for both verified and unverified streaming.

This can be done by masquerading as a deflate stream.

The stream would consist of alternating blocks: uncompressed stored blocks (block type 00) holding the data, and "compressed" blocks (block type 10, dynamic Huffman) that each contain only a fake Huffman table carrying the verification hashes, with no actual data encoded using that table. This interleaved stream could be processed by any standard deflate decoder into the original data. The block sizes can be fixed and deterministic, which supports random access to any position.
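A minimal sketch of the stored-block half of this idea, assuming a 1024-byte chunk size. It covers only the data-carrying blocks (type 00); the hash-carrying type 10 blocks are deliberately left out, since crafting a valid fake Huffman table is the hard part of the proposal. The names `stored_block`, `wrap_as_deflate`, and `chunk_offset` are hypothetical and not part of Bao.

```python
import struct
import zlib

BLOCK_SIZE = 1024  # assumed chunk size, chosen for illustration


def stored_block(payload: bytes, final: bool) -> bytes:
    """Frame payload as one deflate stored block (RFC 1951, BTYPE=00)."""
    if len(payload) > 0xFFFF:
        raise ValueError("a stored block holds at most 65535 bytes")
    # Bit 0 is BFINAL, bits 1-2 are BTYPE=00, the rest pads to a byte boundary.
    header = bytes([1 if final else 0])
    length = len(payload)
    # LEN and its one's complement NLEN, both little-endian 16-bit.
    return header + struct.pack("<HH", length, length ^ 0xFFFF) + payload


def wrap_as_deflate(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Split data into fixed-size chunks and emit them as stored blocks."""
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not chunks:
        chunks = [b""]  # an empty input still needs one final block
    last = len(chunks) - 1
    return b"".join(stored_block(c, i == last) for i, c in enumerate(chunks))


def chunk_offset(index: int, block_size: int = BLOCK_SIZE) -> int:
    """Byte offset of chunk `index`'s payload in a stored-blocks-only stream."""
    # Each block costs 5 bytes of framing: 1 header byte + LEN + NLEN.
    return index * (5 + block_size) + 5


data = bytes(range(256)) * 10
stream = wrap_as_deflate(data)
# Any standard raw-deflate decoder recovers the original bytes.
recovered = zlib.decompress(stream, -15)
```

Because every stored block starts on a byte boundary, chunk offsets in this simplified stream are plain arithmetic. Once metadata blocks are interleaved, the offsets would also depend on the size and bit alignment of those blocks.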

When served by an HTTP server, the stream can use `Content-Encoding: deflate` or `Content-Encoding: gzip` so that it is transparently decoded by any standard, unaware client. The end user will not even see the wrapper file or need to gunzip it manually. An aware client (e.g. an extension to curl) can use the extra data to verify the stream as it is downloaded.
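For the gzip case, the raw deflate stream needs the gzip header and trailer around it. A rough sketch, assuming the simplest header with no optional fields; `gzip_wrap` and `raw_deflate_stored` are hypothetical helper names, and the latter just asks zlib for level-0 (stored-block) output as a stand-in for the interleaved stream.

```python
import gzip
import struct
import zlib


def raw_deflate_stored(data: bytes) -> bytes:
    """Stand-in for the proposed stream: raw deflate at compression level 0."""
    compressor = zlib.compressobj(0, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


def gzip_wrap(raw_deflate: bytes, original: bytes) -> bytes:
    """Add a minimal gzip (RFC 1952) header and trailer to a raw deflate stream."""
    header = (
        b"\x1f\x8b"            # gzip magic number
        b"\x08"                # compression method: deflate
        b"\x00"                # flags: no optional header fields
        b"\x00\x00\x00\x00"    # modification time: not set
        b"\x00"                # extra flags
        b"\xff"                # operating system: unknown
    )
    # The trailer is the CRC-32 and the length (mod 2**32) of the original data.
    trailer = struct.pack(
        "<II", zlib.crc32(original) & 0xFFFFFFFF, len(original) & 0xFFFFFFFF
    )
    return header + raw_deflate + trailer


payload = b"verified streaming " * 200
wrapped = gzip_wrap(raw_deflate_stored(payload), payload)
# A stock gzip decoder accepts the wrapper and checks the CRC and length.
unwrapped = gzip.decompress(wrapped)
```

One consequence worth noting: the gzip trailer depends on the whole file, so it can only be written once all of the content is known.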

oconnor663 commented

Have you looked at the --outboard command line flag? I think it achieves what you want here in a simpler way. The downside is that you need two streams, so it doesn't work well in shell pipelines, but my hope is that that's not a problem in regular code.

orent commented May 8, 2022

I am aware that a separate verification metadata stream can be used. But this is not the default - and for a good reason.

What I am suggesting is a tweak to the “in board” interleaved format that makes it possible to decode (without verification) using a ubiquitous tool.

oconnor663 commented May 8, 2022

I don't think it would make sense to support this as a first-class feature of Bao, but it wouldn't be too hard to write a separate utility that converts between the standard inline format and this deflate-based version. The main thing you need is a function that tells you how many parent nodes come before each chunk, which already exists in the Bao code. We could make it public, or you could just copy/paste it. (It's a few lines of arithmetic, and easy to test, though kind of tricky to get right the first time.)
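As an illustration of that arithmetic, here is a sketch written from scratch rather than copied from Bao. It assumes a pre-order layout (each parent node is written before its children) and a tree where the left subtree always takes the largest power-of-two number of chunks that still leaves at least one chunk on the right. Both function names are hypothetical; the closed form is a candidate that the usage lines cross-check against the tree walk.

```python
def parents_before_chunks(total_chunks: int) -> list[int]:
    """Walk the tree in pre-order and count the parent nodes that are
    written immediately before each chunk."""
    counts: list[int] = []
    pending = 0  # parents emitted since the last chunk

    def walk(n: int) -> None:
        nonlocal pending
        if n == 1:
            counts.append(pending)
            pending = 0
            return
        pending += 1  # this subtree's parent node comes first
        left = 1 << ((n - 1).bit_length() - 1)  # largest power of two < n
        walk(left)
        walk(n - left)

    walk(max(total_chunks, 1))  # an empty input still has one (empty) chunk
    return counts


def parents_before_chunk(index: int, total_chunks: int) -> int:
    """Candidate closed form for the same count, without walking the tree."""
    remaining = total_chunks - index
    # Limit imposed by how many chunks are left to the right of this one.
    starting_bound = (remaining - 1).bit_length()
    if index == 0:
        return starting_bound
    # Limit imposed by alignment: the number of trailing zero bits in index.
    interior_bound = (index & -index).bit_length() - 1
    return min(starting_bound, interior_bound)


walked = parents_before_chunks(5)
closed = [parents_before_chunk(i, 5) for i in range(5)]
```

For five chunks, both approaches give three parents before chunk 0 (the root, then the parents of the 4-chunk and 2-chunk left subtrees), one before chunk 2, and none elsewhere. A converter along the lines discussed here could use such counts to decide how many hash pairs belong in the metadata block ahead of each data block.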

One concern I'd have is that, since Bao is a security tool, we want to be really careful about any encoding that different clients might interpret in different ways. I'm not familiar enough with the deflate format to know what to look for here.
