Add WARC compressed record length to the extraction #300

Open
anjackson opened this issue Sep 30, 2022 · 1 comment
@anjackson
Contributor

Can we use, e.g., a counting stream reader to work out how long each WARC record is (compressed)?
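For illustration only, here is a minimal Python sketch of the counting idea. It assumes the common convention that a `.warc.gz` file stores each record as its own gzip member, so the compressed record length equals the number of bytes that member occupies. The function name `compressed_member_lengths` and the `chunk_size` parameter are hypothetical and not part of this project's code.

```python
import gzip
import io
import zlib


def compressed_member_lengths(stream, chunk_size=64 * 1024):
    """Yield (offset, compressed_length) for each gzip member in `stream`.

    Assumption: one gzip member per WARC record.
    """
    offset = 0
    pending = b""  # bytes read from the stream but not yet consumed
    while True:
        if not pending:
            pending = stream.read(chunk_size)
            if not pending:
                return  # clean end of input
        # wbits = MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
        decomp = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
        consumed = 0
        while not decomp.eof:
            if not pending:
                pending = stream.read(chunk_size)
                if not pending:
                    raise EOFError("truncated gzip member")
            decomp.decompress(pending)  # decompressed output is discarded
            # Bytes after the end of this member are left in unused_data.
            consumed += len(pending) - len(decomp.unused_data)
            pending = decomp.unused_data
        yield offset, consumed
        offset += consumed


# Usage with two in-memory members standing in for WARC records.
first = gzip.compress(b"WARC/1.0 first record\r\n" * 10)
second = gzip.compress(b"WARC/1.0 second record\r\n" * 50)
for start, length in compressed_member_lengths(io.BytesIO(first + second)):
    print(start, length)
```

The sketch counts consumed input bytes instead of wrapping the stream, because the decompressor usually reads past the member boundary; subtracting `unused_data` corrects for that read-ahead.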

@anjackson anjackson self-assigned this Sep 30, 2022
@tokee
Collaborator

tokee commented Aug 11, 2023

We (the Royal Danish Library) would like this for CDX API support. It is definitely possible, and I am 75% sure the functionality is already there, just buried at an unknown level in the convoluted stack of IndexStreams that is used. I'll see if I can find the time to dig into this.
