Invalid Warc Files #22

Open
jlarmstrongiv opened this issue Apr 9, 2021 · 3 comments

Comments

@jlarmstrongiv
Contributor

jlarmstrongiv commented Apr 9, 2021

Originated from #21 (comment)

Files (links expire in 7 days):

Validators:

App:

@ikreymer
Member

ikreymer commented Apr 9, 2021

The booya.warc file has a resource record with no Content-Type, and that is what breaks the warcat validation.
ReplayWeb.page and other tools are more lenient. Did you mean to use a resource record here instead of a response?
If so, it should set a Content-Type header.

WARC/1.0
WARC-Target-URI: https://swapi.dev/api/planets/1
WARC-Date: 2021-04-09T17:15:34Z
WARC-Type: resource
WARC-Record-ID: <urn:uuid:0751dafe-046a-4bea-a519-7b6c184a4de7>
WARC-Payload-Digest: sha-256:ae44afda086df85dfef397de89f1e108aa6eb0d5d1739777749a178cce3f02dd
WARC-Block-Digest: sha-256:ae44afda086df85dfef397de89f1e108aa6eb0d5d1739777749a178cce3f02dd
Content-Length: 821

{"name":"Tatooine","rotation_period":"23","orbital_period":"304","diameter":"10465","climate":"arid","gravity":"1 standard","terrain":"desert","surface_water":"1","population":"200000","residents":["http://swapi.dev/api/people/1/","http://swapi.dev/api/people/2/","http://swapi.dev/api/people/4/","http://swapi.dev/api/people/6/","http://swapi.dev/api/people/7/","http://swapi.dev/api/people/8/","http://swapi.dev/api/people/9/","http://swapi.dev/api/people/11/","http://swapi.dev/api/people/43/","http://swapi.dev/api/people/62/"],"films":["http://swapi.dev/api/films/1/","http://swapi.dev/api/films/3/","http://swapi.dev/api/films/4/","http://swapi.dev/api/films/5/","http://swapi.dev/api/films/6/"],"created":"2014-12-09T13:50:49.641000Z","edited":"2014-12-20T20:58:18.411000Z","url":"http://swapi.dev/api/planets/1/"}
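
The kind of check that fails on the record above can be sketched in a few lines. This is a hand-rolled illustration, not warcat's or warcio.js's actual code: `parseWarcHeaders` and `isMissingContentType` are hypothetical helper names, and the parser assumes one header per line (no folded continuation lines).

```javascript
// Parse a WARC header block (version line + "Name: value" lines) into a Map.
// Assumption: no continuation lines; header names are compared case-insensitively.
function parseWarcHeaders(headerBlock) {
  const [version, ...fieldLines] = headerBlock.trim().split(/\r?\n/);
  const fields = new Map();
  for (const line of fieldLines) {
    const idx = line.indexOf(":");
    if (idx === -1) continue; // skip anything that is not "Name: value"
    fields.set(line.slice(0, idx).trim().toLowerCase(), line.slice(idx + 1).trim());
  }
  return { version, fields };
}

// True when the record carries a non-empty block but declares no Content-Type.
function isMissingContentType(headerBlock) {
  const { fields } = parseWarcHeaders(headerBlock);
  const length = Number(fields.get("content-length") ?? 0);
  return length > 0 && !fields.has("content-type");
}

// Headers abbreviated from the record quoted in this comment.
const exampleHeaders = [
  "WARC/1.0",
  "WARC-Target-URI: https://swapi.dev/api/planets/1",
  "WARC-Type: resource",
  "Content-Length: 821",
].join("\r\n");

console.log(isMissingContentType(exampleHeaders)); // true
```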

warcio.js probably should just default to application/octet-stream though, as per: https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/#content-type
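
The fallback suggested here could look roughly like the following. It is only a sketch of the idea, not the change that landed in warcio.js: `withDefaultContentType` is a hypothetical helper, and the headers are assumed to be a plain object of WARC header names to values.

```javascript
// The WARC spec says an unknown media type should be treated as
// application/octet-stream, so use that when no Content-Type is given.
const DEFAULT_WARC_CONTENT_TYPE = "application/octet-stream";

// Hypothetical helper: returns a copy of the headers with a Content-Type
// filled in only if the caller did not supply one (case-insensitive match).
function withDefaultContentType(warcHeaders = {}) {
  const hasType = Object.keys(warcHeaders).some(
    (name) => name.toLowerCase() === "content-type"
  );
  return hasType
    ? { ...warcHeaders }
    : { ...warcHeaders, "Content-Type": DEFAULT_WARC_CONTENT_TYPE };
}

console.log(withDefaultContentType({ "WARC-Type": "resource" }));
// { 'WARC-Type': 'resource', 'Content-Type': 'application/octet-stream' }
```

An explicit value from the caller is never overwritten, so a resource saved as JSON can still declare `application/json`.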

warctools is very old and does not support WARC 1.1. Your first warcinfo record is WARC 1.1, while the rest are 1.0; changing that first record to 1.0 will actually have it pass...
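
To spot records like that warcinfo record, one option is to walk the file and compare each record's version line. The sketch below is an assumption-laden illustration, not part of warcio.js or warctools: `findRecordsNotAtVersion` is a hypothetical name, the input must be an uncompressed WARC held in a Node.js `Buffer`, and records are assumed to be well formed (CRLF line endings, a Content-Length header, two CRLFs after each block).

```javascript
// List every record whose version line differs from `expected`.
function findRecordsNotAtVersion(buf, expected = "WARC/1.0") {
  const mismatches = [];
  let pos = 0;
  while (pos < buf.length) {
    // The header block ends at the first empty line.
    const headerEnd = buf.indexOf("\r\n\r\n", pos);
    if (headerEnd === -1) break;
    const lines = buf.toString("utf8", pos, headerEnd).split("\r\n");
    const version = lines[0];
    if (version !== expected) mismatches.push({ offset: pos, version });

    const lengthLine = lines.find((l) => /^content-length\s*:/i.test(l));
    if (!lengthLine) break; // malformed record: stop rather than guess
    const length = Number.parseInt(lengthLine.slice(lengthLine.indexOf(":") + 1), 10);
    if (Number.isNaN(length)) break;

    // Skip the blank line, the content block, and the two trailing CRLFs.
    pos = headerEnd + 4 + length + 4;
  }
  return mismatches;
}

// Tiny in-memory example: a 1.1 warcinfo record followed by a 1.0 resource record.
const makeRecord = (version, type, body) =>
  Buffer.from(
    `${version}\r\nWARC-Type: ${type}\r\nContent-Length: ${Buffer.byteLength(body)}\r\n\r\n${body}\r\n\r\n`
  );
const sampleWarc = Buffer.concat([
  makeRecord("WARC/1.1", "warcinfo", "software: example"),
  makeRecord("WARC/1.0", "resource", "{}"),
]);

console.log(findRecordsNotAtVersion(sampleWarc));
// [ { offset: 0, version: 'WARC/1.1' } ]
```

For a file on disk, something like `findRecordsNotAtVersion(fs.readFileSync("booya.warc"))` would work for an uncompressed WARC; skipping by Content-Length avoids mistaking a `WARC/` string inside a payload for a record boundary.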

I realize additional examples will make it easier to use, and I will try to add them when I have a chance!

ikreymer added a commit that referenced this issue Apr 9, 2021
…and other records) if no warc content-type is specified

addresses issue discussed in #22
@jlarmstrongiv
Contributor Author

Thanks so much for looking into this!

In my initial tests, opening the file with The Unarchiver still failed, but I’ll be able to try more combinations late tonight or tomorrow. Are there any other items or sample files I could check?

Yes, I was working on building flexible methods for saving resources related to the page that aren’t requests or responses. For compatibility, I can also try saving as a response type and see if that works.

More examples are always welcome; most of what I have learned so far came from the test cases and the readme.

@jlarmstrongiv
Contributor Author

jlarmstrongiv commented Apr 10, 2021

I made the changes to calculate the warcHeaders { "Content-Type": "mime/type" } for each of my resources. I also tried removing my resources entirely, but both The Unarchiver and jwattools still choked. How would you rate jwattools, @ikreymer? I'm not really sure what else to check, or what differs from the working node-warc version.

Demo file: https://share.fromtheexchange.space/file/space-fromtheexchange-share/booya-no-resources.warc
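
For reference, computing a per-resource Content-Type as described in this comment might look something like the sketch below. It is not the code used to produce the demo file: `warcHeadersForResource` and the small extension table are hypothetical, and a real project would more likely use the Content-Type of the original HTTP response or a MIME lookup library.

```javascript
// Hypothetical, deliberately tiny extension-to-MIME table.
const MIME_BY_EXTENSION = {
  ".json": "application/json",
  ".html": "text/html",
  ".css": "text/css",
  ".js": "text/javascript",
  ".png": "image/png",
};

// Build the warcHeaders object for a resource record from its URL.
// Falls back to application/octet-stream when the extension is unknown.
function warcHeadersForResource(url) {
  const pathname = new URL(url).pathname;
  const dot = pathname.lastIndexOf(".");
  const ext = dot === -1 ? "" : pathname.slice(dot).toLowerCase();
  return { "Content-Type": MIME_BY_EXTENSION[ext] ?? "application/octet-stream" };
}

console.log(warcHeadersForResource("https://example.com/data/planet.json"));
// { 'Content-Type': 'application/json' }
console.log(warcHeadersForResource("https://swapi.dev/api/planets/1"));
// { 'Content-Type': 'application/octet-stream' }
```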


2 participants