Better handling of differential uploads for PDFs #24

Open
ejhumphrey opened this issue Nov 16, 2018 · 1 comment

Comments

@ejhumphrey
Collaborator

Currently, upload_to_zenodo will try to lob whatever PDF is specified at Zenodo. Zenodo itself is idempotent, i.e. it won't change the upload if the MD5 checksum matches, but this (a) is slow, because each paper is 1-4MB, which is roughly 100MB per conference and over 1GB in aggregate, and (b) requires that the PDFs are accessible if local, or incurs both a download and an upload if the electronic edition (ee) is a URL on the web.

There are a few ways around this (and maybe others):

  1. track MD5 checksums from Zenodo in the proceedings database
  2. before uploading, ask Zenodo what the latest MD5 checksum is
  3. toggle PDF uploading as a global arg
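
For (1) and (2), the check itself could be as small as the sketch below. It assumes the last known checksum is available as a hex string, whether it comes from the proceedings database or from a query to Zenodo; the names md5_of_file, needs_upload and known_md5 are hypothetical and not part of the current code.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a local file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_upload(pdf_path, known_md5):
    """Return True when the local PDF differs from the checksum on record.

    known_md5 is a hypothetical input: the last checksum we know of, or
    None if nothing is recorded yet (in which case we always upload).
    """
    if known_md5 is None:
        return True
    # Assumption: the stored value may carry an "md5:" prefix; strip it.
    known = known_md5.split(":")[-1].lower()
    return md5_of_file(pdf_path) != known
```

This only saves the upload. If the ee is a URL, the PDF still has to be downloaded before it can be hashed locally.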

(3) is certainly the easiest to implement, but also the easiest to misuse / create drift. That said, perhaps we start there and revisit if it becomes problematic?
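
A minimal sketch of what (3) might look like with argparse. The flag name --skip-pdf-upload and both helper functions are assumptions for illustration, not existing options of the uploader.

```python
import argparse


def build_parser():
    """Build a CLI parser with a global switch for PDF uploading."""
    parser = argparse.ArgumentParser(description="Upload proceedings to Zenodo.")
    # Hypothetical flag: when set, only metadata is pushed, never the PDFs.
    parser.add_argument(
        "--skip-pdf-upload",
        action="store_true",
        help="do not upload PDFs (applies to every record)",
    )
    return parser


def should_upload_pdf(args):
    """The toggle is global, so the answer is the same for every paper."""
    return not args.skip_pdf_upload
```

Defaulting to uploading keeps the current behaviour unless the flag is passed explicitly.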

@ejhumphrey
Collaborator Author

Realizing I left this out: the issue with (3) is that it is an all-or-nothing option. If we only want to update some of the PDFs, we'd need a partial bypass, which means doing something else (side-chain bypass info?).
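
One possible shape for a partial bypass, purely as a sketch: pass the set of record keys whose PDFs should be refreshed alongside the global switch. The record layout (a dict with a "key" field) and the function name are hypothetical.

```python
def select_pdfs_to_upload(records, upload_all=False, only_keys=None):
    """Pick the records whose PDFs should be uploaded.

    records:    iterable of dicts, each with a "key" field (assumed schema).
    upload_all: global switch; True uploads every PDF.
    only_keys:  optional collection of record keys to upload even when
                upload_all is False (the partial bypass).
    """
    wanted = set(only_keys or ())
    return [r for r in records if upload_all or r["key"] in wanted]
```

With upload_all left False and only_keys empty, nothing is uploaded, so the partial case is opt-in per record.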
