-
Notifications
You must be signed in to change notification settings - Fork 756
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ePrint: add support for non-PDF attachments #3337
ePrint: add support for non-PDF attachments #3337
Conversation
Signed-off-by: Alex T. <[email protected]>
cee6090
to
63a29a2
Compare
Reported on the dev mail list: https://groups.google.com/g/zotero-dev/c/fHwgGUICINE |
OK, it looks like the issue here is that the site uses MathJax lazy typesetting. I tried to see if we could force rendering by deleting if (doc.querySelector('mjx-lazy')) {
abstr = attr(doc, 'meta[property="og:description"]', 'content');
} |
Thanks Abe, good point. Let me take a look. |
Running tests for the PS format addition change revealed that ePrint pages seem to have switched to using MathJax lazy typesetting. That leads to differences in expected vs scraped abstract text when run under Scaffold, but not when running manually from browser. Fix by pulling the abstract from the OG tag if MathJax elements are not typeset yet. Kudos to @AbeJellinek who found the root cause and suggested the workaround. Signed-off-by: Alex T. <[email protected]> Co-authored-by: Abe Jellinek <[email protected]>
@AbeJellinek, thanks again - I've just tested the suggested workaround and indeed looks like you're absolutely right, the reason is the lazy typesetting, and the fix does work. I have added a commit with the fix + updated tests. |
[EDIT: never mind, I've just realized you've got an issue for that and the fact they are being run now is probably due to your intermediate fix listed there: https://github.com//issues/3343. Feel free to disregard the below and apologies for the noise :)] BTW (I see you're working on some CI updates in a WIP PR), it seems like the previous update to this PR did not get properly tested, and in this update it actually worked fine. As there have been no commits to this repo in the meanwhile changing anything in the CI space, I guess it's unexpected, so just wanted to draw your attention to that. This update, tests were executed as expected: Actions log link. |
CI always pulls the latest zotero/zotero-connectors master, and zotero/translate@ddd90b8 (in the translate submodule) fixed the issue. |
Thank you! |
This adds general logic to support non-PDF attachment formats for ePrint, plus fixes a bug in case there are no attachments at all. I don't think there's anything beyond PS in addition to PDF, but it's easily extendable in the allowlist fashion if needed.
Relevant tests added (no attachments at all, just PS, both PDF and PS, PDF-only is already tested by existing tests). Tested using Scaffold in Z6/7.
Note that there seems to be a Scaffold-related issue where the text contents of Mathjax elements is no longer captured properly when the test is run, that affects some of the old and new tests. When running the
doWeb()
from Scaffold's Browser tab, it works as previously/expected, and when it's run from the Tests tab, it fails with delta indicating lack of Mathjax-contained text (mostly in abstracts). This is somewhat unexpected, so I'll report this as a separate issue against Scaffold shortly.