capture TextPositionSelector and/or RangeSelector #29
Comments
Huh, 1022 explains a lot. How much infrastructure would we need to have the bookmarklet load a helper script from a static URL, so that the bookmarklet stays the same but we can add functionality like this? I'm thinking a single additional endpoint? Are there any known CORS issues with loading a remote script from a bookmarklet? Also, do we need the full rendered DOM to get the XPaths, or can we extract them from the page's innerHTML? A problem I see with that approach would be mapping the ids found in the inner text back onto the innerHTML in cases where some markup splits an id (which is now quite frequent because journals have completely whiffed on the typesetting ...). WebAssembly is on my radar, though taking a look around I found https://github.com/iodide-project/pyodide, which is ... not reassuring with regard to the current complexity of the setup required. We would have to evaluate the time tradeoffs between working on that and a complete rewrite.
That's a separate question to which the answer I think is "just do it" :-) There are only a handful of curators who have installed the bookmarklet, right? A one-time upgrade to a bookmarklet that's a stub pointing to malleable code is a pretty small intervention.
The possible issue is CSP (Content Security Policy). I'm not aware that any of our target sites enforce CSP. If any do, the fallback would be to package the thing in a dirt-simple Chrome extension.
It's ideal to operate in DOM context using the same code Hypothesis (and compatible clients) use, all based on the common anchoring libraries. That said, it may be easy to match TextPositionSelector by stripping markup from the text you get and marking positions in the resulting stream of characters. In principle it seems possible to match the TextPositionSelectors that the Hypothesis client produces. In practice we'll just have to try and see what happens.
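To make the "strip markup and mark positions" idea concrete, here is a minimal sketch using only the Python standard library. It is an approximation, not the Hypothesis algorithm: `html.parser` is not an HTML5 parser, so its text stream can differ from the browser's textContent on malformed markup, and the class and function names (`TextStream`, `text_stream`) are made up for illustration.

```python
from html.parser import HTMLParser


class TextStream(HTMLParser):
    """Concatenate text nodes in document order (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters,
        # which is what a browser's text stream would contain.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves are dropped.
        self.chunks.append(data)

    def text(self):
        return "".join(self.chunks)


def text_stream(html):
    """Return the markup-free character stream for an HTML string."""
    parser = TextStream()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.text()


# An id split by markup, as happens with badly typeset journal pages.
html = "<p>Antibody (RRID:<b>AB_</b>123456) was used.</p>"
stream = text_stream(html)
start = stream.find("RRID:AB_123456")
end = start + len("RRID:AB_123456")
print(stream[start:end], start, end)
```

Because the match is made on the stream rather than on the markup, the `<b>` tag that splits the id no longer gets in the way; whether the resulting offsets agree with what the Hypothesis client computes on real pages is exactly what would need testing.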
It looks like the following will work.
I have verified that: a) with Range (XPath) anchoring turned off, the Hypothesis client will anchor a case like hypothesis/product-backlog#1022 when it has both TextQuote and TextPosition; b) the start of an RRID match in the textContent stream does match the TextPosition.start created by the Hypothesis client. It would, of course, be a major change for SciBot to be looking at document.body.textContent (the concatenated text of every node, including content that is not rendered) vs document.body.innerText (just the rendered text), so this would require some testing and sanity-checking. I'll take a crack at making a demo that illustrates how, given the textContent of a web page, to create Hypothesis-compatible selectors for both TextQuote and TextPosition.
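A rough Python sketch of what such a demo might look like, under stated assumptions: the RRID regex is a simplified stand-in rather than SciBot's actual pattern, the 32-character context length is assumed, and `selectors_for` and `utf16_offset` are hypothetical names. The selector shapes follow the W3C Web Annotation Data Model.

```python
import re

CONTEXT = 32  # assumed length of prefix/suffix context, in characters
RRID = re.compile(r"RRID:\s*[A-Za-z]+_\w+")  # simplified, hypothetical pattern


def selectors_for(text, pattern=RRID, context=CONTEXT):
    """Build a TextQuote and a TextPosition selector for every match in text."""
    results = []
    for match in pattern.finditer(text):
        start, end = match.span()
        results.append([
            {
                "type": "TextQuoteSelector",
                "exact": text[start:end],
                "prefix": text[max(0, start - context):start],
                "suffix": text[end:end + context],
            },
            {"type": "TextPositionSelector", "start": start, "end": end},
        ])
    return results


def utf16_offset(text, index):
    """Convert a Python code-point index to a UTF-16 code-unit offset.

    JavaScript strings are indexed in UTF-16 code units, so offsets diverge
    from Python's once the text contains characters outside the BMP.
    """
    return len(text[:index].encode("utf-16-le")) // 2


# Two targets with the same prefix and exact but different suffixes.
text = "stain RRID:AB_123 first; stain RRID:AB_123 second"
for quote, position in selectors_for(text, context=6):
    print(quote, position)
```

In this toy input the two TextQuote selectors share prefix and exact and differ only in suffix, while the two TextPosition selectors have distinct start and end values, which is the property that should let the client tell the targets apart. Offsets computed in Python would need to go through something like `utf16_offset` before being sent if the page text contains astral characters.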
As per hypothesis/product-backlog#1022, Hypothesis fails to distinguish among targets that share a common prefix and exact but differ in suffix. Annotations for multiple such targets pile up on a single highlight, preventing human curators from navigating to, and responding to, each target.
One solution would be to run SciBot in the web page where it would have DOM access and could reuse the Hypothesis anchoring libraries. In the near term that would require a rewrite to JavaScript which would make it a nonstarter. In the longer term it's possible that web assembly will enable packaging the existing Python-based code into a form usable in the browser, and that's worth bearing in mind.
The other solution would be to replicate, in the Python-based SciBot code, the selectors produced by the Hypothesis JS-based anchoring machinery. There are two possibilities here: match the TextPositionSelector that Hypothesis produces, or match the RangeSelector (xpath) that Hypothesis produces. I'd be willing to investigate the feasibility of these strategies.