At present Voyant allows you to import a corpus in an XML format from a single URL or from a list of URLs.
I'd like to suggest adding the ability to ingest from a "linked list" of URLs: the user provides a single URL, and the remaining URLs are retrieved recursively. That is, the resource which Voyant retrieves from the first URL itself contains a link to the second "page" of text, which contains a link to a third page, and so on, until the final resource contains no further links.
The user would need to be able to provide one additional XPath parameter when importing the corpus (called Next or similar) that identifies the element or attribute in the XML data containing the link to the next page. For example, in a corpus of TEI elements contained in a teiCorpus wrapper element, the teiCorpus element can bear a next attribute whose semantics are defined in this way, so the default XPath expression for a TEI import could be //*[local-name()='teiCorpus']/@next.
This kind of approach would also work for other XML formats such as Atom, which has link elements for this purpose, e.g. <link rel="next" href="http://example.org/index.atom?page=2"/>.
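To make the proposal concrete, here is a minimal Python sketch of the link-following behaviour described above. It is not Voyant code: the function names (`collect_pages`, `find_next`, `fetch_url`) and the `max_pages` limit are hypothetical. Python's standard-library ElementTree supports only a subset of XPath and cannot select an attribute directly, so the sketch takes an element path plus an attribute name in place of the single XPath expression suggested above; a full XPath engine could take that expression as-is.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin
from urllib.request import urlopen


def fetch_url(url):
    """Default fetcher (hypothetical helper): return the bytes behind a URL."""
    with urlopen(url) as response:
        return response.read()


def find_next(root, element_path, attribute):
    """Return the raw next-page link found in a parsed page, or None."""
    # element_path is an ElementTree path relative to the root element;
    # "." selects the root itself (e.g. a teiCorpus wrapper).
    element = root.find(element_path)
    if element is None:
        return None
    return element.get(attribute)


def collect_pages(start_url, element_path, attribute,
                  fetch=fetch_url, max_pages=1000):
    """Follow next-page links from start_url and return every page fetched."""
    pages = []
    seen = set()
    url = start_url
    # Stop when a page has no next link, when a URL repeats (a cycle),
    # or when the safety limit is reached.
    while url and url not in seen and len(pages) < max_pages:
        seen.add(url)
        data = fetch(url)
        pages.append(data)
        root = ET.fromstring(data)
        link = find_next(root, element_path, attribute)
        # Links may be relative, so resolve them against the current page.
        url = urljoin(url, link) if link else None
    return pages


# Assumed usage for the two formats mentioned above:
#   TEI:  collect_pages(url, ".", "next")
#   Atom: collect_pages(url, "{http://www.w3.org/2005/Atom}link[@rel='next']", "href")
```

The sketch uses a loop rather than literal recursion, so a long chain of pages cannot exhaust the call stack, and the set of visited URLs keeps a page that links back to an earlier one from causing an endless fetch.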