Feature request: recursive import of a corpus #30

Open
Conal-Tuohy opened this issue Jun 26, 2023 · 1 comment
Labels
enhancement New feature or request

Comments


Conal-Tuohy commented Jun 26, 2023

At present Voyant allows you to import a corpus in an XML format from a single URL or from a list of URLs.

I'd like to suggest adding the ability to ingest a corpus from a "linked list" of URLs. The user provides a single URL, and the remaining URLs are retrieved recursively: the resource Voyant retrieves from the first URL contains a link to the second "page" of text, which contains a link to a third page, and so on, until the final resource contains no further links.
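A minimal sketch of the proposed traversal, assuming two pluggable callbacks: one that fetches a URL's text and one that extracts the next-page URL from a fetched document. The names `follow_next_links`, `fetch`, `find_next` and `max_pages` are hypothetical and not part of Voyant or Trombone. The loop is iterative rather than literally recursive so a long chain cannot exhaust the call stack, and it stops on a cycle or after a page limit.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional


def follow_next_links(
    start_url: str,
    fetch: Callable[[str], str],
    find_next: Callable[[str, str], Optional[str]],
    max_pages: int = 1000,
) -> Iterator[tuple[str, str]]:
    """Yield (url, document) pairs, following 'next' links until none remain.

    Hypothetical helper: `fetch` retrieves a URL's text and `find_next`
    extracts the next-page URL (or None) from a document and its URL.
    """
    seen: set[str] = set()
    url: Optional[str] = start_url
    while url is not None:
        # Guard against a malformed chain that points back to an earlier page.
        if url in seen:
            raise ValueError(f"cycle detected at {url}")
        # Guard against an unbounded chain.
        if len(seen) >= max_pages:
            raise ValueError(f"gave up after {max_pages} pages")
        seen.add(url)
        doc = fetch(url)
        yield url, doc
        url = find_next(doc, url)


# Usage with in-memory "pages" standing in for HTTP requests.
pages = {
    "http://example.org/p1": "<page next='http://example.org/p2'/>",
    "http://example.org/p2": "<page next='http://example.org/p3'/>",
    "http://example.org/p3": "<page/>",
}


def next_attr(doc: str, url: str) -> Optional[str]:
    # The last page has no 'next' attribute, so get() returns None.
    return ET.fromstring(doc).get("next")


urls = [u for u, _ in follow_next_links("http://example.org/p1", pages.__getitem__, next_attr)]
print(urls)
# ['http://example.org/p1', 'http://example.org/p2', 'http://example.org/p3']
```

In a real import, `fetch` would perform the HTTP request and `find_next` would evaluate the user-supplied XPath parameter against each page.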

When importing the corpus, the user would need to provide one additional XPath parameter (called, e.g., Next) identifying the element or attribute in the XML data that contains the link to the next page. For example, in a corpus of TEI elements contained in a teiCorpus wrapper element, the teiCorpus element can carry a next attribute with exactly these semantics, so the default XPath expression for a TEI import could be //*[local-name()='teiCorpus']/@next.
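A sketch of evaluating the suggested TEI default, //*[local-name()='teiCorpus']/@next, using only Python's standard library. `tei_next_link` is a hypothetical helper, not a Voyant function. Because ElementTree's XPath subset has no local-name(), the sketch matches on the local part of each element's namespaced tag instead, which roughly emulates that expression (it returns the first non-empty value in document order). It also assumes relative links should be resolved against the URL of the page they came from.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urljoin


def tei_next_link(xml_text: str, base_url: str) -> Optional[str]:
    """Hypothetical helper approximating //*[local-name()='teiCorpus']/@next."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Strip any "{namespace}" prefix to get the element's local name.
        local = elem.tag.rsplit("}", 1)[-1]
        if local == "teiCorpus" and elem.get("next"):
            # Resolve a relative link against the page it was found on.
            return urljoin(base_url, elem.get("next"))
    return None


page = (
    '<teiCorpus xmlns="http://www.tei-c.org/ns/1.0" next="page2.xml">'
    "<teiHeader/><TEI><teiHeader/><text><body><p>Text</p></body></text></TEI>"
    "</teiCorpus>"
)
print(tei_next_link(page, "http://example.org/corpus/page1.xml"))
# http://example.org/corpus/page2.xml
```

With a full XPath 1.0 engine the user-supplied expression could be evaluated directly; the namespace-agnostic matching here just mirrors what local-name() achieves.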

The same approach would work for other XML formats such as Atom, which has link elements for this purpose, e.g. <link rel="next" href="http://example.org/index.atom?page=2"/>
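For Atom, the equivalent of the Next parameter would select the feed-level link whose rel is "next". A minimal sketch with the standard library; `atom_next_link` is a hypothetical name, and the sketch ignores xml:base, resolving relative hrefs against the feed's own URL only. ElementTree's XPath subset does support [@attr='value'] predicates, so no manual matching is needed here.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urljoin

ATOM_NS = "http://www.w3.org/2005/Atom"


def atom_next_link(xml_text: str, base_url: str) -> Optional[str]:
    """Hypothetical helper: return the feed's rel="next" href, or None."""
    root = ET.fromstring(xml_text)
    # Only direct children of <feed> are searched, so per-entry links are ignored.
    link = root.find(f"{{{ATOM_NS}}}link[@rel='next']")
    if link is None or not link.get("href"):
        return None  # last page: no further links
    return urljoin(base_url, link.get("href"))


feed = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <link rel="next" href="http://example.org/index.atom?page=2"/>
</feed>"""
print(atom_next_link(feed, "http://example.org/index.atom"))
# http://example.org/index.atom?page=2
```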

Conal-Tuohy (Author) commented

Maybe this issue belongs in the Trombone repo? Apologies if so.

@ajmacdonald ajmacdonald added the enhancement New feature or request label Jun 26, 2023
@ajmacdonald ajmacdonald transferred this issue from voyanttools/VoyantServer Jun 26, 2023