List items being picked up as independent paragraphs #41

Open
keynmol opened this issue Jun 9, 2017 · 0 comments
keynmol commented Jun 9, 2017

Example: https://simple.wikipedia.org/wiki/Human_evolution ("Species list" section)

In the XML dump, the section looks like this:

== Species list ==
This list is in chronological order by [[genus]].

* ''[[Sahelanthropus]]''
** ''[[Sahelanthropus tchadensis]]''
* ''[[Orrorin]]''
** ''[[Orrorin tugenensis]]''
* ''[[Ardipithecus]]''
** ''[[Ardipithecus kadabba]]''
** ''[[Ardipithecus ramidus]]''
* ''[[Australopithecus]]''
** ''[[Australopithecus anamensis]]''
** ''[[Australopithecus afarensis]]''
** ''[[Australopithecus bahrelghazali]]''
** ''[[Australopithecus africanus]]''
** ''[[Australopithecus garhi]]''
...

Jsonpedia splits this list oddly: the nested items under Australopithecus are jammed together into a single paragraph, and the link annotations carry wrong offsets (every link has "start": 0):

{
  "paragraph": "Australopithecus anamensis Australopithecus afarensis Australopithecus bahrelghazali Australopithecus africanus Australopithecus garhi",
  "links": [
    {
      "id": "Australopithecus_anamensis",
      "anchor": "Australopithecus anamensis",
      "start": 0,
      "end": 26
    },
    {
      "id": "Australopithecus_afarensis",
      "anchor": "Australopithecus afarensis",
      "start": 0,
      "end": 26
    },
    {
      "id": "Australopithecus_bahrelghazali",
      "anchor": "Australopithecus bahrelghazali",
      "start": 0,
      "end": 30
    },
    {
      "id": "Australopithecus_africanus",
      "anchor": "Australopithecus africanus",
      "start": 0,
      "end": 26
    },
    {
      "id": "Australopithecus_garhi",
      "anchor": "Australopithecus garhi",
      "start": 0,
      "end": 22
    }
  ]
}
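
For comparison, here is a rough sketch of what consistent offsets would look like if the five items were kept joined into one paragraph. This is not Jsonpedia's code: the function name `link_offsets`, the single-space separator, and the id rule (spaces replaced by underscores) are assumptions made only for illustration, chosen to match the output above.

```python
def link_offsets(anchors, sep=" "):
    """Join anchors into one paragraph and compute each link's offsets
    relative to the joined text (hypothetical helper, not Jsonpedia's API)."""
    links = []
    pos = 0  # running position of the next anchor in the joined paragraph
    for anchor in anchors:
        links.append({
            "id": anchor.replace(" ", "_"),  # assumed id convention
            "anchor": anchor,
            "start": pos,
            "end": pos + len(anchor),
        })
        pos += len(anchor) + len(sep)  # skip the anchor and the separator
    return sep.join(anchors), links


anchors = [
    "Australopithecus anamensis",
    "Australopithecus afarensis",
    "Australopithecus bahrelghazali",
    "Australopithecus africanus",
    "Australopithecus garhi",
]
paragraph, links = link_offsets(anchors)

# Each slice of the paragraph should reproduce the link's anchor text.
for link in links:
    print(link["start"], link["end"], paragraph[link["start"]:link["end"]])
```

With offsets computed this way, slicing the paragraph by "start" and "end" gives back each anchor, which is not true of the output above, where only the first link's range is correct.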

keynmol added the icebox label Jun 9, 2017