How to add more NIF 2.0 data properties to output? #15

Open
ghost opened this issue Jan 6, 2017 · 7 comments

Comments

@ghost

ghost commented Jan 6, 2017

Hi,
I am trying to use this NIF-lib in order to transform my Stanford CoreNLP results to NIF. But as far as I can see, the provided data properties are limited to just a few (beginIndex, endIndex, score etc.).

So, how can I add more data properties such as lemma, posTag etc. or add new namespaces and ontologies respectively?

Is this easily realizable or do I have to dig deeply into the code?

Thanks in advance!

@sandroacoelho
Collaborator

Hi @Phauly1 ,

Thank you for your enquiry. We made this small lib to share code between projects such as FREME, DBpedia Spotlight and DBpedia Lookup. These projects use just a small subset of what NIF provides, but it will be a pleasure to expand it.

What are the most urgent/important properties for you?

Best,

@ghost
Author

ghost commented Jan 7, 2017

Hi @sandroacoelho ,

thanks for your reply! Actually, I need quite a few properties, and I also need to output them as JSON-LD. All of them originate from Stanford CoreNLP, but I have not yet found their equivalents in the NIF 2.0 Core Ontology.
In addition to the ones already in the lib, these are the CoreNLP properties I need as NIF (the most urgent ones are for the tokens and dependency parsing):

For the tokens

  • index
  • lemma
  • pos
  • ner
  • before
  • after

For coreference resolution

  • id
  • text
  • type
  • number
  • gender
  • animacy
  • startIndex
  • endIndex
  • headIndex
  • sentNum
  • position
  • isRepresentativeMention

For dependency parsing

  • dep
  • governor
  • governorGloss
  • dependent
  • dependentGloss

For Open Information Extraction

  • subject
  • subjectSpan
  • relation
  • relationSpan
  • object
  • objectSpan

If these properties are not in the NIF set, how can they be represented anyway? Maybe with the help of an ontology like OLiA?
Otherwise, if all that is too much work, could you recommend an approach for doing this on my own? For example, using Apache Jena to store everything as a model and then printing it out as JSON-LD via JSONLD-JAVA.
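
A minimal sketch of the JSON-LD idea, written in Python with only the standard library instead of Jena and JSONLD-JAVA. The document URI, the sample sentence and the helper name token_to_jsonld are made up for illustration; nif:anchorOf, nif:beginIndex, nif:endIndex, nif:lemma and nif:posTag are NIF 2.0 Core terms.

```python
import json

NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
DOC = "http://example.org/doc1"  # hypothetical document URI


def token_to_jsonld(text, begin, end, lemma, pos):
    """Describe one token as a JSON-LD node using NIF 2.0 Core terms."""
    return {
        "@context": {"nif": NIF},
        # Offset-based fragment, same scheme as <char=0,10> in NIF examples.
        "@id": f"{DOC}#char={begin},{end}",
        "@type": "nif:Word",
        "nif:anchorOf": text[begin:end],
        "nif:beginIndex": begin,
        "nif:endIndex": end,
        "nif:lemma": lemma,
        "nif:posTag": pos,
    }


sentence = "Peter took the company over last year."
node = token_to_jsonld(sentence, 6, 10, "take", "VBD")
print(json.dumps(node, indent=2))
```

This is a simplification: NIF types the indices as xsd:nonNegativeInteger, whereas a bare JSON number is read as xsd:integer, so a real converter would probably emit typed values.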

@kurzum
Member

kurzum commented Jan 13, 2017

Hi @Phauly1, did you check https://github.com/NLP2RDF/software/tree/master/java-maven/implementation/stanfordcorenlp ?
This code should actually work if you check out the repo and compile it.

In particular, most of the token properties have been in NIF since 1.1 (http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#):

  • index: missing, but we can add it as nif:tokIndex (xsd:nonNegativeInteger)
  • lemma: nif:lemma
  • pos: nif:posTag and nif:oliaLink
  • ner: use itsrdf:taClassRef from the ITS 2.0 standard, see https://www.w3.org/TR/its20/#conversion-to-nif
  • before, after: see nif:before and nif:after
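
The token mapping in the list above could be sketched roughly as follows. This is an illustration in plain Python, not code from the NIF-lib or the NLP2RDF wrapper. It assumes the token field names of CoreNLP's JSON output (characterOffsetBegin, characterOffsetEnd and so on), treats nif:tokIndex as the property proposed above rather than a published one, and uses a made-up NER_TO_CLASS lookup to turn NER labels into class URIs. Prefix declarations are left out.

```python
# Mapping suggested in the list above. nif:tokIndex is only proposed there
# and is not part of the published ontology.
FIELD_TO_NIF = {
    "index": "nif:tokIndex",
    "lemma": "nif:lemma",
    "pos": "nif:posTag",
    "before": "nif:before",
    "after": "nif:after",
}

# Hypothetical lookup from CoreNLP NER labels to class URIs.
NER_TO_CLASS = {"PERSON": "dbo:Person"}


def literal(value):
    """Render a Python value as a Turtle literal."""
    if isinstance(value, int):
        return f'"{value}"^^xsd:nonNegativeInteger'
    escaped = (
        str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"'


def token_to_turtle(doc_uri, token):
    """Return one Turtle statement block for a CoreNLP-style token dict."""
    begin, end = token["characterOffsetBegin"], token["characterOffsetEnd"]
    pairs = [
        ("a", "nif:Word"),
        ("nif:beginIndex", literal(begin)),
        ("nif:endIndex", literal(end)),
    ]
    for field, prop in FIELD_TO_NIF.items():
        if field in token:
            pairs.append((prop, literal(token[field])))
    # NER labels become class references, as in the ITS 2.0 to NIF conversion.
    ner_class = NER_TO_CLASS.get(token.get("ner"))
    if ner_class:
        pairs.append(("itsrdf:taClassRef", ner_class))
    body = " ;\n    ".join(f"{p} {o}" for p, o in pairs)
    return f"<{doc_uri}#char={begin},{end}>\n    {body} ."


token = {
    "index": 1, "word": "Peter", "lemma": "Peter",
    "characterOffsetBegin": 0, "characterOffsetEnd": 5,
    "pos": "NNP", "ner": "PERSON", "before": "", "after": " ",
}
ttl = token_to_turtle("http://example.org/doc1", token)
print(ttl)
```

Keeping the mapping in a dictionary means further properties can be added without touching the serialization code.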

Coreference is missing. Our approach was simply to assign UUIDs via itsrdf:taIdentRef or itsrdf:taTermRef and then link them together via owl:sameAs; you can then probably attach anything you want to them.
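
One possible reading of this coreference approach, as a rough Python sketch: every mention in a chain gets its own UUID through itsrdf:taIdentRef, and all UUIDs are tied to the first one with owl:sameAs. The function name coref_triples, the urn:uuid: form of the identifiers and the example document URI are assumptions made for illustration.

```python
import uuid


def coref_triples(doc_uri, mentions):
    """Link the mentions of one coreference chain.

    `mentions` is a list of (begin, end) character offsets. The result is a
    list of (subject, predicate, object) tuples.
    """
    triples = []
    ids = []
    for begin, end in mentions:
        ident = f"urn:uuid:{uuid.uuid4()}"
        ids.append(ident)
        triples.append(
            (f"{doc_uri}#char={begin},{end}", "itsrdf:taIdentRef", ident)
        )
    # Tie every later identifier to the first mention's identifier.
    for other in ids[1:]:
        triples.append((other, "owl:sameAs", ids[0]))
    return triples


text = "Peter took the company over last year. He was happy."
he = text.index("He")
for triple in coref_triples("http://example.org/doc1", [(0, 5), (he, he + 2)]):
    print(triple)
```

CoreNLP attributes such as gender, number or animacy could then be attached to the UUID resources, though which properties to use for them is left open in this thread.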

https://github.com/NLP2RDF/software/tree/master/java-maven/implementation/stanfordcorenlp should also contain dependency relations.

We are currently developing an Open Information Extraction format for NIF. The easy part, i.e. continuous subject, relation and object, is already covered; see page 8 of https://pdfs.semanticscholar.org/e2cb/04541b3d33ea6cad4a0fcd499a8c77aff2b0.pdf:

Provenance tracking with NIF:

<char=0,10>   itsrdf:taClassRef  dbo:Person ;
              itsrdf:taIdentRef  rlnr:Rolf_Heuer .
<char=14,18>  itsrdf:taIdentRef  dbpedia:CERN .
<char=11,24>  nif:anchorOf       ", director  of"^^xsd:string ;
              itsrdf:taPropRef   rlno:directorOf .

The problem was non-continuous strings, e.g. "Peter [took] the company [over] last year."
(This is not common in English, but very frequent in German.)
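
To make the problem concrete, here is a small Python sketch (the helper find_span is made up for illustration). It shows that a single char=begin,end fragment spanning both parts of the verb also swallows the object in between:

```python
text = "Peter took the company over last year."


def find_span(text, word):
    """Return the (begin, end) character offsets of the first match."""
    begin = text.index(word)
    return begin, begin + len(word)


took = find_span(text, "took")
over = find_span(text, "over")

# The smallest single span that contains both parts of the relation.
covering = (took[0], over[1])
print(took, over, repr(text[covering[0]:covering[1]]))
```

The relation "took ... over" therefore needs either two separate fragments that are linked somehow, or a representation that is not a single offset pair.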

@ghost
Author

ghost commented Jan 13, 2017

Thanks for your help, @kurzum ! That is actually very helpful. I will take a look at the CoreNLP implementation. But are these properties expected to be integrated into the NIF-lib?

@kurzum
Member

kurzum commented Jan 13, 2017 via email

@kurzum
Member

kurzum commented Jan 14, 2017

@Phauly1 Actually, we are preparing a challenge with Wikipedia article text, and a Stanford CoreNLP implementation could participate there. If you are interested, I could send you the details by email; mine is [email protected]

@ghost
Author

ghost commented Jan 23, 2017

Hi @kurzum , thanks for your invitation. Unfortunately, I am under time pressure right now, so I cannot participate in both. But I get your point about the lib; that is no problem. It just would have been nice to have a lib that helps to transform the output of an arbitrary NLP framework into NIF.
