Design and Populate a Knowledge Graph storing the annotations of articles #1

Open
1 of 3 tasks
rtroncy opened this issue Sep 7, 2023 · 2 comments

Comments

@rtroncy
Member

rtroncy commented Sep 7, 2023

The demonstration at https://jde-predict.tools.eurecom.fr/ lets you submit any article from the JDE and visualize several annotations on it, namely:

  • a prediction of the Business Events relevant for the given article (among 11 possible classes); 4 different algorithms each provide predictions together with a score.
  • a list of named entities extracted in the given article; 3 NER tools are used (spaCy, Flair, and a pre-trained CamemBERT model) and the final results include majority voting and other post-processing.
  • a prediction of the general themes (among 10 possible classes)

The goal is to materialize these annotations in a KG. The tasks are:

  • Design a lightweight model to represent/identify the news article and these annotations. The Business Events could be represented as skos:Concept in a dedicated ConceptScheme. Similarly, the Themes could also be represented as skos:Concept in another ConceptScheme. Named Entity annotations could re-use the NIF ontology.
  • Implement a converter that transforms the current JSON format into RDF to populate this KG
  • Propose in a README a number of useful SPARQL queries for this KG
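As a starting point for the converter task, here is a minimal sketch in Python (standard library only). The input layout (keys `id`, `text`, `events`, `label`, `score`, `annotator`) is hypothetical, since the actual JSON format of the demo is not shown in this issue, and the function name is made up for illustration. The output uses the prefixed names `article:`, `event:`, `agent:`, `nif:`, `nif-ann:`, `itsrdf:` and `xsd:`, so it is meant to be appended after a matching `@prefix` header.

```python
import json


def turtle_literal(text):
    """Quote a Python string as a Turtle string literal.

    json.dumps emits the escapes \\" \\\\ \\n \\t etc., which Turtle also accepts.
    """
    return json.dumps(text, ensure_ascii=False)


def article_to_turtle(doc):
    """Serialise one article and its event predictions as Turtle.

    `doc` follows a HYPOTHETICAL layout:
    {"id": str, "text": str,
     "events": [{"label": str, "score": float, "annotator": str}, ...]}
    """
    text = doc["text"]
    parts = [
        f"article:{doc['id']} a nif:Context, nif:OffsetBasedString",
        f"  nif:isString {turtle_literal(text)}",
        '  nif:beginIndex "0"^^xsd:nonNegativeInteger',
        f'  nif:endIndex "{len(text)}"^^xsd:nonNegativeInteger',
    ]
    for pred in doc["events"]:
        label = pred["label"]
        annotator = pred["annotator"]
        # Fixed-point formatting: xsd:decimal does not allow exponents.
        score = f"{pred['score']:.4f}"
        parts.append(
            "  nif-ann:annotationUnit [\n"
            f"    itsrdf:taIdentRef event:{label} ;\n"
            f'    nif:confidence "{score}"^^xsd:decimal ;\n'
            f"    itsrdf:taAnnotatorRef agent:{annotator}\n"
            "  ]"
        )
    # Predicates of the same subject are separated by ';' and closed by '.'.
    return " ;\n".join(parts) + " .\n"


sample = {
    "id": "886313e1-3b8a-5372-9b90-0c9aee199e5d",
    "text": "David Gesbert a été nommé directeur d’Eurecom.",
    "events": [
        {"label": "manager-change", "score": 1.0, "annotator": "gpt4"},
    ],
}
print(article_to_turtle(sample))
```

The sketch assumes that labels and annotator names are already valid Turtle local names (slugs such as `manager-change`); anything else would need to be written as a full IRI in angle brackets.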
@rtroncy rtroncy added the enhancement New feature or request label Sep 7, 2023
@ehrhart
Contributor

ehrhart commented Sep 11, 2023

@rtroncy Here is a draft for the RDF representation:

Business Events controlled vocabulary

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix event: <http://business-predict.tools.eurecom.fr/vocabulary/business-event/> .

event: rdf:type skos:ConceptScheme ;
  rdfs:label "Business Events"@en ;
  rdfs:label "Evénements Business"@fr ;
  skos:hasTopConcept event:buyout-transfer ;
  skos:hasTopConcept event:fundraising ;
  skos:hasTopConcept event:new-site ;
  skos:hasTopConcept event:manager-change ;
  skos:hasTopConcept event:safeguard-procedure ;
  skos:hasTopConcept event:site-closure ;
  skos:hasTopConcept event:recruitment ;
  skos:hasTopConcept event:geographical-expansion ;
  skos:hasTopConcept event:investment ;
  skos:hasTopConcept event:new-activity-product ;
  skos:hasTopConcept event:acquisition-project .

event:buyout-transfer rdf:type skos:Concept ;
  skos:inScheme event: ;
  skos:prefLabel "Rachat / Cession"@fr ;
  skos:prefLabel "Buyout / Transfer"@en .

event:fundraising rdf:type skos:Concept ;
  skos:inScheme event: ;
  skos:prefLabel "Levée de fonds"@fr ;
  skos:prefLabel "Fundraising"@en .

event:new-site rdf:type skos:Concept ;
  skos:inScheme event: ;
  skos:prefLabel "Nouveau site"@fr ;
  skos:prefLabel "New site"@en .

event:manager-change rdf:type skos:Concept ;
  skos:inScheme event: ;
  skos:prefLabel "Changement de Dirigeant"@fr ;
  skos:prefLabel "Management change"@en .

event:safeguard-procedure rdf:type skos:Concept ;
  skos:inScheme event: ;
  skos:prefLabel "Procédure de sauvegarde"@fr ;
  skos:prefLabel "Safeguard procedure"@en .

event:site-closure rdf:type skos:Concept ;
  skos:inScheme event: ;
  skos:prefLabel "Fermeture de site"@fr ;
  skos:prefLabel "Site closure"@en .

event:recruitment rdf:type skos:Concept ;
  skos:inScheme event: ;
  skos:prefLabel "Création d'emploi / recrutement"@fr ;
  skos:prefLabel "Job creation / recruitment"@en .

event:geographical-expansion rdf:type skos:Concept ;
  skos:inScheme event: ;
  skos:prefLabel "Expansion géographique"@fr ;
  skos:prefLabel "Geographical expansion"@en .

event:investment rdf:type skos:Concept ;
  skos:inScheme event: ;
  skos:prefLabel "Investissement"@fr ;
  skos:prefLabel "Investment"@en .

event:new-activity-product rdf:type skos:Concept ;
  skos:inScheme event: ;
  skos:prefLabel "Nouvelle activité / produit"@fr ;
  skos:prefLabel "New activity / product"@en .

event:acquisition-project rdf:type skos:Concept ;
  skos:inScheme event: ;
  skos:prefLabel "Projet d'acquisition"@fr ;
  skos:prefLabel "Acquisition project"@en .
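
Since every concept follows the same pattern (type, `skos:inScheme`, one `skos:prefLabel` per language), the vocabulary could also be generated from a small table instead of being written by hand. The following is a rough Python sketch under that assumption; the function name `skos_scheme` and the shape of its arguments are made up for illustration, and the output expects the `skos:`, `rdfs:` and vocabulary prefixes to be declared separately.

```python
import json


def skos_scheme(prefix, scheme_labels, concepts):
    """Render a SKOS concept scheme and its concepts as Turtle.

    prefix        -- Turtle prefix name of the vocabulary, e.g. "event"
    scheme_labels -- {language tag: label of the scheme}
    concepts      -- {concept slug: {language tag: preferred label}}
    """
    def lit(text, lang):
        # json.dumps yields a quoted string whose escapes are valid in Turtle.
        return f"{json.dumps(text, ensure_ascii=False)}@{lang}"

    out = [f"{prefix}: a skos:ConceptScheme ;"]
    for lang, label in scheme_labels.items():
        out.append(f"  rdfs:label {lit(label, lang)} ;")
    # One hasTopConcept predicate with a comma-separated object list.
    tops = " ,\n    ".join(f"{prefix}:{slug}" for slug in concepts)
    out.append(f"  skos:hasTopConcept {tops} .")

    for slug, pref_labels in concepts.items():
        out.append("")
        out.append(f"{prefix}:{slug} a skos:Concept ;")
        body = [f"  skos:inScheme {prefix}:"]
        body += [
            f"  skos:prefLabel {lit(text, lang)}"
            for lang, text in pref_labels.items()
        ]
        out.append(" ;\n".join(body) + " .")
    return "\n".join(out) + "\n"


ttl = skos_scheme(
    "event",
    {"en": "Business Events"},
    {
        "buyout-transfer": {"fr": "Rachat / Cession", "en": "Buyout / Transfer"},
        "fundraising": {"fr": "Levée de fonds", "en": "Fundraising"},
    },
)
print(ttl)
```

The same helper would cover the Themes vocabulary by passing `"theme"` as the prefix; `skos:altLabel` values, as used for CSR, are not handled in this sketch.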

Themes controlled vocabulary

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix theme: <http://business-predict.tools.eurecom.fr/vocabulary/theme/> .

theme: rdf:type skos:ConceptScheme ;
  rdfs:label "List of Themes"@en ;
  rdfs:label "Liste de Thèmes"@fr ;
  skos:hasTopConcept theme:merger-acquisition ;
  skos:hasTopConcept theme:csr ;
  skos:hasTopConcept theme:human-resource ;
  skos:hasTopConcept theme:employment ;
  skos:hasTopConcept theme:international ;
  skos:hasTopConcept theme:portfolio ;
  skos:hasTopConcept theme:investment ;
  skos:hasTopConcept theme:project .

theme:merger-acquisition rdf:type skos:Concept ;
  skos:inScheme theme: ;
  skos:prefLabel "Fusion - Acquisition"@fr ;
  skos:prefLabel "Mergers - Acquisition"@en .

theme:csr rdf:type skos:Concept ;
  skos:inScheme theme: ;
  skos:prefLabel "RSE"@fr ;
  skos:prefLabel "CSR"@en ;
  skos:altLabel "Responsabilité Sociale des Entreprises"@fr ;
  skos:altLabel "Corporate Social Responsibility"@en .

theme:human-resource rdf:type skos:Concept ;
  skos:inScheme theme: ;
  skos:prefLabel "Ressource Humaine"@fr ;
  skos:prefLabel "Human Resource"@en .

theme:employment rdf:type skos:Concept ;
  skos:inScheme theme: ;
  skos:prefLabel "Emploi"@fr ;
  skos:prefLabel "Employment"@en .

theme:international rdf:type skos:Concept ;
  skos:inScheme theme: ;
  skos:prefLabel "International"@fr ;
  skos:prefLabel "International"@en .

theme:portfolio rdf:type skos:Concept ;
  skos:inScheme theme: ;
  skos:prefLabel "Carnet"@fr ;
  skos:prefLabel "Portfolio"@en .

theme:investment rdf:type skos:Concept ;
  skos:inScheme theme: ;
  skos:prefLabel "Investissement"@fr ;
  skos:prefLabel "Investment"@en .

theme:project rdf:type skos:Concept ;
  skos:inScheme theme: ;
  skos:prefLabel "Projet"@fr ;
  skos:prefLabel "Project"@en .

Example of annotated document

@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix nif-ann: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-annotation#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix : <http://business-predict.tools.eurecom.fr/> .
@prefix agent: <http://business-predict.tools.eurecom.fr/agent/> .
@prefix article: <http://business-predict.tools.eurecom.fr/article/> .
@prefix theme: <http://business-predict.tools.eurecom.fr/vocabulary/theme/> .
@prefix event: <http://business-predict.tools.eurecom.fr/vocabulary/business-event/> .

agent:gpt4 a prov:Agent ;
  foaf:name "GPT-4" .

agent:claude_v1 a prov:Agent ;
  foaf:name "Claude v1" .

agent:zeste_nli a prov:Agent ;
  foaf:name "ZeSTE NLI" .

agent:bert a prov:Agent ;
  foaf:name "BERT-Based" .

:articles a nif:ContextCollection ;
  nif:hasContext article:886313e1-3b8a-5372-9b90-0c9aee199e5d ;
  dcterms:conformsTo <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/2.1> .

article:886313e1-3b8a-5372-9b90-0c9aee199e5d a nif:Context, nif:OffsetBasedString ;
  nif:isString "David Gesbert a été nommé directeur d’Eurecom, en remplacement d’Ulrich Finger qui prend sa retraite après avoir occupé ce poste pendant plus de vingt ans. Basé à Sophia Antipolis, Eurecom comprend à la fois une école d’ingénieurs et un centre de recherche en sciences du numérique organisée sous forme de GIE (Groupement d’Intérêt Économique) réunissant des partenaires académiques et industriels (Orange, SAP, BMW group…) internationaux. Chercheur et professeur au Laboratoire Communications Mobiles d’Eurecom, David Gesbert est un des pionniers de la technologie MIMO, utilisée dans de nombreux systèmes de télécommunications sans fil. Il a contribué à l’essor des technologies wifi, 3G, 4G, 5G, et explore aujourd’hui ce que pourrait être la 6G de demain. Pour l’ensemble de ses travaux, il a reçu en novembre dernier le Grand prix IMT (Institut Mines Télécoms) - Académie des Sciences qui récompense des contributions scientifiques exceptionnelles au niveau européen."^^xsd:string ;
  nif:beginIndex "0"^^xsd:nonNegativeInteger ;
  nif:endIndex "972"^^xsd:nonNegativeInteger ;
  nif:sourceUrl <https://www.lejournaldesentreprises.com/region-sud/breve/david-gesbert-est-le-nouveau-directeur-deurecom-1763860> ;
  :hasTheme theme:portfolio ;
  nif-ann:annotationUnit [
    itsrdf:taIdentRef event:manager-change ;
    nif:confidence "1.0"^^xsd:decimal ;
    itsrdf:taAnnotatorRef agent:gpt4
  ] ;
  nif-ann:annotationUnit [
    itsrdf:taIdentRef event:recruitment ;
    nif:confidence "1.0"^^xsd:decimal ;
    itsrdf:taAnnotatorRef agent:gpt4
  ] ;
  nif-ann:annotationUnit [
    itsrdf:taIdentRef event:manager-change ;
    nif:confidence "1.0"^^xsd:decimal ;
    itsrdf:taAnnotatorRef agent:claude_v1
  ] ;
  nif-ann:annotationUnit [
    itsrdf:taIdentRef event:recruitment ;
    nif:confidence "1.0"^^xsd:decimal ;
    itsrdf:taAnnotatorRef agent:claude_v1
  ] ;
  nif-ann:annotationUnit [
    itsrdf:taIdentRef event:buyout-transfer ;
    nif:confidence "0.1752"^^xsd:decimal ;
    itsrdf:taAnnotatorRef agent:zeste_nli
  ] ;
  nif-ann:annotationUnit [
    itsrdf:taIdentRef event:acquisition-project ;
    nif:confidence "0.1467"^^xsd:decimal ;
    itsrdf:taAnnotatorRef agent:zeste_nli
  ] ;
  nif-ann:annotationUnit [
    itsrdf:taIdentRef event:site-closure ;
    nif:confidence "0.1379"^^xsd:decimal ;
    itsrdf:taAnnotatorRef agent:zeste_nli
  ] ;
  nif-ann:annotationUnit [
    itsrdf:taIdentRef event:manager-change ;
    nif:confidence "1.0"^^xsd:decimal ;
    itsrdf:taAnnotatorRef agent:bert
  ] .

<http://business-predict.tools.eurecom.fr/article/886313e1-3b8a-5372-9b90-0c9aee199e5d#offset_0_13> a nif:Phrase, nif:OffsetBasedString ;
  nif:anchorOf "David Gesbert"^^xsd:string ;
  nif:beginIndex "0"^^xsd:nonNegativeInteger ;
  nif:endIndex "13"^^xsd:nonNegativeInteger ;
  nif:referenceContext article:886313e1-3b8a-5372-9b90-0c9aee199e5d ;
  itsrdf:taClassRef dbo:Person .

<http://business-predict.tools.eurecom.fr/article/886313e1-3b8a-5372-9b90-0c9aee199e5d#offset_38_45> a nif:Phrase, nif:OffsetBasedString ;
  nif:anchorOf "Eurecom"^^xsd:string ;
  nif:beginIndex "38"^^xsd:nonNegativeInteger ;
  nif:endIndex "45"^^xsd:nonNegativeInteger ;
  nif:referenceContext article:886313e1-3b8a-5372-9b90-0c9aee199e5d ;
  itsrdf:taClassRef dbo:Organisation .

<http://business-predict.tools.eurecom.fr/article/886313e1-3b8a-5372-9b90-0c9aee199e5d#offset_65_78> a nif:Phrase, nif:OffsetBasedString ;
  nif:anchorOf "Ulrich Finger"^^xsd:string ;
  nif:beginIndex "65"^^xsd:nonNegativeInteger ;
  nif:endIndex "78"^^xsd:nonNegativeInteger ;
  nif:referenceContext article:886313e1-3b8a-5372-9b90-0c9aee199e5d ;
  itsrdf:taClassRef dbo:Person .

<http://business-predict.tools.eurecom.fr/article/886313e1-3b8a-5372-9b90-0c9aee199e5d#offset_163_179> a nif:Phrase, nif:OffsetBasedString ;
  nif:anchorOf "Sophia Antipolis"^^xsd:string ;
  nif:beginIndex "163"^^xsd:nonNegativeInteger ;
  nif:endIndex "179"^^xsd:nonNegativeInteger ;
  nif:referenceContext article:886313e1-3b8a-5372-9b90-0c9aee199e5d ;
  itsrdf:taClassRef dbo:Place .

<http://business-predict.tools.eurecom.fr/article/886313e1-3b8a-5372-9b90-0c9aee199e5d#offset_181_188> a nif:Phrase, nif:OffsetBasedString ;
  nif:anchorOf "Eurecom"^^xsd:string ;
  nif:beginIndex "181"^^xsd:nonNegativeInteger ;
  nif:endIndex "188"^^xsd:nonNegativeInteger ;
  nif:referenceContext article:886313e1-3b8a-5372-9b90-0c9aee199e5d ;
  itsrdf:taClassRef dbo:Organisation .

<http://business-predict.tools.eurecom.fr/article/886313e1-3b8a-5372-9b90-0c9aee199e5d#offset_306_309> a nif:Phrase, nif:OffsetBasedString ;
  nif:anchorOf "GIE"^^xsd:string ;
  nif:beginIndex "306"^^xsd:nonNegativeInteger ;
  nif:endIndex "309"^^xsd:nonNegativeInteger ;
  nif:referenceContext article:886313e1-3b8a-5372-9b90-0c9aee199e5d ;
  itsrdf:taClassRef dbo:Organisation .

<http://business-predict.tools.eurecom.fr/article/886313e1-3b8a-5372-9b90-0c9aee199e5d#offset_311_342> a nif:Phrase, nif:OffsetBasedString ;
  nif:anchorOf "Groupement d’Intérêt Économique"^^xsd:string ;
  nif:beginIndex "311"^^xsd:nonNegativeInteger ;
  nif:endIndex "342"^^xsd:nonNegativeInteger ;
  nif:referenceContext article:886313e1-3b8a-5372-9b90-0c9aee199e5d ;
  itsrdf:taClassRef dbo:Organisation .

<http://business-predict.tools.eurecom.fr/article/886313e1-3b8a-5372-9b90-0c9aee199e5d#offset_399_405> a nif:Phrase, nif:OffsetBasedString ;
  nif:anchorOf "Orange"^^xsd:string ;
  nif:beginIndex "399"^^xsd:nonNegativeInteger ;
  nif:endIndex "405"^^xsd:nonNegativeInteger ;
  nif:referenceContext article:886313e1-3b8a-5372-9b90-0c9aee199e5d ;
  itsrdf:taClassRef dbo:Organisation .

<http://business-predict.tools.eurecom.fr/article/886313e1-3b8a-5372-9b90-0c9aee199e5d#offset_407_410> a nif:Phrase, nif:OffsetBasedString ;
  nif:anchorOf "SAP"^^xsd:string ;
  nif:beginIndex "407"^^xsd:nonNegativeInteger ;
  nif:endIndex "410"^^xsd:nonNegativeInteger ;
  nif:referenceContext article:886313e1-3b8a-5372-9b90-0c9aee199e5d ;
  itsrdf:taClassRef dbo:Organisation .

<http://business-predict.tools.eurecom.fr/article/886313e1-3b8a-5372-9b90-0c9aee199e5d#offset_412_421> a nif:Phrase, nif:OffsetBasedString ;
  nif:anchorOf "BMW group"^^xsd:string ;
  nif:beginIndex "412"^^xsd:nonNegativeInteger ;
  nif:endIndex "421"^^xsd:nonNegativeInteger ;
  nif:referenceContext article:886313e1-3b8a-5372-9b90-0c9aee199e5d ;
  itsrdf:taClassRef dbo:Organisation .

<http://business-predict.tools.eurecom.fr/article/886313e1-3b8a-5372-9b90-0c9aee199e5d#offset_467_501> a nif:Phrase, nif:OffsetBasedString ;
  nif:anchorOf "Laboratoire Communications Mobiles"^^xsd:string ;
  nif:beginIndex "467"^^xsd:nonNegativeInteger ;
  nif:endIndex "501"^^xsd:nonNegativeInteger ;
  nif:referenceContext article:886313e1-3b8a-5372-9b90-0c9aee199e5d ;
  itsrdf:taClassRef dbo:Organisation .

<http://business-predict.tools.eurecom.fr/article/886313e1-3b8a-5372-9b90-0c9aee199e5d#offset_504_511> a nif:Phrase, nif:OffsetBasedString ;
  nif:anchorOf "Eurecom"^^xsd:string ;
  nif:beginIndex "504"^^xsd:nonNegativeInteger ;
  nif:endIndex "511"^^xsd:nonNegativeInteger ;
  nif:referenceContext article:886313e1-3b8a-5372-9b90-0c9aee199e5d ;
  itsrdf:taClassRef dbo:Organisation .

<http://business-predict.tools.eurecom.fr/article/886313e1-3b8a-5372-9b90-0c9aee199e5d#offset_841_864> a nif:Phrase, nif:OffsetBasedString ;
  nif:anchorOf "Institut Mines Télécoms"^^xsd:string ;
  nif:beginIndex "841"^^xsd:nonNegativeInteger ;
  nif:endIndex "864"^^xsd:nonNegativeInteger ;
  nif:referenceContext article:886313e1-3b8a-5372-9b90-0c9aee199e5d ;
  itsrdf:taClassRef dbo:Organisation .

<http://business-predict.tools.eurecom.fr/article/886313e1-3b8a-5372-9b90-0c9aee199e5d#offset_868_889> a nif:Phrase, nif:OffsetBasedString ;
  nif:anchorOf "Académie des Sciences"^^xsd:string ;
  nif:beginIndex "868"^^xsd:nonNegativeInteger ;
  nif:endIndex "889"^^xsd:nonNegativeInteger ;
  nif:referenceContext article:886313e1-3b8a-5372-9b90-0c9aee199e5d ;
  itsrdf:taClassRef dbo:Organisation .
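
Each phrase repeats its offsets in three places (the IRI fragment, `nif:beginIndex` and `nif:endIndex`) and must agree with `nif:anchorOf`, so a small consistency check in the converter may be worthwhile. Below is a minimal Python sketch; the helper names are hypothetical, and it assumes that offsets count characters the way Python `str` indexing does (Unicode code points), with the end index exclusive as in the example above.

```python
ARTICLE = (
    "http://business-predict.tools.eurecom.fr/article/"
    "886313e1-3b8a-5372-9b90-0c9aee199e5d"
)


def phrase_iri(article_iri, begin, end):
    """Mint the offset-based IRI of a phrase, following the pattern used above."""
    return f"{article_iri}#offset_{begin}_{end}"


def check_span(text, begin, end, anchor):
    """Return True when the characters text[begin:end] are exactly the anchor."""
    return text[begin:end] == anchor


# Opening words of the example article, with the typographic apostrophe (U+2019).
text = (
    "David Gesbert a été nommé directeur d’Eurecom, "
    "en remplacement d’Ulrich Finger"
)
spans = [
    (0, 13, "David Gesbert"),
    (38, 45, "Eurecom"),
    (65, 78, "Ulrich Finger"),
]

for begin, end, anchor in spans:
    status = "ok" if check_span(text, begin, end, anchor) else "MISMATCH"
    print(phrase_iri(ARTICLE, begin, end), status)
```

Because `#` starts a comment in Turtle, an IRI built this way has to be written in angle brackets (or with the `#` escaped) rather than as a plain prefixed name.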

@rtroncy
Member Author

rtroncy commented Sep 14, 2023

Thanks @ehrhart for this modeling!
I have edited your message above directly, correcting the mistakes I saw. I mostly switched names to the singular to follow a REST pattern, and I changed the English labels of the classes / themes to match the ones in the paper.
