Plateforme de Connaissances Unifiées (PCU) project (i.e., Unified Knowledge Platform).
A semantic platform for extracting value from data. Open source, configurable, and written in Python 3.
The platform is composed of several components:
- pcu_io: Parse a file to get its textual content.
- pcu_pdf: Parse PDF files (and, more generally, file formats supported by Apache Tika).
- pcu_json: Parse JSON files.
- pcu_language: Detect the main language, or all the languages, used within a text. Based on langdetect (see the sketch after this list).
- pcu_nlp: Get syntactic annotations of a text. Based on spacy.io.
- pcu_keyphrase: Get keyphrases of a text. Based on kleis.
- pcu_relation: Get semantic relationships existing between the keyphrases of a text. Based on Kata Gábor's algorithm.
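For illustration, here is a minimal sketch of how the third-party libraries named above (langdetect and spaCy) can be used directly. How pcu_language and pcu_nlp wrap them internally is an assumption, and the spaCy model name is only an example.

# Minimal sketch using langdetect and spaCy directly.
# How pcu_language / pcu_nlp call them internally is an assumption;
# the model name "en_core_web_sm" is only an example.
from langdetect import detect
import spacy

text = "The platform extracts keyphrases and semantic relations from documents."

language = detect(text)              # main language of the text, e.g. 'en'

nlp = spacy.load("en_core_web_sm")   # load a spaCy model for that language
doc = nlp(text)
for token in doc:                    # syntactic annotations: POS tags and dependencies
    print(token.text, token.pos_, token.dep_)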
To install the requirements, run the Makefile with the following command:
make install
The semantic platform is entirely configurable. To use it, download the sources, go to the pcu/ directory, and tune the configuration file as needed:
[pipeline]
language=
; default language: if empty, the language will be detected automatically
nlp=spacy
; name of the NLP pipeline to use
keyphrase=yes
; yes if keyphrase extraction is enabled, no otherwise
- language: default language (en for English, fr for French); if empty, the language will be detected automatically
- nlp: name of the NLP pipeline to use (spacy)
- keyphrase: yes if the keyphrase extraction algorithm is enabled, no otherwise (see the sketch after this list for how these values can be read)
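As an illustration, the [pipeline] section can be read with Python's standard configparser module. This is only a sketch: the configuration file name "pcu.conf" is an assumption, and the actual file name may differ.

# Sketch of reading the [pipeline] section with configparser.
# The file name "pcu.conf" is an assumption; adjust it to the actual configuration file.
import configparser

config = configparser.ConfigParser()
config.read("pcu.conf")

pipeline = config["pipeline"]
language = pipeline.get("language", "")                        # empty -> detect automatically
nlp_name = pipeline.get("nlp", "spacy")                        # name of the NLP pipeline to use
keyphrase_enabled = pipeline.get("keyphrase", "yes") == "yes"  # keyphrase extraction on/off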
To execute the workflow on your data, use the following command line:
python3 core.py path/to/data/to/process
Some Windows users might encounter linking problems when installing spaCy; if so, run make install as an administrator.