Adding New Ontologies

Enrique Noriega edited this page Feb 16, 2022 · 1 revision

REACH makes it easy to add new ontologies to its NER component. Doing so takes two steps: creating the ontology file and registering it in the configuration file.

Creating the ontology file

Ontology files follow a simple TSV format, where the first column contains the text to be matched and the second column contains the id of the specific entity.

Multiple rows can have the same id value to account for different spellings and variations of the same entity. For example: mouse, mice and mus musculus share the same id: 1758, which originates from the Linnaeus knowledge base.

Ontology file excerpt

R.spheroides	1063
R.syncytial virus	12814
South African angora goat	9925
Southern cattle tick	6941
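
As a rough illustration of the format described above, the following Python sketch writes a few rows as tab-separated text and then groups the matched texts by id. The function names and the choice of sample rows are assumptions made for this example and are not part of REACH.

```python
import io
from collections import defaultdict

# Sample rows: (text to match, entity id). Values are taken from this page.
ROWS = [
    ("mouse", "1758"),
    ("mice", "1758"),
    ("mus musculus", "1758"),
    ("Southern cattle tick", "6941"),
]


def write_ontology(rows, handle):
    """Write (text, id) pairs as two tab-separated columns, one row per line."""
    for text, entity_id in rows:
        # A tab or newline inside a field would break the two-column layout.
        if any(ch in field for field in (text, entity_id) for ch in "\t\n"):
            raise ValueError(f"field contains a tab or newline: {text!r}, {entity_id!r}")
        if not text.strip() or not entity_id.strip():
            raise ValueError("both columns must be non-empty")
        handle.write(f"{text}\t{entity_id}\n")


def group_by_id(handle):
    """Read the two-column format back and collect every spelling per id."""
    groups = defaultdict(list)
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        text, entity_id = line.split("\t", 1)
        groups[entity_id].append(text)
    return dict(groups)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_ontology(ROWS, buffer)
    buffer.seek(0)
    for entity_id, texts in group_by_id(buffer).items():
        print(entity_id, texts)
```

Grouping by id is a quick way to check that all spelling variants of an entity ended up under the same id before the file is added to the configuration.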

Configuration File

The ontology configuration file lists the path and settings for each ontology file. It is located at bioresources/src/main/resources/application.conf.

The new ontology will be enabled automatically the next time REACH is run.

To configure an ontology file, add a new entry that contains the path and the metadata associated with that knowledge base. Refer to the example excerpt as a guide.

Configuration entry excerpt

The following excerpt shows the minimum set of fields needed to configure an ontology file.

StaticProteinFamilyOrComplex{
    path = ${KnowledgeBasesPath}/famplex.tsv
    namespace = fplx
    priority = 12
    labels = [Family]
}
  • path: location of the ontology file for this configuration entry
  • namespace: namespace of the ontology. Used to avoid clashes when different ontologies share the same ids
  • priority: specifies the order in which the different ontologies will be processed. If several ontologies share the same priority, they will be processed in their order of appearance
  • labels: array of named entity labels assigned to the matches. It can hold one or more values. These labels are used in the grammar to craft higher-level rules
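
The roles of namespace and priority can be sketched in a few lines of Python. This is an illustration only and not REACH's implementation. The entry names other than StaticProteinFamilyOrComplex, the "namespace:id" string format, and the sort direction are assumptions. This page does not say whether a lower or a higher priority value is processed first, so the direction is left as a parameter.

```python
# Hypothetical configuration entries, listed in their order of appearance.
ENTRIES = [
    {"name": "StaticProteinFamilyOrComplex", "namespace": "fplx", "priority": 12},
    {"name": "HypotheticalSpecies", "namespace": "taxonomy", "priority": 5},
    {"name": "HypotheticalFamilyExtra", "namespace": "extra", "priority": 12},
]


def processing_order(entries, ascending=True):
    """Order entries by priority; ties keep their order of appearance.

    Python's sort is stable, also with reverse=True, so entries that
    share a priority stay in the order in which they were listed.
    The ascending default is an assumption, not documented behavior.
    """
    return sorted(entries, key=lambda entry: entry["priority"], reverse=not ascending)


def qualified_id(namespace, entity_id):
    """Combine namespace and id so equal ids from different ontologies stay distinct."""
    return f"{namespace}:{entity_id}"


if __name__ == "__main__":
    for entry in processing_order(ENTRIES):
        print(entry["priority"], entry["name"])
    # The same raw id under two namespaces yields two different keys.
    print(qualified_id("fplx", "1758"), qualified_id("taxonomy", "1758"))
```

The two entries with priority 12 keep their relative order in either direction, which mirrors the tie-breaking rule described for the priority field.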