Adding New Ontologies
REACH makes it easy to add new ontologies to the NER component. Two steps are necessary: create the ontology file, then register it in the ontology configuration file.
Ontology files follow a simple TSV format, where the first column contains the text to be matched and the second column contains the id for the specific entity. Multiple rows can have the same id value to account for different spellings and variations of the same entity. For example, mouse, mice and mus musculus share the same id, 1758, which originates from the Linnaeus knowledge base.
R.spheroides 1063
R.syncytial virus 12814
South African angora goat 9925
Southern cattle tick 6941
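As a rough illustration of the format, the following Python sketch reads a two-column, tab-separated ontology and groups the text variants by id. It is not part of REACH; the function name load_ontology and the in-memory sample are assumptions made for this example, and it assumes the columns are separated by a single tab character.

```python
import csv
import io
from collections import defaultdict


def load_ontology(tsv_file):
    """Group text variants by entity id from a two-column TSV stream.

    Hypothetical helper for illustration only; REACH has its own loader.
    """
    variants = defaultdict(list)
    # QUOTE_NONE keeps quote characters in entity names as literal text.
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed rows
        text, entity_id = row[0].strip(), row[1].strip()
        variants[entity_id].append(text)
    return dict(variants)


# In-memory sample using the rows discussed in the text above.
sample = io.StringIO(
    "mouse\t1758\n"
    "mice\t1758\n"
    "mus musculus\t1758\n"
    "Southern cattle tick\t6941\n"
)
ontology = load_ontology(sample)
```

A check like this can help confirm that every row of a new ontology file has both columns before REACH is pointed at it.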
The ontology configuration file enumerates the path and settings of each ontology file. It is located at bioresources/src/main/resources/application.conf.
The new ontology will be enabled automatically the next time REACH is run.
To configure an ontology file, add a new entry that contains the path and the metadata associated with that knowledge base. The following excerpt shows the minimum fields necessary to configure an ontology file.
StaticProteinFamilyOrComplex {
  path = ${KnowledgeBasesPath}/famplex.tsv
  namespace = fplx
  priority = 12
  labels = [Family]
}
- path: path to the ontology file for this configuration item.
- namespace: namespace of the ontology, used to avoid clashes between ids that are shared by different ontologies.
- priority: specifies the order in which the different ontologies are processed. If several ontologies share the same priority, they are processed in their order of appearance.
- labels: array of named entity labels assigned to the matches. It can have one or more values. These labels are used in the grammar to craft higher-level rules.
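The interplay of priority and namespace can be sketched in Python as follows. This is only an illustration under stated assumptions, not REACH's implementation: the OntologyEntry class, the two entries prefixed with Hypothetical, and the namespace:id key format are invented for the example, and it assumes that lower priority values are processed first, which the text above does not specify.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OntologyEntry:
    """Hypothetical in-memory view of one configuration entry."""
    name: str
    path: str
    namespace: str
    priority: int
    labels: List[str]


def processing_order(entries):
    # sorted() is stable, so entries with the same priority keep
    # their order of appearance. Ascending order is an assumption.
    return sorted(entries, key=lambda entry: entry.priority)


def qualified_id(entry, entity_id):
    # Prefixing with the namespace keeps identical ids from
    # different ontologies apart (key format is an assumption).
    return f"{entry.namespace}:{entity_id}"


entries = [
    OntologyEntry("StaticProteinFamilyOrComplex",
                  "${KnowledgeBasesPath}/famplex.tsv", "fplx", 12, ["Family"]),
    OntologyEntry("HypotheticalSpecies", "species.tsv", "hyp_a", 5, ["Species"]),
    OntologyEntry("HypotheticalOther", "other.tsv", "hyp_b", 12, ["Family"]),
]

ordered_names = [entry.name for entry in processing_order(entries)]
key_a = qualified_id(entries[1], "1758")
key_b = qualified_id(entries[2], "1758")
```

Here the two entries with priority 12 stay in the order in which they were declared, and the same raw id yields distinct keys under different namespaces.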