Lukas Mueller edited this page Mar 12, 2024 · 10 revisions

Loading Ontologies

Ontologies describe the traits that can be characterized (measured, estimated, etc.) in a given crop. If an appropriate ontology does not exist for the crop in question, it has to be created. This can be a long and difficult task, because consensus can be hard to reach, especially in larger projects.

Recommended tools for creating and editing ontologies are DAGEdit or Protege.

If a suitable ontology is available, it can be loaded into Breedbase. Currently, the initial load is supported only from the backend. The ontology needs to be available in .obo format, which both DAGEdit and Protege can produce.

The ontology consists of separate types of terms: traits, methods, scales, and variables. In Breedbase, often only the traits and variables are loaded. A trait is an abstract description of the character under consideration, whereas a variable is a combination of a trait, a method, and a scale. Variables are the only entities that can have associated measurements.

When the -u option is not used, the cv name and the db name of the ontology are read from the command line (-n and -s options, respectively) and inserted into the database.

The dbname is often of the form CO_NNN for CropOntology ontologies. The dbname is used as a prefix for the numeric code of the term, such as GO:0001234 or CO_332:0063636.
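
The prefix convention can be illustrated with a minimal Python sketch that splits a term ID into its dbname and numeric code. The function name split_term_id is hypothetical and is not part of Breedbase or Chado; it only assumes that the two parts are separated by the first colon, as in the examples above.

```python
from typing import Tuple


def split_term_id(term_id: str) -> Tuple[str, str]:
    """Split 'DBNAME:CODE' at the first colon into (dbname, code)."""
    dbname, sep, code = term_id.partition(":")
    # Reject strings without a colon or with an empty part on either side.
    if not sep or not dbname or not code:
        raise ValueError(f"not a prefixed term ID: {term_id!r}")
    return dbname, code


# The two term IDs used as examples in the text.
print(split_term_id("CO_332:0063636"))
print(split_term_id("GO:0001234"))
```

The code is returned as a string so that leading zeros are preserved.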

The cvname has to match the cv name in the obo file.
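
Before loading, it may help to check which names the .obo file declares. The Python sketch below collects the values of the default-namespace header and of namespace tags. It is an assumption that these are the fields the loader compares against the -n value; the function name obo_namespaces and the sample content are made up for illustration.

```python
from typing import Iterable, Set


def obo_namespaces(lines: Iterable[str]) -> Set[str]:
    """Collect values of 'default-namespace:' and 'namespace:' tags."""
    names = set()
    for line in lines:
        line = line.strip()
        for tag in ("default-namespace:", "namespace:"):
            if line.startswith(tag):
                value = line[len(tag):].strip()
                if value:
                    names.add(value)
                break
    return names


# Made-up sample content; with a real file, pass an open file handle instead,
# e.g. obo_namespaces(open("file.obo")).
sample = """format-version: 1.2
default-namespace: cvname

[Term]
id: CO_NNN:0000001
name: example term
namespace: cvname
"""

print(obo_namespaces(sample.splitlines()))
```

If the set printed for the real file does not contain the value passed with -n, the mismatch is worth resolving before running the loader.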

To load the ontology, use the script gmod_load_cvterms.pl in the Chado repo at Chado/chado/bin/gmod_load_cvterms.pl:

perl gmod_load_cvterms.pl -s CO_NNN -n cvname -u -v -H breedbase_db -D breedbase -p password -r postgres -d Pg file.obo

The loaded ontology has to be indexed for certain features, such as the ontology browser, to work correctly:

perl gmod_make_cvtermpath.pl -c cvname -v -D breedbase -H breedbase_db -u postgres -p password

The variables can have associated limits, which will be exported to the Fieldbook app. There are two main types of variables, qualitative and numeric. Numeric variables can have lower and upper limits, whereas qualitative variables can specify category names.

This metadata can be specified in an Excel (.xls) file with the following columns:

trait_name trait_format trait_default_value trait_minimum trait_maximum trait_categories trait_details
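
A header row can be checked against this column list before the file is loaded. The following Python sketch is only an illustration: the name check_header is hypothetical, and whether load_trait_props.pl requires exactly these names, or this order, is not confirmed here. Reading the .xls file itself is left out, since it would need a third-party library.

```python
# Column names as listed in the text above.
EXPECTED_COLUMNS = [
    "trait_name",
    "trait_format",
    "trait_default_value",
    "trait_minimum",
    "trait_maximum",
    "trait_categories",
    "trait_details",
]


def check_header(header):
    """Return (missing, unexpected) column names for a header row."""
    cleaned = [str(cell).strip() for cell in header]
    missing = [c for c in EXPECTED_COLUMNS if c not in cleaned]
    unexpected = [c for c in cleaned if c not in EXPECTED_COLUMNS]
    return missing, unexpected


# A deliberately incomplete header with one unknown column.
missing, unexpected = check_header(["trait_name", "trait_format", "notes"])
print("missing:", missing)
print("unexpected:", unexpected)
```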

The loading script is in the sgn repository, under bin/ :

perl load_trait_props.pl -H breedbase_db -D breedbase -I inputfile.xls 

Displaying Ontologies on the website

Not all ontologies in the system are displayed in the ontology browser on the website by default.

The ontologies that are supposed to be displayed have to be configured in the sgn_local.conf file, using the onto_root_namespaces parameter, for example:

onto_root_namespaces  GO (Gene Ontology), PO (Plant Ontology), SO (Sequence Ontology), COMP (Composed Variables)
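
The value appears to be a comma-separated list of entries, each a prefix followed by a display name in parentheses. The Python sketch below parses a value of that shape. It reflects a reading of the example above, not the parsing code used by the website, and the function name is hypothetical. It also assumes that display names contain no commas.

```python
import re
from typing import List, Tuple

# One entry looks like "GO (Gene Ontology)": a prefix, then a name in parentheses.
ENTRY = re.compile(r"^\s*(\S+)\s*\((.+)\)\s*$")


def parse_onto_root_namespaces(value: str) -> List[Tuple[str, str]]:
    """Turn 'GO (Gene Ontology), PO (Plant Ontology)' into (prefix, name) pairs."""
    pairs = []
    for entry in value.split(","):
        match = ENTRY.match(entry)
        if match is None:
            raise ValueError(f"unexpected entry: {entry!r}")
        pairs.append((match.group(1), match.group(2).strip()))
    return pairs


value = ("GO (Gene Ontology), PO (Plant Ontology), "
         "SO (Sequence Ontology), COMP (Composed Variables)")
for prefix, name in parse_onto_root_namespaces(value):
    print(prefix, "->", name)
```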

Post-composing vocabularies

To create combinations of terms between two parent ontologies, such as combinations of trait variables with time terms, post-composed terms can be defined.

Both parent ontologies have to be loaded into the database.

The post-composed variables can currently be created on the post-composing page at the URL /tools/compose.

To specify which terms can be post-composed, the cvprop table in the database has to be populated correctly. For example, in the fixture database, this table contains:

cxgn_fixture=# select * from cvprop;
 cvprop_id | cv_id | type_id | value | rank 
-----------+-------+---------+-------+------
         1 |    58 |   77542 |       |    0
         2 |    64 |   77541 |       |    0
         3 |    61 |   77545 |       |    0
         4 |    59 |   77543 |       |    0
         5 |    16 |   77540 |       |    0
         6 |    62 |   77546 |       |    0
(6 rows)

A cv_id of 16 is the cassava_trait ontology, and a cv_id of 62 is the cxgn_time_ontology. The type_ids refer to entries in the composable_cvtypes ontology and specify the type of each ontology, which determines how the ontologies can be combined. For example, 77540 is the trait_ontology type, whereas 77546 is the time_ontology type:

cxgn_fixture=# select cvterm_id, cv_id, name from cvterm where cv_id=63;
 cvterm_id | cv_id |          name           
-----------+-------+-------------------------
     77540 |    63 | trait_ontology
     77541 |    63 | composed_trait_ontology
     77542 |    63 | object_ontology
     77543 |    63 | attribute_ontology
     77544 |    63 | method_ontology
     77545 |    63 | unit_ontology
     77546 |    63 | time_ontology
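
The relationship between the two listings can be made explicit with a join. The Python sketch below rebuilds a heavily simplified copy of the cv, cvterm, and cvprop tables in an in-memory SQLite database, using only the rows whose names are given above, and resolves each cvprop row to an ontology name and a composable type. The simplified table definitions are an assumption made for illustration; the real Chado tables have more columns and constraints. The same SELECT should carry over to the PostgreSQL database, but that has not been verified here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-ins for the Chado tables (illustration only).
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT);
CREATE TABLE cvprop (cvprop_id INTEGER PRIMARY KEY, cv_id INTEGER,
                     type_id INTEGER, value TEXT, rank INTEGER);

-- Only the rows whose names are stated in the text.
INSERT INTO cv VALUES (16, 'cassava_trait'), (62, 'cxgn_time_ontology'),
                      (63, 'composable_cvtypes');
INSERT INTO cvterm VALUES (77540, 63, 'trait_ontology'),
                          (77546, 63, 'time_ontology');
INSERT INTO cvprop VALUES (5, 16, 77540, NULL, 0), (6, 62, 77546, NULL, 0);
""")

# Resolve cv_id to the ontology name and type_id to the composable type name.
QUERY = """
SELECT cv.name, cvterm.name
FROM cvprop
JOIN cv ON cv.cv_id = cvprop.cv_id
JOIN cvterm ON cvterm.cvterm_id = cvprop.type_id
ORDER BY cvprop.cvprop_id
"""

rows = conn.execute(QUERY).fetchall()
for ontology, composable_type in rows:
    print(f"{ontology}: {composable_type}")
```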
