edamverify is a utility suite for verification of the EDAM ontology. It implements a set of quality control (QC) checks based upon:
- Guidelines from the Developers Guide for
- Rules of thumb for EDAM development from the Editors Guide
edamverify implements all checks previously implemented in edamxpathvalidator.
edamverify is invoked whenever the development copy of EDAM (EDAM_dev.owl) is changed, using the EDAM Travis CI system.
NB Current status: edamverify is fully specified - implementation is ongoing.
EDAM QC consists of:
- invocation of the report utility from the ROBOT ontology verification suite. This runs a series of basic quality control SPARQL queries that detect issues such as duplicated labels or synonyms, missing ontology metadata, and references to deprecated concepts.
- invocation of edamverify which runs a series of SPARQL and SHACL queries, defined in the queries/ folder (see below), which are tailored specifically to EDAM. The SPARQL queries are invoked using the ROBOT verify utility. The SHACL queries are invoked directly.
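The two invocations above can be sketched in Python as follows. This is a minimal illustration, not the actual edamverify code: the helper names and the `reports/robot_report.tsv` output path are assumptions, and only the `--input`, `--output`, `--queries` and `--output-dir` options of ROBOT are used. The sketch only builds and prints the command lines; passing each list to `subprocess.run` would execute it, provided `robot` is on the PATH.

```python
import shlex


def robot_report_cmd(ontology, output):
    """Build the command line for the ROBOT report utility (general QC queries)."""
    return ["robot", "report", "--input", ontology, "--output", output]


def robot_verify_cmds(ontology, queries, output_dir):
    """Build one ROBOT verify command line per EDAM-specific SPARQL query file."""
    return [
        ["robot", "verify", "--input", ontology,
         "--queries", query, "--output-dir", output_dir]
        for query in queries
    ]


if __name__ == "__main__":
    # Query file names are taken from the table of tests; paths are illustrative.
    commands = [robot_report_cmd("EDAM_dev.owl", "reports/robot_report.tsv")]
    commands += robot_verify_cmds(
        "EDAM_dev.owl",
        ["queries/singletonLeaf.sparql", "queries/maxDepthExceeded.sparql"],
        "reports",
    )
    for cmd in commands:
        # Print instead of executing, so the sketch runs without ROBOT installed.
        print(shlex.join(cmd))
```

Running one verify command per query file keeps a failure attributable to a single query, at the cost of reloading the ontology each time.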
Each query has a logging level (based on ROBOT report) which defines the severity of the issue:
- ERROR: Must be fixed before releasing EDAM. These issues will cause problems for users, such as classes with multiple labels.
- WARN: Should be fixed as soon as possible. These will not cause problems for all users, but may not be what they expect. For example, a class that is inferred to be equivalent to another named class.
- INFO: Should be fixed if possible. These are for consistency and cleanliness, such as definitions that do not start with an uppercase character.
- NOERR: No error found.
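One way to work with these levels in code is to give them an explicit order, so that the results of many queries can be reduced to a single overall status. The sketch below is an assumption about how this could be done, not a description of edamverify itself: the numeric ranks and the function names are hypothetical, and only the four level names come from the list above.

```python
from enum import IntEnum


class Level(IntEnum):
    """Logging levels in increasing order of severity (ranks are an assumption)."""
    NOERR = 0
    INFO = 1
    WARN = 2
    ERROR = 3


def worst_level(statuses):
    """Return the most severe level among the status strings of several queries."""
    return max((Level[s] for s in statuses), default=Level.NOERR)


def blocks_release(statuses):
    """ERROR issues must be fixed before releasing EDAM, so any ERROR blocks a release."""
    return worst_level(statuses) == Level.ERROR


if __name__ == "__main__":
    statuses = ["NOERR", "WARN", "INFO"]
    print(worst_level(statuses).name, blocks_release(statuses))
```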
The problem detected by each query, and its remedy, are documented in the docs folder.
The QC check results are written to the last cell of the Jupyter notebook (as required by the script which invokes and parses these notebooks in Travis CI) in a consistent JSON format, for example:
{
  "test_name": "fileExtensionBadCharacter",
  "reason": [
    "Bad characters found in <file_extension> property of these concepts:",
    "http://edamontology.org/format_3556 (MHTML): mhtml|mht|eml",
    "http://edamontology.org/format_3682 (imzML metadata file): imzML",
    "http://edamontology.org/format_3789 (XQuery): xq|xqy|xquery",
    "http://edamontology.org/format_3475 (TSV): tsv|tab",
    "http://edamontology.org/format_3750 (YAML): yaml|yml"
  ],
  "status": "WARN"
}
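A notebook could build and check a result in this format with a few lines of Python. The following is a hedged sketch: the function names are hypothetical, and the validation rules (three required keys, status restricted to the four logging levels) are inferred from the example above rather than taken from the script that parses the notebooks.

```python
import json

ALLOWED_STATUSES = {"ERROR", "WARN", "INFO", "NOERR"}
REQUIRED_KEYS = {"test_name", "reason", "status"}


def make_result(test_name, status, reason):
    """Build a result dictionary with the three keys shown in the example above."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    return {"test_name": test_name, "reason": list(reason), "status": status}


def parse_result(text):
    """Parse a JSON result string and check that it has the expected shape."""
    result = json.loads(text)
    missing = REQUIRED_KEYS - result.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if result["status"] not in ALLOWED_STATUSES:
        raise ValueError(f"unknown status: {result['status']!r}")
    return result


if __name__ == "__main__":
    result = make_result(
        "fileExtensionBadCharacter",
        "WARN",
        ["Bad characters found in <file_extension> property of these concepts:",
         "http://edamontology.org/format_3475 (TSV): tsv|tab"],
    )
    # In a notebook, printing this in the last cell would expose it to a parser.
    print(json.dumps(result, indent=2))
```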
Test | Level | Docs | Issue | Solution [1] | File | Status |
---|---|---|---|---|---|---|
Omission of properties required for deprecated concepts | INFO - ERROR | docs | 3 | IPYNB | annotationDeprecationOmission.ipynb | DONE |
Misuse of properties intended for deprecated concepts only | ERROR | docs | 2 | IPYNB | annotationDeprecationMisuse.ipynb | DONE |
Ontology max depth exceeded | WARN | docs | 6 | SPARQL | maxDepthExceeded.sparql | todo |
Singleton leaf node | WARN | docs | 7 | SPARQL | singletonLeaf.sparql | todo |
Subset misuse | ERROR | docs | 14, 17, 25, 27, 28 | IPYNB | subsetMisuse.ipynb | DONE |
Disallowed synonym | ERROR | docs | 11 | IPYNB | disallowedSynonym.ipynb | DONE |
Placeholder chain too long | ERROR | docs | 8 | SPARQL | placeholderChainTooLong.sparql | todo |
Unexpected multiple parents | WARN | docs | 9 | SPARQL | unexpectedMultipleParents.sparql | todo |
Possible spelling mistake | INFO | docs | 10 | SPARQL | spellingMistake.sparql | todo |
Bad EDAM URI reference | ERROR | docs | 12 | SPARQL | badEdamUriReference.sparql | todo |
Bad non-boolean value | WARN | docs | 13 | IPYNB | badNonBooleanValue.ipynb | DONE |
Mandatory property missing | ERROR | docs | 8 | IPYNB | mandatoryPropertyMissing.ipynb | DONE |
Format property missing | INFO - WARN | docs | 9, 11 | IPYNB | formatPropertyMissing.ipynb | DONE |
Identifier property missing | INFO | docs | 10 | IPYNB | identifierPropertyMissing.ipynb | DONE |
Wikipedia link missing | INFO | docs | 24 | IPYNB | wikipediaLinkMissing.ipynb | DONE |
Leaf concept is placeholder | WARN | docs | 12 | SPARQL | placeholderLeafConcept.sparql | todo |
isIdentifierOf redundancy | WARN | docs | 13 | SPARQL | isIdentifierOfRedundancy.sparql | todo |
Identifier relation missing | ERROR | docs | 14 | SPARQL | identifierRelationMissing.sparql | todo |
Format relation missing | ERROR | docs | 26 | SPARQL | formatRelationMissing.sparql | todo |
Redundant subclass relation | WARN | docs | 15 | SPARQL | redundantSubclassRelation.sparql | todo |
Deprecated concept with disallowed annotations or axioms | WARN | docs | 16 | IPYNB | disallowedDeprecatedContent.ipynb | DONE |
Concept ID numerical duplication | ERROR | docs | 18 | IPYNB | idNumericalDuplication.ipynb | DONE |
File extension lacks synonym | WARN | docs | 19 | IPYNB | fileExtensionMissingSynonym.ipynb | DONE |
File extension bad characters | WARN | docs | 19, 20 | IPYNB | fileExtensionBadCharacter.ipynb | DONE |
Misuse of Wikipedia links | WARN | docs | 23 | IPYNB | wikipediaMisuse.ipynb | DONE |
[1] Things labelled "SPARQL" are implemented purely in SPARQL. "SHACL" is another possibility. Failing that, "IPYNB" (Jupyter notebook with SPARQL and Python code) or "Python" (in the latter two cases the links under "File" will be replaced with links to the relevant notebook or Python script).
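To illustrate the kind of Python post-processing an "IPYNB" check might apply to SPARQL results, here is a rough sketch for the "Concept ID numerical duplication" test. It rests on an assumption about what that test means, namely that no two concepts should share the same numeric part of their ID; the actual rule is defined in the docs folder and may differ. The function names are hypothetical, and the `data_3556` URI is made up for the example.

```python
import re
from collections import defaultdict

# EDAM concept URIs look like http://edamontology.org/format_3556
EDAM_ID = re.compile(r"^http://edamontology\.org/([a-z]+)_(\d+)$")


def find_numeric_duplicates(uris):
    """Group concept URIs by the numeric part of their ID; keep groups larger than one."""
    by_number = defaultdict(set)
    for uri in uris:
        match = EDAM_ID.match(uri)
        if match:
            by_number[match.group(2)].add(uri)
    return {n: sorted(group) for n, group in by_number.items() if len(group) > 1}


def as_result(duplicates):
    """Report duplicates as a result dictionary with test_name, reason and status."""
    reason = [f"{n}: {', '.join(group)}" for n, group in sorted(duplicates.items())]
    status = "ERROR" if duplicates else "NOERR"
    return {"test_name": "idNumericalDuplication", "reason": reason, "status": status}


if __name__ == "__main__":
    uris = [
        "http://edamontology.org/format_3556",
        "http://edamontology.org/data_3556",  # made-up URI for illustration
        "http://edamontology.org/format_3475",
    ]
    print(as_result(find_numeric_duplicates(uris)))
```

In a real notebook the list of URIs would come from a SPARQL query over EDAM_dev.owl rather than a hard-coded list.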
General queries (from ROBOT report)
Query | Description | Level |
---|---|---|
annotation whitespace | link | WARN |
deprecated boolean datatype | link | ERROR |
deprecated class reference | link | ERROR |
deprecated property reference | link | ERROR |
duplicate definition | link | ERROR |
duplicate exact synonym | link | WARN |
duplicate label synonym | link | WARN |
duplicate label | link | ERROR |
duplicate scoped synonym | link | WARN |
equivalent pair | link | WARN |
invalid xref | link | WARN |
label formatting | link | ERROR |
label whitespace | link | ERROR |
lowercase definition | link | INFO |
missing definition | link | WARN |
missing label | link | ERROR |
missing obsolete label | link | WARN |
missing ontology description | link | ERROR |
missing ontology license | link | ERROR |
missing ontology title | link | ERROR |
missing superclass | link | INFO |
misused obsolete label | link | ERROR |
multiple definitions | link | ERROR |
multiple equivalent classes | link | ERROR |
multiple labels | link | ERROR |
File | Description |
---|---|
src/edamverify.py | edamverify utility |
queries/ | Queries in SPARQL query language and SHACL constraint language format |
docs/ | Query documentation (the problem detected by the query and its remedy) |
reports/ | Reports from running edamverify on EDAM_dev.owl |
README.md | This file |