Failing ontologies on data load #453

Open
serjoshua opened this issue Aug 3, 2023 · 4 comments
Labels
low priority Workaround available

Comments

@serjoshua
Contributor

serjoshua commented Aug 3, 2023

OBO

| Fixed | ID | Status | Download | Parsing | EBI Override | PURL | Notes | Pipeline Error |
|---|---|---|---|---|---|---|---|---|
| | cido | active | | | | http://purl.obolibrary.org/obo/cido.owl | failed import http://purl.obolibrary.org/obo/DrugsNoChEBI_interactions_with_targets.owl | [line: 2, col: 207] {E202} Expecting XML start or end element(s). String data "aeo.obo" not allowed. Maybe there should be an rdf:parseType='Literal' for embedding mixed XML content in RDF. Maybe a striping error. |
| | cto | active | | | | http://purl.obolibrary.org/obo/cto.owl | | [line: 25, col: 32] {E201} Multiple children of property element |
| | cvdo | active | | | | http://purl.obolibrary.org/obo/cvdo.owl | failed import http://purl.obolibrary.org/obo/cvdo/external/doid_import.owl | [line: 2, col: 207] {E202} Expecting XML start or end element(s). String data "aeo.obo" not allowed. Maybe there should be an rdf:parseType='Literal' for embedding mixed XML content in RDF. Maybe a striping error. |
| | upheno | active | | | | http://purl.obolibrary.org/obo/upheno.owl | failed import http://purl.obolibrary.org/obo/upheno/upheno_root_alignments.owl | [line: 1, col: 1 ] Content is not allowed in prolog. |
| | mamo | orphaned | | | | http://purl.obolibrary.org/obo/mamo.owl | | [line: 17, col: 74] {E201} Multiple children of property element |
| | vario | orphaned | | | | http://purl.obolibrary.org/obo/vario.owl | | [line: 1, col: 1 ] Content is not allowed in prolog. |
| | olatdv | inactive | | | | http://purl.obolibrary.org/obo/olatdv.owl | | Not found |
| | pdumdv | inactive | | | | http://purl.obolibrary.org/obo/pdumdv.owl | | Not found |
| | rnao | inactive | | | | http://purl.obolibrary.org/obo/rnao.owl | failed import http://www.obofoundry.org/ro/ro.owl | Not found |
| | dinto | inactive | | | | http://purl.obolibrary.org/obo/dinto.owl | | [line: 13, col: 4 ] {E201} The attributes on this property element, are not permitted with any content; expecting end element tag. |
| | eo | inactive | | | | http://purl.obolibrary.org/obo/eo.owl | | Not found |
| | epo | inactive | | | | http://purl.obolibrary.org/obo/epo.owl | | Not found |
| | ero | inactive | | | | http://purl.obolibrary.org/obo/ero.owl | | [line: 4, col: 27] {E202} Expecting XML start or end element(s). String data "redirecting" not allowed. Maybe there should be an rdf:parseType='Literal' for embedding mixed XML content in RDF. Maybe a striping error. |
| | flu | inactive | | | | http://purl.obolibrary.org/obo/flu.owl | failed import http://purl.obolibrary.org/obo/ido/2010-12-02/ido-main-workaround.owl | Not found |
| | mfo | inactive | | | | http://purl.obolibrary.org/obo/mfo.owl | | [line: 1, col: 3 ] The markup in the document preceding the root element must be well-formed. |
| | mirnao | inactive | | | | http://purl.obolibrary.org/obo/mirnao.owl | | Not found |
| | mo | inactive | | | | http://purl.obolibrary.org/obo/mo.owl | | [line: 2, col: 207] {E202} Expecting XML start or end element(s). String data "aeo.obo" not allowed. Maybe there should be an rdf:parseType='Literal' for embedding mixed XML content in RDF. Maybe a striping error. |
| | nmr | inactive | | | | http://purl.obolibrary.org/obo/nmr.owl | | [line: 2, col: 207] {E202} Expecting XML start or end element(s). String data "aeo.obo" not allowed. Maybe there should be an rdf:parseType='Literal' for embedding mixed XML content in RDF. Maybe a striping error. |
| | ogi | inactive | | | | http://purl.obolibrary.org/obo/ogi.owl | | Not found |
| | sep | inactive | | | | http://purl.obolibrary.org/obo/sep.owl | redirects to http://ontologies.berkeleybop.org/sep.owl | Cannot read field "properties" because "this.ontologyNode" is null |
| | vhog | inactive | | | | http://purl.obolibrary.org/obo/vhog.owl | redirects to file points to http://ontologies.berkeleybop.org/vhog.owl | Cannot read field "properties" because "this.ontologyNode" is null |

EBI OLS Ontologies

| Fixed | ID | Download | Parsing | PURL | Notes | Pipeline Error |
|---|---|---|---|---|---|---|
| | phi | | | file:/nfs/panda/ensembl/production/ensprod/ontologies/phi/PHI.obo | | Not found |
| | atol | | | http://www.atol-ontology.com/public/telechargement/atol.owl | | javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target |
| | eol | | | http://www.atol-ontology.com/public/telechargement/eol.owl | | javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target |
| | lbo | | | http://data.bioontology.org/ontologies/LBO/download?apikey=8b5b7825-538d-40e0-9e9e-5ab9274a9aeb | | Failed to determine the content type: (URI=http://data.bioontology.org/ontologies/LBO/download?apikey=8b5b7825-538d-40e0-9e9e-5ab9274a9aeb : stream=application/octet-stream) |
| | pride | | | https://raw.githubusercontent.com/PRIDE-Utilities/pride-ontology/master/pride_cv.obo | migrate away from OBO | Failed to determine the content type: (URI=https://raw.githubusercontent.com/PRIDE-Utilities/pride-ontology/master/pride_cv.obo : stream=text/plain) |
| | unimod | | | http://www.unimod.org/obo/unimod.obo | migrate away from OBO | Failed to determine the content type: (URI=http://www.unimod.org/obo/unimod.obo : stream=null) |
| | hpath | | | https://raw.githubusercontent.com/Novartis/hpath/master/src/hpath.obo | migrate away from OBO | Failed to determine the content type: (URI=https://raw.githubusercontent.com/Novartis/hpath/master/src/hpath.obo : stream=text/plain) |
| | vido | | | https://raw.githubusercontent.com/infectious-disease-ontology-extensions/ido-virus/master/ontology/vido.owl | | [line: 23, col: 18] {E201} Multiple children of property element |

Original spreadsheet

@serjoshua serjoshua pinned this issue Aug 3, 2023
@serjoshua
Contributor Author

(1) The OLS4 dataloader is an RDF tool and therefore only supports loading RDF files. This means that non-RDF OWL serialisations such as OBO format and OWL/XML are never going to be supported (though of course they can be converted prior to loading). For these (very few) cases we can either ask the upstream ontology vendors to provide an RDF/XML file, or possibly outsource conversion to ROBOT.

(2) Though we support all the different RDF serializations, the majority of the ontologies are provided without any content-type or any useful file extension to indicate which serialization format they contain. For example, this ontology from the OLS config is Turtle, but the file extension is owl and the content-type is text/plain. No suggestion of Turtle encoding anywhere.

Even the OBO Foundry ontologies do this. If we resolve, for example, http://purl.obolibrary.org/obo/ro.owl, it redirects to https://raw.githubusercontent.com/oborel/obo-relations/master/ro.owl. The file extension is .owl and the content-type is text/plain. While the file content is RDF/XML, there is nothing to suggest that it isn't, for example, OWL/XML, or Turtle, or JSON-LD. We only know how to load it in OLS4 because RDF/XML is the hardcoded default.

Why does this work in Protégé and OLS3? Because OWLAPI literally brute-force loads ontology files by trying every loader until it finds one that works.
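
The brute-force behaviour described above can be sketched roughly as follows. This is not OWLAPI's code, only a minimal Python illustration: `load_by_trial`, `LOADERS` and the two stand-in parsers are hypothetical names, and the standard-library XML and JSON parsers stand in for real RDF/XML and JSON-LD loaders.

```python
import json
import xml.etree.ElementTree as ET


def parse_xml(text):
    """Stand-in for an RDF/XML loader: only checks that the text is well-formed XML."""
    return ET.fromstring(text)


def parse_json(text):
    """Stand-in for a JSON-LD loader: only checks that the text is valid JSON."""
    return json.loads(text)


# Hypothetical loader registry, tried in order.
LOADERS = [("xml", parse_xml), ("json", parse_json)]


def load_by_trial(text, loaders=LOADERS):
    """Try each loader in turn; return (name, result) for the first one that succeeds."""
    errors = {}
    for name, loader in loaders:
        try:
            return name, loader(text)
        except Exception as exc:  # each parser raises its own error type
            errors[name] = str(exc)
    raise ValueError(f"no loader accepted the input: {errors}")
```

The cost of this approach is that a file is parsed (and rejected) once per loader that does not match, and a file that no loader accepts only fails after every loader has been tried.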

While we could probably do something similar in OLS4, I think ultimately it is up to the ontology developers to provide correct metadata, if not by content-type then at least by file extension. The whole .owl thing is a mess. If it's RDF/XML it should be .xml and if it's Turtle it should be .ttl. OR if it really wants to be .owl it should be served up with a content-type.

So, TL;DR: I think we should continue to read .owl files according to whatever content-type is provided, falling back to RDF/XML.
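
As a rough illustration of that policy (content-type first, then file extension, then the RDF/XML default), here is a minimal Python sketch. It is not the OLS4 dataloader's code: `guess_serialization` and both lookup tables are assumptions made for this example, and the extension table deliberately leaves out `.owl` because, as discussed above, it says nothing about the serialization.

```python
import posixpath
from urllib.parse import urlparse

# Assumed content-type table (registered RDF media types); illustrative only.
CONTENT_TYPES = {
    "application/rdf+xml": "RDF/XML",
    "text/turtle": "Turtle",
    "application/ld+json": "JSON-LD",
    "application/n-triples": "N-Triples",
}

# Assumed extension table; ".owl" is intentionally absent because it is ambiguous.
EXTENSIONS = {
    ".rdf": "RDF/XML",
    ".ttl": "Turtle",
    ".jsonld": "JSON-LD",
    ".nt": "N-Triples",
}

DEFAULT = "RDF/XML"


def guess_serialization(url, content_type=None):
    """Pick a serialization from the content-type, then the extension, else the default."""
    if content_type:
        # Drop parameters such as "; charset=utf-8" before the lookup.
        media_type = content_type.split(";", 1)[0].strip().lower()
        if media_type in CONTENT_TYPES:
            return CONTENT_TYPES[media_type]
    extension = posixpath.splitext(urlparse(url).path)[1].lower()
    return EXTENSIONS.get(extension, DEFAULT)
```

With this sketch, the `ro.owl` case above (extension `.owl`, content-type `text/plain`) matches neither table and ends up on the RDF/XML default, which happens to be correct for that file but would be wrong for a Turtle file served the same way.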

Original comment

serjoshua referenced this issue Aug 3, 2023
- Remove GAZ override for PURL
- Update TEDDY config for PURL to point to nfs file in codon
@matentzn
Contributor

matentzn commented Aug 3, 2023

I would recommend for the OBO ones:

  1. Ignore all failing "inactive" or "orphaned" ontologies from OBO; don't try to fix them. Just record that they don't parse, that's it. Only do anything about them if someone asks. The fewer ontologies there are, the better.
  2. Make issues on the issue trackers for the active OBO ontologies to get their act together (linking to this issue), then stop trying to fix them (30 minutes of work for all of them).

@linikujp

@serjoshua What do you recommend doing for the failing ontologies that are .owl files? Re-save them as .rdf files?

@jamesamcl
Member

Yes, or use ROBOT to convert them on the command line: http://robot.obolibrary.org/
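
For anyone scripting that conversion, a minimal Python wrapper around ROBOT's `convert` command might look like the sketch below. It assumes a `robot` executable is installed separately and available on PATH; the function names and file names are placeholders for illustration. As far as I know, ROBOT picks the output format from the output file's extension, with `.owl` written as RDF/XML, but check the ROBOT documentation before relying on that.

```python
import shutil
import subprocess


def robot_convert_command(source, target):
    """Build the argument list for `robot convert --input SOURCE --output TARGET`."""
    return ["robot", "convert", "--input", source, "--output", target]


def convert(source, target):
    """Run the conversion, failing early if no `robot` executable is on PATH."""
    if shutil.which("robot") is None:
        raise FileNotFoundError("robot executable not found on PATH")
    # check=True raises CalledProcessError if ROBOT exits with a non-zero status.
    subprocess.run(robot_convert_command(source, target), check=True)
```

For example, `convert("pride_cv.obo", "pride_cv.owl")` would run `robot convert --input pride_cv.obo --output pride_cv.owl` on local copies of those files.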

@haideriqbal haideriqbal unpinned this issue Jun 5, 2024
@haideriqbal haideriqbal pinned this issue Jun 5, 2024