
add 'url' property with a description that was difficult to write in … #1492

Merged: 2 commits, merged May 10, 2024


@sierra-moxon (Member) commented Apr 16, 2024

…the context of the existing xref property

for your review @codewarrior2000

fixes #1486

@codewarrior2000 commented Apr 16, 2024

Thank you, @sierra-moxon. It's nice, simple, and it works for me.
The cautionary note contrasting the "url" property with the "xref" property is very helpful.

@sierra-moxon (Member, Author) commented Apr 16, 2024

@codewarrior2000 - thanks, Larry! The thing I am slightly worried about is that, for your Reactome example, fully representing both Reactome links on a single node means filling out one xref property (a CURIE) and one url property (the full expanded URL) on that node. Should the guidance be that when url is present on a node, any xref annotation should also be expanded fully into a URL and stored in a url property? I'm thinking of downstream consumers, like the UI, that will need special code to display one or the other.
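To make the concern concrete, here is a minimal Python sketch of the special-casing a consumer such as the UI would need when a node carries both properties. It assumes a dict-shaped node, a hypothetical `REACT` prefix entry pointing at Reactome content pages, and hypothetical helpers `expand_curie` and `link_outs`; none of these are taken from the Biolink codebase.

```python
from __future__ import annotations

# ASSUMPTION: illustrative prefix map entry for Reactome, not Biolink's published map.
PREFIX_MAP = {"REACT": "https://reactome.org/content/detail/"}


def expand_curie(curie: str, prefix_map: dict[str, str]) -> str | None:
    """Expand PREFIX:local into a URL; return None for an unknown prefix."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefix_map:
        return None
    return prefix_map[prefix] + local


def link_outs(node: dict, prefix_map: dict[str, str]) -> list[str]:
    """Build clickable links from both the xref and url properties."""
    links: list[str] = []
    # xref is multivalued and may hold CURIEs or full URIs, so branch on form.
    for xref in node.get("xref", []):
        if xref.startswith(("http://", "https://")):
            links.append(xref)
        else:
            expanded = expand_curie(xref, prefix_map)
            if expanded is not None:
                links.append(expanded)
    # url is single-valued as proposed in this PR, so it is a plain string.
    url = node.get("url")
    if url:
        links.append(url)
    return links


# Hypothetical Reactome node carrying one link in each property.
reactome_node = {
    "xref": ["REACT:R-MMU-5655466"],
    "url": "https://reactome.org/PathwayBrowser/#/R-MMU-5655466",
}
print(link_outs(reactome_node, PREFIX_MAP))
```

Both branches exist only because the same kind of link can arrive through two different properties in two different forms.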

I'm also a little worried that the definition of xref currently allows URIs. So, technically, someone could use xref for any URL/URI that is related to the node or edge it is used on. Should we restrict the definition of xref to just be a CURIE? If we do that, we may be out of sync with other databases and/or have refactoring to do.
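If xref were restricted to CURIEs, a validator would need to tell the two forms apart. The sketch below uses a simplified, assumed CURIE pattern; the regexes and the `classify_xref` and `non_curie_xrefs` names are hypothetical and are not a Biolink or LinkML validation rule.

```python
import re

# ASSUMPTION: simplified CURIE shape. A prefix, a colon, then a local part that
# does not start with "/" (a leading "/" signals a scheme://authority URI).
CURIE_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_.-]*:[^/\s]\S*$")
# Generic scheme://... pattern for URIs/URLs.
URI_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://\S+$")


def classify_xref(value: str) -> str:
    """Label an xref value as 'curie', 'uri', or 'unknown'."""
    if CURIE_RE.match(value):
        return "curie"
    if URI_RE.match(value):
        return "uri"
    return "unknown"


def non_curie_xrefs(xrefs: list) -> list:
    """Values that a CURIE-only definition of xref would reject."""
    return [x for x in xrefs if classify_xref(x) != "curie"]


print(non_curie_xrefs(["MGI:97486", "http://identifiers.org/mgi/MGI:97486"]))
```

The heuristic is imperfect: a URN such as `urn:isbn:0451450523` matches the CURIE pattern, which is one more reason a precise definition matters more than tooling.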

Per our discussion, this property is also single-valued (assuming that in TRAPI, many such attributes will be submitted in the TRAPI message when necessary). How will that be stored in the data store? Perhaps this is unnecessary for me to know, but I am trying to imagine how a non-Translator user would use this slot if they need to represent more than one of these kinds of URLs for a particular node.

@codewarrior2000 commented Apr 16, 2024

@sierra-moxon, thank you.
I have been wondering why the meeting discussions were concerned with two Reactome links. The original intention was just to pass along the one URL that we found in the Reactome database, which we had called the "reaction_url", which links to the Reactome Pathway Browser. (e.g., https://reactome.org/PathwayBrowser/#/R-MMU-5655466)

Is there a Biolink Model requirement that I am not aware of that requires both links?

@codewarrior2000 commented Apr 17, 2024

@sierra-moxon
Sorry, Sierra, I had to talk it over with Vlado: the url property is single-valued, yet there is a need to handle multiple URLs. Any user with multiple URLs for a node should know to present each URL as an individual node attribute. The technical requirement will be imposed on ARAs to recognize that there can be multiple node attributes of the same type (URL).
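A sketch of what that requirement means for a consumer, assuming TRAPI-style node attributes (an `attribute_type_id` plus a `value`). The node is hypothetical and reuses two of the MGI view URLs quoted later in this thread; `attribute_values` is an illustrative helper, not part of any ARA.

```python
from __future__ import annotations

from typing import Any

# Hypothetical TRAPI-style node: each URL is its own attribute, and both
# attributes share attribute_type_id "biolink:url".
mouse_gene_node = {
    "categories": ["biolink:Gene"],
    "attributes": [
        {
            "attribute_type_id": "biolink:url",
            "value": "https://www.informatics.jax.org/gxd/marker/MGI:97486",
        },
        {
            "attribute_type_id": "biolink:url",
            "value": "http://www.informatics.jax.org/gxd/marker/MGI:97486?tab=imagestab",
        },
    ],
}


def attribute_values(node: dict[str, Any], attribute_type_id: str) -> list[Any]:
    """Return every value of one attribute type, not just the first match."""
    return [
        attr["value"]
        for attr in node.get("attributes", [])
        if attr.get("attribute_type_id") == attribute_type_id
    ]


# Pitfall to avoid: keying attributes by type keeps only the last value per type.
collapsed = {a["attribute_type_id"]: a["value"] for a in mouse_gene_node["attributes"]}

print(attribute_values(mouse_gene_node, "biolink:url"))
```

The `collapsed` dict shows the failure mode: a consumer that indexes attributes by type silently drops all but one URL.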

@codewarrior2000 commented Apr 17, 2024

"I'm also a little worried that the definition of xref currently allows URIs."

@sierra-moxon
Question. As the most conservative approach, what if we continue to let the xref property be used for both URL/URI and for CURIE? Has that broken anything in Translator yet?

@sierra-moxon (Member, Author) commented Apr 17, 2024

Thanks, @codewarrior2000 and @vdancik!

TL;DR:

  1. I think the trouble I am having with biolink:url being single-valued is that it does not make sense outside of Translator technical architecture.
  2. I am trying to make this property generic so that, if the concept of an alternative url is different in Biolink from an xref, we have it available for use in other contexts besides Reactome.
  3. Without changing the definition of xref, we will likely have nodes that use either xref or url to represent the same thing. That will be inconsistent for the UI and make it difficult for folks using Biolink outside of Translator to choose between them.

  1. I think the trouble I am having with biolink:url being single-valued is that it does not make sense outside of Translator technical architecture.

Sort of self-explanatory, but Biolink is used for KGs other than those currently in Translator, and I think that without TRAPI it's difficult for a user of Biolink to use our proposed biolink:url unless it is multivalued.

  2. I am trying to make this property generic so that, if the concept of an alternative url is different in Biolink from an xref, we have it available for use in other contexts besides Reactome.

For example, the same CURIE that represents a mouse gene can be used in many URLs to see different views of that gene at MGI:

https://www.informatics.jax.org/marker/MGI:97486 <-- the full gene page at MGI, the default URI expansion of the CURIE MGI:97486
https://www.informatics.jax.org/gxd/marker/MGI:97486 <-- the gene expression information at MGI
http://www.informatics.jax.org/gxd/marker/MGI:97486?tab=imagestab <-- just the images of the gene expression at MGI
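A minimal sketch showing that one CURIE can drive several view URLs. The templates simply reproduce the three MGI links above; they are assumptions for illustration, not a documented MGI URL API, and `mgi_view_urls` is a hypothetical helper.

```python
from __future__ import annotations

# ASSUMPTION: view templates copied from the three MGI links above.
MGI_VIEW_TEMPLATES = {
    "gene_page": "https://www.informatics.jax.org/marker/{curie}",
    "expression": "https://www.informatics.jax.org/gxd/marker/{curie}",
    "expression_images": "http://www.informatics.jax.org/gxd/marker/{curie}?tab=imagestab",
}


def mgi_view_urls(curie: str) -> dict[str, str]:
    """Fill every view template with a single MGI CURIE."""
    if not curie.startswith("MGI:"):
        raise ValueError(f"expected an MGI CURIE, got {curie!r}")
    return {view: template.format(curie=curie) for view, template in MGI_VIEW_TEMPLATES.items()}


for view, url in mgi_view_urls("MGI:97486").items():
    print(view, url)
```

Only the first template corresponds to a prefix expansion; the other two are different views of the same identifier, which is the distinction this PR draws between xref and url.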

This is a very similar use case to the Reactome one. If we chose the biolink:id for a mouse gene node to be the NCBIGene identifier (NCBIGene:18504), then we could include an xref property on that node, MGI:97486. Its default URI expansion would be http://identifiers.org/mgi/MGI:97486, and this URL could be used to redirect the user to https://www.informatics.jax.org/marker/MGI:97486.

But those other two MGI links are also valid, and take the user to a different view of the data. So, similarly, we'd argue in this PR that those two other MGI links are not biolink:xrefs; they are biolink:urls. But, technically, someone could provide all three MGI URLs in the biolink:xref field because we're allowing URIs as well (there is some handwaving here between URI and URL: I do know that the URI is technically the default expansion of this CURIE, but I'm not confident that users will take the time to disambiguate).

  3. I'd like to disambiguate xref from url so that new users not privy to this PR can decide which property to use to provide links to other sources with different views of the "same" data. It could be that we need a better description here: it was hard to clarify the distinction.

biolink:xref has these properties:

  • Typically xrefs are CURIEs (but Biolink also says they can be URIs) that are different from the CURIE in the biolink:id field for the node. It is an alternative identifier for the node.
  • If a CURIE, an xref should be expandable to a URI using a prefix map.
  • xref is multivalued; multiple values in the xref slot can be used to provide webpage link-outs to alternate views of the node at different databases/websites.
  • There is nothing in the schema to enforce that, if a URI is provided in the xref field, it must be the default expansion based on a unique prefix. Therefore, someone could provide something in the xref field that we would technically consider to be a url.
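Because the schema does not enforce this, a linter could at least flag xref URIs that do not contract back to a known prefix. A minimal sketch, assuming a one-entry prefix map that reproduces the identifiers.org expansion mentioned above; the map, `contract_uri`, and `url_like_xrefs` are all hypothetical.

```python
from __future__ import annotations

# ASSUMPTION: one-entry prefix map reproducing http://identifiers.org/mgi/MGI:97486.
PREFIX_MAP = {"MGI": "http://identifiers.org/mgi/MGI:"}


def contract_uri(uri: str, prefix_map: dict[str, str]) -> str | None:
    """Return a CURIE that expands back to this URI, or None if no prefix fits."""
    for prefix, base in prefix_map.items():
        if uri.startswith(base) and len(uri) > len(base):
            return f"{prefix}:{uri[len(base):]}"
    return None


def url_like_xrefs(xrefs: list[str], prefix_map: dict[str, str]) -> list[str]:
    """Flag URI xrefs that are not the default expansion of any known prefix.

    Such values (alternative views) can end up in xref even though they
    arguably belong in url.
    """
    return [
        x
        for x in xrefs
        if x.startswith(("http://", "https://")) and contract_uri(x, prefix_map) is None
    ]


xrefs = [
    "MGI:97486",
    "http://identifiers.org/mgi/MGI:97486",
    "https://www.informatics.jax.org/gxd/marker/MGI:97486",
]
print(url_like_xrefs(xrefs, PREFIX_MAP))
```

The result depends entirely on the prefix map in use, so a check like this can only be as strict as the project's agreed-upon prefixes.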

biolink:url or biolink:alternative_url has these properties:

  • Not a CURIE
  • Can be used to provide webpage link-outs to alternative views of the node at different databases/websites.
  • Single-valued (this is problematic if a KG node has more than one alternative url and we don't use TRAPI)

Perhaps we should require that if an xref is provided on a node as a CURIE, it is expanded and added into a url property as well. Similarly, if the xref provided is a URI, then it should be duplicated into the url property. This helps us with consistency for downstream consumption.
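A minimal sketch of that normalization rule, using the NCBIGene/MGI mouse gene described above. It assumes url is treated as a list (that is, multivalued, which is not what this PR currently specifies), a hypothetical one-entry prefix map, and hypothetical helpers `expand_curie` and `mirror_xrefs_into_urls`.

```python
from __future__ import annotations

# ASSUMPTION: prefix map entry reproducing http://identifiers.org/mgi/MGI:97486.
PREFIX_MAP = {"MGI": "http://identifiers.org/mgi/MGI:"}


def expand_curie(curie: str, prefix_map: dict[str, str]) -> str | None:
    """Expand PREFIX:local into a URI; return None for an unknown prefix."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefix_map:
        return None
    return prefix_map[prefix] + local


def mirror_xrefs_into_urls(node: dict, prefix_map: dict[str, str]) -> dict:
    """Return a copy of node whose url list also holds every xref as a full URI.

    CURIE xrefs are expanded, URI xrefs are copied as-is, and duplicates are
    skipped. ASSUMPTION: url is a list here, although this PR makes it single-valued.
    """
    urls = list(node.get("url", []))
    for xref in node.get("xref", []):
        if xref.startswith(("http://", "https://")):
            full = xref
        else:
            full = expand_curie(xref, prefix_map)
        if full is not None and full not in urls:
            urls.append(full)
    return {**node, "url": urls}


mouse_gene = {
    "id": "NCBIGene:18504",
    "xref": ["MGI:97486"],
    "url": ["https://www.informatics.jax.org/gxd/marker/MGI:97486"],
}
print(mirror_xrefs_into_urls(mouse_gene, PREFIX_MAP)["url"])
```

With this rule a consumer could read only url for link-outs, at the cost of storing each xref twice.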

At a minimum I think we need text descriptions that help disambiguate and I would welcome help here, of course. :)

@codewarrior2000

@sierra-moxon
We appreciate the TL;DR.
I will need some time to digest the implications of how xref and url properties coexist and interplay.

@mikebada (Collaborator) left a comment

@sierra-moxon A few typos :D
Unlink -> Unlike
can not -> cannot
an unique -> a unique

per code review, fix typos
@sierra-moxon merged commit d2f7eaa into master on May 10, 2024
6 checks passed