migrate to Multiomics CTKP, remove other clinicaltrials.gov-based x-bte operations #861

Open
andrewsu opened this issue Aug 30, 2024 · 19 comments
Labels
On Test Related changes are deployed to Test server

Comments

@andrewsu
Member

per @gglusman's comment

The Plover CTKP will be available in ITRB/CI next Friday. (Sep 6)

Whenever this is ready and the SmartAPI record is updated (presumably this one), we should remove the in_clinical_trials_for edges based on ChEMBL and add CTKP.

@andrewsu andrewsu changed the title from "replace ChEMBL's in_clinical_trials_for edges for the Plover CTKP instsance" to "replace ChEMBL's in_clinical_trials_for edges for the Plover CTKP instance" Aug 30, 2024
@colleenXu colleenXu added the On CI Related changes are deployed to CI server label Sep 19, 2024
@colleenXu
Collaborator

colleenXu commented Sep 19, 2024

UPDATE: On Friday 9/20, I reverted these changes (commented out) so BTE would still use clinical trials data from MyChem + BioThings repoDB during the automated test run this weekend.


PREVIOUSLY...

I deployed the following changes on CI (PR, triggering deployment):

Note that Multiomics CTKP only has CI/dev instances right now. So BTE Test/Prod won't use it.

I kept TTD's operations because it seems like they use a manual curation process that DOESN'T start with clinicaltrials.gov / AACT

Old notes on x-bte operations and clinical trials data

from Translator Slack link

  • TTD's data: I'm not sure, but it seems like there's a manual curation process that DOESN'T start with clinicaltrials.gov / AACT
  • repoDB: seems to use AACT, finds drugs in terminated/withdrawn/suspended trials
  • MyChem's data comes from ChEMBL drug indications

@colleenXu
Collaborator

colleenXu commented Sep 19, 2024

However, I see multiple issues using Multiomics CTKP:

  1. NO Disease/Pheno -> Chemical MetaEdges in meta_knowledge_graph response ("reverses"). This means BTE isn't calling this KP during MVP1/creative-treats queries, where we start with a Disease ID. (AKA KP only has Chem ➡️ Disease/Pheno MetaEdges)
  2. There are TRAPI validation errors coming from this KP's edge-attributes (lower-level orange?). Example: main PK, look at BTE's response validation, "Error" section.
    • unknown attribute_type_id: biolink:supporting_study. I'm also unsure of some of the nested attribute's type ids (not curies)...
    • unknown predicate biolink:mentioned_in_trials_for (see next point too)
  3. Some MetaEdges use the predicate biolink:mentioned_in_trials_for, which isn't in biolink-model 4.2.2. BTE won't use these MetaEdges when QEdge predicates are set, because it consults the biolink-model predicate hierarchy and that term isn't there.
  4. Some MetaEdges use the node category biolink:UNKNOWN. Those MetaEdges are unusable.
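
A minimal sketch of how points 1, 3 and 4 could be checked against a KP's meta_knowledge_graph response. It assumes the standard TRAPI shape (an "edges" list of subject/predicate/object entries). The function name `audit_meta_kg`, the caller-supplied `known_predicates` set, and the three-edge example are all hypothetical and are not actual CTKP output or BTE code.

```python
from typing import Dict, List, Set


def audit_meta_kg(meta_kg: Dict, known_predicates: Set[str]) -> Dict[str, List]:
    """Report MetaEdge problems of the kinds listed above."""
    edges = meta_kg.get("edges", [])
    # Every (subject category, object category) direction the KP advertises.
    directions = {(e["subject"], e["object"]) for e in edges}
    report: Dict[str, List] = {
        "missing_reverse": [],
        "unknown_predicate": [],
        "unknown_category": [],
    }
    # Point 1: directions that exist only one way round.
    for subject, obj in sorted(directions):
        if (obj, subject) not in directions:
            report["missing_reverse"].append((obj, subject))
    for e in edges:
        # Point 3: predicate absent from the biolink-model version in use.
        if e["predicate"] not in known_predicates:
            report["unknown_predicate"].append(e)
        # Point 4: placeholder node category.
        if "biolink:UNKNOWN" in (e["subject"], e["object"]):
            report["unknown_category"].append(e)
    return report


# Illustrative input only (not a real KP response).
example = {
    "edges": [
        {"subject": "biolink:SmallMolecule",
         "predicate": "biolink:in_clinical_trials_for",
         "object": "biolink:Disease"},
        {"subject": "biolink:SmallMolecule",
         "predicate": "biolink:mentioned_in_trials_for",
         "object": "biolink:Disease"},
        {"subject": "biolink:SmallMolecule",
         "predicate": "biolink:in_clinical_trials_for",
         "object": "biolink:UNKNOWN"},
    ]
}
report = audit_meta_kg(example, known_predicates={"biolink:in_clinical_trials_for"})
```

In this example, `report["missing_reverse"]` lists the Disease -> SmallMolecule direction, which is the one a Disease-first query would need.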

@tokebe
Member

tokebe commented Sep 20, 2024

In order to better support CTKP we need to inject inverse operations as we parse their metakg. Proposed way of implementing this:

  • Parse the API list entries for an injectInverses property
  • When parsing operations, if the API has said property, attempt to generate a reverse operation
    • Flip subject/object, get inverse predicate, flip anything else that's necessary
    • If there is no inverse predicate, skip the operation for the purposes of inverse-injection
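
The proposal above could be sketched roughly as follows. This is Python for illustration only, not BTE's actual code. The `injectInverses` property name comes from the proposal; the operation dict shape, the `reversed` marker, and the small `INVERSES` / `SYMMETRIC` tables are assumptions (a real implementation would read inverses from the biolink-model).

```python
from typing import Dict, List, Optional

# Hypothetical lookup; only predicate pairs mentioned in this thread.
INVERSES = {
    "biolink:in_clinical_trials_for": "biolink:tested_by_clinical_trials_of",
    "biolink:treats": "biolink:treated_by",
}
# Symmetric predicates act as their own inverse.
SYMMETRIC = {"biolink:correlated_with", "biolink:associated_with"}


def inverse_predicate(predicate: str) -> Optional[str]:
    """Return the inverse predicate, or None if none is known."""
    if predicate in SYMMETRIC:
        return predicate
    if predicate in INVERSES:
        return INVERSES[predicate]
    for canonical, inverse in INVERSES.items():
        if inverse == predicate:
            return canonical
    return None


def inject_inverses(api_entry: Dict, operations: List[Dict]) -> List[Dict]:
    """Append a reversed copy of each operation when the API opts in."""
    if not api_entry.get("injectInverses"):
        return list(operations)
    result = list(operations)
    for op in operations:
        inverse = inverse_predicate(op["predicate"])
        if inverse is None:
            # No inverse predicate: skip this operation for inverse-injection.
            continue
        result.append({
            **op,
            "subject": op["object"],
            "object": op["subject"],
            "predicate": inverse,
            "reversed": True,  # hypothetical marker for later un-flipping
        })
    return result


ops = [
    {"subject": "biolink:SmallMolecule", "object": "biolink:Disease",
     "predicate": "biolink:in_clinical_trials_for"},
    {"subject": "biolink:SmallMolecule", "object": "biolink:Disease",
     "predicate": "biolink:mentioned_in_trials_for"},
]
with_inverses = inject_inverses({"injectInverses": True}, ops)
```

Here the second operation gets no reverse because the lookup has no inverse for mentioned_in_trials_for. The sketch does not cover the "flip anything else that's necessary" step.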

@colleenXu
Collaborator

colleenXu commented Sep 20, 2024

UPDATE: On Friday 9/20, I reverted these changes (commented out) so BTE would still use clinical trials data from MyChem + BioThings repoDB during the automated test run this weekend.

Direct commit: biothings/bte-server@b9e489b
Trigger deployment: 19a78d3

@gglusman

  • unknown attribute_type_id: biolink:supporting_study. I'm also unsure of some of the nested attribute's type ids (not curies)...

Modified it to the (existing) biolink:supporting_study_metadata.

  • unknown predicate biolink:mentioned_in_trials_for (see next point too)

Pending inclusion in biolink.

  4. Some MetaEdges use the node category biolink:UNKNOWN. Those MetaEdges are unusable.

There's currently just one node ("aging") with that erroneous category. Will fix.

@colleenXu colleenXu removed the On CI Related changes are deployed to CI server label Sep 23, 2024
@colleenXu colleenXu changed the title from "replace ChEMBL's in_clinical_trials_for edges for the Plover CTKP instance" to "migrate to Multiomics CTKP, remove other clinicaltrials.gov-based x-bte operations" Sep 24, 2024
@colleenXu
Collaborator

Pasting from Slack:

Made some quick slides to illustrate my understanding of query direction / "inverting"/"reverses": https://docs.google.com/presentation/d/19PPPmkHEUStBUkEGBzVIGooB8hADHMBcoNRSmAsedYQ/edit?usp=sharing
I imagine it'd be nice to support/keep supporting all 4 permutations of the query that I draw out, but I dunno if that complicates things.

@colleenXu
Collaborator

colleenXu commented Sep 24, 2024

Example queries for CTKP (all return in <1s and should have fewer results to review):

(1) Needs "reverse MetaEdge": Disease ID on object

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories": ["biolink:SmallMolecule"]
                },
                "n1": {
                    "ids": ["MONDO:0015243"],
                    "categories": ["biolink:Disease"]
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:in_clinical_trials_for"]
                }
            }
        }
    }
}

(2) Needs "reverse MetaEdge": Disease ID on subject, flipped version of (1)

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids": ["MONDO:0015243"],
                    "categories": ["biolink:Disease"]
                },
                "n1": {
                    "categories": ["biolink:SmallMolecule"]
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:tested_by_clinical_trials_of"]
                }
            }
        }
    }
}

(3) Uses original MetaEdge: Chemical ID on subject

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids": ["PUBCHEM.COMPOUND:46871657"],
                    "categories": ["biolink:SmallMolecule"]
                },
                "n1": {
                    "categories": ["biolink:Disease"]
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:in_clinical_trials_for"]
                }
            }
        }
    }
}

(4) Uses original MetaEdge: Chemical ID on object, flipped version of (3)

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories": ["biolink:Disease"]
                },
                "n1": {
                    "ids": ["PUBCHEM.COMPOUND:46871657"],
                    "categories": ["biolink:SmallMolecule"]
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:tested_by_clinical_trials_of"]
                }
            }
        }
    }
}
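
The four queries above follow one pattern, so they can be generated from a single MetaEdge plus one ID for each end. A small sketch under that assumption; the helper names `qnode`, `build_query` and `four_permutations` are made up for illustration.

```python
from typing import Dict, Optional


def qnode(category: str, curie: Optional[str] = None) -> Dict:
    """Build a QNode with a category and, optionally, one pinned ID."""
    node: Dict = {"categories": [category]}
    if curie is not None:
        node["ids"] = [curie]
    return node


def build_query(subject: Dict, obj: Dict, predicate: str) -> Dict:
    """Wrap two QNodes and one QEdge in a TRAPI query message."""
    return {"message": {"query_graph": {
        "nodes": {"n0": subject, "n1": obj},
        "edges": {"e01": {"subject": "n0", "object": "n1",
                          "predicates": [predicate]}},
    }}}


def four_permutations(subject_category: str, object_category: str,
                      predicate: str, inverse: str,
                      subject_id: str, object_id: str) -> Dict[int, Dict]:
    """Keys 1-4 match the numbering used in the examples above."""
    return {
        1: build_query(qnode(subject_category),
                       qnode(object_category, object_id), predicate),
        2: build_query(qnode(object_category, object_id),
                       qnode(subject_category), inverse),
        3: build_query(qnode(subject_category, subject_id),
                       qnode(object_category), predicate),
        4: build_query(qnode(object_category),
                       qnode(subject_category, subject_id), inverse),
    }


queries = four_permutations(
    "biolink:SmallMolecule", "biolink:Disease",
    "biolink:in_clinical_trials_for", "biolink:tested_by_clinical_trials_of",
    "PUBCHEM.COMPOUND:46871657", "MONDO:0015243",
)
```

For a symmetric predicate, pass the same value for `predicate` and `inverse`.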

@gglusman

Finding suitable reverse predicates can be really hard. In this case, I find it strange to state that a disease/condition is being tested by a clinical trial of a drug. The drug is being tested, not the disease.
For another predicate in CTKP (mentioned_in_trials_for), the best I could come up with was "target_of_trial_mentioning"... which is still very cumbersome...

@colleenXu
Collaborator

colleenXu commented Sep 25, 2024

@gglusman

For chem "mentioned in trials for" disease, maybe disease "mentioned in trials of" chem would work? or disease "mentioned in trials with" chem?


There are inverse predicates for all currently-existing predicates in the biolink model that should be used. But I agree that many of the new "treats" inverses have odd wording.

1. In practice, BTE will format edges in its response to match the creative-mode query/query-edge direction (flipping them if necessary). This means the inverse predicates won't show in the response for MVP1 creative-mode (I dunno if this was one of your worries?). The "reverse" MetaEdges are just needed for BTE's internal MetaEdge representation and KP-querying process.
2. I previously raised concerns about these inverse predicates' wording in DMs with Sierra (pasted below. I also buried some notes here). It may be worth bringing up your concerns with the inverse predicates with Sierra/in data-modeling...

some of the new inverses sound a bit odd? (I took inspiration from the 4.1.3 inverse for studied to treat, which was studied for treatment with. That inverse is now treated in studies with)

  • "disease tested by clinical trials of drug". Maybe "studied in clinical trials with" (or for) sounds better?
  • "disease tested by preclinical trials of drug"
  • "disease models demonstrating benefits for drug". Maybe "models show benefit from" sounds better?
  • "disease treatment applications from drug". Hmm...the wording for this is tricky >.< (easier if the canonical predicate were reworded to reported to treat; then it could be reportedly treated with)
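
To illustrate point 1 (edges flipped to match the query-edge direction), here is a rough sketch. It is not BTE's actual implementation: `orient_to_query` and the `inverse_of` mapping are hypothetical, and the sketch ignores the predicate hierarchy by matching exact predicates only.

```python
from typing import Dict, Set


def orient_to_query(edge: Dict, qedge_predicates: Set[str],
                    inverse_of: Dict[str, str]) -> Dict:
    """Return the edge in the direction the QEdge asked for."""
    if edge["predicate"] in qedge_predicates:
        return edge  # already in query direction
    inverse = inverse_of.get(edge["predicate"])
    if inverse in qedge_predicates:
        # Swap the endpoints and use the predicate the query asked for.
        return {**edge,
                "subject": edge["object"],
                "object": edge["subject"],
                "predicate": inverse}
    raise ValueError("edge does not match the query edge in either direction")


# Record retrieved through a "reverse" MetaEdge (Disease first).
retrieved = {
    "subject": "MONDO:0015243",
    "object": "PUBCHEM.COMPOUND:46871657",
    "predicate": "biolink:tested_by_clinical_trials_of",
}
oriented = orient_to_query(
    retrieved,
    qedge_predicates={"biolink:in_clinical_trials_for"},
    inverse_of={
        "biolink:tested_by_clinical_trials_of": "biolink:in_clinical_trials_for",
    },
)
```

After the flip, the response edge carries in_clinical_trials_for with the chemical as subject, so the inverse predicate never reaches the user.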

@gglusman

A thought: if inverse predicates are just an internal construct and never displayed to a human, the simplest and best way to construct them might be "inverse_of_[original_predicate]".
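
As a sketch of that naming idea (the function name `synthetic_inverse` is hypothetical, and the resulting predicates would not be biolink-model terms): prefix the local part of the CURIE, and strip the prefix again when inverting back.

```python
PREFIX = "inverse_of_"


def synthetic_inverse(predicate: str) -> str:
    """Toggle an internal-only 'inverse_of_' prefix on a predicate CURIE."""
    namespace, _, local = predicate.partition(":")
    if local.startswith(PREFIX):
        # Inverting an inverse gives back the original predicate.
        return f"{namespace}:{local[len(PREFIX):]}"
    return f"{namespace}:{PREFIX}{local}"


inverse = synthetic_inverse("biolink:mentioned_in_trials_for")
round_trip = synthetic_inverse(inverse)
```

Applying the function twice returns the original predicate, so no separate lookup table is needed for predicates that lack a curated inverse.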

@gglusman

For chem "mentioned in trials for" disease, maybe disease "mentioned in trials of" chem would work? or disease "mentioned in trials with" chem?

Probably not, as the concept is asymmetric. We use the "mentioned" predicate when there is more than one intervention stated and it is not [yet] clear which one is being tested. Inverting it by saying that the disease is mentioned in a trial of one of those interventions would imply that the trial is indeed testing that intervention.

@colleenXu
Collaborator

colleenXu commented Sep 25, 2024

@gglusman

disease "involved in trials that mention" chem then?

@gglusman

disease "involved in trials that mention" chem then?

That's quite similar to "target_of_trial_mentioning". :)

@colleenXu
Collaborator

colleenXu commented Sep 26, 2024

BTE could benefit from using the "reverse MetaEdge creation" on all new Multiomics TRAPI KPs, not just CTKP.

@tokebe here are example queries for them. You can use the config here if it's helpful: biothings/bte-server#43

Drug Approvals

https://smart-api.info/ui/edc04feaf16c12424737988ce2e90d60

Also has Chem -> Disease MetaEdges

(1) Needs "reverse MetaEdge": Disease ID on object

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories": ["biolink:SmallMolecule"]
                },
                "n1": {
                    "ids": ["MONDO:0006014"],
                    "categories": ["biolink:Disease"]
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:treats"]
                }
            }
        }
    }
}

(2) Needs "reverse MetaEdge": Disease ID on subject, flipped version of (1)

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids": ["MONDO:0006014"],
                    "categories": ["biolink:Disease"]
                },
                "n1": {
                    "categories": ["biolink:SmallMolecule"]
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:treated_by"]
                }
            }
        }
    }
}

(3) Uses original MetaEdge: Chemical ID on subject

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids": ["PUBCHEM.COMPOUND:46871657"],
                    "categories": ["biolink:SmallMolecule"]
                },
                "n1": {
                    "categories": ["biolink:Disease"]
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:treats"]
                }
            }
        }
    }
}

(4) Uses original MetaEdge: Chemical ID on object, flipped version of (3)

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories": ["biolink:Disease"]
                },
                "n1": {
                    "ids": ["PUBCHEM.COMPOUND:46871657"],
                    "categories": ["biolink:SmallMolecule"]
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:treated_by"]
                }
            }
        }
    }
}

Microbiome

https://smart-api.info/ui/a8be4ea3fe8fa80a952ead0b3c5e4bc1

Using BiologicalProcess -(correlated_with)-> SmallMolecule MetaEdge as reference.

(1) Needs "reverse MetaEdge": SmallMolecule ID on object

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories":["biolink:BiologicalProcess"]
                },
                "n1": {
                    "ids":["PUBCHEM.COMPOUND:24749"],
                    "categories": ["biolink:SmallMolecule"]
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:correlated_with"]
                }
            }
        }
    }
}

(2) Needs "reverse MetaEdge": SmallMolecule ID on subject, flipped version of (1)

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids":["PUBCHEM.COMPOUND:24749"],
                    "categories": ["biolink:SmallMolecule"]
                },
                "n1": {
                    "categories":["biolink:BiologicalProcess"]
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:correlated_with"]
                }
            }
        }
    }
}

(3) Uses original MetaEdge: BiologicalProcess ID on subject

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids":["GO:0009106"],
                    "categories":["biolink:BiologicalProcess"]
                },
                "n1": {
                    "categories": ["biolink:SmallMolecule"]
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:correlated_with"]
                }
            }
        }
    }
}

(4) Uses original MetaEdge: BiologicalProcess ID on object, flipped version of (3)

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories": ["biolink:SmallMolecule"]
                },
                "n1": {
                    "ids":["GO:0009106"],
                    "categories":["biolink:BiologicalProcess"]
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:correlated_with"]
                }
            }
        }
    }
}

Multiomics

https://smart-api.info/ui/1b6de23ed3c4e0713b20794477ba1e39

Using Protein -(associated_with)-> Disease / PhenotypicFeature MetaEdges as reference.

(1) Needs "reverse MetaEdge": Disease ID on object

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories":["biolink:Protein"]
                },
                "n1": {
                    "ids":["MONDO:0005477"],
                    "categories": ["biolink:Disease"]
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:associated_with"]
                }
            }
        }
    }
}

(2) Needs "reverse MetaEdge": Disease ID on subject, flipped version of (1)

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids":["MONDO:0005477"],
                    "categories": ["biolink:Disease"]
                },
                "n1": {
                    "categories":["biolink:Protein"]
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:associated_with"]
                }
            }
        }
    }
}

(3) Uses original MetaEdge: Protein ID on subject

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids":["UniProtKB:Q8NET8"],
                    "categories":["biolink:Protein"]
                },
                "n1": {
                    "categories": ["biolink:Disease"]
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:associated_with"]
                }
            }
        }
    }
}

(4) Uses original MetaEdge: Protein ID on object, flipped version of (3)

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories": ["biolink:Disease"]
                },
                "n1": {
                    "ids":["UniProtKB:Q8NET8"],
                    "categories":["biolink:Protein"]
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:associated_with"]
                }
            }
        }
    }
}

@tokebe
Member

tokebe commented Sep 26, 2024

@colleenXu I've made PRs for the reverse-metaEdge injection behavior, see branches named ctkp-reverse

@colleenXu
Collaborator

colleenXu commented Sep 27, 2024

@tokebe

YAY it all works! I tested...

  • all 4 Multiomics TRAPIs with the example queries in earlier comments - looks like edges are being retrieved properly in all 4 query-flavors.
  • an MVP1 creative-mode query: CTKP + DAKP (drug approvals) edges were successfully retrieved and showed up in the response

But note!

  • I didn't check if Multiomics TRAPIs can handle multiple input IDs on the same QNode (return edges for both input IDs properly). Did you @tokebe?
  • the logs on "created reverse metaEdges" are useful...but kinda a lot. And they show up in console logs at the beginning of each query execution....which seems like a bit much?
  • I noticed one obviously edge-direction-specific edge-attribute: CTKP's subject_boxed_warning. I think this refers to the chemical/intervention having a boxed warning. I think we can leave this alone for now.
    • @gglusman Could this edge-attribute be renamed to something direction-agnostic like drug_boxed_warning or intervention_boxed_warning? It is also confusing because subject in clinical trials research can mean multiple things...

And for the PRs...

I updated biothings/bte-server#43 to uncomment Multiomics CTKP.
Then I merged this add-trapi-kps stuff into biothings/bte-server#44. So now the diff between 43 and 44 is that 44 has the new flag.
I was trying to avoid merge conflicts, and I hope that works okay...

Also I made a PR for the upcoming semmeddb override biothings/bte-server#45. That may have a merge conflict with 44.

@tokebe
Member

tokebe commented Sep 27, 2024

In my debug testing I was able to see that a query going to CTKP had multiple IDs on the node and results were still transformed correctly.

I've also pushed a change to the logging so it's just one line per KP.
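
For anyone who wants to repeat the multiple-ID check outside a debugger, a hedged sketch: put several IDs on one QNode, then count the returned knowledge-graph edges per input ID. The helper `edges_per_input_id` is hypothetical, and the `response` below is a made-up stub with placeholder node and edge IDs, not real KP output. Matching is by exact CURIE, so IDs that a KP rewrites to an equivalent identifier would be missed.

```python
from typing import Dict, List


def edges_per_input_id(response: Dict, input_ids: List[str]) -> Dict[str, List[str]]:
    """Map each input CURIE to the knowledge-graph edge IDs that touch it."""
    message = response.get("message") or {}
    kg_edges = (message.get("knowledge_graph") or {}).get("edges") or {}
    found: Dict[str, List[str]] = {curie: [] for curie in input_ids}
    for edge_id, edge in kg_edges.items():
        for curie in input_ids:
            if curie in (edge["subject"], edge["object"]):
                found[curie].append(edge_id)
    return found


input_ids = ["MONDO:0015243", "MONDO:0006014"]

# Query with both IDs on the same QNode.
query = {"message": {"query_graph": {
    "nodes": {
        "n0": {"categories": ["biolink:SmallMolecule"]},
        "n1": {"ids": input_ids, "categories": ["biolink:Disease"]},
    },
    "edges": {"e01": {"subject": "n0", "object": "n1",
                      "predicates": ["biolink:in_clinical_trials_for"]}},
}}}

# Placeholder response for illustration only.
response = {"message": {"knowledge_graph": {"edges": {
    "edge-a": {"subject": "EXAMPLE:chem1", "object": "MONDO:0015243",
               "predicate": "biolink:in_clinical_trials_for"},
    "edge-b": {"subject": "EXAMPLE:chem2", "object": "MONDO:0006014",
               "predicate": "biolink:in_clinical_trials_for"},
}}}}
coverage = edges_per_input_id(response, input_ids)
```

An input ID whose list comes back empty is the case worth investigating.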

@gglusman

gglusman commented Oct 4, 2024

  • @gglusman Could this edge-attribute be renamed to something direction-agnostic like drug_boxed_warning or intervention_boxed_warning? It is also confusing because subject in clinical trials research can mean multiple things...

Absolutely! I just released version 2.4.0 of CTKP, now using intervention_boxed_warning.

@tokebe tokebe added On Test Related changes are deployed to Test server and removed On CI -> Test labels Oct 28, 2024
@colleenXu
Collaborator

Current situation:

  • we use overrides to remove the other clinical-trial-data-based operations. This is so their removal can be deployed in tandem with adding Multiomics CTKP to BTE's use.
  • For now, we're going to keep those overrides (no hot-fixes) AND we aren't going to merge the yaml changes into master
    • wait until we know more about the next phase starting 12/2. Is Multiomics CTKP still around and under active development? Does Translator still want us to use it exclusively for clinical trials info?
