Supporting the Mapping Integration workflow #334

matentzn · 2023-01-31T12:12:36Z

The Mapping Integration workflow (as opposed to the QC workflow, #333) is about effectively integrating new mappings into an ontology while maintaining consistency. The goal is to be able to rapidly slurp up existing mappings (almost) without the need for human review.

Workflow:

Input Ontology O (e.g. Mondo)
Input M:
- Merged set of mappings:
- Internal (existing, verified mappings):
  - Reviewed: 0.99 %
  - Not reviewed 0.95 %
- External (OAK lexmatch, existing mapping sets)
  - Confidence on a case by case basis, configured as part of mapping commons
PT=sssom-py:ptable(M)
{best-guess.sssom.tsv, results.json, |cluster-X.png|, |cluster-X.md|} = boomer(PT, O)
EDIT: I thought we would do a proper human review of questionable clusters here, but maybe we leave this to Supporting Mapping QC workflow #333 instead to make this workflow more scalable
difference.sssom.tsv = sssom-py:diff(M, best-guess.sssom.tsv)
Cursory human review of difference.sssom.tsv (eyeballing); no semapv:MappingReview justification is added. Links from the SSSOM file to the related cluster make it possible to review effectively using a nice image (this could be an app one day).
Rejected mappings from the difference.sssom.tsv should be recorded in a "negative.sssom.tsv" mapping file by the curators
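To make the diff and rejection steps concrete, here is a minimal sketch in plain Python. It is not how sssom-py implements `diff`; it only assumes that a mapping is identified by its `subject_id`, `predicate_id` and `object_id` columns, and every function name below is hypothetical. The IDs in the example are made up.

```python
import csv


def mapping_key(row):
    """Identify a mapping by subject, predicate and object (an assumption)."""
    return (row["subject_id"], row["predicate_id"], row["object_id"])


def read_mappings(text):
    """Parse SSSOM-style TSV text, skipping blank lines and '#' metadata lines."""
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    return list(csv.DictReader(lines, delimiter="\t"))


def diff_mappings(input_rows, best_guess_rows):
    """Return (rows only in the input set, rows only in the best guess)."""
    input_keys = {mapping_key(r) for r in input_rows}
    guess_keys = {mapping_key(r) for r in best_guess_rows}
    only_input = [r for r in input_rows if mapping_key(r) not in guess_keys]
    only_guess = [r for r in best_guess_rows if mapping_key(r) not in input_keys]
    return only_input, only_guess


def drop_rejected(rows, negative_rows):
    """Remove mappings that curators recorded as rejected (negative set)."""
    negative = {mapping_key(r) for r in negative_rows}
    return [r for r in rows if mapping_key(r) not in negative]


# Made-up example data standing in for M and best-guess.sssom.tsv.
HEADER = "subject_id\tpredicate_id\tobject_id\n"
M = read_mappings(
    HEADER + "A:1\tskos:exactMatch\tB:1\nA:2\tskos:exactMatch\tB:2\n"
)
BEST_GUESS = read_mappings(
    HEADER + "A:1\tskos:exactMatch\tB:1\nA:2\tskos:broadMatch\tB:2\n"
)

# Rows that changed between the input and boomer's best guess.
dropped_by_boomer, new_in_best_guess = diff_mappings(M, BEST_GUESS)
```

The rows a curator rejects while eyeballing the difference would go into the negative set, and `drop_rejected` could then filter them out of any later input.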

New boomer requirements

Output best-guess.sssom.tsv should be in SSSOM format (see "accept SSSOM format", #47) and should also include a notion of mapping confidence (I didn't get 100% how cluster and mapping confidence should relate in our meeting, but I think you did) and a link to the associated mapping cluster. If there is other metadata you think can help with the review, you can add it to the comment section.
Most of the stuff in Supporting Mapping QC workflow #333

Comments:

"low prior property mapping will be rejected in a high probability clique" (@cmungall)
boomer does not necessarily create a globally coherent outcome model (@balhoff)

balhoff · 2023-05-03T16:30:30Z

Discussion with @matentzn: implement output of cliques in order of increasing confidence (joint posterior probability of the most likely clique outcome / probability of the next most likely).
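One possible reading of this ordering, sketched in Python: the confidence of a clique is the ratio between the joint posterior of its most likely outcome and that of the runner-up, and cliques are listed with the smallest ratio first so the most ambiguous ones are reviewed first. The data structure (`id`, `posteriors`) and both function names are hypothetical and do not reflect boomer's actual output format.

```python
def clique_confidence(posteriors):
    """Ratio of the best joint posterior to the second best.

    A clique with a single outcome (or a zero-probability runner-up)
    is treated as maximally confident.
    """
    ranked = sorted(posteriors, reverse=True)
    if len(ranked) < 2 or ranked[1] == 0:
        return float("inf")
    return ranked[0] / ranked[1]


def order_cliques(cliques):
    """Sort cliques by increasing confidence (least certain first)."""
    return sorted(cliques, key=lambda c: clique_confidence(c["posteriors"]))


# Made-up cliques with the joint posteriors of their candidate outcomes.
cliques = [
    {"id": "clique-1", "posteriors": [0.90, 0.05]},
    {"id": "clique-2", "posteriors": [0.50, 0.40]},
    {"id": "clique-3", "posteriors": [1.0]},
]
ordered_ids = [c["id"] for c in order_cliques(cliques)]
```

Here `clique-2` comes first because its two best outcomes are nearly tied, which is exactly the kind of cluster a reviewer would want to see early.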

matentzn added this to the Mondo workflow milestone Jan 31, 2023

cmungall added a commit to INCATools/ontology-access-kit that referenced this issue Jan 31, 2023

Adding compare; first pass at INCATools/boomer#334

e8699d7

cmungall added a commit to INCATools/ontology-access-kit that referenced this issue Jan 31, 2023

Adding BoomerEngine.compare, addresses INCATools/boomer#334

b1e4a9b

cmungall mentioned this issue Jan 31, 2023

Tool for processing mapping clusters (boomer reports) INCATools/ontology-access-kit#440

Merged
