Supporting Mapping QC workflow #333

Open
3 tasks
matentzn opened this issue Jan 31, 2023 · 0 comments


matentzn commented Jan 31, 2023

The mapping QC workflow is about reviewing the existing mappings on an ongoing basis. The idea is to review the bottom N clusters once per month and thereby implement an ongoing cycle of continuously improving mappings.

Note that no mappings are generated by this workflow. Mapping generation is covered by another issue.

Workflow:

  • Input Ontology O
  • Input M: existing mappings separated into two levels of confidence
    • Reviewed: confidence 0.99
    • Not reviewed: confidence 0.95
    • Key: No new mappings are added
  • PT = sssom-py:ptable(M)
  • {results.json, cluster-X.png, cluster-X.md} = boomer(PT, O)
  • {BOTTOM_10_CLUSTERS, LEAST_PROBABLE_MAPPINGS} = oak:boomerang(results.json, N)
  • GitHub Action: make issues for BOTTOM_10_CLUSTERS, including cluster-X.png and cluster-X.md
  • The reviewer now checks each cluster and adds a semapv:MappingReview justification, which is curated separately from the existing mapping. If need be, the existing mapping will be changed as well. These reviews will be used to generate confidence scores for input M. There should never be more than 10 issues open. Ideally we can somehow recognise that an issue already exists for a given cluster (by parsing its title for the hashcode boomer provides).
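The selection and de-duplication steps above could be sketched roughly as follows. This is only an illustration: the record fields `id` and `probability`, the issue title convention `Review cluster <hashcode>`, and both function names are assumptions, not the actual boomer output or OAK API.

```python
import re

# Hypothetical record shape: {"id": <hashcode from boomer>, "probability": <float>}.
# The real results.json may be structured differently.

def bottom_n_clusters(clusters, n=10):
    """Return the n clusters with the lowest probability, lowest first."""
    return sorted(clusters, key=lambda c: c["probability"])[:n]


def clusters_needing_issues(clusters, open_issue_titles, n=10):
    """Pick bottom-n clusters that have no open issue yet.

    Never returns more clusters than would keep the number of open
    review issues at or below n.
    """
    # Assumed title convention: "Review cluster <hex hashcode>".
    tracked = set()
    for title in open_issue_titles:
        match = re.search(r"cluster ([0-9a-f]+)", title)
        if match:
            tracked.add(match.group(1))

    budget = max(0, n - len(tracked))
    fresh = [c for c in bottom_n_clusters(clusters, n) if c["id"] not in tracked]
    return fresh[:budget]
```

A GitHub Action could call `clusters_needing_issues` with the titles of the currently open review issues and create one issue per returned cluster.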


New boomer requirements

  • Output report results.json contains probability scores that enable us to select cliques which should be reviewed.
  • results.json should conform to the new OAK cluster data model
  • cluster-X.md files should be produced per clique rather than as one huge file, and should ideally already contain the image tag, with the image assumed to be in the same directory (not sure how this will work when posting a GitHub issue, though; maybe you know how this could be automated)
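As a rough illustration of the per-clique Markdown requirement, the sketch below renders one file body per clique with the image tag already embedded. The function name, the table layout, and the `image_base` parameter are hypothetical. Passing a URL prefix where the PNGs are published is one possible way to make the image resolve inside a GitHub issue, but that is an assumption and not something boomer is known to support.

```python
def render_clique_markdown(clique_id, mappings, image_base="."):
    """Build the Markdown body for a single clique.

    mappings is a list of (subject, predicate, object) tuples.
    image_base is a directory or URL prefix where cluster-<id>.png lives
    (hypothetical parameter; defaults to the same directory).
    """
    lines = [
        f"# Cluster {clique_id}",
        "",
        f"![cluster {clique_id}]({image_base}/cluster-{clique_id}.png)",
        "",
        "| subject | predicate | object |",
        "| --- | --- | --- |",
    ]
    for subject, predicate, obj in mappings:
        lines.append(f"| {subject} | {predicate} | {obj} |")
    return "\n".join(lines) + "\n"
```

The returned string could be written to `cluster-<id>.md` or used directly as an issue body.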

Comments

  • "joint posterior prob of most likely of clique / prob of next most likely - how interesting is this cluster?" @cmungall
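One reading of this suggestion, sketched under the assumption that each clique comes with a list of joint posterior probabilities for its candidate solutions (the function name and input shape are hypothetical):

```python
def interest_ratio(posteriors):
    """Ratio of the best to the second-best joint posterior probability.

    A ratio close to 1 means the top two solutions are nearly tied,
    which may flag the clique as worth a closer look.
    """
    if len(posteriors) < 2:
        return float("inf")  # only one candidate, nothing competes with it
    ranked = sorted(posteriors, reverse=True)
    best, second = ranked[0], ranked[1]
    if second == 0:
        return float("inf")
    return best / second
```

Sorting cliques by this ratio in ascending order would put the most ambiguous ones first.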