Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exact synonym updates - Build 1 #601

Closed
wants to merge 2 commits into from
Closed

Conversation

joeflack4
Copy link
Contributor

@joeflack4 joeflack4 commented Jul 1, 2024

@joeflack4 joeflack4 marked this pull request as draft July 1, 2024 18:22
@joeflack4 joeflack4 mentioned this pull request Jul 1, 2024
9 tasks
@joeflack4 joeflack4 self-assigned this Jul 1, 2024
@joeflack4 joeflack4 added the build Mostly for build PRs: when changes only to data files post `build-mondo-ingest`; no code changes label Jul 1, 2024
@joeflack4 joeflack4 force-pushed the exact-syn-updates-build branch 4 times, most recently from 27713f0 to a8d3777 Compare July 3, 2024 03:20
docs/metrics/doid.md Show resolved Hide resolved
Copy link
Contributor Author

@joeflack4 joeflack4 Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

@matentzn @twhetzel I haven't reviewed this yet, but I believe all our outputs are here, and we can review this now if you like even though the build is failing, because it is failing on the prepare_release step at the very end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@joeflack4 joeflack4 marked this pull request as ready for review July 3, 2024 17:54
- Update: remaps IAO:0000118 (alt label) to oboInOwl:exactSynonym
- Add: exact_syn_from_label.ru: adds oboInOwl:exactSynonym for all rdfs:label on components
- Update: component goals to use new query
Copy link
Member

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better leave branch builds in draft PR mode? not sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Massive change.. interesting to look out for in general, but no indication anything is broken in this pr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better leave branch builds in draft PR mode? not sure.

@twhetzel I know this is another tedious SOP detail. Do you have opinion? I like to undraft to let it be known it's ready for review. In case of accidental merge I know how to fix. I don't mind either way

Copy link
Contributor Author

@joeflack4 joeflack4 Jul 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the changes... you're saying it's not anything to worry about in scope of this PR, so I'll stop here at the analysis. I just checked other recent PRs because I expected to see similar diffs in those ones as well, thinking that this was due to a recent change in the upstream source and nothing to do with the changes in this PR. But this PR is the only one with such large differences.

This PR: many changes

exact-syn-updates-build

git diff --stat origin/exact-syn-updates-build develop src/ontology/external/nord.robot.owl
src/ontology/external/nord.robot.owl | 7866 ++++++++++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 6873 insertions(+), 993 deletions(-)

Other recent PRs: minimal changes

dataload-subclass-sync-switch-edit

git diff --stat origin/dataload-subclass-sync-switch-edit develop src/ontology/external/nord.robot.owl
src/ontology/external/nord.robot.owl | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

data-built-refactor-git-refresh

git diff --stat origin/data-built-refactor-git-refresh develop src/ontology/external/nord.robot.owl
src/ontology/external/nord.robot.owl | 60 ++++++++----------------------------------------------------
1 file changed, 8 insertions(+), 52 deletions(-)

build_2024-07-03_main

git diff --stat origin/build_2024-07-03_main develop src/ontology/external/nord.robot.owl
src/ontology/external/nord.robot.owl | 86 +++++++++++++++---------------------------------------------
1 file changed, 21 insertions(+), 65 deletions(-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for asking. I also prefer to have feature branch builds in Draft mode.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huge change, maybe investigate the pattern of this change. Probably a serialiser thing with blank nodes, just guessing

Copy link
Contributor Author

@joeflack4 joeflack4 Jul 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will take a look soon.
I wonder if / how this might have resulted from the changes in this pr

Copy link
Contributor Author

@joeflack4 joeflack4 Jul 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so @matentzn I don't think this is worrisome at all. This is expected.

 doid-unmapped.owl            |   104 +-
 icd10cm-unmapped.owl         | 95849 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 icd10who-unmapped.owl        | 12544 ++++++++++++++++-
 icd11foundation-unmapped.owl | 70034 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 ncit-unmapped.owl            |     2 +-
 omim-unmapped.owl            |   795 +-
 ordo-unmapped.owl            | 10925 ++++++++++++++-

With the exception of NCIT & DOID (see comment below), I think that the size of these diffs are simply roughly correlated to the amount of unmapped terms.

unmapped.md table

Note that they don't correlate quite exactly. That bugs me a little bit but I'm not going to worry about it.

Mapping progress report

Ontology Tot terms Tot excluded Tot deprecated Tot deprecated unmapped Tot mappable (!excluded, !deprecated) Tot mapped (mappable) Tot unmapped (mappable) % unmapped (mappable)
ICD10WHO 12,542 0 0 0 12,542 18 12,524 99.9%
ICD10CM 95,847 15,452 0 0 80,395 1,163 79,232 98.6%
NCIT 191,123 169,937 5,221 5,199 15,965 3,675 12,290 77.0%
ICD11FOUNDATION 100,382 30,335 6,587 6,587 64,451 0 64,451 100.0%
ORDO 15,402 6,212 1,391 1,165 9,190 9,137 53 0.6%
DOID 14,082 2,656 2,484 2,468 11,424 11,391 33 0.3%
OMIM 29,382 19,278 1,364 1,318 8,741 8,739 2 0.0%

The diffs are as I would expect. These are just newly added synonyms. There are a lot of unmapped terms, so the diff is large.

@matentzn You agree that this is expected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, notice that the diff for NCIT and DOID is very small. I looked, and for some reason, the unmapped/*.owl files for these sources don't have any synonyms or labels (or really any properties in them), even though the components*.owl files do. I created an issue for this:

@joeflack4 joeflack4 marked this pull request as draft July 8, 2024 20:07
@joeflack4 joeflack4 changed the title Exact synonym updates - Build Exact synonym updates - Build 1 - erroneous? Jul 8, 2024
@joeflack4 joeflack4 changed the title Exact synonym updates - Build 1 - erroneous? Exact synonym updates - Build 1 Jul 8, 2024
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: newer build

Since this build initially froze, I had to run it several times. It looks like #609 is the final build that ran. I just pushed it after noticing that my local files in the clone where I ran the build were different than what was here in #601. Note that there are 21 more files changed in that PR than in this one.

I probably just forgot to --force push when finishing that final build. Sorry.

@joeflack4 joeflack4 deleted the exact-syn-updates-build branch August 11, 2024 01:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Mostly for build PRs: when changes only to data files post `build-mondo-ingest`; no code changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants