New Feature: SNOMED::ICD10CM Mapping Support #207

126 changes: 126 additions & 0 deletions docs/parsers.rst
@@ -0,0 +1,126 @@
Parsers
========

Field descriptions
------------------

Taken from:

https://www.nlm.nih.gov/healthit/snomedct/us_edition.html

Download a zip file there, and inside there will be the following PDF, which documents the fields as shown below.

doc_Icd10cmMapReleaseNotes_Current-en-US_US1000124_20210901.pdf

More info:

https://www.nlm.nih.gov/research/umls/mapping_projects/snomedct_to_icd10cm.html

FIELD,DATA_TYPE,PURPOSE,SSSOM Dev Comments
- id,UUID,A 128 bit unsigned integer, uniquely identifying the map record,
- effectiveTime,Time,Specifies the inclusive date at which this change becomes effective.,
- active,Boolean,Specifies whether the member’s state was active (=1) or inactive (=0) from the nominal release date specified by the effectiveTime field.,
- moduleId,SctId,Identifies the member version’s module. Set to a child of 900000000000443000|Module| within the metadata hierarchy.,The only value in the entire set is '5991000124107', which has label 'SNOMED CT to ICD-10-CM rule-based mapping module' (https://www.findacode.com/snomed/5991000124107--snomed-ct-to-icd-10-cm-rule-based-mapping-module.html).
- refSetId,SctId,Set to one of the children of the |Complex map type| concept in the metadata hierarchy.,The only value in the entire set is '6011000124106', which has label 'ICD-10-CM complex map reference set' (https://www.findacode.com/snomed/6011000124106--icd-10-cm-complex-map-reference-set.html).
- referencedComponentId,SctId,The SNOMED CT source concept ID that is the subject of the map record.,
- mapGroup,Integer,An integer identifying a grouping of complex map records which will designate one map target at the time of map rule evaluation. Source concepts that require two map targets for classification will have two sets of map groups.,
- mapPriority,Integer,Within a map group, the mapPriority specifies the order in which complex map records should be evaluated to determine the correct map target.,
- mapRule,String,A machine-readable rule, (evaluating to either ‘true’ or ‘false’ at run-time) that indicates whether this map record should be selected within its map group.,
- mapAdvice,String,Human-readable advice that may be employed by the software vendor to give an end-user advice on selection of the appropriate target code. This includes a) a summary statement of the map rule logic, b) a statement of any limitations of the map record and c) additional classification guidance for the coding professional.,
- mapTarget,String,The target ICD-10 classification code of the map record.,
- correlationId,SctId,A child of |Map correlation value| in the metadata hierarchy, identifying the correlation between the SNOMED CT concept and the target code.,
- mapCategoryId,SctId,Identifies the SNOMED CT concept in the metadata hierarchy which is the MapCategory for the associated map record. This is a subtype of 447634004 |ICD-10 Map Category value|.,
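
The mapGroup, mapPriority and mapRule descriptions above imply an evaluation order: within each map group, records are tried in mapPriority order, and the first record whose rule holds supplies the map target. Below is a minimal sketch of that selection. The `MapRecord` container, `select_targets` and the `rule_holds` callback are hypothetical names made up for illustration; they are not part of this PR or of the SNOMED release, and real mapRule strings would need a proper evaluator.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class MapRecord:
    """Hypothetical container for the columns that drive rule evaluation."""
    map_group: int
    map_priority: int
    map_rule: str
    map_target: str


def select_targets(
    records: Iterable[MapRecord],
    rule_holds: Callable[[str], bool],
) -> Dict[int, Optional[str]]:
    """Return one target per map group: the first record, by priority, whose rule holds."""
    ordered = sorted(records, key=lambda r: (r.map_group, r.map_priority))
    selected: Dict[int, Optional[str]] = {}
    for group, members in groupby(ordered, key=lambda r: r.map_group):
        selected[group] = next(
            (r.map_target for r in members if rule_holds(r.map_rule)), None
        )
    return selected


# The two records for SNOMED:10001005 (bacterial sepsis) shown in this PR's sample output.
records = [
    MapRecord(1, 2, "OTHERWISE TRUE", "A41.9"),
    MapRecord(
        1,
        1,
        "IFA 445518008 | Age at onset of clinical finding (observable entity) | < 29.0 days",
        "P36.9",
    ),
]


def unconditional_only(rule: str) -> bool:
    """Assumption for this sketch: no patient context is known, so IFA rules never hold."""
    return rule in ("TRUE", "OTHERWISE TRUE")


targets_without_context = select_targets(records, unconditional_only)
targets_if_all_rules_hold = select_targets(records, lambda rule: True)
```

With no patient context the fallback record wins, and when every rule is treated as true the priority 1 record wins. This is one reason a single SNOMED concept can produce more than one candidate row in the SSSOM output.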

Mappings: SSSOM::SNOMED_Complex_Map
-----------------------------------
Copy/paste of the state of the mappings as of 2022/03/04:

'subject_id': f'SNOMED:{row["referencedComponentId"]}',
'subject_label': row['referencedComponentName'],

# 'predicate_id': 'skos:exactMatch',
# - mapCategoryId: can use for mapping predicate? Or is correlationId more suitable?
# or is there a SKOS predicate I can map to in case where predicate is unknown? I think most of these
# mappings are attempts at exact matches, but I can't be sure (at least not without using these fields
# to determine: mapGroup, mapPriority, mapRule, mapAdvice).
# mapCategoryId,mapCategoryName: Only these in set: 447637006 "MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED",
# 447638001 "MAP SOURCE CONCEPT CANNOT BE CLASSIFIED WITH AVAILABLE DATA",
# 447639009 "MAP OF SOURCE CONCEPT IS CONTEXT DEPENDENT"
# 'predicate_modifier': '???',
# Description: Modifier for negating the predicate. See https://github.com/mapping-commons/sssom/issues/40
# Range: PredicateModifierEnum: (joe: only lists 'Not' as an option)
# Example: Not Negates the predicate, see documentation of predicate_modifier_enum
# - predicate_id <- mapAdvice?
# - predicate_modifier <- mapAdvice?
# mapAdvice: Pipe-delimited qualifiers. Ex:
# "ALWAYS Q71.30 | CONSIDER LATERALITY SPECIFICATION"
# "IF LISSENCEPHALY TYPE 3 FAMILIAL FETAL AKINESIA SEQUENCE SYNDROME CHOOSE Q04.3 | MAP OF SOURCE CONCEPT
# IS CONTEXT DEPENDENT"
# "MAP SOURCE CONCEPT CANNOT BE CLASSIFIED WITH AVAILABLE DATA"
'predicate_id': f'SNOMED:{row["mapCategoryId"]}',
'predicate_label': row['mapCategoryName'],
Collaborator

Can you share an example of what the resulting SSSOM looks like, maybe 10 rows in markdown here in this pull request?

Collaborator Author

Yep, sure thing.

Collaborator Author
@joeflack4, Jul 14, 2022

Table

I don't know if this table appears very strange / unreadable to others. It wasn't the case before, but as I am editing this today (2023/11/14) it is strange now.

| subject_id | predicate_id | object_id | match_type | subject_label | predicate_label | object_label | mapping_date | other |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SNOMED:10000006 | SNOMED:447637006 | ICD10CM:R07.9 | Unspecified | Radiating chest pain | MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED | Chest pain, unspecified | 2021-09-01 | id=4c4d2755-e65f-52dc-8353-ecf437495eb8\|active=1\|moduleId=5991000124107\|refsetId=6011000124106\|mapGroup=1\|mapPriority=1\|mapRule=TRUE\|mapAdvice=ALWAYS R07.9 |
| SNOMED:10001005 | SNOMED:447639009 | ICD10CM:P36.9 | Unspecified | Bacterial sepsis | MAP OF SOURCE CONCEPT IS CONTEXT DEPENDENT | Bacterial sepsis of newborn, unspecified | 2021-09-01 | id=f7e1ed02-3107-5a6c-9439-21e9879fb746\|active=1\|moduleId=5991000124107\|refsetId=6011000124106\|mapGroup=1\|mapPriority=1\|mapRule=IFA 445518008 \| Age at onset of clinical finding (observable entity) \| < 29.0 days\|mapAdvice=IF AGE AT ONSET OF CLINICAL FINDING BEFORE 29.0 DAYS CHOOSE P36.9 \| CONSIDER ADDITIONAL CODE TO IDENTIFY SPECIFIC CONDITION OR DISEASE \| DESCENDANTS NOT EXHAUSTIVELY MAPPED \| MAP OF SOURCE CONCEPT IS CONTEXT DEPENDENT |
| SNOMED:10001005 | SNOMED:447637006 | ICD10CM:A41.9 | Unspecified | Bacterial sepsis | MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED | Sepsis, unspecified organism | 2021-09-01 | id=907aa2b0-a832-5254-acdf-6f366e406d2b\|active=1\|moduleId=5991000124107\|refsetId=6011000124106\|mapGroup=1\|mapPriority=2\|mapRule=OTHERWISE TRUE\|mapAdvice=ALWAYS A41.9 \| CONSIDER ADDITIONAL CODE TO IDENTIFY SPECIFIC CONDITION OR DISEASE \| DESCENDANTS NOT EXHAUSTIVELY MAPPED |
| SNOMED:10007009 | SNOMED:447637006 | ICD10CM:Q04.8 | Unspecified | Coffin-Siris syndrome | MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED | Other specified congenital malformations of brain | 2021-09-01 | id=0b918513-be22-5059-beb6-e8348dfc631f\|active=1\|moduleId=5991000124107\|refsetId=6011000124106\|mapGroup=1\|mapPriority=1\|mapRule=TRUE\|mapAdvice=ALWAYS Q04.8 |
| SNOMED:1001000119102 | SNOMED:447637006 | ICD10CM:I26.99 | Unspecified | Pulmonary embolism with pulmonary infarction | MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED | Other pulmonary embolism without acute cor pulmonale | 2021-09-01 | id=7f3f0a95-0441-57ba-a58f-03d2f2736c60\|active=1\|moduleId=5991000124107\|refsetId=6011000124106\|mapGroup=1\|mapPriority=1\|mapRule=TRUE\|mapAdvice=ALWAYS I26.99 |
| SNOMED:10017004 | SNOMED:447637006 | ICD10CM:K03.0 | Unspecified | Occlusal wear of teeth | MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED | Excessive attrition of teeth | 2021-09-01 | id=3fa74c85-56d0-5819-97e8-80c8d4c053c0\|active=1\|moduleId=5991000124107\|refsetId=6011000124106\|mapGroup=1\|mapPriority=1\|mapRule=TRUE\|mapAdvice=ALWAYS K03.0 |
| SNOMED:100191000119105 | SNOMED:447637006 | ICD10CM:N42.89 | Unspecified | Asymmetry of prostate | MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED | Other specified disorders of prostate | 2021-09-01 | id=51537076-a3bf-59f8-aa5f-f571dfcf32cb\|active=1\|moduleId=5991000124107\|refsetId=6011000124106\|mapGroup=1\|mapPriority=1\|mapRule=TRUE\|mapAdvice=ALWAYS N42.89 |
| SNOMED:100211000119106 | SNOMED:447637006 | ICD10CM:M62.830 | Unspecified | Muscle spasm of thoracic back | MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED | Muscle spasm of back | 2021-09-01 | id=a0263a65-4056-5cf9-8caa-669afce20189\|active=1\|moduleId=5991000124107\|refsetId=6011000124106\|mapGroup=1\|mapPriority=1\|mapRule=TRUE\|mapAdvice=ALWAYS M62.830 |
| SNOMED:1002229008 | SNOMED:447637006 | ICD10CM:L75.2 | Unspecified | Apocrine miliaria of areola | MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED | Apocrine miliaria | 2021-09-01 | id=65fa44e5-272d-54e5-b787-cb977fc24720\|active=1\|moduleId=5991000124107\|refsetId=6011000124106\|mapGroup=1\|mapPriority=1\|mapRule=TRUE\|mapAdvice=ALWAYS L75.2 |
| SNOMED:1002253002 | SNOMED:447637006 | ICD10CM:Z28.3 | Unspecified | Immunization series incomplete | MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED | Underimmunization status | 2021-09-01 | id=c2ab86f4-29c5-5ea3-8818-1e389d9c59a6\|active=1\|moduleId=5991000124107\|refsetId=6011000124106\|mapGroup=1\|mapPriority=1\|mapRule=TRUE\|mapAdvice=ALWAYS Z28.3 \| CONSIDER ADDITIONAL CODE TO IDENTIFY SPECIFIC CONDITION OR DISEASE |

Columns that are empty in every row above: subject_category, predicate_modifier, object_category, author_id, author_label, reviewer_id, reviewer_label, creator_id, creator_label, license, subject_source, subject_source_version, object_source, object_source_version, mapping_provider, mapping_cardinality, mapping_tool, mapping_tool_version, confidence, subject_match_field, object_match_field, match_string, subject_preprocessing, object_preprocessing, match_term_type, semantic_similarity_score, semantic_similarity_measure, see_also, comment.

Collaborator

  • match_type does not exist anymore
  • Why all these empty columns?
  • predicate_id is not legal - I looked this SNOMED rel up but could not understand what it means at all. Can you explain how "Map source concept is properly classified" is a mapping relation? how does it map to skos?
  • You should add a test that runs validate_file on a small sample of the output, I think.

Collaborator Author
@joeflack4, Jul 15, 2022

  • match_type: Oh, this is an old copy. I haven't re-run it in a while. The code has since been updated to mapping_justification.
  • Empty columns: I haven't determined how to populate these SSSOM fields. For many of them, there may not be any information available. I have enumerated the SSSOM Mapping data model as of 2022/30 in my code comments (link). If you click that link and check it out, you'll see that I have some thoughts/suggestions/questions as to what/how to populate. Any feedback/suggestions to those questions/comments are appreciated.
  • predicate_id: I opened a review comment in the first commit about populating this field from SNOMED's mapCategoryId and mapCategoryName fields. They don't use too many mapping predicates, unfortunately. In that linked review comment, I stated that there were only 3 values in the field, but the ICD10CM mapping dataset only includes 1: MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED (447637006). You might be able to find more information in the "codebook" that SNOMED has published on these mappings. That codebook is named doc_Icd10cmMapReleaseNotes_Current-en-US_US1000124_20210901.pdf which I have zipped up for you (w/ some other related files) here:
    SNOMED ICD10CM map technical guide etc.zip
  • test: Sure thing, I'll add one.

Collaborator Author

@joeflack4 TODO: Note to self: Add a test that does validate_file on the output.


'object_id': f'ICD10CM:{row["mapTarget"]}',
'object_label': row['mapTargetName'],

# match_type <- mapRule?
# ex: TRUE: when "ALWAYS <code>" is in pipe-delimited list in mapAdvice, this always shows TRUE. Does this
# mean I could use skos:exactMatch in these cases?
# match_type <- correlationId?: This may look redundant, but I want to be explicit. In officially downloaded
# SNOMED mappings, all of them had correlationId of 447561005, which also happens to be 'unspecified'.
# If correlationId is indeed more appropriate for predicate_id, then I don't think there is a representative
# field for 'match_type'.
'match_type': MatchTypeEnum('Unspecified') if row['correlationId'] == match_type_snomed_unspecified_id \
else MatchTypeEnum('Unspecified'),

'mapping_date': date_parser.parse(str(row['effectiveTime'])).date(),
'other': '|'.join([f'{k}={str(row[k])}' for k in [
'id',
'active',
'moduleId',
'refsetId',
'mapGroup',
'mapPriority',
'mapRule',
'mapAdvice',
]]),

# More fields (https://mapping-commons.github.io/sssom/Mapping/):
# - subject_category: absent
# - author_id: can this be "SNOMED"?
# - author_label: can this be "SNOMED"?
# - reviewer_id: can this be "SNOMED"?
# - reviewer_label: can this be "SNOMED"?
# - creator_id: can this be "SNOMED"?
# - creator_label: can this be "SNOMED"?
# - license: Is this something that can be determined?
# - subject_source: URL of some official page for SNOMED version used?
# - subject_source_version: Is this knowable?
# - objectCategory <= mapRule?
# mapRule: ex: TRUE: when "ALWAYS <code>" is in pipe-delimited list in mapAdvice, this always shows TRUE.
# Does this mean I could use skos:exactMatch in these cases?
# object_category:
# objectCategory:
# Description: The conceptual category to which the subject belongs to. This can be a string denoting
# the category or a term from a controlled vocabulary.
# Example: UBERON:0001062 (The CURIE of the Uberon term for "anatomical entity".)
# - object_source: URL of some official page for ICD10CM version used?
# - object_source_version: would this be "10CM" as in "ICD10CM"? Or something else? Or nothing?
# - mapping_provider: can this be "SNOMED"?
# - mapping_cardinality: Could I determine 1:1 or 1:many or many:1 based on:
# mapGroup, mapPriority, mapRule, mapAdvice?
# - match_term_type: What is this?
# - see_also: Should this be a URL to the SNOMED term?
# - comment: Description: Free text field containing either curator notes or text generated by tool providing
# additional informative information.


SNOMED mapping related codes
----------------------------
match_type_snomed_unspecified_id = 447561005
https://www.findacode.com/snomed/447561005--snomed-ct-source-code-to-target-map-correlation-not-specified.html

Additional resources
--------------------
About SNOMED simple and complex refsets:
https://github.com/HOT-Ecosystem/tccm/blob/master/docs/SNOMED/MapRefsets.md
184 changes: 183 additions & 1 deletion sssom/parsers.py
@@ -5,6 +5,7 @@
import re
import typing
from collections import Counter
from dateutil import parser as date_parser
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional, TextIO, Tuple, Union, cast
from urllib.request import urlopen
@@ -19,9 +20,9 @@
from linkml_runtime.loaders.json_loader import JSONLoader
from rdflib import Graph, URIRef

# from .sssom_datamodel import Mapping, MappingSet
from sssom_schema import Mapping, MappingSet


from sssom.constants import (
CONFIDENCE,
CURIE_MAP,
@@ -261,6 +262,28 @@ def parse_obographs_json(
)


def parse_snomed_complex_map_tsv(
file_path: str,
prefix_map: Optional[Dict[str, str]] = None,
meta: Optional[Dict[str, str]] = None,
filter_by_confident_mappings: bool = True,
Collaborator Author

Based on your request to filter only mapping rows that contain 'ALWAYS <code>', and possible future flexibility to fine tune the mappings, I added a param here and just set it to True.

) -> MappingSetDataFrame:
"""Parse a SNOMED ICD10CM complex map TSV file and translate it into a MappingSetDataFrame.

:param file_path: The path to the source file
:param prefix_map: an optional prefix map
:param meta: an optional dictionary of metadata elements
:param filter_by_confident_mappings: If True, only include mapping rows whose `mapAdvice` field contains an
'ALWAYS <code>' pattern.
:return: A SSSOM MappingSetDataFrame
"""
raise_for_bad_path(file_path)
df = read_pandas(file_path)
df2 = from_snomed_complex_map_tsv(
df, prefix_map=prefix_map, meta=meta, filter_by_confident_mappings=filter_by_confident_mappings)
return df2


def _get_prefix_map_and_metadata(
prefix_map: Optional[PrefixMap] = None, meta: Optional[MetadataType] = None
) -> Metadata:
@@ -666,6 +689,163 @@ def from_obographs(
return to_mapping_set_dataframe(mdoc)


def from_snomed_complex_map_tsv(
df: pd.DataFrame,
prefix_map: Optional[PrefixMap] = None,
meta: Optional[MetadataType] = None,
filter_by_confident_mappings: bool = True,
) -> MappingSetDataFrame:
"""Convert a snomed_icd10cm_map dataframe to a MappingSetDataFrame.

:param df: A mappings dataframe
:param prefix_map: A prefix map
:param meta: A metadata dictionary
:param filter_by_confident_mappings: If True, only include mapping rows whose `mapAdvice` field contains an
'ALWAYS <code>' pattern.
:return: MappingSetDataFrame

# Field descriptions
# - Taken from: doc_Icd10cmMapReleaseNotes_Current-en-US_US1000124_20210901.pdf
FIELD,DATA_TYPE,PURPOSE,Joe's comments
- id,UUID,A 128 bit unsigned integer, uniquely identifying the map record,
- effectiveTime,Time,Specifies the inclusive date at which this change becomes effective.,
- active,Boolean,Specifies whether the member’s state was active (=1) or inactive (=0) from the nominal release date
specified by the effectiveTime field.,
- moduleId,SctId,Identifies the member version’s module. Set to a child of 900000000000443000|Module| within the
metadata hierarchy.,The only value in the entire set is '5991000124107', which has label 'SNOMED CT to ICD-10-CM
rule-based mapping module' (
https://www.findacode.com/snomed/5991000124107--snomed-ct-to-icd-10-cm-rule-based-mapping-module.html).
- refSetId,SctId,Set to one of the children of the |Complex map type| concept in the metadata hierarchy.,The only
value in the entire set is '6011000124106', which has label 'ICD-10-CM complex map reference set' (
https://www.findacode.com/snomed/6011000124106--icd-10-cm-complex-map-reference-set.html).
- referencedComponentId,SctId,The SNOMED CT source concept ID that is the subject of the map record.,
- mapGroup,Integer,An integer identifying a grouping of complex map records which will designate one map target at
the time of map rule evaluation. Source concepts that require two map targets for classification will have two sets
of map groups.,
- mapPriority,Integer,Within a map group, the mapPriority specifies the order in which complex map records should be
evaluated to determine the correct map target.,
- mapRule,String,A machine-readable rule, (evaluating to either ‘true’ or ‘false’ at run-time) that indicates
whether this map record should be selected within its map group.,
- mapAdvice,String,Human-readable advice that may be employed by the software vendor to give an end-user advice on
selection of the appropriate target code. This includes a) a summary statement of the map rule logic, b) a statement
of any limitations of the map record and c) additional classification guidance for the coding professional.,
- mapTarget,String,The target ICD-10 classification code of the map record.,
- correlationId,SctId,A child of |Map correlation value| in the metadata hierarchy, identifying the correlation
between the SNOMED CT concept and the target code.,
- mapCategoryId,SctId,Identifies the SNOMED CT concept in the metadata hierarchy which is the MapCategory for the
associated map record. This is a subtype of 447634004 |ICD-10 Map Category value|.,
"""
# Local variables
# https://www.findacode.com/snomed/447561005--snomed-ct-source-code-to-target-map-correlation-not-specified.html
mapping_justification_snomed_unspecified_id = 447561005
# - Note: joeflack4: I used this info as a reference for this pattern.
# https://www.medicalbillingandcoding.org/icd-10-cm/#:~:text=ICD%2D10%2DCM%20is%20a,decimal%20point%20and%20the%20subcategory.
# Raw strings keep '\.' and '\?' from being read as (invalid) Python string escapes.
always_confidence_pattern = r'ALWAYS [A-Z][0-9]{1,2}\.[0-9A-Z]{1,4}'
always_confidence_antipattern = always_confidence_pattern + r'\?'
Collaborator Author

I found edge cases where 'ALWAYS <code>' appears but is followed by a question mark, indicating that it is not 100% confident. Example:

ALWAYS T50.905? | CONSIDER ADDITIONAL CODE TO IDENTIFY SPECIFIC CONDITION OR DISEASE | EPISODE OF CARE INFORMATION NEEDED

prefix_map = _ensure_prefix_map(prefix_map)
ms = _init_mapping_set(meta)

# Filtering
if filter_by_confident_mappings:
Collaborator Author

Here's the filtering you requested.

It turns out that I was wrong earlier when I thought that if mapAdvice contained 'ALWAYS <code>', mapRule would always be TRUE. If this was the case, filtering by mapRule would've been much easier, but unfortunately it did not turn out to be the case.

I was thinking about a simple .str.contains('ALWAYS'), but it is possible that the word 'ALWAYS' could appear in cases other than 'ALWAYS <code>', so I decided to go with the stricter approach of a regex.

I created my own regex. But now I just realized that ICD10 is such a big thing that I should've just googled a pre-baked regex. Here's one I found: https://www.johndcook.com/blog/2019/05/05/regex_icd_codes/. If you'd like me to use that or another regex instead, let me know.

I checked to see how many rows were removed by this filter, and it was about 50%.
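
To make the filter's behaviour concrete, here is a minimal sketch that applies the same pattern and antipattern to a few mapAdvice strings quoted in this PR. It uses only the standard `re` module instead of pandas so that it is self-contained. `is_confident` is a hypothetical helper name, and treating non-string values as not confident is an assumption meant to mirror `na=False`.

```python
import re
from typing import List, Optional

# Same expressions as in the parser, written as raw strings.
ALWAYS_PATTERN = re.compile(r"ALWAYS [A-Z][0-9]{1,2}\.[0-9A-Z]{1,4}")
ALWAYS_ANTIPATTERN = re.compile(ALWAYS_PATTERN.pattern + r"\?")


def is_confident(map_advice: Optional[str]) -> bool:
    """True when the advice contains 'ALWAYS <code>' and never 'ALWAYS <code>?'."""
    if not isinstance(map_advice, str):
        return False  # missing advice is treated as not confident
    has_always = ALWAYS_PATTERN.search(map_advice) is not None
    has_doubt = ALWAYS_ANTIPATTERN.search(map_advice) is not None
    return has_always and not has_doubt


samples: List[Optional[str]] = [
    "ALWAYS R07.9",
    "ALWAYS A41.9 | CONSIDER ADDITIONAL CODE TO IDENTIFY SPECIFIC CONDITION OR DISEASE",
    "ALWAYS T50.905? | CONSIDER ADDITIONAL CODE TO IDENTIFY SPECIFIC CONDITION OR DISEASE"
    " | EPISODE OF CARE INFORMATION NEEDED",
    "IF AGE AT ONSET OF CLINICAL FINDING BEFORE 29.0 DAYS CHOOSE P36.9"
    " | MAP OF SOURCE CONCEPT IS CONTEXT DEPENDENT",
    "MAP SOURCE CONCEPT CANNOT BE CLASSIFIED WITH AVAILABLE DATA",
    None,
]
kept = [advice for advice in samples if is_confident(advice)]
```

Only the first two samples are kept. One property of the pattern worth noting: it requires a decimal point, so an 'ALWAYS' advice whose target code has no decimal part would be filtered out.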

df = df[
(df['mapAdvice'].str.contains(always_confidence_pattern, regex=True, na=False)) &
(~df['mapAdvice'].str.contains(always_confidence_antipattern, regex=True, na=False))]

# Map mappings
mlist: List[Mapping] = []
for _, row in df.iterrows():
mdict = {
'subject_id': f'SNOMED:{row["referencedComponentId"]}',
'subject_label': row['referencedComponentName'],

# 'predicate_id': 'skos:exactMatch',
# - mapCategoryId: can use for mapping predicate? Or is correlationId more suitable?
# or is there a SKOS predicate I can map to in case where predicate is unknown? I think most of these
# mappings are attempts at exact matches, but I can't be sure (at least not without using these fields
# to determine: mapGroup, mapPriority, mapRule, mapAdvice).
# mapCategoryId,mapCategoryName: Only these in set: 447637006 "MAP SOURCE CONCEPT IS PROPERLY CLASSIFIED",
Collaborator Author

mapCategoryId and mapCategoryName

I'm not sure if there's any useful information here that could be put anywhere other than the other field. There are only 3 values that show up in the entire mapping dataset for both mapCategoryId and mapCategoryName, and I listed them out in the comment here.

# 447638001 "MAP SOURCE CONCEPT CANNOT BE CLASSIFIED WITH AVAILABLE DATA",
# 447639009 "MAP OF SOURCE CONCEPT IS CONTEXT DEPENDENT"
# 'predicate_modifier': '???',
# Description: Modifier for negating the predicate. See https://github.com/mapping-commons/sssom/issues/40
# Range: PredicateModifierEnum: (joe: only lists 'Not' as an option)
# Example: Not Negates the predicate, see documentation of predicate_modifier_enum
# - predicate_id <- mapAdvice?
# - predicate_modifier <- mapAdvice?
# mapAdvice: Pipe-delimited qualifiers. Ex:
# "ALWAYS Q71.30 | CONSIDER LATERALITY SPECIFICATION"
# "IF LISSENCEPHALY TYPE 3 FAMILIAL FETAL AKINESIA SEQUENCE SYNDROME CHOOSE Q04.3 | MAP OF SOURCE CONCEPT
# IS CONTEXT DEPENDENT"
# "MAP SOURCE CONCEPT CANNOT BE CLASSIFIED WITH AVAILABLE DATA"
'predicate_id': f'SNOMED:{row["mapCategoryId"]}',
Collaborator Author

As I mentioned elsewhere, I'm not 100% sure if predicate_* is best taken from mapCategory*. It's possible that mapRule and mapAdvice could be better. Would like to defer to others for judgement on this.

Collaborator

I think a much deeper understanding of these is needed. Ask in the call later.

'predicate_label': row['mapCategoryName'],

'object_id': f'ICD10CM:{row["mapTarget"]}',
'object_label': row['mapTargetName'],

# mapping_justification <- mapRule?
# ex: TRUE: when "ALWAYS <code>" is in pipe-delimited list in mapAdvice, this always shows TRUE. Does this
# mean I could use skos:exactMatch in these cases?
# mapping_justification <- correlationId?: This may look redundant, but I want to be explicit. In officially downloaded
# SNOMED mappings, all of them had correlationId of 447561005, which also happens to be 'unspecified'.
# If correlationId is indeed more appropriate for predicate_id, then I don't think there is a representative
# field for 'mapping_justification'.
# TODO: How to properly get mapping_justification?
Collaborator Author
@joeflack4, Jul 13, 2022

Issue 3: solved

Not sure how to use sssom_schema.slots.mapping_justification here.
edit: Harshad recommended I use SEMAPV

Issue 4

It might take a significant amount of time (maybe) to map all the SNOMED mapping fields to SSSOM mapping fields. But I should probably look into that again.


@matentzn I recommend we merge this now while this is rebased and functional, and we can make improvements to this feature later.
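
For reference, the slot definition quoted in the code comment that follows this thread restricts mapping_justification to a fixed set of `semapv:` CURIEs. Below is a minimal sketch that checks candidate values against that quoted pattern. `is_valid_justification` is a hypothetical helper for illustration, and the pattern is copied from the comment in this diff, not read from sssom_schema at run time.

```python
import re

# Pattern copied from the slot definition quoted in the code comment in this diff.
MAPPING_JUSTIFICATION_PATTERN = re.compile(
    r"^semapv:(MappingReview|ManualMappingCuration|LogicalReasoning|LexicalMatching"
    r"|CompositeMatching|UnspecifiedMatching|SemanticSimilarityThresholdMatching"
    r"|LexicalSimilarityThresholdMatching|MappingChaining)$"
)


def is_valid_justification(value: str) -> bool:
    """Return True when the value matches the quoted slot pattern."""
    return MAPPING_JUSTIFICATION_PATTERN.match(value) is not None


bare_value_ok = is_valid_justification("Unspecified")
semapv_value_ok = is_valid_justification("semapv:UnspecifiedMatching")
```

Under that pattern the bare string 'Unspecified' is rejected, while 'semapv:UnspecifiedMatching' is accepted.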

# I think I need to use sssom_schema.slots.mapping_justification, but not sure how to use.
# slots.mapping_justification = Slot(uri=SSSOM.mapping_justification, name="mapping_justification", curie=SSSOM.curie('mapping_justification'),
# model_uri=SSSOM.mapping_justification, domain=None, range=Union[str, EntityReference],
# pattern=re.compile(r'^semapv:(MappingReview|ManualMappingCuration|LogicalReasoning|LexicalMatching|CompositeMatching|UnspecifiedMatching|SemanticSimilarityThresholdMatching|LexicalSimilarityThresholdMatching|MappingChaining)$'))
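# Both branches of the earlier conditional produced the same value, so the value is set directly.
# 'semapv:UnspecifiedMatching' is the SEMAPV term accepted by the slot pattern quoted above.
'mapping_justification': 'semapv:UnspecifiedMatching',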
'mapping_date': date_parser.parse(str(row['effectiveTime'])).date(),
'other': '|'.join([f'{k}={str(row[k])}' for k in [
'id',
'active',
'moduleId',
'refsetId',
'mapGroup',
'mapPriority',
'mapRule',
'mapAdvice',
]]),

# More fields (https://mapping-commons.github.io/sssom/Mapping/):
# - subject_category: absent
# - author_id: can this be "SNOMED"?
# - author_label: can this be "SNOMED"?
# - reviewer_id: can this be "SNOMED"?
# - reviewer_label: can this be "SNOMED"?
# - creator_id: can this be "SNOMED"?
# - creator_label: can this be "SNOMED"?
# - license: Is this something that can be determined?
# - subject_source: URL of some official page for SNOMED version used?
# - subject_source_version: Is this knowable?
# - objectCategory <= mapRule?
# mapRule: ex: TRUE: when "ALWAYS <code>" is in pipe-delimited list in mapAdvice, this always shows TRUE.
# Does this mean I could use skos:exactMatch in these cases?
# object_category:
# objectCategory:
# Description: The conceptual category to which the subject belongs to. This can be a string denoting
# the category or a term from a controlled vocabulary.
# Example: UBERON:0001062 (The CURIE of the Uberon term for "anatomical entity".)
# - object_source: URL of some official page for ICD10CM version used?
# - object_source_version: would this be "10CM" as in "ICD10CM"? Or something else? Or nothing?
# - mapping_provider: can this be "SNOMED"?
# - mapping_cardinality: Could I determine 1:1 or 1:many or many:1 based on:
# mapGroup, mapPriority, mapRule, mapAdvice?
# - match_term_type: What is this?
# - see_also: Should this be a URL to the SNOMED term?
# - comment: Description: Free text field containing either curator notes or text generated by tool providing
# additional informative information.
}
mlist.append(_prepare_mapping(Mapping(**mdict)))

ms.mappings = mlist
_set_metadata_in_mapping_set(mapping_set=ms, metadata=meta)
doc = MappingSetDocument(mapping_set=ms, prefix_map=prefix_map)
return to_mapping_set_dataframe(doc)


# All from_* take as an input a python object (data frame, json, etc) and return a MappingSetDataFrame
# All read_* take as an input a file handle and return a MappingSetDataFrame (usually wrapping a from_* method)

@@ -690,6 +870,8 @@ def get_parsing_function(input_format: Optional[str], filename: str) -> Callable
return parse_alignment_xml
elif input_format == "obographs-json":
return parse_obographs_json
elif input_format == "snomed-complex-map-tsv":
return parse_snomed_complex_map_tsv
else:
raise Exception(f"Unknown input format: {input_format}")
