This repository has been archived by the owner on Jun 11, 2024. It is now read-only.

Workflow 4: Green/Gamma Implementation of Modules 1-4 & 5-8 #44

Open
karafecho opened this issue Jan 22, 2019 · 6 comments

Comments

@karafecho
Contributor

This issue relates to Green/Gamma's efforts on Workflow 4.

@karafecho
Contributor Author

karafecho commented Feb 14, 2019

Scroll down for updates to plan

Plan for implementation of Workflow 4:

  1. Use functionality four in ICEES to stratify/cluster by Sex2 (Male vs Female) and return phenotypes that demonstrate a significant difference between the strata. The phenotypes will be diagnoses and certain demographic variables. The output list will be passed to ROBOKOP for execution of queries in the form: "disease or phenotypic feature -> gene -> biological process/activity -> chemical substance" and/or "disease or phenotypic feature -> gene -> biological process/activity -> gene -> drug".

Note that other paths are possible and may be attempted.

  2. Use COHD to stratify/cluster by Sex. See COHD UI and a query template plus a specific instance of Workflow 5. Retrieve top 20 diagnoses (based on frequency) for each sub-cohort. The output list will be passed to ROBOKOP for execution of queries in the form: "disease -> gene -> biological process/activity -> chemical substance".

  3. Use Clinical Profiles to identify/create sub-cohorts of males and females with asthma. Retrieve top 20 diagnoses (based on frequency) for each sub-cohort. The output list will be passed to ROBOKOP for execution of queries in the form: "disease -> gene -> biological process/activity -> phenotype".
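The "top 20 diagnoses (based on frequency) for each sub-cohort" step could be sketched roughly as follows. This is only an illustration: the `(sub_cohort, diagnosis)` record shape, the function name `top_diagnoses`, and the sample records are all made up here and do not come from COHD or Clinical Profiles.

```python
from collections import Counter


def top_diagnoses(records, n=20):
    """Return the n most frequent diagnoses for each sub-cohort.

    `records` is an iterable of (sub_cohort, diagnosis) pairs; this input
    shape is a hypothetical stand-in for whatever the real service returns.
    """
    counts = {}
    for sub_cohort, diagnosis in records:
        counts.setdefault(sub_cohort, Counter())[diagnosis] += 1
    return {cohort: counter.most_common(n) for cohort, counter in counts.items()}


# Made-up records, for illustration only.
records = [
    ("female", "asthma"), ("female", "asthma"), ("female", "obesity"),
    ("male", "croup"), ("male", "croup"), ("male", "croup"), ("male", "asthma"),
]

top = top_diagnoses(records, n=2)
for cohort, diagnoses in top.items():
    print(cohort, diagnoses)
```

The per-cohort lists are what would then be handed to ROBOKOP as the start nodes of the queries.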

Note re ICEES: We will need to capture directionality as part of the workflow output. By "directionality", I mean which stratum is "enriched" for a given phenotype (i.e., has a higher percentage of patients with XXX). The chi-square statistic that ICEES provides indicates whether groups or bins differ, but it says nothing about the direction of the difference. Relative risks and odds ratios may suffice.
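As a rough sketch of how directionality could be derived from a 2x2 table, the snippet below computes a relative risk and an odds ratio and reports which stratum is enriched. The function name and return shape are made up for illustration, and this is not something ICEES currently returns. The counts are taken from the ICEES example in section A of this comment. Zero cells are not handled.

```python
def direction_of_effect(a, b, c, d):
    """Summarize direction for a 2x2 table.

    Rows are strata, columns are outcome present / outcome absent:
        stratum 1: a (present), b (absent)
        stratum 2: c (present), d (absent)
    Assumes no cell is zero; a real implementation would need to guard that.
    """
    risk_1 = a / (a + b)
    risk_2 = c / (c + d)
    relative_risk = risk_1 / risk_2
    odds_ratio = (a * d) / (b * c)
    if relative_risk > 1:
        enriched = "stratum 1"
    elif relative_risk < 1:
        enriched = "stratum 2"
    else:
        enriched = "neither"
    return {
        "risk_1": risk_1,
        "risk_2": risk_2,
        "relative_risk": relative_risk,
        "odds_ratio": odds_ratio,
        "enriched": enriched,
    }


# Counts from the ICEES example: stratum 1 is AvgDailyPM2.5Exposure < 3,
# stratum 2 is AvgDailyPM2.5Exposure >= 3, outcome is TotalEDInpatientVisits < 2.
result = direction_of_effect(297, 29, 4776, 1277)
print(result)
```

With these counts both measures are above 1, so the lower-exposure stratum is the one enriched for patients with fewer than 2 visits.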

A. ICEES example query

Input:

Feature variables: AvgDailyPM2.5Exposure < 3, TotalEDInpatientVisits < 2
Version of data: 1.0.0
Table: patient
Year: 2010
Cohort ID: COHORT:22

Output:*

+----------------------------+------------------------------+-------------------------------+---------+
| feature                    | TotalEDInpatientVisits < 2   | TotalEDInpatientVisits >= 2   |         |
+============================+==============================+===============================+=========+
| AvgDailyPM2.5Exposure < 3  | 297 91.10%                   | 29 8.90%                      | 326     |
|                            | 5.85% 4.66%                  | 2.22% 0.45%                   | 5.11%   |
+----------------------------+------------------------------+-------------------------------+---------+
| AvgDailyPM2.5Exposure >= 3 | 4776 78.90%                  | 1277 21.10%                   | 6053    |
|                            | 94.15% 74.87%                | 97.78% 20.02%                 | 94.89%  |
+----------------------------+------------------------------+-------------------------------+---------+
|                            | 5073                         | 1306                          | 6379    |
|                            | 79.53%                       | 20.47%                        | 100.00% |
+----------------------------+------------------------------+-------------------------------+---------+
+-------------+---------------+
| p_value     | chi_squared   |
+=============+===============+
| 3.16593e-06 | 28.2841       |
+-------------+---------------+
*AvgDailyPM2.5Exposure <3 range: 1.58, 9.63 µg/m3; AvgDailyPM2.5Exposure >=3 range: 9.63, 17.33 µg/m3; TotalEDInpatientVisits = # emergency department or inpatient visits for a respiratory issue over a one-year ‘study’ period (the example here is for calendar year 2010).
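For anyone wanting to sanity-check the output, here is a minimal sketch that recomputes the Pearson chi-squared statistic (no continuity correction) from the four cell counts in the table above. The function name is made up, and whether ICEES computes the statistic this way internally is an assumption; the p-value is not recomputed here.

```python
def pearson_chi_squared(table):
    """Pearson chi-squared statistic for a contingency table (list of rows).

    No continuity correction is applied. Assumes every row and column total
    is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Cell counts from the example output above.
counts = [
    [297, 29],     # AvgDailyPM2.5Exposure < 3
    [4776, 1277],  # AvgDailyPM2.5Exposure >= 3
]

chi_squared = pearson_chi_squared(counts)
print(round(chi_squared, 4))
```

The result agrees with the chi_squared value reported in the output to four decimal places.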

B. COHD example queries

Input: Asthma (ID #317009) and Black or African American (ID #8516)

Output:
{
  "concept_2_count": 208438,
  "concept_id_1": 317009,
  "concept_id_2": 8516,
  "concept_pair_count": 11716,
  "dataset_id": 2,
  "relative_frequency": 0.05620856081904451
}

Input: Asthma (ID #317009) and White (ID #8527)

Output:
{
  "concept_2_count": 601167,
  "concept_id_1": 317009,
  "concept_id_2": 8527,
  "concept_pair_count": 29913,
  "dataset_id": 2,
  "relative_frequency": 0.049758220261591206
}
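In both responses, relative_frequency works out to concept_pair_count divided by concept_2_count. The sketch below checks that and compares the two groups. Reading the value as "the share of patients in the demographic group who also have an asthma record" is my interpretation, and the helper name is made up.

```python
# The two COHD responses shown above, copied as Python dicts.
black_or_african_american = {
    "concept_2_count": 208438,
    "concept_id_1": 317009,
    "concept_id_2": 8516,
    "concept_pair_count": 11716,
    "dataset_id": 2,
    "relative_frequency": 0.05620856081904451,
}
white = {
    "concept_2_count": 601167,
    "concept_id_1": 317009,
    "concept_id_2": 8527,
    "concept_pair_count": 29913,
    "dataset_id": 2,
    "relative_frequency": 0.049758220261591206,
}


def recomputed_relative_frequency(response):
    """Pair count divided by the count for concept 2 (hypothetical helper)."""
    return response["concept_pair_count"] / response["concept_2_count"]


for response in (black_or_african_american, white):
    recomputed = recomputed_relative_frequency(response)
    # The recomputed value should match the reported one up to float noise.
    assert abs(recomputed - response["relative_frequency"]) < 1e-9

# Ratio of the two relative frequencies: a simple directionality measure.
ratio = (
    black_or_african_american["relative_frequency"] / white["relative_frequency"]
)
print(round(ratio, 3))
```

A ratio above 1 means the first group has the higher relative frequency of asthma in this dataset.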

C. Clinical Profiles links

HAPI-FHIR

Custom Translator JHU Clinical Profiles Build

@karafecho
Contributor Author

karafecho commented Mar 29, 2019

See Green/Gamma TranQL implementation of Workflow 5, which is related to Workflow 4, here.

@karafecho
Contributor Author

karafecho commented Apr 3, 2019

WORKFLOW INPUT:

See ICEES_FeatureVariables and ICEES_Identifiers here for diagnoses. Note that these docs are updated as new variables are added to the ICEES integrated feature tables.

WORKFLOW (Gamma) QUESTION TEMPLATE:

Note that the second gene hop was added per ROBOKOP Neo4J constraints. If we can avoid this, great; if not, that's fine, too.

{
  "name": "Gamma WF4 template",
  "natural_question": "disease or phenotypic feature to gene to biological process/activity to gene to drug",
  "notes": "",
  "machine_question": {
    "nodes": [
      {
        "id": "n0",
        "curie": "MONDO:0008300",
        "name": "ObesityDx",
        "type": "disease_or_phenotypic_feature"
      },
      {
        "id": "n1",
        "type": "gene"
      },
      {
        "id": "n2",
        "type": "biological_process_or_activity"
      },
      {
        "id": "n3",
        "type": "gene"
      },
      {
        "id": "n4",
        "type": "drug"
      }
    ],
    "edges": [
      {
        "id": "e0",
        "source_id": "n0",
        "target_id": "n1"
      },
      {
        "id": "e1",
        "source_id": "n1",
        "target_id": "n2"
      },
      {
        "id": "e2",
        "source_id": "n2",
        "target_id": "n3"
      },
      {
        "id": "e3",
        "source_id": "n3",
        "target_id": "n4"
      }
    ]
  }
}
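Since the same template will be reused for every ICEES phenotype/diagnosis, here is a minimal sketch of generating it from a CURIE and a name. The function names are made up, the field names simply mirror the template above, and two things are assumptions on my part: the underscore form of the type strings, and one edge per consecutive node pair (so that the drug node is linked to the second gene node).

```python
import json

# Node types along the path, in order. The underscore spelling is an assumption.
WF4_NODE_TYPES = [
    "disease_or_phenotypic_feature",
    "gene",
    "biological_process_or_activity",
    "gene",
    "drug",
]


def build_question(curie, name, node_types=WF4_NODE_TYPES):
    """Build a linear-path question: one edge between each consecutive node pair."""
    nodes = [{"id": f"n{i}", "type": t} for i, t in enumerate(node_types)]
    # Only the first node is pinned to a specific identifier.
    nodes[0] = {"id": "n0", "curie": curie, "name": name, "type": node_types[0]}
    edges = [
        {"id": f"e{i}", "source_id": f"n{i}", "target_id": f"n{i + 1}"}
        for i in range(len(node_types) - 1)
    ]
    return {
        "name": "Gamma WF4 template",
        "natural_question": "disease or phenotypic feature to gene to "
        "biological process/activity to gene to drug",
        "notes": "",
        "machine_question": {"nodes": nodes, "edges": edges},
    }


def unlinked_nodes(question):
    """Return ids of nodes that no edge touches (should be empty for a path)."""
    mq = question["machine_question"]
    linked = {e["source_id"] for e in mq["edges"]} | {e["target_id"] for e in mq["edges"]}
    return [n["id"] for n in mq["nodes"] if n["id"] not in linked]


question = build_question("MONDO:0008300", "ObesityDx")
print(json.dumps(question, indent=2))
print("unlinked nodes:", unlinked_nodes(question))
```

The `unlinked_nodes` check is a cheap way to catch a template where a node was added without its edge.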

@karafecho
Contributor Author

karafecho commented Apr 3, 2019

ROBOKOP queries and RTX queries are being pre-computed for this workflow using all available ICEES phenotypes/diagnoses. Example ICEES queries are included below as an FYI:

curl -k -XPOST https://localhost:8080/1.0.0/patient/2010/cohort/COHORT:22/associations_to_all_features -H "Content-Type: application/json" -d '{"feature":{"TotalEDInpatientVisits":{"operator":"<", "value":2}},"maximum_p_value":0.1}' -H "Accept: application/json"

curl -k -XPOST https://localhost:8080/1.0.0/patient/2010/cohort/COHORT:22/associations_to_all_features -H "Content-Type: application/json" -d '{"feature":{"ur":{"operator":"=", "value":"U"}},"maximum_p_value":0.1}' -H "Accept: application/json"

curl -k -XPOST https://localhost:8080/1.0.0/patient/2010/cohort/COHORT:22/associations_to_all_features -H "Content-Type: application/json" -d '{"feature":{"Sex2":{"operator":"=", "value":"Male"}},"maximum_p_value":0.1}' -H "Accept: application/json"
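The three curl calls differ only in the feature, operator, and value, so the request can be parameterized. The sketch below only builds the URL, headers, and JSON body and does not send anything. The function name and its parameters are made up; the URL layout and body fields are copied from the curl examples above.

```python
import json


def icees_associations_request(feature, operator, value, maximum_p_value=0.1,
                               base_url="https://localhost:8080",
                               version="1.0.0", table="patient", year=2010,
                               cohort_id="COHORT:22"):
    """Build (url, headers, body) for an associations_to_all_features call.

    Defaults mirror the curl examples; nothing is sent over the network.
    """
    url = (f"{base_url}/{version}/{table}/{year}/cohort/{cohort_id}"
           "/associations_to_all_features")
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    body = json.dumps({
        "feature": {feature: {"operator": operator, "value": value}},
        "maximum_p_value": maximum_p_value,
    })
    return url, headers, body


# Equivalent of the third curl example (stratify by Sex2 = Male).
url, headers, body = icees_associations_request("Sex2", "=", "Male")
print(url)
print(body)
```

To actually send it, something like `urllib.request.Request(url, data=body.encode(), headers=headers, method="POST")` should work. Note that the curl examples use `-k` to skip certificate verification, which this sketch does not reproduce.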

@karafecho
Contributor Author

karafecho commented Apr 3, 2019

Green/Gamma's initial plan is to refine end-to-end execution of WF4 using TranQL, with ICEES/COHD/Clinical Profiles providing the input from modules 1-4 and ROBOKOP/RTX/mediKanren executing modules 5-8.

karafecho changed the title from "Workflow 4: Green/Gamma Implementation" to "Workflow 4: Green/Gamma Implementation of Modules 1-4 & 5-8" on Apr 3, 2019
@karafecho
Contributor Author

Mini-hackathon was held on Friday, April 12, 12-4 pm ET. Topic: Unified Translator-compliant Clinical Knowledge Source API. Attendees: Hao Xu, Richard Zhu, Casey Ta, Steve Cos, and Kara Fecho. Event was successful. Team developed a plan of action and is moving forward with execution of the plan. The unified Translator Clinical Knowledge Source API will foster efforts on Workflows 4 and 5, as well as any efforts related to COHD, Clinical Profiles, and ICEES.
