AwareDX: Using machine learning to identify drugs posing increased risk of adverse reactions to women
Adverse drug reactions (ADRs) are the fourth leading cause of death in the US. Although women take longer to metabolize medications and experience twice the risk of developing ADRs compared to men, these sex differences are not comprehensively understood. Real-world clinical data provides an opportunity to estimate safety effects in otherwise understudied populations, ie. women. These data, however, are subject to confounding biases and correlated covariates. We present AwareDX, a pharmacovigilance algorithm that leverages advances in machine learning to study sex risks. Our algorithm mitigates these biases and quantifies the differential risk of a drug causing an adverse event in either men or women. We present a resource of 20,817 adverse drug effects posing sex specific risks. We independently validated our algorithm against known pharmacogenetic mechanisms of genes that are sex-differentially expressed. AwareDX presents an opportunity to minimize adverse events by tailoring drug prescription and dosage to sex.
To be able to run this project, it is necessary access to two databases, OpenFDA and AWAREdx.
OpenFDA tables are created using the following repository ngiangre/openFDA_drug_event_parsing.
- drugcharacteristics
- drugs
- drugs_atc
- patient
- reactions
- report
- report_serious
- reporter
- standard_drugs
- standard_drugs_atc: in order to create this table it is necessary an extra mapping from RxNorm - atc, not present in the aforementioned repository
- standard_drugs_rxnorm_ingredients
- standard_reactions
- standard_reactions_meddra_hlgt
- standard_reactions_meddra_hlt
- standard_reactions_meddra_relationships
- standard_reactions_meddra_soc
- standard_reactions_snomed
This tables are subsets from the OpenFDA database combined with additional information like CONCEP table from OMOP data structure. atc 4 and 5 are extracted from CONCEPT table.
- atc_4_name
- atc_5_id (concept_id)
- atc_name
- atc_5_name
- atc_5_id (concept_id)
- atc_name
- atc_5_patient: extracted from standard_drugs_atc
- PID (safetyreportid)
- atc_5_id
- atc_5_patient_psm: this table includes atc_5_patient and adds all the PIDs that do not match to atc5(mapping it to 0), making a table with all PIDS for psm.
- PID (safetyreportid)
- atc_5_id
- pt_patient: extracted from standard_reactions
- PID
- meddra_concept_id
- pt_name: extracted from standard_reactions
- meddra_concept_id
- meddra_concept_name
- hglt_patient: extracted from standard_reactions_meddra_hlgt
- PID
- meddra_concept_id
- hlgt_name: extracted from standard_reactions_meddra_hlgt
- meddra_concept_id
- meddra_concept_name
- soc_patient: extracted from reactions_meddra soc
- PID
- meddra_concept_id
- soc_name: extracted from reactions_meddra soc
- meddra_concept_id
- meddra_concept_name
It is necessary to have access to the following OMOP concept tables stored as CSV:
- CONCEPT
- CONCEPT_ANCESTOR
- CONCEPT_RELATIONSHIP
We need the following folder setup:
├── Code
├── Data
│ ├── Ad_Hoc
│ ├── PSM_features
│ ├── PSM_models
│ │ └── RF2
│ ├── Results
│ └── Status
└── Results
To run the code, it is necessary to install the requirements: pip install -r requirements.txt
Then, we need to ensure we set the correct information in a config.ini file for the database connection.
Finally run: python3 Code/pipeline.py