ReXKG is a system that extracts structured information from processed reports to construct a comprehensive radiology knowledge graph.
src/
├── data/
│   └── chexpert_plus/
│       └── df_chexpert_plus_onlyfindings.csv
├── ner/
│   ├── data/
│   ├── entity/
│   ├── relation/
│   ├── shared/
│   ├── run_entity.py
│   └── run_relation.py
└── kg_construct/
    ├── code/
    └── result/
The ReXKG system consists of three main components:
- Information Extraction System
- Node Construction
- Edge Construction
We use the entity extraction method proposed by PURE for our information extraction system.
conda env create -f environment.yml
- Data Preparation: Annotate the data with GPT-4 and split it into train and test sets.
  In ./src/ner/data, run
  python gpt4_entity_extraction.py
  and
  python gpt4_relation_extraction.py
  Then run
  python structure_data.py
  to convert the report data into the format used by PURE for training.
- Entity Extraction: In ./src/ner, run
  sh run_entity.sh
  to train the entity extraction model.
- Relation Extraction: In ./src/ner, run
  sh run_relation.sh
  to train the relation extraction model.
- Inference: You can also download the model checkpoint from Google Drive to ./result/.
  In ./src/ner/data, convert the data file into the test format with
  python get_inference_data.py
  Then, in ./src/ner, run
  sh inference.sh
  to perform inference on the entire dataset.
- Data Post-processing: Run
  python ./result/run_relation/reverse_structure_data.py
  to prepare the data for node construction and edge construction.
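For orientation, the sketch below shows roughly what one training record might look like after conversion to the PURE format. It assumes the SciERC-style JSON-lines layout documented in the PURE repository (one JSON object per line, with document-level, inclusive token indices). The helper name to_pure_record, the example report, and the entity and relation label names are all hypothetical; the output of structure_data.py may differ in its details.

```python
import json


def to_pure_record(doc_key, sentences, entities, relations):
    """Build one PURE-style record (assumed layout, not taken from this repo).

    sentences: list of token lists, one per sentence
    entities:  per sentence, a list of (start, end, label)
    relations: per sentence, a list of (s_start, s_end, o_start, o_end, label)
    All indices are document-level token positions, end-inclusive.
    """
    if not (len(sentences) == len(entities) == len(relations)):
        raise ValueError("need one entity list and one relation list per sentence")

    n_tokens = sum(len(sentence) for sentence in sentences)
    for sentence_entities in entities:
        for start, end, _label in sentence_entities:
            if not (0 <= start <= end < n_tokens):
                raise ValueError(f"entity span ({start}, {end}) is out of range")

    return {
        "doc_key": doc_key,
        "sentences": sentences,
        "ner": [[list(e) for e in sent] for sent in entities],
        "relations": [[list(r) for r in sent] for sent in relations],
    }


# Hypothetical one-sentence report; label names are illustrative only.
record = to_pure_record(
    "report_0001",
    [["Small", "left", "pleural", "effusion", "."]],
    [[(1, 1, "location"), (2, 3, "observation")]],
    [[(2, 3, 1, 1, "located_at")]],
)

# Each record becomes one line of the JSON-lines training file.
line = json.dumps(record)
```

Validating the spans before writing the file is a design choice of this sketch: off-by-one errors in token indices are otherwise easy to miss until training.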
In ./src/kg_construct/code:
- Run
  sh auto_build_kg.sh
  to build the knowledge graph in result/.
- An example of the resulting knowledge graph files is provided in
  ./src/kg_construct/result
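To illustrate the idea behind node and edge construction, the sketch below aggregates extracted (head, relation, tail) mentions into counted nodes and edges. It is a simplified illustration, not the procedure implemented by auto_build_kg.sh: the function build_graph, the lowercase-and-strip normalization, and the relation names are assumptions made for the example.

```python
from collections import Counter


def build_graph(triples):
    """Aggregate (head, relation, tail) mentions into counted nodes and edges.

    Entity mentions are merged by a naive normalization (strip + lowercase);
    a real pipeline would likely use a more careful merging step.
    """
    nodes = Counter()
    edges = Counter()
    for head, relation, tail in triples:
        h = head.strip().lower()
        t = tail.strip().lower()
        nodes[h] += 1
        nodes[t] += 1
        edges[(h, relation, t)] += 1
    return nodes, edges


# Hypothetical triples as they might come out of relation extraction.
triples = [
    ("Effusion", "located_at", "pleural"),
    ("effusion ", "located_at", "Pleural"),
    ("opacity", "suggestive_of", "pneumonia"),
]

nodes, edges = build_graph(triples)
```

Keeping mention counts on nodes and edges makes it possible to filter out rare, possibly spurious entries later.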
If you use this code for your research or project, please cite:
@article{zhang2024uncovering,
title={Uncovering Knowledge Gaps in Radiology Report Generation Models through Knowledge Graphs},
author={Zhang, Xiaoman and Zhou, Hong-Yu and Acosta, Juli{\'a}n N. and Rajpurkar, Pranav},
journal={arXiv:2408.14397},
year={2024},
}
If you have any questions, please feel free to contact us.