Skip to content

ChantalMP/RaDialog_LLaVA

Repository files navigation

RaDialog-LLaVA

RaDialog-LLaVA is the improved version of the original RaDialog model which can be found on Github and Arxiv. It follows the same concepts, including the same image encoder, chexbert classifier, prompt construction and language model. However, we followed the LLaVA methodolgy for image-text alignment, leading to improved conversational assistance and making the model easier use. The main differences are the following:

  • image projection: instead of the BLIP-inspired alignment module, we follow the LLaVA approach and use a simple MLP projection to project the image features to the language model input size, leading to more image tokens.
  • the image encoder is fine-tuned during LLM training
  • the model was trained on an updated version of the RaDialog-Instruct dataset with three additional instruct tasks: impression generation, view classification and Rad-ReStruct QA

RaDialog-LLaVA main results:

teaser

✨ News ✨

  • 29 May 2024: RaDialog-LLaVA is now available on Hugging Face

teaser

Conversational AI tools that can generate and discuss clinically correct radiology reports for a given medical image have the potential to transform radiology. Such a human-in-the-loop radiology assistant could facilitate a collaborative diagnostic process, thus saving time and improving the quality of reports. Towards this goal, we introduce RaDialog, the first thoroughly evaluated and publicly available large vision-language model for radiology report generation and interactive dialog. RaDialog effectively integrates visual image features and structured pathology findings with a large language model (LLM) while simultaneously adapting it to a specialized domain using parameter-efficient fine-tuning. To keep the conversational abilities of the underlying LLM, we propose a comprehensive, semi-automatically labeled, image-grounded instruct dataset for chest X-ray radiology tasks. By training with this dataset, our method achieves state-of-the-art clinical correctness in report generation and shows impressive abilities in interactive tasks such as correcting reports and answering questions, serving as a foundational step toward clinical dialog systems.

Getting Started with RaDialog-LLaVA

To test RaDialog and use it for inference, follow the instructions in our huggingface repository here.

For more detailed instructions on how to train and evaluate RaDialog, please refer to the instructions below.

Repository Installation

Environment Setup:

1) RaDialog Environment

  • clone this repository and move to the radialog directory with cd RaDialog_LLaVA
  • Install the RaDialog environment with conda create --name radialog python=3.10
  • Activate the environment with conda activate radialog
  • conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
  • pip install -r requirements.txt
  • Install java and set JAVA_HOME and PATH in local_config.py (we used jre1.8.0)
  • install flashattention following https://github.com/Dao-AILab/flash-attention

2) CheXbert Environment (only needed for CheXbert score evaluation)

  • Install the CheXbert environment with conda create --name chexbert python=3.7
  • Activate the environment with conda activate chexbert
  • Move to the chexbert directory with cd chexbert
  • Install the requirements with pip install -r requirements.txt
  • Set the absolute path to the chexbert env and folder in RaDialog_LLaVA/local_config.py
  • Download chexbert.pth from here and place it in in RaDialog_LLaVA/chexbert/src/checkpoint/

Prepare Data

1) Download the RaDialog-Instruct dataset

  • Download the instruct dataset from PhysioNet
  • unzip it place it in RaDialog_LLaVA/data/
  • in train.sh set the path to the instruct dataset (e.g. --data_path home/RaDialog_LLaVA/data/mimic_cxr_instruct_stratified.json)

2) Download MIMIC-CXR

  • Download the MIMIC-CXR-JPG dataset from here
  • The dataset should be saved in .../physionet.org/files/mimic-cxr-jpg
  • Go to physionet.org/files/mimic-cxr-jpg/files/ and unzip mimic-cxr-2.0.0-split.csv.gz
  • from here, dowload mimic-cxr-reports.zip
  • unzip it and place the folder in the same directory as the MIMIC-CXR-JPG dataset (e.g. physionet.org/files/)
  • in local_config.py set the path to the MIMIC-CXR dataset (e.g. .../physionet.org/files/)
  • in model/lavis/defaults_report.yaml set the path to the MIMIC-CXR-JPG dataset (e.g. .../physionet.org/files/mimic-cxr-jpg/2.0.0 )

3) Create sectioned report data

  • go to the mimic-cxr folder in the code with cd mimic-cxr
  • run python create_section_files.py to prepare the report data
  • go back to the RaDialog_LLaVA directory with cd ..

Evaluate RaDialog on MIMIC-CXR test set:

  • RaDialog-INS: run python -m test --prompt img_matching_examples_ig2_noexamples_IMG_findings --split "test" --vision_tower biovil
  • RaDialog-INS (downstream tasks): add --do_corr, --do_cp_bin_qa, --do_cp_all_qa, --do_view_class or --do_impression respectively to the command above

Train RaDialog:

1) CheXbert classifier Training

  • run python -m findings_classifier.chexpert_train --train --run_name "train_chexbert"
  • in chexpert_train.py set ckpt_path (line 152) to the path of the model you just trained
  • then run python -m findings_classifier.chexpert_train --run_name "save_preds" to save the predictions of the trained model

3) LLM Training

  • move to LLaVA directory with cd LLaVA
  • in train.sh set PYTHONPATH to the path of the RaDialog_LLaVA directory
  • run ./train.sh to start training the model
  • we used checkpoint-21000

Demo

Demo Environment

  • Install the RaDialog demo environment with conda create --name radialog_demo python=3.10
  • Activate the environment with conda activate radialog_demo
  • conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
  • pip install -r requirements_demo.txt

Run Demo:

  • run python demo.py to start the demo
  • connect to the demo with a browser at http://127.0.0.1:7861 and start chatting with RaDialog

Reference

When using our model (original and LLaVA version) or dataset, please cite:

@article{pellegrini2023radialog,
  title={RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance},
  author={Pellegrini, Chantal and {\"O}zsoy, Ege and Busam, Benjamin and Navab, Nassir and Keicher, Matthias},
  journal={arXiv preprint arXiv:2311.18681},
  year={2023}
}

About

LLaVa Version of RaDialog

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published