Exploring Uni-modal, Cross-modal, and Multi-modal Diagnostic Captioning
Recent years have witnessed an increase in studies on image captioning, but little of that knowledge has been utilised in the biomedical field. This thesis addresses medical image captioning, referred to as Diagnostic Captioning (DC): the task of assisting medical experts in diagnosis and report drafting. We present uni-modal, cross-modal and multi-modal deep learning methods that aim to generate a representative "diagnostic text" for a given medical image. The multi-modal approaches utilise the radiology concepts (tags) that clinicians use to describe a patient's image (e.g., X-ray, CT scan) as additional input; such methods have not been adequately explored in biomedical research. We also experimented with a novel technique that combines the captions generated by all the systems implemented as part of this thesis. Lastly, this thesis describes the participation of AUEB's NLP Group, with the author as the main driver, in the 2022 ImageCLEFmedical Caption Prediction task. Out of 10 teams, our team came second on the primary evaluation metric, using an encoder-decoder approach, and first on the secondary metric, using an ensemble technique applied to our generated captions. More about our paper can be found here.
If you have a GPU installed on your system, it is highly recommended to use conda as your virtual environment manager to run the code. You can download conda from here.
After the installation is completed, open a terminal inside this project and run the following commands to set up the conda environment, which will be compatible with TensorFlow.
1. conda create --name tf_gpu
2. conda activate tf_gpu
3. conda install tensorflow-gpu
4. pip install -r requirements.txt
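Optionally, you can verify that TensorFlow detects your GPU with a quick sanity check (this snippet assumes TensorFlow 2.x):

```python
# Optional sanity check: confirm TensorFlow sees the GPU (assumes TensorFlow 2.x).
import tensorflow as tf

print("TensorFlow version:", tf.__version__)
print("GPUs detected:", tf.config.list_physical_devices("GPU"))
```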
If you decide to use ClinicalBERT as the main text embedding extraction model, you have to execute dc.py
in a PyTorch-based environment. To do so, follow these steps:
1. conda create --name torch_gpu
2. conda activate torch_gpu
3. conda install pytorch -c pytorch
4. pip install -r requirements.txt
Your environment will now be compatible with PyTorch. Then comment out the relevant imports in models/__init__.py
and models/kNN.py.
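For reference, below is a minimal sketch of extracting text embeddings with ClinicalBERT via the Hugging Face transformers library. The checkpoint name (emilyalsentzer/Bio_ClinicalBERT) and the mean-pooling step are illustrative assumptions and may differ from what dc.py actually does.

```python
# Minimal sketch (not the dc.py implementation): ClinicalBERT caption embeddings.
# The checkpoint name and mean pooling are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

captions = ["No acute cardiopulmonary abnormality."]
inputs = tokenizer(captions, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the token embeddings (ignoring padding) to get one vector per caption.
mask = inputs["attention_mask"].unsqueeze(-1)
embeddings = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)
print(embeddings.shape)  # (number of captions, 768)
```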
As mentioned in the Abstract
section, I participated in the ImageCLEFmedical 2022 Caption Prediction task. The code also handles the ImageCLEF dataset, but that dataset and its evaluation measures are not provided here, because we, as a group, signed an End User Agreement. Thus, only the IU X-Ray dataset is available. Go to Datasets, download the dataset (i.e. IU X-Ray) and store it in the data
directory.
Your directory structure should look like this:
.
├── data
│   ├── iu_xray
│   │   ├── two_captions.json
│   │   ├── two_images.json
│   │   ├── two_tags.json
│   │   └── densenet121.pkl
│   │
│   ├── fasttext_voc.pkl
│   └── fasttext.npy
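For orientation, here is a rough sketch of how these files could be inspected. The dictionary structures in the comments are assumptions based on the file names; consult the data loaders in this repository for the authoritative format.

```python
# Illustrative inspection of the IU X-Ray files listed above.
# The assumed structures (e.g., dicts keyed by image id) may not match the real loaders.
import json
import pickle
import numpy as np

with open("data/iu_xray/two_captions.json") as f:
    captions = json.load(f)            # assumed: {image_id: caption_text}

with open("data/iu_xray/two_tags.json") as f:
    tags = json.load(f)                # assumed: {image_id: [tag, ...]}

with open("data/iu_xray/densenet121.pkl", "rb") as f:
    image_embeddings = pickle.load(f)  # assumed: {image_id: DenseNet-121 feature vector}

fasttext_vectors = np.load("data/fasttext.npy")    # pre-trained word vectors
with open("data/fasttext_voc.pkl", "rb") as f:
    fasttext_vocab = pickle.load(f)                # vocabulary for the word vectors

print(len(captions), len(tags), fasttext_vectors.shape)
```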
Throughout my research for this thesis, I experimented with models that had state-of-the-art (SOTA) performance on several biomedical datasets (such as IU X-Ray, MIMIC-III, etc.). These models are provided in the SOTA_models
directory as submodule repositories. More details about each model are provided in my thesis. I do not provide the additional data loaders that I created for these models; if you want to experiment further with them, please do so according to the guidelines provided in each of these repositories.
Follow the aforementioned steps to set up conda, then run the following command to train my implemented methods (i.e. CNN-RNN, k-NN). Default arguments are used.
python3 dc.py
To see the available arguments, run:
python3 dc.py -h
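To give an intuition for what the CNN-RNN method trains, here is a highly simplified, illustrative encoder-decoder sketch in Keras, where a precomputed image embedding conditions a recurrent caption decoder. It is not the thesis implementation; all layer sizes and dimensions below are placeholders.

```python
# Illustrative only (not the thesis model): image embedding conditioning a GRU decoder.
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE = 5000   # placeholder vocabulary size
MAX_LEN = 40        # placeholder maximum caption length
IMG_DIM = 1024      # e.g., DenseNet-121 global feature size
EMB_DIM = 300       # e.g., fastText embedding size

image_in = layers.Input(shape=(IMG_DIM,), name="image_embedding")
caption_in = layers.Input(shape=(MAX_LEN,), name="caption_tokens")

# Project the image embedding to the decoder size and use it as the initial RNN state.
init_state = layers.Dense(256, activation="tanh")(image_in)

x = layers.Embedding(VOCAB_SIZE, EMB_DIM, mask_zero=True)(caption_in)
x = layers.GRU(256, return_sequences=True)(x, initial_state=init_state)
out = layers.Dense(VOCAB_SIZE, activation="softmax")(x)

model = Model([image_in, caption_in], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```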
It is recommended to use a Unix-like OS (e.g., Linux), or WSL on Windows, to execute the following scripts.
- Cross-modal CNN-RNN:
bash cross_modal_cnn_rnn.sh
- Multi-modal CNN-RNN:
bash multi_modal_cnn_rnn.sh
- Cross-modal k-NN:
bash cross_modal_kNN.sh
- Multi-modal k-NN:
bash multi_modal_kNN.sh
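For intuition about the k-NN scripts, the sketch below shows the general idea of cross-modal retrieval: find the k nearest training images in image-embedding space and reuse their captions. The distance metric, k, and the random stand-in data are illustrative choices, not the exact thesis implementation.

```python
# Illustrative cross-modal k-NN: reuse the captions of the k nearest training images.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_captions(train_embeddings, train_captions, test_embeddings, k=5):
    """Return, for each test image embedding, the captions of its k nearest train images."""
    nn = NearestNeighbors(n_neighbors=k, metric="cosine").fit(train_embeddings)
    _, indices = nn.kneighbors(test_embeddings)
    return [[train_captions[i] for i in row] for row in indices]

# Random stand-ins for DenseNet-121 image embeddings, just to show the call.
rng = np.random.default_rng(0)
train_emb = rng.normal(size=(100, 1024))
test_emb = rng.normal(size=(3, 1024))
train_caps = [f"caption {i}" for i in range(100)]
print(knn_captions(train_emb, train_caps, test_emb, k=3))
```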
If you use or extend my work, please cite my thesis:
@unpublished{Zachariadis2022,
author = "G. Zachariadis",
title = "Exploring Uni-modal, Cross-modal, and Multi-modal Diagnostic Captioning",
year = "2022",
note = "B.Sc. thesis, Department of Informatics, Athens University of Economics and Business"
}
You can read our publication "AUEB NLP Group at ImageCLEFmedical Caption 2022" (Proceedings of CLEF 2022) at this link. If you use or extend our work, please cite our paper:
@article{charalampakos2022aueb,
title={{AUEB} {NLP} Group at {ImageCLEFmedical} Caption 2022},
author={Charalampakos, Foivos and Zachariadis, Giorgos and Pavlopoulos, John and Karatzas, Vasilis and Trakas, Christoforos and Androutsopoulos, Ion},
year={2022}
}