This repository contains the implementation of "Image Captioning with End-to-End Attribute Detection and Subsequent Attributes Prediction" (IEEE TIP, 2020, vol. 29).
- Python 2.7
- Java 1.8.0
- PyTorch 0.4.0
- cider (already added as a submodule)
- coco-caption (already added as a submodule)
- tensorboardX
See details in data/README.md.
You should also preprocess the dataset and build the n-gram cache used to compute the CIDEr score for SCST:
$ python scripts/prepro_ngrams.py --input_json data/dataset_coco.json --dict_json data/cocotalk_attr.json --output_pkl data/coco-train-new --split train
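
As a rough illustration of what such a cache holds, the sketch below counts n-gram document frequencies over reference captions, which is the statistic CIDEr uses to weight n-grams. It is a minimal sketch and not the code in scripts/prepro_ngrams.py: the function names, the toy captions, and whitespace tokenization are all assumptions made for the example.

```python
from collections import defaultdict


def ngrams(tokens, max_n=4):
    """Yield every n-gram (as a tuple) of length 1..max_n from a token list."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def document_frequency(refs_per_image, max_n=4):
    """For each n-gram, count the images whose reference captions contain it."""
    df = defaultdict(int)
    for refs in refs_per_image:
        seen = set()  # an n-gram counts at most once per image
        for caption in refs:
            seen.update(ngrams(caption.split(), max_n))
        for gram in seen:
            df[gram] += 1
    return dict(df)


# Toy data (hypothetical): two images, each with its reference captions.
refs_per_image = [
    ["a dog runs", "a dog plays"],
    ["a cat sleeps"],
]
df = document_frequency(refs_per_image)
print(df[("a",)])        # 2: "a" appears in the references of both images
print(df[("a", "dog")])  # 1: "a dog" appears for the first image only
```

Computing these counts once on the training split and caching them avoids repeating the work at every SCST step.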
$ bash train.sh
We train our model on 2 TitanXp GPUs. To train the model on your own hardware, change the batch_size and gpu_nums in xe_train.sh.
See opts.py for the options. The pretrained models can be downloaded here.
Before evaluation, enter the model id and checkpoint number in eval.py. Note that the beam size can only be changed manually in AttModel_MAD_SAP.py and CaptionModel.py, because opt is not compatible with multi-GPU training.
$ CUDA_VISIBLE_DEVICES=0 python eval.py --num_images -1 --language_eval 1 --batch_size 100 --split test
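
To make the effect of the beam size concrete, here is a toy beam search over a hand-written bigram table. It is a minimal sketch and not the decoder in CaptionModel.py: `TOY_MODEL`, its probabilities, and `beam_search` are hypothetical and exist only for this example.

```python
import math

# Hypothetical bigram model: TOY_MODEL[prev][word] = P(word | prev).
TOY_MODEL = {
    "<s>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.55, "cat": 0.45},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"</s>": 1.0},
    "cat": {"</s>": 1.0},
}


def beam_search(model, beam_size, max_len=5):
    """Return the best-scoring word sequence found with the given beam size."""
    beams = [(0.0, ["<s>"])]  # (cumulative log-probability, sequence)
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next word.
        candidates = []
        for score, seq in beams:
            for word, prob in model[seq[-1]].items():
                candidates.append((score + math.log(prob), seq + [word]))
        # Keep only the beam_size highest-scoring expansions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_size]:
            if seq[-1] == "</s>":
                finished.append((score, seq))
            else:
                beams.append((score, seq))
        if not beams:
            break
    if not finished:
        return None
    best = max(finished, key=lambda c: c[0])
    return best[1][1:-1]  # strip the <s> and </s> markers


print(beam_search(TOY_MODEL, beam_size=1))  # ['a', 'dog']
print(beam_search(TOY_MODEL, beam_size=2))  # ['the', 'dog']
```

With a beam size of 1 the search is greedy and commits to "a" at the first step. A beam size of 2 keeps "the" alive long enough to find the higher-probability sequence, which is why the beam size can change the reported scores.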
If you find this repo helpful, please consider citing:
@ARTICLE{huang2020madsap,
author={Y. {Huang} and J. {Chen} and W. {Ouyang} and W. {Wan} and Y. {Xue}},
journal={IEEE Transactions on Image Processing},
title={Image Captioning With End-to-End Attribute Detection and Subsequent Attributes Prediction},
year={2020},
volume={29},
number={},
pages={4013-4026},
doi={10.1109/TIP.2020.2969330},
ISSN={1941-0042},
month={},}
This repository is based on self-critical.pytorch, and you may refer to it for more details about the code.