All the code has been run and tested on:
- Python 2.7.15 (coco-caption requires 2.7)
- Pytorch 1.0.0
- CUDA 9.0
- TITAN X/Xp and GTX 1080Ti GPUs
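If you want a reproducible environment, here is a minimal sketch assuming conda (the environment name and exact package pins are placeholders, not from this repo; adjust to your CUDA driver):

```
# Hypothetical environment sketch -- assumes conda; pins are assumptions.
conda create -n lifelong-caption python=2.7
conda activate lifelong-caption
conda install pytorch==1.0.0 torchvision cudatoolkit=9.0 -c pytorch
```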
First clone the repository:
git clone https://github.com/shenkev/Caption-Images-through-a-Lifetime-by-Asking-Questions.git
- Go into the downloaded code directory
- Add the project to PYTHONPATH
cd <path_to_downloaded_directory>
export PYTHONPATH=$PWD
chmod +x setup.sh
./setup.sh
This will:
- Install Python dependencies
- Download the Stanford NLP package for part-of-speech parsing
- Download coco-caption
- Download pyciderevalcap
- Download the images from this link. We need the 2014 training images and the 2014 validation images.
- Put the `train2014/` and `val2014/` directories in a directory of your choice, denoted `$IMAGE_ROOT` (see the download sketch below).
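If you prefer the command line, here is a sketch using the standard cocodataset.org mirrors (these URLs are an assumption, not taken from the link above):

```
# Sketch: fetch and unpack the COCO 2014 images into $IMAGE_ROOT.
# URLs are the standard cocodataset.org mirrors; verify against the link above.
export IMAGE_ROOT=/path/to/images   # placeholder
wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
unzip train2014.zip -d $IMAGE_ROOT
unzip val2014.zip -d $IMAGE_ROOT
```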
- Download the pretrained ResNet model from here and place it in `Utils/preprocess/checkpoint`
- Preprocess the images by running
python Utils/preprocess/preprocess_imgs.py --input_json Data/annotation/dataset_coco.json --output_dir $IMAGE_ROOT/features --images_root $IMAGE_ROOT
Warning: the preprocessing script will fail with the default MSCOCO data because one of the images is corrupted. See this issue for the fix; it involves manually replacing one image in the dataset.
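For reference, the fix commonly reported for this problem looks like the sketch below; treat both the filename and the replacement URL as assumptions and confirm them against the linked issue:

```
# Assumption: the corrupted file is the widely reported COCO_train2014_000000167126.jpg.
# Confirm against the linked issue before overwriting anything.
wget https://msvocds.blob.core.windows.net/images/262993_z.jpg \
     -O $IMAGE_ROOT/train2014/COCO_train2014_000000167126.jpg
```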
- Download the training data here
- Unzip it into Data/annotation
- Precompute indexes for CIDEr
python Utils/preprocess/preprocess_cider.py --data_file Data/annotation/cap_train.p --output_file Data/annotation/coco-words
- Prepare lifelong learning data splits
python Utils/preprocess/preprocess_llsplits.py --data_file Data/annotation/cap_train.p --output_file Data/annotation/train3_split --warmup 3 --num_splits 4 --num_caps 2
- You can play with the chunk sizes and the number of chunks using the `warmup` and `num_splits` parameters, as in the sketch below
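For instance, a hypothetical variant (parameter semantics inferred from the flag names; the output file name is a placeholder):

```
# Hypothetical variant: larger warmup chunk, fewer lifelong splits.
python Utils/preprocess/preprocess_llsplits.py --data_file Data/annotation/cap_train.p \
    --output_file Data/annotation/train10_split --warmup 10 --num_splits 2 --num_caps 2
```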
- You can either download the trained caption, question generator, and VQA modules, or train them yourself
- To download the trained Caption, Question generator, and VQA modules:
- Download model checkpoints here
- Place in Data/model_checkpoints
- The captioning module was trained using 10% warmup data
- Train the caption module
- In `Experiments/caption3.json`, change `exp_dir` to the working directory and `img_dir` to `$IMAGE_ROOT` (see the sketch below)
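A sketch of this edit using jq (the key names come from this README; the use of jq and the placeholder paths are assumptions, and the same pattern applies to `Experiments/vqa.json` and `Experiments/question3.json`):

```
# Sketch: point the config at your paths with jq (assumes jq is installed).
jq --arg exp "$PWD" --arg img "$IMAGE_ROOT" \
   '.exp_dir = $exp | .img_dir = $img' \
   Experiments/caption3.json > tmp.json && mv tmp.json Experiments/caption3.json
```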
python Scripts/train_caption3.py --experiment Experiments/caption3.json
- Train the VQA module
- In `Experiments/vqa.json`, change `exp_dir` to the working directory and `img_dir` to `$IMAGE_ROOT` (the same edit pattern as above applies)
python Scripts/train_vqa.py --experiment Experiments/vqa.json
- Train the question generator module
- In `Experiments/question3.json`, change `exp_dir` to the working directory, `img_dir` to `$IMAGE_ROOT`, `vqa_path` to the VQA model checkpoint, and `cap_path` to the caption model checkpoint
python Scripts/train_quegen.py --experiment Experiments/question3.json
- In `Experiments/lifelong3.json`, change `exp_dir` to the working directory, `img_dir` to `$IMAGE_ROOT`, `vqa_path` to the VQA model checkpoint, `cap_path` to the caption model checkpoint, and `quegen_path` to the question generator model checkpoint (see the sketch below)
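The same jq pattern covers the extra checkpoint keys (the checkpoint filenames below are placeholders; substitute the paths of the checkpoints you trained or downloaded):

```
# Sketch: set all five path keys in the lifelong config (filenames are placeholders).
jq --arg exp "$PWD" --arg img "$IMAGE_ROOT" \
   '.exp_dir = $exp | .img_dir = $img
    | .vqa_path = "Data/model_checkpoints/vqa.ckpt"
    | .cap_path = "Data/model_checkpoints/caption.ckpt"
    | .quegen_path = "Data/model_checkpoints/quegen.ckpt"' \
   Experiments/lifelong3.json > tmp.json && mv tmp.json Experiments/lifelong3.json
```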
- You can play with the parameters `H`, `lamda`, and `k`
python Scripts/train_lifelong.py --experiment Experiments/lifelong3.json
- Track training
cd Results/lifelong
tensorboard --logdir tensorboard/
- Visualize qualitative results
cd Results/lifelong/lifelong3