This repository implements an image captioner for large datasets. Its goal is to streamline the creation of supervised datasets for data augmentation in image-captioning deep learning architectures. The foundation is the MiniGPT-4 framework combined with the pre-trained Vicuna model with 13 billion parameters.
You need a GPU-enabled machine with at least 23 GB of GPU memory.
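A quick way to confirm the available memory (assuming an NVIDIA GPU with the `nvidia-smi` tool installed):

```bash
# Print each GPU's name and total memory in MiB; 23 GB is roughly 23552 MiB
nvidia-smi --query-gpu=name,memory.total --format=csv
```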
```bash
# Clone this repository and the upstream MiniGPT-4 code
git clone https://github.com/neemiasbsilva/MiniGPT-4-image-caption-implementation.git
cd MiniGPT-4-image-caption-implementation
git clone https://github.com/Vision-CAIR/MiniGPT-4.git

# Create and activate the conda environment shipped with MiniGPT-4
cd MiniGPT-4
conda env create -f environment.yml
conda activate minigptv
conda install pandas

# Merge the MiniGPT-4 files into this repository's root
cd ..
mv MiniGPT-4/* .
```
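Before continuing, a minimal sanity check that the environment sees your GPU (assuming `environment.yml` installs PyTorch, as in the upstream MiniGPT-4 setup):

```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```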
In the shell file (`run.sh`) you have to specify:

- `data_path`: the path where your image dataset is located.
- `beam_search`: hyperparameter in the range 0 to 10.
- `temperature`: hyperparameter between 0.1 and 1.0.
- `save_path`: the location where your caption dataset will be saved.

A sketch of such a script is shown below.
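This is a minimal sketch of what `run.sh` could look like; the Python entry point (`generate_captions.py`) and flag names are illustrative assumptions, so check the actual script in the repository:

```bash
#!/bin/bash
# Hypothetical run.sh: variable names match the parameters described above
data_path="/path/to/images"        # where your image dataset is located
beam_search=5                      # beam width, range 0 to 10
temperature=0.7                    # sampling temperature, 0.1 to 1.0
save_path="/path/to/captions"      # where the caption dataset is saved

python generate_captions.py \
    --data-path "$data_path" \
    --beam-search "$beam_search" \
    --temperature "$temperature" \
    --save-path "$save_path"
```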
- Download the Vicuna 13B weights.
- Set the LLM path in `minigpt4/configs/models/minigpt4_vicuna0.yaml` at line 15:

  ```yaml
  llama_model: "vicuna"
  ```

- Download the MiniGPT-4 checkpoint model.
- Set the checkpoint path in `eval_configs/minigpt4_eval.yaml` at line 8:

  ```yaml
  ckpt: pretrained_minigpt4.pth
  ```

These two config edits can also be applied from the command line, as sketched below.
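A sketch of both edits using `sed`; the weight paths are placeholders, so substitute wherever you stored the downloads:

```bash
# Point line 15 of the model config at the Vicuna 13B weights (placeholder path)
sed -i 's|llama_model:.*|llama_model: "/path/to/vicuna-13b"|' \
    minigpt4/configs/models/minigpt4_vicuna0.yaml

# Point line 8 of the eval config at the MiniGPT-4 checkpoint (placeholder path)
sed -i 's|ckpt:.*|ckpt: "/path/to/pretrained_minigpt4.pth"|' \
    eval_configs/minigpt4_eval.yaml
```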
Finally, run the captioner:

```bash
sh run.sh
```