This project develops image captioning models using vanilla recurrent neural networks (RNNs) and an LSTM augmented with an attention mechanism. The models are trained on the COCO dataset, a large-scale dataset of images with corresponding captions.
The project consists of the following components:
- Data downloading and preprocessing
- Minibatch visualization
- Overfitting test
- RNN model training
- Attention LSTM model training (see the decoder sketch after this list)
- Result visualization
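As a reference for the attention component, here is a minimal sketch of a single attention-LSTM decoding step, assuming PyTorch and precomputed spatial image features of shape (batch, regions, feat_dim). The class and parameter names (`AttentionDecoderStep`, `feat_dim`, etc.) are illustrative placeholders, not the repository's actual code.

```python
# Minimal sketch of one attention-LSTM decoding step (illustrative, not the
# repository's actual implementation). Assumes precomputed spatial image
# features of shape (batch, num_regions, feat_dim).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionDecoderStep(nn.Module):
    def __init__(self, feat_dim, embed_dim, hidden_dim, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Additive (Bahdanau-style) attention over image regions.
        self.att_feat = nn.Linear(feat_dim, hidden_dim)
        self.att_hid = nn.Linear(hidden_dim, hidden_dim)
        self.att_score = nn.Linear(hidden_dim, 1)
        # The LSTM consumes the word embedding concatenated with the context.
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, word_ids, features, state):
        h, c = state
        # Attention weights over regions, conditioned on the previous hidden state.
        scores = self.att_score(torch.tanh(
            self.att_feat(features) + self.att_hid(h).unsqueeze(1)))  # (B, R, 1)
        alpha = F.softmax(scores, dim=1)
        context = (alpha * features).sum(dim=1)                        # (B, feat_dim)
        # One LSTM step on [embedding; context], then project to vocabulary logits.
        h, c = self.lstm(torch.cat([self.embed(word_ids), context], dim=1), (h, c))
        return self.out(h), (h, c), alpha.squeeze(-1)
```

At each timestep the decoder re-weights the image regions before predicting the next word; the returned attention weights `alpha` are also what make the result visualization step possible.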
To run the project, follow these steps:
- Install the required dependencies.
- Clone the repository.
- Download `coco.py` and save it in the `datasets` directory.
- Open the .ipynb notebook to train the models and visualize the results.
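Before launching full training, the overfitting test checks that the training loop can drive the loss toward zero on a single fixed minibatch. Below is a self-contained sketch of that idea; the toy decoder and the random token tensor are stand-ins for the project's real model and a real COCO minibatch.

```python
# Self-contained sketch of an overfitting sanity check: memorize one fixed
# minibatch. The toy decoder and fake captions stand in for the real model
# and COCO data used in the notebook.
import torch
import torch.nn as nn

torch.manual_seed(0)
B, T, V, H = 4, 10, 50, 64                      # batch, length, vocab, hidden
captions = torch.randint(1, V, (B, T))          # fake caption token ids

class ToyDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(V, H)
        self.lstm = nn.LSTM(H, H, batch_first=True)
        self.out = nn.Linear(H, V)

    def forward(self, tokens):
        h, _ = self.lstm(self.embed(tokens))
        return self.out(h)

model = ToyDecoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()
for step in range(300):
    optimizer.zero_grad()
    logits = model(captions[:, :-1])            # teacher forcing
    loss = criterion(logits.reshape(-1, V), captions[:, 1:].reshape(-1))
    loss.backward()
    optimizer.step()
print(f"final loss: {loss.item():.4f}")         # should be close to zero
```

If the loss does not approach zero on a single batch, the model or data pipeline likely has a bug, and full training is not worth starting.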
The results on the training data are clearly much better than on the validation data, which indicates that the models overfit the training set.
This project provides a comprehensive overview of image captioning models and their implementation, offering hands-on experience with data preprocessing, model training, and result visualization. It also highlights the importance of attention mechanisms in improving the performance of image captioning models.