This project was implemented by:
- Ashwini Marathe
- Akshay Punwatkar
- Anshupriya Srivastava
- Srishti Saha
and was submitted as our final project for the course ECE 590 (Data Analysis at Scale in the Cloud) at Duke University.
The application takes an image file as input and uses a machine learning model to generate a caption for the image.
The training and test data for the application and the demo are as follows:
- COCO dataset: link
The model was trained using TensorFlow, with the methodology based on the notebook provided by Google Colab. The basic steps in training this model were as follows:
- Tokenize the vocabulary from the training captions
- Implement a Bahdanau-attention-based recurrent neural network
- Train the model for caption prediction with a CNN encoder and an RNN decoder
- Test the model on a test dataset
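The attention step above can be sketched in plain NumPy. This is a minimal illustration of how Bahdanau (additive) attention weights image features against the decoder's hidden state, not the project's actual TensorFlow code; the function name, weight matrices, and shapes are illustrative assumptions:

```python
import numpy as np

def bahdanau_attention(features, hidden, W1, W2, V):
    """Additive (Bahdanau) attention over CNN image features.

    features: (num_locations, feat_dim) -- CNN encoder output
    hidden:   (units,)                  -- decoder hidden state
    W1, W2, V are learned weights; here they are just placeholders.
    """
    # Additive scoring: project features and hidden state, combine, squash.
    score = np.tanh(features @ W1 + hidden @ W2)   # (num_locations, units)
    logits = score @ V                             # (num_locations,)

    # Softmax over image locations gives the attention weights.
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()

    # Context vector: attention-weighted sum of the image features,
    # which the RNN decoder consumes at each caption step.
    context = weights @ features                   # (feat_dim,)
    return context, weights
```

In the full model these weights are trained jointly with the CNN encoder and RNN decoder, and the context vector is concatenated with the word embedding at each decoding step.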
The model was adjusted and trained to fit the app's requirements.
Below is an example of the caption generated by our model for the given input image:
The application also provides a REST API, accepting both local files and web URLs:
- e.g., for a web URL: `curl -X GET -d filepath=https://media.stadiumtalk.com/51/78/5178471c78244562a6fa79e0e14d7a32.jpg 'http://35.243.242.165/predict_api'`
- e.g., for a local file: `curl -X GET -d filepath=sample/image_1.jpg 'http://35.243.242.165/predict_api'`
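The same calls can be made from Python with only the standard library. This is a hedged sketch assuming the endpoint accepts a form-encoded `filepath` field as in the curl examples; the response format is not specified in this README, so the raw body is returned:

```python
from urllib import parse, request

PREDICT_URL = "http://35.243.242.165/predict_api"  # endpoint from the examples above

def encode_payload(filepath):
    """Form-encode the `filepath` field, which may be a web URL or a
    path local to the server, mirroring the two curl examples."""
    return parse.urlencode({"filepath": filepath})

def get_caption(filepath):
    # Note: curl sends a body on a GET request here; urllib would switch
    # to POST when a body is attached, so the method is set explicitly.
    req = request.Request(PREDICT_URL,
                          data=encode_payload(filepath).encode(),
                          method="GET")
    with request.urlopen(req) as resp:
        return resp.read().decode()
```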
The application was deployed using Kubernetes on Google Cloud Platform.
The container image can be accessed on Docker Hub: Link to Container Image
Steps to deploy the app on Google Kubernetes Engine can be found here
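A typical GKE deployment of a containerized app like this one boils down to a few `kubectl` commands. This is a generic sketch, not the project's actual manifest; the deployment and image names are placeholders:

```shell
# Create a deployment from the Docker Hub image (names are placeholders).
kubectl create deployment caption-app --image=docker.io/<user>/<image>:latest

# Expose it to the internet on the port the app listens on.
kubectl expose deployment caption-app --type=LoadBalancer --port=8088

# Wait for an external IP to appear, then browse to http://<EXTERNAL-IP>:8088
kubectl get service caption-app
```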
After deployment, the app can be accessed at: http://34.71.22.23:8088
We load-tested the app using Locust on Google Kubernetes Engine, following the step-by-step tutorial given here.
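A Locust test is driven by a small Python locustfile. This is a hedged sketch of what one for this app might look like (run by the `locust` tool, not standalone); the wait times and sample path are illustrative, not taken from the actual load test:

```python
# Hypothetical locustfile.py; run with:
#   locust -f locustfile.py --host=http://35.243.242.165
from locust import HttpUser, task, between

class CaptionUser(HttpUser):
    # Each simulated user waits 1-3 seconds between requests.
    wait_time = between(1, 3)

    @task
    def caption_local_sample(self):
        # Hits the same endpoint as the curl examples above.
        self.client.get("/predict_api", data={"filepath": "sample/image_1.jpg"})
```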
Further details on the performance can be seen in the demo video linked below.
Link to the video: Here