Demo Video (testing the app on an iPhone 13 Pro)
This project implements an image captioning model that uses a Vision Transformer (ViT) as the encoder and a Transformer decoder as the decoder, trained on the COCO2017 dataset. The trained models are then deployed to an iOS app built with SwiftUI. You can test the app by cloning this repository, or try out the model directly with the uploaded weights.
The image is first passed through a ViT_b_32 encoder with its classification head removed, which produces a feature tensor of shape (N, 768). This feature is then reshaped to (N, 32, 768) to be compatible with the CrossMultiHeadAttention block in the decoder. The decoder itself has 44.3 million parameters: it takes token ids of shape (N, max_length=32) and outputs logits of shape (N, max_length=32, vocab_size). A BERT tokenizer is used for text preprocessing.
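The sketch below, assuming torchvision's vit_b_32 and the Hugging Face BertTokenizer, shows how these shapes fit together. The way the (N, 768) feature is expanded into a (N, 32, 768) cross-attention memory (here a simple repeat along a sequence dimension) is an assumption for illustration, not necessarily the repository's exact implementation.

```python
import torch
import torchvision
from transformers import BertTokenizer

max_length = 32

# ViT-B/32 backbone with the classification head removed -> (N, 768) features.
vit = torchvision.models.vit_b_32(weights="IMAGENET1K_V1")
vit.heads = torch.nn.Identity()
vit.eval()

image = torch.randn(1, 3, 224, 224)                        # dummy preprocessed image batch
with torch.no_grad():
    features = vit(image)                                  # (N, 768)

# Assumption: the feature vector is tiled along a length-32 sequence dimension
# so the decoder's cross-attention receives a (N, 32, 768) memory tensor.
memory = features.unsqueeze(1).expand(-1, max_length, -1)  # (N, 32, 768)

# The BERT tokenizer turns a caption into (N, max_length) token ids for the decoder.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
tokens = tokenizer(
    "a dog playing in a park",
    padding="max_length",
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)["input_ids"]                                             # (N, 32)

# The Transformer decoder (44.3M parameters, not shown here) maps
# (tokens, memory) -> logits of shape (N, 32, vocab_size).
```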
For more details about training the model, loading the trained weights, and running inference, please check out this notebook: Google Colab
For details on converting these models into CoreML models and testing them, please refer to the following resources:
Converting two PyTorch models into CoreML models
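As a rough, hedged sketch of what that conversion looks like with coremltools: the CaptionDecoder module, input names, and layer sizes below are hypothetical stand-ins that only mirror the I/O shapes described above, not the actual trained models.

```python
import numpy as np
import torch
import torch.nn as nn
import coremltools as ct

# Hypothetical stand-in with the same I/O shapes as the decoder described above.
class CaptionDecoder(nn.Module):
    def __init__(self, vocab_size=30522, d_model=768, max_length=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, memory):
        x = self.decoder(self.embed(tokens), memory)   # (N, 32, 768)
        return self.out(x)                             # (N, 32, vocab_size)

decoder = CaptionDecoder().eval()
tokens = torch.zeros(1, 32, dtype=torch.int64)
memory = torch.randn(1, 32, 768)

# Trace to TorchScript, then convert the traced graph to an ML Program.
traced = torch.jit.trace(decoder, (tokens, memory))
mlmodel = ct.convert(
    traced,
    inputs=[
        ct.TensorType(name="tokens", shape=(1, 32), dtype=np.int32),
        ct.TensorType(name="image_features", shape=(1, 32, 768), dtype=np.float32),
    ],
    convert_to="mlprogram",
)
mlmodel.save("CaptionDecoder.mlpackage")
```

The ViT encoder can be converted in the same way (typically with a ct.ImageType input so the app can pass images directly), and the resulting .mlpackage files are what the SwiftUI app bundles.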
- Create a folder on your local machine and navigate to it using your terminal.
- Clone this repository using the following command:
git clone https://github.com/seungjun-green/PicScribe.git
- Navigate to the project directory:
cd PicScribe
- Open the Pic Scribe.xcodeproj file using Xcode.