Llama Image Captioner

This Python application leverages the OpenRouter API and the Meta-Llama model to generate detailed captions for uploaded images.

Features

User-friendly Gradio interface for image uploads
Option to generate short or long image descriptions
Detailed, accurate, and concise image captions
Powered by the advanced Meta-Llama 3.2 90B Vision Instruct model
Robust error handling and logging for improved debugging

Prerequisites

Before getting started, make sure you have:

Python 3.7 or higher installed
An OpenRouter API key
The following Python libraries: gradio, requests

Installation

Clone this repository:

git clone https://github.com/PierrunoYT/llama-image-captioner.git
cd llama-image-captioner

Install the required packages:
```
pip install -r requirements.txt
```
Create a requirements.txt file with the following content:
```
gradio
requests
```

Set up your environment variables:

On Windows:

setx OPENROUTER_API_KEY "your_api_key_here"
setx YOUR_SITE_URL "https://your-actual-site-url.com"
setx YOUR_APP_NAME "Llama Image Captioner"

On Unix-based systems:

export OPENROUTER_API_KEY="your_api_key_here"
export YOUR_SITE_URL="https://your-actual-site-url.com"
export YOUR_APP_NAME="Llama Image Captioner"

Note: After setting these environment variables, restart your command prompt or terminal for the changes to take effect.

Usage

To run the application:

python ImageCaption.py

This will launch the Gradio interface. Follow these steps:

Upload an image using the provided interface.
Choose the caption length: "Short" for a brief description or "Long" for a detailed analysis.
Click the submit button to generate the image caption.
The app will display the generated description based on your chosen length.

How It Works

The user uploads an image through the Gradio interface.
The image is converted to a base64-encoded string.
A request is sent to the OpenRouter API, which uses the Meta-Llama model.
The API returns a detailed description of the image.
The description is displayed to the user through the Gradio interface.

Contributing

Contributions to this project are welcome! Please follow these steps:

Fork the repository
Create a new branch (git checkout -b feature/AmazingFeature)
Make your changes
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

OpenRouter for providing the API
Meta for the Llama model
Gradio for the user-friendly interface building tools

Project Link

https://github.com/PierrunoYT/llama-image-captioner

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
ImageCaption.py		ImageCaption.py
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Llama Image Captioner

Features

Prerequisites

Installation

Usage

How It Works

Contributing

License

Acknowledgments

Project Link

About

Releases

Packages

Languages

License

PierrunoYT/llama-image-captioner

Folders and files

Latest commit

History

Repository files navigation

Llama Image Captioner

Features

Prerequisites

Installation

Usage

How It Works

Contributing

License

Acknowledgments

Project Link

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages