BLIP CAM: Self-Hosted Live Image Captioning with Real-Time Video Stream 🎥

This repository implements real-time image captioning using the BLIP (Bootstrapped Language-Image Pretraining) model. The system captures live video from your webcam, generates descriptive captions for each frame, and displays them in real time along with performance metrics.

🚀 Features

  • Real-Time Video Processing: Seamless webcam feed capture and display with overlaid captions
  • State-of-the-Art Captioning: Powered by Salesforce's BLIP image captioning model (blip-image-captioning-large)
  • Hardware Acceleration: CUDA support for GPU-accelerated inference
  • Performance Monitoring: Live display of:
    • Frame processing speed (FPS)
    • GPU memory usage
    • Processing latency
  • Optimized Architecture: Multi-threaded design for smooth video streaming and caption generation (see the sketch below)
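
To make the multi-threaded design concrete, here is a minimal sketch of the capture/caption split: a worker thread runs slow BLIP inference while the main loop keeps the webcam feed and caption overlay responsive. All names here are illustrative assumptions rather than the repository's actual code; `generate_caption` stands in for a BLIP inference wrapper like the one sketched in the Installation section below.

```python
# Illustrative two-thread design (not the repository's actual code):
# the main loop captures and displays frames; a worker thread runs
# slow BLIP inference and updates the shared caption.
import queue
import threading

import cv2

frame_q: queue.Queue = queue.Queue(maxsize=1)  # holds only the newest frame
latest_caption = "warming up..."

def caption_worker(generate_caption):
    """Run BLIP inference off the display thread."""
    global latest_caption
    while True:
        frame = frame_q.get()
        if frame is None:  # sentinel: shut down
            break
        latest_caption = generate_caption(frame)

def main(generate_caption):
    cap = cv2.VideoCapture(0)
    threading.Thread(target=caption_worker, args=(generate_caption,), daemon=True).start()
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Hand the newest frame to the captioner, dropping any stale one
        # so inference latency never stalls the video stream.
        if frame_q.full():
            try:
                frame_q.get_nowait()
            except queue.Empty:
                pass
        frame_q.put(frame)
        cv2.putText(frame, latest_caption, (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)
        cv2.imshow("BLIP_CAM", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    frame_q.put(None)  # stop the worker
    cap.release()
    cv2.destroyAllWindows()

# main(my_caption_fn)  # supply your own BLIP wrapper (hypothetical)
```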

📋 Requirements

  • Python 3.8+
  • NVIDIA GPU (optional, for CUDA acceleration)
  • Webcam

Core Dependencies

```
opencv-python>=4.5.0
torch>=1.9.0
transformers>=4.21.0
Pillow>=8.0.0
```

🛠️ Installation

  1. Clone the repository:
```bash
git clone https://github.com/zawawiAI/BLIP_CAM.git
cd BLIP_CAM
```
  2. Install dependencies:
```bash
pip install -r requirements.txt
```
  3. Run the application:
```bash
python BLIP_CAM.py
```
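
Under the hood, captioning a frame follows the standard Hugging Face usage of Salesforce's blip-image-captioning-large. The following is an independent minimal sketch for a single image (a saved webcam frame, say), not an excerpt from BLIP_CAM.py:

```python
# Minimal single-image BLIP captioning via Hugging Face transformers.
# Independent sketch; BLIP_CAM.py applies the same model to live frames.
import torch
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-large"
).to(device)

image = Image.open("frame.jpg").convert("RGB")  # e.g. a frame saved with the S key
inputs = processor(images=image, return_tensors="pt").to(device)
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```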

💡 Use Cases

  • Accessibility Tools: Real-time scene description for visually impaired users
  • Content Analysis: Automated video content understanding and tagging
  • Smart Conferencing: Enhanced video calls with automatic scene descriptions
  • Educational Tools: Visual learning assistance and scene comprehension
  • Security Systems: Intelligent surveillance with scene description capabilities

🎮 Usage Controls

  • Press Q to quit the application
  • Press S to save the current frame with caption
  • Press P to pause/resume caption generation

🔧 Configuration

The application can be customized through the following parameters in config.py (a hypothetical example follows the list):

  • Frame processing resolution
  • Caption update frequency
  • GPU memory allocation
  • Model confidence threshold
  • Display preferences
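
The items above describe what is tunable rather than the literal key names; a hypothetical config.py covering the same knobs might look like this (illustrative names and values, not the repository's defaults):

```python
# Hypothetical config.py sketch; actual key names and defaults may differ.
FRAME_WIDTH = 640            # frame processing resolution
FRAME_HEIGHT = 480
CAPTION_EVERY_N_FRAMES = 15  # caption update frequency
GPU_MEMORY_FRACTION = 0.8    # cap on GPU memory allocation
MIN_CONFIDENCE = 0.5         # model confidence threshold
SHOW_METRICS_OVERLAY = True  # display preferences (FPS, latency, GPU memory)
FONT_SCALE = 0.7
```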

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Salesforce for the BLIP model
  • PyTorch team for the deep learning framework
  • Hugging Face for the transformers library

📧 Contact

For questions and support, please open an issue in the GitHub repository or reach out to the maintainers.


⭐ If you find this project useful, please consider giving it a star!

Demo video: blip-2025-01-02_09.online-video-cutter.com.2.1.mp4
