[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Port of MiniGPT4 in C++ (4-bit, 5-bit, 6-bit, 8-bit, and 16-bit CPU inference with GGML)
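For readers unfamiliar with GGML's quantized types, the NumPy sketch below illustrates the idea behind block-wise 4-bit quantization, loosely in the spirit of GGML's Q4_0 layout (one scale per block of 32 weights plus a 4-bit code per weight); the function names and details are illustrative, not the port's actual code.

```python
import numpy as np

def quantize_4bit(x: np.ndarray, block_size: int = 32):
    """Illustrative block-wise 4-bit quantization, loosely following
    GGML's Q4_0 layout: one scale per block of 32 weights plus a
    4-bit code per weight. Not the actual GGML implementation."""
    x = x.reshape(-1, block_size)
    amax = np.abs(x).max(axis=1, keepdims=True)
    d = np.where(amax > 0, amax, 1.0) / -8.0                  # per-block scale
    q = np.clip(np.round(x / d) + 8, 0, 15).astype(np.uint8)  # codes in [0, 15]
    return d, q

def dequantize_4bit(d: np.ndarray, q: np.ndarray) -> np.ndarray:
    # Invert the mapping: code 8 decodes to zero.
    return ((q.astype(np.float32) - 8) * d).reshape(-1)

weights = np.random.randn(64).astype(np.float32)
d, q = quantize_4bit(weights)
print("max abs error:", np.abs(weights - dequantize_4bit(d, q)).max())
```

Lower bit-widths shrink the model and speed up CPU inference at the cost of a bounded per-block rounding error, which is why the port offers the 4- through 16-bit variants as a quality/size trade-off.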
Paddle Multimodal Integration and eXploration: supports mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrained models and a diffusion-model toolbox; designed for high performance and flexibility.
Chinese medical multimodal large model: a Large Chinese Language-and-Vision Assistant for BioMedicine
Simplified local Windows setup of MiniGPT-4 running in an Anaconda environment; includes an example local server and client.
Streamlines the creation of supervised datasets for data augmentation in image-captioning models. The framework builds on MiniGPT-4 together with the pre-trained 13-billion-parameter Vicuna model.
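As a sketch of such a dataset-building loop, the snippet below pairs each image with a model-generated caption and writes a JSON-lines file; `caption_image` is a hypothetical stand-in for the actual MiniGPT-4/Vicuna call, not this repository's API.

```python
import json
from pathlib import Path

def caption_image(image_path: str, prompt: str) -> str:
    """Hypothetical stand-in for a MiniGPT-4 + Vicuna-13B call;
    replace with the model's actual chat interface."""
    return f"placeholder caption for {Path(image_path).name}"

def build_caption_dataset(image_dir: str, out_file: str) -> None:
    # Pair every image with a generated caption and write one JSON
    # object per line, a common supervised-captioning format.
    with open(out_file, "w") as f:
        for img in sorted(Path(image_dir).glob("*.jpg")):
            caption = caption_image(str(img), "Describe this image in one sentence.")
            f.write(json.dumps({"image": img.name, "caption": caption}) + "\n")

if __name__ == "__main__":
    build_caption_dataset("images/", "captions.jsonl")
```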
Medical Report Generation And VQA (Adapting XrayGPT to Any Modality)
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
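The core idea behind MiniGPT-4 is to align a frozen visual encoder with a frozen LLM through a single trainable projection layer. The PyTorch sketch below shows that alignment step; the dimensions follow the BLIP-2 Q-Former and Vicuna-13B, but the class itself is an illustration, not the repository's code.

```python
import torch
import torch.nn as nn

class VisionToLLMProjector(nn.Module):
    """Sketch of MiniGPT-4's alignment idea: one trainable linear layer
    maps frozen Q-Former visual tokens into the frozen LLM's embedding
    space. Dimensions are illustrative (Q-Former: 768, Vicuna-13B: 5120)."""
    def __init__(self, vis_dim: int = 768, llm_dim: int = 5120):
        super().__init__()
        self.proj = nn.Linear(vis_dim, llm_dim)

    def forward(self, qformer_tokens: torch.Tensor) -> torch.Tensor:
        # (batch, 32 query tokens, vis_dim) -> (batch, 32, llm_dim);
        # the result is prepended to text embeddings and fed to the LLM.
        return self.proj(qformer_tokens)

tokens = torch.randn(1, 32, 768)             # mock Q-Former output
print(VisionToLLMProjector()(tokens).shape)  # torch.Size([1, 32, 5120])
```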
Implementation of the paper "InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4" (https://arxiv.org/abs/2308.12067)