LLaVA

Visual Instruction Tuning

Abstract

Instruction tuning large language models (LLMs) using machine-generated instruction-following data has improved zero-shot capabilities on new tasks, but the idea is less explored in the multimodal field. In this paper, we present the first attempt to use language-only GPT-4 to generate multimodal language-image instruction-following data. By instruction tuning on such generated data, we introduce LLaVA: Large Language and Vision Assistant, an end-to-end trained large multimodal model that connects a vision encoder and LLM for general-purpose visual and language understanding. Our early experiments show that LLaVA demonstrates impressive multimodal chat abilities, sometimes exhibiting the behaviors of multimodal GPT-4 on unseen images/instructions, and yields an 85.1% relative score compared with GPT-4 on a synthetic multimodal instruction-following dataset. When fine-tuned on Science QA, the synergy of LLaVA and GPT-4 achieves a new state-of-the-art accuracy of 92.53%. We make GPT-4 generated visual instruction tuning data, our model and code base publicly available.

How to use it?

Prepare the checkpoint

According to the license of LLaMA, we cannot provide the merged checkpoint directly. Please use the script below to download the weights and produce the merged checkpoint.

python tools/model_converters/llava-delta2mmpre.py huggyllama/llama-7b liuhaotian/LLaVA-Lightning-7B-delta-v1-1 ./LLaVA-Lightning-7B-delta-v1-1.pth
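
For readers unfamiliar with delta checkpoints, the idea can be illustrated with a small conceptual sketch. This is not the code of `tools/model_converters/llava-delta2mmpre.py`. It assumes that each delta tensor holds the element-wise difference from the base LLaMA weight of the same name, and that parameters found only in the delta are new and are copied as-is. The function name `merge_delta` and the toy parameter names are hypothetical.

```python
# Conceptual sketch only: NOT the actual conversion script.
# Assumption: merged weight = base weight + delta weight, element-wise.


def merge_delta(base, delta):
    """Return merged weights, where merged[name] = base[name] + delta[name].

    Both arguments map parameter names to flat lists of floats.
    Parameters that appear only in the delta are copied unchanged.
    """
    merged = {}
    for name, delta_values in delta.items():
        if name not in base:
            # No base counterpart: treat the delta entry as the full weight.
            merged[name] = list(delta_values)
            continue
        base_values = base[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"shape mismatch for {name!r}")
        merged[name] = [b + d for b, d in zip(base_values, delta_values)]
    return merged


# Toy weights with hypothetical names, chosen to be exact in floating point.
base = {"layer.weight": [1.0, 2.0, 3.0]}
delta = {"layer.weight": [0.5, -1.0, 0.0], "projector.weight": [0.25]}
merged = merge_delta(base, delta)
print(merged)
```

Under these assumptions, neither the base weights nor the delta alone is a usable model, which is why the merge step has to be run locally.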

Use the model

import torch
from mmpretrain import get_model, inference_model

# Replace MERGED_CHECKPOINT_PATH with the path of the merged checkpoint prepared above.
model = get_model('llava-7b-v1_caption', pretrained='MERGED_CHECKPOINT_PATH', device='cuda')
out = inference_model(model, 'demo/cat-dog.png')
print(out)
# {'pred_caption': 'In the image, there are two cats sitting on a blanket.'}
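
Loading a 7B model takes a while, so it can help to check the checkpoint path before calling `get_model`. The wrapper below is a minimal sketch of that idea. The helper name `caption_image` is hypothetical and not part of mmpretrain. It relies only on the `get_model` and `inference_model` calls and the `pred_caption` key shown above.

```python
from pathlib import Path


def caption_image(image_path, checkpoint_path, device="cuda"):
    """Caption one image, failing early if the merged checkpoint is missing."""
    checkpoint = Path(checkpoint_path)
    if not checkpoint.is_file():
        raise FileNotFoundError(
            f"merged checkpoint not found: {checkpoint}. "
            "Run the conversion script first."
        )
    # Imported lazily so the path check above runs even without mmpretrain installed.
    from mmpretrain import get_model, inference_model

    model = get_model("llava-7b-v1_caption", pretrained=str(checkpoint), device=device)
    result = inference_model(model, image_path)
    return result["pred_caption"]
```

Calling `caption_image('demo/cat-dog.png', 'MERGED_CHECKPOINT_PATH')` with a valid checkpoint path should return the caption string rather than the full result dictionary.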

Test Command

Prepare your dataset according to the docs.

Test:

python tools/test.py configs/llava/llava-7b-v1_caption.py MERGED_CHECKPOINT_PATH

Models and results

Image Caption on COCO

Model	Params (M)	BLEU-4	CIDER	Config	Download
`llava-7b-v1_caption`	7045.82	Upcoming	Upcoming	config	See the above tutorial

Citation

@misc{liu2023llava,
      title={Visual Instruction Tuning},
      author={Liu, Haotian and Li, Chunyuan and Wu, Qingyang and Lee, Yong Jae},
      publisher={arXiv:2304.08485},
      year={2023},
}
