diff --git a/README.md b/README.md
index a55efa8b..7d30fb60 100644
--- a/README.md
+++ b/README.md
@@ -16,8 +16,9 @@ VILA is a visual language model (VLM) pretrained with interleaved image-text dat
 
 ## 💡 News
 
-- [2024/05] We release [AWQ](https://arxiv.org/pdf/2306.00978.pdf)-quantized 4bit VILA-1.5 models supported by [TinyChat](https://github.com/mit-han-lab/llm-awq/tree/main/tinychat) and [TensorRT-LLM](demo_trt_llm) backends.
-- [2024/05] We release VILA-1.5, which comes with four model sizes (3B/8B/13B/40B) and offers native support for multi-image and video understanding.
+- [2024/05] We release VILA-1.5, which offers **video understanding support** and comes with four model sizes (3B/8B/13B/40B).
+- [2024/05] We release [AWQ](https://arxiv.org/pdf/2306.00978.pdf)-quantized 4bit VILA-1.5 models supported by [TinyChat](https://github.com/mit-han-lab/llm-awq/tree/main/tinychat) and [TensorRT-LLM](demo_trt_llm) backends.
+- [2024/03] VILA has been accepted by CVPR 2024!
 - [2024/02] We release [AWQ](https://arxiv.org/pdf/2306.00978.pdf)-quantized 4bit VILA models, deployable on Jetson Orin and laptops through [TinyChat](https://github.com/mit-han-lab/llm-awq/tree/main/tinychat) and [TinyChatEngine](https://github.com/mit-han-lab/TinyChatEngine).
 - [2024/02] VILA is released. We propose interleaved image-text pretraining that enables multi-image VLM. VILA comes with impressive in-context learning capabilities. We open source everything: including training code, evaluation code, datasets, model ckpts.
 - [2023/12] [Paper](https://arxiv.org/abs/2312.07533) is on Arxiv!
@@ -224,7 +225,7 @@ python -W ignore llava/eval/run_vila.py \
     --model-path Efficient-Large-Model/VILA1.5-3b \
     --conv-mode vicuna_v1 \
     --query "
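
The second hunk is cut off mid-command in this excerpt (the `--query` string is truncated). For readability, here is a minimal sketch of what a complete `llava/eval/run_vila.py` invocation could look like; the query text and the `--image-file` argument are illustrative assumptions, since only `--model-path`, `--conv-mode`, and the start of `--query` appear in the diff itself.

```bash
# Minimal sketch of a single-image inference call with run_vila.py.
# The query text and the --image-file argument are illustrative assumptions;
# only --model-path, --conv-mode, and --query are visible in the hunk above.
python -W ignore llava/eval/run_vila.py \
    --model-path Efficient-Large-Model/VILA1.5-3b \
    --conv-mode vicuna_v1 \
    --query "<image> Please describe this image." \
    --image-file "demo.jpg"
```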