Xinhao Li, Yi Wang, Jiashuo Yu, Xiangyu Zeng, Yuhan Zhu, Haian Huang, Jianfei Gao, Kunchang Li, Yinan He, Chenting Wang, Yu Qiao, Yali Wang, and Limin Wang
🤗 Model & Data | 🖥️ Demo | 📑 Paper | 🌐 Blog
- 2025/01/12: 🔥🔥🔥 Release VideoChat2-Flash, a powerful MLLM built on a video encoder (UMT) and an LLM (Qwen).
  - We offer three models: VideoChat2-Flash-2B@224, VideoChat2-Flash-7B@224, and VideoChat2-Flash-7B@448.
- Dataset and evaluation codes for single-hop and multi-hop needle-in-a-haystack;
- Dataset and training codes.
- 🚀 State-of-the-art performance in short and long video understanding, with temporal localization capabilities comparable to expert models.
- 🔭 Supports ultra-long video inputs, achieving a groundbreaking needle-in-a-haystack evaluation accuracy of 99.1% on 10,000 frames, and capable of processing videos up to three hours long.
- ⚡ Highly efficient model architecture with exceptional inference speed, encoding each video frame into just 16 tokens (see the token-budget sketch below), making it 5–10 times faster than the previous model.
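To put the 16-tokens-per-frame budget in perspective, here is a minimal, purely illustrative Python sketch of how many visual tokens a sampled clip contributes to the LLM context under that budget. It ignores text tokens and any further compression the model may apply, and the chosen frame counts are just examples.

```python
# Illustrative only: visual-token budget under the stated ~16 tokens per frame.
# Ignores text tokens and any additional context compression in the model.
TOKENS_PER_FRAME = 16  # per-frame budget quoted in the highlights above

def visual_tokens(num_frames: int, tokens_per_frame: int = TOKENS_PER_FRAME) -> int:
    """Total visual tokens contributed by a clip of `num_frames` sampled frames."""
    return num_frames * tokens_per_frame

for frames in (64, 512, 10_000):  # 10,000 frames matches the NIAH setting above
    print(f"{frames:>6} frames -> {visual_tokens(frames):>7} visual tokens")
```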
We modify lmms-eval to evaluate our models.
Stage | Num. frames | ViT | Connector | LLM | Shell |
---|---|---|---|---|---|
Stage-1 | 4 | ❄️ | 🔥 | ❄️ | TBD |
Stage-2 | 4-8 | 🔥 | 🔥 | 🔥 | TBD |
Stage-3 | 64-512 | 🔥 | 🔥 | 🔥 | TBD |
Stage-4 | 64-512 | 🔥 | 🔥 | ❄️ | TBD |
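In the table above, ❄️ marks modules kept frozen and 🔥 marks modules trained in that stage. Below is a minimal PyTorch-style sketch of how such stage-wise freezing is typically wired up; the module names (`vit`, `connector`, `llm`) and the `TRAINABLE` map are illustrative assumptions, not the project's actual training code.

```python
import torch.nn as nn

# Which sub-modules are trainable (🔥) per stage, following the table above.
# Module names are illustrative and do not mirror the repo's implementation.
TRAINABLE = {
    "stage1": {"connector"},
    "stage2": {"vit", "connector", "llm"},
    "stage3": {"vit", "connector", "llm"},
    "stage4": {"vit", "connector"},  # LLM stays frozen (❄️)
}

def configure_stage(model: nn.Module, stage: str) -> None:
    """Freeze or unfreeze the ViT, connector, and LLM according to the stage recipe."""
    for name in ("vit", "connector", "llm"):
        module = getattr(model, name)
        trainable = name in TRAINABLE[stage]
        for p in module.parameters():
            p.requires_grad = trainable
```

Only parameters left with `requires_grad=True` would then be handed to the optimizer for that stage.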
📊 NIAH
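The project's NIAH datasets and evaluation code are still to be released (see the TODO above), but the single-hop protocol itself can be summarized as: insert a "needle" (an out-of-context frame or caption) at varying depths of increasingly long videos and check whether the model retrieves it. The sketch below is a conceptual outline under hypothetical helpers `build_haystack`, `insert_needle`, and `ask_model`; it is not the project's evaluation code.

```python
def niah_accuracy(model, ask_model, build_haystack, insert_needle,
                  frame_counts=(1_000, 5_000, 10_000),
                  depths=(0.1, 0.5, 0.9), trials=20):
    """Single-hop NIAH: retrieval accuracy swept over haystack length and needle depth.

    `build_haystack`, `insert_needle`, and `ask_model` are hypothetical helpers
    supplied by the caller; they are not part of this repository.
    """
    results = {}
    for n_frames in frame_counts:
        for depth in depths:
            correct = 0
            for _ in range(trials):
                frames = build_haystack(n_frames)                    # distractor frames
                frames, question, answer = insert_needle(frames, depth)
                prediction = ask_model(model, frames, question)
                correct += int(answer.lower() in prediction.lower())
            results[(n_frames, depth)] = correct / trials
    return results
```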
If you find this project useful in your research, please consider citing:
@article{li2024videochat,
title={VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling},
author={Li, Xinhao and Wang, Yi and Yu, Jiashuo and Zeng, Xiangyu and Zhu, Yuhan and Huang, Haian and Gao, Jianfei and Li, Kunchang and He, Yinan and Wang, Chenting and Qiao, Yu and Wang, Yali and Wang, Limin},
journal={arXiv preprint arXiv:2501.00574},
year={2024}
}
Thanks to the following open-source projects: InternVideo, UMT, Qwen, LLaVA-VL, lmms-eval, Ask-Anything, ToMe, LongVLM, FastV, LLaVolta, PyramidDrop, and LongVA. Their implementations provided valuable references for our project.