VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling

OpenGVLab/VideoChat-Flash

Xinhao Li, Yi Wang, Jiashuo Yu, Xiangyu Zeng, Yuhan Zhu, Haian Huang, Jianfei Gao, Kunchang Li, Yinan He, Chenting Wang, Yu Qiao, Yali Wang, and Limin Wang

🤗 Model & Data    |   🖥️ Demo    |    📑 Paper    |    🌐 Blog

🔥 Updates

🦜 Introduction

🚀 State-of-the-art performance in short and long video understanding, with temporal localization capabilities comparable to expert models.

🔭 Supports ultra-long video inputs, achieving a groundbreaking needle-in-a-haystack evaluation accuracy of 99.1% on 10,000 frames, capable of processing videos up to three hours long.

⚡ Highly efficient model architecture with exceptional inference speed, encoding each video frame into just 16 tokens, making it 5–10 times faster than the previous model.
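To make the 16-tokens-per-frame figure concrete, here is a minimal sketch of the visual token budget it implies. The helper names and the 1 fps sampling rate are assumptions for illustration only; they are not part of the VideoChat-Flash codebase, and the actual frame sampling strategy may differ.

```python
# Back-of-the-envelope token budget, assuming every frame costs a fixed
# number of visual tokens (16, as stated in the introduction).
TOKENS_PER_FRAME = 16


def visual_token_count(num_frames: int, tokens_per_frame: int = TOKENS_PER_FRAME) -> int:
    """Total visual tokens for a clip of `num_frames` frames."""
    if num_frames < 0 or tokens_per_frame < 0:
        raise ValueError("frame and token counts must be non-negative")
    return num_frames * tokens_per_frame


def frames_for_duration(seconds: float, fps: float) -> int:
    """Number of sampled frames for a clip (hypothetical uniform sampling)."""
    if seconds < 0 or fps < 0:
        raise ValueError("duration and fps must be non-negative")
    return int(seconds * fps)


if __name__ == "__main__":
    # The 10,000-frame needle-in-a-haystack setting.
    print(visual_token_count(10_000))  # 160000

    # A three-hour video sampled at an assumed 1 frame per second.
    frames = frames_for_duration(3 * 3600, fps=1.0)
    print(frames, visual_token_count(frames))  # 10800 172800
```

Under these assumptions, a 10,000-frame input corresponds to 160,000 visual tokens before any further compression inside the LLM.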

Demo & Inference

Evaluation

We use a modified version of lmms-eval for evaluation ..

Training

| Stage | Num. frames | ViT | Connector | LLM | Shell |
|---|---|---|---|---|---|
| Stage-1 | 4 | ❄️ | 🔥 | ❄️ | TBD |
| Stage-2 | 4-8 | 🔥 | 🔥 | 🔥 | TBD |
| Stage-3 | 64-512 | 🔥 | 🔥 | 🔥 | TBD |
| Stage-4 | 64-512 | 🔥 | 🔥 | ❄️ | TBD |
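
The training schedule above can be written down as a small configuration object. This is only a sketch: it assumes 🔥 means the component is trained and ❄️ means it is frozen, and the `StageConfig` class and its field names are hypothetical, not taken from the repository's training scripts (which are still listed as TBD).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    """One row of the training table (hypothetical representation)."""
    name: str
    min_frames: int
    max_frames: int
    train_vit: bool        # assumed: 🔥 = trainable, ❄️ = frozen
    train_connector: bool
    train_llm: bool

    def trainable(self) -> list[str]:
        """Names of the components updated in this stage."""
        flags = {
            "vit": self.train_vit,
            "connector": self.train_connector,
            "llm": self.train_llm,
        }
        return [component for component, on in flags.items() if on]


STAGES = [
    StageConfig("Stage-1", 4, 4, train_vit=False, train_connector=True, train_llm=False),
    StageConfig("Stage-2", 4, 8, train_vit=True, train_connector=True, train_llm=True),
    StageConfig("Stage-3", 64, 512, train_vit=True, train_connector=True, train_llm=True),
    StageConfig("Stage-4", 64, 512, train_vit=True, train_connector=True, train_llm=False),
]

if __name__ == "__main__":
    for stage in STAGES:
        print(stage.name, f"{stage.min_frames}-{stage.max_frames} frames", stage.trainable())
```

In a real training loop, the list returned by `trainable()` would typically decide which parameter groups have gradients enabled; how the repository does this is not specified here.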

📊 NIAH

[Figure: needle-in-a-haystack (NIAH) evaluation results]

📄 Citation

If you find this project useful in your research, please consider citing:

@article{li2024videochat,
  title={VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling},
  author={Li, Xinhao and Wang, Yi and Yu, Jiashuo and Zeng, Xiangyu and Zhu, Yuhan and Huang, Haian and Gao, Jianfei and Li, Kunchang and He, Yinan and Wang, Chenting and Qiao, Yu and Wang, Yali and Wang, Limin},
  journal={arXiv preprint arXiv:2501.00574},
  year={2024}
}

💫 Acknowledgement

Thanks to the following open-source projects: InternVideo, UMT, Qwen, LLaVA-VL, lmms-eval, Ask-Anything, ToMe, LongVLM, FastV, LLaVolta, PyramidDrop, and LongVA. Their implementations provided valuable references for our project.
