🦜 VideoChat Family: Ask-Anything

Open in OpenXLab
Open in Spaces [VideoChat-7B-8Bit] End2End ChatBOT for video and image. Open in Spaces [InternVideo2-Chat-8B-HD]

Chinese README and Chinese discussion group | Paper

⭐️: We are also working on an updated version, stay tuned!

🔥 Updates

  • 2024/06/25: We release a branch of VideoChat2 that uses vLLM to speed up inference.

  • 2024/06/19: 🎉🎉 Our VideoChat2 achieves the best performance among open-source VideoLLMs on MLVU, a multi-task long video understanding benchmark.

  • 2024/06/13: Fixed some bugs and added testing scripts.

  • 2024/06/07: 🔥🔥🔥 We release VideoChat2_HD, which is fine-tuned with high-resolution data and is capable of handling more diverse tasks. It showcases better performance on different benchmarks, especially for detailed captioning. Furthermore, it achieves 54.8% on Video-MME, the best score among 7B MLLMs. Have a try! 🏃🏻‍♀️🏃🏻

  • 2024/06/06: We release VideoChat2_phi3, a faster model with robust performance.

  • 2024/05/22: We release VideoChat2_mistral, which shows better capacity on diverse tasks (60.4% on MVBench, 78.6% on NExT-QA, 63.8% on STAR, 46.4% on TVQA, 54.4% on EgoSchema-full and 80.5% on IntentQA). More details have been updated in the paper.

  • 2024/04/05: MVBench is selected as a Poster (Highlight)!

  • 2024/02/27: MVBench is accepted by CVPR 2024.

  • 2023/11/29: VideoChat2 and MVBench are released.

  • 2023/05/11: End-to-end VideoChat and its technical report.

    • VideoChat1: Instruction tuning for video chatting (also supports images).
    • Paper: We present how we craft VideoChat with two versions (via text and embed) along with some discussions on its background, applications, and more.
  • 2023/04/25: Watch videos longer than one minute with ChatGPT

  • 2023/04/21: Chat with MOSS

  • 2023/04/20: Chat with StableLM

  • 2023/04/19: Code release & Online Demo

🔨 Getting Started

Build video chat with:


📄 Citation

If you find this project useful in your research, please consider citing:

@article{2023videochat,
  title={VideoChat: Chat-Centric Video Understanding},
  author={KunChang Li and Yinan He and Yi Wang and Yizhuo Li and Wenhai Wang and Ping Luo and Yali Wang and Limin Wang and Yu Qiao},
  journal={arXiv preprint arXiv:2305.06355},
  year={2023}
}

@inproceedings{li2024mvbench,
  title={{MVBench}: A comprehensive multi-modal video understanding benchmark},
  author={Li, Kunchang and Wang, Yali and He, Yinan and Li, Yizhuo and Wang, Yi and Liu, Yi and Wang, Zun and Xu, Jilan and Chen, Guo and Luo, Ping and others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={22195--22206},
  year={2024}
}

🌤️ Discussion Group

If you have any questions about trying, running, or deploying the project, or any ideas or suggestions for it, feel free to join our WeChat discussion group!


We are hiring researchers, engineers, and interns in the General Vision Group at Shanghai AI Lab. If you are interested in working with us, please contact Yi Wang ([email protected]).