I am a Master's student at HUST (Huazhong University of Science and Technology), supervised by Nong Sang.
Research-wise, I mainly focus on:
- Multi-modal Large Language Models
- Video Understanding, more specifically Weakly-supervised Temporal Action Localization (WSTAL) and Weakly-supervised Video Anomaly Detection (WSVAD).
I am open to:
- An internship, job, or PhD offer in computer vision or multimodal LLM research and engineering.
Contact me by:
- Email: [email protected]
News:
- 2024-07-01: We released the code and model for "Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM". [project page]
- 2024-06-10: We released the code and model for "Arcana: Improving Multi-modal Large Language Model through Boosting Vision Capabilities". [project page]
- 2024-01-29: I started my internship at Baidu VIS, doing research on Multi-modal Large Language Models (MLLMs).
- 2023-12-09: One paper on point-supervised temporal action localization was accepted to AAAI 2024.