clunch: 9/23
zhudotexe committed Sep 19, 2024
1 parent 0be481f commit 2c832a8
Showing 1 changed file with 7 additions and 0 deletions.
_data/future_clunch.yml: 7 additions & 0 deletions
@@ -83,6 +83,13 @@
   title: TBD
   abstract: TBD
 
+- speaker: Reno Kriz
+  url: https://hltcoe.jhu.edu/researcher/reno-kriz/
+  affiliation: Johns Hopkins University Human Language Technology Center of Excellence (HLTCOE)
+  date: September 23, 2024
+  title: "Takeaways from the SCALE 2024 Workshop on Video-based Event Retrieval"
+  abstract: "Information dissemination for current events has traditionally consisted of professionally collected and produced materials, leading to large collections of well-written news articles and high-quality videos. As a result, most prior work in event analysis and retrieval has focused on leveraging this traditional news content, particularly in English. However, much of the event-centric content today is generated by non-professionals, such as on-the-scene witnesses to events who hastily capture videos and upload them to the internet without further editing; these are challenging to find due to quality variance, as well as a lack of text or speech overlays providing clear descriptions of what is occurring. To address this gap, SCALE 2024, a 10-week research workshop hosted at the Human Language Technology Center of Excellence (HLTCOE), focused on multilingual event-centric video retrieval, or the task of finding relevant videos about specific current events. Around 50 researchers and students participated in this workshop and were split up into five sub-teams. The Infrastructure team focused on developing MultiVENT 2.0, a challenging video retrieval dataset consisting of 20x more videos than prior work and targeted queries about specific world events across six languages. Other teams worked on improving models from specific modalities, specifically Vision, Optical Character Recognition (OCR), Audio, and Text. Overall, we came away with three primary findings: extracting specific text from a video allows us to take better advantage of powerful methods from the text information retrieval community; LLM summarization of initial text outputs from videos is helpful, especially for noisy text coming from OCR; and no one modality is sufficient, with fusing outputs from all modalities resulting in significantly higher performance."
+
 - speaker: Ajay Patel
   url: https://ajayp.app/
   affiliation: University of Pennsylvania
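For context on how this data file is consumed: the `_data/` directory is the Jekyll convention for site data, so entries like the one added above are normally rendered into the schedule page by a template at build time. Outside the site build, the same file can be read with any YAML parser. A minimal Python sketch, assuming PyYAML is installed; the field names come from the diff above, everything else is illustrative:

    import yaml  # PyYAML; assumed available

    # Load the talk schedule, including the entry added in this commit.
    with open("_data/future_clunch.yml") as f:
        talks = yaml.safe_load(f)

    # Print each upcoming talk using the fields seen in the diff.
    for talk in talks:
        print(f"{talk['date']}: {talk['speaker']} ({talk['affiliation']})")
        print(f"  {talk['title']}")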
