
Method of running evaluation on MSR-VTT dataset #58

Open
sartaki opened this issue Aug 6, 2024 · 0 comments


sartaki commented Aug 6, 2024

Thanks for the paper and for open-sourcing the codebase.

I would like to know how evaluation is performed on the MSR-VTT dataset for zero-shot text-to-video retrieval.

  • Are the metrics reported for MSR-VTT computed on the entire test split (~2,990 videos, 59,800 captions) or on the 1K-A subset (~1,000 videos, 20,000 captions)?
    • Section C in the Appendix mentions using the 1K-A subset for MSR-VTT. Is this split also used to report the results elsewhere?
  • Is each of the captions (20 per video) used as a query to perform retrieval and compute the recall metrics? (See the sketch after this list.)
  • Are retrieval errors being accounted for?
    • Since the captions are not very descriptive and similar videos/captions exist, how are such errors handled? For example, one caption for video7960 is "a band performing in a small club", but video8978 fits the same profile. Another caption for video7960 is "a group of boys and girls are dancing", but video9957 could also be considered correct if retrieved. I would be happy to provide more such examples.
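For context on the second and third bullets: the protocol I have commonly seen for MSR-VTT text-to-video retrieval treats every caption as an independent query and counts only its exact ground-truth video as a hit, which is also why the near-duplicate cases above end up scored as errors. Here is a minimal sketch of that assumed protocol (my reading of the common convention, not necessarily what this repo implements):

```python
import numpy as np

def recall_at_k(sim, gt_video_idx, ks=(1, 5, 10)):
    """Text-to-video Recall@K from a caption-by-video similarity matrix.

    sim:          (num_captions, num_videos) similarity scores.
    gt_video_idx: (num_captions,) index of the ground-truth video
                  for each caption query.
    """
    # Sort candidate videos by descending similarity for each caption,
    # then find the rank (0 = best) of the ground-truth video.
    order = np.argsort(-sim, axis=1)
    ranks = (order == gt_video_idx[:, None]).argmax(axis=1)
    # Only an exact ground-truth match counts as a hit; a visually
    # equivalent video (e.g. video8978 vs. video7960) scores as an error.
    return {f"R@{k}": float((ranks < k).mean() * 100) for k in ks}

# Toy example: 10 videos with 20 captions each (the MSR-VTT convention),
# giving 200 caption queries against 10 candidate videos.
rng = np.random.default_rng(0)
num_videos, caps_per_video = 10, 20
sim = rng.standard_normal((num_videos * caps_per_video, num_videos))
gt = np.repeat(np.arange(num_videos), caps_per_video)
print(recall_at_k(sim, gt))
```

Under this convention, evaluation on the 1K-A split would use either 1,000 or 20,000 caption queries against 1,000 candidate videos, depending on whether one caption or all 20 captions per video are kept, which is exactly what I am hoping you can clarify.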

Looking forward to your clarification. Thanks!
