Question about frame_difference metric calculation #31
Hi, 'avoiding the bias' is meant to keep the model from relying on a simple behavior, e.g. always predicting after the ground-truth timestamp. The motivation is to give both cases an equal chance: (1) predicting in advance and (2) predicting afterwards. The case you provided is correct, and the problem does exist. Many thanks for the careful checking! Maybe we can keep the evaluation temporal boundary as the span between the last response and the next response, without the min() operation. BTW, do you have any other advice for improving this? I can update the results in a new arXiv version.
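A minimal Python sketch of one possible reading of the boundary proposed above: for the k-th ground-truth response, the model's response is searched for between the previous and the next ground-truth responses, with no min() cap. The function name, the "first response in the window" rule, and returning None on a miss are assumptions for illustration, not the code in modeling_live.py.

```python
from typing import Optional, Sequence


def window_frame_diff(
    pred_frames: Sequence[int],
    gt_frames: Sequence[int],
    k: int,
    num_frames: int,
) -> Optional[int]:
    """Signed frame difference for the k-th ground-truth response (hypothetical).

    Negative means the model answered early, positive means late.
    Returns None if the model did not answer anywhere in the window.
    """
    gt = gt_frames[k]
    # Window: from just after the previous ground-truth response
    # to just before the next one (or the last frame of the video).
    lo = gt_frames[k - 1] + 1 if k > 0 else 0
    hi = gt_frames[k + 1] - 1 if k + 1 < len(gt_frames) else num_frames - 1
    # Assumption: use the first response the model emits inside the window,
    # since that is the one a streaming user would see first.
    for p in sorted(pred_frames):
        if lo <= p <= hi:
            return p - gt
    return None


gt_frames = [2, 3, 6]  # "b appeared", "c appeared", "d appeared"
pred_frames = [2, 5]   # b answered on time, c answered two frames late
for k in range(len(gt_frames)):
    print(k, window_frame_diff(pred_frames, gt_frames, k, num_frames=7))
```

With the stream from the question, ground-truth responses fall on frames 2, 3, and 6. Because neighboring windows overlap, the late answer at frame 5 is matched both as a 2-frame delay for turn 1 and as a 1-frame early answer for turn 2; a real implementation would have to decide how to assign such predictions.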
Thank you for the reply! Another thought: I can't think of a scenario where an early response is useful. For example, as shown in Figure 4 of the paper, the user would ask something like "Remind me when a yellow card appears". An early response is useless there, since the yellow card hasn't appeared yet; only a response after the yellow card appears is useful. So I thought only just-in-time or late responses should be considered in the frame_diff calculation. But I'm not sure about this. If there is a scenario where an early response can also be useful, please let me know.
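The late-only alternative suggested above can be sketched as follows, assuming a single ground-truth frame per query and that the model's responses are given as frame indices. late_only_frame_diff and its max_delay parameter are hypothetical names; the point is only that responses before the ground truth are ignored rather than scored.

```python
from typing import Optional, Sequence


def late_only_frame_diff(
    pred_frames: Sequence[int],
    gt_frame: int,
    num_frames: int,
    max_delay: Optional[int] = None,
) -> Optional[int]:
    """Delay in frames of the first response at or after gt_frame (hypothetical).

    Responses before gt_frame are ignored, so an early answer earns nothing.
    max_delay=None lets the search run to the end of the video.
    Returns None if no qualifying response exists.
    """
    last = num_frames - 1
    if max_delay is not None:
        last = min(last, gt_frame + max_delay)
    for p in sorted(pred_frames):
        if gt_frame <= p <= last:
            return p - gt_frame
    return None


# "Remind me when a yellow card appears": the card is shown at frame 3.
print(late_only_frame_diff([1, 5], gt_frame=3, num_frames=7))            # 2 (frame 1 ignored)
print(late_only_frame_diff([1], gt_frame=3, num_frames=7))               # None
print(late_only_frame_diff([5], gt_frame=3, num_frames=7, max_delay=1))  # None
```

How a miss (None) should be penalized is left open here, since the thread does not settle it.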
Hi, many thanks for the discussion! I think limiting by However, I agree with you that we can evaluate the model's performance without min(). I will update the results here first and add a table to the supplement reporting that evaluation. For now, I think the metrics are still fair for comparing ablations, since all variants use the same evaluation. An early response can also be useful: for example, when a dangerous situation is about to occur, it is better for the model to warn in advance than to report with a delay.
Thank you so much for the discussion! I really appreciate your help.
Hi, thanks for the great work!
I have a question about the following line:
videollm-online/models/modeling_live.py, line 123 (commit 755e265)
This line caps frame_difference at a certain value. (The comment says 'avoiding the bias', but I don't quite understand it. I would really appreciate a more detailed explanation.)
Specifically, when the model fails to reply (i.e., outputs EOS) before the current turn ends, frame_difference is capped by:
next_turn_num_frames: this tests whether the model can reply before the next turn ends, not the turn after that or any later turn. The model may still be able to reply in later turns, so I don't get why the maximum is set by the next turn.
number of frames in current turn - 1: I don't understand this either. In the extreme case where the current turn has only 1 frame, it simply sets frame_diff to 0. For example:
[a] [a] [b] "b appeared" [c] "c appeared" [c] [c] [d] "d appeared"
where [k] is a frame with content k, and a response is shown in quotes.
Turn 1: [a] [a] [b] "b appeared" (num frames = 3)
Turn 2: [c] "c appeared" (num frames = 1)
Turn 3: [c] [c] [d] "d appeared" (num frames = 3)
Let's say we're currently in Turn 2. Then, even if the model fails to reply at the first (and only) frame [c], frame_diff becomes 0 because number of frames in current turn - 1 = 0. I think we have to consider whether the model can reply in the frames of Turn 3.