Extracting info from the H5 files #32
OK, perhaps I am onto something:
This gives:
My interpretation is that video zv0Jl4TIQDc has three intervals annotated with the relative weights of Ekman's basic emotions. Is that correct? If that is the case, what would be the mapping of the emotions? What is the highest possible value for a given emotion?
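To make the interpretation above concrete, here is a minimal sketch of that assumed layout using a toy H5 file built with h5py. The group path (`All Labels/data/<video_id>`), the dataset names (`features`, `intervals`), and all the numbers are illustrative assumptions, not the real CMU-MOSEI contents:

```python
import h5py
import numpy as np

# Build a toy file mimicking the interpretation above: one video with
# three annotated intervals, each label row holding a sentiment score
# followed by six emotion intensities. All values are invented.
with h5py.File("toy_labels.h5", "w") as f:
    grp = f.create_group("All Labels/data/zv0Jl4TIQDc")
    grp.create_dataset("features", data=np.array([
        [0.0, 0.6, 0.0, 0.6, 0.0, 0.0, 0.0],   # segment 0
        [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.3],   # segment 1
        [-2.0, 0.0, 1.3, 0.6, 0.3, 0.6, 0.0],  # segment 2
    ]))
    grp.create_dataset("intervals", data=np.array([
        [0.0, 4.2], [4.2, 9.7], [9.7, 15.1],   # start/end timestamps (s)
    ]))

# Read it back: one label row per annotated interval.
with h5py.File("toy_labels.h5", "r") as f:
    labels = f["All Labels/data/zv0Jl4TIQDc/features"][:]
    spans = f["All Labels/data/zv0Jl4TIQDc/intervals"][:]
    for (start, end), row in zip(spans, labels):
        print(f"{start:.1f}-{end:.1f}s sentiment={row[0]:+.1f} emotions={row[1:].tolist()}")
```

Under this reading, each video group yields an N x 2 `intervals` array and an N x 7 `features` array with matching rows.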
So column zero is the Likert score, and then the other columns would be, in this order, {happiness, sadness, anger, fear, disgust, surprise}?
The issue with this interpretation is that segment 0 above would have been labelled with happiness and anger in similar amounts...
Or is it (Anger, Disgust, Fear, Happy, Sad, Surprise) as in Table 3? Then it would be Anger and Fear, which is more consistent, but the sentiment would be slightly positive...
Checking the entries with the most negative and most positive sentiment, the order seems to be {happiness, sadness, anger, fear, disgust, surprise}.
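Under that conclusion, a small sketch of the column-to-emotion decoding (assuming column 0 is the sentiment/Likert score and columns 1-6 follow the {happiness, sadness, anger, fear, disgust, surprise} order inferred above; the function name and sample values are made up for illustration):

```python
# Column order inferred in the discussion above (an assumption,
# not confirmed by the dataset documentation).
EMOTIONS = ["happiness", "sadness", "anger", "fear", "disgust", "surprise"]

def decode_row(row):
    """Split one 7-value label row into (sentiment, {emotion: intensity})."""
    sentiment = row[0]
    intensities = dict(zip(EMOTIONS, row[1:]))
    return sentiment, intensities

# Hypothetical row: positive sentiment, mostly happiness plus some surprise.
sentiment, emo = decode_row([2.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.6])
print(sentiment, emo)
```

If the Table 3 order turned out to be correct instead, only the `EMOTIONS` list would need to change.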
I have forked MOSEI to build a unimodal SER dataset:
Hello,
I would be interested in training an audio-only model (or, perhaps, a bimodal audio-text one) using CMU-MOSEI data.
I would be recomputing the audio embeddings.
So I would need only the links to the videos plus the timestamps and the annotated emotions per timestamp range.
How would I go about extracting this information?
Thanks,
Ed