VSR performance lower on MuAViC version of LRS3 (En) #13

Open
roudimit opened this issue Nov 13, 2023 · 2 comments

Comments

roudimit commented Nov 13, 2023

Hi, thanks for your nice work! I preprocessed the MuAViC dataset according to the instructions. I already had LRS3 processed according to the AV-HuBERT instructions, so I wanted to test if a pre-trained model would get the same performance on both the AV-HuBERT dataset version and the MuAViC version of LRS3.

I first tried ckpt=large_noise_pt_noise_ft_433h.pt from AV-HuBERT, and ran this command:

python -B infer_s2s.py --config-dir ./conf/ --config-name s2s_decode.yaml \
  dataset.gen_subset=test common_eval.path=${ckpts_dir}/${ckpt} \
  common_eval.results_path=${exp_dir}/av-hubert/decode/s2s/test \
  override.modalities="['audio','video']" override.data=${lrs3_dir}/30h_data override.label_dir=${lrs3_dir}/30h_data common.user_dir=`pwd`

Using the AV-HuBERT version of LRS3:

  • 433h audio-visual: 1.486
  • 433h audio-only: 1.951
  • 433h video-only: 34.135

Using the MuAViC version of LRS3:

  • 433h audio-visual: 1.496 (slightly worse)
  • 433h audio-only: 1.951 (the same)
  • 433h video-only: 35.995 (noticeably worse)

It seems that the AV-HuBERT checkpoint performs worse on the MuAViC version of the data whenever video is involved.

I also tried running the MuAViC decoding script using the MuAViC English checkpoint on the MuAViC version of LRS3 and got the following performance:

  • 433h audio-visual: 2.1941
  • 433h audio-only: 3.22
  • 433h video-only: 35.995

Then I tried the MuAViC decoding script, MuAViC English checkpoint, and the AV-HuBERT LRS3 dataset version:

  • 433h audio-visual: 2.153 (slightly better)
  • 433h audio-only: 3.225 (the same)
  • 433h video-only: 34.459 (noticeably better)

The MuAViC checkpoint also gets better performance on the AV-HuBERT version of LRS3, which is kind of surprising. In both cases (AV-HuBERT checkpoint or MuAViC checkpoint), the audio-only performance stays the same across the two dataset versions.
I have also tried this with the other AV-HuBERT checkpoints and the conclusion is the same (the gap was also more noticeable for the base models).
I wonder if MuAViC processed the LRS3 video differently from AV-HuBERT, which would explain the difference in performance?
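
For reference, the gaps between the two dataset versions can be tabulated with a short Python sketch. It only re-uses the numbers reported above and assumes they are word error rates in percent (lower is better); the dictionary layout and the helper name `gap` are illustrative and not part of either codebase.

```python
# Numbers copied from the results above, assumed to be WER in percent.
# Keys: (checkpoint, modality); values: score on each LRS3 version.
wer = {
    ("AV-HuBERT", "audio-visual"): {"av_hubert_lrs3": 1.486, "muavic_lrs3": 1.496},
    ("AV-HuBERT", "audio-only"): {"av_hubert_lrs3": 1.951, "muavic_lrs3": 1.951},
    ("AV-HuBERT", "video-only"): {"av_hubert_lrs3": 34.135, "muavic_lrs3": 35.995},
    ("MuAViC", "audio-visual"): {"av_hubert_lrs3": 2.153, "muavic_lrs3": 2.1941},
    ("MuAViC", "audio-only"): {"av_hubert_lrs3": 3.225, "muavic_lrs3": 3.22},
    ("MuAViC", "video-only"): {"av_hubert_lrs3": 34.459, "muavic_lrs3": 35.995},
}


def gap(row):
    """Return (absolute, relative %) gap of the MuAViC version vs. the AV-HuBERT version."""
    abs_gap = row["muavic_lrs3"] - row["av_hubert_lrs3"]
    rel_gap = 100.0 * abs_gap / row["av_hubert_lrs3"]
    return abs_gap, rel_gap


for (ckpt, modality), row in wer.items():
    abs_gap, rel_gap = gap(row)
    print(f"{ckpt:10s} {modality:13s} abs={abs_gap:+.3f} rel={rel_gap:+.2f}%")
```

A positive gap means the MuAViC version of LRS3 scores worse than the AV-HuBERT version for that checkpoint and modality.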

Contributor

Anwarvic commented Jan 5, 2024

Hi @roudimit,

Thank you so much for raising this issue and so sorry for the late reply!

To be honest, I never tested our checkpoints on VSR since it was out of scope! However, looking at the video processing code for MuAViC and AV-HuBERT, I can see a few differences:

  • How frames are extracted from the video: AV-HuBERT does this on the fly, while MuAViC does it beforehand.
  • How the video is saved: both use ffmpeg, but a bit differently.

These are the only differences that I could find! Hope this helps.

Author

roudimit commented Jan 5, 2024

Thanks @Anwarvic for the pointers! I tested the video loading and the video saving. The loading functions from MuAViC and AV-HuBERT load the video identically. However, the saving with ffmpeg differs: AV-HuBERT specifies '-crf', '20', while MuAViC uses the default (I believe crf=23), which means the video frames from MuAViC are more compressed. A link for more details: https://stackoverflow.com/questions/64011346/ffmpeg-quality-conversion-options-video-compression
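
To make the CRF difference concrete, here is a minimal Python sketch that builds two ffmpeg command lines, one with an explicit `-crf 20` and one that leaves the encoder default in place. It is an illustration only, not the actual saving code from either repository: the function name `build_ffmpeg_cmd`, the file names, and the choice of the libx264 encoder are assumptions.

```python
import shlex


def build_ffmpeg_cmd(src, dst, crf=None):
    """Build an ffmpeg argv list that re-encodes src to dst with libx264.

    If crf is None, no -crf flag is passed and the encoder default applies
    (23 for libx264). A lower CRF means less compression and higher quality.
    """
    cmd = ["ffmpeg", "-y", "-i", src, "-c:v", "libx264"]
    if crf is not None:
        cmd += ["-crf", str(crf)]
    cmd.append(dst)
    return cmd


# Hypothetical file names, for illustration only.
explicit_crf = build_ffmpeg_cmd("roi_frames.mp4", "roi_crf20.mp4", crf=20)
default_crf = build_ffmpeg_cmd("roi_frames.mp4", "roi_default.mp4")

print(shlex.join(explicit_crf))
print(shlex.join(default_crf))
```

Passing either list to `subprocess.run` would perform the encode. Re-saving the MuAViC mouth crops with `crf=20` and decoding again would be one way to check whether compression alone accounts for the video-only gap.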

I'm going to leave this issue open so that others are aware of the difference in video processing between the two.
