Which vq-wav2vec checkpoint was used for data preprocessing? #10
Thanks for your interest in our work! We use the kmeans version of vq-wav2vec trained on LibriSpeech, provided by fairseq there, i.e. the second row of that table. I should make that clearer in the README though. Please tell me if that solves your problem :)
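For context on what those labels look like: the kmeans vq-wav2vec model quantizes each 10 ms frame with two codebook groups (320 entries each in the released checkpoint), so extraction yields a pair of indices per frame. A common way to turn each pair into a single discrete label is to fuse the two indices into one ID. This is a minimal sketch of that fusing step only; the function name and the fusing scheme are my own illustration, not code from this repository (the actual feature extraction goes through fairseq's `Wav2VecModel`):

```python
def fuse_vq_indices(idx_pairs, codebook_size=320):
    """Fuse (group0, group1) codebook index pairs into single frame labels.

    idx_pairs: iterable of (i0, i1) tuples, one per 10 ms frame, as produced
    by the quantizer of the kmeans vq-wav2vec model (2 groups x 320 entries).
    Returns one integer label per frame in the range [0, codebook_size**2).
    """
    return [g0 * codebook_size + g1 for g0, g1 in idx_pairs]


# Three hypothetical frames worth of quantizer output:
frames = [(0, 1), (5, 5), (319, 319)]
print(fuse_vq_indices(frames))  # [1, 1605, 102399]
```

Two utterances tokenized with different checkpoints (e.g. the gumbel vs. the kmeans variant) will generally disagree after this step, which is consistent with the label mismatches described in the question.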
Hey @cantabile-kwok, thank you for your response. Yep, it helped: almost all labels now match. I have one more question regarding the data preprocessing. What were the steps you took for getting the
If possible, can you share that code please? Thank you!
@bestasoff We are not using a pretrained MFA model; in fact, we first train a 10 ms frame-shift alignment model in Kaldi (I believe MFA is quite similar), and that gives us the phoneme transcription of each utterance (a.k.a. the SIL1:dur <= 3
But note that you don't necessarily have to obtain the same phone transcriptions as the provided ones. For this stage, any text preprocessing tool can be used; you only have to ensure the durations match the frame shift of the vq-wav2vec features (which is 10 ms).
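The duration-matching constraint above can be sketched concretely: per-phone durations from the aligner (in seconds) must be converted to integer counts of 10 ms frames, and the counts must sum to the number of vq-wav2vec frames for the utterance. The snippet below is my own illustration of one way to do that (cumulative rounding, so per-phone rounding errors never accumulate into a total-length mismatch); it is not the repository's actual script:

```python
def durations_to_frames(phone_durs, frame_shift=0.01):
    """Convert per-phone durations in seconds to integer frame counts.

    Rounds each cumulative phone *boundary* to the nearest frame, then takes
    differences, so the sum of the returned counts always equals the rounded
    total utterance length in frames (no drift from per-phone rounding).
    """
    frames, prev, total = [], 0, 0.0
    for dur in phone_durs:
        total += dur
        end = round(total / frame_shift)  # boundary index at 10 ms shift
        frames.append(end - prev)
        prev = end
    return frames


# Hypothetical alignment: three phones lasting 0.123 s, 0.054 s, 0.330 s.
counts = durations_to_frames([0.123, 0.054, 0.330])
print(counts, sum(counts))  # [12, 6, 33] 51
```

With this, a quick sanity check during preprocessing is simply `sum(counts) == num_vq_frames` for every utterance, whichever alignment tool (Kaldi, MFA, etc.) produced the durations.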
@cantabile-kwok Yep, I could do it with a pretrained MFA ARPA model. It's not very accurate (as I read, that's because of the ARPA phone format), but it works. Now I've faced another issue: I want to train the model on bigger datasets, so I need to make all the data preparation scripts work. All the scripts seem to work OK, but
It's because of the
Can you please help me resolve it? Thank you!
I haven't encountered this before, so could you show me more information about this error, like the whole log file?
@cantabile-kwok Yes, sure. Here is the whole log I get after running that command:
@bestasoff That is a bit weird, since the
Hello @cantabile-kwok! Thanks for this amazing project, and congratulations on the acceptance at AAAI.
I have a question. What vq-wav2vec checkpoint was used for tokenizing the speech data?
I'm reproducing the data preprocessing and found that some of the resulting labels for the same LibriTTS files do not match.
Thank you again for this project!