Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poor Real-Time Performance of Whisper Models Fine-Tuned on Synthetic Data #198

Open
monk1337 opened this issue Dec 25, 2023 · 1 comment

Comments

@monk1337
Copy link

monk1337 commented Dec 25, 2023

Hi,

I have custom text data for plant disease names and plant names like this:

uuid, context 
1er1hhaj13, The Rhododendron, a popular ornamental plant, often suffers from Phytophthora ramorum, a challenging disease to manage and pronounce. This pathogen causes Sudden Oak Death, which can lead to extensive damage and mortality in infected plants.

I used speech-to-text APIs to convert this context into audio WAV files, choosing 10 speakers with mostly American/UK/British accents. So I created around ~5k samples for training and ~2k samples for testing.

I followed the same steps from "Fast whisper finetuning" to finetune the peft version of Whisper Large-v2. The training and validation loss looks good:

Step | Training Loss | Validation Loss
250 | 0.413000 | 0.102663
500 | 0.109900 | 0.130888
750 | 0.116500 | 0.102719
1000 | 0.092800 | 0.099153
1250 | 0.068800 | 0.075613 
1500 | 0.042500 | 0.085680
1750 | 0.047500 | 0.076951
2000 | 0.027500 | 0.065127
2250 | 0.023700 | 0.061832
2500 | 0.012500 | 0.062658
2750 | 0.011500 | 0.061922
3000 | 0.008500 | 0.061463
3250 | 0.005300 | 0.060227
3500 | 0.003800 | 0.060712
3750 | 0.002700 | 0.060332
4000 | 0.002300 | 0.060496

When I calculated WER on the test data:

  • OpenAI Whisper APIs: 22.03 WER on test data
  • Finetuned model: 0.3 WER on test data

Which looks good. However, during real-time testing with an Indian English-speaking audience, the accuracy for plant names and disease names was not satisfactory. What strategies could we employ to improve accuracy in real-time settings?
Any guidance or suggestions on this matter would be greatly appreciated. Thank you!

@monk1337
Copy link
Author

Hi @Vaibhavs10 @sanchit-gandhi Can you check this issue, please?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant