Cannot get correct translation from the model #7
Today I tried again with the following simple code to make sure everything follows the sample, with no other unknown factors.
And this is the transcribed result, for your reference.
@jingcodeguy thanks for the issue. I suspect it could be an issue with the VAD step before audio is sent to the model: the model may see very small chunks of audio, which can cause hallucination. @z-zawhtet-a anything to add here?
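For illustration, here is a minimal, hypothetical sketch of an energy-based VAD. It is not the project's actual VAD (real pipelines often use webrtcvad or silero-vad); it only shows how silence splitting can produce very short chunks, the kind Whisper is prone to hallucinate on:

```python
import numpy as np

SR = 16000  # Whisper-family models expect 16 kHz input

def energy_vad_chunks(audio, frame_ms=30, threshold=0.01):
    """Split audio into voiced chunks using a naive RMS-energy VAD.

    Hypothetical illustration only; thresholds and framing are arbitrary.
    """
    frame = int(SR * frame_ms / 1000)
    n = len(audio) // frame
    voiced = [np.sqrt(np.mean(audio[i * frame:(i + 1) * frame] ** 2)) > threshold
              for i in range(n)]
    chunks, start = [], None
    for i, v in enumerate(voiced):
        if v and start is None:
            start = i                       # voiced region begins
        elif not v and start is not None:
            chunks.append(audio[start * frame:i * frame])
            start = None                    # voiced region ends
    if start is not None:
        chunks.append(audio[start * frame:n * frame])
    return chunks

# 1 s of silence containing only a 0.2 s tone burst: the VAD emits a
# sub-second chunk, far shorter than Whisper's 30 s training window.
audio = np.zeros(SR, dtype=np.float32)
audio[8000:8000 + 3200] = 0.1 * np.sin(2 * np.pi * 440 * np.arange(3200) / SR)
chunks = energy_vad_chunks(audio)
print(len(chunks), len(chunks[0]) / SR)
```

A chunk this short gives the decoder almost no acoustic context, which is one plausible mechanism for the garbled Thai output reported above.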
@titipata Thanks for your feedback. I have also tried the original Whisper and whisper.cpp. Both generate sensible words most of the time. Because I am not a Thai expert, I cannot estimate the overall accuracy of those tools either.
Maybe it is from the audio sampling rate? Just guessing here.
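If the sampling rate is the suspect, it is easy to verify: Whisper-family models expect 16 kHz mono input, while audio extracted from video is often 44.1 kHz. A stdlib-only check (the file name and the synthesized stand-in clip are hypothetical):

```python
import math
import struct
import wave

def wav_sample_rate(path):
    """Return the sample rate of a WAV file."""
    with wave.open(path, "rb") as w:
        return w.getframerate()

# Stand-in clip: 1 s of a 440 Hz tone written at 44.1 kHz, mimicking
# audio extracted from a video without resampling.
with wave.open("clip.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(44100)
    w.writeframes(b"".join(
        struct.pack("<h", int(32767 * math.sin(2 * math.pi * 440 * t / 44100)))
        for t in range(44100)))

rate = wav_sample_rate("clip.wav")
print(rate)  # 44100: resample to 16000 before feeding the model
```

If the rate is not 16000, resampling (e.g. with librosa or torchaudio) before inference would rule this factor out.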
I have the following findings to share, for your reference, to help improve the model in the future.
The Hugging Face suggested way of using the model is what I used (the code in the previous comment). Based on observations 3 and 4: the whisper.cpp ggml-large-v3.bin model can recognize the children's voices without hallucinating or getting distracted. Attached are the sample sound and the result I produced for your research.
That's a cool finding! Let me ingest the information and probably think about the model a bit more later.
Hello!
Thanks for offering hope of Thai-language inference with better accuracy.
I have tried the following methods, but none could produce any meaningful words compared to the existing model.
I have tried
whisper-th-large-v3-combined
whisper-th-large-v3
whisper-th-medium-combined
respectively, in the following tools, e.g.
https://huggingface.co/biodatlab/whisper-th-large-v3-combined
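For reference, a sketch of how such a checkpoint is typically loaded via the transformers ASR pipeline, following the model card's suggested usage. This is an assumption about the setup, not the project's verified code; the import is deferred inside the function because the checkpoint download is several GB:

```python
def build_transcriber(model_id="biodatlab/whisper-th-large-v3-combined"):
    """Build an ASR pipeline following the model card's suggested usage.

    The transformers import is deferred so this module loads cheaply;
    the multi-GB checkpoint is only fetched when this is actually called.
    """
    from transformers import pipeline  # deferred heavy import
    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # Whisper's native 30-second window
    )

# Usage (not run here; downloads the checkpoint):
# asr = build_transcriber()
# print(asr("sample.wav", generate_kwargs={"language": "th"})["text"])
```

Forcing `language="th"` in `generate_kwargs` is worth checking when output is garbled, since wrong language detection on short clips is a common failure mode.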
System
The first thing I did was clone your project locally for a test.
Because the sample code does not output anything to the screen,
I streamed the results to a tkinter window to monitor the process, so I would not have to wait for the whole run to finish before seeing any result.
Sample audio from this video
https://www.tiktok.com/@minnimum111/video/7245259683211398406
Is there any procedure I have missed to use your model?