Pronunciation and Prosody Problems in xTTSv2 Fine-Tuned for Malayalam #242
rasheed-aidetic started this conversation in General.
Description
After fine-tuning xTTSv2 for Malayalam, I have observed the following issues:
Pronunciation Errors:
For certain words, the model mispronounces characters. For example, it sometimes pronounces "ണ" as "ന", with no apparent pattern.
Occasionally, the model skips letters in the middle of some words.
Random Sound Generation for Full Stops ("."):
The model sometimes produces an audible sound for a full stop. This is inconsistent: it happens for some full stops but not for others, with no apparent pattern.
Unnatural Pausing:
The output audio often has unnatural pauses between words and sentences, leading to less fluent speech.
Lack of Emotions:
The generated speech lacks emotional expression.
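One thing worth ruling out is whether the training transcripts themselves explain some of these symptoms, for example if "ണ" is much rarer than "ന" in the data, or if full stops are used inconsistently (mid-transcript vs. only at the end). The sketch below is a hypothetical audit, not part of xTTSv2 or Coqui TTS: it assumes the transcripts have already been loaded as a list of strings (the metadata file path and format are not specified here). It only counts; it does not change the data.

```python
import unicodedata
from collections import Counter

RETROFLEX_NA = "\u0d23"  # ണ
DENTAL_NA = "\u0d28"     # ന

def audit(lines):
    """Count the two nasal letters and where full stops occur in transcripts."""
    chars = Counter()
    terminal_stops = 0  # transcripts whose last character is "."
    internal_stops = 0  # "." appearing before the last character
    for line in lines:
        # NFC-normalize so visually identical strings are counted the same way.
        text = unicodedata.normalize("NFC", line.strip())
        chars.update(text)
        if text.endswith("."):
            terminal_stops += 1
        internal_stops += text[:-1].count(".")
    return {
        "retroflex_na": chars[RETROFLEX_NA],
        "dental_na": chars[DENTAL_NA],
        "terminal_stops": terminal_stops,
        "internal_stops": internal_stops,
    }

# Hypothetical sample transcripts; replace with the real fine-tuning data.
sample = ["പണം.", "നാണം. മനം.", "കണ്ണ്"]
print(audit(sample))
```

A strong imbalance between the two letters, or many internal full stops whose audio has no matching pause, would be a hint (not proof) that the data contributes to the problem.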
Observations
Increasing the number of fine-tuning epochs makes these issues worse, particularly the sounds generated for full stops.
Despite tuning the hyperparameters and improving data quality, the issues persist.
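Since longer training makes things worse, one generic mitigation (not specific to xTTSv2) is patience-based early stopping: keep the checkpoint with the lowest held-out loss and stop once it has not improved for a few epochs. The sketch below assumes a per-epoch evaluation loss is available as a list of floats; `select_checkpoint` and `patience` are hypothetical names. A held-out loss may not capture artifacts such as full-stop sounds, so listening to a fixed set of test sentences per checkpoint would still be needed.

```python
def select_checkpoint(eval_losses, patience=3):
    """Return (best_epoch, stop_epoch) using patience-based early stopping.

    eval_losses[i] is the held-out loss measured after epoch i (0-indexed).
    """
    if not eval_losses:
        raise ValueError("eval_losses must not be empty")
    best_epoch = 0
    best_loss = float("inf")
    for epoch, loss in enumerate(eval_losses):
        if loss < best_loss:
            # New best checkpoint: reset the patience window.
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop here.
            return best_epoch, epoch
    return best_epoch, len(eval_losses) - 1

losses = [2.0, 1.5, 1.4, 1.45, 1.5, 1.6, 1.3]
best, stopped = select_checkpoint(losses, patience=3)
print(f"keep epoch {best}, stop after epoch {stopped}")  # keep epoch 2, stop after epoch 5
```

Note that the later improvement at epoch 6 is never reached; a larger `patience` trades extra training time for less risk of stopping too early.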
Request for Help
I am seeking advice or suggestions on:
Possible approaches to mitigate the above issues.
Specific adjustments to the model architecture, training process, or hyperparameters.
Recommendations for improving the prosody and emotional expression in the generated speech.
Any guidance or insights from the community would be greatly appreciated!