Pronunciation and Prosody Problems in xTTSv2 Fine-Tuned for Malayalam #242

rasheed-aidetic · 2025-01-03T10:36:00Z

Description
After fine-tuning xTTSv2 for Malayalam, I have observed the following issues:

Pronunciation Errors:

In certain words, the model mispronounces characters at random, for example pronouncing "ണ" as "ന".
Occasionally, the model skips letters in the middle of a word.

Random Sound Generation for Full Stops ("."):

The model sometimes produces an audible sound for a full stop. The behavior is inconsistent: it affects only some full stops, seemingly at random.

Unnatural Pausing:

The output audio often has unnatural pauses between words and sentences, which makes the speech less fluent.

Lack of Emotions:

Observations
Increasing the number of fine-tuning epochs makes the issues worse, particularly the generation of sounds for full stops.
The issues persist even after tuning the hyperparameters and improving data quality.
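
One way to narrow down the pronunciation and full-stop issues might be to inspect the training transcripts themselves. The following is a minimal sketch under the assumption that the text side of the dataset contributes to the problem, which has not been confirmed here. The function name `transcript_stats` and the statistics it collects are hypothetical and are not part of xTTSv2 or any library. It counts occurrences of "ണ" and "ന", counts full stops, and flags lines that are not NFC-normalized or that use the legacy chillu encoding (consonant + virama + zero-width joiner), since mixed encodings of the same sound could plausibly confuse a tokenizer.

```python
import unicodedata

NNA = "\u0D23"     # ണ (MALAYALAM LETTER NNA)
NA = "\u0D28"      # ന (MALAYALAM LETTER NA)
VIRAMA = "\u0D4D"  # ് (MALAYALAM SIGN VIRAMA)
ZWJ = "\u200D"     # zero-width joiner, used in legacy chillu sequences


def transcript_stats(transcripts):
    """Collect simple text statistics over an iterable of transcript strings."""
    stats = {
        "lines": 0,
        "nna": 0,                  # occurrences of ണ
        "na": 0,                   # occurrences of ന
        "full_stops": 0,           # occurrences of "."
        "ends_with_full_stop": 0,  # lines whose last character is "."
        "not_nfc": 0,              # lines that change under NFC normalization
        "legacy_chillu": 0,        # lines containing virama + ZWJ
    }
    for text in transcripts:
        line = text.strip()
        stats["lines"] += 1
        stats["nna"] += line.count(NNA)
        stats["na"] += line.count(NA)
        stats["full_stops"] += line.count(".")
        if line.endswith("."):
            stats["ends_with_full_stop"] += 1
        if unicodedata.normalize("NFC", line) != line:
            stats["not_nfc"] += 1
        if VIRAMA + ZWJ in line:
            stats["legacy_chillu"] += 1
    return stats


if __name__ == "__main__":
    # Two hypothetical transcript lines, for illustration only.
    sample = ["പണം വേണം.", "നന്ദി"]
    for key, value in transcript_stats(sample).items():
        print(f"{key}: {value}")
```

If "ണ" turns out to be much rarer than "ന" in the transcripts, or if only some lines end with a full stop, that would be worth mentioning alongside the observations above, although neither would prove the cause on its own.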

Request for Help
I am seeking advice or suggestions on:

Possible approaches to mitigate the above issues.
Specific adjustments to the model architecture, training process, or hyperparameters.
Recommendations for improving the prosody and emotional expression in the generated speech.

Any guidance or insights from the community would be greatly appreciated!
