Fine-tuning an English checkpoint (ckpt) using a Chinese speech dataset #613

Open
JeffTSaoO opened this issue Sep 25, 2024 · 1 comment

@JeffTSaoO

Hi,
I have 85 hours of Chinese speech recorded at 44,100 Hz, which I am using to fine-tune the en-us/lessac/medium .ckpt, but the results are not good.
My loss_gen_all looks very high, while loss_disc_all looks normal.

Questions:

Sample Rate Conversion: Is it advisable to convert the sample rate from 44,100 Hz to 22,050 Hz before fine-tuning? Could the sample-rate mismatch be contributing to the high loss_gen_all?
Language Adaptation: Since I am fine-tuning an English model with Chinese data, are there specific configurations or adjustments you recommend to improve performance?
Model Compatibility: Are there any known issues or limitations when fine-tuning the en-us/lessac/medium.ckpt model with a non-English dataset?
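On the sample-rate question: as far as I know, Piper's "medium" voices (including en-us/lessac/medium) were trained at 22,050 Hz, so it seems sensible to bring the fine-tuning data to the checkpoint's rate. Below is a minimal, stdlib-only Python sketch of what 2:1 downsampling involves: a Hamming-windowed sinc low-pass filter below the new Nyquist frequency, then keeping every second sample. It assumes 16-bit mono WAV input; the function names, the 63-tap length, and the 0.225 cutoff are illustrative choices, not anything taken from Piper. Pure Python is far too slow for 85 hours of audio, so for real data a dedicated resampler such as SoX or ffmpeg is the practical route; the sketch only shows what that step does.

```python
import array
import math
import sys
import wave


def lowpass_taps(num_taps: int, cutoff: float) -> list[float]:
    """Hamming-windowed sinc low-pass taps.

    cutoff is a fraction of the input sample rate (0 < cutoff < 0.5).
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response; the x == 0 case is the sinc limit.
        h = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        # Hamming window reduces ripple from truncating the sinc.
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(h * w)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at DC


def decimate_by_2(samples: list[int], num_taps: int = 63) -> list[int]:
    """Low-pass below the new Nyquist frequency, then keep every 2nd sample."""
    # 0.225 * 44100 Hz is about 9.9 kHz, under the new 11.025 kHz Nyquist limit.
    taps = lowpass_taps(num_taps, cutoff=0.225)
    half = num_taps // 2
    out = []
    for i in range(0, len(samples), 2):  # only compute the samples we keep
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + k - half  # centre the filter on input sample i
            if 0 <= j < len(samples):  # treat samples outside the signal as zero
                acc += t * samples[j]
        out.append(max(-32768, min(32767, round(acc))))  # clip to int16
    return out


def resample_wav_44k_to_22k(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: 16-bit mono 44.1 kHz WAV in, 22.05 kHz WAV out."""
    with wave.open(src_path, "rb") as src:
        if (src.getframerate(), src.getsampwidth(), src.getnchannels()) != (44100, 2, 1):
            raise ValueError("expected 16-bit mono 44100 Hz WAV")
        pcm = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV PCM data is little-endian
    out = array.array("h", decimate_by_2(pcm.tolist()))
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(22050)
        dst.writeframes(out.tobytes())
```

Per file this would be called as `resample_wav_44k_to_22k("in.wav", "out.wav")`. Skipping the low-pass and just dropping every other sample would fold content above 11.025 Hz back into the audible band as aliasing, which is why the filter comes first.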

Any guidance or suggestions you could provide would be greatly appreciated.

Thank you for your time and assistance.

[Attached image]

@Kracozebr

What config are you using? Maybe you are using English phonemes instead of Chinese ones?
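One way to check the reply's suggestion is to inspect the config used for training. The sketch below assumes a Piper-style JSON layout with `"espeak": {"voice": ...}` and `"audio": {"sample_rate": ...}`, based on the layout of Piper's published voice config files; verify the key names against your own config.json. The expected voice `"cmn"` is an assumption (espeak-ng's identifier for Mandarin in recent versions); confirm it with `espeak-ng --voices`. The function names are hypothetical.

```python
import json


def check_voice_config(config: dict, expected_voice: str, expected_rate: int = 22050) -> list[str]:
    """Return warnings for a Piper-style config dict.

    Assumes the keys "espeak" -> "voice" and "audio" -> "sample_rate";
    adjust if your config uses a different layout.
    """
    warnings = []
    voice = config.get("espeak", {}).get("voice")
    if voice != expected_voice:
        warnings.append(f"espeak voice is {voice!r}, expected {expected_voice!r}")
    rate = config.get("audio", {}).get("sample_rate")
    if rate != expected_rate:
        warnings.append(f"sample_rate is {rate!r}, expected {expected_rate}")
    return warnings


def check_config_file(path: str, expected_voice: str) -> list[str]:
    """Load a JSON config from disk and run the checks above."""
    with open(path, encoding="utf-8") as f:
        return check_voice_config(json.load(f), expected_voice)


if __name__ == "__main__":
    # An English phonemizer setting left over from the base checkpoint.
    config = {"audio": {"sample_rate": 22050}, "espeak": {"voice": "en-us"}}
    print(check_voice_config(config, expected_voice="cmn"))
    # prints: ["espeak voice is 'en-us', expected 'cmn'"]
```

If the voice still reads `en-us`, the Chinese transcripts would be phonemized with English rules, which could plausibly explain poor output even when training appears to converge.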
