Fine-tuning an English checkpoint (ckpt) using a Chinese speech dataset #613

JeffTSaoO · 2024-09-25T07:17:01Z

Hi,
I have 85 hours of Chinese speech audio at 44,100 Hz that I am using to fine-tune the en-us/lessac/medium .ckpt, but the results are not good.
My loss_gen_all looks very high, while loss_disc_all looks normal.

Questions:

Sample Rate Conversion: Is it advisable to convert the sample rate from 44,100 Hz to 22,050 Hz before fine-tuning? Could this conversion be contributing to the high loss_gen_all?
Language Adaptation: Since I am fine-tuning an English model with Chinese data, are there specific configurations or adjustments you recommend to improve performance?
Model Compatibility: Are there any known issues or limitations when fine-tuning the en-us/lessac/medium.ckpt model with a non-English dataset?

Any guidance or suggestions you could provide would be greatly appreciated.

Thank you for your time and assistance.

Kracozebr · 2024-11-14T13:26:24Z

What config are you using? Maybe you are using English phonemes instead of Chinese ones?
