A prototype for Vits 2 / Yourtts 2 #137

Marioando · 2024-11-07T06:34:22Z

Hi, here is my prototype for vits2. The text encoder is not conditioned on the speaker.

pivolan · 2024-11-07T20:29:31Z

Hi, do you have an example or a comparison between vits and vits2?

eginhard · 2024-11-08T07:37:58Z

Cool, thank you! I'll give some comments next week. Could you add an example training recipe, e.g. based on https://github.com/idiap/coqui-ai-TTS/blob/dev/recipes/ljspeech/vits_tts/train_vits.py ? And do you have some samples to share?

Marioando · 2024-11-08T11:02:08Z

The model is still training, but here are some samples:
vits2_audio_samples.zip.tar.gz
I trained using d-vectors. Also, the duration discriminator is conditioned on the speaker, which I think should be an improvement over the original vits2.

pivolan · 2024-11-08T14:28:57Z

The model is still training, but here are some samples: vits2_audio_samples.zip.tar.gz I trained using d-vectors. Also, the duration discriminator is conditioned on the speaker, which I think should be an improvement over the original vits2.

This is my example with vits v1, German, single language.
580_hier-ist-eine-typisc.mp3.zip

eginhard · 2024-11-11T09:21:13Z

Overall it looks good already, thanks. Where possible, could you reuse existing functions and classes? E.g. discriminator.py looks unchanged from the original Vits implementation, so you can just import that. I'll also check that at the end, but you might already know well which parts are the same and which are different.

Otherwise I'll at least need a training recipe for LJSpeech and some basic tests. Some were added here: https://github.com/coqui-ai/TTS/pull/3355/files

Marioando · 2024-11-11T09:53:36Z

Hi, I will add a recipe once I get good results from the model. For now, this prototype has the following issue, which has been slowing me down for several days. For vits1 training, accelerate cuts training time by a factor of 4. Unfortunately, I can't get it to work with this vits2 implementation. Here is the error message I get:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/trainer/trainer.py", line 1833, in fit
    self._fit()
  File "/opt/conda/lib/python3.10/site-packages/trainer/trainer.py", line 1785, in _fit
    self.train_epoch()
  File "/opt/conda/lib/python3.10/site-packages/trainer/trainer.py", line 1504, in train_epoch
    outputs, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/opt/conda/lib/python3.10/site-packages/trainer/trainer.py", line 1383, in train_step
    outputs, loss_dict_new, step_time = self.optimize(
  File "/opt/conda/lib/python3.10/site-packages/trainer/trainer.py", line 1251, in optimize
    grad_norm = self._compute_grad_norm(optimizer)
  File "/opt/conda/lib/python3.10/site-packages/trainer/trainer.py", line 1175, in _compute_grad_norm
    return torch.norm(torch.cat([param.grad.view(-1) for param in self.master_params(optimizer)], dim=0), p=2)
  File "/opt/conda/lib/python3.10/site-packages/trainer/trainer.py", line 1175, in <listcomp>
    return torch.norm(torch.cat([param.grad.view(-1) for param in self.master_params(optimizer)], dim=0), p=2)
AttributeError: 'NoneType' object has no attribute 'view'

My guess is that the gradients of some parameters are None when using accelerate. Training with trainer.distribute works fine, but is 2 times slower than accelerate, with half the batch size of accelerate. Any kind of help would be greatly appreciated.
Thank you!
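
One way to check the guess above is to list the trainable parameters whose `.grad` is still `None` after the backward pass, since the `param.grad.view(-1)` call in the traceback fails on exactly those. The following is only a minimal sketch, not part of the PR: the helper name `find_params_without_grad` and the parameter names in the demo are hypothetical, and stand-in objects are used so it runs without PyTorch. With a real model, the same helper could presumably be fed `model.named_parameters()` right after `loss.backward()`.

```python
from types import SimpleNamespace


def find_params_without_grad(named_params):
    """Return the names of trainable parameters whose .grad is None.

    `named_params` is any iterable of (name, param) pairs where each param
    exposes `.requires_grad` and `.grad`, as PyTorch parameters do.
    """
    return [
        name
        for name, param in named_params
        if getattr(param, "requires_grad", True) and param.grad is None
    ]


# Hypothetical stand-ins for model parameters (names are illustrative only).
demo_params = [
    ("text_encoder.weight", SimpleNamespace(requires_grad=True, grad=[0.1, 0.2])),
    ("duration_discriminator.bias", SimpleNamespace(requires_grad=True, grad=None)),
    ("frozen_embedding.weight", SimpleNamespace(requires_grad=False, grad=None)),
]

missing = find_params_without_grad(demo_params)
print(missing)  # ['duration_discriminator.bias']
```

Any name reported this way belongs to a parameter that took no part in the loss for that step, which would narrow down which module behaves differently under accelerate.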

Marioando added 7 commits November 7, 2024 09:14

Added dur disc vits2

09b3df2

Added vits2 modules

a15f2f7

Added vits2_config

df4e6fc

Added vits2

b24d653

updates losses for vits2

05be182

Updated vits2 layers

c0bf629

Updated vits2 model

aeba321

eginhard linked an issue Nov 8, 2024 that may be closed by this pull request

Add VITS 2 model #123

Open

Marioando changed the title ~~A prototype for vits 2~~ A prototype for vits 2 / Yourtts 2 Nov 8, 2024

Marioando changed the title ~~A prototype for vits 2 / Yourtts 2~~ A prototype for Vits 2 / Yourtts 2 Nov 8, 2024

eginhard mentioned this pull request Nov 9, 2024

Vits2 / Yourtts2 doesnt work with accelerate #148

Closed

eginhard marked this pull request as draft November 11, 2024 12:25
