A prototype for Vits 2 / Yourtts 2 #137

Draft: Marioando wants to merge 7 commits into dev

Conversation

Marioando

Hi, here is my prototype for VITS2. The text encoder is not conditioned on the speaker.

@pivolan

pivolan commented Nov 7, 2024

Hi, do you have an example of, or a comparison between, VITS and VITS2?

@eginhard
Member

eginhard commented Nov 8, 2024

Cool, thank you! I'll give some comments next week. Could you add an example training recipe, e.g. based on https://github.com/idiap/coqui-ai-TTS/blob/dev/recipes/ljspeech/vits_tts/train_vits.py ? And do you have some samples to share?

@Marioando
Author

Marioando commented Nov 8, 2024

The model is still training, but here are some samples:
vits2_audio_samples.zip.tar.gz
I trained using d-vectors. Also, the duration discriminator is conditioned on the speaker, which I think should be an improvement over the original VITS2.

@pivolan

pivolan commented Nov 8, 2024

> The model is still training, but here are some samples: vits2_audio_samples.zip.tar.gz I trained using d-vectors. Also, the duration discriminator is conditioned on the speaker, which I think should be an improvement over the original VITS2.

This is my example from VITS v1, a single-language German model.
580_hier-ist-eine-typisc.mp3.zip

@eginhard eginhard linked an issue Nov 8, 2024 that may be closed by this pull request
@Marioando Marioando changed the title from "A prototype for vits 2" to "A prototype for vits 2 / Yourtts 2" Nov 8, 2024
@Marioando Marioando changed the title from "A prototype for vits 2 / Yourtts 2" to "A prototype for Vits 2 / Yourtts 2" Nov 8, 2024
@eginhard
Member

Overall it looks good already, thanks. Where possible, could you reuse existing functions and classes? E.g. discriminator.py looks unchanged from the original Vits implementation, so you can just import that. I'll also check that at the end, but you might already know well which parts are the same and which are different.

Otherwise I'll at least need a training recipe for LJSpeech and some basic tests. Some were added here: https://github.com/coqui-ai/TTS/pull/3355/files

@Marioando
Author

Hi, I will add a recipe once I get good results from the model. For now, this prototype has the following issue, which has been slowing me down for several days. For VITS1 training, accelerate divides training time by 4. Unfortunately, I can't get it to work with this VITS2 implementation. Here is the error message I get:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/trainer/trainer.py", line 1833, in fit
    self._fit()
  File "/opt/conda/lib/python3.10/site-packages/trainer/trainer.py", line 1785, in _fit
    self.train_epoch()
  File "/opt/conda/lib/python3.10/site-packages/trainer/trainer.py", line 1504, in train_epoch
    outputs, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/opt/conda/lib/python3.10/site-packages/trainer/trainer.py", line 1383, in train_step
    outputs, loss_dict_new, step_time = self.optimize(
  File "/opt/conda/lib/python3.10/site-packages/trainer/trainer.py", line 1251, in optimize
    grad_norm = self._compute_grad_norm(optimizer)
  File "/opt/conda/lib/python3.10/site-packages/trainer/trainer.py", line 1175, in _compute_grad_norm
    return torch.norm(torch.cat([param.grad.view(-1) for param in self.master_params(optimizer)], dim=0), p=2)
  File "/opt/conda/lib/python3.10/site-packages/trainer/trainer.py", line 1175, in <listcomp>
    return torch.norm(torch.cat([param.grad.view(-1) for param in self.master_params(optimizer)], dim=0), p=2)
AttributeError: 'NoneType' object has no attribute 'view'

My guess is that the gradients of some parameters are None when using accelerate. Training with trainer.distribute works fine, but it is 2 times slower than accelerate while using half of accelerate's batch size. Any kind of help would be greatly appreciated.
Thank you!
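
To illustrate the hypothesis above, here is a minimal PyTorch sketch of how one might check which parameters have no gradient after a backward pass, and how a gradient norm can be computed while skipping them. The `TwoBranch` toy model and both helper functions are hypothetical and are not part of the trainer or of this PR. Which parameters of the VITS2 prototype are affected, and whether skipping them is the right fix, is not known from this thread.

```python
import torch
from torch import nn


class TwoBranch(nn.Module):
    """Toy stand-in (hypothetical): `unused` never contributes to the loss."""

    def __init__(self):
        super().__init__()
        self.used = nn.Linear(4, 1)
        self.unused = nn.Linear(4, 1)

    def forward(self, x):
        # Only `used` takes part in the graph, so `unused` gets no gradient.
        return self.used(x)


def params_without_grad(model):
    """Names of trainable parameters whose .grad is still None after backward()."""
    return [
        name
        for name, param in model.named_parameters()
        if param.requires_grad and param.grad is None
    ]


def grad_norm_skipping_none(params):
    """L2 norm over all gradients, ignoring parameters that have no gradient."""
    grads = [p.grad.reshape(-1) for p in params if p.grad is not None]
    if not grads:
        return torch.tensor(0.0)
    return torch.norm(torch.cat(grads, dim=0), p=2)


torch.manual_seed(0)
model = TwoBranch()
loss = model(torch.randn(8, 4)).sum()
loss.backward()

# Parameters listed here would raise the AttributeError in an unguarded
# `param.grad.view(-1)` list comprehension like the one in the traceback.
missing = params_without_grad(model)
norm = grad_norm_skipping_none(model.parameters())
print("parameters without gradient:", missing)
print("gradient norm over the rest:", float(norm))
```

Running a check like `params_without_grad` right after `backward()` in the failing setup should show whether the `None` gradients come from modules that are simply not used in that optimizer step, which would narrow down where the accelerate path differs from `trainer.distribute`.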

@eginhard eginhard marked this pull request as draft November 11, 2024 12:25
Successfully merging this pull request may close these issues.

Add VITS 2 model
3 participants