🐸 v0.3.0
New ForwardTTS implementation
This version implements a new ForwardTTS interface that can be configured as any feed-forward TTS model that uses a duration predictor at inference time. Currently, we provide 3 pre-configured models and plan to implement one more.
- SpeedySpeech
- FastSpeech
- FastPitch
- FastSpeech 2 (TODO)
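What these models share is that a duration predictor decides how many output frames each input token covers. The sketch below shows that expansion step in plain Python. It is a generic illustration and not the ForwardTTS code: the function name `expand_by_duration` and all numbers are made up for the example.

```python
from typing import List, Sequence


def expand_by_duration(
    token_features: Sequence[Sequence[float]],
    durations: Sequence[int],
) -> List[List[float]]:
    """Repeat each token's feature vector durations[i] times.

    Hypothetical helper: it only illustrates how predicted durations
    turn token-level features into frame-level decoder inputs.
    """
    if len(token_features) != len(durations):
        raise ValueError("one duration is required per token")
    frames: List[List[float]] = []
    for features, n_frames in zip(token_features, durations):
        if n_frames < 0:
            raise ValueError("durations must be non-negative")
        # Copy the vector for every frame so frames stay independent.
        frames.extend(list(features) for _ in range(n_frames))
    return frames


# Three input tokens with 2-dimensional features (made-up values).
tokens = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
# Made-up duration-predictor output: number of frames per token.
predicted = [2, 0, 3]
frames = expand_by_duration(tokens, predicted)
print(len(frames))  # 2 + 0 + 3 = 5 frames
```

Because the frame count is known once durations are predicted, all frames can be decoded in parallel, which is what makes this family of models feed-forward.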
Through this API, any model can be trained in two ways: either using pre-computed durations from a pre-trained Tacotron model, or using an alignment network to learn durations from the dataset. The alignment network is used only during training and is discarded at inference. You choose the mode by setting the use_aligner field in the configuration.
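To make the two training modes concrete, here is a minimal sketch of how a single boolean can select the duration source. Only the field name use_aligner comes from these notes. The class `ForwardTTSSketchConfig`, the field `precomputed_durations_path`, and the function `duration_source` are hypothetical stand-ins, not the library's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ForwardTTSSketchConfig:
    """Hypothetical stand-in for a ForwardTTS configuration."""

    # Field name taken from the release notes.
    use_aligner: bool = True
    # Hypothetical field: where Tacotron-derived durations would be stored.
    precomputed_durations_path: Optional[str] = None


def duration_source(config: ForwardTTSSketchConfig) -> str:
    """Return which of the two training modes the config selects."""
    if config.use_aligner:
        # Durations are learned from the dataset by the alignment
        # network, which is needed during training only.
        return "aligner"
    if config.precomputed_durations_path is None:
        raise ValueError(
            "pre-computed durations are required when use_aligner is False"
        )
    # Durations come from a pre-trained Tacotron model.
    return "precomputed"


aligner_mode = duration_source(ForwardTTSSketchConfig(use_aligner=True))
precomputed_mode = duration_source(
    ForwardTTSSketchConfig(
        use_aligner=False,
        precomputed_durations_path="durations.npy",  # made-up file name
    )
)
print(aligner_mode, precomputed_mode)
```

For the actual field layout and how pre-computed durations are supplied, check the model configuration classes in the repository.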
This new API will help us design a more efficient inference runtime for all these models using ONNX-like runtime optimizers.
The old FastPitch and SpeedySpeech implementations are deprecated in favor of this new implementation.
Fine-Tuning Documentation
This version introduces documentation for model fine-tuning. You can find it at https://tts.readthedocs.io/ once this is merged.
New Model Releases
- English Speedy Speech model on LJSpeech
Try out:
tts --text "This is a sample text for my model to speak." --model_name tts_models/en/ljspeech/speedy-speech
- Fine-tuned UnivNet Vocoder
Try out:
tts --text "This is how it is." --model_name tts_models/en/ljspeech/tacotron2-DDC_ph