PyTorch implementation of "Direct speech-to-speech translation with a sequence-to-sequence model" (Translatotron).
The model was trained on a private Telugu-Hindi dataset that cannot be published, but the implementation has been tested and verified on that data.
Distributed and Automatic Mixed Precision support relies on NVIDIA's Apex and AMP.
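As a rough sketch of what that Apex dependency is used for: AMP wraps the model and optimizer and scales the loss during backprop. The tiny linear model and dummy batch below are placeholder assumptions, not this repo's actual training code:

```python
# Minimal sketch of the Apex AMP pattern this repo relies on; the real calls
# live in train.py. The tiny model and dummy batch here are placeholders.
import torch
from apex import amp

model = torch.nn.Linear(80, 80).cuda()   # stand-in for the seq2seq model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# opt_level "O1" patches common ops to run in fp16 where it is safe.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

x = torch.randn(16, 80, device="cuda")   # dummy mel-spectrogram frames
loss = model(x).pow(2).mean()

# Scale the loss so fp16 gradients do not underflow, then step as usual.
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```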
- NVIDIA GPU + CUDA cuDNN
- Clone this repo: `git clone https://github.com/NVIDIA/tacotron2.git`
- CD into this repo: `cd tacotron2`
- Initialize submodule: `git submodule init; git submodule update`
- Modify the dataset module to fit your dataset of choice. The Telugu-Hindi dataset used here was noisy, so filtering was applied to make it fit for training (see the sketch after these setup steps).
- Install [PyTorch](https://pytorch.org/)
- Install [Apex](https://github.com/NVIDIA/apex)
- Install python requirements: `pip install -r requirements.txt` (or build the docker image instead)
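The filtering mentioned above is not part of the published code. As a hedged illustration, something along these lines can drop noisy or misaligned pairs; the filelist format, duration bounds, and helper names below are all assumptions:

```python
# Illustrative sketch only: thresholds, the filelist format, and helper
# names are assumptions, not this repo's actual preprocessing code.
import torchaudio

MIN_SECS, MAX_SECS = 1.0, 10.0  # assumed duration bounds for usable clips

def load_pairs(filelist):
    # assumed line format: <source_wav>|<target_wav>
    with open(filelist) as f:
        return [line.strip().split("|")[:2] for line in f if line.strip()]

def keep_pair(src_path, tgt_path):
    """Drop a (source, target) pair if either clip falls outside the
    duration bounds; very short or very long clips tend to be noise or
    alignment errors in a scraped parallel-speech corpus."""
    for path in (src_path, tgt_path):
        info = torchaudio.info(path)
        secs = info.num_frames / info.sample_rate
        if not MIN_SECS <= secs <= MAX_SECS:
            return False
    return True

pairs = [p for p in load_pairs("filelists/train.txt") if keep_pair(*p)]
```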
- Train the model: `python train.py --output_directory=outdir --log_directory=logdir`
- (OPTIONAL) Monitor training with Tensorboard: `tensorboard --logdir=outdir/logdir`
Training using a pre-trained model can lead to faster convergence.
- Download the published pretrained Tacotron 2 model
- Warm-start training from it: `python train.py --output_directory=outdir --log_directory=logdir -c tacotron2_statedict.pt --warm_start`
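For intuition, `--warm_start` in the upstream Tacotron 2 code loads the pretrained weights while dropping the layers named in the `ignore_layers` hparam, so mismatched layers keep their fresh initialization. Roughly:

```python
# Rough paraphrase of the upstream Tacotron 2 warm-start logic: load the
# checkpoint, drop the layers named in ignore_layers (by default the text
# embedding), and let those layers keep their fresh initialization.
import torch

def warm_start(model, checkpoint_path, ignore_layers=("embedding.weight",)):
    state = torch.load(checkpoint_path, map_location="cpu")["state_dict"]
    state = {k: v for k, v in state.items() if k not in ignore_layers}
    merged = model.state_dict()   # start from the freshly initialized weights
    merged.update(state)          # overwrite everything the checkpoint keeps
    model.load_state_dict(merged)
    return model
```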
- Multi-GPU (distributed) and Automatic Mixed Precision training: `python -m multiproc train.py --output_directory=outdir --log_directory=logdir --hparams=distributed_run=True,fp16_run=True`
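`multiproc` launches one `train.py` process per visible GPU. A hedged sketch of what `distributed_run=True` amounts to in each process; the upstream repo uses its own gradient-allreduce helper in `distributed.py`, and `DistributedDataParallel` below is a stand-in for it:

```python
# Stand-in sketch: each spawned process joins a NCCL process group and
# averages gradients across ranks. The repo's actual helper lives in
# distributed.py; torch's DistributedDataParallel is used here instead.
import torch
import torch.distributed as dist

def init_distributed(rank, world_size):
    torch.cuda.set_device(rank)
    dist.init_process_group(
        backend="nccl",        # GPU-to-GPU collectives
        init_method="env://",  # MASTER_ADDR / MASTER_PORT come from the env
        rank=rank,
        world_size=world_size,
    )

def wrap_model(model, rank):
    model = model.cuda(rank)
    # Gradient averaging across ranks happens inside backward().
    return torch.nn.parallel.DistributedDataParallel(model, device_ids=[rank])
```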
This implementation uses code from the following repos: [Tacotron 2](https://github.com/NVIDIA/tacotron2).
Big thanks to the Translatotron paper authors and the Tacotron 2 paper authors.