
This Bangla text-to-speech (TTS) model uses a CNN-based architecture with an attention mechanism.

Methodology based on: Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention
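For reference, the attention step and the guided-attention loss described in the cited paper can be sketched in NumPy as follows. This is a minimal, non-authoritative sketch, not the code used in this repository or in pytorch-dc-tts. The function names are hypothetical. K and V (text encodings, shape d×N) and Q (audio encodings, shape d×T) follow the paper's notation, and g = 0.2 is the paper's default width. The weight matrix W is close to 0 near the diagonal n/N ≈ t/T and close to 1 far from it, so the loss pushes the alignment to advance through the text roughly linearly in time.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def text2mel_attention(K, V, Q):
    """K, V: (d, N) text encodings; Q: (d, T) audio encodings (paper notation)."""
    d = K.shape[0]
    # A[n, t]: how strongly audio frame t attends to character n (normalized over n)
    A = softmax(K.T @ Q / np.sqrt(d), axis=0)  # (N, T)
    R = V @ A                                   # (d, T) attended text context
    return A, R

def guided_attention_weights(N, T, g=0.2):
    # W[n, t] = 1 - exp(-(n/N - t/T)^2 / (2 g^2))
    n = np.arange(N)[:, None] / N
    t = np.arange(T)[None, :] / T
    return 1.0 - np.exp(-((n - t) ** 2) / (2.0 * g ** 2))

def guided_attention_loss(A, g=0.2):
    # Mean of A * W over all (n, t) pairs
    N, T = A.shape
    return float(np.mean(A * guided_attention_weights(N, T, g)))

# Toy check with random encodings (d=16, N=10 characters, T=10 frames)
rng = np.random.default_rng(0)
K, V, Q = (rng.standard_normal((16, 10)) for _ in range(3))
A, R = text2mel_attention(K, V, Q)
print(guided_attention_loss(np.eye(10)))        # diagonal alignment: 0.0
print(guided_attention_loss(np.eye(10)[::-1]))  # anti-diagonal alignment: a positive value
```

A perfectly diagonal alignment costs nothing, while an alignment that runs backwards through the text is penalized, which is why the guided loss speeds up attention learning.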

Implementation based on: pytorch-dc-tts

Bengali Text to Speech Dataset: The Bangla TTS dataset by Google contains approximately 3,100 Bangla sentences. It was collected from native speakers of Indian Bengali and Bangladeshi Bengali.

Result

Due to hardware limitations, the network that converts the coarse mel spectrogram to the full STFT spectrogram was trained for only 60 iterations. The audio samples and pretrained models can be found here: link

About The Model Architecture

This TTS model consists of two networks: (1) Text2Mel, which synthesizes a mel spectrogram from the input text, and (2) the Spectrogram Super-resolution Network (SSRN), which converts a coarse mel spectrogram to the full STFT (short-time Fourier transform) spectrogram. The figure below shows the overall architecture of the model. For more details, read this.
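To make the difference between the coarse mel spectrogram and the full STFT spectrogram concrete, here is a simplified NumPy sketch of how both training targets could be derived from a waveform. It is an illustration under assumptions, not this repository's preprocessing: the sample rate (22,050 Hz), n_fft = 1024, hop length 256, 80 mel bands and the time-reduction factor of 4 are assumed values, the helper names are hypothetical, and real pipelines typically also apply pre-emphasis, log compression, padding and normalization.

```python
import numpy as np

def stft_magnitude(y, n_fft=1024, hop=256):
    # Frame the signal, apply a Hann window, and take |rFFT| of each frame
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T  # (n_fft // 2 + 1, n_frames)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale (HTK-style formula)
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

sr = 22050
y = np.random.default_rng(0).standard_normal(sr)  # 1 s of noise as stand-in audio
Z = stft_magnitude(y)                  # full STFT spectrogram (SSRN target)
M = mel_filterbank(sr, 1024, 80) @ Z   # mel spectrogram, same number of frames
S = M[:, ::4]                          # coarse mel (Text2Mel target): every 4th frame
print(Z.shape, M.shape, S.shape)
```

In this setup SSRN has to upsample S by 4 in time and expand it from 80 mel bands to the 513 linear-frequency bins of Z, which is why it is trained as a separate network.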

Training Process

  1. Download the dataset into the /datasets folder.
  2. Preprocess the dataset.
  3. Train the Text2Mel model.
  4. Train the SSRN model.
  5. Test the model.
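As an illustration of one part of step 2, Text2Mel consumes sequences of character IDs rather than raw text. The sketch below builds a character vocabulary from transcripts and encodes a sentence. The `<pad>`/`<eos>` tokens, the NFC normalization, the sample sentences and the function names are assumptions for illustration, not this repository's actual preprocessing code.

```python
import unicodedata

PAD, EOS = "<pad>", "<eos>"  # hypothetical special tokens

def build_vocab(sentences):
    # Collect every Unicode code point in the (NFC-normalized) transcripts
    chars = sorted({ch for s in sentences for ch in unicodedata.normalize("NFC", s)})
    return {tok: i for i, tok in enumerate([PAD, EOS] + chars)}

def encode(text, char2idx):
    # Map characters to IDs, skip unknown ones, and append the end-of-sentence ID
    text = unicodedata.normalize("NFC", text)
    return [char2idx[ch] for ch in text if ch in char2idx] + [char2idx[EOS]]

sentences = ["আমি ভাত খাই", "তুমি কেমন আছ"]
char2idx = build_vocab(sentences)
ids = encode("আমি", char2idx)
print(len(char2idx), ids)
```

Because Bengali is an abugida, consonants and dependent vowel signs are separate code points, so a syllable such as "মি" becomes two IDs in this scheme.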

Colab Notebook: https://colab.research.google.com/drive/1AjsxzBu6ekcv0GF3dyWubj04hhwwHkjE?usp=sharing (note: this Colab playground may look rather messy).