ON_DEVELOPMENT
Mandarin/Chinese text-to-speech based on statistical parametric speech synthesis, using the Merlin toolkit
Read the documentation (written in Chinese) at MTTS Document
The main voice was trained on a 15-hour Mandarin speech synthesis dataset that is not open-source, but you can use the thchs30 dataset to run the demo (or record your own wav files)
Audio samples generated from training-set labels: https://jackiexiao.github.io/MTTS/
A voice was also trained on thchs30 (using only 250 wav files from speaker A11); see the website above
- First, prepare data containing wav files and txt transcripts (prosody marks are optional)
- Second, generate HTS labels using this project
- Third, train with merlin/egs/mandarin_voice
Python: 3.6
System: Linux (tested on Ubuntu 16.04)
pip install jieba pypinyin
sudo apt-get install libatlas3-base
Download the following files yourself, or run bash tools/install_mtts.sh
- Download montreal-forced-aligner and unzip it into the directory tools/
- Download the acoustic model thchs30.zip and copy it to the directory misc/
Run Demo
bash run_demo.sh
- Usage: enter the directory MTTS/src and run python mtts.py txtfile wav_directory_path output_directory_path (paths may be absolute or relative). You will then get the HTS labels.
- Attention: only Chinese characters are currently supported; the txt file must not contain any Arabic numerals or letters of the English alphabet.
txtfile example
A_01 这是一段文本
A_02 这是第二段文本
wav_directory example (sampling rate should be larger than 16 kHz)
A_01.wav
A_02.wav
See the source code in mandarin_frontend.py for more information.
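Before generating labels, it can help to check the inputs against the constraints listed above. The following is a minimal sketch, not part of this project: the function names parse_txtfile and check_inputs are hypothetical, the txtfile is assumed to be UTF-8 with one "id text" pair per line, and exactly 16 kHz is assumed to be acceptable.

```python
import re
import wave
from pathlib import Path

# ASCII digits and Latin letters are reported as unsupported (see the note above).
UNSUPPORTED = re.compile(r"[0-9A-Za-z]")
MIN_SAMPLE_RATE = 16000  # assumption: exactly 16 kHz is treated as acceptable


def parse_txtfile(path):
    """Return {utterance_id: text} from lines such as 'A_01 这是一段文本'."""
    utterances = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split(maxsplit=1)
            if not parts:
                continue  # skip blank lines
            text = parts[1].strip() if len(parts) > 1 else ""
            utterances[parts[0]] = text
    return utterances


def check_inputs(txtfile, wav_dir):
    """Collect all problems found rather than stopping at the first one."""
    problems = []
    for utt_id, text in parse_txtfile(txtfile).items():
        if not text:
            problems.append(f"{utt_id}: empty text")
        elif UNSUPPORTED.search(text):
            problems.append(f"{utt_id}: text contains digits or Latin letters")
        wav_path = Path(wav_dir) / f"{utt_id}.wav"
        if not wav_path.is_file():
            problems.append(f"{utt_id}: missing {wav_path.name}")
            continue
        with wave.open(str(wav_path), "rb") as w:
            rate = w.getframerate()
        if rate < MIN_SAMPLE_RATE:
            problems.append(f"{utt_id}: sample rate {rate} Hz is below 16 kHz")
    return problems
```

Calling check_inputs("txtfile", "wav_directory_path") returns a list of messages; an empty list means every utterance has Chinese-only text and a matching wav file at a usable sampling rate.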
This project uses Montreal-Forced-Aligner to do forced alignment.
- We trained the acoustic model on the thchs30 dataset (see misc/thchs30.zip); the dictionary we use is mandarin_mtts.lexicon
- If you want to use MFA's (montreal-forced-aligner) pre-trained Mandarin model, the dictionary you need is mandarin-for-montreal-forced-aligner-pre-trained-model.lexicon
You can generate HTS labels without prosody marks. We assume that a word segment is smaller than a prosodic word (this is adjusted in the code).
"#0", "#1", "#2", "#3" and "#4" are the prosody labeling symbols.
- #0 stands for word segment
- #1 stands for prosodic word
- #2 stands for stressed word (in this project it is actually regarded as #1)
- #3 stands for prosodic phrase
- #4 stands for intonational phrase
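To illustrate how these symbols could be read from a transcript, here is a minimal sketch. It is not the project's parser: the function split_prosody is hypothetical, and it assumes that each mark is written inline directly after the chunk it closes (for example 这是#1一段#2文本#4), a format this README does not specify. It folds #2 into #1, as described in the list above.

```python
import re

# Names for the levels in the list above; #2 is folded into #1, so it has no entry.
LEVEL_NAMES = {
    0: "word segment",
    1: "prosodic word",
    3: "prosodic phrase",
    4: "intonational phrase",
}

MARK = re.compile(r"#([0-4])")


def split_prosody(marked_text):
    """Split text into (chunk, level) pairs.

    Assumes each mark follows the chunk it closes. A trailing chunk
    with no mark is returned with level None.
    """
    pairs = []
    pos = 0
    for m in MARK.finditer(marked_text):
        chunk = marked_text[pos:m.start()]
        level = int(m.group(1))
        if level == 2:
            level = 1  # #2 is regarded as #1 in this project
        if chunk:
            pairs.append((chunk, level))
        pos = m.end()
    tail = marked_text[pos:]
    if tail:
        pairs.append((tail, None))
    return pairs


for chunk, level in split_prosody("这是#1一段#2文本#4"):
    print(chunk, LEVEL_NAMES.get(level, "unmarked"))
```

With the sample string, split_prosody returns [("这是", 1), ("一段", 1), ("文本", 4)].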
Improvements to prosody analysis will come soon.
- Jackiexiao
- willian56