MTTS Mandarin/Chinese Text to Speech FrontEnd

Under development

Mandarin/Chinese Text to Speech based on statistical parametric speech synthesis, using the Merlin toolkit.

Read the documentation (written in Chinese) at MTTS Document.

Data

We use 15 hours of wav from a Mandarin speech synthesis dataset that is not open source. You can use the thchs30 dataset to run the demo instead (or record wav files yourself).

Generated Samples

Wav files generated from training-set labels: https://jackiexiao.github.io/MTTS/

I also trained on the thchs30 dataset (using only 250 wavs from speaker A11); see the website above.

How To Reproduce

First, prepare data containing wav and txt files (prosody marks are optional).
Second, generate HTS labels using this project.
Third, train with merlin/egs/mandarin_voice.

Context related annotation & Question Set

Install

Python: python3.6
System: Linux (tested on Ubuntu 16.04)

pip install jieba pypinyin
sudo apt-get install libatlas3-base

Download the following files yourself, or run bash tools/install_mtts.sh:
Download montreal-forced-aligner and unzip it to the directory tools/
Download the acoustic model thchs30.zip and copy it to the directory misc/

Run Demo

bash run_demo.sh

Usage

1. Generate HTS labels from wav and text

Usage: enter the directory MTTS/src and run python mtts.py txtfile wav_directory_path output_directory_path (absolute or relative paths). This produces the HTS labels.
Attention: currently only Chinese characters are supported; the txt file must not contain any Arabic numerals or English letters.
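If you call the labeler from another script, a thin wrapper can run it from inside MTTS/src. This is a minimal sketch: the helper names `build_mtts_command` and `run_mtts` are hypothetical, and it assumes only the command-line form documented above.

```python
import subprocess
import sys
from pathlib import Path


def build_mtts_command(txtfile, wav_dir, output_dir):
    """Build the argument list for `python mtts.py txtfile wav_dir output_dir`."""
    # Resolve paths against the caller's working directory *before* we switch
    # into MTTS/src, so relative paths still point where the caller intended.
    return [
        sys.executable,
        "mtts.py",
        str(Path(txtfile).resolve()),
        str(Path(wav_dir).resolve()),
        str(Path(output_dir).resolve()),
    ]


def run_mtts(src_dir, txtfile, wav_dir, output_dir):
    """Run mtts.py with MTTS/src as the working directory; raise on failure."""
    cmd = build_mtts_command(txtfile, wav_dir, output_dir)
    subprocess.run(cmd, cwd=src_dir, check=True)
```

Passing `check=True` makes a failed labeling run raise `CalledProcessError` instead of silently leaving an incomplete output directory.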

txtfile example

A_01 这是一段文本
A_02 这是第二段文本
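Because unsupported characters only show up as errors during labeling, it can help to validate the txtfile first. The sketch below is hypothetical (not part of MTTS): it assumes each non-blank line is `<utterance_id> <text>` as in the example, strips optional prosody marks `#0` to `#4` before checking, and rejects ASCII or full-width digits and Latin letters in the text.

```python
import re

# Optional prosody marks (#0..#4) are removed before the character check.
PROSODY_MARK = re.compile(r"#[0-4]")
# ASCII and full-width digits / Latin letters, which MTTS does not support.
DISALLOWED = re.compile(r"[0-9A-Za-z０-９Ａ-Ｚａ-ｚ]")


def parse_txtfile(lines):
    """Return {utterance_id: text}; raise ValueError on unsupported input."""
    utterances = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected '<id> <text>'")
        utt_id, text = parts
        # Only the text is checked; ids such as A_01 may contain letters/digits.
        if DISALLOWED.search(PROSODY_MARK.sub("", text)):
            raise ValueError(f"line {lineno}: digits or Latin letters in {utt_id!r}")
        utterances[utt_id] = text
    return utterances
```

Read the file with `encoding="utf-8"` and pass the file object (or its lines) to `parse_txtfile`.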

wav_directory example (the sampling rate should be 16 kHz or higher)

A_01.wav  
A_02.wav
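A quick pre-check can confirm that every utterance id has a matching wav file and that each file meets the sampling-rate requirement. This is a hypothetical helper, assuming `<id>.wav` naming as in the example; it uses Python's standard `wave` module, which reads PCM WAV headers only.

```python
import wave
from pathlib import Path

MIN_SAMPLE_RATE = 16000  # Hz; the README asks for 16 kHz or higher


def check_wav_dir(wav_dir, utterance_ids, min_rate=MIN_SAMPLE_RATE):
    """Return a list of problems: missing wav files or too-low sampling rates."""
    problems = []
    for utt_id in utterance_ids:
        path = Path(wav_dir) / f"{utt_id}.wav"
        if not path.exists():
            problems.append(f"{utt_id}: missing {path.name}")
            continue
        # Read only the header; wave.open raises wave.Error on non-PCM files.
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
        if rate < min_rate:
            problems.append(f"{utt_id}: sampling rate {rate} Hz < {min_rate} Hz")
    return problems
```

An empty list means the directory is ready for labeling.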

2. Generate labels from wav and alignment files

See the source code in mandarin_frontend.py for more information.
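HTS labels store segment boundaries in units of 100 ns, so alignment times in seconds must be converted. The sketch below only illustrates that conversion for simple monophone lines (`start end phone`); it assumes the alignment is a list of `(start_sec, end_sec, phone)` tuples. The function name and phone symbols are hypothetical, and the real full-context labels and input format are defined in mandarin_frontend.py.

```python
# HTS label times are integers in 100 ns units (10,000,000 per second).
HTS_UNITS_PER_SECOND = 10_000_000


def to_hts_lines(segments):
    """Convert (start_sec, end_sec, phone) tuples into 'start end phone' lines."""
    lines = []
    for start, end, phone in segments:
        if end < start:
            raise ValueError(f"segment {phone!r} ends before it starts")
        # round() absorbs floating-point error such as 0.4 * 1e7 = 4000000.000...1
        start_units = int(round(start * HTS_UNITS_PER_SECOND))
        end_units = int(round(end * HTS_UNITS_PER_SECOND))
        lines.append(f"{start_units} {end_units} {phone}")
    return lines
```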

3. Forced-alignment

This project uses the Montreal-Forced-Aligner to perform forced alignment.

We trained the acoustic model on the thchs30 dataset (see misc/thchs30.zip); the dictionary we use is mandarin_mtts.lexicon.
If you want to use the pre-trained Mandarin model of mfa (montreal-forced-aligner), use the dictionary mandarin-for-montreal-forced-aligner-pre-trained-model.lexicon.
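Words missing from the dictionary cannot be aligned, so checking coverage before alignment can save a failed run. This is a minimal sketch that assumes each dictionary line is a word followed by its phones, separated by whitespace (the basic MFA dictionary layout); verify that against the actual .lexicon files. The helper names and example entries are hypothetical.

```python
def load_lexicon(lines):
    """Parse 'word phone1 phone2 ...' lines into {word: [pronunciations]}."""
    lexicon = {}
    for raw in lines:
        fields = raw.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        word, phones = fields[0], fields[1:]
        # A word may have several pronunciations, so keep them all.
        lexicon.setdefault(word, []).append(phones)
    return lexicon


def find_oov(words, lexicon):
    """Return words (in first-seen order) that have no lexicon entry."""
    seen = set()
    missing = []
    for word in words:
        if word not in lexicon and word not in seen:
            seen.add(word)
            missing.append(word)
    return missing
```

For a real dictionary, open the file with `encoding="utf-8"` and pass the file object to `load_lexicon`.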

Prosody Mark

You can generate HTS labels without prosody marks. We assume that a word segment is smaller than a prosodic word (this is adjusted in the code).

"#0", "#1", "#2", "#3" and "#4" are the prosody labeling symbols.

#0 stands for word segment
#1 stands for prosodic word
#2 stands for stressed word (in this project it is actually treated as #1)
#3 stands for prosodic phrase
#4 stands for intonational phrase
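The symbols above can be parsed into chunks with boundary levels. This sketch assumes marks are written inline after each chunk (for example `这是#1一段#2文本#4`), which the list above does not specify, and it maps #2 to #1 as this project does. `parse_prosody` is a hypothetical helper, not the project's code.

```python
import re

# One capture group, so re.split alternates: text, level, text, level, ..., tail
MARK = re.compile(r"#([0-4])")


def parse_prosody(text):
    """Return [(chunk, level)]: level is the boundary after the chunk.

    Trailing text without a closing mark gets level None.
    """
    pieces = MARK.split(text)
    result = []
    for i in range(0, len(pieces) - 1, 2):
        chunk, level = pieces[i], int(pieces[i + 1])
        if level == 2:
            level = 1  # this project treats #2 (stressed word) as #1
        if chunk:
            result.append((chunk, level))
    tail = pieces[-1]
    if tail:
        result.append((tail, None))
    return result
```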

Improvements to prosody analysis will come soon.

Contributor

Jackiexiao
willian56

yanyanxixi/MTTS