synthetic speech pacing is very fast with frontend script #8
Comments
I'm not sure about that. Do you use a DNN architecture in your duration model? Before training the duration model, did you check your training data (the forced-alignment result)? Do you use montreal-forced-alignment for forced alignment? |
When I use the HTS labels generated by mandarin_frontend.py to synthesize speech, I get some warnings: |
In the synthesis stage, mandarin_frontend.py generates HTS labels without predicting phone durations. If your synthetic speech pacing is very fast, you should check your duration model, both the training data and the training result. A duration model result (slt_arctic_demo)
|
@Jackiexiao I checked the training data and noticed some warnings when using montreal-forced-alignment to generate the lab files: 2018-06-06 15:57:01,853 WARNING : --Miss: database/textgrid/mandarin_voice/A11_242.TextGrid Do you think this could have an effect in this case? |
It means that montreal-forced-align didn't work for that utterance, so no TextGrid file was generated. |
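To see how many utterances are affected by missing alignments, one option is to compare the WAV file names against the TextGrid file names. The following is a minimal sketch, not part of the project; the function name `missing_textgrids` and the directory arguments are hypothetical, and it assumes each utterance `NAME.wav` should have a matching `NAME.TextGrid`.

```python
from pathlib import Path


def missing_textgrids(wav_dir, textgrid_dir):
    """Return sorted utterance names that have a .wav but no .TextGrid.

    Both directory arguments are hypothetical; point them at wherever
    your recordings and aligner output actually live.
    """
    wav_stems = {p.stem for p in Path(wav_dir).glob("*.wav")}
    textgrid_stems = {p.stem for p in Path(textgrid_dir).glob("*.TextGrid")}
    return sorted(wav_stems - textgrid_stems)
```

Calling `missing_textgrids("database/wav", "database/textgrid/mandarin_voice")` (the first path is an assumption) would list the utterances the aligner skipped, which can then be excluded from duration model training or re-aligned.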
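@Jackiexiao Thinking back, I downloaded thchs30_250_demo.tar.gz directly. The data in that archive already includes the label files, so alignment should already have been done. It should therefore have nothing to do with whether I ran alignment myself, since I did not use any new data. Do you see a problem with that? |
Could you post a synthesized audio sample so I can listen to it? |
@Jackiexiao A11 is the WAV that was generated automatically during training. |
That is indeed strange. In principle this should not happen, so it is probably a duration model problem. Take a look at the duration model training log and compare it with the training results I posted earlier.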
|
feed_forward_6_tanh_01_57PM_June_06_2018.log |
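feed_forward_4_tanh_08_29PM_January_30_2018.log (log from training the duration model) |
After starting over from scratch, duration model: |
The duration model training looks fine... I don't know the cause either. |
I ran everything again and found that the mgc files could not be generated earlier because of a path issue: the directory was world/extract_features_for_merlin.py, but the script was looking for WORLD/extract_features_for_merlin.py. So I renamed the world folder to WORLD, and mgc generation succeeded. |
The vocoder is configured in s1/01_setup.sh with echo "Vocoder=WORLD" >> $global_config_file. You only need to change it there in 01_setup.sh and then rerun the related scripts. The error most likely occurred because a world somewhere else was not changed to uppercase. |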
B11_0.lab.txt |
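Use the lab files generated by the latest frontend as the reference (some bugs have been fixed). The A11 lab is wrong; sorry for not updating the download link. |
Can a wrong lab file cause the synthesized speech to be too fast? |
The only thing that can make the speech too fast is duration predictions that are too short. |
I looked into it and it is indeed a duration model problem: the predicted timestamps differ from the training sample timestamps by a factor of 100. Update: I have verified that it is not a duration model problem. The timestamps in A.lab and B.lab were both generated by the same duration model, yet A11.wav has a normal speaking rate and B11.wav is very fast. |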
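A scale mismatch like the factor of 100 mentioned above can be checked directly from the label files. The sketch below assumes each line has the form `start end label` with times in HTK units of 100 ns; the function names are hypothetical, the labels in the sample data are shortened, and the `predicted` lines are made up purely to illustrate a 100x mismatch.

```python
from statistics import mean

HTK_UNITS_PER_MS = 10_000  # HTK/HTS label times are in units of 100 ns


def phone_durations_ms(label_lines):
    """Return per-phone durations in milliseconds from 'start end label' lines."""
    durations = []
    for line in label_lines:
        parts = line.split(maxsplit=2)
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        start, end = int(parts[0]), int(parts[1])
        durations.append((end - start) / HTK_UNITS_PER_MS)
    return durations


def duration_ratio(reference_lines, predicted_lines):
    """Mean predicted phone duration divided by mean reference phone duration."""
    return mean(phone_durations_ms(predicted_lines)) / mean(
        phone_durations_ms(reference_lines)
    )


# Reference timestamps taken from the label excerpt in this issue (labels shortened).
reference = [
    "37050000 37800000 x^iou4-m+ei4=sh",
    "37800000 38600000 iou4^m-ei4+sh=ih1",
]
# Hypothetical predicted labels whose timestamps are 100 times smaller.
predicted = [
    "370500 378000 x^iou4-m+ei4=sh",
    "378000 386000 iou4^m-ei4+sh=ih1",
]

print(phone_durations_ms(reference))
print(duration_ratio(reference, predicted))
```

A ratio close to 1 suggests the two files use the same time scale, while a ratio near 0.01 or 100 points to a unit mismatch rather than a badly trained model.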
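The attachment you posted does not include B.lab. I still think it is a duration model problem. |
@Jackiexiao I'd like to ask about an error like this that I get while training the acoustic model |
@Jackiexiao The synthetic speech pacing is very fast. What do you think is the reason?
How can I solve it?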
37050000 37800000 x^iou4-m+ei4=sh@/A:4-4^1@/B:25+4@2^1^26+5#26-5-/C:a_a^n#2+2+2&/D:xx=30!xx@1-1&/E:xx|30-xx@xx#1&xx!1-1#/F:xx^30=17_1-1!
37800000 38600000 iou4^m-ei4+sh=ih1@/A:4-4^1@/B:25+4@2^1^26+5#26-5-/C:a_a^n#2+2+2&/D:xx=30!xx@1-1&/E:xx|30-xx@xx#1&xx!1-1#/F:xx^30=17_1-1!
38600000 39750000 m^ei4-sh+ih1=y@/A:4-1^4@/B:26+3@1^2^27+4#27-4-/C:a_n^z#2+2+2&/D:xx=30!xx@1-1&/E:xx|30-xx@xx#1&xx!1-1#/F:xx^30=17_1-1!
39750000 40400000 ei4^sh-ih1+y=i4@/A:4-1^4@/B:26+3@1^2^27+4#27-4-/C:a_n^z#2+2+2&/D:xx=30!xx@1-1&/E:xx|30-xx@xx#1&xx!1-1#/F:xx^30=17_1-1!
40400000 41200000 sh^ih1-y+i4=ang4@/A:1-4^4@/B:27+2@2^1^28+3#28-3-/C:a_n^z#2+2+2&/D:xx=30!xx@1-1&/E:xx|30-xx@xx#1&xx!1-1#/F:xx^30=17_1-1!