This repository has been archived by the owner on Apr 18, 2022. It is now read-only.

synthetic speech pacing is very fast with frontend script #8

Open
ben-8878 opened this issue Jun 5, 2018 · 22 comments

@ben-8878

ben-8878 commented Jun 5, 2018

@Jackiexiao The synthesized speech is paced much too fast. What do you think the cause is, and how can I fix it?

37050000 37800000 x^iou4-m+ei4=sh@/A:4-4^1@/B:25+4@2^1^26+5#26-5-/C:a_a^n#2+2+2&/D:xx=30!xx@1-1&/E:xx|30-xx@xx#1&xx!1-1#/F:xx^30=17_1-1!
37800000 38600000 iou4^m-ei4+sh=ih1@/A:4-4^1@/B:25+4@2^1^26+5#26-5-/C:a_a^n#2+2+2&/D:xx=30!xx@1-1&/E:xx|30-xx@xx#1&xx!1-1#/F:xx^30=17_1-1!
38600000 39750000 m^ei4-sh+ih1=y@/A:4-1^4@/B:26+3@1^2^27+4#27-4-/C:a_n^z#2+2+2&/D:xx=30!xx@1-1&/E:xx|30-xx@xx#1&xx!1-1#/F:xx^30=17_1-1!
39750000 40400000 ei4^sh-ih1+y=i4@/A:4-1^4@/B:26+3@1^2^27+4#27-4-/C:a_n^z#2+2+2&/D:xx=30!xx@1-1&/E:xx|30-xx@xx#1&xx!1-1#/F:xx^30=17_1-1!
40400000 41200000 sh^ih1-y+i4=ang4@/A:1-4^4@/B:27+2@2^1^28+3#28-3-/C:a_n^z#2+2+2&/D:xx=30!xx@1-1&/E:xx|30-xx@xx#1&xx!1-1#/F:xx^30=17_1-1!
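
Each label line above starts with a start time and an end time in HTS units of 100 ns, so phone durations can be read directly from the file. A minimal sketch, assuming a 5 ms frame shift (not stated in this thread) and using a hypothetical helper `phone_durations` that is not part of Merlin; the sample labels are truncated after the `/A:` field for brevity:

```python
HTS_UNITS_PER_SECOND = 10_000_000  # HTS label times are in units of 100 ns
FRAME_SHIFT_MS = 5                 # assumption: 5 ms frame shift


def phone_durations(lines):
    """Yield (centre_phone, duration_ms, frames) for timestamped HTS label lines."""
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank lines and labels without timestamps
        start, end, label = int(parts[0]), int(parts[1]), parts[2]
        # Quinphone context is written ll^l-c+r=rr; the centre phone sits between '-' and '+'.
        centre = label.split("-", 1)[1].split("+", 1)[0]
        duration_ms = (end - start) * 1000 / HTS_UNITS_PER_SECOND
        yield centre, duration_ms, round(duration_ms / FRAME_SHIFT_MS)


sample = [
    "37050000 37800000 x^iou4-m+ei4=sh@/A:4-4^1@",
    "37800000 38600000 iou4^m-ei4+sh=ih1@/A:4-4^1@",
]
durations = list(phone_durations(sample))
for phone, ms, frames in durations:
    print(f"{phone}: {ms:.0f} ms ({frames} frames)")
```

Running the same helper over a label file whose timestamps were predicted by the duration model gives a quick view of whether the predicted phones are implausibly short.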

@ben-8878 changed the title from "synthetic speech pacing is very fast" to "synthetic speech pacing is very fast with frontend script" on Jun 5, 2018
@Jackiexiao
Owner

I'm not sure. Do you use a DNN architecture in your duration model? Before training the duration model, did you check your training data (the forced-alignment result)? Did you use Montreal Forced Aligner for the forced alignment?

@ben-8878
Author

ben-8878 commented Jun 6, 2018

  1. I used Montreal Forced Aligner for forced alignment and fed the resulting training data to Merlin.
  2. I use the default duration config parameters:
    hidden_layer_size : [1024, 1024, 1024, 1024, 1024, 1024]
    hidden_layer_type : ['TANH', 'TANH', 'TANH', 'TANH', 'TANH', 'TANH']
  3. The forced-alignment result seems right.
  4. Also, speech synthesized from the HTS labels generated via Montreal Forced Aligner has normal speed, but speech synthesized from the HTS labels generated by mandarin_frontend.py is very fast.

When I use the HTS labels generated by mandarin_frontend.py to synthesize speech, I get these warnings:
2018-06-06 13:57:16,701 INFO labels : loaded /data02/zhangyb/merlin/egs/mandarin_voice/s1/experiments/mandarin_voice/test_synthesis/gen-lab/B11_8.lab, 78 labels
WARNING: no silence found!
WARNING: no silence found!
WARNING: no silence found!
WARNING: no silence found!
WARNING: no silence found!
WARNING: no silence found!
WARNING: no silence found!
WARNING: no silence found!
WARNING: no silence found!
WARNING: no silence found!
WARNING: no silence found!
WARNING: no silence found!
WARNING: no silence found!
WARNING: no silence found!
2018-06-06 13:57:16,845 INFO acoustic_norm: Loaded min max values from the trained data for feature dimension of 471
2018-06-06 13:57:17,193 INFO main : label dimension is 471
2018-06-06 13:57:17,194 INFO main : generating from DNN
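
The repeated `WARNING: no silence found!` suggests that no label in the file matched the silence pattern Merlin was configured with. A minimal sketch for checking a label file by hand, assuming the pattern is the glob `*-sil+*` (an assumption; compare it with the `silence_pattern` value in your own config). `count_silence_labels` is a hypothetical helper and the sample labels are invented for illustration:

```python
from fnmatch import fnmatchcase

# Assumption: silence is marked by a centre phone "sil"; adjust to match your config.
SILENCE_PATTERNS = ["*-sil+*"]


def count_silence_labels(lines, patterns=SILENCE_PATTERNS):
    """Count label lines whose full-context label matches any silence pattern."""
    count = 0
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        label = parts[-1]  # works with or without leading timestamps
        if any(fnmatchcase(label, pattern) for pattern in patterns):
            count += 1
    return count


with_sil = ["0 2500000 xx^xx-sil+n=i3@/A:xx-xx^xx@", "n^i3-h+ao3=sil@/A:3-3^xx@"]
without_sil = ["x^iou4-m+ei4=sh@/A:4-4^1@"]
print(count_silence_labels(with_sil), count_silence_labels(without_sil))
```

A count of zero for a frontend-generated file would be consistent with the warning; whether that also affects pacing is not established in this thread.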

@Jackiexiao
Owner

At synthesis time, mandarin_frontend.py generates HTS labels without predicting phone durations. If your synthesized speech is paced too fast, check your duration model: both its training data and its training results.

A duration model result (slt_arctic_demo)

2018-01-30 20:30:02,166     INFO           main: Develop: DNN -- RMSE: 6.790 frames/phoneme; CORR: 0.629; 
2018-01-30 20:30:02,166     INFO           main: Test: DNN -- RMSE: 7.842 frames/phoneme; CORR: 0.562; 
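
The RMSE and CORR figures in these log lines compare predicted phone durations with reference durations, both measured in frames. A minimal sketch of how such numbers can be computed, assuming a plain Pearson correlation (Merlin's own evaluation code may differ in detail). The reference values are the five durations from the label excerpt at the top of the thread converted at an assumed 5 ms frame shift; the predicted values are made up:

```python
import math


def rmse(reference, predicted):
    """Root-mean-square error between two equal-length duration sequences."""
    squared = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    return math.sqrt(squared / len(reference))


def pearson(reference, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_p = sum(predicted) / n
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(reference, predicted))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_r * var_p)


reference_frames = [15, 16, 23, 13, 16]  # from the label excerpt, assuming 5 ms frames
predicted_frames = [7, 8, 12, 6, 8]      # hypothetical: roughly half the reference

print(f"RMSE: {rmse(reference_frames, predicted_frames):.3f} frames/phoneme")
print(f"CORR: {pearson(reference_frames, predicted_frames):.3f}")
print(f"mean predicted/reference ratio: {sum(predicted_frames) / sum(reference_frames):.2f}")
```

The made-up data illustrates why CORR alone is not enough: predictions that are uniformly too short still correlate well with the reference, so the ratio of total durations is worth checking too.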

@ben-8878
Author

ben-8878 commented Jun 6, 2018

@Jackiexiao I checked the training data and noticed some warnings when using Montreal Forced Aligner to generate the lab files:

2018-06-06 15:57:01,853 WARNING : --Miss: database/textgrid/mandarin_voice/A11_242.TextGrid
2018-06-06 15:57:01,853 WARNING : --Miss: database/textgrid/mandarin_voice/A11_243.TextGrid
2018-06-06 15:57:01,853 WARNING : --Miss: database/textgrid/mandarin_voice/A11_244.TextGrid
2018-06-06 15:57:01,853 WARNING : --Miss: database/textgrid/mandarin_voice/A11_245.TextGrid
2018-06-06 15:57:01,853 WARNING : --Miss: database/textgrid/mandarin_voice/A11_246.TextGrid
2018-06-06 15:57:01,854 WARNING : --Miss: database/textgrid/mandarin_voice/A11_247.TextGrid
2018-06-06 15:57:01,854 WARNING : --Miss: database/textgrid/mandarin_voice/A11_248.TextGrid
2018-06-06 15:57:01,854 WARNING : --Miss: database/textgrid/mandarin_voice/A11_249.TextGrid

Do you think this could be affecting the result?

@Jackiexiao
Owner

It means that Montreal Forced Aligner failed on those utterances, so no TextGrid files were generated for them.
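
To see how many utterances are affected, one option is to list the audio files that have no matching TextGrid. A minimal sketch; the directory layout and the helper name `missing_textgrids` are assumptions for illustration, and the demo runs on a throwaway directory with placeholder file names:

```python
import tempfile
from pathlib import Path


def missing_textgrids(wav_dir, textgrid_dir):
    """Return sorted utterance IDs that have a .wav file but no .TextGrid file."""
    wav_ids = {path.stem for path in Path(wav_dir).glob("*.wav")}
    textgrid_ids = {path.stem for path in Path(textgrid_dir).glob("*.TextGrid")}
    return sorted(wav_ids - textgrid_ids)


# Demo on a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    wav_dir = Path(tmp, "wav")
    textgrid_dir = Path(tmp, "textgrid")
    wav_dir.mkdir()
    textgrid_dir.mkdir()
    for utt in ("A11_241", "A11_242", "A11_243"):
        (wav_dir / f"{utt}.wav").touch()
    (textgrid_dir / "A11_241.TextGrid").touch()
    missing = missing_textgrids(wav_dir, textgrid_dir)

print(missing)
```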

@ben-8878
Author

ben-8878 commented Jun 7, 2018

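@Jackiexiao Thinking back over what I did: I downloaded thchs30_250_demo.tar.gz directly, and the data in that archive already includes label files, so it has presumably already been aligned. That means it shouldn't matter whether I ran the alignment myself, since I didn't use any new data. Do you see anything wrong with that?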

@Jackiexiao
Owner

Could you post a synthesized audio sample so I can listen to it?

@ben-8878
Author

ben-8878 commented Jun 7, 2018

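@Jackiexiao A11 is the WAV that was generated automatically during training.
B11 was generated after training finished, from lab files without timestamps produced by the frontend.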
wav.zip

@Jackiexiao
Owner

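That is indeed strange. In principle this shouldn't happen, so it is most likely a problem with the duration model. Take a look at the duration model's training log and compare it with the training result I posted earlier.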

2018-01-30 20:30:02,166     INFO           main: Develop: DNN -- RMSE: 6.790 frames/phoneme; CORR: 0.629; 
2018-01-30 20:30:02,166     INFO           main: Test: DNN -- RMSE: 7.842 frames/phoneme; CORR: 0.562; 

@ben-8878
Author

ben-8878 commented Jun 8, 2018

feed_forward_6_tanh_01_57PM_June_06_2018.log
@Jackiexiao This is one of the logs. I don't see the information you mentioned in it.

@Jackiexiao
Owner

feed_forward_4_tanh_08_29PM_January_30_2018.log (the log from training the duration model)

@ben-8878
Author

ben-8878 commented Jun 8, 2018

After starting over from scratch, the duration model gives:
2018-06-08 12:07:49,213 INFO main: calculating MCD
2018-06-08 12:07:49,467 INFO main: Develop: DNN -- RMSE: 9.061 frames/phoneme; CORR: 0.659;
2018-06-08 12:07:49,468 INFO main: Test: DNN -- RMSE: 8.584 frames/phoneme; CORR: 0.668;

@Jackiexiao
Owner

The duration model looks like it trained fine... I don't know the cause either.

@ben-8878
Author

ben-8878 commented Jun 8, 2018

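I ran everything again and found that the mgc files could not be generated earlier because of a path problem: the directory on disk was world/extract_features_for_merlin.py, but the script looks for WORLD/extract_features_for_merlin.py. I renamed the world folder to WORLD and the mgc files were generated successfully.
I changed nothing except renaming the world folder to uppercase, yet step 7 (generating the wav) then failed with a new error: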
2018-06-08 19:04:07,893 INFO param_generation: processing 2 of 3: /data02/zhangyb/merlin/egs/mandarin_voice/s1/experiments/mandarin_voice/test_synthesis/wav/A11_1.cmp
2018-06-08 19:04:07,949 INFO param_generation: processing 3 of 3: /data02/zhangyb/merlin/egs/mandarin_voice/s1/experiments/mandarin_voice/test_synthesis/wav/A11_2.cmp
2018-06-08 19:04:07,996 INFO main : reconstructing waveform(s)
2018-06-08 19:04:07,997 CRITICAL wav_generation: The vocoder world is not supported yet!
Traceback (most recent call last):
File "/data02/zhangyb/merlin/src/run_merlin.py", line 1320, in
main_function(cfg)
File "/data02/zhangyb/merlin/src/run_merlin.py", line 989, in main_function
generate_wav(gen_dir, gen_file_id_list, cfg) # generated speech
File "/data02/zhangyb/merlin/src/utils/generate.py", line 335, in generate_wav
raise
RuntimeError: No active exception to reraise

@Jackiexiao
Owner

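The vocoder is configured in s1/01_setup.sh via echo "Vocoder=WORLD" >> $global_config_file. You only need to change it there in 01_setup.sh and then rerun the relevant scripts. You probably still have a lowercase "world" somewhere else that was not changed to uppercase, which is why you get CRITICAL wav_generation: The vocoder world is not supported yet!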
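
The traceback above ends in `RuntimeError: No active exception to reraise` because a bare `raise` ran outside any `except` block, so the informative part is the CRITICAL log line before it, not the RuntimeError itself. A minimal sketch of that behaviour, with a hypothetical `SUPPORTED_VOCODERS` set and `check_vocoder` function standing in for Merlin's case-sensitive check (the exact set of supported names is an assumption):

```python
SUPPORTED_VOCODERS = {"STRAIGHT", "WORLD"}  # assumption, for illustration only


def check_vocoder(name):
    """Return the name if supported; otherwise log and fail the way the traceback does."""
    if name not in SUPPORTED_VOCODERS:  # case-sensitive: "world" != "WORLD"
        print(f"The vocoder {name} is not supported yet!")
        raise  # bare raise with no active exception -> RuntimeError
    return name


try:
    check_vocoder("world")
except RuntimeError as err:
    message = str(err)
    print(message)
```

Because the comparison is case-sensitive, the configured vocoder name has to match the expected spelling exactly wherever it is set.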

@ben-8878
Author

ben-8878 commented Jun 27, 2018

A11_0.lab.txt

B11_0.lab.txt
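@Jackiexiao I keep feeling that something is wrong but I can't find the problem. A11 and B11 have the same text content.
A11 comes from thchs30_250_demo.tar.gz; its lab file was generated with the help of a GMM model.
B11 is the lab file generated by the frontend.
The acoustic model and the duration model are both fine. Could the cause be that prosody was not annotated?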

@Jackiexiao
Owner

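Go by the lab files generated by the latest frontend, because some bugs have been fixed. The A11 lab is wrong; sorry, I have not updated the download link.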

@ben-8878
Author

Would a wrong lab file make the synthesized speech fast?

@Jackiexiao
Owner

The only thing that can make the speech fast is predicted durations that are too short.

@ben-8878
Author

ben-8878 commented Jun 28, 2018

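I took a look and it really is a duration model problem: the predicted timestamps differ from the timestamps of the training samples by a factor of 100.

Addendum: I have now verified that it is not a duration model problem. The timestamps in A.lab and B.lab were both generated by the same duration model. A11.wav has normal pacing, while B11.wav is very fast.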

wav_lab.zip
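
One way to quantify a mismatch like this is to compare the total timestamped duration of two label files for the same sentence. A minimal sketch with a hypothetical helper `total_duration_seconds` and made-up label lines; it reports the ratio of the two durations, so a factor of 100 would show up directly:

```python
HTS_UNITS_PER_SECOND = 10_000_000  # HTS label times are in units of 100 ns


def total_duration_seconds(lines):
    """Span from the first start time to the last end time, in seconds."""
    times = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3:  # start, end, label
            times.append((int(parts[0]), int(parts[1])))
    if not times:
        raise ValueError("no timestamped label lines found")
    return (times[-1][1] - times[0][0]) / HTS_UNITS_PER_SECOND


# Placeholder labels "a" and "b"; real files carry full-context labels.
lab_a = ["0 750000 a", "750000 1550000 b"]
lab_b = ["0 7500 a", "7500 15500 b"]
ratio = total_duration_seconds(lab_a) / total_duration_seconds(lab_b)
print(f"A is {ratio:.0f}x longer than B")
```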

@Jackiexiao
Owner

The attachment you sent does not contain B.lab. I still think it is a duration model problem.

@1271086950

@Jackiexiao I'd like to ask about an error I get when training the acoustic model:
the number of frames in label and acoustic features are different: 13879 vs 6721 (nitech_jp_song070_f001_020)
2019-09-16 18:05:16,385 CRITICAL main : train_DNN threw an exception
Traceback (most recent call last):
File "/home/zy/merlin/src/run_merlin.py", line 1320, in
main_function(cfg)
File "/home/zy/merlin/src/run_merlin.py", line 870, in main_function
cmp_mean_vector = cmp_mean_vector, cmp_std_vector = cmp_std_vector,init_dnn_model_file=cfg.start_from_trained_model)
File "/home/zy/merlin/src/run_merlin.py", line 223, in train_DNN
shared_train_set_xy, temp_train_set_x, temp_train_set_y = train_data_reader.load_one_partition()
File "/home/zy/merlin/src/utils/providers.py", line 296, in load_one_partition
shared_set_xy, temp_set_x, temp_set_y = self.load_next_partition()
File "/home/zy/merlin/src/utils/providers.py", line 761, in load_next_partition
raise
RuntimeError: No active exception to reraise
What could be the cause?
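
The thread does not answer this question, but the mismatch can at least be located by counting frames in each feature file independently. A minimal sketch, assuming the files are headerless binary arrays of 32-bit floats with a fixed dimension per frame (an assumption; check how your features were written). `frame_count` is a hypothetical helper and the demo file is generated on the spot:

```python
import os
import struct
import tempfile


def frame_count(path, dimension, bytes_per_value=4):
    """Number of frames in a headerless binary feature file of fixed dimension."""
    size = os.path.getsize(path)
    frame_bytes = dimension * bytes_per_value
    if size % frame_bytes != 0:
        raise ValueError(f"{path}: size {size} is not a multiple of {frame_bytes} bytes")
    return size // frame_bytes


# Demo: write 3 frames of a 4-dimensional float32 feature and count them back.
with tempfile.NamedTemporaryFile(suffix=".cmp", delete=False) as handle:
    handle.write(struct.pack("<12f", *(float(i) for i in range(12))))
    demo_path = handle.name
frames = frame_count(demo_path, dimension=4)
os.remove(demo_path)
print(frames)

# Ratio of the two counts reported in the error message above.
label_frames, acoustic_frames = 13879, 6721
print(f"label/acoustic frame ratio: {label_frames / acoustic_frames:.2f}")
```

Comparing the per-file counts against the duration implied by the label timestamps shows which side disagrees with the audio length.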
