adaptive-VC #9 (Open)

wants to merge 5 commits into base: main

Binary file added README.assets/image-20240118205212486.png
Binary file added README.assets/image-20240119123956865.png
Binary file added README.assets/image-20240119181755678.png
Binary file added README.assets/image-20240119193402509.png
Binary file added README.assets/image-20240121231348020.png
Binary file added README.assets/image-20240121231802025.png
Binary file added README.assets/image-20240121232546501.png
361 changes: 210 additions & 151 deletions README.md

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions checkpoints/adaptive-vc/README.md
@@ -0,0 +1 @@
You can download the pretrained model from [here](http://speech.ee.ntu.edu.tw/~jjery2243542/resource/model/is19/vctk_model.ckpt) and the corresponding normalization parameters for inference from [here](http://speech.ee.ntu.edu.tw/~jjery2243542/resource/model/is19/attr.pkl).
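
If it helps, a minimal download sketch is shown below; placing both files in `checkpoints/adaptive-vc/` is an assumption based on this README's location, so adjust the paths to your setup.

```python
# Minimal download sketch (assumed target directory: checkpoints/adaptive-vc/).
import os
import urllib.request

CKPT_DIR = "checkpoints/adaptive-vc"
FILES = {
    "vctk_model.ckpt": "http://speech.ee.ntu.edu.tw/~jjery2243542/resource/model/is19/vctk_model.ckpt",
    "attr.pkl": "http://speech.ee.ntu.edu.tw/~jjery2243542/resource/model/is19/attr.pkl",
}

os.makedirs(CKPT_DIR, exist_ok=True)
for name, url in FILES.items():
    dest = os.path.join(CKPT_DIR, name)
    if not os.path.exists(dest):
        print(f"downloading {name} ...")
        urllib.request.urlretrieve(url, dest)
```
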
69 changes: 28 additions & 41 deletions dataset/README.md
@@ -1,41 +1,28 @@
This folder stores the datasets, for example:
- lrw
- lrs2
- mead
- ...

The general layout for a processed dataset is:

```
dataset/
├── lrs2
│   ├── data                (raw data of the dataset)
│   ├── filelist            (dataset split files)
│   │   ├── train.txt
│   │   ├── val.txt
│   │   └── test.txt
│   └── preprocessed_data   (frame images extracted from the videos plus the audio files; for the exact paths, see how lrs2 is handled in talkingface.utils.data_preprocess)
```

The data paths under preprocessed_data are generally laid out as:
```
preprocessed_root (lrs2_preprocessed)/
├── list of folders
│   ├── folders named with five-digit video IDs
│   │   ├── *.jpg
│   │   └── audio.wav
```

Please store datasets in this format where possible, and follow the train.txt, val.txt, and test.txt files for the dataset splits.
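
As a rough illustration of this layout, here is a small sketch (a hypothetical helper, not part of the repository) that collects the frame images and the audio path for every entry listed in a split file:

```python
# Illustrative sketch only: walks the lrs2-style preprocessed layout described
# above. Split files are assumed to contain one relative video ID per line.
import glob
import os

def load_split(preprocessed_root, filelist):
    samples = []
    with open(filelist) as f:
        for line in f:
            vid = line.strip()
            if not vid:
                continue
            vid_dir = os.path.join(preprocessed_root, vid)
            samples.append({
                "id": vid,
                "frames": sorted(glob.glob(os.path.join(vid_dir, "*.jpg"))),
                "audio": os.path.join(vid_dir, "audio.wav"),
            })
    return samples

# Example (paths are assumptions):
# train = load_split("dataset/lrs2/preprocessed_data", "dataset/lrs2/filelist/train.txt")
```
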
The experiments use the CSTR VCTK Corpus, which can be downloaded from the address below.
- [CSTR VCTK Corpus](https://homepages.inf.ed.ac.uk/jyamagis/page3/page58/page58.html)

In the dataset folder, VCTK-Corpus holds the raw data downloaded from the site above, and preprocessed_data holds the data produced by preprocessing.

After preprocessing, the dataset directory is structured as follows (a small loading sketch is given after the tree):
```
dataset
├── preprocessed_data *
│ ├── attr.pkl
│ ├── in_test_files.txt
│ ├── in_test.pkl
│ ├── in_test_samples_128.json
│ ├── out_test_files.txt
│ ├── out_test.pkl
│ ├── out_test_samples_128.json
│ ├── train_128.pkl
│ ├── train.pkl
│ └── train_samples_128.json
├── VCTK-Corpus *
│ ├── COPYING
│ ├── NOTE
│ ├── README
│ ├── speaker-info.txt
│ ├── txt *
│ └── wav48 *
└── (* marks a path that is a directory)
```
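
A quick sanity check of the preprocessed artifacts might look like the sketch below; what each file contains is an assumption inferred from the file names and from the checkpoints README, not documented behavior.

```python
# Sanity-check sketch for the preprocessed files listed above. The comments on
# what each pickle holds are assumptions inferred from the file names.
import json
import pickle

root = "dataset/preprocessed_data"

with open(f"{root}/attr.pkl", "rb") as f:
    attr = pickle.load(f)       # presumably the normalization parameters used at inference
print("attr:", type(attr))

with open(f"{root}/train_128.pkl", "rb") as f:
    train = pickle.load(f)      # presumably the 128-frame training segments
print("train_128:", type(train))

with open(f"{root}/train_samples_128.json") as f:
    samples = json.load(f)      # presumably the sampled training pairs/indices
print("train_samples_128:", type(samples))
```
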
68 changes: 68 additions & 0 deletions dataset/VCTK-Corpus/README.txt
@@ -0,0 +1,68 @@
---------------------------------------------------------------------
CSTR VCTK Corpus
English Multi-speaker Corpus for CSTR Voice Cloning Toolkit

(Version 0.80)
RELEASE August 2012
The Centre for Speech Technology Research
University of Edinburgh
Copyright (c) 2012

Junichi Yamagishi
[email protected]
---------------------------------------------------------------------

Overview

This CSTR VCTK Corpus includes speech data uttered by 109 English
speakers with various accents. Each speaker reads out about 400
sentences, which were selected from a newspaper, the rainbow passage
and an elicitation paragraph used for the speech accent archive.

The newspaper texts were taken from Herald Glasgow, with permission
from Herald & Times Group. Each speaker has a different set of the
newspaper texts, selected based on a greedy algorithm that increases the
contextual and phonetic coverage.

The rainbow passage and elicitation paragraph are the same for all
speakers. The rainbow passage can be found at International Dialects
of English Archive:
(http://web.ku.edu/~idea/readings/rainbow.htm). The elicitation
paragraph is identical to the one used for the speech accent archive
(http://accent.gmu.edu). The details of the speech accent archive
can be found at
http://www.ualberta.ca/~aacl2009/PDFs/WeinbergerKunath2009AACL.pdf

All speech data was recorded using an identical recording setup: an
omni-directional microphone (DPA 4035), a 96 kHz sampling frequency at 24
bits, in a hemi-anechoic chamber at the University of Edinburgh. All
recordings were converted to 16 bits, downsampled to 48 kHz using STPK,
and manually end-pointed.
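
Should a lower sampling rate be needed for voice-conversion preprocessing, a resampling sketch could look like the following; the example path and the 16 kHz target are assumptions for illustration, not requirements of the corpus.

```python
# Sketch: load a wav48 recording and resample it for downstream preprocessing.
# The target rate and file paths below are assumptions for illustration only.
import librosa
import soundfile as sf

src = "dataset/VCTK-Corpus/wav48/p225/p225_001.wav"   # hypothetical example path
target_sr = 16000                                     # assumed target rate

wav, sr = librosa.load(src, sr=None)                  # keep the original 48 kHz
wav_16k = librosa.resample(wav, orig_sr=sr, target_sr=target_sr)
sf.write("p225_001_16k.wav", wav_16k, target_sr)
```
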

This corpus is intended for HMM-based text-to-speech synthesis systems,
especially for speaker-adaptive HMM-based speech synthesis that uses
average voice models trained on multiple speakers and speaker
adaptation technologies.

COPYING

This corpus is licensed under Open Data Commons Attribution License
(ODC-By) v1.0.

http://opendatacommons.org/licenses/by/1.0/
http://opendatacommons.org/licenses/by/summary/


ACKNOWLEDGEMENTS

The CSTR VCTK Corpus was constructed by:

Christophe Veaux (University of Edinburgh)
Junichi Yamagishi (University of Edinburgh)
Kirsten MacDonald

The research leading to these results was partly funded by EPSRC
grants EP/I031022/1 (NST) and EP/J002526/1 (CAF), from the RSE-NSFC
grant (61111130120), and from the JST CREST (uDialogue).


Binary file added dataset/preprocessed_data/attr.pkl
Binary file not shown.