adaptive-VC #9 (Open)

wants to merge 5 commits into base: main

Binary file added README.assets/image-20240118205212486.png
Binary file added README.assets/image-20240119123956865.png
Binary file added README.assets/image-20240119181755678.png
Binary file added README.assets/image-20240119193402509.png
Binary file added README.assets/image-20240121231348020.png
Binary file added README.assets/image-20240121231802025.png
Binary file added README.assets/image-20240121232546501.png
361 changes: 210 additions & 151 deletions README.md

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions checkpoints/adaptive-vc/README.md
@@ -0,0 +1 @@
You can download the pretrained model from [here](http://speech.ee.ntu.edu.tw/~jjery2243542/resource/model/is19/vctk_model.ckpt) and the corresponding normalization parameters for inference from [here](http://speech.ee.ntu.edu.tw/~jjery2243542/resource/model/is19/attr.pkl).
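
If it helps, a minimal download sketch is shown below; placing both files in `checkpoints/adaptive-vc/` is an assumption based on this README's location, so adjust the paths to your setup.

```python
# Minimal download sketch (assumed target directory: checkpoints/adaptive-vc/).
import os
import urllib.request

CKPT_DIR = "checkpoints/adaptive-vc"
FILES = {
    "vctk_model.ckpt": "http://speech.ee.ntu.edu.tw/~jjery2243542/resource/model/is19/vctk_model.ckpt",
    "attr.pkl": "http://speech.ee.ntu.edu.tw/~jjery2243542/resource/model/is19/attr.pkl",
}

os.makedirs(CKPT_DIR, exist_ok=True)
for name, url in FILES.items():
    dest = os.path.join(CKPT_DIR, name)
    if not os.path.exists(dest):
        print(f"downloading {name} ...")
        urllib.request.urlretrieve(url, dest)
```
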
69 changes: 28 additions & 41 deletions dataset/README.md
@@ -1,41 +1,28 @@
This folder stores the datasets, for example:
- lrw
- lrs2
- mead
- ...

The general layout for a processed dataset is:

```
dataset/
├── lrs2
│   ├── data                (raw data of the dataset)
│   ├── filelist            (dataset split files)
│   │   ├── train.txt
│   │   ├── val.txt
│   │   └── test.txt
│   └── preprocessed_data   (frame images extracted from the videos plus the audio files; for the exact paths, see how lrs2 is handled in talkingface.utils.data_preprocess)
```

The data paths under preprocessed_data are generally laid out as:
```
preprocessed_root (lrs2_preprocessed)/
├── list of folders
│   ├── folders named with five-digit video IDs
│   │   ├── *.jpg
│   │   └── audio.wav
```

Please store datasets in this format where possible, and follow the train.txt, val.txt, and test.txt files for the dataset splits.
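
As a rough illustration of this layout, here is a small sketch (a hypothetical helper, not part of the repository) that collects the frame images and the audio path for every entry listed in a split file:

```python
# Illustrative sketch only: walks the lrs2-style preprocessed layout described
# above. Split files are assumed to contain one relative video ID per line.
import glob
import os

def load_split(preprocessed_root, filelist):
    samples = []
    with open(filelist) as f:
        for line in f:
            vid = line.strip()
            if not vid:
                continue
            vid_dir = os.path.join(preprocessed_root, vid)
            samples.append({
                "id": vid,
                "frames": sorted(glob.glob(os.path.join(vid_dir, "*.jpg"))),
                "audio": os.path.join(vid_dir, "audio.wav"),
            })
    return samples

# Example (paths are assumptions):
# train = load_split("dataset/lrs2/preprocessed_data", "dataset/lrs2/filelist/train.txt")
```
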
The experiments use the CSTR VCTK Corpus, which can be downloaded from the address below.
- [CSTR VCTK Corpus](https://homepages.inf.ed.ac.uk/jyamagis/page3/page58/page58.html)

In the dataset folder, VCTK-Corpus holds the raw data downloaded from the site above, and preprocessed_data holds the data produced by preprocessing.

After preprocessing, the dataset directory is structured as follows (a small loading sketch is given after the tree):
```
dataset
├── preprocessed_data *
│ ├── attr.pkl
│ ├── in_test_files.txt
│ ├── in_test.pkl
│ ├── in_test_samples_128.json
│ ├── out_test_files.txt
│ ├── out_test.pkl
│ ├── out_test_samples_128.json
│ ├── train_128.pkl
│ ├── train.pkl
│ └── train_samples_128.json
├── VCTK-Corpus *
│ ├── COPYING
│ ├── NOTE
│ ├── README
│ ├── speaker-info.txt
│ ├── txt *
│ └── wav48 *
└── (* marks a path that is a directory)
```
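
A quick sanity check of the preprocessed artifacts might look like the sketch below; what each file contains is an assumption inferred from the file names and from the checkpoints README, not documented behavior.

```python
# Sanity-check sketch for the preprocessed files listed above. The comments on
# what each pickle holds are assumptions inferred from the file names.
import json
import pickle

root = "dataset/preprocessed_data"

with open(f"{root}/attr.pkl", "rb") as f:
    attr = pickle.load(f)       # presumably the normalization parameters used at inference
print("attr:", type(attr))

with open(f"{root}/train_128.pkl", "rb") as f:
    train = pickle.load(f)      # presumably the 128-frame training segments
print("train_128:", type(train))

with open(f"{root}/train_samples_128.json") as f:
    samples = json.load(f)      # presumably the sampled training pairs/indices
print("train_samples_128:", type(samples))
```
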
68 changes: 68 additions & 0 deletions dataset/VCTK-Corpus/README.txt
@@ -0,0 +1,68 @@
---------------------------------------------------------------------
CSTR VCTK Corpus
English Multi-speaker Corpus for CSTR Voice Cloning Toolkit

(Version 0.80)
RELEASE August 2012
The Centre for Speech Technology Research
University of Edinburgh
Copyright (c) 2012

Junichi Yamagishi
[email protected]
---------------------------------------------------------------------

Overview

This CSTR VCTK Corpus includes speech data uttered by 109 English
speakers with various accents. Each speaker reads out about 400
sentences, which were selected from a newspaper, the rainbow passage
and an elicitation paragraph used for the speech accent archive.

The newspaper texts were taken from Herald Glasgow, with permission
from Herald & Times Group. Each speaker has a different set of the
newspaper texts, selected based on a greedy algorithm that increases the
contextual and phonetic coverage.

The rainbow passage and elicitation paragraph are the same for all
speakers. The rainbow passage can be found at International Dialects
of English Archive:
(http://web.ku.edu/~idea/readings/rainbow.htm). The elicitation
paragraph is identical to the one used for the speech accent archive
(http://accent.gmu.edu). The details of the speech accent archive
can be found at
http://www.ualberta.ca/~aacl2009/PDFs/WeinbergerKunath2009AACL.pdf

All speech data was recorded using an identical recording setup: an
omni-directional microphone (DPA 4035), a 96 kHz sampling frequency at 24
bits, in a hemi-anechoic chamber at the University of Edinburgh. All
recordings were converted to 16 bits, downsampled to 48 kHz using STPK,
and manually end-pointed.
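
Should a lower sampling rate be needed for voice-conversion preprocessing, a resampling sketch could look like the following; the example path and the 16 kHz target are assumptions for illustration, not requirements of the corpus.

```python
# Sketch: load a wav48 recording and resample it for downstream preprocessing.
# The target rate and file paths below are assumptions for illustration only.
import librosa
import soundfile as sf

src = "dataset/VCTK-Corpus/wav48/p225/p225_001.wav"   # hypothetical example path
target_sr = 16000                                     # assumed target rate

wav, sr = librosa.load(src, sr=None)                  # keep the original 48 kHz
wav_16k = librosa.resample(wav, orig_sr=sr, target_sr=target_sr)
sf.write("p225_001_16k.wav", wav_16k, target_sr)
```
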

This corpus is intended for HMM-based text-to-speech synthesis systems,
especially for speaker-adaptive HMM-based speech synthesis that uses
average voice models trained on multiple speakers and speaker
adaptation technologies.

COPYING

This corpus is licensed under Open Data Commons Attribution License
(ODC-By) v1.0.

http://opendatacommons.org/licenses/by/1.0/
http://opendatacommons.org/licenses/by/summary/


ACKNOWLEDGEMENTS

The CSTR VCTK Corpus was constructed by:

Christophe Veaux (University of Edinburgh)
Junichi Yamagishi (University of Edinburgh)
Kirsten MacDonald

The research leading to these results was partly funded by EPSRC
grants EP/I031022/1 (NST) and EP/J002526/1 (CAF), from the RSE-NSFC
grant (61111130120), and from the JST CREST (uDialogue).


Binary file added dataset/preprocessed_data/attr.pkl
Binary file not shown.