See: https://espnet.github.io/espnet/tutorial.html
Directory name | Corpus name | Task | Language | URL | Note |
---|---|---|---|---|---|
aesrc2020 | Accented English Speech Recognition Challenge 2020 | ASR | EN | https://arxiv.org/abs/2102.10233 | |
aidatatang_200zh | Aidatatang_200zh: A free Chinese Mandarin speech corpus | ASR | ZH | http://www.openslr.org/62/ | |
aishell | AISHELL-ASR0009-OS1 Open Source Mandarin Speech Corpus | ASR | ZH | http://www.aishelltech.com/kysjcp | |
aishell2 | AISHELL-2 Open Source Mandarin Speech Corpus | ASR | ZH | http://www.aishelltech.com/aishell_2 | |
ami | The AMI Meeting Corpus | ASR | EN | http://groups.inf.ed.ac.uk/ami/corpus/ | |
an4 | CMU AN4 database | ASR/TTS | EN | http://www.speech.cs.cmu.edu/databases/an4/ | |
arctic | CMU ARCTIC databases | TTS/VC | EN, EN->EN | http://www.festvox.org/cmu_arctic/ | |
aurora4 | Aurora-4 database | ASR | EN | http://aurora.hsnr.de/aurora-4.html | |
babel | IARPA Babel corpus | ASR | ~20 Languages | https://www.iarpa.gov/index.php/research-programs/babel | |
blizzard_2017 | Blizzard Challenge 2017 | TTS | EN | https://www.synsig.org/index.php/Blizzard_Challenge_2017 | |
chime4 | The 4th CHiME Speech Separation and Recognition Challenge | ASR/Multichannel ASR | EN | http://spandh.dcs.shef.ac.uk/chime_challenge/chime2016/ | |
chime5 | The 5th CHiME Speech Separation and Recognition Challenge | ASR | EN | http://spandh.dcs.shef.ac.uk/chime_challenge/CHiME5/index.html | |
chime6 | The 6th CHiME Speech Separation and Recognition Challenge | ASR | EN | https://chimechallenge.github.io/chime6/ | |
cmu_wilderness | CMU Wilderness Multilingual Speech Dataset | Multilingual ASR | ~100 Languages | https://github.com/festvox/datasets-CMU_Wilderness | |
commonvoice | The Mozilla Common Voice | ASR | 13 Languages | https://voice.mozilla.org/datasets | |
covost2 | CoVoST: A Large-Scale Multilingual Speech-To-Text Translation Corpus | ASR/Machine Translation/Speech Translation | 15+21 Language pairs | https://github.com/facebookresearch/covost | |
csj | Corpus of Spontaneous Japanese | ASR | JP | https://pj.ninjal.ac.jp/corpus_center/csj/en/ | |
csmsc | Chinese Standard Mandarin Speech Corpus | TTS | ZH | https://www.data-baker.com/open_source.html | |
dipco | Dinner Party Corpus | ASR | EN | https://arxiv.org/abs/1909.13447 | |
dirha_wsj | Distant-speech Interaction for Robust Home Applications | Multi-Array ASR | EN | https://dirha.fbk.eu/, https://github.com/SHINE-FBK/DIRHA_English_wsj | |
fisher_callhome_spanish | Fisher and CALLHOME Spanish-English Speech Translation | ASR/Machine Translation/Speech Translation | ES->EN | https://catalog.ldc.upenn.edu/LDC2014T23 | |
fisher_swbd | Fisher English Training Speech, Switchboard-1 Release 2 | ASR | EN | https://catalog.ldc.upenn.edu/LDC2004S13, https://catalog.ldc.upenn.edu/LDC2005S13, https://catalog.ldc.upenn.edu/LDC97S62 | |
hkust | HKUST Mandarin Telephone Speech | ASR | ZH | https://catalog.ldc.upenn.edu/LDC2005S15, https://catalog.ldc.upenn.edu/LDC2005T32 | |
how2 | How2: A Large-scale Dataset for Multimodal Language Understanding | ASR/Machine Translation/Speech Translation | EN->PT | https://github.com/srvk/how2-dataset | |
hub4_spanish | 1997 Spanish Broadcast News Speech (HUB4-NE) | ASR | ES | https://catalog.ldc.upenn.edu/LDC98S74, https://catalog.ldc.upenn.edu/LDC98T29 | |
iwslt16 | International Workshop on Spoken Language Translation 2016 | Machine Translation | EN->DE | https://wit3.fbk.eu/mt.php?release=2016-01 | |
iwslt18 | International Workshop on Spoken Language Translation 2018 | ASR/Machine Translation/Speech Translation | EN->DE | https://sites.google.com/site/iwsltevaluation2018/Lectures-task | |
iwslt19 | International Workshop on Spoken Language Translation 2019 | ASR/Speech Translation | EN->DE | https://sites.google.com/view/iwslt-evaluation-2019/speech-translation | |
iwslt21 | International Workshop on Spoken Language Translation 2021 | ASR/Machine Translation/Speech Translation | EN->DE | https://iwslt.org/2021/offline | |
iwslt21_low_resource | International Workshop on Spoken Language Translation 2021 | ASR/Speech Translation | SWA->EN & SWC->FR | https://iwslt.org/2021/low-resource | |
jesc | Japanese-English Subtitle Corpus | Machine Translation | EN->JP | https://nlp.stanford.edu/projects/jesc/ | |
jnas | ASJ Japanese Newspaper Article Sentences Read Speech Corpus (JNAS) | ASR/TTS | JP | http://research.nii.ac.jp/src/JNAS.html | |
jsalt18e2e | Multilingual End-to-end ASR for Incomplete Data Benchmark | Multilingual ASR | ~20 Languages | https://www.clsp.jhu.edu/workshops/18-workshop/multilingual-end-end-asr-incomplete-data/ | babel+ |
jsut | Japanese speech corpus of Saruwatari-lab., University of Tokyo | ASR/TTS | JP | https://sites.google.com/site/shinnosuketakamichi/publication/jsut | |
jvs | JVS (Japanese versatile speech) corpus | TTS | JP | https://sites.google.com/site/shinnosuketakamichi/research-topics/jvs_corpus | |
ksponspeech | KsponSpeech (Korean spontaneous speech) corpus | ASR | KR | https://aihub.or.kr/aidata/105 | |
li10 | Language-Independent ASR task (10 languages) | Multilingual ASR | ~10 Languages | https://www.merl.com/publications/docs/TR2017-182.pdf | csj+hkust+voxforge(7lang)+wsj |
li42 | Corpora Combination with 42 languages | Multilingual ASR | ~42 Languages | | aishell+aurora4+babel+chime4+commonvoice+csj+fisher_callhome_spanish+fisher_swbd+hkust+voxforge+wsj |
libri_trans | Translation Augmented LibriSpeech Corpus | ASR/Machine Translation/Speech Translation | EN->FR | https://persyval-platform.univ-grenoble-alpes.fr/DS91/detaildataset | |
librispeech | LibriSpeech ASR corpus | ASR | EN | http://www.openslr.org/12 | |
libritts | LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech | TTS | EN | http://www.openslr.org/60/ | |
ljspeech | The LJ Speech Dataset | TTS | EN | https://keithito.com/LJ-Speech-Dataset/ | |
lrs2 | The Lip Reading Sentences 2 Dataset | ASR | EN | https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html | |
lrs | The Lip Reading Sentences 2 and 3 Datasets | AVSR | EN | https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html, https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs3.html | |
m_ailabs | The M-AILABS Speech Dataset | TTS | ~5 languages | https://www.caito.de/2019/01/the-m-ailabs-speech-dataset/ | |
mucs_2021 | MUCS 2021: MUltilingual and Code-Switching ASR Challenges for Low Resource Indian Languages | ASR/Code Switching | HI, MR, OR, TA, TE, GU, HI-EN, BN-EN | https://navana-tech.github.io/MUCS2021/data.html | |
mtedx | Multilingual TEDx | ASR/Machine Translation/Speech Translation | 13 Language pairs | http://www.openslr.org/100/ | |
must_c | MuST-C Multilingual Speech Translation Corpus | ASR/Machine Translation/Speech Translation | EN->{DE, ES, FR, IT, NL, PT, RO, RU} | https://ict.fbk.eu/must-c/ | |
must_c_v2 | MuST-C Multilingual Speech Translation Corpus | ASR/Machine Translation/Speech Translation | EN->DE | https://ict.fbk.eu/must-c/, https://iwslt.org/2021/offline | Includes more talks, resulting in 20k more audio/text segments. Improved cleaning strategies discard low-quality triplets more reliably. The TED talks in MuST-C v2 were downloaded from the YouTube TED channel. |
puebla_nahuatl | The Puebla-Nahuatl Corpus | ASR | Nahuatl | http://www.openslr.org/89 | |
reverb | REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge | ASR | EN | https://reverb2014.dereverberation.com/ | |
ru_open_stt | Russian Open Speech To Text (STT/ASR) Dataset | ASR | RU | https://github.com/snakers4/open_stt | |
swbd | The Switchboard corpus | ASR | EN | https://catalog.ldc.upenn.edu/LDC97S62 | |
tedlium2 | TED-LIUM corpus release 2 | ASR | EN | https://www.openslr.org/19/, http://www.lrec-conf.org/proceedings/lrec2014/pdf/1104_Paper.pdf | |
tedlium3 | TED-LIUM corpus release 3 | ASR | EN | http://www.openslr.org/51/, https://arxiv.org/pdf/1805.04699 | |
timit | TIMIT Acoustic-Phonetic Continuous Speech Corpus | ASR | EN | https://catalog.ldc.upenn.edu/LDC93S1 | |
timit_ssc | Silent Speech Challenge | ASR | EN | https://catalog.ldc.upenn.edu/LDC93S1/, https://ftp.espci.fr/pub/sigma/ | Features are extracted from ultrasound images and lip-motion video. Training and test set transcripts are from the TIMIT and WSJ corpora, respectively. |
tweb | The World English Bible | TTS | EN | https://www.kaggle.com/bryanpark/the-world-english-bible-speech-dataset | |
vais1000 | VAIS-1000 | TTS | VI | https://ieee-dataport.org/documents/vais-1000-vietnamese-speech-synthesis-corpus | |
vcc20 | Voice Conversion Challenge 2020 | VC | EN->{EN, DE, FI, ZH} | http://www.vc-challenge.org/ | |
vivos | VIVOS (Vietnamese corpus for ASR) | ASR | VI | https://doi.org/10.5281/zenodo.7068130 | |
voxforge | VoxForge | ASR | 7 languages | http://www.voxforge.org/ | |
wsj | CSR-I (WSJ0) Complete, CSR-II (WSJ1) Complete | ASR | EN | https://catalog.ldc.upenn.edu/LDC93S6A, https://catalog.ldc.upenn.edu/LDC94S13A | |
wsj_mix | MERL WSJ0-mix multi-speaker dataset | Multispeaker ASR | EN | http://www.merl.com/demos/deep-clustering | |
yesno | The "yesno" corpus | ASR | HE | http://www.openslr.org/1 | |
yoloxochitl_mixtec | The Yoloxóchitl-Mixtec corpus | ASR | Mixtec | http://www.openslr.org/89 | |
zeroth_korean | Zeroth-Korean | ASR | KR | http://www.openslr.org/40 | |