aesrc2020/asr1
aidatatang_200zh/asr1
aishell/asr1
aishell2/asr1
ami/asr1
an4
arctic
aurora4/asr1
babel/asr1
blizzard17/tts1
chime4
chime5/asr1
chime6/asr1
cmu_indic/tts1
cmu_wilderness
commonvoice/asr1
covost2
csj
csmsc/tts1
dipco/asr1
dirha_wsj/asr1
fisher_callhome_spanish
fisher_swbd/asr1
hkust/asr1
how2
hub4_spanish/asr1
iwslt16/mt1
iwslt18
iwslt19
iwslt21
iwslt21_low_resource
jesc/mt1
jnas
jsalt18e2e/asr1
jsut
jvs
ksponspeech/asr1
li10/asr1
li42/asr1
libri_css/asr1
libri_trans
librispeech
librispeech_100/asr1
libritts/tts1
ljspeech
lrs
lrs2/asr1
m_ailabs/tts1
mboshi_french/st1
mgb2/asr1
mini_an4
mtedx
mucs21_subtask1/asr1
mucs21_subtask2/asr1
must_c
must_c_v2
polyphone_swiss_french/asr1
puebla_nahuatl
reverb
ru_open_stt/asr1
swbd/asr1
tedlium2
tedlium3/asr1
timit/asr1
timit_ssc/ssr1
tweb
vais1000/tts1
vcc20
vivos
voxforge/asr1
wsj/asr1
wsj_mix/asr1
yesno
yoloxochitl_mixtec/asr1
README.md

egs (Examples)

How to use?

See: https://espnet.github.io/espnet/tutorial.html

Overview of example information

Directory name	Corpus name	Task	Language	URL	Note

aesrc2020	Accented English Speech Recognition Challenge 2020	ASR	EN	https://arxiv.org/abs/2102.10233
aidatatang_200zh	Aidatatang_200zh A free Chinese Mandarin speech corpus	ASR	ZH	http://www.openslr.org/62/
aishell	AISHELL-ASR0009-OS1 Open Source Mandarin Speech Corpus	ASR	ZH	http://www.aishelltech.com/kysjcp
aishell2	AISHELL-2 Open Source Mandarin Speech Corpus	ASR	ZH	http://www.aishelltech.com/aishell_2
ami	The AMI Meeting Corpus	ASR	EN	http://groups.inf.ed.ac.uk/ami/corpus/
an4	CMU AN4 database	ASR/TTS	EN	http://www.speech.cs.cmu.edu/databases/an4/
arctic	CMU ARCTIC databases	TTS, VC	EN, EN -> EN	http://www.festvox.org/cmu_arctic/
aurora4	Aurora-4 database	ASR	EN	http://aurora.hsnr.de/aurora-4.html
babel	IARPA Babel corpus	ASR	~20 Languages	https://www.iarpa.gov/index.php/research-programs/babel
blizzard17	Blizzard Challenge 2017	TTS	EN	https://www.synsig.org/index.php/Blizzard_Challenge_2017
chime4	The 4th CHiME Speech Separation and Recognition Challenge	ASR/Multichannel ASR	EN	http://spandh.dcs.shef.ac.uk/chime_challenge/chime2016/
chime5	The 5th CHiME Speech Separation and Recognition Challenge	ASR	EN	http://spandh.dcs.shef.ac.uk/chime_challenge/CHiME5/index.html
chime6	The 6th CHiME Speech Separation and Recognition Challenge	ASR	EN	https://chimechallenge.github.io/chime6/
cmu_wilderness	CMU Wilderness Multilingual Speech Dataset	Multilingual ASR	~100 Languages	https://github.com/festvox/datasets-CMU_Wilderness
commonvoice	The Mozilla Common Voice	ASR	13 Languages	https://voice.mozilla.org/datasets
covost2	CoVoST: A Large-Scale Multilingual Speech-To-Text Translation Corpus	ASR/Machine Translation/Speech Translation	15+21 Language pairs	https://github.com/facebookresearch/covost
csj	Corpus of Spontaneous Japanese	ASR	JP	https://pj.ninjal.ac.jp/corpus_center/csj/en/
csmsc	Chinese Standard Mandarin Speech Corpus	TTS	ZH	https://www.data-baker.com/open_source.html
dipco	Dinner Party Corpus	ASR	EN	https://arxiv.org/abs/1909.13447
dirha_wsj	Distant-speech Interaction for Robust Home Applications	Multi-Array ASR	EN	https://dirha.fbk.eu/, https://github.com/SHINE-FBK/DIRHA_English_wsj
fisher_callhome_spanish	Fisher and CALLHOME Spanish--English Speech Translation	ASR/Machine Translation/Speech Translation	ES->EN	https://catalog.ldc.upenn.edu/LDC2014T23
fisher_swbd	Fisher English Training Speech, Switchboard-1 Release 2	ASR	EN	https://catalog.ldc.upenn.edu/LDC2004S13, https://catalog.ldc.upenn.edu/LDC2005S13, https://catalog.ldc.upenn.edu/LDC97S62
hkust	HKUST Mandarin Telephone Speech	ASR	ZH	https://catalog.ldc.upenn.edu/LDC2005S15, https://catalog.ldc.upenn.edu/LDC2005T32
how2	How2: A Large-scale Dataset for Multimodal Language Understanding	ASR/Machine Translation/Speech Translation	EN->PT	https://github.com/srvk/how2-dataset
hub4_spanish	1997 Spanish Broadcast News Speech (HUB4-NE)	ASR	ES	https://catalog.ldc.upenn.edu/LDC98S74, https://catalog.ldc.upenn.edu/LDC98T29
iwslt16	International Workshop on Spoken Language Translation 2016	Machine Translation	EN->DE	https://wit3.fbk.eu/mt.php?release=2016-01
iwslt18	International Workshop on Spoken Language Translation 2018	ASR/Machine Translation/Speech Translation	EN->DE	https://sites.google.com/site/iwsltevaluation2018/Lectures-task
iwslt19	International Workshop on Spoken Language Translation 2019	ASR/Speech Translation	EN->DE	https://sites.google.com/view/iwslt-evaluation-2019/speech-translation
iwslt21	International Workshop on Spoken Language Translation 2021	ASR/Machine Translation/Speech Translation	EN->DE	https://iwslt.org/2021/offline
iwslt21_low_resource	International Workshop on Spoken Language Translation 2021	ASR/Speech Translation	SWA->EN & SWC->FR	https://iwslt.org/2021/low-resource
jesc	Japanese-English Subtitle Corpus	Machine Translation	EN->JP	https://nlp.stanford.edu/projects/jesc/
jnas	ASJ Japanese Newspaper Article Sentences Read Speech Corpus (JNAS)	ASR/TTS	JP	http://research.nii.ac.jp/src/JNAS.html
jsalt18e2e	Multilingual End-to-end ASR for Incomplete Data Benchmark	Multilingual ASR	~20 Languages	https://www.clsp.jhu.edu/workshops/18-workshop/multilingual-end-end-asr-incomplete-data/	babel+
jsut	Japanese speech corpus of Saruwatari-lab., University of Tokyo	ASR/TTS	JP	https://sites.google.com/site/shinnosuketakamichi/publication/jsut
jvs	JVS (Japanese versatile speech) corpus	TTS	JP	https://sites.google.com/site/shinnosuketakamichi/research-topics/jvs_corpus
ksponspeech	KsponSpeech (Korean spontaneous speech) corpus	ASR	KR	https://aihub.or.kr/aidata/105
li10	Language-Independent ASR task (10 languages)	Multilingual ASR	~10 Languages	https://www.merl.com/publications/docs/TR2017-182.pdf	csj+hkust+voxforge(7lang)+wsj
li42	Corpora Combination with 42 languages	Multilingual ASR	~42 Languages		aishell+aurora4+babel+chime4+commonvoice+csj+fisher_callhome_spanish+fisher_swbd+hkust+voxforge+wsj
libri_trans	Translation Augmented LibriSpeech Corpus	ASR/Machine Translation/Speech Translation	EN->FR	https://persyval-platform.univ-grenoble-alpes.fr/DS91/detaildataset
librispeech	LibriSpeech ASR corpus	ASR	EN	http://www.openslr.org/12
libritts	LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech	TTS	EN	http://www.openslr.org/60/
ljspeech	The LJ Speech Dataset	TTS	EN	https://keithito.com/LJ-Speech-Dataset/
lrs2	The Lip Reading Sentences 2 Dataset	ASR	EN	https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html
lrs	The Lip Reading Sentences 2 and 3 Dataset	AVSR	EN	https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html, https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs3.html
m_ailabs	The M-AILABS Speech Dataset	TTS	~5 languages	https://www.caito.de/2019/01/the-m-ailabs-speech-dataset/
mucs21_subtask1, mucs21_subtask2	MUCS 2021: MUltilingual and Code-Switching ASR Challenges for Low Resource Indian Languages	ASR/Code Switching	HI, MR, OR, TA, TE, GU, HI-EN, BN-EN	https://navana-tech.github.io/MUCS2021/data.html
mtedx	Multilingual TEDx	ASR/Machine Translation/Speech Translation	13 Language pairs	http://www.openslr.org/100/
must_c	MuST-C Multilingual Speech Translation Corpus	ASR/Machine Translation/Speech Translation	EN->{DE, ES, FR, IT, NL, PT, RO, RU}	https://ict.fbk.eu/must-c/
must_c_v2	MuST-C Multilingual Speech Translation Corpus	ASR/Machine Translation/Speech Translation	EN->DE	https://ict.fbk.eu/must-c/, https://iwslt.org/2021/offline	Additional talks add about 20k audio/text segments. Improved cleaning strategies better discard low-quality triplets. The TED talks of MuST-C v2 were downloaded from the YouTube TED channel.
puebla_nahuatl	The Puebla-Nahuatl Corpus	ASR	Nahuatl	http://www.openslr.org/89
reverb	REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge	ASR	EN	https://reverb2014.dereverberation.com/
ru_open_stt	Russian Open Speech To Text (STT/ASR) Dataset	ASR	RU	https://github.com/snakers4/open_stt
swbd	The Switchboard corpus	ASR	EN	https://catalog.ldc.upenn.edu/LDC97S62
tedlium2	TED-LIUM corpus release 2	ASR	EN	https://www.openslr.org/19/, http://www.lrec-conf.org/proceedings/lrec2014/pdf/1104_Paper.pdf
tedlium3	TED-LIUM corpus release 3	ASR	EN	http://www.openslr.org/51/, https://arxiv.org/pdf/1805.04699
timit	TIMIT Acoustic-Phonetic Continuous Speech Corpus	ASR	EN	https://catalog.ldc.upenn.edu/LDC93S1
timit_ssc	Silent Speech Challenge	ASR	EN	https://catalog.ldc.upenn.edu/LDC93S1/, https://ftp.espci.fr/pub/sigma/	Features are extracted from ultrasound images and lip-motion video. The training-set and test-set transcripts come from the TIMIT corpus and the WSJ corpus, respectively.
tweb	The World English Bible	TTS	EN	https://www.kaggle.com/bryanpark/the-world-english-bible-speech-dataset
vais1000	VAIS-1000	TTS	VI	https://ieee-dataport.org/documents/vais-1000-vietnamese-speech-synthesis-corpus
vcc20	Voice Conversion Challenge 2020	VC	EN->{EN, DE, FI, ZH}	http://www.vc-challenge.org/
vivos	VIVOS (Vietnamese corpus for ASR)	ASR	VI	https://doi.org/10.5281/zenodo.7068130
voxforge	VoxForge	ASR	7 languages	http://www.voxforge.org/
wsj	CSR-I (WSJ0) Complete, CSR-II (WSJ1) Complete	ASR	EN	https://catalog.ldc.upenn.edu/LDC93S6A, https://catalog.ldc.upenn.edu/LDC94S13A
wsj_mix	MERL WSJ0-mix multi-speaker dataset	Multispeaker ASR	EN	http://www.merl.com/demos/deep-clustering
yesno	The "yesno" corpus	ASR	HE	http://www.openslr.org/1
yoloxochitl_mixtec	The Yoloxóchitl-Mixtec corpus	ASR	Mixtec	http://www.openslr.org/89
zeroth_korean	Zeroth-Korean	ASR	KR	http://www.openslr.org/40
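
The overview table can also be filtered programmatically, for example to find every recipe directory that covers a given task. The following is a minimal sketch, assuming the rows are available as tab-separated text in the column order shown above (Directory name, Corpus name, Task, Language, URL, Note). The names `Recipe`, `parse_row`, and `recipes_for_task` are hypothetical helpers for illustration and are not part of ESPnet; the three sample rows are copied from the table.

```python
from typing import List, NamedTuple, Tuple


class Recipe(NamedTuple):
    """One row of the overview table (hypothetical helper type)."""
    directory: str
    corpus: str
    tasks: Tuple[str, ...]
    language: str
    urls: Tuple[str, ...]
    note: str


# Three rows copied from the overview table, with "\t" as the column separator.
SAMPLE_ROWS = (
    "an4\tCMU AN4 database\tASR/TTS\tEN\t"
    "http://www.speech.cs.cmu.edu/databases/an4/\n"
    "ljspeech\tThe LJ Speech Dataset\tTTS\tEN\t"
    "https://keithito.com/LJ-Speech-Dataset/\n"
    "tedlium2\tTED-LIUM corpus release 2\tASR\tEN\t"
    "https://www.openslr.org/19/, "
    "http://www.lrec-conf.org/proceedings/lrec2014/pdf/1104_Paper.pdf\n"
)


def parse_row(line: str) -> Recipe:
    """Split one tab-separated row; missing trailing columns become ''."""
    fields = line.rstrip("\n").split("\t")
    fields += [""] * (6 - len(fields))
    directory, corpus, task, language, url, note = fields[:6]
    # The Task column separates entries with "/" or ",".
    tasks = tuple(t.strip() for t in task.replace(",", "/").split("/") if t.strip())
    # The URL column separates entries with commas and/or spaces.
    urls = tuple(url.replace(",", " ").split())
    return Recipe(directory, corpus, tasks, language, urls, note)


def recipes_for_task(recipes: List[Recipe], task: str) -> List[str]:
    """Return the directory names whose Task column lists `task` exactly."""
    return [r.directory for r in recipes if task in r.tasks]


recipes = [parse_row(line) for line in SAMPLE_ROWS.splitlines() if line.strip()]
print(recipes_for_task(recipes, "TTS"))  # ['an4', 'ljspeech']
```

Matching is exact per entry, so a query for `ASR` does not match a row whose only task is `Multilingual ASR`; a substring match would be the alternative if broader results are wanted.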
