ctc-decode-asr

A TensorFlow Automatic Speech Recognition (ASR) starter model for learning about end-to-end ASR and CTC decoding.

Most of this work started from the following GitHub repository: https://github.com/apoorvnandan/speech-recognition-primer

Additionally, as I was learning, I wanted to see how the TensorFlow APIs are used for decoding, so I lifted most of the decoding code from the following OCR example: https://keras.io/examples/vision/captcha_ocr/

The code currently requires TensorFlow and Keras.

To run it:

python asr.py

The input transcript (for training) is the following:
MISTER QUILTER IS THE APOSTLE OF THE MIDDLE CLASSES AND WE ARE GLAD TO WELCOME HIS GOSPEL

The result is a model overfit to a single 5-second audio file, sample.wav, from the LibriSpeech corpus. The output will look something like this:
Epoch 100/100
1/1 [==============================] - 1s 516ms/sample - loss: 3.4581

['mister quilter is the apostle of the middle classes and we are glad to welcome his gospel>']
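
A decoded string like the one above is typically produced by collapsing the network's per-frame predictions. The sketch below shows the simplest (greedy) form of that CTC rule in plain Python: take the most likely symbol in each frame, merge consecutive repeats, then drop blanks. It is an illustration only, not the code in this repository; the three-symbol alphabet, the toy probabilities, and the choice of index 0 for the blank are all assumptions made for the example.

```python
from itertools import groupby

BLANK = 0  # assumed blank index for this sketch; real models may place it elsewhere


def greedy_ctc_decode(frame_probs, alphabet, blank=BLANK):
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Most likely symbol index in each time frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_probs]
    # Collapse runs of the same index into a single occurrence.
    collapsed = [idx for idx, _ in groupby(best_path)]
    # Remove blanks and map the remaining indices to characters.
    return "".join(alphabet[idx] for idx in collapsed if idx != blank)


# Hypothetical toy alphabet and per-frame probabilities (rows sum to 1).
alphabet = ["-", "h", "i"]
frame_probs = [
    [0.10, 0.80, 0.10],  # h
    [0.10, 0.70, 0.20],  # h (repeat, merged)
    [0.90, 0.05, 0.05],  # blank
    [0.20, 0.10, 0.70],  # i
    [0.10, 0.10, 0.80],  # i (repeat, merged)
]

decoded = greedy_ctc_decode(frame_probs, alphabet)
print(decoded)  # hi
```

The blank is what lets CTC emit a genuinely doubled letter: the path h, blank, h decodes to "hh", while h, h decodes to "h".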

You can experiment with beam search decoding through the decode_batch_predictions function.
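
To make the idea of beam search over CTC outputs concrete, here is a minimal prefix beam search sketch in plain Python. It is not the repository's decode_batch_predictions and does not call any TensorFlow or Keras API; the function name ctc_beam_search, the beam_width parameter, the blank at index 0, and the toy probabilities are all assumptions for illustration. For each candidate prefix it tracks two probabilities, one for paths ending in a blank and one for paths ending in a non-blank, so that every alignment collapsing to the same text is summed together.

```python
from collections import defaultdict


def ctc_beam_search(frame_probs, alphabet, beam_width=3, blank=0):
    """CTC prefix beam search without a language model.

    Returns (text, probability) pairs, most probable first.
    """
    # prefix (tuple of symbol indices) -> (p_ends_in_blank, p_ends_in_non_blank)
    beams = {(): (1.0, 0.0)}
    for frame in frame_probs:
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for idx, p in enumerate(frame):
                if idx == blank:
                    # A blank keeps the prefix unchanged.
                    next_beams[prefix][0] += (p_b + p_nb) * p
                elif prefix and prefix[-1] == idx:
                    # Same symbol right after a non-blank: merged into the prefix.
                    next_beams[prefix][1] += p_nb * p
                    # Same symbol after a blank: a genuinely new character.
                    next_beams[prefix + (idx,)][1] += p_b * p
                else:
                    # A different symbol always extends the prefix.
                    next_beams[prefix + (idx,)][1] += (p_b + p_nb) * p
        # Keep only the beam_width most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {prefix: tuple(probs) for prefix, probs in ranked[:beam_width]}
    return [
        ("".join(alphabet[i] for i in prefix), p_b + p_nb)
        for prefix, (p_b, p_nb) in beams.items()
    ]


# Hypothetical two-frame example where greedy decoding and beam search disagree.
alphabet = ["-", "a"]
frame_probs = [[0.6, 0.4], [0.6, 0.4]]
results = ctc_beam_search(frame_probs, alphabet, beam_width=3)
best_text, best_prob = results[0]
print(best_text)  # a
```

In this toy input the blank is the most likely symbol in both frames, so greedy decoding returns an empty string with probability 0.36. Beam search instead returns "a", because the three alignments "a-", "-a", and "aa" together carry a probability of about 0.64.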

