This repository has been archived by the owner on Apr 29, 2021. It is now read-only.

Releases: MyrtleSoftware/deepspeech

v0.2

20 Nov 16:19

Changelog

  • Command line script improvements. Notable changes:
    • Inference can be done without downloading training data.
    • Logs can be written to stderr instead of a file if desired.
    • Able to specify which validation statistics to record (WER, loss).
  • Corrects initialisation of DeepSpeech forget gate bias.
  • DeepSpeech and DeepSpeech2 models now default to GreedyCTCDecoder.
  • Initial random weights are saved to disk.
  • JupyterLab terminal defaults to bash.

Training Setup

As per v0.1.

Training Command

As per v0.1, but Deep Speech is now selected by setting MODEL=ds1.
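
For example, a minimal sketch of the v0.2 training command for Deep Speech, assuming the remaining flags are unchanged from v0.1:

MODEL=ds1
deepspeech $MODEL --decoder greedy --n_epochs 15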

WER Command

As per v0.1, but training is now skipped by passing --train_subsets with no subsets specified or by setting --n_epochs 0 (this avoids downloading the training subsets).
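
For example, a minimal sketch of an inference-only run that downloads no training data, assuming the remaining flags are unchanged from v0.1. Here MODEL is ds1 (or, assuming the identifier is unchanged, ds2 for Deep Speech 2) and MODEL_PATH is set as in v0.1:

deepspeech $MODEL \
           --state_dict_path $MODEL_PATH \
           --no_resume_from_exp_dir \
           --decoder greedy \
           --dev_subsets dev-clean \
           --dev_batch_size 16 \
           --train_subsets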

Results

As per v0.1. The pretrained weights provided with v0.1 are compatible with this release.

v0.1

07 Nov 14:50
7ace38a

v0.1 Pretrained Weights

Training Setup

Training was done on Google Cloud using one instance per replicated run. Each instance had 8 vCPUs and an nvidia-tesla-v100 accelerator attached, with the data stored on disk.

Training Command

Both models were trained using the same command. Set MODEL=ds to train Deep Speech and MODEL=ds2 to train Deep Speech 2:

deepspeech $MODEL --decoder greedy --n_epochs 15

WER Command

Both models were evaluated using the same command. Note that the script is not currently designed to do inference only: we bypass training by setting n_epochs=0 and avoid downloading a lot of data by setting train_subsets=train-clean-100. A more robust method that avoids downloading any training data will be provided in a later release. Set MODEL=ds to evaluate Deep Speech and MODEL=ds2 to evaluate Deep Speech 2. Set MODEL_PATH to the path of one of the .pt files output during training or to one of those provided below.

deepspeech $MODEL \
           --state_dict_path $MODEL_PATH \
           --no_resume_from_exp_dir \
           --decoder greedy \
           --dev_subsets dev-clean \
           --dev_batch_size 16 \
           --train_subsets train-clean-100 \
           --n_epochs 0 

Results

For each replicated run (each replica starting from a different set of randomly drawn initial weights), the model's state_dict with the lowest dev-clean and dev-other loss during training is provided below. Epochs Finished refers to the number of completed training epochs for the corresponding state_dict.

Deep Speech

Replica  Epochs Finished  dev-clean WER  Mean time/epoch  state_dict
1        9                15.98          4h 47m 55s       ds1_replica-1_8.pt
2        12               15.95          4h 55m 25s       ds1_replica-2_11.pt
3        13               15.85          4h 49m 29s       ds1_replica-3_12.pt
4        15               15.80          4h 53m 56s       ds1_replica-4_14.pt
5        14               15.81          4h 51m 12s       ds1_replica-5_13.pt

Deep Speech 2

Replica  Epochs Finished  dev-clean WER  Mean time/epoch  state_dict
1        6                15.19          4h 21m 28s       ds2_replica-1_5.pt
2        6                15.73          4h 25m 28s       ds2_replica-2_5.pt
3        9                14.83          4h 45m 31s       ds2_replica-3_8.pt
4        7                14.68          4h 39m 11s       ds2_replica-4_6.pt
5        7                14.96          4h 34m 55s       ds2_replica-5_6.pt
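
For example, a sketch of scoring one of the Deep Speech 2 checkpoints above with the v0.1 WER command, assuming the file has been downloaded to the working directory (any of the other state_dicts can be substituted):

MODEL=ds2
MODEL_PATH=ds2_replica-4_6.pt
deepspeech $MODEL \
           --state_dict_path $MODEL_PATH \
           --no_resume_from_exp_dir \
           --decoder greedy \
           --dev_subsets dev-clean \
           --dev_batch_size 16 \
           --train_subsets train-clean-100 \
           --n_epochs 0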