This repository has been archived by the owner on Apr 29, 2021. It is now read-only.

Releases: MyrtleSoftware/deepspeech

v0.2

20 Nov 16:19

Changelog

  • Command line script improvements. Notable changes:
    • Inference can be done without downloading training data.
    • Logs can be written to stderr instead of a file if desired.
    • Able to specify which validation statistics to record (WER, loss).
  • Corrects initialisation of DeepSpeech forget gate bias.
  • DeepSpeech and DeepSpeech2 models now default to GreedyCTCDecoder.
  • Initial random weights are saved to disk.
  • JupyterLab terminal defaults to bash.

Training Setup

As per v0.1.

Training Command

As per v0.1, but Deep Speech is now selected by setting MODEL=ds1.
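
For example, a minimal sketch of the v0.2 training command for Deep Speech, assuming the remaining flags are unchanged from v0.1:

MODEL=ds1
deepspeech $MODEL --decoder greedy --n_epochs 15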

WER Command

As per v0.1, but training is now skipped by passing --train_subsets with no subsets specified or by setting --n_epochs 0 (this avoids downloading the training subsets).
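
For example, a minimal sketch of an inference-only run that downloads no training data, assuming the remaining flags are unchanged from v0.1. Here MODEL is ds1 (or, assuming the identifier is unchanged, ds2 for Deep Speech 2) and MODEL_PATH is set as in v0.1:

deepspeech $MODEL \
           --state_dict_path $MODEL_PATH \
           --no_resume_from_exp_dir \
           --decoder greedy \
           --dev_subsets dev-clean \
           --dev_batch_size 16 \
           --train_subsets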

Results

As per v0.1. The pretrained weights provided with v0.1 are compatible with this release.

v0.1

07 Nov 14:50
7ace38a

v0.1 Pretrained Weights

Training Setup

Training was done on Google Cloud using one instance per replicated run. Each instance had 8 vCPUs and an nvidia-tesla-v100 accelerator attached, with the data stored on disk.

Training Command

Both models were trained using the same command. Set MODEL=ds to train Deep Speech and MODEL=ds2 to train Deep Speech 2:

deepspeech $MODEL --decoder greedy --n_epochs 15

WER Command

Both models were evaluated using the same command. Note that the script is not currently designed to do inference only: we bypass training by setting n_epochs=0 and avoid downloading a lot of data by setting train_subsets=train-clean-100. A more robust method that avoids downloading any training data will be provided in a later release. Set MODEL=ds to evaluate Deep Speech and MODEL=ds2 to evaluate Deep Speech 2. Set MODEL_PATH to the path of one of the .pt files output during training or to one of those provided below.

deepspeech $MODEL \
           --state_dict_path $MODEL_PATH \
           --no_resume_from_exp_dir \
           --decoder greedy \
           --dev_subsets dev-clean \
           --dev_batch_size 16 \
           --train_subsets train-clean-100 \
           --n_epochs 0 

Results

For each replicated run (each replica starting from a different set of randomly drawn initial weights), the model's state_dict with the lowest dev-clean and dev-other loss during training is provided below. Epochs Finished refers to the number of completed training epochs for the corresponding state_dict.

Deep Speech

Replica  Epochs Finished  dev-clean WER  Mean time/epoch  state_dict
1        9                15.98          4h 47m 55s       ds1_replica-1_8.pt
2        12               15.95          4h 55m 25s       ds1_replica-2_11.pt
3        13               15.85          4h 49m 29s       ds1_replica-3_12.pt
4        15               15.80          4h 53m 56s       ds1_replica-4_14.pt
5        14               15.81          4h 51m 12s       ds1_replica-5_13.pt

Deep Speech 2

Replica  Epochs Finished  dev-clean WER  Mean time/epoch  state_dict
1        6                15.19          4h 21m 28s       ds2_replica-1_5.pt
2        6                15.73          4h 25m 28s       ds2_replica-2_5.pt
3        9                14.83          4h 45m 31s       ds2_replica-3_8.pt
4        7                14.68          4h 39m 11s       ds2_replica-4_6.pt
5        7                14.96          4h 34m 55s       ds2_replica-5_6.pt
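
For example, a sketch of scoring one of the Deep Speech 2 checkpoints above with the v0.1 WER command, assuming the file has been downloaded to the working directory (any of the other state_dicts can be substituted):

MODEL=ds2
MODEL_PATH=ds2_replica-4_6.pt
deepspeech $MODEL \
           --state_dict_path $MODEL_PATH \
           --no_resume_from_exp_dir \
           --decoder greedy \
           --dev_subsets dev-clean \
           --dev_batch_size 16 \
           --train_subsets train-clean-100 \
           --n_epochs 0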