# Releases: MyrtleSoftware/deepspeech
## v0.2
### Changelog

- Command line script improvements. Notable changes:
    - Inference can be done without downloading training data.
    - Logs can be written to stderr instead of a file if desired.
    - Able to specify which validation statistics to record (WER, loss).
- Corrects initialisation of `DeepSpeech` forget gate bias.
- `DeepSpeech` and `DeepSpeech2` models now default to `GreedyCTCDecoder`.
- Initial random weights are saved to disk.
- JupyterLab terminal defaults to bash.
### Training Setup
As per v0.1.
### Training Command

As per v0.1, but Deep Speech is now selected by setting `MODEL=ds1`.
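For example, combining the new model name with the v0.1 training command gives the following sketch (`--decoder greedy` and `--n_epochs 15` are carried over from the v0.1 section below):

```sh
# Train Deep Speech under v0.2; Deep Speech 2 presumably remains MODEL=ds2.
MODEL=ds1
deepspeech $MODEL --decoder greedy --n_epochs 15
```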
### WER Command

As per v0.1, but training is now skipped by passing `--train_subsets` with no subsets specified or by setting `--n_epochs 0` (this avoids downloading the training subsets).
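For example, the v0.1 evaluation command below becomes the following sketch under v0.2, with `--train_subsets` passed empty so that no training data is downloaded:

```sh
# Evaluate a trained checkpoint without downloading any training subsets.
MODEL=ds1
deepspeech $MODEL \
    --state_dict_path $MODEL_PATH \
    --no_resume_from_exp_dir \
    --decoder greedy \
    --dev_subsets dev-clean \
    --dev_batch_size 16 \
    --train_subsets \
    --n_epochs 0
```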
### Results
As per v0.1. The pretrained weights are compatible with this release.
## v0.1 Pretrained Weights
### Training Setup

Training was done on Google Cloud using one instance per replicated run. Each instance had 8 vCPUs, the data stored on disk, and an `nvidia-tesla-v100` accelerator attached.
### Training Command

Both models were trained using the same command. Set `MODEL=ds` to train Deep Speech and `MODEL=ds2` to train Deep Speech 2:

```sh
deepspeech $MODEL --decoder greedy --n_epochs 15
```
### WER Command

Both models were evaluated using the same command. Note that the script is not currently designed to do inference only: training is bypassed by setting `--n_epochs 0`, and downloading a lot of data is avoided by setting `--train_subsets train-clean-100`. A more robust method that avoids downloading any training data will be provided in a later release. Set `MODEL=ds` to evaluate Deep Speech and `MODEL=ds2` to evaluate Deep Speech 2. Set `MODEL_PATH` to the path of one of the `.pt` files output during training or to one of those provided below.
```sh
deepspeech $MODEL \
    --state_dict_path $MODEL_PATH \
    --no_resume_from_exp_dir \
    --decoder greedy \
    --dev_subsets dev-clean \
    --dev_batch_size 16 \
    --train_subsets train-clean-100 \
    --n_epochs 0
```
### Results

The model's `state_dict` with the lowest `dev-clean` and `dev-other` loss during training is provided below for each replicated run (each replica starts from a different set of randomly drawn initial weights). "Epochs Finished" refers to the number of completed training epochs for the corresponding `state_dict`.
#### Deep Speech

| Replica | Epochs Finished | dev-clean WER (%) | Mean time/epoch | state_dict |
|---|---|---|---|---|
| 1 | 9 | 15.98 | 4h 47m 55s | `ds1_replica-1_8.pt` |
| 2 | 12 | 15.95 | 4h 55m 25s | `ds1_replica-2_11.pt` |
| 3 | 13 | 15.85 | 4h 49m 29s | `ds1_replica-3_12.pt` |
| 4 | 15 | 15.80 | 4h 53m 56s | `ds1_replica-4_14.pt` |
| 5 | 14 | 15.81 | 4h 51m 12s | `ds1_replica-5_13.pt` |
#### Deep Speech 2

| Replica | Epochs Finished | dev-clean WER (%) | Mean time/epoch | state_dict |
|---|---|---|---|---|
| 1 | 6 | 15.19 | 4h 21m 28s | `ds2_replica-1_5.pt` |
| 2 | 6 | 15.73 | 4h 25m 28s | `ds2_replica-2_5.pt` |
| 3 | 9 | 14.83 | 4h 45m 31s | `ds2_replica-3_8.pt` |
| 4 | 7 | 14.68 | 4h 39m 11s | `ds2_replica-4_6.pt` |
| 5 | 7 | 14.96 | 4h 34m 55s | `ds2_replica-5_6.pt` |
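As a concrete example, the WER command above can be pointed at one of the checkpoints from these tables; this sketch assumes the `.pt` file has already been downloaded to the working directory:

```sh
# Evaluate the Deep Speech 2 replica with the lowest dev-clean WER above.
MODEL=ds2
MODEL_PATH=ds2_replica-4_6.pt
deepspeech $MODEL \
    --state_dict_path $MODEL_PATH \
    --no_resume_from_exp_dir \
    --decoder greedy \
    --dev_subsets dev-clean \
    --dev_batch_size 16 \
    --train_subsets train-clean-100 \
    --n_epochs 0
```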