Streaming ww dev (#162)
* moved training_torch to experimental and added a README

* starting to move code into here

* some updates to streaming wakeword

* updated streaming wakeword model to be the actual candidate DS-TCN model w/ no residual layers

* set default features to 40-D LFBEs

* changed num_classes to 3 in train.py

* demo notebook (in progress) added

* demo notebook runs through small training run

* added count_labels and is_batched()

* demo now adds some silence waveforms (which then have noise added) to training dataset

* updated get_dataset (mostly copied from demo.ipynb) and removed use_sam part from train.py

* fixed default model architecture flag

* fixed some issues with building model

* added from_logits argument to model.compile

* cleanup changes to get_dataset and demo notebook

* catching up on edits

* keras_model does not need tf datasets module

* cleaning up demo notebook

* cleaning up demo notebook

* made path to speech commands dataset easier to config per location/user without upsetting git

* beginning of code to test long waveform in python

* some updates

* added option to read in model config file

* set validation set to incorporate background noise. Also fixed an issue where the background_volume option was being ignored

* moved the code that adds silent (or white-noise) frames to the dataset into its own function. Applies to the val set now, but still not the test set

* fixed argument error

* added post-training quantization
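
For context, a generic post-training int8 quantization sketch using the TFLite converter; the tiny model and random calibration data below are placeholders, not the benchmark's actual `quantize.py`:

```
import numpy as np
import tensorflow as tf

# tiny stand-in model; the real one is the trained streaming-wakeword network
model = tf.keras.Sequential([
    tf.keras.Input(shape=(49, 1, 40)),
    tf.keras.layers.Conv2D(8, (3, 1), activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# hypothetical calibration data; the repo later stores real samples in calibration_samples.npz
calib = np.random.randn(16, 49, 1, 40).astype(np.float32)

def representative_data():
    for sample in calib:
        yield [sample[np.newaxis, ...]]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()
```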

* several changes in order to use QAT and evaluate on long waveforms:
Replaced flatten with reshape in order to preserve time duration.
Empty noise frames and duplicates of the wakeword are now added to the
validation set (instead of just the training set); the number of duplicates
and the noise level are now command line flags and are separate for the
training and validation sets.
Input shape is now (batch, time, 1, features), i.e. (None, 49, 1, 40), instead
of just (None, 49, 40), to avoid the extra expand_dims layer, which caused
problems for QAT.
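
For illustration only, a minimal Keras sketch of the (time, 1, features) input shape and the Reshape-instead-of-Flatten idea; this is not the actual DS-TCN candidate, and the layer sizes are placeholders:

```
import tensorflow as tf

NUM_FRAMES, NUM_FEATURES, NUM_CLASSES = 49, 40, 3

# Feeding (time, 1, features) directly removes the need for an extra
# expand_dims layer, which caused problems for QAT.
inputs = tf.keras.Input(shape=(NUM_FRAMES, 1, NUM_FEATURES))
x = tf.keras.layers.Conv2D(64, kernel_size=(3, 1), strides=(2, 1),
                           padding="same", activation="relu")(inputs)
x = tf.keras.layers.Conv2D(128, kernel_size=(3, 1),
                           padding="same", activation="relu")(x)
# Reshape (rather than Flatten) keeps the reduced time axis explicit,
# so the model still produces per-frame scores on longer waveforms.
x = tf.keras.layers.Reshape((-1, x.shape[2] * x.shape[3]))(x)
outputs = tf.keras.layers.Dense(NUM_CLASSES)(x)
model = tf.keras.Model(inputs, outputs)
model.summary()
```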

* notebook updated to work with last commits on get_data, keras_model

* changed default LR schedule to reduce_on_plateau so it scales better with more epochs
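
A minimal sketch of what such a schedule could look like in Keras; the parameter values here are illustrative, not the flags' actual defaults:

```
import tensorflow as tf

# Drop the learning rate only when validation loss stalls, so the schedule
# adapts to the number of epochs instead of being fixed in advance.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.5, patience=3, min_lr=1e-5)

# model.fit(ds_train, validation_data=ds_val, epochs=50, callbacks=[reduce_lr])
```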

* some more edits to get QAT working

* changed labels to one-hot to work with precision/recall metrics. Also changed feature extractor code in dataset preparation to optionally work in a standalone model.

* added notebook to develop tflite model for feature extraction

* removed some prints from get_dataset. Added an evaluation step to train.py

* adjusted reduce lr on plateau settings

* fixed plotting error

* working on different options to run the feature extractor on MCU

* small changes to notebook

* removed old commented-out code that loaded pre-built dataset

* tflite_feature_extractor.ipynb very much a work in progress

* added setup instructions and a  to the streaming wakeword benchmark (#155)

* cache datasets after spectrogram computation to avoid recomputing them at every epoch
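
A hedged sketch of the caching pattern; `to_spectrogram` is a stand-in feature extractor, not the benchmark's 40-D LFBE pipeline:

```
import tensorflow as tf

def to_spectrogram(waveform):
    # stand-in feature extractor; the benchmark pipeline computes 40-D LFBEs
    stft = tf.signal.stft(waveform, frame_length=640, frame_step=320)
    return tf.abs(stft)

waveforms = tf.random.normal([8, 16000])
ds = (tf.data.Dataset.from_tensor_slices(waveforms)
      .map(to_spectrogram, num_parallel_calls=tf.data.AUTOTUNE)
      .cache()       # features computed once, reused every epoch
      .shuffle(8)    # shuffling stays after cache so the order still varies
      .batch(4)
      .prefetch(tf.data.AUTOTUNE))
```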

* fixed data_dir default to point to speech_commands_v0.02

* fixed data_dir default to point to speech_commands_v0.02

* added BooleanOptionalAction to correctly parse boolean Flags

* fixed parsing of bool args (use_qat, run_test_set) to work with python 3.8
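
As a sketch of the issue: `argparse.BooleanOptionalAction` only exists on Python 3.9+, so 3.8 needs another way to accept explicit true/false values. The flag name follows the ones mentioned above; the fallback converter is an assumption, not necessarily the fix used here:

```
import argparse
import sys

def str2bool(value):
    # fallback for Python 3.8, which lacks argparse.BooleanOptionalAction
    if value.lower() in ("true", "t", "1", "yes"):
        return True
    if value.lower() in ("false", "f", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")

parser = argparse.ArgumentParser()
if sys.version_info >= (3, 9):
    # adds paired --use_qat / --no-use_qat options
    parser.add_argument("--use_qat", action=argparse.BooleanOptionalAction, default=True)
else:
    parser.add_argument("--use_qat", type=str2bool, default=True)

print(parser.parse_args([]).use_qat)   # True (the default)
```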

* changed so parse_command raises exception on unrecognized flags

* changed so parse_command raises exception on unrecognized flags

* added foreground scaling args foreground_volume_min, _max to train on quieter wakewords. Also changed defaults for data/background paths (to HOME/speech_commands_v0.02) to align with the default filename

* set is_training true for ds_val so it gets noise added

* edits to str ww model

* edits to data set building

* saved training history along with plot

* removed average pooling, increased initial feature stride

* Fixed a bug where np.random was only evaluated at graph creation, so all foregrounds were scaled by the same amount. Also added a condition so empty frames are not added to the calibration set.

* fixed several places where np.random was used in a tf graph, resulting in the same value being used for the whole dataset
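
The pitfall the two fixes above address, sketched with a hypothetical gain augmentation:

```
import numpy as np
import tensorflow as tf

def scale_with_numpy(waveform):
    # BUG: np.random runs once, when tf.data traces this function,
    # so every element in the dataset gets the same "random" gain.
    gain = np.random.uniform(0.1, 1.0)
    return waveform * gain

def scale_with_tf(waveform):
    # FIX: a TF op draws a fresh value each time the graph executes.
    gain = tf.random.uniform([], 0.1, 1.0)
    return waveform * gain

ds = tf.data.Dataset.from_tensor_slices(tf.ones([4, 16000]))
bad = ds.map(scale_with_numpy)   # all four clips share one gain
good = ds.map(scale_with_tf)     # each clip gets its own gain
```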

* widened filters in 2nd,3rd layers to 128

* changed back from 32 LFBEs to 40

* minor cleanup -- whitespace, removing old commented out lines, etc.

* fixed error - val set was using target words from training set

* minor cleanup -- whitespace, removing old commented out lines, etc.

* changed ordering in data prep, now shuffle before batching

* adding current version of trained and quantized streaming ww model

* minor edits/cleanup

* changed Flags.num_train_samples to num_samples_training. same for test, validation. refactoring get_dataset code

* added 1st pass at get_data_config(), refactoring dataset build

* refactored dataset building. train.py runs now, have not tested performance

* setup_example is work in progress, just capturing progress

* train.py runs but gives random-level validation accuracy.  demo notebook fails

* flag parsing used 'train' instead of 'training' and therefore was not shuffling the training set

* updated demo to match changes in data

* minor updates

* dumps options as json into plot_dir

* fixed demo to work with new get_data code. Moved take after shuffle so subsets are correctly mixed, but this makes even runs on a small subset of the training data slow, because shuffle runs over everything

* moved softmax to inside the model; adjusted loss function accordingly

* moved softmax calculation into the model
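
A minimal sketch of the change; the tiny Dense model is a placeholder, not the wakeword network, and the 0.95 threshold matches the value mentioned later for evaluate.py:

```
import tensorflow as tf

inputs = tf.keras.Input(shape=(40,))
logits = tf.keras.layers.Dense(3)(inputs)     # raw class scores
probs = tf.keras.layers.Softmax()(logits)     # softmax now lives inside the model
model = tf.keras.Model(inputs, probs)

model.compile(
    optimizer="adam",
    # from_logits=False because the model already outputs probabilities
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False),
    # probability outputs also let threshold-based metrics apply directly
    metrics=[tf.keras.metrics.Precision(thresholds=0.95),
             tf.keras.metrics.Recall(thresholds=0.95)],
)
```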

* working on true pos/false pos computation

* fixed error, post-wakeword extension was being added twice

* fixing notebook counting of true/false positives

* removed commented-out code; added zero2nan()

* added multiple background noise paths, can split long bg files into smaller chunks, added clip_len_samples arg to prepare_bg_data

* changed QAT initial LR to Flags.lr; the previous LR was too small after float pre-training

* fixed cmd line arg processing to accommodate multiple bg noise paths

* removed commented out code from demo notebook

* convert only-target dataset to numpy array and back so cardinality() works

* refactored num_silent, num_repeats into fraction_silent and fraction_target to make varied-length experiments work better

* fixed cmd line arg processing to accommodate multiple bg noise paths

* fixing code for smaller datasets

* catching up on demo edits

* added code to run quantized model on long waveform

* working on long wav file creation; added a Poisson process to place wake words
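
A sketch of the placement idea: a Poisson process means exponentially distributed gaps between wakeword onsets. Names and rates here are illustrative, not the script's actual parameters:

```
import numpy as np

SAMPLE_RATE = 16000
rng = np.random.default_rng(0)

def draw_wakeword_starts(total_len_s, mean_gap_s):
    # exponential inter-arrival times => Poisson-distributed event count
    starts, t = [], 0.0
    while True:
        t += rng.exponential(mean_gap_s)
        if t >= total_len_s:
            break
        starts.append(int(t * SAMPLE_RATE))
    return starts

starts = draw_wakeword_starts(total_len_s=1200.0, mean_gap_s=30.0)
print(len(starts), "wakewords placed")
```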

* updated long wav creation, need to move it to a separate file soon. Moved get_true_and_false_detections() to util

* increased number of background files from 50 to 100

* added code to illustrate false detects/rejects

* updating background noise creation to avoid train/val duplicates

* added exclude_background_files.txt

* put code to build the long test wav into its own (two) files

* added eval_long_wav.py to test fpr, fnr on a long wav

* made build_long_wav work with the musan_path from streaming_config.json

* made build_long_wav work with the musan_path from streaming_config.json

* fixed a typo

* fixed issue with path construction in long wav spec

* added l2 reg to conv layers

* added L2 reg to conv layers

* removed some old commented-out code

* eval_long_wav can now test either h5 models or tflite models

* added script to create indices into the val set for calibration

* code to create calibration set is working

* fixed quantize.py to work with extracted npz calibration set

* adjusted volume of foreground and background for testing

* added code to save spectrogram in build_long_wav.py

* demo notebook should now work with current code

* separated augmentation (built by get_augment_wavs_func()) and feature extractor (from get_lfbe_func())

* made l2 reg parameter a command line flag

* fixed eval_long_wav to work with feature extractor changes

* added validation set measurements to eval_long_wav.py

* moved eval_long_wav to evaluate.py

* added threshold=0.95 to precision/recall metrics to match evaluate.py

* added a list of 'bad' marvin wav files. modified build_long_wav_spec and get_dataset to exclude them

* edited comment on saved_model_path to reflect evaluate.py

* added bad_marvin_files.txt

* fixed error in number of unknown samples for reduced runs

* renamed build_long_wav_spec.py -> build_long_wav_def.py to avoid ambiguity of spec (specification vs spectrogram)

* renamed features back to audio to allow easy skipping of feature extraction

* minor edits

* removed debug print statement

* fixed code for tflite models

* adjusted some default training params

* catching notebook up to other code

* clearing out some debug prints

* added trained model

* moved label_count out of model_settings into a flag

* minor edits

* added random timeshift
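
A sketch of one common way to implement a random time shift on a fixed-length waveform; the actual implementation and the range flags may differ:

```
import tensorflow as tf

def random_time_shift(waveform, max_shift=1600):
    # shift the clip by up to +/- max_shift samples, zero-padding the edges
    shift = tf.random.uniform([], -max_shift, max_shift + 1, dtype=tf.int32)
    padded = tf.pad(waveform, [[max_shift, max_shift]])
    start = max_shift - shift
    return padded[start:start + tf.shape(waveform)[0]]

clip = tf.random.normal([16000])
shifted = random_time_shift(clip)
```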

* added a couple more bad marvins to exclude

* added flag to enforce a minimum SNR

* centralized data paths in streaming_config.json (no command line arguments needed or observed)

* removed some obsolete cmd line args and modified get_dataset to respect time_shift_ms

* fixed evaluate.py to work with changes to speech_commands_path

* changed evaluate and quantize to use model_init_path, so by default the reference model will still be used

* adjusted training params

* adjusted training params

* updated long wav info

* updated README

* updated reference model

* removed some info messages

* add line to create plot_dir if it does not exist

* reduced noise level in long wav

* refactored command line argument parsing

* refactored command line argument parsing

* fixed some errors in README

* fixed quantize to use saved_model_path instead of model_init_path

* added calibration_samples.npz

* fixing argument processing for evaluate.py to work with either tflite or h5 model

* fixed typo in evaluate.py

* fixed typo in evaluate

* updated tflite model

* fixed issue with plot_dir

* ignoring trained models other than reference model

* updated readme

* updated demo notebook

* added note about the demo notebook to readme

---------

Co-authored-by: Alexander Montgomerie-Corcoran <[email protected]>
Co-authored-by: Peter Chang <[email protected]>
3 people authored Sep 9, 2024
1 parent bbe26e8 commit 0c6a990
Showing 32 changed files with 10,260 additions and 2 deletions.
7 changes: 7 additions & 0 deletions .gitignore
@@ -1,4 +1,11 @@
*.idea*
*__pycache__*
*~
tmp*
tmp/
.tmp*
**/.DS_Store
**/.ipynb_checkpoints/
venv
.python-version

1 change: 1 addition & 0 deletions benchmark/api/internally_implemented.h
@@ -31,6 +31,7 @@ The file name has been changed and some functions removed.
#define EE_MODEL_VERSION_VWW01 "vww01"
#define EE_MODEL_VERSION_AD01 "ad01"
#define EE_MODEL_VERSION_IC01 "ic01"
#define EE_MODEL_VERSION_STRWW01 "strww01"

typedef enum { EE_ARG_CLAIMED, EE_ARG_UNCLAIMED } arg_claimed_t;
typedef enum { EE_STATUS_OK = 0, EE_STATUS_ERROR } ee_status_t;
@@ -29,8 +29,8 @@ else
wget https://github.com/tensorflow/tensorflow/archive/$TF_VERSION.zip
unzip -o $TF_VERSION.zip
pushd tensorflow-* # we can't use TF_VERSION here, as github seems not to be consistent with naming (v2.3.1 vs 2.3.1)
gmake -f tensorflow/lite/micro/tools/make/Makefile TAGS=$TF_MAKE_TAGS third_party_downloads
gmake -f tensorflow/lite/micro/tools/make/Makefile TAGS=$TF_MAKE_TAGS generate_hello_world_mbed_project -j18
LC_ALL=C gmake -f tensorflow/lite/micro/tools/make/Makefile TAGS=$TF_MAKE_TAGS third_party_downloads
LC_ALL=C gmake -f tensorflow/lite/micro/tools/make/Makefile TAGS=$TF_MAKE_TAGS generate_hello_world_mbed_project -j18
mv -n tensorflow/lite/micro/tools/make/gen/*/prj/hello_world/mbed/* ../
popd
rm -rf tensorflow-*
@@ -0,0 +1,47 @@
# define TensorFlow version as git branch/tag/hash
TF_VERSION=v2.3.1

# enable CMSIS-NN
TF_MAKE_TAGS="cmsis-nn"

if [ "$1" == "clean" ]; then
rm api/internally*
rm -rf main.cpp
rm -rf util
rm -rf tensorflow-master.zip
rm -rf tensorflow
rm -rf mbed-os
rm -rf mbed_settings*
rm -rf master*
rm -rf tensorflow-master
rm -f CMakeLists.txt
rm -rf BUILD
rm -rf third_party
rm -f LICENSE
rm -f README_MBED.md
rm -rf __pycache__

else

#cd $(dirname $0)
TFMICRO_DIR=tensorflow
if [ ! -f "$TFMICRO_DIR" ]; then
wget https://github.com/tensorflow/tensorflow/archive/$TF_VERSION.zip
unzip -o $TF_VERSION.zip
pushd tensorflow-* # we can't use TF_VERSION here, as github seems not to be consistent with naming (v2.3.1 vs 2.3.1)
LC_ALL=C gmake -f tensorflow/lite/micro/tools/make/Makefile TAGS=$TF_MAKE_TAGS third_party_downloads
LC_ALL=C gmake -f tensorflow/lite/micro/tools/make/Makefile TAGS=$TF_MAKE_TAGS generate_hello_world_mbed_project -j18
mv -n tensorflow/lite/micro/tools/make/gen/*/prj/hello_world/mbed/* ../
popd
rm -rf tensorflow-*
rm -rf tensorflow/lite/micro/examples/hello_world
fi

mbed config root .
mbed deploy
cp ../../api/internally* api/
cp ../../main.cpp .
cp -r ../../util .

fi

5,338 changes: 5,338 additions & 0 deletions benchmark/reference_submissions/streaming_wakeword/str_ww/strww_model_data.cc

Large diffs are not rendered by default.

@@ -0,0 +1,184 @@
/*
Copyright 2020 EEMBC and The MLPerf Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
This file reflects a modified version of th_lib from EEMBC. The reporting logic
in th_results is copied from the original in EEMBC.
==============================================================================*/
/// \file
/// \brief C++ implementations of submitter_implemented.h

#include "api/submitter_implemented.h"

#include <cstdarg>
#include <cstdio>
#include <cstdlib>
#include <cstring>

#include "api/internally_implemented.h"
#include "mbed.h"
#include "tensorflow/lite/micro/kernels/micro_ops.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "util/quantization_helpers.h"
#include "util/tf_micro_model_runner.h"
#include "str_ww/strww_input_data.h"
#include "str_ww/strww_model_data.h"
#include "str_ww/strww_model_settings.h"

UnbufferedSerial pc(USBTX, USBRX);
DigitalOut timestampPin(D7);

constexpr int kTensorArenaSize = 200 * 1024;
alignas(16) uint8_t tensor_arena[kTensorArenaSize];

tflite::MicroModelRunner<int8_t, int8_t, 6> *runner;

// Implement this method to prepare for inference and preprocess inputs.
void th_load_tensor() {
int8_t input[kStrwwInputSize];

size_t bytes = ee_get_buffer(reinterpret_cast<uint8_t *>(input),
kStrwwInputSize * sizeof(int8_t));
if (bytes / sizeof(int8_t) != kStrwwInputSize) {
th_printf("Input db has %d elemented, expected %d\n", bytes / sizeof(int8_t),
kStrwwInputSize);
return;
}
runner->SetInput(input);
}


// Add to this method to return real inference results.
void th_results() {
/**
* The results need to be printed back in exactly this format; if easier
* to just modify this loop than copy to results[] above, do that.
*/
th_printf("m-results-[");
int kCategoryCount = 12;

for (size_t i = 0; i < kCategoryCount; i++) {
float converted =
DequantizeInt8ToFloat(runner->GetOutput()[i], runner->output_scale(),
runner->output_zero_point());

// Some platforms don't implement floating point formatting.
th_printf("0.%d", static_cast<int>(converted * 10));
th_printf("%d", static_cast<int>(converted * 100) % 10);
th_printf("%d", static_cast<int>(converted * 1000) % 10);

if (i < (kCategoryCount - 1)) {
th_printf(",");
}
}
th_printf("]\r\n");
}

// Implement this method with the logic to perform one inference cycle.
void th_infer() { runner->Invoke(); }

/// \brief optional API.
void th_final_initialize(void) {
static tflite::MicroMutableOpResolver<6> resolver;
resolver.AddFullyConnected();
resolver.AddConv2D();
resolver.AddDepthwiseConv2D();
resolver.AddReshape();
resolver.AddSoftmax();
resolver.AddAveragePool2D();

static tflite::MicroModelRunner<int8_t, int8_t, 6> model_runner(
g_strww_model_data, resolver, tensor_arena, kTensorArenaSize);
runner = &model_runner;
}

void th_pre() {}
void th_post() {}

void th_command_ready(char volatile *p_command) {
p_command = p_command;
ee_serial_command_parser_callback((char *)p_command);
}

// th_libc implementations.
int th_strncmp(const char *str1, const char *str2, size_t n) {
return strncmp(str1, str2, n);
}

char *th_strncpy(char *dest, const char *src, size_t n) {
return strncpy(dest, src, n);
}

size_t th_strnlen(const char *str, size_t maxlen) {
return strnlen(str, maxlen);
}

char *th_strcat(char *dest, const char *src) { return strcat(dest, src); }

char *th_strtok(char *str1, const char *sep) { return strtok(str1, sep); }

int th_atoi(const char *str) { return atoi(str); }

void *th_memset(void *b, int c, size_t len) { return memset(b, c, len); }

void *th_memcpy(void *dst, const void *src, size_t n) {
return memcpy(dst, src, n);
}

/* N.B.: Many embedded *printf SDKs do not support all format specifiers. */
int th_vprintf(const char *format, va_list ap) { return vprintf(format, ap); }
void th_printf(const char *p_fmt, ...) {
va_list args;
va_start(args, p_fmt);
(void)th_vprintf(p_fmt, args); /* ignore return */
va_end(args);
}

char th_getchar() { return getchar(); }

void th_serialport_initialize(void) {
# if EE_CFG_ENERGY_MODE==1
pc.baud(9600);
# else
pc.baud(115200);
# endif
}

void th_timestamp(void) {
# if EE_CFG_ENERGY_MODE==1
timestampPin = 0;
for (int i=0; i<100'000; ++i) {
asm("nop");
}
timestampPin = 1;
# else
unsigned long microSeconds = 0ul;
/* USER CODE 2 BEGIN */
microSeconds = us_ticker_read();
/* USER CODE 2 END */
/* This message must NOT be changed. */
th_printf(EE_MSG_TIMESTAMP, microSeconds);
# endif
}

void th_timestamp_initialize(void) {
/* USER CODE 1 BEGIN */
// Setting up BOTH perf and energy here
/* USER CODE 1 END */
/* This message must NOT be changed. */
th_printf(EE_MSG_TIMESTAMP_MODE);
/* Always call the timestamp on initialize so that the open-drain output
is set to "1" (so that we catch a falling edge) */
th_timestamp();
}
1 change: 1 addition & 0 deletions benchmark/training/keyword_spotting/.gitignore
@@ -6,3 +6,4 @@ __pycache__/
Untitled*
aww_model.tflite
.ipynb_checkpoints/
plots/
5 changes: 5 additions & 0 deletions benchmark/training/streaming_wakeword/.gitignore
@@ -0,0 +1,5 @@
.cache/
.ipynb_checkpoints/
streaming_config.json
plots/
trained_models/str_ww_model.h5
87 changes: 87 additions & 0 deletions benchmark/training/streaming_wakeword/README.md
@@ -0,0 +1,87 @@
## In progress -- development of streaming wakeword benchmark

## Setup

1. Make sure you have enough disk space. Speech commands and MUSAN together take about 16GB of space, but you'll need an additional 12GB while you untar MUSAN.

2. Download and unpack the `speech_commands` dataset. You may already have it for the keyword-spotting benchmark. I typically place this under a `~/data/` folder, but you can put it wherever you like, as long as you edit `streaming_config.json` accordingly (below), and replace `~/data/` with the correct path in the commands below.
```
cd ~/data/
wget http://download.tensorflow.org/data/speech_commands_v0.02.tar.gz
mkdir speech_commands_v0.02
cd speech_commands_v0.02
tar -xzvf ../speech_commands_v0.02.tar.gz
```

3. Download and unpack the [MUSAN](https://www.openslr.org/17/) noise dataset.
```
cd ~/data/
wget https://openslr.elda.org/resources/17/musan.tar.gz
tar -xzvf musan.tar.gz
```

4. Set up a conda environment and install the required packages.
```
conda create -n tiny python=3.11 -y
conda activate tiny
python -m pip install -r requirements.txt
```

5. Copy `streaming_config_template.json` to `streaming_config.json` and edit it to match the paths where you saved the speech commands and MUSAN datasets.
```
cp streaming_config_template.json streaming_config.json
```
Edit `streaming_config.json` to point to the paths where you have the speech commands and MUSAN datasets.
```
{
"speech_commands_path":"/path/to/data/speech_commands_v0.02/",
"musan_path":"/path/to/data/musan"
}
```
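
For reference, a minimal sketch (not one of the benchmark scripts) of how these two paths can be read in Python:

```
import json
import os

with open("streaming_config.json") as f:
    cfg = json.load(f)

speech_commands_path = os.path.expanduser(cfg["speech_commands_path"])
musan_path = os.path.expanduser(cfg["musan_path"])
print(speech_commands_path, musan_path)
```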

## Evaluation
To evaluate the pretrained reference model (or whatever model is currently stored in `trained_models/str_ww_ref_model.h5`), run
```
python evaluate.py --saved_model_path=trained_models/str_ww_ref_model.h5
```
The `saved_model_path` argument is required; there is no default. To evaluate a different model, pass its path to the `saved_model_path` flag.

On the reference model, you should see something like this:

```
Input shape = [None, 1, 40]
Long waveform shape = (19200000,), spectrogram shape = (37499, 1, 40)
Building dataset with 2796 targets, 1398 silent, and 9786 other.
time shift should range from -1600 to 1600
280/280 [==============================] - 20s 67ms/step - loss: 0.1984 - categorical_accuracy: 0.9607 - precision: 0.9948 - recall: 0.7500
Results: false_detections=4, true_detections=40, false_rejections=10,val_loss=0.1984, val_acc=0.9607, val_precision=0.9948, val_recall=0.7500
```

## Quantization
To quantize a trained model and convert it to a TFLite model, run:
```
python quantize.py --saved_model_path=trained_models/str_ww_ref_model.h5
```
As with `evaluate.py`, `saved_model_path` is required and has no default.

After quantization, you can evaluate the quantized model with:
```
python evaluate.py --use_tflite_model --tfl_file_name=trained_models/strm_ww_int8.tflite
```
You should see something like this:
```
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Long waveform shape = (19200000,), spectrogram shape = (37499, 1, 40)
Building dataset with 2796 targets, 1398 silent, and 9786 other.
Results: false_detections=3, true_detections=40, false_rejections=10,
```

## Training
To train a model, you can use the `train.py` script. Note that training (including fine-tuning and retraining) is not permitted for closed-division submissions. The following command performs a greatly reduced training run, mostly useful for checking that you have the correct file structure and a working installation of the required libraries. It uses about 1% of the standard dataset and trains for 3 epochs with standard floating-point computation, followed by 2 epochs of quantization-aware training.

```
python train.py --num_samples_training=1000 --num_samples_validation=1000 --epochs=5 --pretrain_epochs=3
```

## Demonstration notebook
You can also run the Jupyter notebook `demo.ipynb` to get a feel for the data, walk through the main processing steps, and visualize some examples, including a closer look at cases where the model makes mistakes.