
On Optimal Frontends for Audio Deep Learning: LW-net

Inspired by the Le-LWTNet proposal [4], which successfully applies a lifting wavelet transform for feature extraction, and by the strong performance of classic MFCCs on the same task [5], we combine these ideas with the learnable filtering approaches of [3] and [2] to build a new general-purpose frontend for audio processing chains that can run in resource-restricted environments.
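For intuition: a lifting wavelet step splits the signal into even and odd samples, predicts one half from the other, and updates the remainder; in a learnable frontend the predict/update operators become small convolutions. Below is a minimal PyTorch sketch of that idea, in the spirit of [4] (illustrative only; the class and parameter names are ours, and the actual LW-net layers in this repository may differ):

import torch
import torch.nn as nn

class LiftingStep(nn.Module):
    """One learnable lifting step: split / predict / update.
    Assumes an even-length, single-channel input."""
    def __init__(self, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # Small convolutions play the role of the predict (P) and
        # update (U) operators of a classic lifting wavelet step.
        self.predict = nn.Conv1d(1, 1, kernel_size, padding=pad)
        self.update = nn.Conv1d(1, 1, kernel_size, padding=pad)

    def forward(self, x):  # x: (batch, 1, time)
        even, odd = x[..., ::2], x[..., 1::2]
        detail = odd - self.predict(even)    # high-pass branch
        approx = even + self.update(detail)  # low-pass branch
        return approx, detail

x = torch.randn(8, 1, 16000)           # batch of one-second 16 kHz clips
approx, detail = LiftingStep()(x)      # each: (8, 1, 8000)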

License

The license is contained in the LICENSE file in the root of this repository.

Installation

  1. Check out the submodules:
git submodule update --init --recursive
  2. Set up and activate the Python environment (a sketch follows this list). Hint: use pyenv to manage Python versions; run pyenv local 3.10 before the next steps, and follow the instructions at https://github.com/pyenv/pyenv to set up your shell (zsh, bash).
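A minimal sketch of step 2, assuming pyenv plus the built-in venv module (the repository does not prescribe a specific environment tool):

pyenv install 3.10
pyenv local 3.10
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip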

Dependencies (CUDA-version dependent)

The torch build depends on your CUDA version, which you can find by calling nvidia-smi. In our case the current machines run CUDA 11.8, so you should install torch compiled for that version:

python -m pip install torch==2.1.0+cu118 torchvision==0.16.0+cu118 torchaudio==2.1.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118 --no-cache-dir
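After installing, a quick sanity check that torch sees the GPU and the expected CUDA version:

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"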

(It is still unclear how to generate requirements.txt from requirements.in so that it pins the right torch build; one possible approach is sketched below...)
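One untested possibility: pip-tools honors pip-style index options placed at the top of requirements.in, so a file along these lines should make pip-compile resolve the CUDA 11.8 builds (the listing below is illustrative, not this repository's actual requirements.in):

# requirements.in (illustrative sketch)
--extra-index-url https://download.pytorch.org/whl/cu118
torch==2.1.0+cu118
torchvision==0.16.0+cu118
torchaudio==2.1.0+cu118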

Re-generate a requirements.txt

The proper way:

pip-compile -o requirements.txt requirements.in

or

python -m piptools compile requirements.in

From [1]:

Now that you have a requirements.txt, you can use pip-sync to update your virtual environment to reflect exactly what's in there. pip-sync is meant to be used only with a requirements.txt generated by pip-compile

Do:

pip-sync requirements.txt

Training

To run a training, point the main script at a config file:

python wavefront/wavefront_main.py -c wavefront/config_model_dataset.yaml

for example:

python wavefront/wavefront_main.py -c wavefront/config_MFCCNet_ESC50.yaml

To log and monitor training progress, run in your shell:

tensorboard --bind_all --logdir lightning_logs

NOTE: To train on TIMIT, we preprocessed the dataset as proposed in [2], i.e., removing the initial and final silence of each utterance and normalizing its amplitude.
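For reference, a minimal sketch of that preprocessing in the spirit of [2]'s TIMIT_preparation.py (the function and paths below are illustrative, not part of this repository):

import numpy as np
import soundfile as sf

def preprocess_timit_utterance(wav_path, wrd_path, out_path):
    # Load the raw utterance.
    signal, sr = sf.read(wav_path)

    # TIMIT .wrd files list "start end word" per line, in samples;
    # the first start and last end bound the spoken content.
    with open(wrd_path) as f:
        bounds = [line.split() for line in f if line.strip()]
    first_sample = int(bounds[0][0])
    last_sample = int(bounds[-1][1])

    # Remove the initial and final silence.
    signal = signal[first_sample:last_sample]

    # Normalize the amplitude of the utterance.
    signal = signal / np.max(np.abs(signal))

    sf.write(out_path, signal, sr)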

[1] https://pypi.org/project/pip-tools/
[2] Mirco Ravanelli and Yoshua Bengio. Speaker recognition from raw waveform with SincNet. In 2018 IEEE Spoken Language Technology Workshop (SLT), pages 1021–1028, 2018. https://github.com/mravanelli/SincNet
[3] Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, and Marco Tagliasacchi. LEAF: A learnable frontend for audio classification, 2021.
[4] Junchao Fan, Shizhan Tang, Han Duan, Xiuli Bi, Bin Xiao, Weisheng Li, and Xinbo Gao. Le-LWTNet: A learnable lifting wavelet convolutional neural network for heart sound abnormality detection. IEEE Transactions on Instrumentation and Measurement, 72:1–14, 2023.
[5] Vibha Tiwari. MFCC and its applications in speaker recognition. International Journal on Emerging Technologies, 1(1):19–22, 2010.
