
On Optimal Frontends for Audio Deep Learning: LW-net

Inspired by the Le-LWTNet proposal [4], which successfully applies a lifting wavelet transform for feature extraction, and by the strong performance of classic MFCCs on the same task [5], we combine these ideas with the learnable filtering approaches of [3] and [2] to build a new general-purpose frontend for audio processing chains that can run in resource-restricted environments.
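For intuition: a lifting wavelet step splits the signal into even and odd samples, predicts one half from the other, and updates the remainder; in a learnable frontend the predict/update operators become small convolutions. Below is a minimal PyTorch sketch of that idea, in the spirit of [4] (illustrative only; the class and parameter names are ours, and the actual LW-net layers in this repository may differ):

import torch
import torch.nn as nn

class LiftingStep(nn.Module):
    """One learnable lifting step: split / predict / update.
    Assumes an even-length, single-channel input."""
    def __init__(self, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # Small convolutions play the role of the predict (P) and
        # update (U) operators of a classic lifting wavelet step.
        self.predict = nn.Conv1d(1, 1, kernel_size, padding=pad)
        self.update = nn.Conv1d(1, 1, kernel_size, padding=pad)

    def forward(self, x):  # x: (batch, 1, time)
        even, odd = x[..., ::2], x[..., 1::2]
        detail = odd - self.predict(even)    # high-pass branch
        approx = even + self.update(detail)  # low-pass branch
        return approx, detail

x = torch.randn(8, 1, 16000)           # batch of one-second 16 kHz clips
approx, detail = LiftingStep()(x)      # each: (8, 1, 8000)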

License

The license is contained in the LICENSE file in the root of this repository.

Installation

  1. Check out the submodules:
git submodule update --init --recursive
  2. Set up and activate the Python environment (a sketch follows this list). Hint: use pyenv to manage Python versions; run pyenv local 3.10 before the next steps, and follow the instructions at https://github.com/pyenv/pyenv to set up your shell (zsh, bash).
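A minimal sketch of step 2, assuming pyenv plus the built-in venv module (the repository does not prescribe a specific environment tool):

pyenv install 3.10
pyenv local 3.10
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip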

Dependencies (CUDA-version dependent)

The torch build depends on your CUDA version, which you can find by calling nvidia-smi. In our case the current machines run CUDA 11.8, so you should install torch compiled for that version:

python -m pip install torch==2.1.0+cu118 torchvision==0.16.0+cu118 torchaudio==2.1.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118 --no-cache-dir
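After installing, a quick sanity check that torch sees the GPU and the expected CUDA version:

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"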

(It is still unclear how to generate requirements.txt from requirements.in so that it pins the right torch build; one possible approach is sketched below...)
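One untested possibility: pip-tools honors pip-style index options placed at the top of requirements.in, so a file along these lines should make pip-compile resolve the CUDA 11.8 builds (the listing below is illustrative, not this repository's actual requirements.in):

# requirements.in (illustrative sketch)
--extra-index-url https://download.pytorch.org/whl/cu118
torch==2.1.0+cu118
torchvision==0.16.0+cu118
torchaudio==2.1.0+cu118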

Re-generate a requirements.txt

The proper way:

pip-compile -o requirements.txt requirements.in

or

python -m piptools compile requirements.in

From [1]:

Now that you have a requirements.txt, you can use pip-sync to update your virtual environment to reflect exactly what's in there. pip-sync is meant to be used only with a requirements.txt generated by pip-compile

Do:

pip-sync requirements.txt

Training

To run a training, point the main script at a config file:

python wavefront/wavefront_main.py -c wavefront/config_model_dataset.yaml

for example:

python wavefront/wavefront_main.py -c wavefront/config_MFCCNet_ESC50.yaml

To log and monitor training progress, run in your shell:

tensorboard --bind_all --logdir lightning_logs

NOTE: To train on TIMIT, we preprocessed the dataset as proposed in [2], i.e., removing the initial and final silence of each utterance and normalizing its amplitude.
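For reference, a minimal sketch of that preprocessing in the spirit of [2]'s TIMIT_preparation.py (the function and paths below are illustrative, not part of this repository):

import numpy as np
import soundfile as sf

def preprocess_timit_utterance(wav_path, wrd_path, out_path):
    # Load the raw utterance.
    signal, sr = sf.read(wav_path)

    # TIMIT .wrd files list "start end word" per line, in samples;
    # the first start and last end bound the spoken content.
    with open(wrd_path) as f:
        bounds = [line.split() for line in f if line.strip()]
    first_sample = int(bounds[0][0])
    last_sample = int(bounds[-1][1])

    # Remove the initial and final silence.
    signal = signal[first_sample:last_sample]

    # Normalize the amplitude of the utterance.
    signal = signal / np.max(np.abs(signal))

    sf.write(out_path, signal, sr)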

[1] https://pypi.org/project/pip-tools/
[2] Mirco Ravanelli and Yoshua Bengio. Speaker recognition from raw waveform with SincNet. In 2018 IEEE Spoken Language Technology Workshop (SLT), pages 1021–1028, 2018. https://github.com/mravanelli/SincNet
[3] Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, and Marco Tagliasacchi. LEAF: A learnable frontend for audio classification, 2021.
[4] Junchao Fan, Shizhan Tang, Han Duan, Xiuli Bi, Bin Xiao, Weisheng Li, and Xinbo Gao. Le-LWTNet: A learnable lifting wavelet convolutional neural network for heart sound abnormality detection. IEEE Transactions on Instrumentation and Measurement, 72:1–14, 2023.
[5] Vibha Tiwari. MFCC and its applications in speaker recognition. International Journal on Emerging Technologies, 1(1):19–22, 2010.
