Deep Multiclass Audio Classification

Project structure

├── Coursera/
│   ├── soham/
│   │   ├── Coursera Assignments/
│   │   └── Coursera Notes/
│   └── Aanchal/
│       ├── Course1/
│       ├── Course2/
│       └── Course4/
├── EDA/
│   ├── esc-50-explore.ipynb
│   └── esc-preprocess-and-eda.ipynb
├── UI/
│   ├── test/
│   ├── audio_ui.py
│   ├── audio_ui2.py
│   ├── labels.py
│   ├── model.py
│   ├── yamnet.onnx
│   └── yamnet_inference.py
├── mini-projects/
│   ├── Aanchal/
│   │   ├── Audio Classification UrbanSound8k.ipynb
│   │   ├── NN_from_scratch.ipynb
│   │   └── Transfer learning with ResNet-50 cifar10.ipynb
│   └── Soham/
│       ├── Audio Classification UrbanSound8k/
│       ├── Neural-Network-from-scratch/
│       └── Transfer-learning-cifar10/
├── resnets_and_efficientnets/
│   ├── esc-dataset.ipynb
│   ├── esc-model1_2024-08-20_18-11-09.pth
│   ├── esc-transfer-learn.ipynb
│   ├── esc-transfer-learning2.ipynb
│   └── esc-utils.ipynb
├── yamnet/
│   ├── esc-dataset.ipynb
│   ├── esc-dataset2.xpynb
│   ├── esc-model1_20/
│   ├── esc-utils.ipynb
│   ├── esc-utils3.xpynb
│   ├── esc-yamnet.ipynb
│   ├── escyamnetdataset.xpynb
│   ├── getyamnet.xpynb
│   ├── yamnet-load.xpynb
│   └── yamnet.ipynb
├── LICENSE
└── README.md

Introduction

This project focuses on developing a robust audio classifier that processes user-provided audio files and accurately identifies the category or class to which the audio belongs.

Description

The goal is an audio classification system that sorts diverse audio inputs, such as speech, music, and environmental sounds, into their classes.
We used two approaches:

Convolutional Neural Networks (CNNs)
Transfer learning (YAMNet, ResNet50, EfficientNet)

Demo video: WhatsApp.Video.2024-10-18.at.23.40.22.1.mp4

Tech Stack

Python
PyTorch
Kaggle

Contributors

Aanchal Borse
Soham Rane

Future Prospects

Hate Speech Detection in Low-Resource Languages
Audio-Based Security Systems
Environmental Monitoring

Resources

Audio processing by Valerio Velardo

Coursera course on Deep learning by Andrew Ng and Younes Bensouda Mourri

PyTorch playlist by Patrick Loeber

Datasets used:

ESC-50
CIFAR-10
UrbanSound8K

Acknowledgement

Special thanks to COC VJTI for ProjectX 2024.

Special thanks to our mentors, Kshitij Shah and Param Thakkar, who guided us throughout the project.
