Deep Multiclass Audio Classification

Project structure

├── Coursera/
│   ├── soham/
│   │   ├── Coursera Assignments/
│   │   └── Coursera Notes/
│   └── Aanchal/
│       ├── Course1/
│       ├── Course2/
│       └── Course4/
├── EDA/
│   ├── esc-50-explore.ipynb
│   └── esc-preprocess-and-eda.ipynb
├── UI/
│   ├── test/
│   ├── audio_ui.py
│   ├── audio_ui2.py
│   ├── labels.py
│   ├── model.py
│   ├── yamnet.onnx
│   └── yamnet_inference.py
├── mini-projects/
│   ├── Aanchal/
│   │   ├── Audio Classification UrbanSound8k.ipynb
│   │   ├── NN_from_scratch.ipynb
│   │   └── Transfer learning with ResNet-50 cifar10.ipynb
│   └── Soham/
│       ├── Audio Classification UrbanSound8k/
│       ├── Neural-Network-from-scratch/
│       └── Transfer-learning-cifar10/
├── resnets_and_efficientnets/
│   ├── esc-dataset.ipynb
│   ├── esc-model1_2024-08-20_18-11-09.pth
│   ├── esc-transfer-learn.ipynb
│   ├── esc-transfer-learning2.ipynb
│   └── esc-utils.ipynb
├── yamnet/
│   ├── esc-dataset.ipynb
│   ├── esc-dataset2.xpynb
│   ├── esc-model1_20/
│   ├── esc-utils.ipynb
│   ├── esc-utils3.xpynb
│   ├── esc-yamnet.ipynb
│   ├── escyamnetdataset.xpynb
│   ├── getyamnet.xpynb
│   ├── yamnet-load.xpynb
│   └── yamnet.ipynb
├── LICENSE
└── README.md

Table of Contents

Introduction

This project develops an audio classifier that takes a user-provided audio file and predicts the class to which the audio belongs.
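As a rough illustration of the final prediction step, the sketch below turns a list of per-class scores into a single label using softmax and argmax. The label subset, the `predict` helper, and the example scores are hypothetical and are not taken from the project's code; a trained model would supply the scores.

```python
import math

# Illustrative subset of class names (assumption); the project's real
# label list is not reproduced here.
LABELS = ["dog", "rain", "siren"]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(scores, labels=LABELS):
    """Return the most likely label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical scores standing in for a model's output on one clip.
label, confidence = predict([0.2, 2.5, -1.0])
print(label, round(confidence, 3))
```

The highest score wins, and the softmax value gives a rough confidence that a UI could display next to the predicted class.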

Description

This project builds an audio classification system that sorts diverse audio inputs, including speech, music, and environmental sounds.
We used two approaches:

  • Convolutional Neural Networks (CNNs)
  • Transfer learning (YAMNet, ResNet-50, EfficientNet)

Tech Stack

Contributors

Future Prospects

  • Hate Speech Detection in Low-Resource Languages
  • Audio-Based Security Systems
  • Environmental Monitoring

Resources

Audio processing by Valerio Velardo

Coursera course on deep learning by Andrew Ng and Younes Bensouda Mourri

PyTorch playlist by Patrick Loeber

The datasets used are:

  1. ESC-50 dataset
  2. CIFAR-10 dataset
  3. UrbanSound8K dataset

Acknowledgement

Special thanks to COC VJTI for ProjectX 2024.

Special thanks to our mentors, Kshitij Shah and Param Thakkar, who guided us throughout the project.