frednam93/MDFD-SED


Multi-Dilated Frequency Dynamic Convolution for Sound Event Detection

Includes the following methods: Dilated Frequency Dynamic Convolution (DFD conv), Partial Frequency Dynamic Convolution (PFD conv), Partial Dilated Frequency Dynamic Convolution (PDFD conv) and Multi-Dilated Frequency Dynamic Convolution (MDFD conv).

Official implementation of

  • Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection (Accepted to INTERSPEECH2024)
    by Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Junhyeok Lee, Yong-Hwa Park
    arXiv
  • Pushing the Limit of Sound Event Detection with Multi-Dilated Frequency Dynamic Convolution
    by Hyeonuk Nam, Yong-Hwa Park
    arXiv
  • Self Training and Ensembling Frequency Dependent Networks with Coarse Prediction Pooling and Sound Event Bounding Boxes (DCASE2024 Challenge Task4 technical report, 2nd rank)
    by Hyeonuk Nam, Deokki Min, Seungdeok Choi, Inhan Choi, Yong-Hwa Park
    arXiv DCASE
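The mechanism shared by these methods, frequency dynamic convolution, mixes K basis kernels with a per-frequency attention so that each frequency bin effectively gets its own kernel; the dilated and partial variants above further vary the basis kernels' dilations or apply the mechanism to only part of the channels. A minimal NumPy sketch of the kernel-mixing idea (all names and shapes are illustrative, not the repository's actual API):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
K, F = 4, 8                  # number of basis kernels, frequency bins
kh, kw = 3, 3                # kernel height/width
basis = rng.normal(size=(K, kh, kw))   # K shared basis kernels
logits = rng.normal(size=(F, K))       # per-frequency attention logits
attn = softmax(logits, axis=-1)        # (F, K); each row sums to 1

# Each frequency bin gets its own kernel as a weighted sum of the bases.
kernels = np.einsum('fk,khw->fhw', attn, basis)   # (F, kh, kw)

print(kernels.shape)   # (8, 3, 3)
```

In the actual models the attention logits are predicted from the input feature map, so the per-frequency kernels adapt to the content of each clip.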

Requirements

Python 3.7.10 is used with the following libraries:

  • pytorch==1.8.0
  • pytorch-lightning==1.2.4
  • torchaudio==0.8.0
  • scipy==1.4.1
  • pandas==1.1.3
  • numpy==1.19.2
  • sed_scores_eval==0.0.4
  • sebbs==0.0.0

Other requirements are listed in requirements.txt.
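One possible environment setup using the versions listed above (the environment name is illustrative, and the exact PyPI package names may differ slightly from the list; `pytorch` installs as `torch` via pip):

```shell
# Illustrative setup; versions taken from the list above.
conda create -n mdfd-sed python=3.7.10 -y
conda activate mdfd-sed
pip install torch==1.8.0 torchaudio==0.8.0 pytorch-lightning==1.2.4 \
    scipy==1.4.1 pandas==1.1.3 numpy==1.19.2 \
    sed_scores_eval==0.0.4 sebbs==0.0.0
pip install -r requirements.txt
```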

Datasets

You can download the datasets by referring to the DCASE 2021 Task 4 description page or the DCASE 2021 Task 4 baseline. You need the DESED real datasets (weak / unlabeled in-domain / validation / public eval) and the DESED synthetic datasets (train / validation).

Test with saved models

You can test saved models by running:

python main.py

This example tests the best MDFD-CRNN model with a class-wise median filter, evaluated on true PSDS1.

To test MDFD-CRNNs with cSEBBs, run:

python main.py -c ./configs/config_MDFDbest_sebb.yaml

then run:

python sebbeval.py

To test DFD-CRNNs, run:

python main.py -c ./configs/config_DFDbest_psds1.yaml

or

python main.py -c ./configs/config_DFDbest_psds2.yaml

To test PFD-CRNNs, run:

python main.py -c ./configs/config_PFDbest.yaml

Training

To train a model, set training/test_only in the corresponding configs/config_*.yaml to False, and run:

python main.py

The trained model will be saved in the exps folder.
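For reference, the training/test_only switch mentioned above sits under the training section of the YAML config; the relevant fragment might look like this (surrounding keys omitted):

```yaml
# configs/config_*.yaml (fragment; other keys omitted)
training:
  test_only: False   # set to True to only evaluate a saved model
```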


Citation & Contact

If this repository helped your work, please cite the papers below! The last entry, FilterAugment, is a data augmentation method applied in this work.

@article{nam2024dcase,
      title={Self Training and Ensembling Frequency Dependent Networks with Coarse Prediction Pooling and Sound Event Bounding Boxes}, 
      author={Hyeonuk Nam and Deokki Min and Seungdeok Choi and Inhan Choi and Yong-Hwa Park},
      year={2024},
      journal={arXiv preprint arXiv:2406.15725},
}

@article{nam2024pushing,
      title={Pushing the Limit of Sound Event Detection with Multi-Dilated Frequency Dynamic Convolution}, 
      author={Hyeonuk Nam and Yong-Hwa Park},
      year={2024},
      journal={arXiv preprint arXiv:2406.13312},
}

@article{nam2024diversifying,
      title={Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection}, 
      author={Hyeonuk Nam and Seong-Hu Kim and Deokki Min and Junhyeok Lee and Yong-Hwa Park},
      year={2024},
      journal={arXiv preprint arXiv:2406.05341},
}

@inproceedings{Nam2023,
    author = "Nam, Hyeonuk and Kim, Seong-Hu and Min, Deokki and Park, Yong-Hwa",
    title = "Frequency \& Channel Attention for Computationally Efficient Sound Event Detection",
    booktitle = "Proceedings of the 8th Detection and Classification of Acoustic Scenes and Events 2023 Workshop (DCASE2023)",
    address = "Tampere, Finland",
    month = "September",
    year = "2023",
    pages = "136--140",
}

@inproceedings{nam22_interspeech,
      author={Hyeonuk Nam and Seong-Hu Kim and Byeong-Yun Ko and Yong-Hwa Park},
      title={{Frequency Dynamic Convolution: Frequency-Adaptive Pattern Recognition for Sound Event Detection}},
      year=2022,
      booktitle={Proc. Interspeech 2022},
      pages={2763--2767},
      doi={10.21437/Interspeech.2022-10127}
}

@INPROCEEDINGS{nam2021filteraugment,
    author={Nam, Hyeonuk and Kim, Seong-Hu and Park, Yong-Hwa},
    booktitle={ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, 
    title={Filteraugment: An Acoustic Environmental Data Augmentation Method}, 
    year={2022},
    pages={4308-4312},
    doi={10.1109/ICASSP43922.2022.9747680}
}

Please contact Hyeonuk Nam at [email protected] for any query.
