Includes the following methods: Dilated Frequency Dynamic Convolution (DFD), Partial Frequency Dynamic Convolution (PFD), Partial Dilated Frequency Dynamic Convolution (PDFD) and Multi-Dilated Frequency Dynamic Convolution (MDFD).
Official implementation of:
- Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection (Accepted to INTERSPEECH2024)
by Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Junhyeok Lee, Yong-Hwa Park
- Pushing the Limit of Sound Event Detection with Multi-Dilated Frequency Dynamic Convolution
by Hyeonuk Nam, Yong-Hwa Park
- Self Training and Ensembling Frequency Dependent Networks with Coarse Prediction Pooling and Sound Event Bounding Boxes (DCASE 2024 Challenge Task 4 technical report, 2nd rank)
by Hyeonuk Nam, Deokki Min, Seungdeok Choi, Inhan Choi, Yong-Hwa Park
Python 3.7.10 is used with the following libraries:
- pytorch==1.8.0
- pytorch-lightning==1.2.4
- torchaudio==0.8.0
- scipy==1.4.1
- pandas==1.1.3
- numpy==1.19.2
- sed_scores_eval==0.0.4
- sebbs==0.0.0
Other requirements are listed in requirements.txt.
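A minimal setup sketch, assuming a standard pip workflow (any environment manager that provides Python 3.7.10 works equally well):

```bash
# assumes a Python 3.7.10 environment is already active
pip install -r requirements.txt
```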
You can download the datasets by referring to the DCASE 2021 Task 4 description page or the DCASE 2021 Task 4 baseline. You need the DESED real datasets (weak/unlabeled in-domain/validation/public eval) and the DESED synthetic datasets (train/validation).
You can test the saved models by running:
python main.py
This example tests the best MDFD-CRNN model with a class-wise median filter on true PSDS1.
To test MDFD-CRNNs with cSEBBs, run
python main.py -c ./configs/config_MDFDbest_sebb.yaml
then run
python sebbeval.py
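For convenience, here is the same two-step cSEBBs evaluation as a single shell sketch; this assumes main.py writes the intermediate outputs that sebbeval.py reads:

```bash
# step 1: run inference with the MDFD-CRNN + cSEBBs config
python main.py -c ./configs/config_MDFDbest_sebb.yaml
# step 2: apply sound event bounding box (cSEBBs) postprocessing and evaluate
python sebbeval.py
```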
To test DFD-CRNNs, run
python main.py -c ./configs/config_DFDbest_psds1.yaml
or
python main.py -c ./configs/config_DFDbest_psds2.yaml
To test PFD-CRNNs, run
python main.py -c ./configs/config_PFDbest.yaml
To train a model, set test_only to False under the training section of configs/config_*.yaml (as sketched below), and run:
python main.py
Trained models will be saved in the exps folder.
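A minimal sketch of the config change referenced above, assuming the configs/config_*.yaml/training/test_only layout named there; all other keys of the real config files are omitted:

```yaml
# configs/config_*.yaml (excerpt; surrounding keys omitted)
training:
  test_only: False   # False = run training; True = only test a saved model
```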
This repository refers to the following works:
- DCASE 2021 Task 4 baseline
- Sound event detection with FilterAugment
- Temporal Dynamic CNN for text-independent speaker verification
- Frequency Dynamic Convolution-Recurrent Neural Network (FDY-CRNN) for Sound Event Detection
- Frequency & Channel Attention for Computationally Efficient Sound Event Detection
- Sound Event Bounding Boxes
If this repository helped your work, please cite the papers below! The last paper describes FilterAugment, a data augmentation method applied in this work.
@article{nam2024dcase,
title={Self Training and Ensembling Frequency Dependent Networks with Coarse Prediction Pooling and Sound Event Bounding Boxes},
author={Hyeonuk Nam and Deokki Min and Seungdeok Choi and Inhan Choi and Yong-Hwa Park},
year={2024},
journal={arXiv preprint arXiv:2406.15725},
}
@article{nam2024pushing,
title={Pushing the Limit of Sound Event Detection with Multi-Dilated Frequency Dynamic Convolution},
author={Hyeonuk Nam and Yong-Hwa Park},
year={2024},
journal={arXiv preprint arXiv:2406.13312},
}
@article{nam2024diversifying,
title={Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection},
author={Hyeonuk Nam and Seong-Hu Kim and Deokki Min and Junhyeok Lee and Yong-Hwa Park},
year={2024},
journal={arXiv preprint arXiv:2406.05341},
}
@inproceedings{Nam2023,
author={Nam, Hyeonuk and Kim, Seong-Hu and Min, Deokki and Park, Yong-Hwa},
title={Frequency \& Channel Attention for Computationally Efficient Sound Event Detection},
booktitle={Proceedings of the 8th Detection and Classification of Acoustic Scenes and Events 2023 Workshop (DCASE2023)},
address={Tampere, Finland},
month={September},
year={2023},
pages={136--140},
}
@inproceedings{nam22_interspeech,
author={Hyeonuk Nam and Seong-Hu Kim and Byeong-Yun Ko and Yong-Hwa Park},
title={{Frequency Dynamic Convolution: Frequency-Adaptive Pattern Recognition for Sound Event Detection}},
year={2022},
booktitle={Proc. Interspeech 2022},
pages={2763--2767},
doi={10.21437/Interspeech.2022-10127}
}
@inproceedings{nam2021filteraugment,
author={Nam, Hyeonuk and Kim, Seong-Hu and Park, Yong-Hwa},
booktitle={ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={Filteraugment: An Acoustic Environmental Data Augmentation Method},
year={2022},
pages={4308--4312},
doi={10.1109/ICASSP43922.2022.9747680}
}
Please contact Hyeonuk Nam at [email protected] with any queries.