Includes the following methods: Dilated Frequency Dynamic Convolution (DFD), Partial Frequency Dynamic Convolution (PFD), Partial Dilated Frequency Dynamic Convolution (PDFD) and Multi-Dilated Frequency Dynamic Convolution (MDFD).
Official implementation of:
- Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection (Accepted to INTERSPEECH2024)
by Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Junhyeok Lee, Yong-Hwa Park
- Pushing the Limit of Sound Event Detection with Multi-Dilated Frequency Dynamic Convolution
by Hyeonuk Nam, Yong-Hwa Park
- Self Training and Ensembling Frequency Dependent Networks with Coarse Prediction Pooling and Sound Event Bounding Boxes (DCASE 2024 Challenge Task 4 technical report, 2nd rank)
by Hyeonuk Nam, Deokki Min, Seungdeok Choi, Inhan Choi, Yong-Hwa Park
Python 3.7.10 is used with the following libraries:
- pytorch==1.8.0
- pytorch-lightning==1.2.4
- torchaudio==0.8.0
- scipy==1.4.1
- pandas==1.1.3
- numpy==1.19.2
- sed_scores_eval==0.0.4
- sebbs==0.0.0
Other requirements are listed in requirements.txt.
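A minimal setup sketch, assuming a standard pip workflow (any environment manager that provides Python 3.7.10 works equally well):

```bash
# assumes a Python 3.7.10 environment is already active
pip install -r requirements.txt
```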
You can download the datasets by referring to the DCASE 2021 Task 4 description page or the DCASE 2021 Task 4 baseline. You need the DESED real datasets (weak/unlabeled in-domain/validation/public eval) and the DESED synthetic datasets (train/validation).
You can test the saved models by running:
python main.py
This example tests the best MDFD-CRNN model with a class-wise median filter on true PSDS1.
To test MDFD-CRNNs with cSEBBs, run
python main.py -c ./configs/config_MDFDbest_sebb.yaml
then run
python sebbeval.py
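For convenience, here is the same two-step cSEBBs evaluation as a single shell sketch; this assumes main.py writes the intermediate outputs that sebbeval.py reads:

```bash
# step 1: run inference with the MDFD-CRNN + cSEBBs config
python main.py -c ./configs/config_MDFDbest_sebb.yaml
# step 2: apply sound event bounding box (cSEBBs) postprocessing and evaluate
python sebbeval.py
```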
To test DFD-CRNNs, run
python main.py -c ./configs/config_DFDbest_psds1.yaml
or
python main.py -c ./configs/config_DFDbest_psds2.yaml
To test PFD-CRNNs, run
python main.py -c ./configs/config_PFDbest.yaml
To train a model, set test_only to False under the training section of configs/config_*.yaml (as sketched below), and run:
python main.py
Trained models will be saved in the exps folder.
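A minimal sketch of the config change referenced above, assuming the configs/config_*.yaml/training/test_only layout named there; all other keys of the real config files are omitted:

```yaml
# configs/config_*.yaml (excerpt; surrounding keys omitted)
training:
  test_only: False   # False = run training; True = only test a saved model
```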
This repository refers to the following works:
- DCASE 2021 Task 4 baseline
- Sound event detection with FilterAugment
- Temporal Dynamic CNN for text-independent speaker verification
- Frequency Dynamic Convolution-Recurrent Neural Network (FDY-CRNN) for Sound Event Detection
- Frequency & Channel Attention for Computationally Efficient Sound Event Detection
- Sound Event Bounding Boxes
If this repository helped your work, please cite the papers below! The last paper describes FilterAugment, a data augmentation method applied in this work.
@article{nam2024dcase,
title={Self Training and Ensembling Frequency Dependent Networks with Coarse Prediction Pooling and Sound Event Bounding Boxes},
author={Hyeonuk Nam and Deokki Min and Seungdeok Choi and Inhan Choi and Yong-Hwa Park},
year={2024},
journal={arXiv preprint arXiv:2406.15725},
}
@article{nam2024pushing,
title={Pushing the Limit of Sound Event Detection with Multi-Dilated Frequency Dynamic Convolution},
author={Hyeonuk Nam and Yong-Hwa Park},
year={2024},
journal={arXiv preprint arXiv:2406.13312},
}
@article{nam2024diversifying,
title={Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection},
author={Hyeonuk Nam and Seong-Hu Kim and Deokki Min and Junhyeok Lee and Yong-Hwa Park},
year={2024},
journal={arXiv preprint arXiv:2406.05341},
}
@inproceedings{Nam2023,
author={Nam, Hyeonuk and Kim, Seong-Hu and Min, Deokki and Park, Yong-Hwa},
title={Frequency \& Channel Attention for Computationally Efficient Sound Event Detection},
booktitle={Proceedings of the 8th Detection and Classification of Acoustic Scenes and Events 2023 Workshop (DCASE2023)},
address={Tampere, Finland},
month={September},
year={2023},
pages={136--140},
}
@inproceedings{nam22_interspeech,
author={Hyeonuk Nam and Seong-Hu Kim and Byeong-Yun Ko and Yong-Hwa Park},
title={{Frequency Dynamic Convolution: Frequency-Adaptive Pattern Recognition for Sound Event Detection}},
year={2022},
booktitle={Proc. Interspeech 2022},
pages={2763--2767},
doi={10.21437/Interspeech.2022-10127}
}
@inproceedings{nam2021filteraugment,
author={Nam, Hyeonuk and Kim, Seong-Hu and Park, Yong-Hwa},
booktitle={ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={Filteraugment: An Acoustic Environmental Data Augmentation Method},
year={2022},
pages={4308--4312},
doi={10.1109/ICASSP43922.2022.9747680}
}
Please contact Hyeonuk Nam at [email protected] with any queries.