BIRD is an open dataset of 1,000,000 multichannel room impulse responses (RIRs) generated with the image method, which makes it the largest open multichannel RIR dataset currently available. We provide Python code that shows how to download the dataset and use it for online data augmentation. The code is compatible with the PyTorch Dataset class, which eases integration into existing deep learning projects built on this framework.
If you use BIRD in your research, please consider citing the official paper:
@inproceedings{grondin2021bird,
title={BIRD: Big Impulse Response Dataset},
author={Grondin, Fran{\c{c}}ois and Lauzon, Jean-Samuel and Michaud, Simon and Ravanelli, Mirco and Michaud, Fran{\c{c}}ois},
eprint={arXiv:2010.09930},
year={2020}
}
There are two ways to download the BIRD dataset: with the Python script (recommended) or by direct download.
Here's how to quickly install and use the dataset:
- Install required libraries:
pip3 install -r requirements.txt
- Download the complete dataset (11.3 GB) and store it locally:
python3 tools/download.py
or download only some folds of the dataset with the --folds argument, for example:
python3 tools/download.py --folds 1
You can also download the dataset directly here.
We can visualize any sample from the dataset. The commands below inspect item 4.
To display the room geometry, launch the following script:
python3 tools/visualize.py --view room --item 4
This produces the following plot:
The microphones are shown as black dots and the sources as colored dots.
To display the room impulse responses, launch the following script:
python3 tools/visualize.py --view rir --item 4
This produces the following plot:
To print the metadata as presented in the article, launch the following script:
python3 tools/visualize.py --view meta --item 4
This prints the following string to the terminal:
{
    'L': [14.83, 11.49, 3.01],
    'alpha': 0.36,
    'c': 350.5,
    'mics': [[14.141, 2.934, 1.895],
             [14.224, 3.010, 2.161]],
    'srcs': [[0.811, 5.702, 1.547],
             [6.658, 4.000, 2.582],
             [5.340, 9.433, 1.775],
             [12.164, 8.109, 2.161]]
}
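The metadata alone is enough to predict where the direct-path peak should appear in each RIR and what time difference of arrival (TDOA) each source induces between the two microphones. The sketch below assumes a sampling rate of 16 kHz (an assumption, check the rate used by the dataset) and defines the TDOA as the arrival time at microphone 1 minus the arrival time at microphone 2, in samples. The example scripts may use a different sign convention or reference.

```python
import numpy as np

# Metadata of item 4, copied from the output above.
meta = {
    'L': [14.83, 11.49, 3.01],
    'alpha': 0.36,
    'c': 350.5,
    'mics': [[14.141, 2.934, 1.895], [14.224, 3.010, 2.161]],
    'srcs': [[0.811, 5.702, 1.547], [6.658, 4.000, 2.582],
             [5.340, 9.433, 1.775], [12.164, 8.109, 2.161]],
}
FS = 16000  # assumed sampling rate in Hz (not stated in the metadata)

mics = np.array(meta['mics'])   # shape (2, 3)
srcs = np.array(meta['srcs'])   # shape (4, 3)
c = meta['c']                   # speed of sound in m/s

# Distance from every source to every microphone: shape (4, 2).
dist = np.linalg.norm(srcs[:, None, :] - mics[None, :, :], axis=-1)

# Direct-path delay in samples (position of the first peak in each RIR).
delay = dist / c * FS

# TDOA between microphone 1 and microphone 2, in samples.
tdoa = delay[:, 0] - delay[:, 1]

# No TDOA can exceed the acoustic travel time across the array.
max_tdoa = np.linalg.norm(mics[0] - mics[1]) / c * FS

for k, (d, t) in enumerate(zip(delay, tdoa)):
    print(f"source {k}: direct path at {d.round(1)} samples, TDOA {t:+.3f}")
print(f"|TDOA| bound: {max_tdoa:.3f} samples")
```

The bound is a useful sanity check: with these microphones about 0.29 m apart, any TDOA label outside the bound would point to a bug in the augmentation pipeline.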
Here are a few examples that show how BIRD can be used to perform data augmentation.
For sound source localization, launch the following script:
python3 examples/sound_source_localization.py
This outputs the following spectrograms:
It also generates the following TDOAs:
tensor([-2.3261, 2.0721], dtype=torch.float64)
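On the model side, a TDOA like the ones printed above can be estimated from a two-channel recording with GCC-PHAT, a standard baseline for sound source localization. This is a generic sketch, not necessarily how the example script computes its targets. It returns the delay of x relative to y in samples, with sub-sample resolution obtained by zero-padding the cross-spectrum; interp and max_shift are hypothetical parameters chosen for illustration.

```python
import numpy as np


def gcc_phat(x, y, interp=16, max_shift=None):
    """Estimate the delay (in samples) of signal x relative to signal y."""
    n = len(x) + len(y)  # pad so the correlation is linear, not circular
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)
    # PHAT weighting: keep only the phase of the cross-spectrum.
    cross /= np.abs(cross) + 1e-12
    # A longer inverse FFT zero-pads the spectrum, upsampling the correlation.
    cc = np.fft.irfft(cross, n=interp * n)
    max_lag = interp * n // 2
    if max_shift is not None:
        max_lag = min(int(interp * max_shift), max_lag)
    # Rearrange so that lags run from -max_lag to +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[: max_lag + 1]))
    shift = np.argmax(np.abs(cc)) - max_lag
    return shift / interp


# Usage: x is a copy of y delayed by 5 samples.
rng = np.random.default_rng(0)
y = rng.standard_normal(4096)
x = np.concatenate((np.zeros(5), y[:-5]))
print(gcc_phat(x, y))  # expected to be close to 5
```

Passing max_shift equal to the physical bound (microphone spacing divided by c, times the sampling rate) restricts the search to plausible lags and reduces spurious peaks in reverberant rooms.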
For reverberation time estimation, launch the following script:
python3 examples/reverberation_time_estimation.py
This outputs the following spectrograms:
It also generates the following RT60:
0.416
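The README does not say how the RT60 target is obtained. One common way to measure it from an RIR is Schroeder backward integration: compute the energy decay curve, fit a line to it between -5 dB and -25 dB, and extrapolate that slope to a 60 dB decay. The sketch below implements that method for a single channel and checks it on a synthetic exponential decay with a known RT60; it is an illustration, not the example script's code.

```python
import numpy as np


def rt60_schroeder(rir, fs, db_start=-5.0, db_end=-25.0):
    """Estimate RT60 (in seconds) from a single-channel RIR.

    Uses Schroeder backward integration and extrapolates the decay
    measured between db_start and db_end to a 60 dB drop.
    """
    energy = np.asarray(rir, dtype=float) ** 2
    # Energy decay curve: energy remaining from each sample to the end.
    edc = np.cumsum(energy[::-1])[::-1]
    edc_db = 10.0 * np.log10(edc / edc[0] + 1e-300)
    # Fit a line to the part of the curve inside the evaluation range.
    mask = (edc_db <= db_start) & (edc_db >= db_end)
    t = np.arange(len(rir))[mask] / fs
    slope, _ = np.polyfit(t, edc_db[mask], 1)  # dB per second (negative)
    return -60.0 / slope


# Synthetic RIR whose amplitude drops by 60 dB (a factor of 1000) in 0.4 s.
fs = 16000
time = np.arange(int(0.8 * fs)) / fs
rir = np.exp(-np.log(1000.0) * time / 0.4)
print(round(rt60_schroeder(rir, fs), 3))  # approximately 0.4
```

A multichannel BIRD RIR can be processed one channel at a time; averaging the per-channel estimates gives a single label per room.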
For counting speech sources, launch the following script:
python3 examples/counting_speech_sources.py
This outputs the following spectrograms:
It also generates the following source count:
3
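For source counting, a training example can be built by drawing a random number of sources, convolving each one with the RIR of a distinct source position in the same room, and summing the results; the number of sources becomes the label. The sketch below assumes the RIRs of one room are stored as an array shaped (num_positions, num_mics, rir_length), a hypothetical layout, and that max_sources does not exceed the number of positions or dry signals (the metadata above lists four sources per room).

```python
import numpy as np


def make_counting_example(dry_pool, room_rirs, max_sources, rng):
    """Mix a random number of reverberated sources; return (mixture, count).

    dry_pool:  list of 1-D dry signals (the mixture is cropped to the shortest)
    room_rirs: array (num_positions, num_mics, rir_length) for one room
    """
    count = int(rng.integers(1, max_sources + 1))  # 1..max_sources inclusive
    length = min(len(s) for s in dry_pool)
    num_mics = room_rirs.shape[1]
    mixture = np.zeros((num_mics, length))
    # Pick distinct dry signals and distinct source positions.
    signals = rng.choice(len(dry_pool), size=count, replace=False)
    positions = rng.choice(room_rirs.shape[0], size=count, replace=False)
    for sig_idx, pos_idx in zip(signals, positions):
        dry = dry_pool[sig_idx][:length]
        for m in range(num_mics):
            mixture[m] += np.convolve(dry, room_rirs[pos_idx, m])[:length]
    return mixture, count


# Toy usage: random noise stands in for speech and for one room's RIRs.
rng = np.random.default_rng(0)
dry_pool = [rng.standard_normal(8000) for _ in range(6)]
room_rirs = rng.standard_normal((4, 2, 1000)) * np.exp(-np.arange(1000) / 200)
mixture, count = make_counting_example(dry_pool, room_rirs, max_sources=4, rng=rng)
print(mixture.shape, count)
```

Sampling without replacement keeps two "sources" from sharing the same position, which would otherwise make them acoustically indistinguishable.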
For ideal ratio mask estimation, launch the following script:
python3 examples/ideal_ratio_mask_estimation.py
This outputs the following spectrograms:
It also produces the following ideal ratio mask:
Finally, it prints the target TDOA:
-2.3261
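The README does not give the mask definition used by the script. A common form of the ideal ratio mask (IRM) is (|S|^2 / (|S|^2 + |N|^2))^beta per time-frequency bin, where S is the short-time Fourier transform (STFT) of the target, N is that of everything else, and beta = 0.5 is a frequent choice. The sketch below computes that mask with a minimal NumPy STFT; here the target would be the reverberated source of interest at one microphone and the interference the sum of the other reverberated sources, both of which are assumptions about how the example is set up.

```python
import numpy as np


def stft(x, n_fft=512, hop=128):
    """Minimal STFT: Hann-windowed frames -> complex spectra (needs len(x) >= n_fft)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop: i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)  # shape: (n_frames, n_fft // 2 + 1)


def ideal_ratio_mask(target, interference, beta=0.5, eps=1e-12):
    """IRM = (|S|^2 / (|S|^2 + |N|^2)) ** beta, computed per time-frequency bin."""
    s_pow = np.abs(stft(target)) ** 2
    n_pow = np.abs(stft(interference)) ** 2
    return (s_pow / (s_pow + n_pow + eps)) ** beta


# Toy usage: random noise stands in for the reverberated target and interference.
rng = np.random.default_rng(0)
target = rng.standard_normal(8000)
interference = 0.5 * rng.standard_normal(8000)
mask = ideal_ratio_mask(target, interference)
print(mask.shape)  # (frames, frequency bins)
```

Because the mask lies in [0, 1], multiplying it with the mixture STFT attenuates bins dominated by interference while leaving target-dominated bins largely untouched.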