The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation

Ilaria Manco* (1, 2), Benno Weck* (3), Seungheon Doh (4), Minz Won (5), Yixiao Zhang (1), Dmitry Bogdanov (3), Yusong Wu (6), Ke Chen (7), Philip Tovstogan (3), Emmanouil Benetos (1), Elio Quinton (2), George Fazekas (1), Juhan Nam (4)

(1) QMUL, (2) UMG, (3) UPF, (4) KAIST, (5) ByteDance, (6) MILA, (7) UCSD

* equal contribution

This repository contains starter code for the Song Describer Dataset (SDD).

Dataset overview

"A retro-futurist drum machine groove drenched in bubbly synthetic sound effects and a hint of an acid bassline."

"Elegant and sophisticated Latin jazz piece with a Cuban base and a whispered melodic female voice."

"Calm sitar and Indian tabla with dramatic synthetic strings background."

SDD contains ~1.1k captions for 706 permissively licensed music recordings. It is designed for evaluating models on music-and-language (M&L) tasks such as music captioning, text-to-music generation and music-language retrieval. More information about the data, the collection method and the validation process is provided in the data card, and more in-depth documentation is available in the datasheet.

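| Subset | Tracks | Captions | Annotators | Avg. caption length (words) | Vocabulary size | Audio length |
|--------|--------|----------|------------|-----------------------------|-----------------|--------------|
| full   | 706    | 1106     | 142        | 21.7 ± 12.4                 | 2859            | ~2 min       |
| valid  | 546    | 746      | 114        | 18.2 ± 7.6                  | 1942            | ~2 min       |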
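Caption-length and vocabulary figures like those in the table can be approximated with standard Unix tools. The bash sketch below assumes the captions have already been extracted from song_describer.csv into a plain-text file with one caption per line (the extraction step depends on the CSV's columns, which are not described here). The function name `caption_stats` is hypothetical, and so is its tokenisation: words are whitespace-separated for length, and lowercased letter runs for vocabulary. It also reports the population standard deviation. The paper's exact counting method may differ, so the output is not guaranteed to reproduce the table.

```shell
# Hypothetical helper: summarise a text file containing one caption per line.
# Prints the mean +/- population std of caption length (in words) and the vocabulary size.
caption_stats() {
  local file="$1"
  # Length stats: awk's NF is the number of whitespace-separated words on a line.
  awk '
    NF > 0 { n++; s += NF; ss += NF * NF }
    END {
      if (n == 0) { print "no captions"; exit 1 }
      mean = s / n
      var = ss / n - mean * mean   # population variance
      if (var < 0) var = 0         # guard against tiny negative rounding error
      printf "len %.1f +/- %.1f\n", mean, sqrt(var)
    }' "$file" || return 1
  # Vocabulary: lowercase, split on every non-letter character, count unique tokens.
  local vocab
  vocab=$(tr '[:upper:]' '[:lower:]' < "$file" | tr -cs '[:alpha:]' '\n' | grep -v '^$' | sort -u | wc -l)
  # $((...)) strips the padding some wc implementations add.
  echo "vocab $((vocab))"
}
```

For example, `caption_stats captions.txt` (where `captions.txt` is a hypothetical file you create) prints a `len` line and a `vocab` line. With ~1k captions, the difference between population and sample standard deviation is negligible.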

Downloading the dataset

The dataset is available to download from Zenodo:

wget -P data https://zenodo.org/record/10072001/files/song_describer.csv https://zenodo.org/record/10072001/files/audio.zip
unzip data/audio.zip -d data/audio
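Once the two commands above finish, a quick sanity check can confirm that both downloads landed where expected. The bash sketch below uses a hypothetical function, `check_download`. It verifies that `data/song_describer.csv` is non-empty and that `data/audio` exists, then counts every regular file extracted from the archive. No file extension is assumed, so any non-audio files in the zip are counted too. The full subset has 706 tracks, so a count near that number suggests the archive extracted completely, although the exact figure depends on how the archive is laid out.

```shell
# Hypothetical sanity check for the wget/unzip download step.
# Usage: check_download [DATA_DIR]   (defaults to "data")
check_download() {
  local data_dir="${1:-data}"
  # The captions file fetched by wget must exist and be non-empty.
  if [ ! -s "$data_dir/song_describer.csv" ]; then
    echo "missing or empty: $data_dir/song_describer.csv" >&2
    return 1
  fi
  # The directory created by unzip must exist.
  if [ ! -d "$data_dir/audio" ]; then
    echo "missing directory: $data_dir/audio" >&2
    return 1
  fi
  # Count all regular files under the audio directory, including subdirectories.
  local n_files
  n_files=$(find "$data_dir/audio" -type f | wc -l)
  echo "audio files: $((n_files))"
}
```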

A download script is also available in this repository.

Code setup

To use this code, we recommend creating a new Python 3 virtual environment:

python3 -m venv venv
source venv/bin/activate
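Before running `pip install`, a small guard can confirm that the environment is active, so the dependencies are not installed into the system interpreter. The activate script of Python's `venv` exports `VIRTUAL_ENV`, and the bash sketch below relies on that. The function name `require_venv` is hypothetical, and other environment managers (for example conda) set different variables, so it only covers the venv setup shown above.

```shell
# Hypothetical guard: "source venv/bin/activate" exports VIRTUAL_ENV, so an
# empty or unset value means no venv is active in this shell.
require_venv() {
  if [ -z "${VIRTUAL_ENV:-}" ]; then
    echo "No virtual environment is active; run 'source venv/bin/activate' first." >&2
    return 1
  fi
  echo "Using virtual environment: $VIRTUAL_ENV"
}

# Example: install dependencies only when the environment is active.
# require_venv && pip install -r requirements.txt
```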

Then, clone the repository and install the dependencies:

git clone https://github.com/mulab-mir/song-describer-dataset.git
cd song-describer-dataset
pip install -r requirements.txt

Reproducing the analysis in the paper

The overview statistics presented in the paper can be reproduced with the dataset_stats.ipynb notebook. Further exploratory analysis of the data is available in the data_exploration.ipynb notebook.
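To rerun the notebooks headlessly instead of opening them interactively, Jupyter's `nbconvert` can execute a notebook and save the result. The bash sketch below assumes Jupyter (with nbconvert) is installed, which may not be covered by requirements.txt. The helper name `run_notebook` and the `.executed` suffix are hypothetical choices. By default nbconvert writes the output next to the input notebook and appends `.ipynb` to the `--output` name, so the original notebook is left untouched.

```shell
# Hypothetical helper: execute a notebook with nbconvert and save the result as
# <name>.executed.ipynb in the notebook's own directory (nbconvert's default).
run_notebook() {
  local nb="$1"
  if ! command -v jupyter >/dev/null 2>&1; then
    echo "jupyter not found; install it with 'pip install jupyter'" >&2
    return 1
  fi
  # --output takes a base name; nbconvert adds the .ipynb extension itself.
  jupyter nbconvert --to notebook --execute "$nb" --output "$(basename "${nb%.ipynb}").executed"
}

# Example (run from the directory containing the notebooks):
# run_notebook dataset_stats.ipynb
# run_notebook data_exploration.ipynb
```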

Using the dataset

PyTorch

[coming soon]

Hugging Face

[coming soon]

Benchmarking M&L models with SDD

[coming soon]

Cite

If you use the dataset or the code in this repo, please consider citing our work:

@inproceedings{manco2023thesong,
  title={The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation}, 
  author={Manco, Ilaria and Weck, Benno and Doh, Seungheon and Won, Minz and Zhang, Yixiao and Bogdanov, Dmitry and Wu, Yusong and Chen, Ke and Tovstogan, Philip and Benetos, Emmanouil and Quinton, Elio and Fazekas, György and Nam, Juhan},
  booktitle={Machine Learning for Audio Workshop at NeurIPS 2023}, 
  year={2023},
}

License

The code in this repository is released under the MIT License; see the LICENSE file for details. The dataset itself is released under the CC BY-SA 4.0 license.

Contact

If you have any questions, please get in touch: [email protected].