This repository contains the code and dataset accompanying the paper "Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model" by Dr. Jaeyong Kang, Prof. Soujanya Poria, and Prof. Dorien Herremans.
- Demo: https://amaai-lab.github.io/Video2Music/
- Paper: https://arxiv.org/abs/2311.00968
- Dataset (MuVi-Sync)
We propose Video2Music, a novel AI-powered multimodal music generation framework. It uses video features as conditioning input to a Transformer architecture to generate music that matches the video, giving video creators a seamless and efficient way to produce tailor-made background music.
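To illustrate the conditioning idea, here is a minimal sketch (not the authors' AMT implementation; all names and dimensions are illustrative) of a Transformer decoder that generates music tokens autoregressively while cross-attending over per-frame video feature vectors:

```python
# Illustrative sketch only -- not the AMT model from this repo.
import torch
import torch.nn as nn

class VideoConditionedDecoder(nn.Module):
    def __init__(self, vocab_size=128, video_dim=512, d_model=256, nhead=4, nlayers=2):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)      # music-token embedding
        self.vid = nn.Linear(video_dim, d_model)          # project video features
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.dec = nn.TransformerDecoder(layer, nlayers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, music_tokens, video_feats):
        # music_tokens: (B, T) token ids; video_feats: (B, F, video_dim)
        x = self.tok(music_tokens)
        mem = self.vid(video_feats)                       # video frames as memory
        T = music_tokens.size(1)
        # causal mask so each position only attends to earlier music tokens
        mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.dec(x, mem, tgt_mask=mask)
        return self.out(h)                                # (B, T, vocab_size) logits

model = VideoConditionedDecoder()
logits = model(torch.randint(0, 128, (2, 16)), torch.randn(2, 8, 512))
print(logits.shape)  # torch.Size([2, 16, 128])
```

The cross-attention in each decoder layer is what lets the generated music track the video: at every step, the music-token stream attends over the projected video features.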
- `saved_models/`: saved model files
- `utilities/`
  - `run_model_vevo.py`: code for running the model (AMT)
  - `run_model_regression.py`: code for running the model (bi-GRU)
- `model/`
  - `video_music_transformer.py`: Affective Multimodal Transformer (AMT) model
  - `video_regression.py`: bi-GRU regression model used for predicting note density/loudness
  - `positional_encoding.py`: code for positional encoding
  - `rpr.py`: code for RPR (Relative Positional Representation)
- `dataset/`
  - `vevo_dataset.py`: dataset loader
- `train.py`: training script (AMT)
- `train_regression.py`: training script (bi-GRU)
- `evaluate.py`: evaluation script
- `generate.py`: inference script
- Clone this repo
- Obtain the dataset:
  - Put all directories starting with `vevo` in the dataset under this folder (`dataset/`)
- Download the processed training data `AMT.zip` from HERE, extract the zip file, and put the two extracted files directly under this folder (`saved_models/AMT/`)
- Install dependencies: `pip install -r requirements.txt`
  - Choose the correct version of `torch` based on your CUDA version
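As a sketch, on a system with CUDA 11.8 the matching `torch` wheel can be installed from the official PyTorch index before the remaining dependencies (the `cu118` tag is an example; replace it with the tag for your CUDA toolkit, e.g. `cu121`):

```shell
# Install a CUDA-matched torch build first, then the rest of the requirements.
pip install torch --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
# Quick check that the GPU build is visible:
python -c "import torch; print(torch.cuda.is_available())"
```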
- Train the AMT model: `python train.py`
- Run inference: `python generate.py`
If you find this resource useful, please cite the original work:
@article{kang2023video2music,
title={Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model},
author={Kang, Jaeyong and Poria, Soujanya and Herremans, Dorien},
journal={arXiv preprint arXiv:2311.00968},
year={2023}
}
Kang, J., Poria, S. & Herremans, D. (2023). Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model. arXiv preprint arXiv:2311.00968.