Modeling Harmonic Complexity in Automatic Music Generation using Conditional Variational Autoencoders

Author: @DavideGioiosa

Main Technologies

  • Python 3.7
  • TensorFlow/Keras
  • Pandas, NumPy, Matplotlib
  • Flask, AWS
  • MIDI, .wav

Goal

Is it possible to use complexity as a parameter to automatically generate music? This is the question that motivates our research. In the area of automatic music composition, several neural network models have been implemented to generate music of a certain genre (e.g. rock, pop, jazz) or to capture and imitate the style of a composer. Recent studies in this area aim not only to generate music, but also to condition the creative process.

From previous research we know that complexity is closely related to the amount of brain activity of the listener (the so-called "arousal potential"). It also affects a person's musical preferences. Given this close correlation with a listener's perception, we decided to explore the use of this parameter in music generation.

Complexity is present in each of the aspects into which music can be divided, e.g. chords, rhythm, melody, etc. Among these we chose to focus on harmony. In particular, in this work we explore Harmonic Complexity and its use as a parameter to condition the generation of chord sequences.

For the automatic generation process we use two conditional neural network models, both based on the Variational Autoencoder (VAE). We evaluated, through a perceptual test, their ability to generate chord sequences given a desired complexity value.

Dataset

The starting dataset used for this experimental thesis comes from the research A Data-Driven Model of Tonal Chord Sequence Complexity and contains 5-chord sequences, each associated with a complexity bin. We represent each chord by a multi-hot vector.
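
As a minimal sketch of what a multi-hot chord representation can look like, the snippet below encodes each chord as a 12-dimensional pitch-class vector. The 12-element layout, the `multi_hot` helper and the example progression are assumptions made for illustration; the vector layout actually used in the repository may differ.

```python
import numpy as np

# Assumption: 12 pitch classes (C=0 ... B=11); the repository's actual
# vector layout may differ.
N_PITCH_CLASSES = 12

def multi_hot(pitch_classes):
    """Return a 0/1 vector with ones at the given pitch classes."""
    vec = np.zeros(N_PITCH_CLASSES, dtype=np.int8)
    vec[list(pitch_classes)] = 1
    return vec

# Hypothetical 5-chord sequence (C, Am, F, G, C) given as pitch-class sets.
sequence = [
    {0, 4, 7},   # C major: C, E, G
    {9, 0, 4},   # A minor: A, C, E
    {5, 9, 0},   # F major: F, A, C
    {7, 11, 2},  # G major: G, B, D
    {0, 4, 7},   # C major
]
encoded = np.stack([multi_hot(chord) for chord in sequence])
print(encoded.shape)  # (5, 12)
```

Unlike a one-hot vector, several entries are set at once, one per note in the chord.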

Conditional Variational Autoencoder

The Conditional Variational Autoencoder (CVAE) is an extension of the VAE model and belongs to the family of conditional architectures: neural networks in which the conditioning feature is added as an extra input to the model. Conditioning on the target feature gives control over the data produced during the generation process. In this research, harmonic complexity is the parameter used to condition the VAE.

The next sections briefly describe the two implemented models; more information on their mathematical formulation is available in the thesis.

Model A

This first CVAE model incorporates the conditioning information by concatenating the complexity vector to the input of both the encoder and the decoder.
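
The concatenation described above can be sketched in plain NumPy as follows. This is an illustration of the conditioning mechanism only, not the trained Keras model: the layer sizes, the number of complexity bins, the one-hot encoding of the complexity vector and the single-layer random weights are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes are illustrative assumptions, not the thesis settings.
SEQ_DIM = 5 * 12   # flattened 5-chord sequence of 12-dim multi-hot vectors
N_BINS = 5         # number of complexity bins (assumed)
LATENT_DIM = 8

# Random weights stand in for trained dense layers.
W_enc = rng.normal(scale=0.1, size=(SEQ_DIM + N_BINS, 2 * LATENT_DIM))
W_dec = rng.normal(scale=0.1, size=(LATENT_DIM + N_BINS, SEQ_DIM))

def one_hot(bin_index):
    c = np.zeros(N_BINS)
    c[bin_index] = 1.0
    return c

def encode(x, c):
    # Conditioning: the complexity vector is concatenated to the encoder input.
    h = np.concatenate([x, c]) @ W_enc
    return h[:LATENT_DIM], h[LATENT_DIM:]  # mean and log-variance

def reparameterize(mu, log_var):
    # z = mu + sigma * eps, with eps ~ N(0, I)
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def decode(z, c):
    # Conditioning again: the same vector is concatenated to the latent code.
    logits = np.concatenate([z, c]) @ W_dec
    return 1.0 / (1.0 + np.exp(-logits))  # per-note probabilities

x = rng.integers(0, 2, size=SEQ_DIM).astype(float)  # placeholder input
c = one_hot(2)
mu, log_var = encode(x, c)
x_hat = decode(reparameterize(mu, log_var), c)
print(x_hat.shape)  # (60,)
```

Because the decoder sees the complexity vector directly, the latent code does not need to store it, which is what makes the condition controllable at generation time.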

Model B

This second CVAE model combines the standard VAE with a Regressor (RVAE), which explicitly conditions the distribution of the data in the latent space on the harmonic complexity.

This is achieved by linking the latent representation of the input chord sequence with the associated harmonic complexity in the model structure. On one hand, the latent representation generated from the Regressor prediction must resemble the latent representation of the input. On the other hand, variation in harmonic complexity is encouraged to follow a single direction in the latent space, defined by the disentangled dimension. We obtained a disentangled dimension in the latent representation that encodes harmonic complexity: variations along this dimension are explained by harmonic complexity, while the dimension is relatively invariant to variations in other factors.
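
One possible reading of these two constraints, shown here purely as an assumption, is a KL term that pulls the encoder posterior towards a unit-variance Gaussian whose mean is the predicted complexity `c` times a direction vector `u`. The function name, `u`, `c` and the exact form of the prior are hypothetical; the thesis gives the actual formulation.

```python
import numpy as np

def kl_to_conditional_prior(mu, log_var, c, u):
    """KL( N(mu, diag(exp(log_var))) || N(c * u, I) ) for one sample."""
    prior_mean = c * u
    return 0.5 * np.sum(np.exp(log_var) + (mu - prior_mean) ** 2 - 1.0 - log_var)

latent_dim = 4
u = np.zeros(latent_dim)
u[0] = 1.0   # hypothetical: complexity is encoded along dimension 0
c = 2.0      # hypothetical regressor output (complexity)

# A posterior centred on c * u with unit variance matches the prior exactly.
aligned = kl_to_conditional_prior(c * u, np.zeros(latent_dim), c, u)

# Moving the posterior mean along another dimension is penalised.
offset = np.array([0.0, 1.0, 0.0, 0.0])
shifted = kl_to_conditional_prior(c * u + offset, np.zeros(latent_dim), c, u)
print(aligned, shifted)  # 0.0 0.5
```

Under this assumed form, changing `c` moves the prior mean only along `u`, which is how a single latent dimension can come to carry the complexity information.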

Generation of new chord progressions

The two trained CVAEs can generate new chord sequences with the desired harmonic complexity.
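
For a decoder conditioned by concatenation, as in Model A, generation can be sketched as: sample a latent code from the prior, attach the target complexity, decode, and threshold the output back into multi-hot chords. The decoder below is a random stand-in, and the sizes, the one-hot condition and the 0.5 threshold are assumptions; the notebooks contain the actual generation code.

```python
import numpy as np

rng = np.random.default_rng(42)
LATENT_DIM, N_BINS, N_CHORDS, CHORD_DIM = 8, 5, 5, 12  # assumed sizes

# Stand-in for a trained decoder: random weights, for shape illustration only.
W = rng.normal(size=(LATENT_DIM + N_BINS, N_CHORDS * CHORD_DIM))

def decoder(z, c):
    logits = np.concatenate([z, c]) @ W
    return 1.0 / (1.0 + np.exp(-logits))

def generate(target_bin, threshold=0.5):
    z = rng.standard_normal(LATENT_DIM)   # sample from the prior N(0, I)
    c = np.eye(N_BINS)[target_bin]        # one-hot target complexity
    probs = decoder(z, c).reshape(N_CHORDS, CHORD_DIM)
    return (probs >= threshold).astype(np.int8)  # back to multi-hot chords

chords = generate(target_bin=3)
print(chords.shape)  # (5, 12)
```

Calling `generate` repeatedly with the same `target_bin` draws different latent codes, so it yields different sequences for the same requested complexity.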

Examples of chord sequences generated for a given harmonic complexity with the CVAE and RVAE models are provided in the two .ipynb files, in symbolic, MIDI and WAV formats.

Listening Test

A web app was built with Flask and AWS to collect ratings on the generated chord sequences. In the first part of the experiment, the participants were profiled based on their musical background using the self-report questionnaire of the Goldsmiths Musical Sophistication Index. The second part was the perceptual test, in which the participants were asked to express their level of agreement with the complexity value indicated for each chord progression. Agreement was rated on a Likert scale from 0 to 4, where "completely agree" is the highest score and "completely disagree" the lowest.

Research Article

The article describing this research is published in the EURASIP Journal on Audio, Speech, and Music Processing; more information and the results of the different experiments are available in the Article.

Further examples of the outputs of the models are available on this Website.