forked from academicpages/academicpages.github.io
Commit cada239 (parent: f12e841): add new publications, revise conference titles
Showing 6 changed files with 35 additions and 5 deletions.
@@ -0,0 +1,19 @@
---
title: "Zero-Shot Duet Singing Voices Separation with Diffusion Models"
collection: publications
permalink: /publications/2023-11-4-duet-svs
excerpt:
date: 2023-11-04
venue: 'Sound Demixing Workshop, ISMIR'
paperurl: 'https://sdx-workshop.github.io/papers/Yu.pdf'
citation: 'Chin-Yun Yu, Emilian Postolache, Emanuele Rodolà, and György Fazekas, "Zero-Shot Duet Singing Voices Separation with Diffusion Models", <i>The ISMIR Sound Demixing Workshop</i>, November 2023.'
---
In recent studies, diffusion models have shown promise as priors for solving audio inverse problems, including source separation.
These models allow us to sample from the posterior distribution of a target signal given an observed signal by manipulating the diffusion process.
However, when separating audio sources of the same type, such as duet singing voices, the prior learned by the diffusion process may not be sufficient to maintain the consistency of the source identity in the separated audio.
For example, the separated tracks may switch from one singer to the other partway through.
Tackling this problem would be useful for separating sources in a choir, or a mixture of multiple instruments with similar timbre, without acquiring large amounts of paired data.
In this paper, we examine this problem in the context of duet singing voices separation, and propose a method that enforces the coherence of singer identity by splitting the mixture into overlapping segments and performing posterior sampling in an auto-regressive manner, conditioning on the previous segment.
We evaluate the proposed method on the MedleyVox dataset with different overlap ratios, and show that it outperforms the naive posterior sampling baseline.
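The segment-wise procedure above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `toy_posterior_sample` is a placeholder for the actual diffusion posterior sampler, and the segment length, overlap, and linear crossfade are assumed values chosen for clarity.

```python
import numpy as np

def toy_posterior_sample(mix_seg, cond=None):
    # Stand-in for diffusion posterior sampling: here we just split the
    # mixture evenly into two "sources". A real sampler would draw from
    # p(sources | mixture), conditioned on the previous segment via `cond`.
    half = mix_seg / 2.0
    return np.stack([half, half])

def separate_autoregressive(mix, seg_len=1024, overlap=256,
                            sampler=toy_posterior_sample):
    """Separate `mix` into two sources segment by segment, conditioning
    each segment's sampler on the tail of the previous segment's estimate
    and crossfading the overlapping region."""
    hop = seg_len - overlap
    n = len(mix)
    out = np.zeros((2, n))
    fade = np.linspace(0.0, 1.0, overlap)  # linear crossfade weights
    prev_tail = None
    for start in range(0, n - seg_len + 1, hop):
        seg = mix[start:start + seg_len]
        est = sampler(seg, cond=prev_tail)
        if prev_tail is None:
            out[:, start:start + seg_len] = est
        else:
            # blend the overlap with the previous segment's estimate
            out[:, start:start + overlap] = (
                out[:, start:start + overlap] * (1 - fade)
                + est[:, :overlap] * fade
            )
            out[:, start + overlap:start + seg_len] = est[:, overlap:]
        prev_tail = est[:, -overlap:]  # condition the next segment on this
    return out

mix = np.random.randn(4096)
sources = separate_autoregressive(mix)
```

With the toy sampler the two estimates trivially sum back to the mixture; the point of the sketch is the auto-regressive conditioning loop, which is what keeps singer identity consistent across segment boundaries in the proposed method.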
@@ -0,0 +1,11 @@
---
title: "Singing Voice Synthesis Using Differentiable LPC and Glottal-Flow-Inspired Wavetables"
collection: publications
permalink: /publications/2023-11-4-golf
excerpt:
date: 2023-11-04
venue: 'International Society for Music Information Retrieval Conference (ISMIR)'
paperurl: 'https://zenodo.org/records/10265377'
citation: 'Chin-Yun Yu and György Fazekas, "Singing Voice Synthesis Using Differentiable LPC and Glottal-Flow-Inspired Wavetables", <i>International Society for Music Information Retrieval Conference</i>, November 2023.'
---
This paper introduces GlOttal-flow LPC Filter (GOLF), a novel method for singing voice synthesis (SVS) that exploits the physical characteristics of the human voice using differentiable digital signal processing. GOLF employs a glottal model as the harmonic source and IIR filters to simulate the vocal tract, resulting in an interpretable and efficient approach. We show it is competitive with state-of-the-art singing voice vocoders, requiring fewer synthesis parameters and less memory to train, and runs an order of magnitude faster for inference. Additionally, we demonstrate that GOLF can model the phase components of the human voice, which has immense potential for rendering and analysing singing voices in a differentiable manner. Our results highlight the effectiveness of incorporating the physical properties of the human voice mechanism into SVS and underscore the advantages of signal-processing-based approaches, which offer greater interpretability and efficiency in synthesis.
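The source-filter idea behind GOLF can be illustrated with a non-differentiable sketch: a periodic source standing in for the glottal flow, filtered by an all-pole IIR (LPC) vocal-tract filter. The pulse train, fundamental frequency, and filter coefficients below are made-up values for illustration; GOLF learns a glottal-flow wavetable and the LPC coefficients differentiably rather than using anything this crude.

```python
import numpy as np
from scipy.signal import lfilter

sr = 16000          # sample rate (Hz)
f0 = 220.0          # fundamental frequency (Hz)
n = int(sr * 0.1)   # 100 ms of audio

# Harmonic source: a naive impulse train at the fundamental period,
# standing in for a proper glottal-flow waveform.
period = int(sr / f0)
source = np.zeros(n)
source[::period] = 1.0

# Vocal tract: an all-pole IIR filter. These LPC coefficients are
# invented for illustration (one resonance, poles at radius 0.9,
# i.e. inside the unit circle, so the filter is stable).
a = [1.0, -1.2, 0.81]
voice = lfilter([1.0], a, source)
```

The filtering step is the expensive part to differentiate through, since IIR filters are inherently recursive; making this source-filter pipeline trainable end to end is the core of the paper's contribution.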