Commit

add new publications, revise conference titles
yoyolicoris committed Jul 27, 2024
1 parent f12e841 commit cada239
Showing 6 changed files with 35 additions and 5 deletions.
2 changes: 1 addition & 1 deletion _publications/2018-11-26-multi-layered-cepstrum.md
@@ -4,7 +4,7 @@ collection: publications
permalink: /publications/2018-11-26-multi-layered-cepstrum
excerpt:
date: 2018-11-26
-venue: '2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP)'
+venue: 'IEEE Global Conference on Signal and Information Processing (GlobalSIP)'
paperurl: 'https://ieeexplore.ieee.org/document/8646684'
citation: 'Chin-Yun Yu and Li Su, &quot;Multi-layered Cepstrum for Instantaneous Frequency Estimation&quot;, <i>IEEE Global Conference on Signal and Information Processing</i>, November 2018.'
---
2 changes: 1 addition & 1 deletion _publications/2020-12-10-harmonic-preserve-network.md
@@ -4,7 +4,7 @@ collection: publications
permalink: /publications/2020-12-10-harmonic-preserve-network
excerpt:
date: 2020-12-10
-venue: '2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)'
+venue: 'Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)'
paperurl: 'https://ieeexplore.ieee.org/abstract/document/9306345'
citation: 'Chin-Yun Yu, Jing-Hua Lin, and Li Su, &quot;Harmonic Preserving Neural Networks for Efficient and Robust Multipitch Estimation&quot;, <i>Asia-Pacific Signal and Information Processing Association Annual Summit and Conference</i>, December 2020.'
---
4 changes: 2 additions & 2 deletions _publications/2021-11-12-danna-sep.md
@@ -4,8 +4,8 @@ collection: publications
permalink: /publications/2021-11-12-danna-sep
excerpt:
date: 2021-11-12
-venue: '2021 Music Demixing Workshop, ISMIR'
+venue: 'Music Demixing Workshop, ISMIR'
paperurl: 'https://arxiv.org/abs/2112.03752'
-citation: 'Chin-Yun Yu and Kin-Wai Cheuk, &quot;Danna-Sep: Unite to separate them all&quot;, <i>The ISMIR 2021 Workshop on Music Source Separation</i>, November 2021.'
+citation: 'Chin-Yun Yu and Kin-Wai Cheuk, &quot;Danna-Sep: Unite to separate them all&quot;, <i>The ISMIR Workshop on Music Source Separation</i>, November 2021.'
---
Deep learning-based music source separation has gained a lot of interest in the last decades. Most of the existing methods operate with either spectrograms or waveforms. Spectrogram based models learn suitable masks for separating magnitude spectrogram into different sources, and waveform-based models directly generate waveforms of individual sources. The two types of models have complementary strengths; the former is superior given harmonic sources such as vocals, while the latter demonstrates better results for percussion and bass instruments. In this work, we improved upon the state-of-the-art (SoTA) models and successfully combined the best of both worlds. The backbones of the proposed framework, dubbed Danna-Sep, are two spectrogram-based models including a modified X-UMX and U-Net, and an enhanced Demucs as the waveform-based model. Given an input of mixture, we linearly combined respective outputs from the three models to obtain the final result. We showed in the experiments that, despite its simplicity, Danna-Sep surpassed the SoTA models by a large margin in terms of Source-to-Distortion Ratio.
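Reader's note on the abstract above, not part of the changed file: the "linearly combined respective outputs" step can be pictured as a per-sample weighted sum of the three models' estimates for one source. The sketch below is a minimal illustration under stated assumptions; the function name `blend_estimates`, the toy sample values, and the weights are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def blend_estimates(
    estimates: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Weighted sum of equally long waveform estimates, sample by sample."""
    if len(estimates) != len(weights):
        raise ValueError("need exactly one weight per estimate")
    length = len(estimates[0])
    if any(len(e) != length for e in estimates):
        raise ValueError("all estimates must have the same number of samples")
    return [
        sum(w * e[i] for w, e in zip(weights, estimates)) for i in range(length)
    ]


# Toy estimates of the same source from three hypothetical separators.
xumx_out = [0.5, 0.25, 0.0]
unet_out = [0.5, 0.75, 0.0]
demucs_out = [1.0, 0.5, -1.0]

# Hypothetical weights; in practice they would be chosen per source.
blended = blend_estimates([xumx_out, unet_out, demucs_out], [0.25, 0.25, 0.5])
print(blended)
```

Using one weight vector per source would let a blend lean on the spectrogram models for vocals and on the waveform model for drums and bass, which is the complementarity the abstract describes.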
19 changes: 19 additions & 0 deletions _publications/2023-11-4-duet-svs.md
@@ -0,0 +1,19 @@
---
title: "Zero-Shot Duet Singing Voices Separation with
Diffusion Models"
collection: publications
permalink: /publications/2023-11-4-duet-svs
excerpt:
date: 2023-11-4
venue: 'Sound Demixing Workshop, ISMIR'
paperurl: 'https://sdx-workshop.github.io/papers/Yu.pdf'
citation: 'Chin-Yun Yu, Emilian Postolache, Emanuele Rodolà, and György Fazekas, &quot;Zero-Shot Duet Singing Voices Separation with
Diffusion Models&quot;, <i>The ISMIR Sound Demixing Workshop</i>, November 2023.'
---
In recent studies, diffusion models have shown promise as priors for solving audio inverse problems, including source separation.
These models allow us to sample from the posterior distribution of a target signal given an observed signal by manipulating the diffusion process.
However, when separating audio sources of the same type, such as duet singing voices, the prior learned by the diffusion process may not be sufficient to maintain the consistency of the source identity in the separated audio.
For example, the singer may change from one to another from time to time.
Tackling this problem will be useful for separating sources in a choir, or a mixture of multiple instruments with similar timbre, without acquiring large amounts of paired data.
In this paper, we examine this problem in the context of duet singing voices separation, and propose a method to enforce the coherency of singer identity by splitting the mixture into overlapping segments and performing posterior sampling in an auto-regressive manner, conditioning on the previous segment.
We evaluate the proposed method on the MedleyVox dataset with different overlap ratios, and show that the proposed method outperforms the naive posterior sampling baseline.
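Reader's note on the abstract above, not part of the changed file: the segmentation step it mentions can be sketched as computing overlapping index ranges, where the head of each segment coincides with the tail of the previous one and is the region available for conditioning. This is only an illustration of the bookkeeping; the function name, the parameters, and the rounding rule are assumptions, and the diffusion-based posterior sampling itself is not shown.

```python
from typing import List, Tuple


def overlapping_segments(
    num_samples: int, segment_len: int, overlap_ratio: float
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs that cover num_samples with overlap."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    if not 0.0 <= overlap_ratio < 1.0:
        raise ValueError("overlap_ratio must be in [0, 1)")
    # Hop size between segment starts; at least one sample so the loop advances.
    hop = max(1, int(round(segment_len * (1.0 - overlap_ratio))))
    segments = []
    start = 0
    while True:
        end = min(start + segment_len, num_samples)
        segments.append((start, end))
        if end == num_samples:
            break
        start += hop
    return segments


segments = overlapping_segments(num_samples=10, segment_len=4, overlap_ratio=0.5)

# For each segment after the first, the span shared with the previous segment.
# An auto-regressive sampler would condition on the previous output here.
shared = [
    (start, prev_end)
    for (_, prev_end), (start, _) in zip(segments, segments[1:])
]
print(segments)
print(shared)
```

A larger overlap ratio gives each segment more context from its predecessor at the cost of more segments to sample.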
11 changes: 11 additions & 0 deletions _publications/2023-11-4-golf.md
@@ -0,0 +1,11 @@
---
title: "Singing Voice Synthesis Using Differentiable LPC and Glottal-Flow-Inspired Wavetables"
collection: publications
permalink: /publications/2023-11-4-golf
excerpt:
date: 2023-11-4
venue: 'International Society for Music Information Retrieval Conference (ISMIR)'
paperurl: 'https://zenodo.org/records/10265377'
citation: 'Chin-Yun Yu and György Fazekas, &quot;Singing Voice Synthesis Using Differentiable LPC and Glottal-Flow-Inspired Wavetables&quot;, <i>International Society for Music Information Retrieval Conference</i>, November 2023.'
---
This paper introduces GlOttal-flow LPC Filter (GOLF), a novel method for singing voice synthesis (SVS) that exploits the physical characteristics of the human voice using differentiable digital signal processing. GOLF employs a glottal model as the harmonic source and IIR filters to simulate the vocal tract, resulting in an interpretable and efficient approach. We show it is competitive with state-of-the-art singing voice vocoders, requiring fewer synthesis parameters and less memory to train, and runs an order of magnitude faster for inference. Additionally, we demonstrate that GOLF can model the phase components of the human voice, which has immense potential for rendering and analysing singing voices in a differentiable manner. Our results highlight the effectiveness of incorporating the physical properties of the human voice mechanism into SVS and underscore the advantages of signal-processing-based approaches, which offer greater interpretability and efficiency in synthesis.
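Reader's note on the abstract above, not part of the changed file: the source-filter idea it describes (an excitation signal shaped by an IIR filter that stands in for the vocal tract) can be illustrated with a plain all-pole recursion. The sketch assumes the common LPC sign convention y[n] = x[n] - sum_k a_k * y[n-k]; the function name and the toy coefficient are hypothetical, and this loop is neither differentiable nor the paper's implementation.

```python
from typing import List, Sequence


def all_pole_filter(excitation: Sequence[float], coeffs: Sequence[float]) -> List[float]:
    """Apply y[n] = x[n] - sum_k coeffs[k-1] * y[n-k] with zero initial state."""
    out: List[float] = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out


# A unit impulse stands in for the source; a glottal pulse train would go here.
impulse = [1.0, 0.0, 0.0, 0.0]

# Single hypothetical coefficient, giving a decaying impulse response.
response = all_pole_filter(impulse, coeffs=[-0.5])
print(response)
```

With more coefficients the same recursion produces resonances, which is how an LPC filter models formants.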
2 changes: 1 addition & 1 deletion _publications/2023-6-3-diffwave-sr.md
@@ -4,7 +4,7 @@ collection: publications
permalink: /publications/2023-6-3-diffwave-sr
excerpt:
date: 2023-6-3
-venue: '2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)'
+venue: 'IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)'
paperurl: 'https://ieeexplore.ieee.org/abstract/document/10095103'
citation: 'Chin-Yun Yu, Sung-Lin Yeh, György Fazekas, and Hao Tang, &quot;Conditioning and Sampling in Variational Diffusion Models for Speech Super-Resolution&quot;, <i>IEEE International Conference on Acoustics, Speech and Signal Processing</i>, June 2023.'
---