After just using VAE reconstruct a audio, I only get noise #19

SuperiorDtj · 2023-05-31T11:42:01Z

Here is my code. Is there something wrong with how I am using the VAE?

`def recon_vae(self, filename):
    """ recon audio only by vae """
    with torch.no_grad():
        waveform, sample_rate = torchaudio.load(filename)
        waveform = torchaudio.functional.resample(waveform, orig_freq=sample_rate, new_freq=16000)[0]
        waveform = waveform - torch.mean(waveform)
        waveform = waveform / (torch.max(torch.abs(waveform)) + 1e-8)
        waveform = 0.5 * waveform
        waveform = waveform / torch.max(torch.abs(waveform))
        waveform = 0.5 * waveform

        #waveform = 0.5 * waveform[0:int(len(waveform)*1)]

        audio = torch.unsqueeze(waveform, 0)
        audio = torch.nan_to_num(torch.clip(audio, -1, 1))
        audio = torch.autograd.Variable(audio, requires_grad=False)
        melspec, log_magnitudes_stft, energy = self.stft.mel_spectrogram(audio)
        melspec = melspec.transpose(1, 2)
        melspec = melspec.unsqueeze(1)
        truth_lattent = self.vae.get_first_stage_encoding(self.vae.encode_first_stage(melspec))

        mel_recon = self.vae.decode_first_stage(truth_lattent)
        wave = self.vae.decode_to_waveform(mel_recon)
    return wave[0], waveform`

deepanwayx · 2023-06-02T14:49:15Z

Can you try the following:

import torch
import torchaudio
from tango import Tango
from tools.torch_tools import wav_to_fbank

filename = ... 

device = "cuda:0"
tango = Tango("declare-lab/tango", device)
tango.vae.eval()
tango.stft.eval()

duration = 10
target_length = int(duration * 102.4)

with torch.no_grad():
    mel, _, waveform = wav_to_fbank([filename], target_length, tango.stft)
    mel = mel.unsqueeze(1).to(device)
    latent = tango.vae.get_first_stage_encoding(tango.vae.encode_first_stage(mel))
    reconstructed_mel = tango.vae.decode_first_stage(latent)
    reconstructed_waveform = tango.vae.decode_to_waveform(reconstructed_mel)[0]

SuperiorDtj · 2023-06-05T01:50:48Z

Thanks for your code! Now I can reconstruct the audio, but only when the number of audio frames is a multiple of four (for example, a 3.6 s clip works but a 3.7 s clip does not).
Is this a common issue with the VAE model?
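
The 3.6 s versus 3.7 s observation can be checked with plain arithmetic. The sketch below assumes the `int(duration * 102.4)` formula from deepanwayx's snippet is what sets the frame count; `frames_for_duration` is a hypothetical helper name, and the thread does not confirm why the VAE would need a frame count divisible by four.

```python
# Assumption: frame count follows target_length = int(duration * 102.4),
# as in the snippet posted earlier in this thread.
FRAMES_PER_SECOND = 102.4


def frames_for_duration(duration_s: float) -> int:
    """Hypothetical helper: mel frame count requested for a clip of this length."""
    return int(duration_s * FRAMES_PER_SECOND)


for duration_s in (3.6, 3.7, 10.0):
    frames = frames_for_duration(duration_s)
    # Report whether the frame count is divisible by four.
    print(f"{duration_s} s -> {frames} frames, multiple of 4: {frames % 4 == 0}")
```

Under that assumption, 3.6 s maps to 368 frames (divisible by four) while 3.7 s maps to 378 frames (not divisible by four), which matches the behaviour described here.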

deepanwayx · 2023-06-06T05:45:05Z

What is the exact issue when reconstructing a 3.7s audio? Does it generate noise for the entire 3.7s or the last 0.1s?

SuperiorDtj · 2023-06-06T06:02:42Z

When the VAE reconstructs a 3.7 s audio clip, it generates noise for the entire 3.7 s.

ikm565 · 2023-07-29T14:46:06Z

I have the same problem as you. Has it been solved? I tried reconstructing the same audio sample several times, and the result is always noise. The results also vary greatly from one reconstruction to the next.

Is the only solution to set the duration like this?
target_length = int(duration * 102.4)
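
One possible workaround, not confirmed by the maintainers in this thread, is to keep that formula but round the result up to the next multiple of four before passing it on. This is a minimal sketch: `round_up_to_multiple` and `padded_target_length` are hypothetical helpers, and it assumes that `wav_to_fbank` pads the input to whatever `target_length` it receives.

```python
def round_up_to_multiple(n: int, multiple: int = 4) -> int:
    """Smallest multiple of `multiple` that is greater than or equal to n."""
    return ((n + multiple - 1) // multiple) * multiple


def padded_target_length(duration_s: float,
                         frames_per_second: float = 102.4,
                         multiple: int = 4) -> int:
    """Hypothetical helper: frame count from the thread's formula, rounded up."""
    frames = int(duration_s * frames_per_second)
    return round_up_to_multiple(frames, multiple)


# A 3.7 s clip would request 378 frames; rounding up gives 380.
target_length = padded_target_length(3.7)
print(target_length)
```

If this works, the reconstructed waveform would be slightly longer than the input, so it may need trimming back to the original number of samples.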

SuperiorDtj changed the title ~~After just using VAE reconstruct a audio, I only get noise?~~ After just using VAE reconstruct a audio, I only get noise May 31, 2023
