After reconstructing audio with only the VAE, I get nothing but noise #19
Comments
Can you try the following:

import torch
import torchaudio
from tango import Tango
from tools.torch_tools import wav_to_fbank

filename = ...
device = "cuda:0"

tango = Tango("declare-lab/tango", device)
# Put the VAE and STFT modules in inference mode.
tango.vae.eval()
tango.stft.eval()

duration = 10
# Number of mel frames requested for this duration.
target_length = int(duration * 102.4)

with torch.no_grad():
    mel, _, waveform = wav_to_fbank([filename], target_length, tango.stft)
    mel = mel.unsqueeze(1).to(device)
    # Encode the mel spectrogram to a latent, then decode it back.
    latent = tango.vae.get_first_stage_encoding(tango.vae.encode_first_stage(mel))
    reconstructed_mel = tango.vae.decode_first_stage(latent)
    reconstructed_waveform = tango.vae.decode_to_waveform(reconstructed_mel)[0]
Thanks for your code! Now I can reconstruct the audio, but only when the number of frames in the audio is a multiple of four (for example, a 3.6 s duration works, but a 3.7 s duration does not).
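To make the multiple-of-four observation concrete, here is a minimal sketch that rounds the frame count up to the next multiple of four before it is passed as target_length. The 102.4 factor is taken from the snippet earlier in the thread; the helper name aligned_target_length is hypothetical, and this is only a possible workaround under the assumption that the frame count is what matters, not a fix confirmed by the maintainers.

```python
import math

# Taken from the earlier snippet: target_length = int(duration * 102.4)
FRAMES_PER_SECOND = 102.4


def aligned_target_length(duration, multiple=4):
    """Hypothetical helper: frame count for `duration` seconds,
    rounded up to the next multiple of `multiple`."""
    frames = int(duration * FRAMES_PER_SECOND)
    return math.ceil(frames / multiple) * multiple


for d in (3.6, 3.7, 10):
    # Print the duration, the unaligned frame count, and the aligned one.
    print(d, int(d * FRAMES_PER_SECOND), aligned_target_length(d))
```

With this factor, 3.6 s gives 368 frames, which is already a multiple of four, while 3.7 s gives 378 frames, which is not and would be rounded up to 380. That is consistent with the durations reported here.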
What is the exact issue when reconstructing a 3.7 s audio clip? Does it generate noise for the entire 3.7 s or only for the last 0.1 s?
When the VAE reconstructs a 3.7 s audio clip, it generates noise for the entire 3.7 s.
I have the same problem as you. Has it been solved? I tried reconstructing the same audio sample several times, and the result is always noise that differs greatly from one run to the next. Is setting the duration like this the only solution?
Here is my code. Is there something wrong with the way I am using the VAE?