Hi Felix,

First of all, thanks for sharing this repo; it's amazingly interesting work. Since the datasets you used are, to my knowledge, paid and only available at 16 kHz and 8 kHz, respectively, I wanted to train your network on the VCTK dataset. For training, I only selected short files with high voice activity from the trimmed version, to avoid long sections of silence. However, when I do so, training goes well for a few epochs and the output becomes intelligible very quickly, but then the loss (I am using L1) explodes to a value orders of magnitude higher, as shown here. Have you observed this pattern during your trainings?
As a second question, I wanted to know why you used your own STFT implementation as opposed to the torchaudio one, which allows backprop.
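In case it's useful context: the standard guard against this kind of blow-up in PyTorch is gradient-norm clipping just before the optimizer step. A minimal sketch, with a toy model standing in for the real network and `max_norm=1.0` as an arbitrary example value:

```python
import torch
import torch.nn as nn

# Toy model standing in for the real network (hypothetical stand-in)
model = nn.Linear(10, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()

x, y = torch.randn(8, 10), torch.randn(8, 1)
loss = loss_fn(model(x), y)

opt.zero_grad()
loss.backward()
# Clip the total gradient norm before stepping; 1.0 is an arbitrary example value
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()
```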
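For reference, what I mean by backprop support: `torch.stft` (which torchaudio builds on) is differentiable out of the box, so gradients flow through it to the waveform. A minimal sketch, where the sizes (`n_fft=512`, `hop_length=128`) are arbitrary example values:

```python
import torch

# Random waveform with gradient tracking; shape (batch, samples)
x = torch.randn(1, 16000, requires_grad=True)

# torch.stft supports autograd; window and sizes are arbitrary example values
window = torch.hann_window(512)
spec = torch.stft(x, n_fft=512, hop_length=128, window=window,
                  return_complex=True)

# Gradients flow back through the STFT to the input waveform
loss = spec.abs().mean()
loss.backward()
```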
Thanks!