Generated audio not clear #6
Hi,
Hi, I have updated the permission access. Thank you so much
I use the prosody_cloning source code
Thanks, I can access them now. This definitely sounds bad, worse than in my experiments. Could you share the recognized transcript from this utterance?
Yes, the transcription is in phonetic format and seems to be correct, so the problem is not on the ASR side. The warning should not matter; you can ignore it.
The 58 samples are from 2 different speakers. Thank you for your suggestion, I will try it.
Hi, I want to give an update on this issue. I am still experiencing the same thing: the output speech sounds the same even though I used all the data. I also ran into the "pretrained_models" problem described in issue #2. I use this model because later, in line 234, it is the only model that has 'style_emb_func'. Is that the cause of the problem?
It is weird that the script even attempted to find the model in pretrained_models. In GANAnonymizer, the variable self.embed_model_path (which is then passed as model_path to the speaker embedding extraction) is overwritten with the path from the settings file, the one that you have now set manually. My only idea is that something went wrong in the load_parameter function. Could you check whether settings.json is loaded correctly?
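One quick way to run that check outside the anonymizer is to load the settings file by hand and see whether the model path it contains actually exists on disk. This is only a minimal sketch, not the repository's own loading code: the function name check_settings is made up for illustration, and the key name embed_model_path is an assumption borrowed from the attribute name mentioned above, so adjust it to whatever your settings.json really contains.

```python
import json
from pathlib import Path


def check_settings(settings_path, path_keys=("embed_model_path",)):
    """Load a JSON settings file and report whether each path entry exists.

    Hypothetical helper: `path_keys` lists the settings entries expected to
    hold file or directory paths. The default key name is an assumption.
    """
    with Path(settings_path).open(encoding="utf-8") as f:
        settings = json.load(f)

    report = {}
    for key in path_keys:
        value = settings.get(key)
        if value is None:
            report[key] = "missing from settings"
        elif Path(str(value)).exists():
            report[key] = "ok"
        else:
            report[key] = f"path not found: {value}"
    return settings, report
```

Calling `settings, report = check_settings("settings.json")` (with the real location of your file) and printing `report` should show whether the manually set path was read and whether it resolves; a "path not found" entry would explain a fallback to pretrained_models.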
I have to admit, though, that this code is rather old, and I may have fixed some bugs in other versions of the code that I forgot to fix here too. I would appreciate your help in figuring out your issue, but I understand that this might be too time-consuming for you. You can find a working version of this model in the latest Voice Privacy Challenge. We included this model as baseline B3, in the code under the tag sttts. Compared to the default setting we have here, the model in the challenge includes prosody modifications by default, but you can disable them by commenting out the prosody anonymization part in the config. Alternatively, you can use the code in our VoicePAT toolkit, which was the basis on which the challenge code was restructured. The main branch underwent some changes during the challenge development, but you can find a working version in the develop branch (which will be moved to the main branch soon). In any case, I recommend using either the Voice Privacy Challenge 2024 or VoicePAT for evaluation. They contain several improvements compared to the evaluation scripts of the Voice Privacy Challenge 2022 or 2020, which are still included in this repository.
Hello, may I ask for your guidance in generating the anonymized audio?
I can run your code with the default setting but the output audio is not clear.
Here is an example generated with a sampling rate of 16 kHz:
https://drive.google.com/file/d/17bv8ZMYrOmohT8T61G3jg16udOoWiO05/view?usp=drive_link
Here is the audio generated with a sampling rate of 48 kHz:
https://drive.google.com/file/d/1yQ56s5QGJuFDItTFO_mJ3hPKnyvFegHS/view?usp=drive_link