Model training question #26
I have the same issue with the input data format. Please add more detailed instructions.
Hi! Sorry for the brief description of the training process in the readme and for such a late response (I hope it will still be useful to put it here).

The folder structure in your data directory `data_dir` (the directory you set in `train_enc.py` and `train_dec.py` before training starts) should look like this:

```
data_dir/wavs/spk1/spk1_000001.wav
```

The `mels` and `embeds` subfolders should have the same structure. Filling them, i.e. calculating mel-spectrograms and speaker embeddings from the wav files, can be done with the functions `get_mel` and `get_embed` defined in the jupyter notebook `inference.ipynb`, respectively (see the sketch at the end of this comment). These functions return numpy arrays that should be saved using `np.save`.

After you do that, you can write some wav filenames (without the ".wav" extension) to `filelists/valid.txt` to use them for validation purposes. Also, if for some reason you don't want specific wavs to be used during training, you can add them in the same format to `filelists/exceptions.txt`; otherwise you can leave this file empty. The paths to `valid.txt` and `exceptions.txt` should be set in `train_dec.py` (variables `val_file` and `exc_file` respectively), along with the path to the data directory `data_dir`. After these paths there is also a list of training parameters in `train_dec.py` (like `epochs`, `batch_size` and `learning_rate`). Some other important model hyperparameters can be set in `params.py`. Then you can finally launch `train_dec.py` with the pre-trained encoder in the `logs_enc` directory.

If you also want to train the encoder yourself (e.g. your language is different from English, or you want to use a dataset richer than LibriTTS), you have to do some additional data preparation. For training the encoder you'll need the additional subfolders `mels_mode` and `textgrids`, organized per speaker in the same way. For the alignment TextGrid files in the `textgrids` subfolder, please refer to Montreal Forced Aligner for instructions on how to get such alignment files from wavs. To get the average voice mel-spectrograms in the `mels_mode` subfolder, please run the `get_avg_mels.ipynb` jupyter notebook. After this has been done, you can launch `train_enc.py` to start training your encoder.
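To make the `mels`/`embeds` step concrete, here is a minimal sketch of that preprocessing loop. It assumes `get_mel` and `get_embed` have been copied out of `inference.ipynb` into an importable module (the module name below is hypothetical), and the `_embed.npy` suffix is an assumption, so check it against what the training scripts load:

```python
# Minimal preprocessing sketch. Assumptions: get_mel/get_embed have been
# copied from inference.ipynb into a local module, and the output suffixes
# (_mel.npy / _embed.npy) match what the training scripts expect.
import os
import numpy as np

from notebook_utils import get_mel, get_embed  # hypothetical module holding the notebook's functions

data_dir = 'data_dir'  # the same path set in train_enc.py / train_dec.py

for spk in sorted(os.listdir(os.path.join(data_dir, 'wavs'))):
    wav_dir = os.path.join(data_dir, 'wavs', spk)
    if not os.path.isdir(wav_dir):
        continue
    mel_dir = os.path.join(data_dir, 'mels', spk)
    embed_dir = os.path.join(data_dir, 'embeds', spk)
    os.makedirs(mel_dir, exist_ok=True)
    os.makedirs(embed_dir, exist_ok=True)
    for wav_name in sorted(os.listdir(wav_dir)):
        if not wav_name.endswith('.wav'):
            continue
        base = wav_name[:-4]  # filename without ".wav"
        wav_path = os.path.join(wav_dir, wav_name)
        # both functions return numpy arrays, so they can be np.save'd as-is
        np.save(os.path.join(mel_dir, base + '_mel.npy'), get_mel(wav_path))
        np.save(os.path.join(embed_dir, base + '_embed.npy'), get_embed(wav_path))
```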
Thank you very much for the answer. Can you tell me whether there are any encoders for the Russian language, or datasets on which the encoder could be trained?
Hi,
Basically, for each audio file (.wav) you know which frame corresponds to which phoneme (you can extract this information from the TextGrid file by calculating start_frame and end_frame as in get_avg_mels.ipynb). Then, for each frame, replace the mel feature in the _mel.npy file with the average feature of the corresponding phoneme: the mels_mode dictionary contains the mapping {phoneme: its average mel feature}.
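In code, that replacement could look roughly like the sketch below. The interval list is assumed to be already parsed from the TextGrid file, and the sample rate / hop size constants are assumptions that should be taken from params.py:

```python
# Sketch of building an "average-voice" mel for one utterance.
# Assumptions: `intervals` is a list of (phoneme, start_sec, end_sec) tuples
# parsed from the utterance's TextGrid file; `mels_mode` is the
# {phoneme: average mel feature} dictionary, where each value is assumed to
# be an (n_mels,) vector; the constants below must match params.py.
import numpy as np

SAMPLE_RATE = 22050  # assumption; use the value from params.py
HOP_SIZE = 256       # assumption; use the value from params.py

def make_avg_mel(mel, intervals, mels_mode):
    """mel: (n_mels, n_frames) array loaded from a *_mel.npy file."""
    avg_mel = mel.copy()
    for phoneme, start_sec, end_sec in intervals:
        # convert interval times to frame indices, as in get_avg_mels.ipynb
        start_frame = int(start_sec * SAMPLE_RATE / HOP_SIZE)
        end_frame = min(int(end_sec * SAMPLE_RATE / HOP_SIZE), mel.shape[1])
        if phoneme in mels_mode:
            # replace every frame of this phoneme with its average feature
            avg_mel[:, start_frame:end_frame] = mels_mode[phoneme][:, None]
    return avg_mel
```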
Hi, thanks for sharing the code.
I have a folder with wav files from different speakers, and I don't understand what to do next to get a trained model. What type of files should be in the "mels" and "embeds" folders, and how exactly should I fill them? Is there a more detailed instruction somewhere?