### Describe the bug

I'm trying to train an Austrian German TTS model with VITS, but despite trying various configurations I haven't been able to start training properly. It worked once with 5% of my data (around 7.5 hours) at a batch size of 2, but I suspect that isn't sufficient for good output quality.

```python
# define model config
config = VitsConfig(
    batch_size=16,
    eval_batch_size=8,
    batch_group_size=1,
    num_loader_workers=0,
    num_eval_loader_workers=32,
    run_eval=True,
    test_delay_epochs=-1,
    epochs=1000,
    text_cleaner="basic_german_cleaners",
    use_phonemes=True,
    phoneme_language="de",
    phoneme_cache_path=os.path.join(output_path, "phoneme_cache_tts"),
    compute_input_seq_cache=True,
    precompute_num_workers=12,
    print_step=20,
    print_eval=True,
    mixed_precision=True,
    output_path=output_path,
    datasets=[dataset_config],
    use_speaker_embedding=True,
    test_sentences=[
        "Hallo, wie geht es dir? Ich hoffe, du hast einen schönen Tag.",
        "Das ist ein Test. Wir überprüfen, ob alles wie erwartet funktioniert.",
        "Ich lerne gerade Programmierung. Es ist eine sehr nützliche Fähigkeit, die viele Türen öffnen kann.",
        "Die Sonne scheint heute. Es ist ein perfekter Tag, um draußen spazieren zu gehen und die Natur zu genießen.",
        "Ich mag Schokolade. Besonders dunkle Schokolade mit einem hohen Kakaoanteil ist mein Favorit."
    ],
    cudnn_enable=True,
    cudnn_benchmark=True,
    cudnn_deterministic=True,
)
```
Any suggestions for improvement?

### To Reproduce

```shell
CUDA_VISIBLE_DEVICES="0, 1, 2" python -m trainer.distribute --script train.py
```

### Expected behavior

No response

### Logs

No response

### Environment

```json
{
    "CUDA": {
        "GPU": [
            "NVIDIA GeForce RTX 4090",
            "NVIDIA GeForce RTX 4090",
            "NVIDIA GeForce RTX 4090"
        ],
        "available": true,
        "version": "12.4"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.5.0+cu124",
        "TTS": "0.24.3",
        "numpy": "1.26.4"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "",
        "python": "3.12.2",
        "version": "#1 SMP PREEMPT_DYNAMIC Debian 6.1.112-1 (2024-09-30)"
    }
}
```

### Additional context

No response
Replies: 1 comment 1 reply
Moved to discussions because it doesn't look like there's any issue with Coqui here. You'll need to adjust the batch size to work for your GPUs. If you have some long audios, the `max_audio_len` setting might also be relevant:

coqui-ai-TTS/TTS/tts/configs/shared_configs.py, line 212 in b5bd995
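To make that concrete, here is a sketch of the two config changes in question: a smaller per-process batch size and a `max_audio_len` cap. The specific numbers are illustrative assumptions, not tuned values, and if I read `shared_configs.py` correctly the threshold is expressed in audio samples, not seconds:

```python
# Hypothetical adjustment of the original config (values are guesses to start from,
# not recommendations; keep halving batch_size until the OOM disappears).
config = VitsConfig(
    # ... all other settings as in the original post ...
    batch_size=8,               # per-process batch; with 3 GPUs the effective batch is 3x this
    eval_batch_size=4,
    max_audio_len=10 * 22050,   # drop clips longer than ~10 s at a 22.05 kHz sample rate
)
```

Long outlier clips dominate memory use because the whole batch is padded to the longest clip in it, so capping clip length often helps more than shrinking the batch.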
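Before picking a cutoff, it can help to look at the clip-length distribution of the dataset. A minimal standard-library sketch (the directory path is a placeholder, and it assumes the dataset is plain WAV files):

```python
# Sketch: summarize clip lengths of a WAV dataset so you can choose
# batch_size and max_audio_len. Uses only the standard library.
import glob
import os
import wave

def clip_stats(wav_dir):
    """Return (count, median_s, p95_s, longest_s) over *.wav files in wav_dir."""
    lengths = []
    for path in glob.glob(os.path.join(wav_dir, "*.wav")):
        with wave.open(path, "rb") as w:
            lengths.append(w.getnframes() / w.getframerate())
    lengths.sort()
    n = len(lengths)  # assumes at least one file was found
    return n, lengths[n // 2], lengths[int(n * 0.95)], lengths[-1]
```

If the 95th percentile is far below the longest clip, a `max_audio_len` near the 95th percentile (converted to samples) trims only the outliers.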