### Describe the bug

I'm trying to train an Austrian German TTS model with VITS, but despite trying various configurations I haven't been able to start training properly. It worked once with 5% of my data (around 7.5 hours) at a batch size of 2, but I suspect that isn't sufficient for good output quality.

```python
# define model config
config = VitsConfig(
    batch_size=16,
    eval_batch_size=8,
    batch_group_size=1,
    num_loader_workers=0,
    num_eval_loader_workers=32,
    run_eval=True,
    test_delay_epochs=-1,
    epochs=1000,
    text_cleaner="basic_german_cleaners",
    use_phonemes=True,
    phoneme_language="de",
    phoneme_cache_path=os.path.join(output_path, "phoneme_cache_tts"),
    compute_input_seq_cache=True,
    precompute_num_workers=12,
    print_step=20,
    print_eval=True,
    mixed_precision=True,
    output_path=output_path,
    datasets=[dataset_config],
    use_speaker_embedding=True,
    test_sentences=[
        "Hallo, wie geht es dir? Ich hoffe, du hast einen schönen Tag.",
        "Das ist ein Test. Wir überprüfen, ob alles wie erwartet funktioniert.",
        "Ich lerne gerade Programmierung. Es ist eine sehr nützliche Fähigkeit, die viele Türen öffnen kann.",
        "Die Sonne scheint heute. Es ist ein perfekter Tag, um draußen spazieren zu gehen und die Natur zu genießen.",
        "Ich mag Schokolade. Besonders dunkle Schokolade mit einem hohen Kakaoanteil ist mein Favorit."
    ],
    cudnn_enable=True,
    cudnn_benchmark=True,
    cudnn_deterministic=True,
)
```
Any suggestions for improvement?

### To Reproduce

```shell
CUDA_VISIBLE_DEVICES="0, 1, 2" python -m trainer.distribute --script train.py
```

### Expected behavior

No response

### Logs

No response

### Environment

```json
{
    "CUDA": {
        "GPU": [
            "NVIDIA GeForce RTX 4090",
            "NVIDIA GeForce RTX 4090",
            "NVIDIA GeForce RTX 4090"
        ],
        "available": true,
        "version": "12.4"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.5.0+cu124",
        "TTS": "0.24.3",
        "numpy": "1.26.4"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "",
        "python": "3.12.2",
        "version": "#1 SMP PREEMPT_DYNAMIC Debian 6.1.112-1 (2024-09-30)"
    }
}
```

### Additional context

No response
Replies: 1 comment 1 reply
Moved to discussions because it doesn't look like there's any issue with Coqui here. You'll need to adjust the batch size to work for your GPUs. If you have some long audios, the `max_audio_len` setting might also be relevant:

coqui-ai-TTS/TTS/tts/configs/shared_configs.py, line 212 in b5bd995
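To make that concrete, here is a sketch of the two config changes in question: a smaller per-process batch size and a `max_audio_len` cap. The specific numbers are illustrative assumptions, not tuned values, and if I read `shared_configs.py` correctly the threshold is expressed in audio samples, not seconds:

```python
# Hypothetical adjustment of the original config (values are guesses to start from,
# not recommendations; keep halving batch_size until the OOM disappears).
config = VitsConfig(
    # ... all other settings as in the original post ...
    batch_size=8,               # per-process batch; with 3 GPUs the effective batch is 3x this
    eval_batch_size=4,
    max_audio_len=10 * 22050,   # drop clips longer than ~10 s at a 22.05 kHz sample rate
)
```

Long outlier clips dominate memory use because the whole batch is padded to the longest clip in it, so capping clip length often helps more than shrinking the batch.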
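Before picking a cutoff, it can help to look at the clip-length distribution of the dataset. A minimal standard-library sketch (the directory path is a placeholder, and it assumes the dataset is plain WAV files):

```python
# Sketch: summarize clip lengths of a WAV dataset so you can choose
# batch_size and max_audio_len. Uses only the standard library.
import glob
import os
import wave

def clip_stats(wav_dir):
    """Return (count, median_s, p95_s, longest_s) over *.wav files in wav_dir."""
    lengths = []
    for path in glob.glob(os.path.join(wav_dir, "*.wav")):
        with wave.open(path, "rb") as w:
            lengths.append(w.getnframes() / w.getframerate())
    lengths.sort()
    n = len(lengths)  # assumes at least one file was found
    return n, lengths[n // 2], lengths[int(n * 0.95)], lengths[-1]
```

If the 95th percentile is far below the longest clip, a `max_audio_len` near the 95th percentile (converted to samples) trims only the outliers.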