
Can the GPU be used to create WAV files instead of the CPU? #598

Open

haydonryan opened this issue Sep 10, 2024 · 6 comments

@haydonryan

Love everyone's work here!

Reading the README: can the GPU be used to create WAV files by passing the --cuda parameter to the Python version?

Or is the Python version / CUDA only for training?

Thanks in advance. I'm trying to get the fastest TTS I can for converting large documents.
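For concreteness, here's the kind of thing I mean (a minimal sketch assuming the piper-tts Python package's PiperVoice API; I haven't verified the exact signatures, and the voice model name is just an example):

```python
import wave

from piper import PiperVoice  # assumed API from the piper-tts Python package

# Example voice model name -- substitute your own .onnx voice.
voice = PiperVoice.load("en_US-lessac-medium.onnx", use_cuda=True)

# Write the synthesized audio out as a WAV file.
with wave.open("hello.wav", "wb") as wav_file:
    voice.synthesize("Hello world", wav_file)
```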

@FrontierDK commented Sep 10, 2024

It can, but it's slower, at least for a single inference.

I got it working on Ubuntu (in a virtual machine with access to an RTX 3060 via PCIe passthrough). If I had to guess, the slowdown is due to all the overhead of setting up and preparing the GPU before finally running the inference.
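One way to check where the time goes is to measure the one-time load separately from the synthesis itself (a sketch using the same assumed PiperVoice API as above; numbers will obviously vary by hardware):

```python
import time
import wave

from piper import PiperVoice  # same assumed piper-tts API as above

t0 = time.perf_counter()
# One-time setup: model load plus GPU session initialization.
voice = PiperVoice.load("en_US-lessac-medium.onnx", use_cuda=True)
t1 = time.perf_counter()

# Per-utterance cost: a single synthesis pass.
with wave.open("bench.wav", "wb") as wav_file:
    voice.synthesize("The quick brown fox jumps over the lazy dog.", wav_file)
t2 = time.perf_counter()

print(f"load: {t1 - t0:.2f}s, synthesize: {t2 - t1:.2f}s")
```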

@haydonryan (Author)

Ah yeah, that makes sense, but it's unfortunate. My 5950X is getting a solid workout here.

@rajuaryan21

I tried, but it didn't work, I guess. I appended the --cuda parameter to the audio generation command and still see the CPU doing all the work instead of the RTX 4060. Is there something I am missing? I am using Windows 11 and piper.exe to test this. Would love to hear your insights.
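One thing worth checking, at least for the Python version (piper.exe is a separate native build, so this may not apply there): I believe --cuda relies on onnxruntime's CUDA execution provider, and if only the CPU build of onnxruntime is installed, inference can quietly fall back to the CPU. A quick check:

```python
import onnxruntime

# Prints the execution providers onnxruntime can actually use. If
# "CUDAExecutionProvider" is not in the list, only the CPU build of
# onnxruntime is installed and --cuda has nothing to run on.
print(onnxruntime.get_available_providers())
```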

@thetznecker

It does work, yeah. But generation time, even after the model is loaded into VRAM, is several times longer than on the CPU, unfortunately.

@haydonryan (Author)

I wonder why that is! Sadly, I'm not up on CUDA optimization enough (yet) to understand why this might be the case. Would love to see some focus on that (but I understand it's not really the main direction of the project).

@BryceBarbara

Just to confirm I understand: for use cases like the read-aloud extension, where it needs to run multiple generations one after the other, the GPU would likely be faster since it could reuse the same state for subsequent generations, right?
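Something like this is what I have in mind (a sketch reusing the assumed PiperVoice API from the comments above; whether the GPU actually comes out ahead would still need measuring):

```python
import wave

from piper import PiperVoice  # same assumed piper-tts API as in the sketches above

# Load once (the expensive part), then reuse the loaded voice for every chunk.
voice = PiperVoice.load("en_US-lessac-medium.onnx", use_cuda=True)

chunks = ["First paragraph of the document...", "Second paragraph..."]
for i, chunk in enumerate(chunks):
    with wave.open(f"chunk_{i}.wav", "wb") as wav_file:
        voice.synthesize(chunk, wav_file)
```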
