XTTS: add inference_stream_text (slightly friendlier for text-streaming) #21
PR recreated - linting test was not passing, I had to refork from this repo (my old fork was from
Thanks for the PR! Just to let you know that I'm traveling and won't have a chance to look at this for a couple of days.
Thanks for the update, it can wait ofc, it's a small one. Safe travels 👌
Could you explain the benefits of this PR more concretely, i.e. what do you mean by "slightly friendlier for text-streaming"? Running streaming TTS for sentences one-by-one is already possible with the current code, no?
Hi @eginhard, it's a bit friendlier in the sense that it makes it clearer that you can do text-streaming. My argument is that this explicitness puts the user "at ease" about keeping the necessary "initialization state" in effect while streaming is ongoing (instead of redoing it every time new text comes in). As I've mentioned, in the current code that "initialization state" consists only of those 2 statements that move
All in all, I know its usefulness is debatable, so I'll leave it to you to decide whether it's worth doing. I'm fine either way; I just lean towards this version when doing text streaming, since seeing those functions makes it clearer to me that the library offers incremental TTS.
Thank you for clarifying. I'm not sure I agree that it needs to be made more explicit that it is possible to run streaming TTS for multiple sentences one-by-one. I'll close this PR because it adds complexity to the code with no clear practical benefit. Also, text-streaming or incremental TTS usually refers to streaming synthesis based on partial text input, which is not currently supported, so this terminology would only lead to more confusion. However, please let me know if there is anything that blocks your use case.
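For the sentence-by-sentence approach mentioned above, text that arrives in arbitrary fragments first has to be grouped into complete sentences before each one is passed to the synthesizer. A minimal sketch of that buffering step, assuming a naive rule that a sentence ends at `.`, `!` or `?` followed by whitespace; `iter_sentences` is a hypothetical helper and not part of the library:

```python
import re
from typing import Iterable, Iterator

# Naive boundary (an assumption): sentence-ending punctuation followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def iter_sentences(fragments: Iterable[str]) -> Iterator[str]:
    """Group streamed text fragments into complete sentences (hypothetical helper)."""
    buffer = ""
    for fragment in fragments:
        buffer += fragment
        # Everything before the last boundary is complete; the tail may still grow.
        *complete, buffer = _SENTENCE_END.split(buffer)
        for sentence in complete:
            if sentence:
                yield sentence
    # Flush whatever is left once the text stream ends.
    if buffer.strip():
        yield buffer.strip()


sentences = list(iter_sentences(["Hello wor", "ld. How are", " you? Fine"]))
```

Note that this only groups fragments into whole sentences. It does not synthesize from partial text, which is the stricter meaning of incremental TTS described above.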
Sure, no problem, thanks for taking the time to look over this 👍
Hello,
(moved the PR here, noticed the comments on the old one)
I'm doing TTS streaming combined with text-streaming (text coming progressively over a stream), locally.
I know inference_stream is theoretically enough for this case, except for the beginning part (which indeed is not so bad to repeat, but it would be nicer to be able to skip it too, since it's not necessary):

So I've added inference_stream_text (maybe not the best name, let me know if you prefer another), particularly for text-streaming, e.g.:

IMO this also makes for a nicer interface when doing text-streaming, I'll leave it to you to decide :)
Cheers! 🍻
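As a rough illustration of the kind of interface described in this PR, the sketch below wraps per-sentence calls to `inference_stream` in a single generator. It assumes the model exposes `inference_stream(text, language, gpt_cond_latent, speaker_embedding)` and yields audio chunks. `stream_text_chunks` is a hypothetical name and not the PR's actual implementation, and `EchoModel` is only a stand-in so the snippet runs without model weights.

```python
from typing import Any, Iterable, Iterator


def stream_text_chunks(
    model: Any,
    sentences: Iterable[str],
    language: str,
    gpt_cond_latent: Any,
    speaker_embedding: Any,
) -> Iterator[Any]:
    """Yield audio chunks for each incoming sentence (hypothetical wrapper)."""
    for sentence in sentences:
        # Assumed call shape; the conditioning inputs are reused for every sentence.
        yield from model.inference_stream(
            sentence, language, gpt_cond_latent, speaker_embedding
        )


class EchoModel:
    """Stand-in model: emits one fake 'chunk' per word instead of audio."""

    def inference_stream(self, text, language, gpt_cond_latent, speaker_embedding):
        for word in text.split():
            yield f"<audio:{word}>"


chunks = list(
    stream_text_chunks(EchoModel(), ["Hello there.", "Bye."], "en", None, None)
)
```

Because `sentences` can be any iterable, including a generator fed from a socket or an LLM response, chunks for the first sentence can be played while later text is still arriving.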