Improvements... #50
Comments
in support of item 2, I've begun https://github.com/scruffynerf/book2jsonofnlp |
oh? 👀 if you can figure out how to make the output not have to be like de-tokenized then that would fix a lot of issues that are hard to mess with in booknlp 👍 also sorry about the late response, been busy lately with finals and all |
progress made, still debugging...
|
Oh dang! And you're not having any issues with words like "Don't" being written as "Don 't"? |
fixing those spacing issues is on my todo list. Still working on it... the booknlp code is a bit crufty, so I'm both cleaning it up, learning to understand it, adding json output, and figuring out issues beyond booknlp's reach. As mentioned, pre-booknlp processing (so it doesn't struggle) and post-booknlp processing (to make it better for TTS) are likely both needed. I'll try and make the json output as clean as possible though. I suspect this will iterate well. The above text is Little Brother by Cory Doctorow, good book, but also gutenberg text, with modern and has lots of weird formatting like The Emma text used by bookNLP as an example doesn't even get parsed entirely correctly. (look early and you'll see a 3 way convo with Emma, her dad, and Mr. Knightley and it's incorrect.) So rather than figure out why, I figured I'd try a different text with more modern structures. |
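As a rough illustration of the post-booknlp cleanup for the "Don 't" spacing issue, here is a minimal sketch. The function name `fix_spacing`, the suffix list, and the punctuation list are assumptions made for this example; none of it is taken from booknlp or book2jsonofnlp.

```python
import re

# A space wrongly inserted inside a contraction, e.g. "Don 't", "she 's",
# or (with some tokenizers) "Do n't". The suffix list is an assumption.
_SPLIT_CONTRACTION = re.compile(
    r"(\w) (n['’]t|['’](?:t|s|re|ve|ll|d|m))\b", re.IGNORECASE
)

# A space left in front of closing punctuation, e.g. "Hello , world ."
_SPACE_BEFORE_PUNCT = re.compile(r" ([,.;:!?])")


def fix_spacing(text: str) -> str:
    """Rejoin split contractions and remove spaces before punctuation."""
    text = _SPLIT_CONTRACTION.sub(r"\1\2", text)
    text = _SPACE_BEFORE_PUNCT.sub(r"\1", text)
    return text


print(fix_spacing("Don 't do that , she 's here ."))
```

This only covers the simple cases; quoted words that start with an apostrophe, nested quotes, and similar edge cases would likely need extra rules.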
If we end up with a better sounding audiobook than these 4 AI voices, with actual multiple speakers, victory is ours. https://hackernoon.com/dedicated-to-borderlands-books (that contains the text section above) |
Oh yeah and about the gradio, I was actually looking into turning it into a gradio, it's just time consuming and I kinda forgot about it 😅😅 But for ref here's what I was getting at a couple months ago:
auto styleTTS2 version: https://huggingface.co/spaces/drewThomasson/Auto-VoxNovel-Demo-StyleTTS
testing how to make the character selections in gradio: https://huggingface.co/spaces/drewThomasson/Dynamic-Gradio-Dropdowns
headless voxnovel gradio test space: https://huggingface.co/spaces/drewThomasson/Headless-VoxNovel-Demo-testing_grounds
xtts auto VoxNovel testing space: https://huggingface.co/spaces/drewThomasson/Headless-VoxNovel-Demo |
I was looking at slapping them onto ebook2audiobook as an extra beta feature Or at least getting these out to replace the crappy docker images of VoxNovel But it got complex and to be honest VoxNovel was not nearly as popular as I thought it was So mostly I was throwing my time into ebook2audiobook V2.0 |
I think a good chunk of those links are fully functional tho But like The fine controls and such Yeah lol Anyway hope that helps you out in some way with their codes |
Yeah, I found the many different programs a bit confusing.... unsure which is which (ie your efforts, adding features, etc). For example... take a Doctor Who novel that Big Finish hasn't (yet) adapted, and give it a few distinct well known voice sample wavs and suddenly it's a full audio experience. Then you take the above style json with some extra tweaks (location, etc which booknlp can do), and suddenly it's an audio track for a video script, with lip sync-ed voices, moving images, and so on... and that's just one example. With the rapid AI video development, music and so on... having a decent book->json breakdown just makes one more potential resource to connect in. Retheming? Rewriting? Recasting? etc.. |
Well yeah I wanted to eventually have a local LLM also go through and change how things are said depending on the context surrounding them, So like have a LLM prompt other audio generation models to generate background sounds when a scene is described in the book Or have it change the emotion in how things are said through stuff like facebooks spirit lm And such till we basically get a radio show out of a book generated locally |
That was my ultimate goal 😅😓 |
"Hey AI, take my favorite book, parse it, retheme it as a space western, add some musical soundtrack in the background in the style of Morricone meets Space Opera (NOT https://www.youtube.com/watch?v=YXJiIqJ9_tQ which is awful...), use my voice cast favorites, and give me some visual samples of outfits and crew to decide on..." real video (not AI) https://www.youtube.com/watch?v=4SpX8bVEmJo |
Ok yeah making it into a video locally tho that'll probs take the next 5-10 years but yes 😭 |
At least we have the same kind of goals in mind for this |
Nah, we're almost at realtime video... Suno/Udio is doing 3+ minute songs, static images are getting higher quality and faster every few months, and video models are already lightyears better than a year ago. But text->audio is totally doable right now, and it'll be easy enough to adapt to do video stuff next. (I do a lot with ComfyUI already, and that's also on my todo list, to make booknlp work with ComfyUI and generate images.) |
progress:
so there is a bit of cleanup left to do... the also: "speaker_name": "One With Purple", yeah, it is: original text: also, the "she said."s could be removed, IF the voices are now distinct... there are arguments both ways (text accurate, versus Audio cleanup)... obviously only the bare "she said" by narrator, and not "she said, warily, looking him over" sort of stuff. That could be an option to 'hide' those and not generate them. |
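The "hide bare 'she said'" option could be sketched roughly like this. It assumes each line is a dict with hypothetical `speaker` and `text` keys and that the narrator has id 0; the verb list and the pattern are illustrative guesses, not anything booknlp provides.

```python
import re

# Matches ONLY a bare attribution such as "she said." or "Mr. Knightley asked,"
# and deliberately not "she said, warily, looking him over."
# The verb list and subject pattern are assumptions for this sketch.
_BARE_ATTRIBUTION = re.compile(
    r"^\s*(?:[Hh]e|[Ss]he|[Tt]hey|I|[A-Z][\w.'’-]*(?:\s+[A-Z][\w.'’-]*){0,2})"
    r"\s+(?:said|asked|replied)\s*[.,;]?\s*$"
)


def is_bare_attribution(text: str) -> bool:
    return _BARE_ATTRIBUTION.match(text) is not None


def drop_bare_attributions(lines, narrator_id=0):
    """Return the lines to voice, skipping bare narrator attributions."""
    return [
        line for line in lines
        if not (line["speaker"] == narrator_id and is_bare_attribution(line["text"]))
    ]


sample = [
    {"speaker": 1, "text": "Who are you?"},
    {"speaker": 0, "text": "she said."},
    {"speaker": 0, "text": "she said, warily, looking him over."},
]
for line in drop_bare_attributions(sample):
    print(line["speaker"], line["text"])
```

Filtering at generation time, instead of deleting from the json, keeps the stored text accurate while still letting the audio skip those lines.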
lol yeah you're running into the same issues I ran into. I ended up doing a bunch of manual reformatting. Should be around the top area within the BOOKNLP part of my code. You can probs pass it through chatgpt to pull out the parts you want. It's a mess 😅😭😓 |
https://github.com/scruffynerf/book2jsonofnlp has the code for above still using booknlp for actual python name, so no code changes needed externally.. Happy for fresh eyeballs. Still in progress, but this should help if you want to start using this. |
Which issues?
as I said, small substitutions are to be expected... What sort of manual reformatting?
not sure what you mean? Beyond the number stuff? Reassigning speakers would now be ultra easy, thanks to the json... the character list is there, the ids are there (the names are likely to be removed/ignored, especially if we alter...) In the above case, your current gui lets us reassign "Purple" back to Narrator. The 'improvement' would be search/select all/etc. (all of which should be easier with json-ed info) |
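To show why reassignment gets easy once everything is json, here is a small sketch of a "search / select all / reassign" helper. The `speakerid` and `name` fields follow the example in the issue text; the top-level `speakers` and `lines` keys, the per-line `speaker` and `text` keys, and both function names are assumptions made for illustration.

```python
def find_speaker_ids(book, name_fragment):
    """Search the speaker list by a case-insensitive name fragment."""
    fragment = name_fragment.lower()
    return [s["speakerid"] for s in book["speakers"] if fragment in s["name"].lower()]


def reassign_speaker(book, from_id, to_id):
    """Move every line spoken by from_id to to_id; return how many changed."""
    changed = 0
    for line in book["lines"]:
        if line["speaker"] == from_id:
            line["speaker"] = to_id
            changed += 1
    return changed


# Made-up sample data in the assumed shape.
book = {
    "speakers": [
        {"speakerid": 0, "name": "Narrator"},
        {"speakerid": 1, "name": "One With Purple"},
    ],
    "lines": [
        {"speaker": 0, "text": "The room was quiet."},
        {"speaker": 1, "text": "The one with purple hair nodded."},
    ],
}

purple_ids = find_speaker_ids(book, "purple")
moved = sum(reassign_speaker(book, pid, 0) for pid in purple_ids)
print(moved, "line(s) reassigned to the Narrator")
```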
Honestly probs just gonna rebuild the whole thing at this point. Like I'll say it, my code for VoxNovel is garbage. Idk how it's even functioning XD |
to be clear, regardless of voxnovel or whatever the next gen is, or if you roll it into ebook2audiobook... I'm doing the book->json cause that's the key piece missing for all of this (for whomever wants to do better multi-voice TTS) |
Yes yes yes this will be very helpful in any direction I go or anyone else goes with this |
did you and Robert start a discord for this stuff? |
No but we probs should lol Cause the next ebook2audiobook will have 1107 languages.... so that'll send a lota people running at our work ._. |
Here I'll rush one out but be warned I've never hosted a server lol |
Join Our Discord Server! Click the badge below to join the Ebook2audiobook Discord Server! |
So in reviewing your code and playing with this, I think massive improvements can be made with the following changes:
use the new Auralis tts. It's so much faster. Night and Day...
What took VoxNovel quite a while for a very short epub (6 or so pages), I had in mere seconds with https://github.com/JohnZolton/Fast-Audiobook which is a super simple implementation
booknlp, for some ungodly reason, never outputs the text in a reasonable json format. So you're forced to parse the ugly html... let's fix that, and solve it with a strike at the root: fork booknlp to add a json output of the book text, not the html mess it generates which you have to reverse engineer, with just a speaker attribute, so it's all just a clean set of ordered lines to process, each with a speaker.
(The speakers would be listed in json also; the above is just to be more human readable as an example.
{speakerid: 0, name: "Narrator"}
would be better as an index, and speaker:0 above.)
Simple list(s) of substitutions (likely some regex as well, for number handling for example, but also other cases, like the weird -- issues which can be seen in https://github.com/booknlp/booknlp/tree/main/examples/158_emma ) would allow people to customize and solve once, and even share.
so this would be easy to fix:
Again, it all being in json helps here for all of these examples.
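A minimal sketch of how the proposed json and a shareable substitution list might fit together. This is not booknlp's actual output: the `lines`, `speakers`, `speaker`, and `text` keys are assumed names (only `speakerid` and `name` come from the example above), and the sample text and rules are made up.

```python
import json
import re

# Assumed shape: an index of speakers plus ordered lines, each with a speaker id.
book = {
    "speakers": [
        {"speakerid": 0, "name": "Narrator"},
        {"speakerid": 1, "name": "Alice"},
    ],
    "lines": [
        {"speaker": 0, "text": "It was 3 o'clock--or nearly so."},
        {"speaker": 1, "text": "I have 2 letters to write."},
    ],
}

# A shareable list of (regex pattern, replacement) pairs, applied in order.
SUBSTITUTIONS = [
    (r"--", ", "),       # double-hyphen dashes read badly in TTS
    (r"\b2\b", "two"),   # toy number handling; a real list would go further
    (r"\b3\b", "three"),
]


def apply_substitutions(text, rules):
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


def clean_book(book, rules):
    """Return a copy of the book with the rules applied to every line's text."""
    return {
        "speakers": book["speakers"],
        "lines": [
            {**line, "text": apply_substitutions(line["text"], rules)}
            for line in book["lines"]
        ],
    }


cleaned = clean_book(book, SUBSTITUTIONS)
print(json.dumps(cleaned, indent=2))
```

Keeping the rules as plain data means they could live in their own json file and be swapped or shared per book without touching the code.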
https://github.com/quantumlump/eBook_to_Audiobook_with_F5-TTS (which again, isn't as complex and is single speaker focused) is SO nice... I'd love to see something like this with VoxNovel.
Let me see a list of speakers and drop and drag voices, preview, etc. Let me preview each line and make the tweaks above, etc...
Happy to help implement some of these, just let me know.