Improvements... #50

scruffynerf · 2024-12-09T19:13:31Z

So in reviewing your code and playing with this, I think massive improvements can be made with the following changes:

1. Use the new Auralis TTS. It's so much faster, night and day. What took VoxNovel quite a while for a very short epub (6 or so pages), I had in mere seconds with https://github.com/JohnZolton/Fast-Audiobook, which is a super simple implementation.
2. booknlp, for some ungodly reason, never outputs the text in a reasonable JSON format, so you're forced to parse the ugly HTML. Let's strike at the root: fork booknlp to add a JSON output of the book text (instead of the HTML mess it generates, which you have to reverse engineer), with just a speaker attribute, so it's all a clean set of ordered lines to process, each with a speaker.

[
...
{"line": 783, "text": "I really like you", "speaker": "Jane"},
{"line": 784, "text": "said Jane, smiling shyly,", "speaker": "Narrator"},
{"line": 785, "text": "but... I have to say No.", "speaker": "Jane"},
...
]

(The speakers would also be listed in the JSON; the above is just more human-readable as an example.
{"speakerid": 0, "name": "Narrator"} would be better as an index, with "speaker": 0 in the lines above.)
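A minimal sketch of that indexed layout, assuming a top-level object with a `speakers` table and a `lines` list. Those two key names and the helper functions are hypothetical, chosen only to illustrate the idea; they are not booknlp output.

```python
import json

# Hypothetical layout: a speaker table plus ordered lines that reference it by id.
book = {
    "speakers": [
        {"speakerid": 0, "name": "Narrator"},
        {"speakerid": 1, "name": "Jane"},
    ],
    "lines": [
        {"line": 783, "text": "I really like you", "speaker": 1},
        {"line": 784, "text": "said Jane, smiling shyly,", "speaker": 0},
        {"line": 785, "text": "but... I have to say No.", "speaker": 1},
    ],
}

def speaker_names(book):
    """Map speaker id -> display name."""
    return {s["speakerid"]: s["name"] for s in book["speakers"]}

def script(book):
    """Yield (speaker name, text) pairs in reading order."""
    names = speaker_names(book)
    for entry in sorted(book["lines"], key=lambda e: e["line"]):
        yield names[entry["speaker"]], entry["text"]

# Round-trip through JSON to confirm the structure serializes cleanly.
restored = json.loads(json.dumps(book))
for name, text in script(restored):
    print(f"{name}: {text}")
```

Keeping names in one table means renaming or merging a speaker touches a single entry rather than every line.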

There needs to be pre-booknlp text processing, and then post-booknlp text processing. That will solve the number issues, the punctuation, the mispronounced words, weird timing/spacing issues, and more.

Simple list(s) of substitutions (likely with some regex as well, e.g. for number handling, and for other cases like the weird -- issues that can be seen in https://github.com/booknlp/booknlp/tree/main/examples/158_emma ) would allow people to customize, solve a problem once, and even share the fix.
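A rough sketch of such a substitution list, assuming rules are plain (pattern, replacement) pairs applied in order. The three rules and the name `apply_rules` are illustrative only; nothing here is taken from VoxNovel or booknlp.

```python
import re

# Illustrative rules only; a real list would be tuned per book and per TTS engine.
RULES = [
    (r"\s*--\s*", ", "),           # typewriter-style double dash -> spoken pause
    (r"\bMr\.(?=\s)", "Mister"),   # expand an abbreviation the TTS may misread
    (r"\b(\d+)%", r"\1 percent"),  # simple number-related rewrite
]

def apply_rules(text, rules=RULES):
    """Apply each (pattern, replacement) pair in order and return the result."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text

print(apply_rules("Mr. Knightley paid 5% -- gladly."))
```

The same function could serve both stages by passing a different list for the pre-booknlp and post-booknlp passes, and because the rules are plain strings they could be stored in a shared file.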

Once a line is generated, if the line isn't right and it can be fixed by tweaking the text or voice, that change can be rolled into the list(s) above, or applied as a one-off on the fly, so we regen it and solve it everywhere.
so this would be easy to fix:

Oh, it mispronounced this, let's tweak that spelling so it sounds right....
[This change will affect other 17 lines, Y/N? Regenning 17 lines] [Add this to the list of TTS rewrites? Y/N]
Oh that voice just isn't right, let's change that to a different voice sample....
[This change will regen 38 lines Y/N?]
The default voice sample doesn't quite work here, but I like it in 99% of her speaking... let's just tweak THIS line with a new voice sample that sounds a bit more scared/anxious to get the tone right...
[selected ScaredJane.wav, added as new Speaker "JaneAnxious". Line regened]

Again, it all being in json helps here for all of these examples.
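As a sketch of the "this change will affect N lines" flow, assuming each entry carries `text`, `speaker_id`, and `index`. The `tts_text` and `needs_regen` fields, the function names, and the respelling "Ho-loo" are all hypothetical, used only for illustration.

```python
import re

def respell(lines, word, spoken):
    """Store a TTS-only respelling next to the original text and flag lines for regen."""
    pattern = re.compile(rf"\b{re.escape(word)}\b")
    changed = []
    for entry in lines:
        source = entry.get("tts_text", entry["text"])
        # A callable replacement keeps `spoken` literal (no backslash escapes).
        new_text, count = pattern.subn(lambda m: spoken, source)
        if count:
            entry["tts_text"] = new_text   # hypothetical field: what the TTS reads
            entry["needs_regen"] = True    # hypothetical field: audio is stale
            changed.append(entry["index"])
    return changed

def lines_for_speaker(lines, speaker_id):
    """List the line indexes that a voice change for one speaker would regen."""
    return [e["index"] for e in lines if e["speaker_id"] == speaker_id]

lines = [
    {"index": 412, "text": "Keep it elevated.", "speaker_id": 376},
    {"index": 413, "text": "To Jolu she said,", "speaker_id": 0},
    {"index": 415, "text": "Jolu moved quickly.", "speaker_id": 0},
]
affected = respell(lines, "Jolu", "Ho-loo")
print(f"This change will regen {len(affected)} lines: {affected}")
```

The original `text` is left untouched, so the book stays text-accurate while the TTS reads the tweaked spelling.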

UI improvements... Is there a reason you're not using Gradio/etc?
https://github.com/quantumlump/eBook_to_Audiobook_with_F5-TTS (which again, isn't as complex and is single speaker focused) is SO nice... I'd love to see something like this with VoxNovel.
Let me see a list of speakers and drop and drag voices, preview, etc. Let me preview each line and make the tweaks above, etc...

Happy to help implement some of these, just let me know.


scruffynerf · 2024-12-11T02:25:25Z

in support of item 2, I've begun https://github.com/scruffynerf/book2jsonofnlp

DrewThomasson · 2024-12-11T03:45:44Z

oh? 👀

if you can figure out how to make the output not have to be like de-tokenized then that would fix a lot of issues that are hard to mess with in booknlp

👍

also sorry about the late response, been busy lately with finals and all

scruffynerf · 2024-12-13T05:04:04Z

progress made, still debugging...
example output:

        {
            "text": "Vanessa took off her jean jacket and then pulled off the cotton hoodie she was wearing underneath it. She wadded it up and pressed it to Darryl 's side.",
            "speaker_id": 0,
            "speaker_name": "Narrator",
            "index": 409
        },
        {
            "text": "Take his head,",
            "speaker_id": 376,
            "speaker_name": "Van",
            "index": 410
        },
        {
            "text": "she said to me.",
            "speaker_id": 0,
            "speaker_name": "Narrator",
            "index": 411
        },
        {
            "text": "Keep it elevated.",
            "speaker_id": 376,
            "speaker_name": "Van",
            "index": 412
        },
        {
            "text": "To Jolu she said,",
            "speaker_id": 0,
            "speaker_name": "Narrator",
            "index": 413
        },
        {
            "text": "Get his feet up -- roll up your coat or something.",
            "speaker_id": 376,
            "speaker_name": "Van",
            "index": 414
        },
        {
            "text": "Jolu moved quickly. Vanessa 's mother is a nurse and she 'd had first aid training every summer at camp. She loved to watch people in movies get their first aid wrong and make fun of them.\n I was so glad to have her with us.",
            "speaker_id": 0,
            "speaker_name": "Narrator",
            "index": 415
        },

DrewThomasson · 2024-12-13T05:07:21Z

Oh dang!

And you're not having any issues with words like

"Don't"

being written as

"Don 't"?

scruffynerf · 2024-12-13T05:14:18Z

Fixing those spacing issues is on my todo list. Still working on it... the booknlp code is a bit crufty, so I'm cleaning it up, learning to understand it, adding JSON output, and figuring out issues beyond booknlp's reach, all at once.

As mentioned, pre-booknlp processing (so it doesn't struggle) and post-booknlp processing (to make it better for TTS) are likely both needed. I'll try to make the JSON output as clean as possible, and I suspect this will iterate well. The above text is Little Brother by Cory Doctorow: good book, but also gutenberg text, modern, and it has lots of weird formatting like > texting and * shouting *, which are probably good test cases to solve.

The Emma text used by bookNLP as an example doesn't even get parsed entirely correctly (look early and you'll see a 3-way convo with Emma, her dad, and Mr Knightley, and it's incorrect). So rather than figure out why, I figured I'd try a different text with more modern structures.

scruffynerf · 2024-12-13T05:23:04Z

If we end up with a better sounding audiobook than these 4 AI voices, with actual multiple speakers, victory is ours.

https://hackernoon.com/dedicated-to-borderlands-books

(that contains the text section above)

DrewThomasson · 2024-12-13T05:31:25Z

Oh yeah and about the gradio

I was actually looking into turning it into a gradio it's just time consuming and I kinda forgot about it

😅😅

But for ref here's what I was getting at it a couple months ago

auto styleTTS2 version

https://huggingface.co/spaces/drewThomasson/Auto-VoxNovel-Demo-StyleTTS

testing how to make the character selections in gradio

https://huggingface.co/spaces/drewThomasson/Dynamic-Gradio-Dropdowns

headless voxnovel gradio test space

https://huggingface.co/spaces/drewThomasson/Headless-VoxNovel-Demo-testing_grounds

xtts auto VoxNovel testing space

https://huggingface.co/spaces/drewThomasson/Headless-VoxNovel-Demo

DrewThomasson · 2024-12-13T05:34:08Z

I was looking at slapping them onto ebook2audiobook as an extra beta feature

Or at least getting these out to replace the crappy docker images of VoxNovel

But it got complex and to be honest VoxNovel was not nearly as popular as I thought it was

So mostly I was throwing my time into ebook2audiobook V2.0

DrewThomasson · 2024-12-13T05:34:55Z

I think a good chunk of those links are fully functional tho

But like

The fine controls and such

Yeah lol

Anyway hope that helps you out in some way with their codes

scruffynerf · 2024-12-13T05:41:33Z

Yeah, I found the many different programs a bit confusing.... unsure which is which (ie your efforts, adding features, etc).
There are lots of ebook->audio programs that do a single narrator... that space is crowded. The 'cast recording' space is far more open, and actually more useful.

For example... take a Doctor Who novel that Big Finish hasn't (yet) adapted, and give it a few distinct well known voice sample wavs and suddenly it's a full audio experience. Then you take the above style json with some extra tweaks (location, etc which booknlp can do), and suddenly it's an audio track for a video script, with lip sync-ed voices, moving images, and so on... and that's just one example. With the rapid AI video development, music and so on... having a decent book->json breakdown just makes one more potential resource to connect in. Retheming? Rewriting? Recasting? etc..

DrewThomasson · 2024-12-13T05:46:25Z

Well yeah I wanted to eventually have a local LLM also go through and change how things are said depending on the context surrounding them,

So like have a LLM prompt other audio generation models to generate background sounds when a scene is described in the book

Or have it change the emotion in how things are said through stuff like facebooks spirit lm

And such till we basically get a radio show out of a book generated locally

DrewThomasson · 2024-12-13T05:46:40Z

That was my ultimate goal 😅😓

scruffynerf · 2024-12-13T05:54:23Z

"Hey AI, take my favorite book, parse it, retheme it as a space western, add some musical soundtrack in the background in the style of Morricone meets Space Opera (NOT https://www.youtube.com/watch?v=YXJiIqJ9_tQ which is awful...), use my voice cast favorites, and give me some visual samples of outfits and crew to decide on..."
"Ok, these 5 picks look good, now make it into a 3 hour video I can watch this evening."

real video (not AI) https://www.youtube.com/watch?v=4SpX8bVEmJo
but seriously, we can do THIS today now.

DrewThomasson · 2024-12-13T05:56:11Z

Ok yeah making it into a video locally tho that'll probs take the next 5-10 years but yes 😭

DrewThomasson · 2024-12-13T05:57:25Z

At least we have the same kind of goals in mind for this

scruffynerf · 2024-12-13T06:00:19Z

> Ok yeah making it into a video locally tho that'll probs take the next 5-10 years but yes 😭

Nah, we're almost at realtime video... Suno/Udio is doing 3+ minute songs, static images are getting higher quality and faster every few months, and video models are already lightyears better than a year ago.

But text->audio is totally doable right now, and it'll be easy enough to adapt to do video stuff next. (I do a lot with ComfyUI already, and that's also on my todo list: make booknlp work with ComfyUI and generate images.)

scruffynerf · 2024-12-15T17:13:17Z

progress:

        {
            "text": "She held up a camera and snapped a picture of me and my crew.",
            "speaker_id": 0,
            "speaker_name": "Narrator",
            "index": 232
        },
        {
            "text": "Cheese,",
            "speaker_id": 938,
            "speaker_name": "Another Kid My Age",
            "index": 233
        },
        {
            "text": "she said.",
            "speaker_id": 0,
            "speaker_name": "Narrator",
            "index": 234
        },
        {
            "text": "You're on candid snitch - cam.",
            "speaker_id": 938,
            "speaker_name": "Another Kid My Age",
            "index": 235
        },
        {
            "text": "No way,\"I said.\"You would n't --",
            "speaker_id": 0,
            "speaker_name": "Narrator",
            "index": 236
        },
        {
            "text": "I will,",
            "speaker_id": 938,
            "speaker_name": "Another Kid My Age",
            "index": 237
        },
        {
            "text": "she said.",
            "speaker_id": 0,
            "speaker_name": "Narrator",
            "index": 238
        },
        {
            "text": "I will send this photo to truant watch in thirty seconds unless you four back off from this clue and let me and my friends here run it down. You can come back in one hour and it'll be all yours. I think that's more than fair.",
            "speaker_id": 938,
            "speaker_name": "Another Kid My Age",
            "index": 239
        },
        {
            "text": "I looked behind her and noticed three other girls in similar garb -- one with blue hair, one with green, and one with purple.",
            "speaker_id": 0,
            "speaker_name": "Narrator",
            "index": 240
        },
        {
            "text": "Who are you supposed to be, the Popsicle Squad?",
            "speaker_id": 944,
            "speaker_name": "One With Purple",
            "index": 241
        },
        {
            "text": "We're the team that's going to kick your team's ass at Harajuku Fun Madness,",
            "speaker_id": 938,
            "speaker_name": "Another Kid My Age",
            "index": 242
        },
        {
            "text": "she said.",
            "speaker_id": 0,
            "speaker_name": "Narrator",
            "index": 243
        },
        {
            "text": "And I'm the one who's * right this second * about to upload your photo and get you in * so much trouble * --",
            "speaker_id": 938,
            "speaker_name": "Another Kid My Age",
            "index": 244
        },
        {
            "text": "Behind me I felt Van start forward. Her all - girls school was notorious for its brawls, and I was pretty sure she was ready to knock this chick's block off.",
            "speaker_id": 0,
            "speaker_name": "Narrator",
            "index": 245
        },
        {
            "text": "Then the world changed forever.",
            "speaker_id": 0,
            "speaker_name": "Narrator",
            "index": 246
        },
        {
            "text": "We felt it first, that sickening lurch of the cement under your feet that every Californian knows instinctively -- * earthquake *. My first inclination, as always, was to get away :\"when in trouble or in doubt, run in circles, scream and shout.\"But the fact was, we were already in the safest place we could be, not in a building that could fall in on us, not out toward the middle of the road where bits of falling cornice could brain us.",
            "speaker_id": 0,
            "speaker_name": "Narrator",
            "index": 247
        },
        {
            "text": "Earthquakes are eerily quiet -- at first, anyway -- but this was n't quiet. This was loud, an incredible roaring sound that was louder than anything I'd ever heard before. The sound was so punishing it drove me to my knees, and I was n't the only one. Darryl shook my arm and pointed over the buildings and we saw it then : a huge black cloud rising from the northeast, from the direction of the Bay.",
            "speaker_id": 0,
            "speaker_name": "Narrator",
            "index": 248
        },

so there is a bit of cleanup left to do... the "was n't" spacing and the stray "s (unsure how best to handle those; always split on "s?).
The " - " stuff can likely be fixed by stripping the spaces and joining the words with the dash.
How to handle * text * (or *text*) to make the TTS do some sort of emphasis? Maybe split on those, and we can make it alter the voice params for those slightly? Or just figure out how to tell the TTS to do that?
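A possible post-booknlp cleanup pass for the cases above, as a sketch only. It assumes the tokenizer artifacts look like the sample output (" n't", " 's", " - ", " :", and "* ... *" markers); the function names are hypothetical and the patterns will miss edge cases.

```python
import re

# Patterns are guesses based on the sample output above, not on booknlp internals.
CONTRACTION = re.compile(r"\s+(n't|'(?:s|d|ll|re|ve|m|t))\b")
SPACED_HYPHEN = re.compile(r"(?<=\w) - (?=\w)")
SPACE_BEFORE_PUNCT = re.compile(r"\s+([:;,.!?])")
EMPHASIS = re.compile(r"\*\s*(.+?)\s*\*")

def detokenize(text):
    """Rejoin split contractions, spaced hyphens, and punctuation."""
    text = CONTRACTION.sub(r"\1", text)         # "was n't" -> "wasn't"
    text = SPACED_HYPHEN.sub("-", text)         # "snitch - cam" -> "snitch-cam"
    text = SPACE_BEFORE_PUNCT.sub(r"\1", text)  # "then : a" -> "then: a"
    return text

def split_on_quotes(text):
    """Split a line on stray double quotes; attribution is left to a later step."""
    return [part.strip() for part in text.split('"') if part.strip()]

def emphasis_spans(text):
    """Split text into (segment, emphasized) pairs using the * ... * markers."""
    spans, pos = [], 0
    for match in EMPHASIS.finditer(text):
        if match.start() > pos:
            spans.append((text[pos:match.start()].strip(), False))
        spans.append((match.group(1), True))
        pos = match.end()
    if pos < len(text):
        spans.append((text[pos:].strip(), False))
    return [(segment, flag) for segment, flag in spans if segment]

print(detokenize("Her all - girls school was n't quiet : it was loud."))
print(split_on_quotes('No way,"I said."You would n\'t --'))
print(emphasis_spans("I'm * right this second * about to upload"))
```

Returning emphasis as flagged spans leaves the choice open: the caller could alter voice params for those spans or simply drop the markers.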

also: "speaker_name": "One With Purple",
I think that's Narrator misattributed... so we'd still want a way to find stray lines and reattribute, and do so in bulk.

yeah, it is: original text:
I looked behind her and noticed three other girls in similar garb -- one with blue hair, one with green, and one with purple. "Who are you supposed to be, the Popsicle Squad?"

also, the "she said."s could be removed, IF the voices are now distinct... there are arguments both ways (text accurate, versus Audio cleanup)... obviously only the bare "she said" by narrator, and not "she said, warily, looking him over" sort of stuff. That could be an option to 'hide' those and not generate them.
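One way that 'hide' option might look, as a sketch. It assumes the Narrator has `speaker_id` 0 as in the sample output; the `hidden` field, the function name, and the short verb list are hypothetical.

```python
import re

# Matches only a bare dialogue tag such as "she said." with nothing after it.
BARE_TAG = re.compile(r"^(?:he|she|they|I)\s+(?:said|asked|replied)[.,]?$", re.IGNORECASE)

def mark_bare_tags(lines, narrator_id=0):
    """Flag narrator lines that are nothing but a bare tag; return their indexes."""
    for entry in lines:
        is_narrator = entry["speaker_id"] == narrator_id
        entry["hidden"] = is_narrator and bool(BARE_TAG.match(entry["text"].strip()))
    return [entry["index"] for entry in lines if entry["hidden"]]

lines = [
    {"index": 237, "text": "I will,", "speaker_id": 938},
    {"index": 238, "text": "she said.", "speaker_id": 0},
    {"index": 411, "text": "she said to me.", "speaker_id": 0},
    {"index": 999, "text": "she said, warily, looking him over", "speaker_id": 0},
]
print(mark_bare_tags(lines))
```

Flagging instead of deleting keeps the JSON text-accurate, so the tags can be restored if the voices turn out not to be distinct enough.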

DrewThomasson · 2024-12-15T17:33:57Z

lol yeah you're running into the same issues I ran into

I ended up doing a bunch of manual reformatting

Should be around the top area within the BOOKNLP part of my code

You can probs pass it through chatgpt to pull out the parts you want

It's a mess 😅😭😓

scruffynerf · 2024-12-15T18:30:30Z

https://github.com/scruffynerf/book2jsonofnlp has the code for above

It still uses booknlp as the actual Python package name, so no code changes are needed externally.
Just uninstall the current booknlp with pip, then
pip install git+https://github.com/scruffynerf/book2jsonofnlp
and it should work. The new file created is .book.json

Happy for fresh eyeballs. Still in progress, but this should help if you want to start using this.

scruffynerf · 2024-12-15T18:36:34Z

> lol yeah you're running into the same issues I ran into

Which issues?

> I ended up doing a bunch of manual reformatting

as I said, small substitutions are to be expected... What sort of manual reformatting?

> Should be around the top area within the BOOKNLP part of my code

not sure what you mean? Beyond the number stuff?

Reassigning speakers would now be ultra easy, thanks to the json... the character list is there, the ids are there (the names are likely to be removed/ignored, especially if we alter...)

In the above case, your current gui lets us reassign "Purple" back to Narrator. The 'improvement' would be search/select all/etc. (all of which should be easier with json-ed info)
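A sketch of what bulk reattribution could look like on the JSON lines, assuming the `speaker_id`, `speaker_name`, and `index` fields from the sample output. The `reassign` function and its optional `predicate` filter are hypothetical.

```python
def reassign(lines, from_id, to_id, to_name, predicate=None):
    """Move every matching line from one speaker to another; return changed indexes."""
    changed = []
    for entry in lines:
        if entry["speaker_id"] != from_id:
            continue
        if predicate is not None and not predicate(entry):
            continue
        entry["speaker_id"] = to_id
        entry["speaker_name"] = to_name
        changed.append(entry["index"])
    return changed

lines = [
    {"index": 240, "text": "I looked behind her...", "speaker_id": 0, "speaker_name": "Narrator"},
    {"index": 241, "text": "Who are you supposed to be, the Popsicle Squad?",
     "speaker_id": 944, "speaker_name": "One With Purple"},
    {"index": 242, "text": "We're the team...", "speaker_id": 938, "speaker_name": "Another Kid My Age"},
]

# Fold the stray "One With Purple" speaker back into the Narrator in one call.
print(reassign(lines, from_id=944, to_id=0, to_name="Narrator"))
```

The `predicate` argument is where a search/select-all filter could plug in, for example limiting the change to a range of indexes.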

DrewThomasson · 2024-12-15T18:38:35Z

Honestly probs just gonna rebuild the whole thing at this point

Like

I'll say it, my code for VoxNovel is garbage

Idk how it's even functioning XD

scruffynerf · 2024-12-15T18:40:19Z

to be clear, regardless of voxnovel or whatever the next gen is, or if you roll it into ebook2audiobook...

I'm doing the book->json cause that's the key piece missing for all of this (for whoever wants to do better multi-voice TTS)

DrewThomasson · 2024-12-15T18:41:20Z

Yes yes yes this will be very helpful in any direction I go or anyone else goes with this

scruffynerf · 2024-12-15T18:43:21Z

did you and Robert start a discord for this stuff?

DrewThomasson · 2024-12-15T18:44:39Z

No but we probs should lol

Cause the next ebook2audiobook will have 1107 languages.... so that'll send a lota people running at our work ._.

DrewThomasson · 2024-12-15T18:45:15Z

Here I'll rush one out but be warned I've never hosted a server lol

DrewThomasson · 2024-12-15T20:13:12Z

Join Our Discord Server!

Click the badge below to join the Ebook2audiobook Discord Server!
