Hour plus long transcription #946
Replies: 4 comments 11 replies
-
@dilacerated One option that may be useful to you is live transcription: https://chidiwilliams.github.io/buzz/docs/usage/live_recording To position subtitles on a video screen you can use OBS Studio. Buzz has an option to export live transcripts to a text file as they are transcribed, and you can use that file as a text input source in OBS. I have used this setup for live transcription of conference presentations.
The second option is to try the Whisper large model. It looks like your GPU has 8GB of VRAM. If the large model fails, you can use Whisper.cpp, which runs on the CPU. It will be able to run the large model, but without a GPU it may take several hours to transcribe. You can test whether the large model gives better quality.
The third option is to wait a bit while I explore longer video transcriptions. There may be something we can add or change in Buzz to improve how long transcripts are handled.
-
GPU option with 1.2.0 does the job nice and fast. Still some quirks. https://www.youtube.com/watch?v=cDgVxNSO3fQ Take the above video for example. Whisper Medium at the beginning:
Large says:
Looks like Whisper Medium did a far better job with this video, but some of its subtitles appear on screen during stretches with no dialog, long before the words are spoken. On to the CPU option with the same video and models using VB-Cable.
-
@dilacerated Please see #955 for an update on a feature that can improve the subtitle quality of long audio files. I ran a test on the audio from the video linked above with:
The result was very accurate subtitles, with no text when no one is speaking, and the timings seemed quite correct. Built-in voice separation may come in a future Buzz version.
-
@dilacerated The very latest development version, available here https://github.com/chidiwilliams/buzz/actions/workflows/ci.yml?query=branch%3Amain adds a new feature to extract speech. It separates speech from background noise and should improve transcription accuracy. For the highest quality, try combining speech extraction with subtitle generation from word-level timestamp transcripts.
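To illustrate why word-level timestamps help with the "subtitles during silence" problem, here is a rough sketch of how subtitle cues could be built from them. This is not how Buzz does it internally; `Word`, `group_words`, `to_srt` and the threshold values are all made-up names and numbers for illustration. The idea is simply that a cue starts at its first word and ends at its last word, and a long pause starts a new cue, so nothing stays on screen while nobody is talking.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with its timing (hypothetical structure)."""
    text: str
    start: float  # seconds
    end: float    # seconds


def fmt_ts(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def group_words(words, max_gap=0.6, max_chars=42, max_duration=5.0):
    """Split words into cues at pauses, or when a cue gets too long.

    The thresholds are illustrative guesses, not recommended values.
    """
    cues, current = [], []
    for w in words:
        if current:
            gap = w.start - current[-1].end
            text_len = len(" ".join(x.text for x in current)) + 1 + len(w.text)
            duration = w.end - current[0].start
            if gap > max_gap or text_len > max_chars or duration > max_duration:
                cues.append(current)
                current = []
        current.append(w)
    if current:
        cues.append(current)
    return cues


def to_srt(words, **kwargs):
    """Render word timings as SRT text, one numbered block per cue."""
    blocks = []
    for i, cue in enumerate(group_words(words, **kwargs), start=1):
        text = " ".join(w.text for w in cue)
        blocks.append(
            f"{i}\n{fmt_ts(cue[0].start)} --> {fmt_ts(cue[-1].end)}\n{text}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        Word("Good", 1.0, 1.3),
        Word("evening.", 1.35, 1.9),
        Word("Tea?", 6.0, 6.4),  # four-second pause before this word
    ]
    print(to_srt(demo))
```

In the demo, the pause between "evening." and "Tea?" produces two separate cues, and the first one ends at 1.9 seconds instead of lingering through the silence.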
-
Hey all,
My wife suffered a brain injury in a car accident several years ago, and she struggles to understand dialog without subtitles. To make things more complicated, she loves British programming, which is harder for her because of the accents (there may be an actor or actress she cannot understand clearly at all).
We have old hard copies of many programs from before her accident that I'd like to make subtitles for. I have noticed that with short videos Buzz creates spot-on subtitles using Whisper -> Medium, whereas long videos, with any Whisper model, have problems (subtitles where no speech is going on, sync issues, etc.).
Can anyone recommend settings for me to try out?
Ryzen 9 3900X
NVIDIA 3070
32GB DDR4-3600