An audio summarizer that glues together faster-whisper and BART.
Only English summarization is supported.
- Python 3 (tested: 3.12)
Create a Python virtual environment, activate it, and install the required packages:
python3 -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.txt
- In your terminal, make sure your Python virtual environment is activated
- Run audio-summarize.py
./audio-summarize.py -i filepath -o filepath [-m name]
[--summin n] [--summax n] [--segmax n]
options:
-h, --help show this help message and exit
--summin n The minimum length of a segment summary [10] (min: 5)
--summax n The maximum length of a segment summary [90] (min: 5)
--segmax n The maximum number of tokens per segment [375] (5 - 500)
-m name The name of the whisper model to be used [small.en]
-i filepath The path to the media file
-o filepath Where to save the output text to
Example:
./audio-summarize.py -i ./tmp/test.webm -o ./tmp/output.txt
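The options, defaults, and bounds listed above could be parsed roughly as follows. This is a minimal sketch using only the standard library and is not taken from audio-summarize.py: the helper `bounded_int`, the function `build_parser`, and the attribute names `model`, `input`, and `output` are hypothetical names chosen for illustration.

```python
import argparse


def bounded_int(low, high=None):
    """Build an argparse type that accepts integers in [low, high]."""

    def parse(value):
        number = int(value)  # argparse reports a ValueError as invalid input
        if number < low or (high is not None and number > high):
            limit = f">= {low}" if high is None else f"between {low} and {high}"
            raise argparse.ArgumentTypeError(f"must be {limit}, got {number}")
        return number

    return parse


def build_parser():
    """Hypothetical parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(prog="audio-summarize.py")
    parser.add_argument("--summin", metavar="n", type=bounded_int(5), default=10,
                        help="The minimum length of a segment summary [10] (min: 5)")
    parser.add_argument("--summax", metavar="n", type=bounded_int(5), default=90,
                        help="The maximum length of a segment summary [90] (min: 5)")
    parser.add_argument("--segmax", metavar="n", type=bounded_int(5, 500), default=375,
                        help="The maximum number of tokens per segment [375] (5 - 500)")
    parser.add_argument("-m", metavar="name", dest="model", default="small.en",
                        help="The name of the whisper model to be used [small.en]")
    parser.add_argument("-i", metavar="filepath", dest="input", required=True,
                        help="The path to the media file")
    parser.add_argument("-o", metavar="filepath", dest="output", required=True,
                        help="Where to save the output text to")
    return parser
```

With this sketch, `build_parser().parse_args(["-i", "in.webm", "-o", "out.txt"])` yields the documented defaults, and an out-of-range value such as `--segmax 501` makes argparse exit with an error message.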
To summarize a media file, the program executes the following steps:
- Convert and transcribe the media file with faster-whisper, which uses ffmpeg and ctranslate2 under the hood
- Semantically split up the transcript into segments using semantic-text-splitter and the tokenizer for BART
- Summarize each segment using BART (facebook/bart-large-cnn)
- Write the results to a text file
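The steps above can be sketched in Python as shown here. This is an illustrative outline, not the actual contents of audio-summarize.py. All function names are hypothetical, the library calls assume recent releases of faster-whisper, semantic-text-splitter, tokenizers, and transformers, and joining the summaries with blank lines is an assumed output format. The heavy libraries are imported lazily, so each stage can be swapped out independently.

```python
def transcribe(media_path, model_name="small.en"):
    """Transcribe a media file with faster-whisper and return the full text."""
    from faster_whisper import WhisperModel  # lazy import: heavy dependency

    model = WhisperModel(model_name)
    segments, _info = model.transcribe(media_path)  # segments is a lazy generator
    return " ".join(segment.text.strip() for segment in segments)


def split_transcript(text, max_tokens=375):
    """Split text into segments of at most max_tokens BART tokens."""
    from semantic_text_splitter import TextSplitter
    from tokenizers import Tokenizer

    # Assumption: the model repository provides a tokenizer.json file.
    tokenizer = Tokenizer.from_pretrained("facebook/bart-large-cnn")
    # Assumption: a recent semantic-text-splitter release, where the capacity
    # is passed to the constructor; older releases took it in chunks().
    splitter = TextSplitter.from_huggingface_tokenizer(tokenizer, max_tokens)
    return splitter.chunks(text)


def make_summarizer(min_length=10, max_length=90):
    """Return a function that summarizes one segment with BART."""
    from transformers import pipeline

    # Assumption: a transformers release that provides the summarization pipeline.
    bart = pipeline("summarization", model="facebook/bart-large-cnn")

    def summarize(segment):
        result = bart(segment, min_length=min_length, max_length=max_length)
        return result[0]["summary_text"]

    return summarize


def summarize_media(media_path, output_path, transcribe_fn=transcribe,
                    split_fn=split_transcript, summarize_fn=None):
    """Run transcribe -> split -> summarize -> write, and return the summaries."""
    if summarize_fn is None:
        summarize_fn = make_summarizer()
    transcript = transcribe_fn(media_path)
    summaries = [summarize_fn(segment) for segment in split_fn(transcript)]
    with open(output_path, "w", encoding="utf-8") as output_file:
        output_file.write("\n\n".join(summaries) + "\n")
    return summaries
```

Calling `summarize_media("./tmp/test.webm", "./tmp/output.txt")` would then run all four steps with the default settings. Passing replacement functions for the individual stages makes it possible to try the orchestration without downloading any models.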