This notebook transcribes YouTube videos using WhisperX
- pytubefix - For downloading YouTube videos
- WhisperX - For high-accuracy transcription (https://github.com/m-bain/whisperX)
- If using CUDA 11.x: Install ctranslate2==3.24.0
- Install numpy<2 if required
- Requires GPU with CUDA support
-
Install WhisperX: Please refer to https://github.com/m-bain/whisperX
-
Install other dependencies:
pip install -r requirements.txt
-
Create a .env file with your API keys:
OPENAI_API_KEY=your_openai_key HF_AUTH_TOKEN=your_huggingface_token
# For YouTube videos
python transcribe_video.py --youtube https://www.youtube.com/watch?v=ZEyPHhBKgJ4
# For local audio files
python transcribe_video.py --audio path/to/audio.mp3
# transcribe-only
python transcribe_video.py --audio path/to/audio.mp3 --transcribe-only
# For summarize text, will use general summary prompt combine summary + QA
python summarize.py "script.txt/.md"
# For summarize earning call
python summarize.py "script.txt/.md" --transcript_type earning_transcript
magic-pdf -p "path_to_pdf.pdf" -o "out_put_path" -m auto