- Step 0: Download audio files from RTHK
- Step 1: Split audio files into smaller chunks
- Step 2: Source separation
- Step 3: Voice enhancement
- Step 4: Transcribe audio files
  - Step 4.1: Transcribe audio files using SenseVoiceSmall with language identification (LID)
  - Step 4.2: Transcribe audio files using Whisper V3
  - Step 4.3: Transcribe audio files using Cantonese Whisper V2
- Step 5: Transcription post-processing
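
To illustrate the chunk-splitting step, here is a minimal sketch of fixed-length chunking. It is not the repository's actual implementation: the function name `chunk_bounds`, the 30-second chunk length, and the fixed-length strategy itself are assumptions made for illustration (the real script may split on silence or voice activity instead). Only the 16 kHz sample rate comes from this README.

```python
from typing import List, Tuple

SAMPLE_RATE = 16_000  # the 16 kHz rate mentioned in this README


def chunk_bounds(
    num_samples: int,
    chunk_seconds: float = 30.0,  # hypothetical chunk length, not from the repo
    sample_rate: int = SAMPLE_RATE,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices for consecutive fixed-length chunks.

    The last chunk is shorter when the audio length is not an exact multiple
    of the chunk length. `end` is exclusive, so audio[start:end] is one chunk.
    """
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    chunk_len = int(chunk_seconds * sample_rate)
    if chunk_len <= 0:
        raise ValueError("chunk length must be at least one sample")
    return [
        (start, min(start + chunk_len, num_samples))
        for start in range(0, num_samples, chunk_len)
    ]
```

For example, `chunk_bounds(70 * SAMPLE_RATE)` describes a 70-second recording as two full 30-second chunks followed by one 10-second remainder.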
pip install -r requirements.txt
# Download the audio files and convert them to 16 kHz. This creates an `audios` folder for the original files and an `audios_16k` folder for the 16 kHz files
python step-0.py
# Source separation: remove background music
python step-1.py --audio_root_path audios_16k
# Split audio files into smaller chunks
python step-2.py --audio_root_path chunks
TODO...
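
The 16 kHz conversion mentioned in the download step could be done in several ways; the sketch below shows one possibility, building an `ffmpeg` command per file. It is an assumption that the repository uses `ffmpeg` at all, and the helper name `ffmpeg_resample_cmd`, the `.wav` output format, and the mono downmix (`-ac 1`) are hypothetical choices. Only the `audios` and `audios_16k` folder names come from this README.

```python
from pathlib import Path
from typing import List


def ffmpeg_resample_cmd(
    src: Path,
    dst_root: Path = Path("audios_16k"),
    sample_rate: int = 16_000,
) -> List[str]:
    """Build an ffmpeg argument list that resamples `src` into `dst_root`.

    Hypothetical helper: writes a WAV file named after the source file.
    """
    dst = dst_root / (src.stem + ".wav")
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", str(src),          # input file
        "-ar", str(sample_rate), # output audio sample rate in Hz
        "-ac", "1",              # assumption: downmix to mono
        str(dst),
    ]
```

To actually run the conversion, each command can be passed to `subprocess.run(cmd, check=True)` after creating the output folder, for every file found under `audios`. Building the argument list separately keeps the command easy to inspect before anything is executed.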