[QUESTION] How to use live ASR? #349

FerLuisxd · 2025-01-13T00:56:34Z

Question or Issue

Is there a way to transcribe audio in real time using Whisper large-v3-turbo?

OS

No response

Python Version

No response

Nexa SDK Version

No response

GPU (if using one)

No response

DanieleMorotti · 2025-01-13T12:00:16Z

I see that they implemented some new features for audio models, such as streaming and live transcription, but at the moment I don't see an easy way to use them.

If you want, you can try this code, which worked for me:

```python
import sounddevice as sd
import numpy as np
import signal


class MicrophoneTranscriber:
    def __init__(self, model_instance):
        self.model = model_instance
        self.is_recording = False

        # Audio parameters
        self.channels = 1
        self.rate = 16000  # Must match the model's expected sample rate
        self.chunk = 1024  # Samples per callback (64 ms at 16 kHz)

        # Set up a SIGINT (Ctrl+C) handler for graceful shutdown
        signal.signal(signal.SIGINT, self.signal_handler)

    def signal_handler(self, signum, frame):
        print("\nStopping transcription...")
        self.stop_transcription()

    def audio_callback(self, indata, frames, time, status):
        if status:
            print(f"Status: {status}")
        if self.is_recording:
            # The stream already delivers float32 samples; take the mono channel.
            # astype() returns a copy, so the data outlives sounddevice's buffer.
            audio_data = indata[:, 0].astype(np.float32)
            self.model.insert_audio_chunk(audio_data)

            # Get the transcription for the current buffer
            start, end, text = self.model.process_iter()
            if text:
                print(f"\rTranscription: {text}", end='', flush=True)

    def start_transcription(self):
        self.is_recording = True

        # Start the audio input stream
        self.stream = sd.InputStream(
            channels=self.channels,
            samplerate=self.rate,
            blocksize=self.chunk,
            callback=self.audio_callback,
            dtype=np.float32
        )
        self.stream.start()

        print("Recording... Press Ctrl+C to stop")

        try:
            # Keep the main thread alive; the SIGINT handler ends this loop
            while self.is_recording:
                sd.sleep(100)  # sounddevice's sleep, in milliseconds
        except KeyboardInterrupt:
            # Fallback in case the custom SIGINT handler was not installed
            self.stop_transcription()

    def stop_transcription(self):
        self.is_recording = False

        # Stop and close the audio stream
        if hasattr(self, 'stream'):
            self.stream.stop()
            self.stream.close()

        # Get the final transcription
        start, end, text = self.model.finish()
        if text:
            print(f"\nFinal transcription: {text}")


def run_microphone_transcription(model_path=None, local_path=None, **kwargs):
    """Run real-time microphone transcription using the specified model."""
    from nexa.gguf import NexaVoiceInference

    # Initialize the model
    inference = NexaVoiceInference(
        model_path=model_path,
        local_path=local_path,
        **kwargs
    )

    # Create and start the microphone transcriber
    transcriber = MicrophoneTranscriber(inference)
    transcriber.start_transcription()


if __name__ == "__main__":
    # Example usage
    run_microphone_transcription(
        model_path="faster-whisper-tiny",  # Replace with your preferred model
        language="en",
        beam_size=5
    )
```
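One thing to watch in the code above: audio_callback calls process_iter() on sounddevice's audio thread. If one model step takes longer than a block (1024 samples at 16 kHz is 64 ms), PortAudio can report input overflows and audio may be dropped, which becomes more likely with larger models such as large-v3-turbo. A common pattern is to let the callback only enqueue audio and run the model on the main thread. This is a minimal sketch, assuming the model exposes the same insert_audio_chunk / process_iter / finish methods used above; the helper names make_audio_callback and transcribe_from_queue are hypothetical.

```python
import queue
import threading
from typing import Any, Callable, List, Optional, Protocol, Tuple


class StreamingASR(Protocol):
    """Methods the transcriber calls on the model (as in the code above)."""

    def insert_audio_chunk(self, audio: Any) -> None: ...

    def process_iter(self) -> Tuple[Any, Any, str]: ...

    def finish(self) -> Tuple[Any, Any, str]: ...


def make_audio_callback(chunks: "queue.Queue[Any]") -> Callable[..., None]:
    """Build a sounddevice-style callback that only enqueues audio."""

    def callback(indata, frames, time, status):
        if status:
            print(f"Status: {status}")
        # Copy the first (mono) channel: sounddevice reuses `indata`
        # after the callback returns, so a view must not be queued.
        chunks.put(indata[:, 0].copy())

    return callback


def transcribe_from_queue(
    model: StreamingASR,
    chunks: "queue.Queue[Any]",
    stop: threading.Event,
    on_text: Optional[Callable[[str], None]] = None,
    poll_s: float = 0.1,
) -> List[str]:
    """Feed queued chunks to the model on the calling thread until `stop`
    is set and the queue is drained, then call finish()."""
    texts: List[str] = []

    def emit(text: str) -> None:
        if text:
            texts.append(text)
            if on_text is not None:
                on_text(text)

    while not (stop.is_set() and chunks.empty()):
        try:
            chunk = chunks.get(timeout=poll_s)
        except queue.Empty:
            continue  # nothing yet; re-check the stop flag
        model.insert_audio_chunk(chunk)
        # Insert everything that piled up while the model was busy,
        # then run one (expensive) model step for the whole batch.
        while True:
            try:
                model.insert_audio_chunk(chunks.get_nowait())
            except queue.Empty:
                break
        emit(model.process_iter()[2])
    emit(model.finish()[2])
    return texts
```

In MicrophoneTranscriber this would mean passing make_audio_callback(q) as the InputStream callback, calling transcribe_from_queue(self.model, q, stop, on_text=print) in place of the sd.sleep loop, and having the SIGINT handler only call stop.set(). Batching all queued chunks into one process_iter() call lets the model catch up when it falls behind, at the cost of some extra latency.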

You need to install sounddevice and the latest version of the package (clone this repo and follow the instructions for a local build), because the package published on pip has not been updated with these features yet.
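To exercise the streaming loop without a microphone (or before sounddevice is installed), one option is to replay a WAV file in the same 1024-sample chunks the callback receives. The sketch below uses only the standard library; it assumes a 16-bit mono PCM file already at 16 kHz, and the helper name wav_chunks is hypothetical. It also shows the int16 to float conversion (divide by 32768) that would be needed if the stream were opened with dtype=np.int16 instead of float32.

```python
import array
import sys
import wave
from typing import Iterator, List


def wav_chunks(path: str, chunk: int = 1024,
               expected_rate: int = 16000) -> Iterator[List[float]]:
    """Yield successive chunks of a 16-bit mono WAV file as floats in [-1, 1)."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        if wf.getframerate() != expected_rate:
            raise ValueError(
                f"sample rate {wf.getframerate()} != {expected_rate}; resample first"
            )
        while True:
            raw = wf.readframes(chunk)  # at most `chunk` frames
            if not raw:
                break
            samples = array.array("h")  # signed 16-bit integers
            samples.frombytes(raw)
            if sys.byteorder == "big":
                samples.byteswap()  # WAV sample data is little-endian
            # Scale int16 to float, the range sounddevice uses for float32 input
            yield [s / 32768.0 for s in samples]
```

Mirroring the microphone version, each chunk could be passed to insert_audio_chunk() (for example as np.asarray(chunk, dtype=np.float32), on the assumption that the model expects a NumPy array as in the code above), followed by process_iter(), with finish() called once the file is exhausted.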
