[QUESTION] How to use live ASR? #349

Open
FerLuisxd opened this issue Jan 13, 2025 · 1 comment

Comments

@FerLuisxd

Question or Issue

Is there a way to transcribe audio in real time using Whisper large-v3-turbo?

OS

No response

Python Version

No response

Nexa SDK Version

No response

GPU (if using one)

No response

@DanieleMorotti

I see that they have implemented some new features for audio models, such as streaming and live transcription, but at the moment I don't see an easy way to use them.

If you want, you can try this code, which worked for me:

import sounddevice as sd
import numpy as np
import signal


class MicrophoneTranscriber:
    def __init__(self, model_instance):
        self.model = model_instance
        self.is_recording = False
        
        # Audio parameters
        self.channels = 1
        self.rate = 16000  # Must match the model's expected sample rate
        self.chunk = 1024
        
        # Setup signal handler for graceful shutdown
        signal.signal(signal.SIGINT, self.signal_handler)
        
    def signal_handler(self, signum, frame):
        print("\nStopping transcription...")
        self.stop_transcription()
        
    def audio_callback(self, indata, frames, time, status):
        if status:
            print(f"Status: {status}")
        if self.is_recording:
            # The stream already delivers float32 samples; take channel 0
            # (astype returns a copy, so the data outlives this callback)
            audio_data = indata[:, 0].astype(np.float32)
            self.model.insert_audio_chunk(audio_data)
            
            # Get transcription for the current buffer
            start, end, text = self.model.process_iter()
            if text:
                print(f"\rTranscription: {text}", end='', flush=True)
            
    def start_transcription(self):
        self.is_recording = True
        
        # Start audio stream
        self.stream = sd.InputStream(
            channels=self.channels,
            samplerate=self.rate,
            blocksize=self.chunk,
            callback=self.audio_callback,
            dtype=np.float32
        )
        self.stream.start()
        
        print("Recording... Press Ctrl+C to stop")
        
        try:
            # Keep the main thread alive
            while self.is_recording:
                sd.sleep(100)  # Use sounddevice's sleep function
        except KeyboardInterrupt:
            self.stop_transcription()
    
    def stop_transcription(self):
        self.is_recording = False
        
        # Stop and close the audio stream
        if hasattr(self, 'stream'):
            self.stream.stop()
            self.stream.close()
        
        # Get final transcription
        start, end, text = self.model.finish()
        if text:
            print(f"\nFinal transcription: {text}")

def run_microphone_transcription(model_path=None, local_path=None, **kwargs):
    """Run real-time microphone transcription using the specified model."""
    from nexa.gguf import NexaVoiceInference
    
    # Initialize the model
    inference = NexaVoiceInference(
        model_path=model_path,
        local_path=local_path,
        **kwargs
    )
    
    # Create and start the microphone transcriber
    transcriber = MicrophoneTranscriber(inference)
    transcriber.start_transcription()


if __name__ == "__main__":
    # Example usage
    run_microphone_transcription(
        model_path="faster-whisper-tiny",  # Replace with your preferred model
        language="en",
        beam_size=5
    )
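
As a side note on the audio parameters in the class above: the block size and sample rate together determine how often the callback fires. A small sketch of that arithmetic (the helper name `chunk_duration_ms` is made up for illustration):

```python
def chunk_duration_ms(blocksize: int, samplerate: int) -> float:
    """Duration of one audio block in milliseconds."""
    return 1000.0 * blocksize / samplerate


# Values used in the class above: blocksize=1024, samplerate=16000
print(chunk_duration_ms(1024, 16000))  # 64.0
```

So the callback runs roughly every 64 ms. If inserting the chunk and running `process_iter()` takes longer than that, sounddevice will likely report an input overflow through the `status` argument, in which case a larger `chunk` value may help.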

You need to install sounddevice and the latest version of the package (clone this repo and follow the instructions for a local build), because the version installed with pip has not been updated yet.
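
If the model is too slow to run inside the audio callback, one possible variation is to have the callback only queue the samples and let a worker thread do the transcription. The following is a minimal sketch, not something from the SDK: `QueuedTranscriber` and `FakeModel` are hypothetical names, and it assumes the model exposes `insert_audio_chunk()`, `process_iter()` and `finish()` as in the code above. `FakeModel` is only a stand-in so the data flow can be run without a microphone or a real model.

```python
import queue
import threading


class QueuedTranscriber:
    """Hand audio chunks to a worker thread so the audio callback stays fast."""

    def __init__(self, model, on_text=print):
        self.model = model        # assumed API: insert_audio_chunk/process_iter/finish
        self.on_text = on_text    # called with each non-empty partial transcript
        self.chunks = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)

    def audio_callback(self, indata, frames, time, status):
        # Runs on the audio thread: copy the samples and return immediately.
        self.chunks.put(indata[:, 0].copy())

    def start(self):
        self.worker.start()

    def _run(self):
        while True:
            chunk = self.chunks.get()
            if chunk is None:     # sentinel pushed by stop()
                break
            self.model.insert_audio_chunk(chunk)
            _start, _end, text = self.model.process_iter()
            if text:
                self.on_text(text)

    def stop(self):
        # Let the worker drain the queue, then ask the model for the final text.
        self.chunks.put(None)
        self.worker.join()
        _start, _end, text = self.model.finish()
        return text


class FakeModel:
    """Hypothetical stand-in for the real model, only to show the data flow."""

    def __init__(self):
        self.count = 0

    def insert_audio_chunk(self, chunk):
        self.count += 1

    def process_iter(self):
        return (None, None, f"chunk {self.count}")

    def finish(self):
        return (None, None, f"done after {self.count} chunks")


if __name__ == "__main__":
    seen = []
    transcriber = QueuedTranscriber(FakeModel(), on_text=seen.append)
    transcriber.start()
    for _ in range(3):
        transcriber.chunks.put([0.0] * 1024)  # in real use, audio_callback does this
    print(transcriber.stop())  # done after 3 chunks
    print(seen)                # ['chunk 1', 'chunk 2', 'chunk 3']
```

With a real model, `audio_callback` would be passed as `callback=` to `sd.InputStream` in the same way as in the class above. The `.copy()` matters because sounddevice only guarantees the `indata` buffer for the duration of the callback.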
