[QUESTION] How to use live ASR? #349
I see that they implemented some new features for audio models, such as streaming and live transcription, but at the moment I don't see an easy way to use them. If you want, you can try this code, which worked for me:

```python
import signal

import numpy as np
import sounddevice as sd


class MicrophoneTranscriber:
    def __init__(self, model_instance):
        self.model = model_instance
        self.is_recording = False
        self.stream = None
        # Audio parameters
        self.channels = 1
        self.rate = 16000  # Must match the model's expected sample rate
        self.chunk = 1024
        # Set up a signal handler for graceful shutdown on Ctrl+C
        signal.signal(signal.SIGINT, self.signal_handler)

    def signal_handler(self, signum, frame):
        print("\nStopping transcription...")
        self.stop_transcription()

    def audio_callback(self, indata, frames, time_info, status):
        if status:
            print(f"Status: {status}")
        if self.is_recording:
            # The stream already delivers float32 samples; take the first (only)
            # channel and copy it, because sounddevice reuses the indata buffer
            audio_data = indata[:, 0].copy()
            self.model.insert_audio_chunk(audio_data)
            # Get transcription for the current buffer.
            # This runs on the audio thread, so a slow model can cause input overflows.
            start, end, text = self.model.process_iter()
            if text:
                print(f"\rTranscription: {text}", end="", flush=True)

    def start_transcription(self):
        self.is_recording = True
        # Start audio stream
        self.stream = sd.InputStream(
            channels=self.channels,
            samplerate=self.rate,
            blocksize=self.chunk,
            callback=self.audio_callback,
            dtype=np.float32,
        )
        self.stream.start()
        print("Recording... Press Ctrl+C to stop")
        try:
            # Keep the main thread alive until the signal handler stops recording
            while self.is_recording:
                sd.sleep(100)  # sounddevice's sleep, in milliseconds
        except KeyboardInterrupt:
            self.stop_transcription()

    def stop_transcription(self):
        if not self.is_recording:
            return  # Already stopped; avoid closing the stream twice
        self.is_recording = False
        # Stop and close the audio stream
        if self.stream is not None:
            self.stream.stop()
            self.stream.close()
        # Get final transcription
        start, end, text = self.model.finish()
        if text:
            print(f"\nFinal transcription: {text}")


def run_microphone_transcription(model_path=None, local_path=None, **kwargs):
    """Run real-time microphone transcription using the specified model."""
    from nexa.gguf import NexaVoiceInference

    # Initialize the model
    inference = NexaVoiceInference(
        model_path=model_path,
        local_path=local_path,
        **kwargs,
    )
    # Create and start the microphone transcriber
    transcriber = MicrophoneTranscriber(inference)
    transcriber.start_transcription()


if __name__ == "__main__":
    # Example usage
    run_microphone_transcription(
        model_path="faster-whisper-tiny",  # Replace with your preferred model
        language="en",
        beam_size=5,
    )
```

You need to install
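One way to keep inference off the audio thread is to let the callback only enqueue blocks and have the main thread feed the model. The sketch below is a hypothetical variant, not code from this thread or the Nexa SDK; it assumes the model exposes the same `insert_audio_chunk()` / `process_iter()` / `finish()` methods the script above calls, and `QueuedTranscriber`, `EchoModel`, and `run_live` are illustrative names only.

```python
import queue
import time

import numpy as np


class QueuedTranscriber:
    """Hypothetical variant: the audio callback only enqueues, the main thread runs the model."""

    def __init__(self, model):
        self.model = model
        self.audio_q = queue.Queue()

    def audio_callback(self, indata, frames, time_info, status):
        # Runs on the PortAudio thread: keep it cheap, no inference here.
        if status:
            print(f"Status: {status}")
        # Copy, because sounddevice reuses the indata buffer between calls.
        self.audio_q.put(indata[:, 0].copy())

    def step(self, timeout=0.1):
        """Feed every queued block to the model, run one iteration, return its text (None if nothing queued)."""
        try:
            chunks = [self.audio_q.get(timeout=timeout)]
        except queue.Empty:
            return None
        while True:  # drain whatever else arrived while the model was busy
            try:
                chunks.append(self.audio_q.get_nowait())
            except queue.Empty:
                break
        self.model.insert_audio_chunk(np.concatenate(chunks))
        _, _, text = self.model.process_iter()
        return text


class EchoModel:
    """Hypothetical stand-in with the same three methods, for testing without a real model."""

    def __init__(self):
        self.buffer = np.zeros(0, dtype=np.float32)

    def insert_audio_chunk(self, audio):
        self.buffer = np.concatenate([self.buffer, audio])

    def process_iter(self):
        return 0.0, len(self.buffer) / 16000, f"{len(self.buffer)} samples"

    def finish(self):
        return 0.0, len(self.buffer) / 16000, ""


def run_live(model, seconds=10.0):
    """Capture from the default microphone for `seconds`, printing partial text as it arrives."""
    import sounddevice as sd  # imported here so the classes above work without audio hardware

    t = QueuedTranscriber(model)
    with sd.InputStream(channels=1, samplerate=16000, blocksize=1024,
                        dtype="float32", callback=t.audio_callback):
        deadline = time.monotonic() + seconds
        while time.monotonic() < deadline:
            text = t.step()
            if text:
                print(f"\rTranscription: {text}", end="", flush=True)
    t.step(timeout=0)  # feed any blocks still queued after the stream closed
    _, _, final = model.finish()
    return final
```

Because each `step()` feeds everything that queued up while the previous `process_iter()` call was running, a model slower than real time should produce fewer, larger updates rather than stalling the callback. With the script above, you would pass the `NexaVoiceInference` instance created in `run_microphone_transcription` to `run_live`.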
Question or Issue
Is there a way to transcribe in real time using Whisper large-v3-turbo?
OS
No response
Python Version
No response
Nexa SDK Version
No response
GPU (if using one)
No response
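Whether a larger model such as Whisper large-v3-turbo can run live with the microphone script in this thread comes down to its real-time factor (RTF): processing time divided by the duration of audio processed. The helper below is a generic sketch, not part of the Nexa SDK; it assumes the script's 16 kHz sample rate and 1024-sample blocks, and `measure_rtf` and `fake_step` are hypothetical names.

```python
import time
from typing import Callable


def block_duration_s(blocksize: int, samplerate: int) -> float:
    """Seconds of audio delivered in one callback block."""
    return blocksize / samplerate


def measure_rtf(step: Callable[[], object], audio_s: float, repeats: int = 3) -> float:
    """Average wall-clock time of `step` divided by the audio duration it covers.

    RTF < 1.0 means the step finishes before the next block of audio arrives.
    """
    start = time.perf_counter()
    for _ in range(repeats):
        step()
    elapsed = (time.perf_counter() - start) / repeats
    return elapsed / audio_s


if __name__ == "__main__":
    budget = block_duration_s(1024, 16000)
    print(f"Each 1024-sample block is {budget * 1000:.0f} ms of audio")
    # Hypothetical stand-in for one model step; replace with a real process_iter() call.
    fake_step = lambda: time.sleep(0.01)
    rtf = measure_rtf(fake_step, budget)
    print(f"RTF: {rtf:.2f} ({'keeps up' if rtf < 1 else 'falls behind'})")
```

Since the script calls `process_iter()` once per 1024-sample block, each call has roughly 64 ms of budget; if the measured RTF is above 1, you will likely see `input overflow` status messages from the stream.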