Implement streaming audio Websocket #30

Merged
merged 14 commits into from
Oct 9, 2024

NeonDaniel
Member

@NeonDaniel NeonDaniel commented Oct 4, 2024

Description

Implements a WS API for streaming raw audio (input and output) with a per-client listener running in the backend
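To make the data flow concrete, here is a minimal sketch of the byte handling only: splitting raw PCM into chunks on the way in and wrapping PCM in a WAV container on the way out (the notes below mention WAV segments sent as bytes). It is not the code from this PR. The helper names, the 16 kHz 16-bit mono format, and the 1024-byte chunk size are all assumptions chosen for illustration.

```python
import io
import wave


def iter_chunks(data: bytes, chunk_size: int):
    """Yield successive slices of raw audio, as a client might send them."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 16000,
                     sample_width: int = 2, channels: int = 1) -> bytes:
    """Wrap raw PCM in a WAV container; the format defaults are assumed."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


# 0.1 s of silence at the assumed format: 1600 frames * 2 bytes per frame
pcm = bytes(3200)
chunks = list(iter_chunks(pcm, 1024))

# Reassemble the received chunks and build one WAV segment to send back
wav_segment = pcm_to_wav_bytes(b"".join(chunks))
```

Keeping both directions as plain bytes means the socket never needs to know the audio format beyond what the WAV header already carries.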

Issues

Other Notes

  • Consider implementing streaming response audio

    Implemented. The stream socket handles chunked audio input and sends WAV audio segments, all as bytes.

  • Consider wake word (WW) config and included plugins

    Global configuration will be used (at least initially). Plugins will match neon-speech defaults.

  • Consider configuring max allowed clients to prevent overloading the backend server with listener instances

    Implemented in configuration with unit test coverage
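For the last note, a client cap could look roughly like the sketch below. This is an illustration under stated assumptions, not the implementation in this PR: `StreamingClientRegistry`, `ClientLimitExceeded`, and the `max_clients` parameter are hypothetical names, and the real limit is read from configuration.

```python
import threading


class ClientLimitExceeded(Exception):
    """Raised when a new client would exceed the configured cap."""


class StreamingClientRegistry:
    """Hypothetical registry tracking clients that own a listener instance."""

    def __init__(self, max_clients: int):
        self._max_clients = max_clients
        self._clients = set()
        self._lock = threading.Lock()

    def connect(self, client_id: str) -> None:
        with self._lock:
            if client_id in self._clients:
                return  # already registered, nothing to allocate
            if len(self._clients) >= self._max_clients:
                raise ClientLimitExceeded(
                    f"limit of {self._max_clients} streaming clients reached")
            self._clients.add(client_id)

    def disconnect(self, client_id: str) -> None:
        with self._lock:
            self._clients.discard(client_id)

    @property
    def active(self) -> int:
        with self._lock:
            return len(self._clients)


registry = StreamingClientRegistry(max_clients=2)
registry.connect("client-a")
registry.connect("client-b")

try:
    registry.connect("client-c")
    rejected = False
except ClientLimitExceeded:
    rejected = True

# Freeing a slot on disconnect lets the next client in
registry.disconnect("client-a")
registry.connect("client-c")
```

Rejecting at connect time keeps the expensive part (a listener per client) from ever being created for a client over the cap.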

Outline handling of client audio stream
Lazy init streaming when clients connect to the endpoint
TODO note client cleanup upon disconnection
…dio support

Update mocked methods for compat with dinkum 0.1.0+
Add websocket dependencies for streaming client
Add apt dependencies to Dockerfile for Python module builds
Separate streaming dependencies from basic WS
Refactor streaming client code into a separate module
Handle streaming socket retry if too early
Implement streaming audio responses
Remove duplicate docstring not included in OpenAPI pages
@NeonDaniel NeonDaniel marked this pull request as ready for review October 7, 2024 23:09
Comment on lines +58 to +59
stt=Mock(transcribe=Mock(return_value=[])),
fallback_stt=Mock(transcribe=Mock(return_value=[])),

Consider MagicMock, which should stub out all of the necessary methods without having to be explicit

Member Author

Looks like the inferred return type is not a list if I use MagicMock (similar issue with transformers return value)

Exception in thread Thread-6:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.9/site-packages/neon_hana/streaming_client.py", line 67, in run
    self.voice_loop.run()
  File "/usr/local/lib/python3.9/site-packages/ovos_dinkum_listener/voice_loop/voice_loop.py", line 269, in run
    self._after_cmd(chunk)
  File "/usr/local/lib/python3.9/site-packages/ovos_dinkum_listener/voice_loop/voice_loop.py", line 783, in _after_cmd
    utts, stt_context = self._get_tx(stt_context)
  File "/usr/local/lib/python3.9/site-packages/ovos_dinkum_listener/voice_loop/voice_loop.py", line 731, in _get_tx
    filtered = [max(utts, key=lambda k: k[1])]
ValueError: max() arg is an empty sequence
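
One plausible reading of this traceback, sketched with the standard library only: an unconfigured MagicMock returns another MagicMock from any call, and that object is truthy but iterates as empty, so an emptiness check passes and max() then fails. The `pick_best` helper and its guard are hypothetical stand-ins, not the actual ovos_dinkum_listener code.

```python
from unittest.mock import MagicMock, Mock

# Explicit stub, as on the lines under review
explicit = Mock(transcribe=Mock(return_value=[]))
# Unconfigured MagicMock, as suggested in the review
implicit = MagicMock()

explicit_result = explicit.transcribe()  # a real, empty list
implicit_result = implicit.transcribe()  # another MagicMock, not a list


def pick_best(utts):
    """Hypothetical stand-in for the selection step in the traceback."""
    if not utts:  # assumed guard; an empty list stops here
        return []
    return [max(utts, key=lambda k: k[1])]


best_explicit = pick_best(explicit_result)

try:
    pick_best(implicit_result)
    raised = False
except ValueError:
    # MagicMock is truthy by default but supports iteration as iter([])
    raised = True
```

Setting `transcribe.return_value` on the MagicMock removes the difference, since the call then returns a real list.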

@mikejgray mikejgray Oct 9, 2024

You can set the return value to anything you'd like with a MagicMock, but this is definitely not blocking:

Python 3.12.6 (main, Sep  6 2024, 19:03:47) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from unittest.mock import MagicMock
>>>
>>> # Create a mock transcription result
>>> mock_transcription = [("Hello, this is a test", 0.9)]
>>>
>>> # Set up the STT objects using MagicMock
>>> stt = MagicMock()
>>> stt.transcribe.return_value = mock_transcription
>>>
>>> fallback_stt = MagicMock()
>>> fallback_stt.transcribe.return_value = mock_transcription
>>> stt.transcribe
<MagicMock name='mock.transcribe' id='4314005728'>
>>> stt.transcribe()
[('Hello, this is a test', 0.9)]
>>> fallback_stt.transcribe()
[('Hello, this is a test', 0.9)]
>>> type(fallback_stt.transcribe())
<class 'list'>
>>> [max(stt.transcribe(), key=lambda k: k[1])]
[('Hello, this is a test', 0.9)]

Member Author

I see. Six of one, half a dozen of the other IMO, since we're explicitly specifying those methods and their return values either way.

@NeonDaniel NeonDaniel merged commit d851940 into dev Oct 9, 2024
6 checks passed
@NeonDaniel NeonDaniel deleted the FEAT_StreamInputAudio branch October 9, 2024 21:39
2 participants