
websocket: Fix websocket data being modelled as a stream internally #2638

Open
p12tic wants to merge 1 commit into master from websocket-data-source-sink
Conversation

p12tic
Contributor

@p12tic p12tic commented Feb 6, 2025

Currently WebSocket implementation has the following input data path:

input_stream -> websocket parser -> queue -> data_sink -> input_stream wrapper

On the output side, the data path is as follows:

output_stream wrapper -> data_source -> queue -> websocket serializer -> output_stream

The input_stream and output_stream wrappers are what is exposed to the user. This is problematic because WebSocket is a message-based protocol, while streams have no concept of messages; they model data as a stream of bytes. As a result, the payloads that the user sends or receives may not correspond to a single websocket message, breaking the payload's own protocol in most cases: for example, JSON or protobuf payloads no longer decode to valid data.

In practice this behavior shows up under high load, when the stream wrappers start to coalesce and split messages in the write path because more than one message is available at a time.

The solution is to expose the message-queue-backed data_sink and data_source directly to the user.

@xemul
Contributor

xemul commented Feb 7, 2025

More specifically, the current code results in problems on high load, when the stream wrappers start to coalesce and split messages in the write path due to more than one message being available at a time.

Why is that a problem? The way it's described here, it looks like the code sends two messages into the output stream, those two messages are squashed together, and they are sent out on the wire. That sounds like a nice optimization to me, not a problem.

@p12tic
Contributor Author

p12tic commented Feb 7, 2025

Why is that a problem? The way it's described here, it looks like the code sends two messages into the output stream, those two messages are squashed together, and they are sent out on the wire. That sounds like a nice optimization to me, not a problem.

The problem is that parts of more than one message can be squashed into a single websocket message. This can completely break the payload's own protocol: JSON- or protobuf-encoded payloads no longer decode successfully, and so on.

I have edited the commit and PR descriptions to make this clear.

Currently WebSocket implementation has the following input data path:

input_stream -> websocket parser -> queue -> data_sink -> input_stream
wrapper

On the output side, the data path is as follows:

output_stream wrapper -> data_source -> queue -> websocket serializer ->
output_stream

The input_stream and output_stream wrappers are what is exposed to the
user. This is problematic, because WebSocket is a message-based protocol
and streams do not have the concept of messages, they model data as a
stream of bytes. As a result, the payloads that the user sends or
receives will not correspond to a single websocket message, breaking the
underlying protocol in most cases. E.g. json or protobuf payloads no
longer contain valid data and so on.

Currently this behavior is seen on high load, when the stream wrappers
start to coalesce and split messages in the write path due to more than
one message being available at a time.

The solution is to expose data_sink and data_source that are backed by
message queues directly to the user.
@p12tic p12tic force-pushed the websocket-data-source-sink branch from 7bd64b2 to 0543087 Compare February 7, 2025 10:37