Maximum inlined data size is 1024 bytes by default. Good? Bad? #4

Open
petersilva opened this issue Feb 14, 2019 · 7 comments
Labels
Done has been implemented, but additional discussion needed want_feedback something is implemented, but further review sought

Comments

@petersilva
Contributor

This is a reference to #3. Separate from implementation concerns, inlining large data will have a severe effect on broker performance, so this issue will try to document a consensus value.

@petersilva
Contributor Author

really don't want this to be big... I'll say 1000 bytes.

@josusky
Contributor

josusky commented Feb 14, 2019

I think that 1000 bytes must be enough for everybody. On the other hand, if an institution publishes messages that are too big, nobody will subscribe to them.
In the end such an institution will harm itself, because clients will poll the directory tree instead (and generate unnecessary load on the server). So the size will organically self-regulate :-)

@petersilva
Contributor Author

This is implemented in the wmo_mesh example now: an --inline option, with --inline_max to experiment with the maximum inline message size.
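
For readers who want a picture of what such a threshold does, here is a minimal sketch of the decision, not the actual wmo_mesh code. The function name `build_message`, the field names (`relPath`, `size`, `content`, `encoding`, `value`) and the base64 encoding are assumptions made for illustration; only the idea of an inline switch and a maximum size comes from the comment above.

```python
import base64


def build_message(rel_path, data, inline=False, inline_max=1024):
    """Build an announcement for a file, embedding the data only when small.

    Hypothetical helper: the field names are illustrative, not a real schema.
    """
    msg = {"relPath": rel_path, "size": len(data)}
    if inline and len(data) <= inline_max:
        # base64 keeps arbitrary binary data safe inside a JSON payload,
        # at the cost of roughly one third more bytes on the wire.
        msg["content"] = {
            "encoding": "base64",
            "value": base64.b64encode(data).decode("ascii"),
        }
    return msg


small = build_message("bulletins/a.txt", b"x" * 100, inline=True)
large = build_message("grids/b.grib", b"x" * 5000, inline=True)
print("content" in small, "content" in large)
```

Anything over the limit is announced without a body, so the subscriber fetches it separately as before.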

@petersilva
Contributor Author

petersilva commented Feb 18, 2019

The self-regulation idea is a good one. On one hand, including the data in the payload saves time for small bulletins. On the other hand:

  • if one is subscribing to two sources for all products, then one will only use the copy of each product that arrives first, and all inlined data that does not arrive first is wasted (it would not have been downloaded if it were not inlined.)

  • the server-side filtering possible with MQTT (or AMQP) is fairly coarse, so one must, in general, request more messages than one genuinely intends to download. These other messages are filtered out by client-side reject clauses, so inlined data is transferred for many messages only to be rejected on the client side.

  • inlining worsens performance in a LAN. Where the round-trip time is negligible, the optimization is negligible too, and is likely drowned out by the reduced message processing rate. In the LAN case using AMQP, one wants to spread the requests out to many instances, which is done more quickly without inlining. In Sarracenia, SFTP sessions are maintained, so while there is a round trip for the get request, one does not pay connection establishment on each transfer.
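
To put rough numbers on the first two points, here is a small back-of-the-envelope sketch. The function `inline_overhead` and the sample sizes are hypothetical, and the model is deliberately simple: every source sends its own copy of each message, and encoding overhead is ignored.

```python
def inline_overhead(sizes, inline_max, n_sources=1, wanted=None):
    """Return (useful_bytes, wasted_bytes) of inlined data seen by one subscriber.

    sizes      : file sizes in bytes, one per announced product
    inline_max : products larger than this are not inlined at all
    n_sources  : number of sources announcing every product
    wanted     : per-product flags; False means a client-side reject clause
                 drops the message after it has already been received
    """
    if wanted is None:
        wanted = [True] * len(sizes)
    useful = 0
    wasted = 0
    for size, is_wanted in zip(sizes, wanted):
        if size > inline_max:
            continue  # not inlined: no data travels in the message body
        if is_wanted:
            useful += size                    # the first copy to arrive is used
            wasted += size * (n_sources - 1)  # later copies are discarded
        else:
            wasted += size * n_sources        # every copy is rejected
    return useful, wasted


# Illustrative sizes only: two small products (one rejected) and one large one.
print(inline_overhead([500, 800, 5000], 1024, n_sources=2,
                      wanted=[True, False, True]))
```

In this made-up case only 500 bytes of inlined data are used while 2100 bytes are received for nothing, which is the kind of ratio that grows with the inline limit.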

@petersilva petersilva added the want_feedback something is implemented, but further review sought label Feb 18, 2019
@petersilva petersilva changed the title Maximum data size needs to be indicated for inclusion in a message. Maximum inlined data size is 1024 bytes by default. Good? Bad? Feb 22, 2019
@petersilva
Contributor Author

On the current feed from hpfx.collab, I upped the maximum to 2048 to get more files inlined, which provides more frequent demonstration.

@petersilva petersilva added the Done has been implemented, but additional discussion needed label Mar 31, 2019
@petersilva
Contributor Author

petersilva commented Mar 7, 2020

@davidpodeur brought up an interesting case:

  • relatively high speed transfer, but very long latency (satellite link)
  • regardless of how many instances run in parallel, the performance is much worse than creating periodic buckets as tar files and sending those.

One would need fifty or more parallel transfers to catch up with tar files. In this instance, a much higher limit for the size of embedded data makes sense, or an extended message type that refers to a tar bucket.
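
A rough way to see why latency dominates here is the toy model below. It assumes each individual transfer pays one round trip, that parallel sessions overlap their round trips while sharing the link bandwidth, and that a tar bucket pays a single round trip. All function names and numbers are illustrative assumptions, not measurements from the case described above.

```python
import math


def time_individual(n_files, file_size, bandwidth, rtt, parallel):
    """Seconds to fetch n_files one by one over `parallel` sessions."""
    rounds = math.ceil(n_files / parallel)  # round trips paid one after another
    return rounds * rtt + n_files * file_size / bandwidth


def time_bucket(n_files, file_size, bandwidth, rtt):
    """Seconds to fetch the same data as a single tar bucket."""
    return rtt + n_files * file_size / bandwidth


def parallel_needed(n_files, file_size, bandwidth, rtt, slack=2.0):
    """Smallest session count whose time is within `slack` x the bucket time."""
    target = slack * time_bucket(n_files, file_size, bandwidth, rtt)
    for p in range(1, n_files + 1):
        if time_individual(n_files, file_size, bandwidth, rtt, p) <= target:
            return p
    return None  # only reachable when slack < 1


# Hypothetical link: 1000 files of 2000 bytes, 1.25e6 bytes/s, 0.6 s round trip.
print(parallel_needed(1000, 2000, 1.25e6, 0.6))
```

Under these assumptions the number of sessions needed scales with the number of small files per bucket, which is why raising the inline limit or referring to a tar bucket looks attractive on such links.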

@josusky
Contributor

josusky commented Mar 12, 2020

Well, we definitely cannot set one hard limit to fit all use-cases. It needs to remain configurable. We can just recommend - something like: "Keep it in kilobytes unless you are sure that your use case will benefit from a higher limit. Avoid going to megabytes unless your data is distributed only to a restricted group of systems that can cope with it."
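
As one possible way to turn that recommendation into a configuration check, here is a small sketch. The function `advise_inline_max`, its `restricted_audience` flag and the 1 MiB threshold are assumptions chosen to mirror the wording above; nothing like this is claimed to exist in the implementation.

```python
MIB = 1024 * 1024


def advise_inline_max(inline_max, restricted_audience=False):
    """Classify a configured inline limit against the suggested guidance.

    Returns "ok" for limits below one mebibyte, "ok-restricted" for larger
    limits when the data only goes to systems known to cope with it, and
    "warn" otherwise.
    """
    if inline_max < MIB:
        return "ok"
    if restricted_audience:
        return "ok-restricted"
    return "warn"


for limit in (1024, 2048, 4 * MIB):
    print(limit, advise_inline_max(limit))
```

The limit stays fully configurable; the check only flags values that the guidance suggests deserve a second look.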


2 participants