Doc: Add topic and expand info for in-memory queue #13246

Merged (6 commits, Oct 5, 2021). Changes shown from 4 commits.
3 changes: 3 additions & 0 deletions docs/index.asciidoc
@@ -178,6 +178,9 @@ include::static/filebeat-modules.asciidoc[]
:edit_url: https://github.com/elastic/logstash/edit/{branch}/docs/static/resiliency.asciidoc
include::static/resiliency.asciidoc[]

:edit_url: https://github.com/elastic/logstash/edit/{branch}/docs/static/mem-queue.asciidoc
include::static/mem-queue.asciidoc[]

:edit_url: https://github.com/elastic/logstash/edit/{branch}/docs/static/persistent-queues.asciidoc
include::static/persistent-queues.asciidoc[]

43 changes: 43 additions & 0 deletions docs/static/mem-queue.asciidoc
@@ -0,0 +1,43 @@
[[memory-queue]]
=== Memory queue

By default, Logstash uses in-memory bounded queues between pipeline stages (inputs → pipeline workers) to buffer events.
If Logstash experiences a temporary machine failure, the contents of the memory queue will be lost.
Temporary machine failures are scenarios where Logstash or its host machine are terminated abnormally, but are capable of being restarted.

[[mem-queue-benefits]]
==== Benefits of memory queues

The memory queue might be a good choice if you value throughput over data resiliency.

* Easier configuration
* Easier management and administration
* Faster throughput

[[mem-queue-limitations]]
==== Limitations of memory queues

* Can lose data in abnormal termination
* Don't handle sudden bursts of data well, where extra capacity is needed for {ls} to catch up
* Not a good choice for data you can't afford to lose

TIP: Consider using <<persistent-queues,persistent queues>> to avoid these limitations.

[[sizing-mem-queue]]
==== Memory queue size

Memory queue size is not configured directly.
The maximum number of events the memory queue can hold is the product of the `pipeline.batch.size` and `pipeline.workers` values.
Review comment (Member):

Size is kind of interesting. How we define size in memory queues is different from PQ: for PQ we talk about the number of bytes on disk, whereas the memory queue is defined by the number of events, which can vary greatly depending on the event payload.

Maybe something like "The maximum number of events that can be held in each memory queue is equal to the value of pipeline.batch.size multiplied by pipeline.workers" (maybe add defaults here?).

I don't know if we need any content referring to the fact that each pipeline has its own queue here as well.

Reply (Member):

Pipeline->pipeline has its own complexity, but I think that might be something we could talk about in a separate PR. Maybe in an advanced section, or pipeline->pipeline considerations?

Reply (Contributor Author):

New issue to track: #13275

This value is called the "inflight count."

See <<tuning-logstash>> for more info on the effects of adjusting `pipeline.batch.size` and `pipeline.workers`.
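To make the arithmetic concrete, here is a minimal sketch of the inflight-count calculation. The 125-event default for `pipeline.batch.size` matches the Logstash settings reference; `pipeline.workers` defaults to the host's CPU core count, and the 8-worker value here is an assumed example.

```python
# Inflight count = pipeline.batch.size * pipeline.workers.
# 125 is the documented default batch size; 8 workers is an assumed
# example (pipeline.workers defaults to the number of CPU cores).
pipeline_batch_size = 125
pipeline_workers = 8

inflight_count = pipeline_batch_size * pipeline_workers
print(inflight_count)  # → 1000 events held in the memory queue at most
```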

[[backpressure-mem-queue]]
==== Back pressure

When the queue is full, Logstash puts back pressure on the inputs to stall data
flowing into Logstash.
This mechanism helps Logstash control the rate of data flow at the input stage
without overwhelming outputs like Elasticsearch.

Each input handles back pressure independently.
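Back pressure can be sketched with a bounded queue: a blocking `put` stalls the producer whenever the queue is full, which is analogous to (though far simpler than) what Logstash does at the input stage. All names here are illustrative, not Logstash internals.

```python
import queue
import threading

# A small bounded queue: producers block on put() when it is full,
# which is the essence of back pressure.
events = queue.Queue(maxsize=4)

def input_stage(n):
    for i in range(n):
        events.put(i)  # blocks (back pressure) while the queue is full

processed = []

def worker_stage(n):
    for _ in range(n):
        processed.append(events.get())

producer = threading.Thread(target=input_stage, args=(10,))
consumer = threading.Thread(target=worker_stage, args=(10,))
producer.start()
consumer.start()
producer.join()
consumer.join()
print(processed)  # → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The producer never holds more than four undelivered events; the rest of its output is deferred until the consumer drains the queue.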
4 changes: 3 additions & 1 deletion docs/static/resiliency.asciidoc
@@ -1,5 +1,7 @@
[[resiliency]]
== Data resiliency
== Queues and data resiliency

By default, Logstash uses <<memory-queue,in-memory bounded queues>> between pipeline stages (inputs → pipeline workers) to buffer events.

As data flows through the event processing pipeline, Logstash may encounter
situations that prevent it from delivering events to the configured