From 2af381a3f1a88c51deae5f5ba0f88307c4f82128 Mon Sep 17 00:00:00 2001 From: Karen Metts Date: Wed, 22 Sep 2021 15:55:40 -0400 Subject: [PATCH 1/6] Doc: Add topic and expand info for mem queue --- docs/index.asciidoc | 3 +++ docs/static/mem-queue.asciidoc | 10 ++++++++++ 2 files changed, 13 insertions(+) create mode 100644 docs/static/mem-queue.asciidoc diff --git a/docs/index.asciidoc b/docs/index.asciidoc index e58ea85314a..403efa1ff6e 100644 --- a/docs/index.asciidoc +++ b/docs/index.asciidoc @@ -178,6 +178,9 @@ include::static/filebeat-modules.asciidoc[] :edit_url: https://github.com/elastic/logstash/edit/{branch}/docs/static/resiliency.asciidoc include::static/resiliency.asciidoc[] +:edit_url: https://github.com/elastic/logstash/edit/{branch}/docs/static/mem-queue.asciidoc +include::static/mem-queue.asciidoc[] + :edit_url: https://github.com/elastic/logstash/edit/{branch}/docs/static/persistent-queues.asciidoc include::static/persistent-queues.asciidoc[] diff --git a/docs/static/mem-queue.asciidoc b/docs/static/mem-queue.asciidoc new file mode 100644 index 00000000000..b29be1e260c --- /dev/null +++ b/docs/static/mem-queue.asciidoc @@ -0,0 +1,10 @@ +[[memory-queue]] +=== Memory queue + +By default, Logstash uses in-memory bounded queues between pipeline stages +(inputs → pipeline workers) to buffer events. The size of these in-memory +queues is fixed and not configurable. If Logstash experiences a temporary +machine failure, the contents of the in-memory queue will be lost. Temporary machine +failures are scenarios where Logstash or its host machine are terminated +abnormally but are capable of being restarted. 
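The bounded in-memory queue that PATCH 1/6 describes is a fixed-capacity producer/consumer buffer: inputs put events in, pipeline workers take them out, and nothing survives an abnormal termination because the buffer lives only in process memory. A minimal Python sketch of that behavior (the capacity, stage names, and event values are illustrative, not Logstash internals):

```python
import queue

# Fixed-capacity buffer between the input stage and the worker stage.
# Like Logstash's memory queue, its contents live only in process
# memory: nothing here survives an abnormal termination.
mem_queue = queue.Queue(maxsize=4)

def input_stage(events):
    for event in events:
        mem_queue.put(event)  # blocks when the queue is full (back pressure)

def worker_stage():
    batch = []
    while not mem_queue.empty():
        batch.append(mem_queue.get())
    return batch

input_stage(["e1", "e2", "e3"])
print(worker_stage())  # -> ['e1', 'e2', 'e3']
```

Restarting the process recreates the queue empty, which is the data-loss scenario the patch text calls a "temporary machine failure."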
+ From a716de3e311061bc42f895c36c659ee98b3232ec Mon Sep 17 00:00:00 2001 From: Karen Metts Date: Thu, 23 Sep 2021 13:28:58 -0400 Subject: [PATCH 2/6] Start fleshing out content and areas to explore --- docs/static/mem-queue.asciidoc | 57 ++++++++++++++++++++++++++++----- docs/static/resiliency.asciidoc | 22 +++++++++++++ 2 files changed, 71 insertions(+), 8 deletions(-) diff --git a/docs/static/mem-queue.asciidoc b/docs/static/mem-queue.asciidoc index b29be1e260c..ec5028ec1a5 100644 --- a/docs/static/mem-queue.asciidoc +++ b/docs/static/mem-queue.asciidoc @@ -1,10 +1,51 @@ [[memory-queue]] -=== Memory queue - -By default, Logstash uses in-memory bounded queues between pipeline stages -(inputs → pipeline workers) to buffer events. The size of these in-memory -queues is fixed and not configurable. If Logstash experiences a temporary -machine failure, the contents of the in-memory queue will be lost. Temporary machine -failures are scenarios where Logstash or its host machine are terminated -abnormally but are capable of being restarted. +=== Memory queue + +By default, Logstash uses in-memory bounded queues between pipeline stages (inputs → pipeline workers) to buffer events. +If Logstash experiences a temporary machine failure, the contents of the memory queue will be lost. +Temporary machine failures are scenarios where Logstash or its host machine are terminated abnormally, but are capable of being restarted. + +[[mem-queue-benefits]] +==== Benefits of memory queues + +The memory queue might be a good choice if you value throughput over data resiliency. + +* Easier configuration +* Easier management and administration +* Faster throughput + +[[mem-queue-limitations]] +==== Limitations of memory queues + +* Can lose data in abnormal termination +* Don't do well handling sudden bursts of data, where extra capacity is needed for {ls} to catch up +* Not a good choice for data you can't afford to lose + +TIP: Consider using <> to avoid these limitations.
+ +[[sizing-mem-queue]] +==== Memory queue size + +Memory queue size is not configured directly. +Multiply the `pipeline.batch.size` and `pipeline.workers` values to get the size of the memory queue. +This value is called the "inflight count." + +[[backpressure-mem-queue]] +==== Back pressure + +When the queue is full, Logstash puts back pressure on the inputs to stall data +flowing into Logstash. +This mechanism helps Logstash control the rate of data flow at the input stage +without overwhelming outputs like Elasticsearch. + +ToDo: Is the next paragraph accurate for MQ? + +Each input handles back pressure independently. +For example, when the +<> encounters back pressure, it no longer +accepts new connections. +It waits until the queue has space to accept more events. +After the filter and output stages finish processing existing +events in the queue and ACKs them, Logstash automatically starts accepting new +events. diff --git a/docs/static/resiliency.asciidoc b/docs/static/resiliency.asciidoc index 9f21ba427a5..13b5fdd7b9e 100644 --- a/docs/static/resiliency.asciidoc +++ b/docs/static/resiliency.asciidoc @@ -1,6 +1,28 @@ [[resiliency]] == Data resiliency + +///// +What happens when the queue is full? +Input plugins push data into the queue, and filters pull out. If the queue (persistent or memory) is full then the input plugin thread blocks. + +See handling backpressure topic. Relocate this info for better visibility? +///// + + +///// +Settings in logstash.yml and pipelines.yml can interact in unintuitive ways + +A setting on a pipeline in pipelines.yml takes precedence, falling back to the value in logstash.yml if there is no setting present for the specific pipeline, falling back to the default if there is no value present in logstash.yml + +^^ This is true for any setting in both logstash.yml and pipelines.yml, but seems to trip people up in PQs. Other queues, too?
+///// + + +//ToDo: Add MQ to discussion (for compare/contrast), even though it's not really considered a "resiliency feature". Messaging will need to be updated. + + + As data flows through the event processing pipeline, Logstash may encounter situations that prevent it from delivering events to the configured output. For example, the data might contain unexpected data types, or From c80a909d36f645b29bb1bb147e810e68ea580e4a Mon Sep 17 00:00:00 2001 From: Karen Metts Date: Wed, 29 Sep 2021 12:50:55 -0400 Subject: [PATCH 3/6] Add mem queue to resiliency topic --- docs/static/resiliency.asciidoc | 22 +--------------------- 1 file changed, 1 insertion(+), 21 deletions(-) diff --git a/docs/static/resiliency.asciidoc b/docs/static/resiliency.asciidoc index 13b5fdd7b9e..14124d98672 100644 --- a/docs/static/resiliency.asciidoc +++ b/docs/static/resiliency.asciidoc @@ -1,27 +1,7 @@ [[resiliency]] == Data resiliency - -///// -What happens when the queue is full? -Input plugins push data into the queue, and filters pull out. If the queue (persistent or memory) is full then the input plugin thread blocks. - -See handling backpressure topic. Relocate this info for better visibility? -///// - - -///// -Settings in logstash.yml and pipelines.yml can interact in unintuitive ways - -A setting on a pipeline in pipelines.yml takes precedence, falling back to the value in logstash.yml if there is no setting present for the specific pipeline, falling back to the default if there is no value present in logstash.yml - -^^ This is true for any setting in both logstash.yml and pipelines.yml, but seems to trip people up in PQs. Other queues, too? -///// - - -//ToDo: Add MQ to discussion (for compare/contrast), even though it's not really considered a "resiliency feature". Messaging will need to be updated. - - +By default, Logstash uses <> between pipeline stages (inputs → pipeline workers) to buffer events.
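The review note deleted in PATCH 3/6 describes a three-level fallback for settings: a per-pipeline value in pipelines.yml wins, then the global value in logstash.yml, then the built-in default. A sketch of that lookup chain (the function and dictionary names are invented for illustration; this is not Logstash's actual resolver):

```python
# Hypothetical settings resolver illustrating the precedence described
# in the review note: pipelines.yml > logstash.yml > built-in default.
DEFAULTS = {"pipeline.batch.size": 125}

def resolve_setting(name, pipeline_settings, global_settings):
    if name in pipeline_settings:   # per-pipeline entry in pipelines.yml
        return pipeline_settings[name]
    if name in global_settings:     # global value in logstash.yml
        return global_settings[name]
    return DEFAULTS[name]           # built-in default

# logstash.yml sets 250; this pipeline sets nothing, so 250 wins:
print(resolve_setting("pipeline.batch.size", {}, {"pipeline.batch.size": 250}))  # -> 250
```

The same chain applies per setting, which is why a value in logstash.yml can silently override the default for every pipeline that does not set its own.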
As data flows through the event processing pipeline, Logstash may encounter situations that prevent it from delivering events to the configured From 7193ecfc3022c55ba31c88669ade684e55fd84a0 Mon Sep 17 00:00:00 2001 From: Karen Metts Date: Wed, 29 Sep 2021 17:05:39 -0400 Subject: [PATCH 4/6] Changes from code review --- docs/static/mem-queue.asciidoc | 12 ++---------- docs/static/resiliency.asciidoc | 2 +- 2 files changed, 3 insertions(+), 11 deletions(-) diff --git a/docs/static/mem-queue.asciidoc b/docs/static/mem-queue.asciidoc index ec5028ec1a5..f4340d944d3 100644 --- a/docs/static/mem-queue.asciidoc +++ b/docs/static/mem-queue.asciidoc @@ -30,6 +30,8 @@ Memory queue size is not configured directly. Multiply the `pipeline.batch.size` and `pipeline.workers` values to get the size of the memory queue. This value is called the "inflight count." +See <> for more info on the effects of adjusting `pipeline.batch.size` and `pipeline.workers`. + [[backpressure-mem-queue]] ==== Back pressure @@ -38,14 +40,4 @@ flowing into Logstash. This mechanism helps Logstash control the rate of data flow at the input stage without overwhelming outputs like Elasticsearch. -ToDo: Is the next paragraph accurate for MQ? - Each input handles back pressure independently. -For example, when the -<> encounters back pressure, it no longer -accepts new connections. -It waits until the queue has space to accept more events. -After the filter and output stages finish processing existing -events in the queue and ACKs them, Logstash automatically starts accepting new -events. - diff --git a/docs/static/resiliency.asciidoc b/docs/static/resiliency.asciidoc index 14124d98672..ff347097bda 100644 --- a/docs/static/resiliency.asciidoc +++ b/docs/static/resiliency.asciidoc @@ -1,5 +1,5 @@ [[resiliency]] -== Data resiliency +== Queues and data resiliency By default, Logstash uses <> between pipeline stages (inputs → pipeline workers) to buffer events. 
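PATCH 4/6 trims the back-pressure discussion down to "Each input handles back pressure independently." The underlying mechanism, a producer blocking on a full bounded queue until a consumer drains it, can be sketched with Python's `queue` module (capacity and timeout values are illustrative):

```python
import queue

q = queue.Queue(maxsize=2)   # tiny bounded queue, stand-in for the memory queue
q.put("event-1")
q.put("event-2")             # the queue is now full

try:
    # An input under back pressure: put() cannot complete until a worker
    # drains the queue. A timeout makes the stall observable instead of
    # blocking this example forever.
    q.put("event-3", timeout=0.1)
except queue.Full:
    print("input stalled: queue is full")

q.get()                      # a pipeline worker consumes one event...
q.put("event-3")             # ...and the stalled input can proceed
```

This is why back pressure propagates upstream: the input simply cannot hand off events faster than the filter and output stages drain them.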
From e575a5ccc167238f0ebb6bab5e860d56177733fb Mon Sep 17 00:00:00 2001 From: Karen Metts Date: Mon, 4 Oct 2021 19:24:02 -0400 Subject: [PATCH 5/6] More changes from code review --- docs/static/mem-queue.asciidoc | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/docs/static/mem-queue.asciidoc b/docs/static/mem-queue.asciidoc index f4340d944d3..eb239e76cde 100644 --- a/docs/static/mem-queue.asciidoc +++ b/docs/static/mem-queue.asciidoc @@ -19,7 +19,6 @@ The memory queue might be a good choice if you value throughput over data resili * Can lose data in abnormal termination * Don't do well handling sudden bursts of data, where extra capacity is needed for {ls} to catch up -* Not a good choice for data you can't afford to lose TIP: Consider using <> to avoid these limit @@ -27,11 +26,30 @@ TIP: Consider using <> to avoid these limit ==== Memory queue size Memory queue size is not configured directly. -Multiply the `pipeline.batch.size` and `pipeline.workers` values to get the size of the memory queue. +It is defined by the number of events, which can vary greatly depending on the event payload. + +The maximum number of events that can be held in each memory queue is equal to +the value of `pipeline.batch.size` multiplied by the value of +`pipeline.workers`. This value is called the "inflight count." +NOTE: Each pipeline has its own queue. + See <> for more info on the effects of adjusting `pipeline.batch.size` and `pipeline.workers`. +[[mq-settings]] +===== Settings that affect queue size + +These values can be configured in `logstash.yml` and `pipelines.yml`. + +pipeline.batch.size:: +Number of events to retrieve from inputs before sending to filters+workers. +The default is 125. + +pipeline.workers:: +Number of workers that will, in parallel, execute the filters+outputs stage of the pipeline. +This value defaults to the number of the host's CPU cores.
+ [[backpressure-mem-queue]] ==== Back pressure From 56c79cd8c37c030db5602e99121cf8d2f6a9d03b Mon Sep 17 00:00:00 2001 From: Karen Metts <35154725+karenzone@users.noreply.github.com> Date: Tue, 5 Oct 2021 14:09:51 -0400 Subject: [PATCH 6/6] Update docs/static/mem-queue.asciidoc Co-authored-by: Rob Bavey --- docs/static/mem-queue.asciidoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/static/mem-queue.asciidoc b/docs/static/mem-queue.asciidoc index eb239e76cde..110716bacd9 100644 --- a/docs/static/mem-queue.asciidoc +++ b/docs/static/mem-queue.asciidoc @@ -26,7 +26,7 @@ TIP: Consider using <> to avoid these limit ==== Memory queue size Memory queue size is not configured directly. -It is defined by the number of events, which can vary greatly depending on the event payload. +It is defined by the number of events, the size of which can vary greatly depending on the event payload. The maximum number of events that can be held in each memory queue is equal to the value of `pipeline.batch.size` multiplied by the value of
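The sizing rule that PATCH 5/6 and PATCH 6/6 settle on reduces to a product of two settings. A sketch with the documented defaults (`inflight_count` is a made-up helper name; the CPU-core fallback mirrors the documented default for `pipeline.workers`):

```python
import os

def inflight_count(batch_size=125, workers=None):
    """Upper bound on events held in one pipeline's memory queue:
    pipeline.batch.size * pipeline.workers (each pipeline has its own queue)."""
    if workers is None:
        workers = os.cpu_count()  # pipeline.workers defaults to host CPU cores
    return batch_size * workers

# With the default batch size on an 8-core host:
print(inflight_count(workers=8))  # -> 1000 events in flight
```

Because the bound counts events rather than bytes, actual memory use still varies with event payload size, as the patch text notes.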