Start fleshing out content and areas to explore
karenzone committed Sep 23, 2021
1 parent 2af381a commit 6111c2e
Showing 2 changed files with 123 additions and 1 deletion.
102 changes: 101 additions & 1 deletion docs/static/mem-queue.asciidoc
@@ -1,5 +1,5 @@
[[memory-queue]]
=== Memory queue
=== Memory queue (in-memory queue?)

By default, Logstash uses in-memory bounded queues between pipeline stages
(inputs → pipeline workers) to buffer events. The size of these in-memory
@@ -8,3 +8,103 @@ machine failure, the contents of the in-memory queue will be lost. Temporary machine
failures are scenarios where Logstash or its host machine are terminated
abnormally but are capable of being restarted.


[[mem-queue-benefits]]
==== Benefits of memory queue

The memory queue might be a good choice if you value throughput over data resiliency.

* Easier configuration
* Easier management and administration
* Faster throughput


[[mem-queue-limitations]]
==== Limitations of memory queue

* Can lose data in abnormal termination
* Not a good choice for data you can't afford to lose


[[configuring-mem-queue]]
==== Configuring in-memory queue

// Notes: mem queue is default.
//ToDo: Check into single sourcing settings for use with PQ and MQ
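
The in-memory queue is the default, so no settings are required to enable it. As a starting point for this section, here is a minimal sketch of what stating the defaults explicitly in `logstash.yml` might look like; `queue.type: memory` simply restates the default, and the pipeline values shown are illustrative, not recommendations.

[source,yaml]
queue.type: memory
pipeline.workers: 2
pipeline.batch.size: 125

Note that `pipeline.workers` and `pipeline.batch.size` are general throughput knobs rather than memory queue sizing controls; as noted above, the size of the in-memory queue itself is fixed and not configurable.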


/////
Adjust text and placement to avoid redundancy between PQ and MQ.
Maybe document under "Resiliency" and link back to it from here?
Use same approach for PQ
[[backpressure-mem-queue]]
==== Handling Back Pressure
When the queue is full, Logstash puts back pressure on the inputs to stall data
flowing into Logstash. This mechanism helps Logstash control the rate of data
flow at the input stage without overwhelming outputs like Elasticsearch.

Use the `queue.max_bytes` setting to configure the total capacity of the queue on
disk. The following example sets the total capacity of the queue to 8gb:

[source, yaml]
queue.type: persisted
queue.max_bytes: 8gb

With these settings specified, Logstash will buffer events on disk until the
size of the queue reaches 8gb. When the queue is full of unACKed events, and
the size limit has been reached, Logstash will no longer accept new events.

Each input handles back pressure independently. For example, when the
<<plugins-inputs-beats,beats>> input encounters back pressure, it no longer
accepts new connections and waits until the persistent queue has space to accept
more events. After the filter and output stages finish processing existing
events in the queue and ACK them, Logstash automatically starts accepting new
events.
/////


/////
Is this concept applicable for MQ?
[[durability-mq]]
==== Controlling Durability for memory queue
Durability is a property of storage writes that ensures data will be available after it's written.

When the persistent queue feature is enabled, Logstash will store events on
disk. Logstash commits to disk in a mechanism called checkpointing.

To discuss durability, we need to introduce a few details about how the persistent queue is implemented.

First, the queue itself is a set of pages. There are two kinds of pages: head pages and tail pages. The head page is where new events are written. There is only one head page. When the head page is of a certain size (see `queue.page_capacity`), it becomes a tail page, and a new head page is created. Tail pages are immutable, and the head page is append-only.

Second, the queue records details about itself (pages, acknowledgements, etc) in a separate file called a checkpoint file.

When recording a checkpoint, Logstash will:

* Call fsync on the head page.
* Atomically write to disk the current state of the queue.

The process of checkpointing is atomic, which means any update to the file is saved if successful.

If Logstash is terminated, or if there is a hardware-level failure, any data
that is buffered in the persistent queue, but not yet checkpointed, is lost.

You can force Logstash to checkpoint more frequently by setting
`queue.checkpoint.writes`. This setting specifies the maximum number of events
that may be written to disk before forcing a checkpoint. The default is 1024. To
ensure maximum durability and avoid losing data in the persistent queue, you can
set `queue.checkpoint.writes: 1` to force a checkpoint after each event is
written. Keep in mind that disk writes have a resource cost. Setting this value
to `1` can severely impact performance.
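
If this durability discussion carries over, a small `logstash.yml` sketch could make the trade-off concrete; the values below simply illustrate the settings described above and are not a recommendation:

[source,yaml]
queue.type: persisted
queue.checkpoint.writes: 1

This forces a checkpoint after every written event, maximizing durability at the cost of extra disk writes.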
/////

/////
Applicable for MQ?
[[garbage-collection-mq]]
==== Disk Garbage Collection
On disk, the queue is stored as a set of pages where each page is one file. Each page can be at most `queue.page_capacity` in size. Pages are deleted (garbage collected) after all events in that page have been ACKed. If an older page has at least one event that is not yet ACKed, that entire page will remain on disk until all events in that page are successfully processed. Each page containing unprocessed events will count against the `queue.max_bytes` byte size.
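
If this section is kept, a brief settings sketch might help tie the page size to the overall disk budget; the values shown are the documented defaults, included only for illustration:

[source,yaml]
queue.type: persisted
queue.page_capacity: 64mb
queue.max_bytes: 1024mb

With these values, the queue is split into 64mb page files, and any page that still contains unACKed events continues to count toward the 1024mb total until it can be garbage collected.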
/////
22 changes: 22 additions & 0 deletions docs/static/resiliency.asciidoc
@@ -1,6 +1,28 @@
[[resiliency]]
== Data resiliency


/////
What happens when the queue is full?
Input plugins push data into the queue, and filters pull events out. If the queue (persistent or memory) is full, then the input plugin thread blocks.
See the handling back pressure topic. Relocate this info for better visibility?
/////


/////
Settings in logstash.yml and pipelines.yml can interact in unintuitive ways.
A setting on a pipeline in pipelines.yml takes precedence, falling back to the value in logstash.yml if there is no setting present for the specific pipeline, and falling back to the default if there is no value present in logstash.yml.
^^ This is true for any setting in both logstash.yml and pipelines.yml, but seems to trip people up in PQs. Other queues, too?
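A short example might make the precedence order easier to see. In the sketch below (the pipeline id is hypothetical), the per-pipeline `queue.type` in pipelines.yml wins for that pipeline, other pipelines fall back to the value in logstash.yml, and anything set in neither file uses the built-in default:

[source,yaml]
# logstash.yml -- applies to every pipeline that does not override it
queue.type: persisted
# pipelines.yml -- the per-pipeline setting takes precedence for this pipeline only
- pipeline.id: example-pipeline
  queue.type: memory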
/////


//ToDo: Add MQ to discussion (for compare/contrast), even though it's not really considered a "resiliency feature". Messaging will need to be updated.



As data flows through the event processing pipeline, Logstash may encounter
situations that prevent it from delivering events to the configured
output. For example, the data might contain unexpected data types, or
