From f12838af2a756a51bd9113aa9fda506929c411e2 Mon Sep 17 00:00:00 2001
From: Neil Twigg
Date: Tue, 2 May 2023 13:48:55 +0100
Subject: [PATCH] ADR-35: JetStream Filestore Compression

Signed-off-by: Neil Twigg
---
 README.md     |  3 +++
 adr/ADR-35.md | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+)
 create mode 100644 adr/ADR-35.md

diff --git a/README.md b/README.md
index b011e4c3..fa1e9bde 100644
--- a/README.md
+++ b/README.md
@@ -32,6 +32,7 @@ This repo is used to capture architectural and design decisions as a reference o
 |[ADR-32](adr/ADR-32.md)|client|Service API|
 |[ADR-33](adr/ADR-33.md)|jetstream, client, server|Metadata for Stream and Consumer|
 |[ADR-34](adr/ADR-34.md)|jetstream, client, server|JetStream Consumers Multiple Filters|
+|[ADR-35](adr/ADR-35.md)|jetstream, client, server|JetStream Filestore Compression|
 
 ## Jetstream
 
@@ -54,6 +55,7 @@ This repo is used to capture architectural and design decisions as a reference o
 |[ADR-31](adr/ADR-31.md)|jetstream, client, server|JetStream Direct Get|
 |[ADR-33](adr/ADR-33.md)|jetstream, client, server|Metadata for Stream and Consumer|
 |[ADR-34](adr/ADR-34.md)|jetstream, client, server|JetStream Consumers Multiple Filters|
+|[ADR-35](adr/ADR-35.md)|jetstream, client, server|JetStream Filestore Compression|
 
 ## Kv
 
@@ -100,6 +102,7 @@ This repo is used to capture architectural and design decisions as a reference o
 |[ADR-31](adr/ADR-31.md)|jetstream, client, server|JetStream Direct Get|
 |[ADR-33](adr/ADR-33.md)|jetstream, client, server|Metadata for Stream and Consumer|
 |[ADR-34](adr/ADR-34.md)|jetstream, client, server|JetStream Consumers Multiple Filters|
+|[ADR-35](adr/ADR-35.md)|jetstream, client, server|JetStream Filestore Compression|
 
 ## When to write an ADR
 
diff --git a/adr/ADR-35.md b/adr/ADR-35.md
new file mode 100644
index 00000000..7e3c7e40
--- /dev/null
+++ b/adr/ADR-35.md
@@ -0,0 +1,53 @@
+# JetStream Filestore Compression
+
+| Metadata | Value                     |
+|----------|---------------------------|
+| Date     | 2023-05-01                |
+| Author   | @neilalexander            |
+| Status   | Implemented               |
+| Tags     | jetstream, client, server |
+
+## Context and Problem Statement
+
+Use of filestore encryption can almost completely prevent host filesystem compression or deduplication from working effectively. This may present a particular problem in environments where encryption is mandated for compliance reasons but local storage is either limited or expensive. Having the ability for the NATS Server to compress the message block content before encryption takes place can help in this area.
+
+## References
+
+Compression and decompression of messages is performed transparently by the NATS Server if configured to do so, therefore clients do not need to be modified in order to publish to or consume messages from a stream. However, clients will need to be modified in order to be able to configure or inspect the compression on a stream.
+
+- Server PRs:
+  -
+  -
+- JetStream schema:
+  -
+- NATS CLI:
+  -
+
+## Design
+
+The stream configuration will gain a new optional `"compression"` field. If supplied, the following values are valid:
+
+- `"none"` — No compression is enabled on the stream
+- `"s2"` — S2 compression is enabled on the stream
+
+This field can be provided when creating a stream with `$JS.API.STREAM.CREATE`, updating a stream with `$JS.API.STREAM.UPDATE` and it will be returned when requesting the stream info with `$JS.API.STREAM.INFO`.
+
+When enabled, message blocks will be compressed asynchronously when they cease to be the tail block — that is, at the point that the message block reaches the maximum configured block size and a new block is created. This is to prevent unnecessary decompression and recompression of the tail block while it is still being written to, which would reduce publish throughput.
+
+Compaction and truncation operations will also compress/decompress any relevant blocks synchronously as required.
+
+Compressed blocks gain a new prepended header describing not only the compression algorithm in use but also the original block content size. This header is encrypted along with the rest of the block when filestore encryption is enabled. Absence of this header implies that the block is not compressed and the NATS Server will not ordinarily prepend a header to an uncompressed block. The presence of the original block content size within the header makes it possible to determine the effective compression ratio later without having to decompress the block, although the NATS Server does not currently do this.
+
+The checksum at the end of the block is specifically excluded from compression and remains on disk as-is, so that checking the block integrity does not require decompressing the entire block.
+
+## Decision
+
+The design is such that different compression algorithms can easily be implemented within the NATS Server if necessary. Initially, only S2 compression is in scope.
+
+Both block and individual message compression were initially explored. In order to benefit from repetition across individual messages (particularly where the data is structured, e.g. in JSON format), compression at the block level provides significantly better compression ratios than compressing individual messages separately.
+
+The compression algorithm can be updated after the stream has been created. Newly minted blocks will use the newly selected compression algorithm, but this will not result in existing blocks being proactively compressed or decompressed. An existing block will only be compressed or decompressed according to the newly configured algorithm when it is modified for another reason, i.e. during truncation or compaction.
+
+## Consequences
+
+Compression requires extra system resources, therefore it is anticipated that a compressed stream may suffer some performance penalties compared to an uncompressed stream.
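
The block layout described in the Design section (a prepended header carrying the algorithm and original size, a compressed body, and a trailing checksum left uncompressed) can be sketched in Python as follows. This is not the NATS Server implementation. The header layout (`MAGIC`, a one-byte algorithm ID, a 4-byte big-endian size), the 8-byte checksum length, and all function names are assumptions made for illustration. `zlib` stands in for S2, which is not in the Python standard library.

```python
import struct
import zlib

# Hypothetical layout for illustration only; the real on-disk format is
# defined by the NATS Server, not by this sketch.
MAGIC = b"cmp"                    # assumed marker for a compressed block
ALGO_NONE = 0
ALGO_ZLIB = 1                     # zlib stands in for S2 here
HEADER = struct.Struct(">3sBI")   # magic, algorithm ID, original body size
CHECKSUM_LEN = 8                  # assumed length of the trailing checksum


def compress_block(block: bytes, algo: int = ALGO_ZLIB) -> bytes:
    """Compress the block body, leaving the trailing checksum untouched."""
    if algo == ALGO_NONE:
        return block  # uncompressed blocks carry no header
    body, checksum = block[:-CHECKSUM_LEN], block[-CHECKSUM_LEN:]
    header = HEADER.pack(MAGIC, algo, len(body))
    return header + zlib.compress(body) + checksum


def decompress_block(stored: bytes) -> bytes:
    """Reverse compress_block; blocks without the header pass through.

    A real implementation must also cope with an uncompressed block that
    happens to start with the marker bytes; this sketch does not.
    """
    if not stored.startswith(MAGIC):
        return stored
    _, algo, original_size = HEADER.unpack_from(stored)
    if algo != ALGO_ZLIB:
        raise ValueError(f"unknown compression algorithm ID {algo}")
    body = zlib.decompress(stored[HEADER.size:-CHECKSUM_LEN])
    if len(body) != original_size:
        raise ValueError("decompressed size does not match header")
    return body + stored[-CHECKSUM_LEN:]


def compression_ratio(stored: bytes) -> float:
    """Original size divided by compressed size, read from the header only."""
    if not stored.startswith(MAGIC):
        return 1.0
    _, _, original_size = HEADER.unpack_from(stored)
    compressed_size = len(stored) - HEADER.size - CHECKSUM_LEN
    return original_size / compressed_size
```

Because the checksum sits outside the compressed region, `stored[-CHECKSUM_LEN:]` can be read without decompressing anything, and `compression_ratio` needs only the header, which mirrors the two properties the Design section calls out.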