-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using LZ4 to send stats #336
Comments
I use LZ4_compress_fast_continue to write to the compressed stream, e.g. it's (re-)using dictionaries etc. from previously compressed blocks to gain better compression ratio. The function belongs to the "Streaming Compression Functions" group from lz4.h. I don't think I really do anything special here. |
Compression on the relay isn't the problem. The java client is compressing a metrics batch. The relay decompresses and forwards to carbon cache. The problem is that the whole compressed packet vanishes on the relay which makes me think that something is wrong with the compressed data format since lz4 does appear to have interoperability issues. Do you have a sample command line that will generate correct lz4 format that I can pipe into the relay with 'nc' for testing and reference? |
Looking briefly at the source code for both the relay and the lz4 library it seems the problem is the use of the block streaming API by the relay.
I think that the answer would be to use the lzframe API that frames the incoming blocks with a header that describes what's coming. I've only spent 30 minutes looking at the sources so I could be wrong here so please let me know if I'm off the mark. |
safe here is because the input data is random, a crash may not happen due to misformed data. Yesterday I tried to produce a simple example using lz4 util, but failed. So there's something wrong here. I tested this at the time using two relays, but I agree it would be useful to work well with other (standard) tools. A small problem here is that we don't want to buffer so much data before we send it, e.g. send a block after each metric to push the metric out. Adding frames and stuff is probably going to defeat the purpose, but I may be wrong here. Tonight I will look into why the read using lz4 util fails. You may be right about doing the wrong thing there. |
I just tested, using two relays, it works as expected. The problem appears to be that nothing seems to be able to just generate block streams, except the c-library itself. |
I'm thinking I better introduce a "heavy-weight" lz4 mode if you want to send lz4 data from a non-C producer. |
It will only work even between relays if this socket read returns the entire block, such as when both relays are on a fast network, or the same machine. If there is network congestion and the socket read returns, say 1 byte, as it can, then this call will fail because it must be provided with an entire block. "streaming" in LZ4 terminology refers to a stream of blocks, not bytes. The API is much more unfriendly and hard to use than zlib.
I agree. That would be a perfect solution. |
since the read you refer to is appending to the buffer, a short read will not harm, decompression will fail, the next read will complete the block though |
Any reason why you need LZ4 here? Would zstd for instance be ok for you too? |
Sorry for the late reply, I must have missed the notification for this. zstd looks almost too good to be true. We'd certainly switch over to that if it were available. We don't specifically need LZ4. Anything that achieves similar compression to zlib with lower CPU usage and has java client libs available is great. |
Alright. I've just looked into creating the "heavy-weight" lz4 support, but (at least on the writer side) it isn't that simple. It most likely won't compress much because it will run per-metric. So you won't benefit that much if I would do this. Have you looked at Snappy? It is close in compression rate to lz4, although its compression and decompression rates aren't that stunning. Still compared to zlib it's much faster. I could add support for that easily, I think. There should be very good Java support for it, since it's used by a lot of BigData formats such as Parquet and Avro. |
I'll have a look at the compression performance of Snappy against our payloads. I have a feeling it will be good because our payloads contain a lot of repeating text so compression algorithms don't have to try hard to do well. |
I tested some of our payloads and found that with zlib set to level 5 compression we got 95.5% and with snappy it's about 90% using the 'snzip' command line client utility. Quite close really. Nobody's complaining yet about CPU load caused by our use of zlib so there's no rush to implement snappy but if you do it then we would probably migrate over to it. |
I started on it, got it mostly done, except that it doesn't work yet :) You're right that zlib will be best, but at the cost of CPU. Compression rates may be different as the relay does it metric by metric though. |
This is related to issue #336, where snappy could be an alternative take to lz4.
Would you be interested in a pull request that implements the LZ4 framed format? I've got it working here and have tested communication between java -> relay and relay -> relay. |
Yeah, I think there's no way around this, I need to drop the compatibility as it will not be able to work reliably this way without enveloping/framing to ensure the decompressor has everything it needs. |
Cool, I'll leave it running for a bit longer capturing metrics and forwarding between relays to make sure it's all good then work on the PR. I'll branch off your last release tag and make my edits there. |
I have configured a socket for receiving line data compressed with LZ4. The sender is java. I consulted the LZ4 compatibility page and selected Apache commons compress. I tried both block and framed mode (which one is correct?) and neither works. Packets are dropped silently at carbon-c-relay with no logging at all. Basically nothing happens. None of the internal counters are updated (monitored with -d option).
For reference when I switch to a gzip wrapped socket and try that it works perfectly, as does plain uncompressed data. Any hints as to where I might be going wrong? carbon-c-relay is built against LZ4 lib version 1.8.2 (with static linking).
My listeners are configured like this:
The text was updated successfully, but these errors were encountered: