Take into account the coordinated omission problem and JVM warmup phase #36

Open
maseev opened this issue Jan 28, 2018 · 1 comment

maseev commented Jan 28, 2018

If I'm not mistaken, wrk is subject to the coordinated omission problem. Because of that, the latency numbers wrk reports are most likely skewed and don't reflect the application's real latency.
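To illustrate the effect, here is a minimal, deterministic sketch of a single closed-loop connection. It is not wrk's or wrk2's actual code; the function name `simulate` and all parameters (request interval, service time, one artificial stall) are assumptions made for the example. The "corrected" series times each request from when a constant-rate client should have sent it, which is roughly the correction wrk2 describes.

```python
def simulate(n_requests=1000, interval=0.01, service=0.001,
             stall_at=500, stall=1.0):
    """Return (naive, corrected) latency lists in seconds.

    The hypothetical server needs `service` seconds per request, except
    for request number `stall_at`, which takes an extra `stall` seconds.
    """
    naive, corrected = [], []
    clock = 0.0  # time at which the previous response arrived
    for i in range(n_requests):
        intended = i * interval       # when a constant-rate client should send
        send = max(clock, intended)   # closed loop: wait for the previous response
        cost = service + (stall if i == stall_at else 0.0)
        done = send + cost
        naive.append(done - send)          # timed from the actual send
        corrected.append(done - intended)  # timed from the intended send
        clock = done
    return naive, corrected


naive, corrected = simulate()
print(sum(x > 0.1 for x in naive), "slow samples when timed from the actual send")
print(sum(x > 0.1 for x in corrected), "slow samples when timed from the intended send")
```

With these made-up numbers the naive series records the stall as a single slow sample, while the corrected series also counts every request that was queued behind it, so the tail looks very different.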

Fortunately, there's a patched version of this tool, called wrk2. It would be great if you could replace wrk with wrk2 in your tests and see how that affects the results.

There's also another load generator that might come in handy: tcpkali. I'm not entirely sure, though, that it doesn't have the same problem that wrk does.

Also, if I understand correctly, you skip the usual warmup phase for the JVM. In other words, you start the server and then immediately start sending requests and collecting throughput and latency statistics.

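By default, it takes the JVM about 10,000 interpreted invocations of a method before the JIT kicks in and compiles it into machine code, applying a long list of optimizations such as method inlining and dead code elimination. (That figure is the classic HotSpot server-compiler default for -XX:CompileThreshold; with tiered compilation the thresholds differ.) Requests served before that point measure interpreted code rather than steady-state performance.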
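One way to keep warmup out of the reported numbers is to drop every sample taken during an initial window. The sketch below assumes latency samples are available as `(timestamp_seconds, latency)` pairs; the function name `drop_warmup` and the 30-second default are hypothetical and not taken from the benchmark scripts.

```python
def drop_warmup(samples, warmup_seconds=30.0):
    """Return the latencies recorded after the warmup window.

    `samples` is an iterable of (timestamp_seconds, latency) pairs.
    The window starts at the earliest timestamp in the data.
    """
    samples = list(samples)
    if not samples:
        return []
    start = min(t for t, _ in samples)
    return [lat for t, lat in samples if t - start >= warmup_seconds]


# Made-up samples: the first two fall inside the 30-second warmup window.
measured = drop_warmup([(0.0, 9.0), (10.0, 8.0), (30.0, 1.0), (45.0, 2.0)])
print(measured)
```

A fixed time window is only a heuristic; whether it is long enough depends on how many requests reach the hot code paths during it.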

I also noticed that you have an 'Avg Latency' column in the main table. I don't think the average is a meaningful latency metric: latency distributions are usually heavily skewed, so the mean says little about either the typical request or the slow ones. Here's a great presentation by Gil Tene where he talks about how not to measure latency.

In short, it would be much better if you could share not the average latency but, say, the median (50th percentile) and the 99.9999th percentile (four nines after the decimal point).
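As a small illustration of why the mean is misleading, the sketch below compares it with nearest-rank percentiles on made-up data (985 fast requests and 15 slow ones). The helper name `percentile` and the sample values are assumptions for the example, not measurements from this benchmark.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100.0))
    return ordered[min(rank, len(ordered)) - 1]


# Made-up latencies in milliseconds: mostly fast, with a slow tail.
latencies_ms = [1.0] * 985 + [500.0] * 15

mean_ms = sum(latencies_ms) / len(latencies_ms)
print("mean  ", mean_ms)
print("median", percentile(latencies_ms, 50))
print("p99   ", percentile(latencies_ms, 99))
```

Here the mean matches neither the typical request nor the slow ones, while the median and the 99th percentile describe both. Note that a percentile as high as 99.9999 only says something beyond the maximum once a run collects at least a million samples.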

Looks like this ticket is related.

Contributor

stevehu commented Jan 29, 2018

@maseev I agree with you that we should switch to wrk2 and rerun the tests. For the current results, I ran the test for 30 seconds to warm up and then took a second 30-second round for the result. I recently reran one of the tests and found it much slower than before, which I think is due to the Intel CPU patch that was applied. I have been thinking of expanding the test to cover service-to-service communication with TLS, as that is more realistic for microservices, but I haven't devoted time to it due to limitations. Let's keep this issue open; we will address these points and rerun all the tests. Thanks a lot for your help.
