Abnormal Mget latency increase issue #3031
Comments
The team will attempt to dig into this issue some more, but from the quick read I did, it would be extremely hard, close to impossible, to answer the question without a lot more information. Latency spikes of 5 ms are an extremely low threshold and could be caused by virtually any of the actors in the chain. Unless you detect some difference in the way the driver behaves (by profiling it while the issue occurs and monitoring the traffic), we could only play a guessing game, which is not helpful for anyone.
Hi tishun@, thanks for your attention to the issue. Let me add more details on the issue and on our suspicions. We suspect that the abnormal MGET latency increase may be related to connection problems. In our use case, we use Lettuce as the Redis client and initialize both a read connection and a write connection. See the code snippet below: When service A starts, it creates a read connection and a write connection. The redisURIs array contains the URIs of all Redis nodes in the cluster. So, in the ideal case, each instance of service A creates two connections to the Redis cluster.
During the deployment of service A, we observe the following: T1: the start time of the deployment. There are two things I want to elaborate on here.
I believe something is wrong with the connections in step #1 and step #2 above. An interesting observation is that some clients established an excessive number of connections to a single Redis server. For instance, the client with IP address 10.117.154.244 has five active connections to one Redis server.
The output shows multiple connections from this client, which is concerning, because we expect a single client machine to have no more than two connections to a Redis server. The high number of connections from certain clients is likely degrading the performance of the Redis server, which in turn increases the MGET latency for service A. We would like some insights on the following:
Additionally, we have verified key metrics on the Redis server and found no anomalies:
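To check the observation above on the Redis side, connections can be counted per client IP from the output of the Redis CLIENT LIST command, run against each Redis server. The following is a minimal Java sketch, not part of Lettuce or of the original report: the class ClientListAudit and its methods are hypothetical names, and the sketch assumes each output line carries an addr=ip:port field, as CLIENT LIST lines do. A single snapshot can also include short-lived connections, so it is worth sampling more than once during a deployment round.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper: counts connections per client IP in CLIENT LIST output
 * and reports the IPs that exceed an expected per-client limit.
 */
class ClientListAudit {

    /** Returns the IP part of the "addr=ip:port" field, or null if the field is absent. */
    static String clientIp(String clientListLine) {
        for (String field : clientListLine.trim().split("\\s+")) {
            if (field.startsWith("addr=")) {
                String addr = field.substring("addr=".length());
                int colon = addr.lastIndexOf(':'); // the port follows the last colon
                return colon > 0 ? addr.substring(0, colon) : addr;
            }
        }
        return null;
    }

    /** Counts CLIENT LIST entries per client IP, sorted by IP. */
    static Map<String, Integer> countByIp(List<String> clientListLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : clientListLines) {
            String ip = clientIp(line);
            if (ip != null) {
                counts.merge(ip, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Keeps only the IPs whose connection count is above maxPerClient. */
    static Map<String, Integer> overLimit(List<String> clientListLines, int maxPerClient) {
        Map<String, Integer> result = new TreeMap<>();
        countByIp(clientListLines).forEach((ip, n) -> {
            if (n > maxPerClient) {
                result.put(ip, n);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample lines in CLIENT LIST format; ids, ports and the second IP are illustrative.
        List<String> sample = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            sample.add("id=" + (10 + i) + " addr=10.117.154.244:" + (50000 + i)
                    + " fd=" + (8 + i) + " name= age=60 idle=0 flags=N db=0 cmd=mget");
        }
        sample.add("id=20 addr=10.0.0.5:40000 fd=20 name= age=60 idle=0 flags=N db=0 cmd=mget");
        sample.add("id=21 addr=10.0.0.5:40001 fd=21 name= age=60 idle=0 flags=N db=0 cmd=mget");

        // The report expects at most two connections per client machine per Redis server.
        System.out.println(overLimit(sample, 2)); // prints {10.117.154.244=5}
    }
}
```

Feeding it the lines from one Redis server together with the expected limit of two connections per client machine leaves only the offending IPs, which makes it easier to correlate them with the hosts redeployed in a given round.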
Bug Report
We own the service A fleet, which has 500+ hosts. These hosts use the Lettuce client to access the Redis cluster (around 20 shards, 100 hosts in total). Recently we observed anomalies caused by service fleet deployments (a gradual deployment of ~20 hosts per round, where each host's deployment takes ~10 minutes). During the deployment, we found that the MGET latency, as observed from service A, increased significantly (from 15 ms to 20+ ms).
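The deployment numbers above can be turned into a rough estimate of how long the disturbance window lasts and how much connection churn each round causes on the Redis side. This is a back-of-the-envelope Java sketch with a hypothetical class name (RollingDeployEstimate). It assumes the ~20 hosts in a round deploy in parallel, so a round takes ~10 minutes, and it reuses the report's expectation of at most two connections per client machine per Redis server.

```java
/**
 * Hypothetical back-of-the-envelope model of the rolling deployment described in the report.
 * Assumes all hosts in a round deploy in parallel, so one round takes about minutesPerRound.
 */
class RollingDeployEstimate {

    /** Rounds needed to redeploy the whole fleet (ceiling division). */
    static int rounds(int fleetSize, int hostsPerRound) {
        return (fleetSize + hostsPerRound - 1) / hostsPerRound;
    }

    /** Total wall-clock minutes of the rolling deployment. */
    static int totalMinutes(int fleetSize, int hostsPerRound, int minutesPerRound) {
        return rounds(fleetSize, hostsPerRound) * minutesPerRound;
    }

    /**
     * Connections closed and reopened on one Redis server per round, assuming every
     * restarted instance reconnects to that server with connectionsPerHost connections.
     */
    static int connectionsChurnedPerServer(int hostsPerRound, int connectionsPerHost) {
        return hostsPerRound * connectionsPerHost;
    }

    public static void main(String[] args) {
        int fleet = 500;          // lower bound: the report says "500+"
        int perRound = 20;        // ~20 hosts per round
        int minutesPerRound = 10; // ~10 minutes per host deployment
        int total = totalMinutes(fleet, perRound, minutesPerRound);
        System.out.printf("%d rounds, ~%d minutes (~%.1f hours), ~%d connections churned per Redis server per round%n",
                rounds(fleet, perRound), total, total / 60.0,
                connectionsChurnedPerServer(perRound, 2));
    }
}
```

With 500 hosts this gives 25 rounds and about 250 minutes (roughly four hours) of rolling restarts, with about 40 connections closed and reopened on each Redis server per round if every restarted instance reconnects to every server. Since the fleet is 500+, these are lower bounds.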
Figure 1: Service A uses Lettuce to access the Redis cluster; MGET latency increases during the fleet deployment
Figure 2: MGET latency increases from 15 ms to 20 ms during the fleet deployment
After checking the service A logs, especially the Lettuce logs, we did not observe any anomalies. Currently we cannot explain why the service A fleet deployment triggers the MGET latency increase. The only variable is the fleet deployment.
Are there any clues that could help with the next troubleshooting steps for this abnormal latency increase? Thanks!