High connection count on Redis Cluster shard 0001 with redis engine 6.2+ #3478

mikecoder5 · 2025-01-11T02:18:14Z

Version: redis-py 5.2.1 - redis engine 6.2.6

Platform: Python 3.9 on Debian 12 / AWS

Description: After upgrading from redis-py-cluster to the latest redis-py, the connection count rises roughly 10x, but only on shard 1, node 1; the increase is not spread across the cluster. The RedisCluster client is created as RedisCluster(host=aws_configuration_endpoint), using a configuration endpoint that redirects to a "random" redis node. The problem occurs with redis engine 6.2.6 but not with 5.0.6.

Suspected root cause of the elevated connection count: The initial cluster command is issued to an effectively random redis node because of how the configuration endpoint works, so that first call cannot explain the imbalance. The extra connections must therefore come from additional commands issued during RedisCluster client initialization (or from multiple connections somehow being opened) against the first node returned in the cluster slots list, which for engine 6.2+ is node 0001. If additional redis calls are needed for client initialization, the client would ideally reuse the node that served the initial cluster slots call, or select a random one.
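To make the suspected mechanism concrete, here is a toy model in plain Python. It does not use redis-py or a real cluster. The node names, the rotation used to stand in for the 5.0.6 per-node ordering, and the helper names are all assumptions made for illustration.

```python
import random
from collections import Counter

NODES = ["0001", "0002", "0003"]  # hypothetical shard primaries


def cluster_slots_reply(queried_node, sorted_reply):
    """Return the node order a queried node would report (toy model)."""
    if sorted_reply:
        # 6.2-style: every node reports the same order, slot 0 first.
        return list(NODES)
    # 5.0-style stand-in: order is stable per node but differs between
    # nodes. A simple rotation starting at the queried node models that.
    start = NODES.index(queried_node)
    return NODES[start:] + NODES[:start]


def simulate(sorted_reply, clients=1000, seed=0):
    """Count which node ends up as the default across many new clients."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(clients):
        entry = rng.choice(NODES)  # configuration endpoint: random entry node
        reply = cluster_slots_reply(entry, sorted_reply)
        tally[reply[0]] += 1  # first listed node becomes the default
    return tally
```

With sorted_reply=True every simulated client settles on node 0001, while sorted_reply=False spreads the default across all nodes, which matches the behavior described for the two engine versions.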

Observations:

- Elevated connections only on node 0001 (primary nodes on other shards are normal)
- redis-py get_default_node() behavior
  - redis engine 6.2.6 - always returns the same node, 0001
  - redis engine 5.0.6 - returns a seemingly random node
- redis-cli cluster slots ordering of the list of slots and nodes
  - redis engine 5.0.6 uses "random" ordering
    - More specifically, the ordering is stable for a given node (calling the same node multiple times returns the same list in the same order), but each node has its own seemingly random ordering. So as long as the cluster slots command is issued to a random node each time (which is the case when using the configuration endpoint), the response is effectively a randomly ordered list
  - redis engine 6.2.6 uses sorted ordering
    - Regardless of which node is called, the first slot in cluster slots is always slot 0, and as a result the first node is always node 0001
- redis-py always sets the default node to the first node returned by cluster slots (code link)
  - This is easy to change; however, calling replace_default_node() after the client is initialized does not fix the connection count issue. This probably means the root cause lies in initialization, where the client issues additional commands to the default node (unsure whether that happens before or after self.default_node is set)
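For reference, a minimal sketch of moving the default node to a random primary after the client exists. It relies only on the RedisCluster methods get_primaries() and set_default_node(); the helper name pick_random_default is hypothetical. As noted above, changing the default node after initialization did not reduce the connection count, so this only illustrates the selection policy and is not a fix.

```python
import random


def pick_random_default(client, rng=random):
    """Point the client's default node at a randomly chosen primary.

    `client` is expected to be a redis.cluster.RedisCluster instance;
    only get_primaries() and set_default_node() are called on it.
    Returns the chosen node, or None if nothing was changed.
    """
    primaries = client.get_primaries()
    if not primaries:
        return None
    node = rng.choice(primaries)
    # set_default_node() returns False when the client rejects the node.
    return node if client.set_default_node(node) else None
```

Usage would be pick_random_default(r6_client) right after constructing the client.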

Sample Code: This does not reproduce the high connection count, but it does show the cluster slots sorting behavior and the client bias for the first node in the list.

from redis import RedisCluster

redis_5_0_host = ""  # fill in with configuration endpoint for cluster running redis 5.0.6
redis_6_2_host = ""  # fill in with configuration endpoint for cluster running redis 6.2.6

r5_nodes = []
r6_nodes = []
for _ in range(20):
    # Each new client bootstraps through the configuration endpoint,
    # so its initial cluster slots call lands on an effectively random node.
    r5_client = RedisCluster(host=redis_5_0_host, port=6379)
    r6_client = RedisCluster(host=redis_6_2_host, port=6379)
    r5_nodes.append(r5_client.get_default_node().host)
    r6_nodes.append(r6_client.get_default_node().host)
    # Close the clients so the loop itself does not accumulate connections.
    r5_client.close()
    r6_client.close()

print(set(r5_nodes))  # most/all of the primary nodes in the cluster
print(set(r6_nodes))  # only one node
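The sets above show which nodes were chosen but not how often. A small helper can quantify the bias; the name default_node_share is hypothetical, and it only post-processes the host lists collected by the sample.

```python
from collections import Counter


def default_node_share(hosts):
    """Return {host: fraction of clients whose default node was that host}."""
    counts = Counter(hosts)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {host: n / total for host, n in counts.items()}
```

Calling default_node_share(r6_nodes) on the 6.2.6 results should report a single host with a share of 1.0 if the bias described above holds, while default_node_share(r5_nodes) should be spread across several hosts.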
