Sporadic issue with RedLock.GetHost #112

olivierboucher · 2024-03-07T15:08:35Z

Hi,

We're running into this issue every now and then on our production environment:

System.AggregateException: One or more errors occurred. (The specified endpoint is not defined (Parameter 'endpoint'))
 ---> System.ArgumentException: The specified endpoint is not defined (Parameter 'endpoint')
   at RedLockNet.SERedis.RedLock.GetHost(IConnectionMultiplexer cache)
   at RedLockNet.SERedis.RedLock.LockInstance(RedisConnection cache)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass32_0`2.<ForEachWorker>b__0(Int32 i)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass19_0`2.<ForWorker>b__1(RangeWorker& currentWorker, Int64 timeout, Boolean& replicationDelegateYieldedBeforeCompletion)
--- End of stack trace from previous location ---
   at System.Threading.Tasks.Parallel.<>c__DisplayClass19_0`2.<ForWorker>b__1(RangeWorker& currentWorker, Int64 timeout, Boolean& replicationDelegateYieldedBeforeCompletion)
   at System.Threading.Tasks.TaskReplicator.Replica.Execute()
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.TaskReplicator.Run[TState](ReplicatableUserAction`1 action, ParallelOptions options, Boolean stopOnFirstFailure)
   at System.Threading.Tasks.Parallel.ForWorker[TLocal,TInt](TInt fromInclusive, TInt toExclusive, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Func`4 bodyWithLocal, Func`1 localInit, Action`1 localFinally)
--- End of stack trace from previous location ---
   at System.Threading.Tasks.Parallel.ThrowSingleCancellationExceptionOrOtherException(ICollection exceptions, CancellationToken cancelToken, Exception otherException)
   at System.Threading.Tasks.Parallel.ForWorker[TLocal,TInt](TInt fromInclusive, TInt toExclusive, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Func`4 bodyWithLocal, Func`1 localInit, Action`1 localFinally)
   at RedLockNet.SERedis.RedLock.Acquire()
   at RedLockNet.SERedis.RedLock.Start()
   at EverflowIdentity.Services.DistributedNewKeyLock`1.LockAsync(Int32 millisecondsTimeout) in /app/EverflowIdentity/Services/DistributedNewKeyLock.cs:line 26

It seems to happen randomly and sometimes months go by without encountering the issue.

Here is how we initialize the factory:

            _multiplexer = ConnectionMultiplexer.Connect(Configuration["RedisHost"]);
            var endPoints = new List<RedLockMultiplexer>
            {
                _multiplexer,
            };
            _redLockFactory = RedLockFactory.Create(endPoints);

            services.AddSingleton(_redLockFactory);

The code creating the lock:

                _lock = _redLockFactory.CreateLock("identity_key_lock", TimeSpan.FromMinutes(1),
                    TimeSpan.FromMinutes(1),
                    TimeSpan.FromMilliseconds(200));

                if (!_lock.IsAcquired)
                {
                    return Task.FromResult(false);
                }

                return Task.FromResult(true);

Any idea what could be the root cause of this sporadic issue or how to mitigate against it?

The text was updated successfully, but these errors were encountered:

Tasteful · 2024-06-24T18:20:46Z

We are getting the same issue in one of our production environment and I have been able to track down the reason.

First, we are using redis sentinel with an master-replica setup in HA, this is running as a stateful set in kubernetes with the bitnami/redis helm-chart as installation. We have configure the useHostnames: false when the helm is deployed, this to get the IP instead of host-names. I have read after this was setup that it may be better to use the hostname when running services inside kubernetes but haven't have time to try that out yet.

Every now and then, the redis pod inside kubernetes is re-scheduled into a new worker or the redis-deployment is updated so the pod is recreated. Some times, the pod is getting the same IP back and then it works, but often they are getting a new IP back. Sentinel keeping track of what IP that are active and should be used, but the sentinel also keep a list of old IPs that not is active (bitnami/charts#5418) the IConnectionMultiplexer.GetEndPoints() returns a list with all known endpoints, even if we not can connect to them, when the IConnectionMultiplexer.GetServer(endPoint) is executed it will throw an ArgumentException if the endpoint isn't connected and this will blow up and make the lock throw exception instead of real result.

I copied the GetHost method into a controller that writing out the result of the information and after I added the try/catch as in the linked pr the result looks like

10.42.12.167:6379 (Cant connect to host), 10.42.12.208:6379 (Cant connect to host), 10.42.34.52:6379 (Cant connect to host), 10.42.34.67:6379 (Cant connect to host), 10.42.12.208:6379 (Cant connect to host), 10.42.12.67:6379 (master, disconnected), 10.42.34.52:6379 (Cant connect to host), 10.42.12.167:6379 (Cant connect to host), 10.42.34.128:6379 (master, disconnected), 10.42.34.52:6379 (Cant connect to host), 10.42.34.67:6379 (Cant connect to host), 10.42.12.67:6379 (master, disconnected), 10.42.34.165:6379 (master), 10.42.34.78:6379 (slave), 10.42.34.128:6379 (master, disconnected), 10.42.12.60:6379 (slave), 10.42.12.67:6379 (master, disconnected), 10.42.34.165:6379 (master), 10.42.34.78:6379 (slave), 10.42.34.128:6379 (master, disconnected), 10.42.12.60:6379 (slave), 10.42.12.67:6379 (master, disconnected)

In general I think this logging can be improved (or cached) with better result, now the connection information will not be clear and it will take resources for each execution that using the GetHost method.

Tasteful · 2024-06-25T19:04:59Z

@samcook Do you have time to look into this and the linked PR? Maybe a patch release with this?

Our workaround with using host names instead of IPs did not work because we run into another issue that was setting redis sentinel in tilt-mode, so we need to reverted that change.

samcook · 2024-06-26T10:26:07Z

Hi @Tasteful,

I've had a look and managed to reproduce the issue - it seems like there's an issue with StackExchange.Redis when it loses its current sentinel connection (and your redis instances are on ephemeral IPs, like in Kubernetes), after reconnecting it seems to still retain some endpoints that it doesn't have matching 'server' entries for.

Anyway, the proposed PR looks like it's probably a reasonable solution. I'll take a look at getting that merged in and pushing out a new release tomorrow.

As an aside though, looking at the StackExchange.Redis behaviour, if you can solve the tilt mode problem it's probably best if you can use the hostname mode, as over time StackExchange.Redis seems to ends up with more and more of these phantom connections in its list of endpoints.

Tasteful · 2024-06-26T12:38:25Z

Thanks!

Yes, the tilt mode actual exists in another bitnami/charts#9689 and the solution on that is to use ip-addresses instead of hostnames :)

Earlier today I created my own version of the Redlock dll and injected in the deployment pipeline and have from that point no logs about distributed lock exceptions.

sluebbert · 2024-07-12T14:20:55Z

FWIW, we were getting this same error, but it came down to the simple issue of us giving existing ConnectionMultiplexer connections to the RedLockFactory and then later disposing those connections while continuing to use the RedLockFactory instance. 🤦

I just figured I'd mention this just in case it saves someone else some time.

Tasteful linked a pull request Jun 24, 2024 that will close this issue

Catch exceptions for when endpoints isn't connected #113

Open

Tasteful mentioned this issue Jun 26, 2024

Connection established but 'The specified endpoint is not defined' StackExchange/StackExchange.Redis#2728

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sporadic issue with RedLock.GetHost #112

Sporadic issue with RedLock.GetHost #112

olivierboucher commented Mar 7, 2024

Tasteful commented Jun 24, 2024 •

edited

Loading

Tasteful commented Jun 25, 2024

samcook commented Jun 26, 2024

Tasteful commented Jun 26, 2024

sluebbert commented Jul 12, 2024

Sporadic issue with RedLock.GetHost #112

Sporadic issue with RedLock.GetHost #112

Comments

olivierboucher commented Mar 7, 2024

Tasteful commented Jun 24, 2024 • edited Loading

Tasteful commented Jun 25, 2024

samcook commented Jun 26, 2024

Tasteful commented Jun 26, 2024

sluebbert commented Jul 12, 2024

Tasteful commented Jun 24, 2024 •

edited

Loading