Sporadic issue with RedLock.GetHost #112
Comments
We are getting the same issue in one of our production environments, and I have been able to track down the reason. First, we are using Redis Sentinel with a master-replica setup in HA, running as a StatefulSet in Kubernetes and installed with the bitnami/redis Helm chart. We have configured the [...] Every now and then, the Redis pod inside Kubernetes is rescheduled onto a new worker, or the Redis deployment is updated, so the pod is recreated. Sometimes the pod gets the same IP back and everything works, but often it gets a new IP. Sentinel keeps track of which IPs are active and should be used, but it also keeps a list of old IPs that are no longer active (bitnami/charts#5418) [...] I copied the GetHost method into a controller that writes out the resulting information, and after I added the try/catch as in the linked PR the result looks like
In general I think this logging could be improved (or cached) for a better result. As it stands, the connection information is not clear, and it costs resources on every execution that uses the GetHost method.
@samcook Do you have time to look into this and the linked PR? Maybe a patch release with this? Our workaround of using hostnames instead of IPs did not work, because we ran into another issue that put Redis Sentinel into tilt mode, so we needed to revert that change.
Hi @Tasteful, I've had a look and managed to reproduce the issue. It seems there's a problem with StackExchange.Redis when it loses its current sentinel connection and your Redis instances are on ephemeral IPs (as in Kubernetes): after reconnecting, it still seems to retain some endpoints that it doesn't have matching 'server' entries for. Anyway, the proposed PR looks like it's probably a reasonable solution. I'll take a look at getting that merged in and pushing out a new release tomorrow. As an aside though, looking at the StackExchange.Redis behaviour, if you can solve the tilt mode problem it's probably best to use the hostname mode, as over time StackExchange.Redis seems to end up with more and more of these phantom connections in its list of endpoints.
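To illustrate the kind of defensive lookup being discussed, here is a minimal sketch of listing a multiplexer's endpoints while tolerating entries that have no matching server. This is not RedLock.net's actual GetHost implementation or the code from the linked PR; the class and method names (`EndpointDiagnostics`, `DescribeEndpoints`) are hypothetical, and it assumes that `GetServer` throws when given an endpoint it has no server entry for.

```csharp
using System;
using System.Text;
using StackExchange.Redis;

public static class EndpointDiagnostics
{
    // Hypothetical helper: builds a readable summary of every endpoint the
    // multiplexer knows about, without failing on stale ("phantom") entries.
    public static string DescribeEndpoints(IConnectionMultiplexer connection)
    {
        var result = new StringBuilder();

        foreach (var endPoint in connection.GetEndPoints())
        {
            try
            {
                // Assumption: this call throws for endpoints that no longer
                // have a matching server entry.
                var server = connection.GetServer(endPoint);

                result.Append(endPoint)
                      .Append(server.IsConnected ? " (connected)" : " (disconnected)");
            }
            catch (Exception ex)
            {
                // Record the stale endpoint instead of letting the exception
                // escape into the locking code path.
                result.Append(endPoint)
                      .Append(" (no server entry: ")
                      .Append(ex.GetType().Name)
                      .Append(')');
            }

            result.Append(", ");
        }

        return result.ToString().TrimEnd(' ', ',');
    }
}
```

Catching the broad `Exception` type is a deliberate choice for a diagnostics-only helper: the summary is only used for logging, so a lookup failure should never decide whether a lock attempt succeeds.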
Thanks! Yes, the tilt mode problem is actually tracked in another issue, bitnami/charts#9689, and the solution there is to use IP addresses instead of hostnames :) Earlier today I created my own build of the Redlock DLL and injected it into the deployment pipeline, and since then we have had no logs about distributed lock exceptions.
FWIW, we were getting this same error, but it came down to the simple issue of us giving existing ConnectionMultiplexer connections to the RedLockFactory and then later disposing those connections while continuing to use the RedLockFactory instance. 🤦 I just figured I'd mention this just in case it saves someone else some time.
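For anyone checking their own code for the same mistake, here is a minimal sketch of a disposal order that avoids it, assuming the factory is built from an existing multiplexer as shown in the RedLock.net README. The endpoint `localhost:6379`, the resource name, and the expiry are placeholders chosen for illustration, not values from this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using RedLockNet.SERedis;
using RedLockNet.SERedis.Configuration;
using StackExchange.Redis;

public static class Program
{
    public static async Task Main()
    {
        // Placeholder endpoint; replace with your own Redis configuration.
        var multiplexer = await ConnectionMultiplexer.ConnectAsync("localhost:6379");

        // The factory borrows the existing connection rather than owning it,
        // so the connection must stay alive for as long as the factory is used.
        var factory = RedLockFactory.Create(new List<RedLockMultiplexer> { multiplexer });

        try
        {
            using (var redLock = await factory.CreateLockAsync("example-resource", TimeSpan.FromSeconds(30)))
            {
                if (redLock.IsAcquired)
                {
                    Console.WriteLine("Lock acquired; doing protected work.");
                }
            }
        }
        finally
        {
            // Dispose the factory first, and only then the connection it was using.
            factory.Dispose();
            multiplexer.Dispose();
        }
    }
}
```

In a long-running service, both objects would normally be singletons created at startup and disposed at shutdown in this same order.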
Hi,
We're running into this issue every now and then in our production environment:
It seems to happen randomly and sometimes months go by without encountering the issue.
Here is how we initialize the factory:
The code creating the lock:
Any idea what could be the root cause of this sporadic issue, or how to mitigate it?