Unable to connect to AWS ElastiCache Redis after few days of successful working #3086

Open
srikarrampa opened this issue Dec 17, 2024 · 2 comments
Labels
status: waiting-for-feedback We need additional information before we can continue

Comments

@srikarrampa

Bug Report

The Lettuce client stops processing commands successfully after 3-4 days of running normally following a deployment.
The current client options are:

TimeoutOptions timeoutOptions =
    TimeoutOptions.builder().fixedTimeout(Duration.ofMillis(1000)).build();

ClientOptions clientOptions =
    ClientOptions.builder()
        .socketOptions(
            SocketOptions.builder()
                .connectTimeout(Duration.ofMillis(1000))
                .build())
        .timeoutOptions(timeoutOptions)
        .autoReconnect(true)
        // suspend reconnect on protocol errors
        .suspendReconnectOnProtocolFailure(true)
        // reject commands while disconnected
        .disconnectedBehavior(ClientOptions.DisconnectedBehavior.REJECT_COMMANDS)
        .build();

ClientResources clientResources =
    DefaultClientResources.builder()
        .dnsResolver(new DirContextDnsResolver())
        .reconnectDelay(
            Delay.fullJitter(
                Duration.ofMillis(300),
                Duration.ofMillis(700),
                redisConfig.getBase(),
                TimeUnit.MILLISECONDS))
        .build();

LettuceClientConfiguration.LettuceClientConfigurationBuilder builder =
    LettuceClientConfiguration.builder()
        .readFrom(ReadFrom.REPLICA_PREFERRED)
        .clientOptions(clientOptions)
        .clientResources(clientResources)
        .redisCredentialsProviderFactory(redisCredentialsProviderFactory());

Current Behavior

Stack trace
java.util.concurrent.CompletionException: io.lettuce.core.RedisCommandExecutionException: WRONGPASS invalid username-password pair or user is disabled.
	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332)
	at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347)
	at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1141)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
	at io.lettuce.core.RedisHandshake.lambda$tryHandshakeResp3$2(RedisHandshake.java:134)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
	at io.lettuce.core.protocol.AsyncCommand.doCompleteExceptionally(AsyncCommand.java:143)
	at io.lettuce.core.protocol.AsyncCommand.completeResult(AsyncCommand.java:124)
	at io.lettuce.core.protocol.AsyncCommand.complete(AsyncCommand.java:115)
	at io.lettuce.core.protocol.CommandWrapper.complete(CommandWrapper.java:67)
	at io.lettuce.core.protocol.CommandHandler.complete(CommandHandler.java:762)
	at io.lettuce.core.protocol.CommandHandler.decode(CommandHandler.java:697)
	at io.lettuce.core.protocol.CommandHandler.channelRead(CommandHandler.java:614)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1503)
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1366)
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1415)
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:530)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:469)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1357)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:868)
	at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:799)
	at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:501)
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:399)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: io.lettuce.core.RedisCommandExecutionException: WRONGPASS invalid username-password pair or user is disabled.
	at io.lettuce.core.internal.ExceptionFactory.createExecutionException(ExceptionFactory.java:151)
	at io.lettuce.core.internal.ExceptionFactory.createExecutionException(ExceptionFactory.java:120)
	... 29 more

Error message: Disabling autoReconnect due to initialization failure

We see the above error, and after that all requests are rejected. Is there an issue in the configuration settings? The process we follow to create a connection to AWS ElastiCache is as follows:

  • Assume a role that provides a SessionToken
  • Pass it to a class that implements RedisCredentialsProvider and resolves the credentials (public Mono<RedisCredentials> resolveCredentials())
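
A minimal sketch of the credential flow above, assuming the token obtained from the assumed role expires and has to be fetched again before a later reconnect. It uses only the JDK: ExpiringToken, RefreshingTokenSource, and the refresh margin are hypothetical names for illustration, not Lettuce or AWS APIs. Whether a stale token is the actual cause here is not confirmed in this thread.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

/** Hypothetical holder for a short-lived credential; not a Lettuce or AWS type. */
final class ExpiringToken {
    final String value;
    final Instant expiresAt;

    ExpiringToken(String value, Instant expiresAt) {
        this.value = value;
        this.expiresAt = expiresAt;
    }
}

/** Returns the cached token until it is within refreshMargin of expiry, then fetches a new one. */
final class RefreshingTokenSource {
    private final Supplier<ExpiringToken> fetcher; // assumed to wrap the role-assumption call
    private final Duration refreshMargin;
    private final Supplier<Instant> now;           // injectable time source, e.g. Instant::now
    private ExpiringToken current;

    RefreshingTokenSource(Supplier<ExpiringToken> fetcher, Duration refreshMargin, Supplier<Instant> now) {
        this.fetcher = fetcher;
        this.refreshMargin = refreshMargin;
        this.now = now;
    }

    synchronized ExpiringToken get() {
        Instant refreshAt = current == null ? null : current.expiresAt.minus(refreshMargin);
        // Fetch on first use, or once the refresh point has been reached.
        if (current == null || !now.get().isBefore(refreshAt)) {
            current = fetcher.get();
        }
        return current;
    }
}
```

A RedisCredentialsProvider implementation could call get() from inside resolveCredentials(), so that each reconnect is handed a token that is still valid rather than one captured at startup.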

Expected behavior/code

The client should reconnect.

Environment

  • Lettuce version(s): 6.5.1.RELEASE
  • Redis version: 7.1.0

Questions

  • What are the correct ClientOptions to use in production to allow reconnects? Is it advisable to set all three of these together?
    .autoReconnect(true)
    .suspendReconnectOnProtocolFailure(true)
    .disconnectedBehavior(ClientOptions.DisconnectedBehavior.REJECT_COMMANDS)

  • With the above settings, reconnect is suspended. How should we recover from this?

  • Restarting the deployment fixed the issue, but we cannot restart the deployment every time. Please point us to any documentation that explains best practices for connecting to AWS ElastiCache.

@tishun
Collaborator

tishun commented Dec 20, 2024

Hello @srikarrampa ,

I do not see anything obviously wrong with your setup in the code samples you have provided.

When reconnecting, the driver attempts to fetch new credentials from the redisCredentialsProviderFactory.
It seems that the server could not verify these new credentials and is returning

WRONGPASS invalid username-password pair or user is disabled.

Have you tried logging the credentials used and testing them manually with another client, such as redis-cli?

@tishun
Collaborator

tishun commented Dec 20, 2024

Questions

  • What are the correct ClientOptions to use in production to allow reconnects? Is it advisable to set all three of these together?
    .autoReconnect(true)
    .suspendReconnectOnProtocolFailure(true)
    .disconnectedBehavior(ClientOptions.DisconnectedBehavior.REJECT_COMMANDS)

AWS maintains Amazon ElastiCache, so this question is perhaps best directed to them.
Here is their documentation on the subject

  • With the above settings, reconnect is suspended. How should we recover from this?

Reconnect is suspended because the credentials used are wrong.
The driver assumes this is not recoverable, so it stops any further attempts.
I think the proper solution is to never pass wrong credentials in the first place.

Otherwise, you can implement a manual retry by listening for such exceptions and resetting the connection, but I am not sure this is a valid use case.
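
The manual retry mentioned above might look roughly like the following sketch. It is JDK-only and makes several assumptions: AuthFailureRetry, callWithReset, and isWrongPass are hypothetical names, and resetConnection stands in for whatever the application does to rebuild its connection with fresh credentials. The failure check is a parameter because, with REJECT_COMMANDS, callers may see a different exception than the WRONGPASS raised during the handshake.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical helper: retry a command after resetting the connection on selected failures. */
final class AuthFailureRetry {

    /** True if any exception in the cause chain mentions WRONGPASS. */
    static boolean isWrongPass(Throwable error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            String message = t.getMessage();
            if (message != null && message.contains("WRONGPASS")) {
                return true;
            }
        }
        return false;
    }

    /**
     * Runs the command up to maxAttempts times. When a failure matches shouldReset,
     * resetConnection is invoked before the next attempt; other failures propagate at once.
     */
    static <T> T callWithReset(Supplier<T> command,
                               Predicate<Throwable> shouldReset,
                               Runnable resetConnection,
                               int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return command.get();
            } catch (RuntimeException e) {
                if (!shouldReset.test(e)) {
                    throw e;
                }
                last = e;
                if (attempt < maxAttempts) {
                    resetConnection.run(); // e.g. close and recreate the connection (assumption)
                }
            }
        }
        throw last; // every attempt failed with a matching error
    }
}
```

Bounding the number of attempts keeps a persistently wrong credential from turning into an endless reconnect loop, which is the situation the driver's suspend behavior is meant to avoid.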

  • Restarting the deployment fixed the issue, but we cannot restart the deployment every time. Please point us to any documentation that explains best practices for connecting to AWS ElastiCache.

Restarting is definitely not a solution, but it is a clue to what is wrong: it seems the credentials used after a restart differ from the ones provided during reconnect. You need to investigate why.
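
One way to compare the credentials used at startup with those used on reconnect, without writing secrets to the logs, is to log a short hash of each. The sketch below is JDK-only; CredentialFingerprint is a hypothetical helper name, and where to call it (for example inside the credentials provider) is an assumption about the application.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: a short, non-reversible fingerprint of a secret for log comparison. */
final class CredentialFingerprint {

    /** Returns the first 6 bytes of SHA-256(secret) as 12 lowercase hex characters. */
    static String of(String secret) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(secret.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 6; i++) {
                hex.append(String.format("%02x", digest[i] & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is not expected.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

If the fingerprint logged at startup matches the one logged days later during a failed reconnect, the provider is probably returning the same token each time; if they differ, the new token itself is what the server rejects.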

tishun added the "status: waiting-for-feedback" label (We need additional information before we can continue) on Dec 20, 2024