Introduce command replay filter to avoid command replaying after reconnect #1310

Open
doo871128 opened this issue Jun 14, 2020 · 6 comments · May be fixed by #3118
Labels
type: feature A new feature
Comments

@doo871128

Feature Request

Is your feature request related to a problem? Please describe

We use Lettuce to handle large-scale traffic: our Redis Cluster receives more than 10 million requests per minute. Under this load, Redis instances occasionally go down abruptly for various reasons (hardware failure, instance restart, etc.). When a restarted instance rejoins the cluster, we see large CPU spikes on it, which in turn cause latency spikes. It turns out that when the instance comes back, Lettuce resends the failed commands to it, because the connection uses AT_LEAST_ONCE reliability. After we set auto-reconnect to false, the latency spikes disappeared. However, that introduces another problem: we can no longer reconnect to the recovered instance, because the only way to change the reliability level is to change the auto-reconnect mode. So my suggestion/question is:

Why can't we set the reliability level independently of the auto-reconnect mode?
Is there a reason why the reliability level has to follow the auto-reconnect mode?


Describe the solution you'd like

I'd suggest adding a separate option, such as RELIABILITY, to adjust the reliability level.
That would solve my problem.

So far, I don't understand why RELIABILITY has to follow the auto-reconnect mode.

@doo871128 doo871128 added the type: feature A new feature label Jun 14, 2020
@mp911de
Collaborator

mp911de commented Jun 14, 2020

So basically you’re asking for exceptionally completing commands if the server goes down so that non-executed commands aren’t retried upon reconnect?

@doo871128
Author

Yes, that's what I want:

- Reliability = AT_MOST_ONCE
- autoReconnect = true

Currently, in order to use Reliability = AT_MOST_ONCE, we have to set autoReconnect = false, so the connection is never re-established.
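To make the request concrete, here is a minimal model of the coupling described above and of the decoupling being asked for. It is a sketch under stated assumptions: `EndpointOptions`, `reliabilityOverride`, `derivedFrom` and `effectiveReliability` are hypothetical names, not Lettuce's `ClientOptions` API. The derivation rule simply mirrors what this thread reports (auto-reconnect on implies AT_LEAST_ONCE, off implies AT_MOST_ONCE).

```java
// Hypothetical model of the option coupling discussed in this issue.
// These types are illustrative only; they are NOT Lettuce's ClientOptions or Reliability.
enum Reliability { AT_MOST_ONCE, AT_LEAST_ONCE }

final class EndpointOptions {
    final boolean autoReconnect;
    // Hypothetical explicit setting; null means "derive it from autoReconnect".
    final Reliability reliabilityOverride;

    EndpointOptions(boolean autoReconnect, Reliability reliabilityOverride) {
        this.autoReconnect = autoReconnect;
        this.reliabilityOverride = reliabilityOverride;
    }

    // Behaviour described in the issue: reliability is implied by autoReconnect.
    static Reliability derivedFrom(boolean autoReconnect) {
        return autoReconnect ? Reliability.AT_LEAST_ONCE : Reliability.AT_MOST_ONCE;
    }

    // Requested behaviour: an explicit setting wins; otherwise keep the old derivation.
    Reliability effectiveReliability() {
        return reliabilityOverride != null ? reliabilityOverride : derivedFrom(autoReconnect);
    }
}
```

With this shape, `new EndpointOptions(true, Reliability.AT_MOST_ONCE)` expresses exactly the combination requested here, while leaving the override unset preserves today's behaviour.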

@mp911de
Collaborator

mp911de commented Jun 14, 2020

At least once wouldn’t work with auto-reconnect disabled while at most once would (that’s the default).

I’m rather considering a Predicate<RedisCommand> (retry commands, replay commands or something the like) that controls whether a command qualifies for retry. Right now, we filter cancelled commands and do not retry these. We collect all commands once the connection gets disconnected and commands that do not pass the predicate would be cancelled. In your case, you could pass in command -> false to achieve at most once semantics. With command rejection during disconnect, you also get fail fast behavior.
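The replay-filter idea above can be sketched in plain Java. This is a minimal, self-contained model, not Lettuce code: `QueuedCommand`, `ReplayFilteringBuffer`, `onDisconnect` and `onReconnect` are hypothetical names, and a `CompletableFuture` stands in for a command's completion. On disconnect it collects all pending commands, skips ones that are already cancelled, keeps those that pass the predicate for replay, and cancels the rest so callers fail fast.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.function.Predicate;

// Hypothetical stand-in for a queued command; NOT Lettuce's RedisCommand.
final class QueuedCommand {
    final String name;
    final CompletableFuture<String> future = new CompletableFuture<>();

    QueuedCommand(String name) { this.name = name; }
}

// Hypothetical endpoint buffer illustrating a replay filter applied on disconnect.
final class ReplayFilteringBuffer {
    private final Queue<QueuedCommand> inFlight = new ArrayDeque<>();
    private final Queue<QueuedCommand> replayQueue = new ArrayDeque<>();
    private final Predicate<QueuedCommand> replayFilter;

    ReplayFilteringBuffer(Predicate<QueuedCommand> replayFilter) {
        this.replayFilter = replayFilter;
    }

    // Simulates writing a command to the current connection.
    QueuedCommand write(String name) {
        QueuedCommand command = new QueuedCommand(name);
        inFlight.add(command);
        return command;
    }

    // On disconnect: collect all pending commands; keep those that pass the filter
    // and cancel the rest, so their callers fail fast instead of waiting for a reconnect.
    void onDisconnect() {
        QueuedCommand command;
        while ((command = inFlight.poll()) != null) {
            if (command.future.isCancelled()) {
                continue; // already cancelled by the caller: never retried
            }
            if (replayFilter.test(command)) {
                replayQueue.add(command);
            } else {
                command.future.cancel(false);
            }
        }
    }

    // On reconnect: return the commands that would be written to the new connection.
    List<QueuedCommand> onReconnect() {
        List<QueuedCommand> replayed = new ArrayList<>(replayQueue);
        replayQueue.clear();
        return replayed;
    }
}
```

In this model, passing `command -> false` gives the at-most-once behaviour requested in this issue while auto-reconnect stays on, and `command -> true` keeps at-least-once replay. A narrower predicate (for example, replaying only read commands) sits in between.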

@mp911de mp911de changed the title Can we have another option to control RELIABILITY? Introduce command replay filter to avoid command replaying after reconnect Jun 24, 2020
@vbabenkoru

vbabenkoru commented May 5, 2021

Looks like we're also experiencing this issue: during traffic spikes and resulting CPU load, Lettuce seems to amplify Redis calls, causing Redis to enter a failure state: Lettuce keeps reconnecting and resending commands, and Redis stays at 100% CPU and that results in more timeouts.
Does Lettuce take command timeouts into account? I.e., if the command timeout is 10 seconds, does it replay only the commands issued in the last 10 seconds, or everything, including commands that have already timed out?
I agree that being able to drop commands on reconnect would be the ideal solution. Disabling reconnect isn't really a solution, because it effectively takes the whole service down when a node fails.
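The timeout question isn't answered in this thread, so the following does not describe what Lettuce currently does. It is a hedged sketch of how a replay predicate, if one were introduced, could express "replay only commands whose caller may still be waiting". `TimedCommand`, `ReplayFilters`, `none` and `youngerThan` are hypothetical names, and the enqueue timestamp plus injectable nanosecond clock are assumptions made so the logic is testable.

```java
import java.time.Duration;
import java.util.function.LongSupplier;
import java.util.function.Predicate;

// Hypothetical command record carrying its enqueue timestamp; NOT a Lettuce type.
final class TimedCommand {
    final String name;
    final long enqueuedAtNanos;

    TimedCommand(String name, long enqueuedAtNanos) {
        this.name = name;
        this.enqueuedAtNanos = enqueuedAtNanos;
    }
}

// Hypothetical factory of replay predicates.
final class ReplayFilters {
    private ReplayFilters() {}

    // Drop everything: at-most-once semantics while auto-reconnect stays enabled.
    static Predicate<TimedCommand> none() {
        return cmd -> false;
    }

    // Replay only commands younger than the command timeout. Older commands have
    // presumably already timed out on the caller's side, so replaying them would
    // only add load to a recovering node.
    static Predicate<TimedCommand> youngerThan(Duration timeout, LongSupplier nanoClock) {
        long timeoutNanos = timeout.toNanos();
        return cmd -> nanoClock.getAsLong() - cmd.enqueuedAtNanos < timeoutNanos;
    }
}
```

In production the clock would typically be `System::nanoTime`; passing it in explicitly keeps the predicate deterministic in tests.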

@jdeppe-pivotal

> I’m rather considering a Predicate<RedisCommand> (retry commands, replay commands or something the like) that controls whether a command qualifies for retry.

@mp911de is this capability on your roadmap?

@mp911de
Collaborator

mp911de commented Jul 28, 2021

It is, but I currently don't have the bandwidth to implement it.

@tishun tishun added the for: team-attention An issue we need to discuss as a team to make progress label Jul 15, 2024
@tishun tishun added this to the Backlog milestone Jul 15, 2024
@tishun tishun modified the milestones: Backlog, 6.6.0.RELEASE Jan 3, 2025
@tishun tishun removed the for: team-attention An issue we need to discuss as a team to make progress label Jan 3, 2025
5 participants