Introduce command replay filter to avoid command replaying after reconnect #1310

doo871128 · 2020-06-14T12:09:16Z

Feature Request

Is your feature request related to a problem? Please describe

We use Lettuce to handle large-scale traffic: the Redis cluster receives more than 10 million requests per minute. Under this load, Redis instances occasionally go down abruptly for various reasons (hardware failure, instance restart, etc.). The problem we are facing is that when a restarted Redis instance rejoins the Redis cluster, there are big CPU spikes that cause latency spikes. It turns out that when the instance rejoins the cluster, the Lettuce client tries to resend the failed commands to the recovered Redis instance, because it uses the AT LEAST ONCE reliability mode. After setting auto-reconnect to false, the latency spikes went away. However, we then have another problem: we cannot reconnect to the recovered Redis instance, because the only way to change the reliability level is to change the auto-reconnect mode. So my suggestion/question is:
why can't we set the reliability level independently of the auto-reconnect mode?
Is there any reason why the reliability level should follow the auto-reconnect mode?


Describe the solution you'd like

I suggest adding another option, such as RELIABILITY, so that the reliability level can be adjusted independently.
That would solve my problem.

So far, I don't understand why this RELIABILITY setting has to follow the auto-reconnect mode.
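A minimal sketch of the requested option, using hypothetical `Reliability` and `ConnectionOptions` types (not Lettuce's actual classes). It models the coupling described above, where at-least-once is chosen when auto-reconnect is on and at-most-once when it is off, and shows how an explicit, optional reliability setting could override that while leaving existing configurations unchanged.

```java
// Hypothetical model for illustration only; these are not Lettuce's real classes.
enum Reliability { AT_MOST_ONCE, AT_LEAST_ONCE }

final class ConnectionOptions {
    final boolean autoReconnect;
    // null means "not set": fall back to deriving reliability from autoReconnect.
    private final Reliability explicitReliability;

    ConnectionOptions(boolean autoReconnect, Reliability explicitReliability) {
        this.autoReconnect = autoReconnect;
        this.explicitReliability = explicitReliability;
    }

    // Coupling described in this issue: reliability follows the auto-reconnect flag.
    static Reliability derivedReliability(boolean autoReconnect) {
        return autoReconnect ? Reliability.AT_LEAST_ONCE : Reliability.AT_MOST_ONCE;
    }

    // Proposed behavior: an explicit setting wins; otherwise keep the old coupling
    // so that configurations without the new option behave as before.
    Reliability effectiveReliability() {
        return explicitReliability != null ? explicitReliability : derivedReliability(autoReconnect);
    }
}
```

With this shape, `new ConnectionOptions(true, Reliability.AT_MOST_ONCE)` expresses exactly the combination asked for in this issue: keep reconnecting, but never resend commands that were in flight when the connection dropped.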

mp911de · 2020-06-14T14:06:12Z

So basically you’re asking for exceptionally completing commands if the server goes down so that non-executed commands aren’t retried upon reconnect?

doo871128 · 2020-06-14T15:55:12Z

Yes, that's what I want. What I want is:
Reliability = AT_MOST_ONCE
autoConnect = true

Currently, in order to use Reliability = AT_MOST_ONCE, we have to set autoConnect = false, so the connection is never re-established.
@mp911de

mp911de · 2020-06-14T19:19:49Z

At least once wouldn’t work with auto-reconnect disabled, while at most once would (that’s the default).

I’m rather considering a Predicate<RedisCommand> (retry commands, replay commands, or something like that) that controls whether a command qualifies for retry. Right now, we filter out cancelled commands and do not retry them. We collect all commands once the connection gets disconnected, and commands that do not pass the predicate would be cancelled. In your case, you could pass in command -> false to achieve at-most-once semantics. Combined with rejecting commands while disconnected, you also get fail-fast behavior.
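The replay-filter idea above can be sketched as a small self-contained model. `QueuedCommand` and `ReplayingEndpoint` are hypothetical stand-ins, not Lettuce's `RedisCommand` or endpoint classes: on disconnect, in-flight commands are collected, already-cancelled ones are skipped, commands that pass the predicate are buffered for replay, and the rest are cancelled. Writes issued while disconnected fail immediately, modeling the fail-fast behavior mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Predicate;

// Hypothetical stand-in for a queued Redis command (not Lettuce's RedisCommand).
final class QueuedCommand {
    final String type;
    final CompletableFuture<String> future = new CompletableFuture<>();

    QueuedCommand(String type) {
        this.type = type;
    }
}

// Hypothetical endpoint modeling the proposed replay filter.
final class ReplayingEndpoint {
    private final Predicate<QueuedCommand> replayFilter;
    private final List<QueuedCommand> inFlight = new ArrayList<>();
    private final List<QueuedCommand> replayBuffer = new ArrayList<>();
    private boolean connected = true;

    ReplayingEndpoint(Predicate<QueuedCommand> replayFilter) {
        this.replayFilter = replayFilter;
    }

    // Fail fast: while disconnected, new commands are rejected instead of buffered.
    void write(QueuedCommand command) {
        if (!connected) {
            command.future.completeExceptionally(new IllegalStateException("Currently not connected"));
            return;
        }
        inFlight.add(command);
    }

    // Connection lost: decide per command whether it qualifies for retry.
    void onDisconnect() {
        connected = false;
        for (QueuedCommand command : inFlight) {
            if (command.future.isCancelled()) {
                continue; // already cancelled by the caller: never retried
            }
            if (replayFilter.test(command)) {
                replayBuffer.add(command);
            } else {
                command.future.cancel(false); // does not pass the filter: cancelled
            }
        }
        inFlight.clear();
    }

    // Connection re-established: only commands that passed the filter are re-sent.
    List<QueuedCommand> onReconnect() {
        connected = true;
        List<QueuedCommand> replayed = new ArrayList<>(replayBuffer);
        replayBuffer.clear();
        inFlight.addAll(replayed);
        return replayed;
    }
}
```

Passing `command -> false` yields at-most-once behavior while reconnection still happens; a finer-grained filter, for example one that replays only idempotent reads, would sit between the two extremes.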

vbabenkoru · 2021-05-05T00:27:00Z

Looks like we're also experiencing this issue: during traffic spikes and the resulting CPU load, Lettuce seems to amplify Redis calls, pushing Redis into a failure state. Lettuce keeps reconnecting and resending commands, Redis stays at 100% CPU, and that results in more timeouts.
Does Lettuce take command timeouts into account? That is, if the command timeout is 10 seconds, does it replay only the commands issued in the last 10 seconds, or everything, including commands that have already timed out?
I agree that being able to drop commands on reconnect would be the ideal solution. Disabling reconnect isn't really a solution, because it basically brings the whole service down when a node fails.
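The thread does not answer whether timed-out commands are replayed. As a hedged illustration of the question, the sketch below shows a filter that would replay only commands younger than the command timeout, assuming a replay-filter hook like the one proposed above exists and that each command carries an issue timestamp. `TimedCommand`, `ReplayFilters`, and `notOlderThan` are hypothetical names; the clock is injected so the behavior can be checked deterministically.

```java
import java.time.Duration;
import java.util.function.LongSupplier;
import java.util.function.Predicate;

// Hypothetical command wrapper that records when the command was issued.
final class TimedCommand {
    final String type;
    final long issuedAtNanos;

    TimedCommand(String type, long issuedAtNanos) {
        this.type = type;
        this.issuedAtNanos = issuedAtNanos;
    }
}

final class ReplayFilters {
    private ReplayFilters() {
    }

    // Replay only commands whose caller may still be waiting, i.e. commands
    // younger than the command timeout when the filter is evaluated.
    static Predicate<TimedCommand> notOlderThan(Duration commandTimeout, LongSupplier nanoClock) {
        long timeoutNanos = commandTimeout.toNanos();
        return command -> nanoClock.getAsLong() - command.issuedAtNanos < timeoutNanos;
    }
}
```

In a real client the clock would be `System::nanoTime`, and the filter would be evaluated when commands are collected after a disconnect, so anything the caller has already given up on is dropped instead of adding load to a recovering node.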

jdeppe-pivotal · 2021-07-27T20:47:06Z

“I’m rather considering a Predicate<RedisCommand> (retry commands, replay commands or something the like) that controls whether a command qualifies for retry.”

@mp911de is this capability on your roadmap?

mp911de · 2021-07-28T06:43:00Z

It is, but I currently don't have the bandwidth to implement it.

doo871128 added the type: feature A new feature label Jun 14, 2020

mp911de changed the title from “Can we have another option to control RELIABILITY?” to “Introduce command replay filter to avoid command replaying after reconnect” Jun 24, 2020

mp911de mentioned this issue Jun 24, 2020

Control Reliability not using autoreconnect. #1309 (Closed)

tishun added the for: team-attention An issue we need to discuss as a team to make progress label Jul 15, 2024

tishun added this to the Backlog milestone Jul 15, 2024

tishun mentioned this issue Dec 10, 2024

[improvement] separate command execution reliability from auto reconnect #3072 (Closed)

tishun modified the milestones: Backlog, 6.6.0.RELEASE Jan 3, 2025

tishun removed the for: team-attention An issue we need to discuss as a team to make progress label Jan 3, 2025

tishun linked a pull request Jan 3, 2025 that will close this issue

Introduce command replay filter to avoid command replaying after reconnect #1310 #3118 (Open)

