Support pausing and resuming consumers #4966

ripienaar · 2024-01-17T15:57:08Z

Proposed change

Introduce an API on $JS.API.CONSUMER.PAUSE.*.* that takes the following request:

type JSApiConsumerPauseRequest struct {
    PauseUntil *time.Time `json:"pause_until,omitempty"`
}

The consumer will set itself in a paused state but continue to handle acks for in-flight messages. No further messages will be delivered after this point; other than deliveries being inhibited, the consumer functions as usual.

If a pause time is given, a timer will automatically resume the consumer when that time is reached. If no time, or a time in the past, is given, a paused consumer will resume.

Consumer info includes 2 new fields:

  Paused bool `json:"paused,omitempty"`
  PauseRemaining time.Duration `json:"pause_remaining,omitempty"`

The paused state and time would need to be persisted to the Raft layer so that server restarts do not unpause paused consumers. This is done using a new value in the consumer configuration:

PauseUntil time.Time `json:"pause_until,omitempty"`

When given at creation time, this creates a paused consumer. It cannot be changed at runtime through a configuration update, but the PAUSE API will update this setting; essentially, the only way to change it after creation is with the PAUSE API.

Advisories for pause and unpause are to be added on io.nats.jetstream.advisory.v1.consumer_pause with pertinent information.

Use case

It is difficult to schedule maintenance on central resources on a large distributed system where 100s or 1000s of clients are accessing data in a stream.

We would like to be able to pause a Consumer such that it appears healthy but just doesn't deliver any messages.

During the pause, maintenance can happen without the resources accessed by clients being under constant pressure; later the consumer can be unpaused and work will continue.

This would happen without impacting running clients, other than that they would see pending messages in stream info but receive no deliveries.

This would apply to both push and pull consumers.

Contribution

No response


derekcollison · 2024-01-17T22:55:12Z

Should delay just be a parseable string? "1s", "2h"? If we can't parse it, we return an error.

Do we want to have maximums and minimums, or start simple and add in limits as needed?

ripienaar · 2024-01-17T22:57:57Z

We don’t have other cases of such strings in the API, and it’s also a bit Go-centric, so Duration seems best. Let UIs handle it as they wish, be it strings like that in the CLI or some kind of picker on the web.

Let’s start simple.

derekcollison · 2024-01-17T22:59:49Z

OK, but if we use time.Duration then it's nanos, not millis. But I hear you on consistency.

ripienaar · 2024-01-17T23:01:46Z

Indeed - nanos. Will fix.

derekcollison · 2024-01-18T14:39:22Z

@neilalexander and @Jarema could you work with @ripienaar and this writeup and schedule this work?

Jarema · 2024-01-23T09:18:22Z

@derekcollison this has been scheduled to start on the 5th of February, with a plan to finish before the 16th of February. @neilalexander will be working on it.

Jarema · 2024-02-06T10:56:47Z

@ripienaar @neilalexander Can I ask for an update on the final design after the recent discussions?

ripienaar · 2024-02-06T11:03:05Z

From my perspective, the pause/resume APIs are still the right direction. The details of how we actually implement that in a way that's not massive plumbing in the server are for @neilalexander to comment on.

derekcollison · 2024-02-06T17:14:58Z

I vote it should just be part of the consumer config, with no new API endpoints.

ripienaar · 2024-02-06T17:20:29Z

At this point I'd say let's just not add this feature. We can go back and find requirements.

As it stands, the few requirements we do have will not be met without these extra APIs, so let's just close the issue and move on.

derekcollison · 2024-02-06T17:23:05Z

I thought it would be easier, but not impossible. You are saying they would require securing just that functionality vs. general updates, yes? And without general callouts we only have new APIs to secure individually, is that correct?

ripienaar · 2024-02-06T17:24:59Z

Yes, I think there is a need to cater for two distinct users: operational needs and configuration needs. Often configuration may not be changed without approval from change advisory boards, etc.

Doing maintenance should not require a configuration change.

Those doing maintenance should not need to be authorized to do a configuration change.
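Separate API subjects make this split enforceable because NATS permissions are granted per subject. The Go sketch below re-implements NATS-style wildcard matching (`*` matches exactly one token, `>` matches one or more trailing tokens) to show an operator role allowed to publish to the pause subjects but not to a consumer create subject. `subjectMatches` and `allowed` are hypothetical helpers, and the permission list and the `$JS.API.CONSUMER.CREATE` subject layout are assumptions for illustration, not a recommended configuration.

```go
package main

import (
	"fmt"
	"strings"
)

// subjectMatches is a hypothetical helper implementing NATS-style wildcard
// matching: "*" matches exactly one token, ">" matches one or more
// remaining tokens and must be the last token of the pattern.
func subjectMatches(pattern, subject string) bool {
	pt := strings.Split(pattern, ".")
	st := strings.Split(subject, ".")
	for i, p := range pt {
		if p == ">" {
			return i == len(pt)-1 && len(st) > i
		}
		if i >= len(st) {
			return false
		}
		if p != "*" && p != st[i] {
			return false
		}
	}
	return len(pt) == len(st)
}

// allowed reports whether any permitted pattern matches the subject.
func allowed(perms []string, subject string) bool {
	for _, p := range perms {
		if subjectMatches(p, subject) {
			return true
		}
	}
	return false
}

func main() {
	// Assumed publish permissions for a maintenance (operator) role.
	operator := []string{"$JS.API.CONSUMER.PAUSE.*.*", "$JS.API.CONSUMER.INFO.*.*"}

	fmt.Println(allowed(operator, "$JS.API.CONSUMER.PAUSE.ORDERS.worker"))  // true
	fmt.Println(allowed(operator, "$JS.API.CONSUMER.CREATE.ORDERS.worker")) // false
}
```

With a configuration-only approach, pausing and reconfiguring would share one subject, so no such permission split would be possible without inspecting the request body.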

ripienaar · 2024-02-06T18:24:11Z

Capturing a discussion that keeps coming up around this one:

Question: Should the paused until configuration be updatable as configuration?
Answer: We have the pattern where updates to consumer configs are idempotent, and as a result applications often set their configuration at startup. We added the action to help distinguish a bit; it's problematic, though, as that is not something one can do authz against today.

Given this pattern, the question is: who owns this property? If an administrator sets the pause state to x and the app starting up sets it to start-paused or unpaused, how is the system to distinguish between a normal app making the API call to create a paused/unpaused consumer and an admin asking for the consumer to be paused?

I don't think the API has the context of who is calling it and for what reason, and it would be undesirable to allow an unexpected config update by a starting worker to unpause a consumer.

It's essential that the responsibilities of creation and administration be separate here. A consumer could be created paused, but an administrator must be able to unpause it and know that if that creation is run again, it will not be paused again. Or, if an administrator overrides the pause from 1 hour to 10 minutes, they must know that a service startup will not set it back to 1 hour.

I can't think of a way to capture this distinction (except maybe (ab)using the action property? But see the comments above about authz and about roles and responsibilities). Happy to hear if there's a design solution that both allows this property to be updated as configuration and retains ownership of who is responsible for managing it.

ripienaar · 2024-02-20T13:18:43Z

Related server PR #5066

Jarema · 2024-02-28T06:26:00Z

Server PR has been merged 🎉
Closing the issue.
This feature will be part of release 2.11

ripienaar added the proposal label (Enhancement idea or proposal) Jan 17, 2024

bruth added this to the 2.11.0 milestone Jan 17, 2024

derekcollison self-assigned this Jan 17, 2024

chenjpu mentioned this issue Jan 18, 2024

Support weight consumers #4970 (open)

derekcollison assigned neilalexander and Jarema and unassigned derekcollison Jan 18, 2024

ripienaar mentioned this issue Feb 19, 2024

Support consumer pause nats-io/nats-architecture-and-design#266 (open, 15 tasks)

Jarema closed this as completed Feb 28, 2024
