Add a metric for geo replication for tracking replicated subscriptions snapshot timeouts #21793

Closed
1 of 2 tasks
lhotari opened this issue Dec 22, 2023 · 3 comments · Fixed by #22381 · May be fixed by nikam14/pulsar#3
Labels
type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages

Comments

@lhotari
Member

lhotari commented Dec 22, 2023

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

Snapshot creation for geo-replication replicated subscriptions (PIP-33) might time out.
The code contains a debug log message when this happens:

log.debug("[{}] Snapshot creation timed out for {}", topic.getName(), entry.getKey());

When this happens, the subscription state won't be reflected on the remote side and a backlog will build up.
There's no metric to detect this situation.

Solution

Add a new metric pulsar_replicated_subscriptions_snapshot_timeouts which is a counter (that only resets when the broker restarts).
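
To illustrate the proposal, here is a minimal sketch of how a timeout counter could sit next to the snapshot cleanup logic. The class SnapshotTimeoutTracker and all of its methods are hypothetical names made up for this example; this is not the actual Pulsar broker code, and exposing the value as pulsar_replicated_subscriptions_snapshot_timeouts through the broker's metrics endpoint is left out.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration only; not a class from the Pulsar code base.
class SnapshotTimeoutTracker {
    private final long timeoutMillis;
    // Snapshot id -> start time in milliseconds.
    private final Map<String, Long> pendingSnapshots = new ConcurrentHashMap<>();
    // Monotonic counter: it is never decremented, so it only starts over
    // when the process (the broker, in the proposal) restarts.
    private final LongAdder timeouts = new LongAdder();

    SnapshotTimeoutTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void snapshotStarted(String snapshotId, long nowMillis) {
        pendingSnapshots.put(snapshotId, nowMillis);
    }

    void snapshotCompleted(String snapshotId) {
        pendingSnapshots.remove(snapshotId);
    }

    // Drops every pending snapshot older than the timeout and counts each
    // dropped snapshot as one timeout.
    void cleanupTimedOutSnapshots(long nowMillis) {
        Iterator<Map.Entry<String, Long>> it = pendingSnapshots.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() > timeoutMillis) {
                it.remove();
                timeouts.increment();
            }
        }
    }

    long getTimeoutCount() {
        return timeouts.sum();
    }
}
```

The point of the sketch is that the counter is incremented at the same place where the debug log line is emitted today, so the event becomes visible without enabling debug logging.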

Alternatives

No response

Anything else?

Increasing the timeout threshold from replicatedSubscriptionsSnapshotTimeoutSeconds=30 to replicatedSubscriptionsSnapshotTimeoutSeconds=60 could help resolve the situation. This metric would help detect when that is necessary.
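
Because the proposed counter only resets on broker restart, anything that consumes it has to look at the increase between two samples rather than the raw value. A minimal sketch of that calculation follows; the helper CounterDelta is a hypothetical name for this example, and treating a decreasing value as a restart is an assumption borrowed from how counter metrics are commonly interpreted.

```java
// Hypothetical helper for this example; not part of Pulsar.
final class CounterDelta {
    private CounterDelta() {
    }

    // Returns how much the counter grew between two samples.
    // If the current sample is lower than the previous one, the broker is
    // assumed to have restarted, so the counter began again from zero and
    // the current value itself is the increase.
    static long increase(long previousSample, long currentSample) {
        return currentSample >= previousSample
                ? currentSample - previousSample
                : currentSample;
    }
}
```

A steadily positive increase across samples would be the signal that raising replicatedSubscriptionsSnapshotTimeoutSeconds is worth considering.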

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@lhotari added the type/enhancement label Dec 22, 2023
@poorbarcode
Contributor

@lhotari

Add a new metric pulsar_replicated_subscriptions_snapshot_timeouts which is a counter (that only resets when the broker restarts).

Agree with you

@nikam14
Contributor

nikam14 commented Mar 28, 2024

@lhotari I have made a PR in my forked repo. Can you take a look?

@lhotari
Member Author

lhotari commented Mar 28, 2024

@lhotari I have made a PR in my forked repo. Can you take a look?

@nikam14 looks good. Please go ahead and create an apache/pulsar PR. Please fill in the details in the description and name the PR properly too. The contribution guide contains advice for anything the PR template doesn't explain. For metrics, documentation will also need to be added to the pulsar-site repository.
You can usually also get help with anything related to contributions on the #dev channel of the Apache Pulsar Slack.
