Add a metric for geo replication for tracking replicated subscriptions snapshot timeouts #21793
Closed
Labels
type/enhancement
Search before asking
Motivation
Snapshot creation for geo-replication replicated subscriptions (PIP-33) might time out.
The code contains a debug log message when this happens:
pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/ReplicatedSubscriptionsController.java
Line 256 in 465fac5
When this happens, the subscription state isn't reflected on the remote side and a backlog builds up there.
There's no metric to detect this situation.
Solution
Add a new metric
pulsar_replicated_subscriptions_snapshot_timeouts
which is a counter (that only resets when the broker restarts).
Alternatives
No response
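For illustration, here is a minimal sketch of what the proposed counter could look like. It uses only the JDK and is not Pulsar's actual metrics code: the class name `SnapshotTimeoutMetric` and its methods are hypothetical, and only the metric name comes from the proposal above. The idea is that `recordTimeout()` would be called from the branch that currently only emits the debug log.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical holder for the proposed counter; not Pulsar's real implementation. */
public class SnapshotTimeoutMetric {
    // Metric name as proposed in this issue.
    public static final String NAME = "pulsar_replicated_subscriptions_snapshot_timeouts";

    // LongAdder keeps increments cheap under contention; the value lives in memory,
    // so it starts from zero again after a broker restart.
    private final LongAdder timeouts = new LongAdder();

    /** Assumed call site: where a pending snapshot is found to have timed out. */
    public void recordTimeout() {
        timeouts.increment();
    }

    /** Current number of timeouts recorded since this object was created. */
    public long value() {
        return timeouts.sum();
    }

    /** Renders the counter in Prometheus text exposition format. */
    public String toPrometheusText() {
        return "# TYPE " + NAME + " counter\n" + NAME + " " + value() + "\n";
    }
}
```

Because the value only ever increases, an alert would typically look at its rate of change rather than the absolute number.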
Anything else?
Increasing the timeout threshold
replicatedSubscriptionsSnapshotTimeoutSeconds=30
->replicatedSubscriptionsSnapshotTimeoutSeconds=60
could help resolve the situation. This metric would help detect when that change is necessary.
Are you willing to submit a PR?