[admin-tool][server] Add two admin tool commands for dumping consumer ingestion states and heartbeat states; Add logging for stale heartbeat replicas #1260
We already have an admin-tool command to dump the ingestion context of a specific topic partition on a specific server host. However, when many partitions are assigned to a host, it is not easy to find which partition replica is actually lagging. We can check each one through the Helix UI, but that is time-consuming.
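This PR adds three things to improve usability:
1. An admin-tool command to dump consumer ingestion states.
2. An admin-tool command to dump heartbeat states.
3. Logging for replicas with stale heartbeats.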
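To illustrate the idea behind the stale-heartbeat logging, here is a minimal sketch, assuming each replica's last heartbeat is available as an epoch-millisecond timestamp. The class `StaleHeartbeatSketch`, the method `findStaleReplicas`, the replica IDs, and the threshold are all hypothetical and are not taken from this PR's code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: flag replicas whose last heartbeat is older than a lag threshold. */
public class StaleHeartbeatSketch {

  /** Returns the IDs of replicas whose heartbeat lag is strictly greater than maxLagMs, sorted by ID. */
  static List<String> findStaleReplicas(Map<String, Long> lastHeartbeatMs, long nowMs, long maxLagMs) {
    List<String> stale = new ArrayList<>();
    for (Map.Entry<String, Long> entry : lastHeartbeatMs.entrySet()) {
      long lagMs = nowMs - entry.getValue();
      if (lagMs > maxLagMs) {
        // A real implementation would log here; this sketch only collects the ID.
        stale.add(entry.getKey());
      }
    }
    Collections.sort(stale);
    return stale;
  }

  public static void main(String[] args) {
    // Hypothetical replica IDs mapped to their last heartbeat timestamps (ms).
    Map<String, Long> heartbeats = new TreeMap<>();
    heartbeats.put("topicA-0", 99_000L); // 1s behind
    heartbeats.put("topicA-1", 30_000L); // 70s behind
    heartbeats.put("topicA-2", 10_000L); // 90s behind
    // Assumed 60s threshold, evaluated at now = 100s.
    System.out.println(findStaleReplicas(heartbeats, 100_000L, 60_000L));
  }
}
```

Running `main` prints `[topicA-1, topicA-2]`. Sorting the result makes the output stable when scanning many partitions on one host.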
My long-term hope is that these commands can be extended to run periodically when you "attach" to a specific host, collect data over a period of time, and generate local visualizations that help us understand ingestion performance and fairness. This matters because we deliberately avoid emitting many low-level metrics, to prevent a metric explosion. These endpoints are the first step toward that.
How was this PR tested?
Added an integration test for the two admin tool commands.
Does this PR introduce any user-facing changes?