[elastic_agent] Improvements to the Elastic Agent Metrics Overview dashboard #12488

Open
andrewkroh opened this issue Jan 27, 2025 · 5 comments · May be fixed by #12524
Labels
bug, dashboard, enhancement, Integration:elastic_agent, Team:Elastic-Agent, Team:Elastic-Agent-Data-Plane

Comments

@andrewkroh
Member

Issues

  1. The memory usage and open handles charts should split on component.id instead of elastic_agent.process. The elastic_agent.process value is not unique; for example, there can be multiple filebeat processes. We want the split to be the same as it is for the CPU Usage chart.
  2. The queue depth/percentage charts are broken. They don't work at all. I think it is because they are looking at logs-* instead of metrics-*.
  3. The cgroup memory chart could be clearer. Remove the "split by" (or only show elastic-agent), because all processes are members of the same cgroup. This will make it easier to see from the chart how close the value is to the limit. The "limit" line should also be made thicker because it's hard to notice (for example, 2px instead of 1px).
  4. Nothing on this page indicates when CPU throttling is occurring. Add a chart showing the system.process.cgroup.cpu.stats.throttled.ns metric. See https://github.com/user-attachments/assets/894e6e76-d4a1-4cc2-8171-6599170510d7.


This is an example visualization showing the CPU throttling data:

[Image: example CPU throttling visualization]
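As a rough sketch of what such a chart would need to compute, assuming system.process.cgroup.cpu.stats.throttled.ns is a cumulative counter (as the underlying cgroup cpu.stat throttled time is), the increase between two samples divided by the time between them gives throttled nanoseconds per second. The helper name throttled_ns_per_second and the (timestamp, value) sample format are hypothetical; in Kibana this would more likely be a counter-rate style calculation in the visualization than custom code. The result is left as a rate rather than a percentage because the kernel may sum throttled time across CPUs.

```python
from typing import List, Tuple


# Hypothetical helper: converts cumulative throttled-time samples into a rate.
# Assumes each sample is (timestamp_in_seconds, cumulative_throttled_ns) in time
# order, and that the counter only increases except when it resets (e.g. after
# a restart).
def throttled_ns_per_second(samples: List[Tuple[float, int]]) -> List[float]:
    rates: List[float] = []
    for (t0, ns0), (t1, ns1) in zip(samples, samples[1:]):
        interval_s = t1 - t0
        if interval_s <= 0:
            # Skip out-of-order or duplicate timestamps.
            continue
        delta_ns = ns1 - ns0
        if delta_ns < 0:
            # Counter reset: count only what accumulated since the reset.
            delta_ns = ns1
        rates.append(delta_ns / interval_s)
    return rates


samples = [(0, 0), (10, 2_000_000_000), (20, 2_000_000_000)]
print(throttled_ns_per_second(samples))  # [200000000.0, 0.0]
```

A nonzero value in an interval means the cgroup hit its CPU quota during that interval, which is the signal the current dashboard does not surface.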
@andrewkroh added the bug, dashboard, enhancement, Integration:elastic_agent, and Team:Elastic-Agent labels Jan 27, 2025
@elasticmachine

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@strawgate
Contributor

strawgate commented Jan 28, 2025

We should set minimum axis values on some of those graphs (like events duplicated) so they don't show as flat lines through the middle of the graph.

FYI, on the queue metrics, see elastic/beats#42093.

It's a big bummer, as they are extremely useful for detecting output back pressure.

We should also consider improvements to the overview dashboard, like showing agents with full queues, agents with the most output errors, etc.

@jlind23
Contributor

jlind23 commented Jan 28, 2025

cc @faec as you are currently working on it.

@jlind23 added the Team:Elastic-Agent-Data-Plane label Jan 28, 2025
@elasticmachine

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@strawgate
Contributor

strawgate commented Jan 28, 2025

I'd also like to see us elevate problematic queue metrics (once fixed) and output metrics to the overview or info page with something like:

[Image: example of surfacing problematic queue and output metrics]

4 participants