zpool health state as value #2

Open
bmcgough opened this issue Sep 19, 2019 · 7 comments

Comments

@bmcgough
Contributor

We have adopted zpool_prometheus for use on one of our clusters. We are prometheus+grafana users with some existing detailed dashboards.

I would very much like to be able to alert on health state and I cannot find a way to do this with the health state information in a label. I can make some useful graphs with the multistat plugin, grouped by label values, but that panel doesn't seem to support alerting.

Other zfs exporters index the health values like this:

0 ONLINE
1 DEGRADED
2 FAULTED
3 OFFLINE
4 UNAVAIL
5 REMOVED
6 AVAIL
7 INUSE
-1 no data/timeout
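
A minimal sketch of how such an index could be applied in a collector, using only the values listed above. The helper name `health_value` is hypothetical, and mapping any unrecognized string to -1 is an assumption rather than something the other exporters are known to do.

```python
# Index taken from the list above; -1 stands for "no data/timeout".
HEALTH_INDEX = {
    "ONLINE": 0,
    "DEGRADED": 1,
    "FAULTED": 2,
    "OFFLINE": 3,
    "UNAVAIL": 4,
    "REMOVED": 5,
    "AVAIL": 6,
    "INUSE": 7,
}

NO_DATA = -1


def health_value(state):
    """Map a health string (as shown by zpool status) to its index.

    Hypothetical helper: None or an unrecognized string maps to NO_DATA.
    """
    if state is None:
        return NO_DATA
    return HEALTH_INDEX.get(state.strip().upper(), NO_DATA)
```

A numeric value like this is what makes a simple threshold alert possible, for example "value is not 0".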

Is this something you would consider adding? No existing metrics would change, it would just be one additional metric per vdev.

Perhaps I am missing a way to do this in Grafana?

@richardelling
Owner

Hi Ben,
The information to arrive at the state is in the nvlist we get. However, the logic required to get to the actual state isn't immediately obvious to folks. For example, AVAIL applies to a spare, but UNAVAIL can apply to other vdevs, too. In other words, there is not a 1:1 correlation between the state number and the state you will see in zpool status. So, the question is: do you want a state that matches zpool status, or do you want the state's raw number (which doesn't correspond directly to what you see in zpool status)?

The way I handle this with influxdb is to put the state as a field in its decoded form. However, influxdb allows strings (enums) for its fields and, sadly, prometheus only allows floats for its values. So we could add the state as a string (enum) to the label.

Thoughts?

@richardelling
Owner

Or, we could expose the states directly as numbers and document the values while not trying to apply the UI logic. This would be trivial. For reference, the internal state numbers are documented starting here:
https://github.com/zfsonlinux/zfs/blob/master/include/sys/fs/zfs.h#L824
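
For illustration, the raw values could be documented alongside the metric roughly as follows. The numbering mirrors the `vdev_state_t` enum in `zfs.h` as it stood around the time of this thread; check the header for the ZFS version in use before relying on it. The Python names are only a sketch and are not part of zpool_prometheus.

```python
from enum import IntEnum


class VdevState(IntEnum):
    """Raw vdev states, mirroring vdev_state_t in include/sys/fs/zfs.h.

    Verify against the header for the ZFS version in use.
    """
    UNKNOWN = 0    # uninitialized vdev
    CLOSED = 1     # not currently open
    OFFLINE = 2    # not allowed to open
    REMOVED = 3    # explicitly removed from system
    CANT_OPEN = 4  # tried to open, but failed
    FAULTED = 5    # external request to fault device
    DEGRADED = 6   # replicated vdev with unhealthy children
    HEALTHY = 7    # presumed good


def describe(raw):
    """Return the enum name for a raw value, or 'UNDOCUMENTED' if unknown."""
    try:
        return VdevState(raw).name
    except ValueError:
        return "UNDOCUMENTED"
```

With this numbering, an alert on any value other than 7 (HEALTHY) would catch every non-healthy raw state without reproducing the zpool status display logic.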

@bmcgough
Contributor Author

I think that's the right call. Grafana should be able to apply that logic if we want graphs that match zpool status output. And of course the STATE label should remain as it is now. I would be happy to contribute my dashboard once I've built it (no promises on quality, I'm not a professional dashboard builder!).

It looks like the zpool status logic is here:

https://github.com/zfsonlinux/zfs/blob/2a0d41889e1c7c430e708cea76e70b11e0e2b0aa/lib/libzfs/libzfs_pool.c#L184

However, that includes the state SPLIT, which I haven't seen documented anywhere...
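
As a rough sketch of the logic at that link, a decoder might look like the following. It is a Python paraphrase of `zpool_state_to_name()`, not the library code. The state and aux reason are passed as symbolic names (the suffixes of `VDEV_STATE_*` and `VDEV_AUX_*`), an illustrative choice so that no numeric enum values are hardcoded here; those should come from `zfs.h`.

```python
def state_to_name(state, aux="NONE"):
    """Approximate zpool_state_to_name(): raw state + aux reason -> display name.

    `state` and `aux` are symbolic names such as "CANT_OPEN" or "SPLIT_POOL".
    """
    if state in ("CLOSED", "OFFLINE"):
        return "OFFLINE"
    if state == "REMOVED":
        return "REMOVED"
    if state == "CANT_OPEN":
        # The aux reason decides what is displayed for a vdev that failed to open.
        if aux in ("CORRUPT_DATA", "BAD_LOG"):
            return "FAULTED"
        if aux == "SPLIT_POOL":
            return "SPLIT"
        return "UNAVAIL"
    if state == "FAULTED":
        return "FAULTED"
    if state == "DEGRADED":
        return "DEGRADED"
    if state == "HEALTHY":
        return "ONLINE"
    return "UNKNOWN"
```

This covers only what that one function returns; spare-specific states such as AVAIL and INUSE are not produced by it.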

Thank you!

@richardelling
Owner

That is exactly right: there are "user" states that aren't described in one specific place, not even in zpool_state_to_name().

I've got some other changes to push this weekend (adding size distribution histograms) and I'll add the trivial state values. Then we can work from there to see if there is a better way to consume the info.

FYI, split occurs when the zpool split command is used... not a commonly used command. But it shows how hardcoding the enum values becomes hard to maintain in dashboards.

@bmcgough
Contributor Author

Oh, just realized that we rely on vdev mapping to go from ZFS vdev to physical location, so it would be most helpful if the path were included in the health metric. For example, we name our devices : and have paths like /dev/disk/by-vdev/1:7... I'm only seeing these as labels on some of the metrics.
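
To make the request concrete, a health sample carrying the path might be rendered as below. The metric name `zpool_vdev_state` and the label names `name`, `vdev`, and `path` are hypothetical (not necessarily what zpool_prometheus emits), and the pool and vdev values are made up. Only the escaping rules follow the Prometheus text exposition format.

```python
def escape_label(value):
    """Escape a label value for the Prometheus text exposition format."""
    return (
        value.replace("\\", "\\\\")
        .replace("\n", "\\n")
        .replace('"', '\\"')
    )


def render_sample(metric, labels, value):
    """Render one sample line: metric{k="v",...} value."""
    pairs = ",".join(
        '{}="{}"'.format(key, escape_label(val)) for key, val in labels.items()
    )
    return "{}{{{}}} {}".format(metric, pairs, value)


# Hypothetical metric and label names; 7 is just an example state value.
line = render_sample(
    "zpool_vdev_state",
    {"name": "tank", "vdev": "root/raidz-0/disk-7", "path": "/dev/disk/by-vdev/1:7"},
    7,
)
print(line)
```

With the path as a label on the same series, an alert can name the physical location directly instead of joining against another metric.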

@bmcgough
Contributor Author

We began work on a python parser before finding your project (I found 4 other zfs exporters for prometheus through google, but it took combing through ZFS on Linux issues to find this!). We had decided that as long as we add new states to the end of the enum it should be safe enough. I hope that is what they will do if they need to add any internal states to ZFS.

I modified the Ubuntu CMakeLists.txt to enable CPack creation of a DEB package. I'll contribute that shortly.

@richardelling
Owner

Hi Ben, can you take a look at #5 and give feedback?


2 participants