InfluxWriter should store one record containing all perfdata per data point #7060

Open
onitake opened this issue Mar 29, 2019 · 7 comments · May be fixed by #8177
Assignees
Labels
area/influxdb Metrics to InfluxDB enhancement New feature or request queue/wishlist

Comments

@onitake

onitake commented Mar 29, 2019

Is your feature request related to a problem? Please describe.

The InfluxWriter addon writes performance data as individual metric-value pairs into InfluxDB.
This is fine for creating simple queries on individual data points, but makes it practically impossible to query across multiple metrics, even if they come from the same check.

Describe the solution you'd like

Instead of writing one value per record and tagging it with the metric name, InfluxWriter should create a single record per check result, with each field keyed by the metric name rather than by a generic "value".

So, instead of this:

time     value metric hostname       check
12345678 100   rxbps server-xyz-0001 bandwidth
12345678 10000 txbps server-xyz-0001 bandwidth
12349000 200   rxbps server-xyz-0001 bandwidth
12349000 12000 txbps server-xyz-0001 bandwidth

We would have this:

time     rxbps txbps hostname        check
12345678 100   10000 server-xyz-0001 bandwidth
12349000 200   12000 server-xyz-0001 bandwidth
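
The reshaping shown in the two tables above can be sketched in a few lines of Python. This is only an illustration of the desired record layout, not InfluxWriter code; the `pivot` helper and the dictionary-based row format are assumptions made for the example.

```python
from collections import defaultdict


def pivot(rows):
    """Merge per-metric rows into one record per (time, hostname, check)."""
    merged = defaultdict(dict)
    for row in rows:
        key = (row["time"], row["hostname"], row["check"])
        # The metric name becomes the field key instead of a generic "value".
        merged[key][row["metric"]] = row["value"]
    return [
        {"time": t, "hostname": h, "check": c, **fields}
        for (t, h, c), fields in sorted(merged.items())
    ]


# Hypothetical input mirroring the "instead of this" table.
rows = [
    {"time": 12345678, "value": 100, "metric": "rxbps",
     "hostname": "server-xyz-0001", "check": "bandwidth"},
    {"time": 12345678, "value": 10000, "metric": "txbps",
     "hostname": "server-xyz-0001", "check": "bandwidth"},
    {"time": 12349000, "value": 200, "metric": "rxbps",
     "hostname": "server-xyz-0001", "check": "bandwidth"},
    {"time": 12349000, "value": 12000, "metric": "txbps",
     "hostname": "server-xyz-0001", "check": "bandwidth"},
]

records = pivot(rows)
for record in records:
    print(record)
```

With the current schema, a consumer has to do this kind of merge itself before it can combine `rxbps` and `txbps` in one expression.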

Describe alternatives you've considered

Due to limitations in the InfluxDB query language, the only way to combine metrics would be some sort of meta-query, but that must be implemented in the system that displays the results. In Grafana, such a feature is being considered, but it is not available at present.

Additional context

Discussion on the Grafana bug tracker: grafana/grafana#12324

I understand this may make other use cases (like automatically generated, separate graphs) more difficult to create, but it would greatly simplify building special-purpose graphs, such as the network bandwidth graph shown above. Perhaps a configuration option could be added to the addon to enable this kind of behaviour?

@Al2Klimov Al2Klimov added enhancement New feature or request area/influxdb Metrics to InfluxDB labels Apr 1, 2019
@dnsmichi
Contributor

dnsmichi commented Apr 8, 2019

I'm not a friend of config options that make things work in one direction or another with different schemas. That will become hard to support.

Since you write about the limitations of the InfluxDB query language, isn't that something that should be solved there? I've read that 2.0 changes quite a few things and introduces a new query language. That may be the better direction. @Mikesch-mp might also know more; my own InfluxDB knowledge is limited to shipping simple data to it.

@onitake
Author

onitake commented Apr 8, 2019

@dnsmichi This is less about config options and more about the way data is stored in InfluxDB. Having a configurable option is just a suggestion on how to avoid incompatibility with existing consumers.

While you are right that the query language should allow combining data across records, it's currently a fact that it doesn't and that effectively kills a valid use case. I'd be happy to wait and see what Influx 2.0 offers, but it may also get worse...

I'd also be interested in why data is stored in separate records in the first place. Not putting everything gathered from a single check into a single record wastes space and may lead to ambiguities.

@Thomas-Gelf
Contributor

Discussed this with @dnsmichi, the change itself wouldn't be complicated. This would however cause some extra work for:

  • keeping two implementations for compatibility reasons
  • adding a related config switch
  • documentation
  • new related sample Dashboards for Grafana

Transition to a new default would involve multiple release cycles. What we're sending right now is something like this:

interface,host=a,metric=octetsIn value=100 <timestamp>
interface,host=a,metric=octetsOut value=200 <timestamp>

Not exactly like this, but similar. What @onitake expects is:

interface,host=a octetsIn=100,octetsOut=200 <timestamp>

This would be a lot easier to handle in Grafana / in an InfluxDB query. When implementing this we should also take care of perfdata using the check_multi syntax and ship one line per instance:

interface,host=a,instance=eth0 octetsIn=100,octetsOut=200 <timestamp>
interface,host=a,instance=eth1:1 octetsIn=100,octetsOut=300 <timestamp>
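
A minimal sketch of how such merged lines could be assembled, assuming the check result is already available as plain dictionaries. The helper names `escape` and `to_line` and the timestamp value are hypothetical, and this is not the InfluxWriter implementation. The escaping follows the InfluxDB line protocol rules for commas, equals signs and spaces.

```python
def escape(text, special=",= "):
    """Backslash-escape the characters that line protocol treats as separators."""
    for ch in special:
        text = text.replace(ch, "\\" + ch)
    return text


def to_line(measurement, tags, fields, timestamp):
    """Build one line-protocol record holding every perfdata field of a check."""
    # Measurement names only need commas and spaces escaped.
    head = escape(measurement, ", ")
    tag_set = "".join(f",{escape(k)}={escape(v)}" for k, v in sorted(tags.items()))
    # Plain numbers are written as floats; integer fields would need an "i" suffix.
    field_set = ",".join(f"{escape(k)}={float(v)}" for k, v in fields.items())
    return f"{head}{tag_set} {field_set} {timestamp}"


# Placeholder timestamp; a real writer would use the check execution time.
lines = [
    to_line("interface", {"host": "a", "instance": "eth0"},
            {"octetsIn": 100, "octetsOut": 200}, 1553817600),
    to_line("interface", {"host": "a", "instance": "eth1:1"},
            {"octetsIn": 100, "octetsOut": 300}, 1553817600),
]
print("\n".join(lines))
```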

Creating dynamic dashboards similar to what we're building right now for Icinga vSphereDB would then become a lot easier.

Cheers,
Thomas

@dnsmichi
Contributor

dnsmichi commented Apr 8, 2019

Hi,

thanks, now I understand things better. One thing to note for metadata and thresholds ... I am not sure how this could be stored in a changed format.

Current behaviour

hostname=a measurement=interface metrics=eth0 value=23,min=4,max=100 <ts>
hostname=a measurement=interface metrics=eth1 value=25,min=7,max=120 <ts>
hostname=b measurement=interface metrics=eth0 value=27,min=9,max=129 <ts>

Desired behaviour

interface,hostname=a eth0=23,eth1=25,eth0_max=100,eth1_max=120 <ts>

Something else, string prefix, or similar?

Cheers,
Michael

@Thomas-Gelf
Contributor

For the record, the syntax is roughly:

|measurement|,tag_set| |field_set| |timestamp|

As far as I can recall, in Icinga 2 measurement is the check command, tag_set contains host, service and metric with metric being the perfdata label. field_set contains value, warn, crit, min and max - if given. To continue with your example, usually eth0 isn't a perfdata label, while octetsIn is. So, ignoring the whole check_multi topic, your example would probably read:

interface,hostname=a,service=Interface\ eth0,metric=octetsIn value=23,min=4,max=100 <ts>
interface,hostname=a,service=Interface\ eth0,metric=octetsOut value=17,min=4,max=100 <ts>
interface,hostname=a,service=Interface\ eth0,metric=errorsIn value=0,min=4,max=10 <ts>

The proposed change would result in this output:

interface,hostname=a,service=Interface\ eth0 octetsIn=23,octetsOut=17,errorsIn=0 <ts>

Now let's pick a more advanced example, a check plugin writing perfdata according to the check_multi syntax could (depending on the plugin) give us something like this:

interface::eth0::octetsIn=23 octetsOut=17 errorsIn=0 interface::eth1::octetsIn=133 octetsOut=45 errorsIn=0 

This should be transformed into:

interface,hostname=a,service=Interfaces,interface=eth0 octetsIn=23,octetsOut=17,errorsIn=0 <ts>
interface,hostname=a,service=Interfaces,interface=eth1 octetsIn=133,octetsOut=45,errorsIn=0 <ts>
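
The transformation above could be sketched like this. It assumes the `tag::instance::label=value` layout from the example, where labels without a `::` prefix belong to the most recent instance. The function names are hypothetical, escaping is skipped, and the timestamp is omitted, which line protocol allows.

```python
def parse_check_multi(perfdata):
    """Group 'tag::instance::label=value label=value ...' tokens by instance."""
    instances = {}
    current = None
    for token in perfdata.split():
        label, _, raw = token.partition("=")
        if "::" in label:
            # A prefixed label starts a new (tag, instance) group.
            tag, instance, label = label.split("::", 2)
            current = instances.setdefault((tag, instance), {})
        if current is None:
            raise ValueError("perfdata does not start with a tag::instance:: prefix")
        # Keep only the value; thresholds after ';' are ignored in this sketch.
        current[label] = float(raw.split(";")[0])
    return instances


def to_lines(measurement, base_tags, instances):
    """Emit one line per instance, with the instance added as an extra tag."""
    lines = []
    for (tag, instance), fields in instances.items():
        field_set = ",".join(f"{k}={v:g}" for k, v in fields.items())
        lines.append(f"{measurement},{base_tags},{tag}={instance} {field_set}")
    return lines


perfdata = ("interface::eth0::octetsIn=23 octetsOut=17 errorsIn=0 "
            "interface::eth1::octetsIn=133 octetsOut=45 errorsIn=0")
instances = parse_check_multi(perfdata)
lines = to_lines("interface", "hostname=a,service=Interfaces", instances)
print("\n".join(lines))
```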

That way the InfluxDB data would be structured the way the plugin writer expected it to look. As you can see, I left out the optional meta parameters (warn/crit/limits). When you opt in for them, they must be prefixed with the related label, as they differ per field. This would read as follows:

load,hostname=a,service=System\ Load load1=3,load1_warn=4,load1_crit=8,load5=2,load5_warn=3,load5_crit=4,load15=1.4,load15_warn=2,load15_crit=4 <ts>
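
As a rough sketch, the label prefixing could be done as follows. The nested-dictionary input and the `flatten` helper are assumptions made for illustration, not an existing Icinga 2 structure.

```python
def flatten(perfdata):
    """Turn {label: {value, warn, crit, ...}} into label-prefixed field keys."""
    fields = {}
    for label, values in perfdata.items():
        for kind, number in values.items():
            if number is None:
                continue  # threshold not given by the plugin
            # The value keeps the bare label; thresholds get a "_<kind>" suffix.
            key = label if kind == "value" else f"{label}_{kind}"
            fields[key] = number
    return fields


# Hypothetical perfdata matching the load example.
perfdata = {
    "load1": {"value": 3, "warn": 4, "crit": 8},
    "load5": {"value": 2, "warn": 3, "crit": 4},
    "load15": {"value": 1.4, "warn": 2, "crit": 4},
}

fields = flatten(perfdata)
field_set = ",".join(f"{key}={number}" for key, number in fields.items())
print(field_set)
```

Every threshold becomes a field of its own, so each opted-in limit adds one more field per label to the record.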

@onitake: would this match what you expect to see?

@onitake
Author

onitake commented Apr 8, 2019

This is pretty much what I'm looking for, yes. Thanks for the additional insights on the limits and duplication.

One thing to note is that tags are stored efficiently (i.e. they are indexed), while fields are written as-is. So, if you store the limits as fields, they will likely increase database size significantly.
As I understand it, the limits aren't stored by default, so this won't directly affect everybody. Correct?

@dnsmichi dnsmichi added the needs-sponsoring Not low on priority but also not scheduled soon without any incentive label May 21, 2019
@Al2Klimov Al2Klimov removed the needs-sponsoring Not low on priority but also not scheduled soon without any incentive label Aug 13, 2020
@Al2Klimov Al2Klimov self-assigned this Aug 13, 2020
Al2Klimov added a commit that referenced this issue Aug 13, 2020
@Al2Klimov Al2Klimov linked a pull request Aug 13, 2020 that will close this issue
Al2Klimov added a commit that referenced this issue Dec 14, 2020
... for enabling an alternative schema.

refs #7060
@Al2Klimov
Member

For the record, I'd also benefit from this. Currently I have to visually subtract cached memory from used memory in Grafana. With e.g. #8177 Grafana could subtract this for me.

Al2Klimov added a commit that referenced this issue Aug 15, 2023
... for enabling an alternative schema.

refs #7060
4 participants