Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: Add type and unit metadata labels #39
base: main
Are you sure you want to change the base?
Proposal: Add type and unit metadata labels #39
Changes from 16 commits
25305b0
e56b2be
6cfdbd4
de68600
e0d62d3
619537d
a2e5ed9
75ccddd
d9e0ad3
f1760bd
bce6d9c
0a545c8
94796fb
dec6f78
bc88072
05edf5d
0b070f8
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For exposition formats, I guess it means this would allow:
Ideally I do opposite to what's written, so TYPE and UNIT wins here, or this could be blocked even by spec. Essentially I wonder about efficiency of parsers here.
For ingestion protocols, at least it feels metadata should override it (not what is written now) so it's easier for everyone.
Any further concerns around perf of parsing here for this @bboreham?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about what would be on the
/federate
endpoint if the Prometheus server scraped conflicting metrics from applications. Our choices are:#UNIT
metadata, and no__unit__
labels.__unit__
labels, and no#UNIT
metadata.__unit__
labels, and a#UNIT
with one of the units the metric has./federate
endpoint. It isn't a regression from what we have today.#UNIT
.I'm OK with disallowing mixing
__unit__
and#UNIT
, since we could relax that in the future if needed.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed offline, and:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too fond of adding
__unit__
and__type__
as labels in the exposition format. I think they should stay as a single line at the top of the metric family like it is today.The Unit/Type will be the same for all time series in a metric family. If they become labels in the exposition format, it will just be repeated information for all time series. It increases the payload, and we waste CPU time parsing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have "Users see no difference to exposition formats." stated above, but if you put the type and especially the unit in the
# UNIT
part then you need them in the metric name as well, defeating the purpose. You're other option is to put them into the series on each line.Looking at /metrics is for humans and follows what you see is what you get in terms of showing what you can query - implying that if you can query by
__type__
or the API/UI returns the__type__
then you should put them on each line.And the implication of that is that there needs to be at least one more efficient exposition format that is not so verbose for machines to read.
So I'm not sure this would work without either putting the new labels on every line or modifying OpenMetrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course the above argument doesn't stand if we follow @beorn7 's advice and not show
__type__
,__unit__
on the query API (/query
,/query_range
) , although I would love to have it on the metadata queries endpoint like (/series
) for Grafana to be able to know what's a float series and what's a native histogram.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should show
__type__
and__unit__
in the query APIs. I'm just saying, in the zeroth step, we should remove those labels in PromQL operations (like the name is removed).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with showing
__type__
and__unit__
in query APIs, just like we show__name__
.That's a really good catch! This is something we'll probably need to tackle with OM 2.0. Is this what you had in mind with this issue @bwplotka prometheus/OpenMetrics#280?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By "Users see no difference to exposition formats", I meant that this proposal does not change how metrics are currently exposed in exposition formats. I'll leave that for follow-up discussions in prometheus/OpenMetrics#280 and elsewhere. I updated the language in 0a545c8.
I am open to ways of querying for type and unit that align better with the existing #TYPE and #UNIT comments, if we want to preserve the "what you see is what you get" experience of Prometheus. We do support querying for
__name__
, so there is at least some precedence.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thought about differentiating native histograms and classic histograms:
As Prometheus doesn't really have a notion of "classic histogram time series", but instead breaks a classic histogram (and a summary) into a number of series, I think these series should have their own distinct types (rather than labeling them all as "histogram" or "summary), which would also remove the dependency on magic suffixes.
More concretely, I propose the following label values for the
__type__
label:counter
total
counter
gauge
unknown
__type__
labelinfo
sum
count
gsum
gcount
quantile
quantile
labelbucket
le
labelgbucket
le
labelhistogram
ghistogram
I'm currently unsure how Prometheus handles OM's stateset type and left it out.
About the
total
alias: The idea is to treattotal
andcounter
completely equivalent. If we went for the.
idea sketched out below, a legacy metric likehttp_requests_total
could now be written ashttp_requests.total
, and even OTel counters could be addressed by a syntax that almost looks like they got a "total" suffix.Similarly, we could switch to not attaching _count/_sum/_bucket anymore when ingesting classic histograms or summaries, avoiding the naming collision issue. What was addressed as
request_duration_count
orrequest_duration_sum
would now becomerequest_duration.count
andrequest_duration.sum
.We could even go wild and fall back to retrieving the count and sum from a native histogram if the users specifies
request_duration.count
orrequest_duration.sum, but these metrics don't exist directly, while a native histogram
request_duration.histogram` exists.About the combination of summary and classic histogram: This is something you could already trick into Prometheus by just providing a combination of metrics like
foo{quantile="..."}
,foo_bucket{le="..."}
,foo_count
,foo_sum
. I don't see a reason to forbid it. And I would even say we should support it to not break setups that used that trick.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is amazing, just noticed this thanks
Funny how we ended up with similar idea https://docs.google.com/document/d/10Z1XKeQXxJAc_jKW0qEC8G4krlPTmE89k31FJfPVbro/edit?tab=t.0#heading=h.1zpzuhp77mpk
I will update my doc to this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have a fixed set of allowed strings for the
__type__
, we could probably declare them "reserved" and not usable for units. Then we could use the same special separator characters for both type and unit (http_request_duration.seconds.histogram
, equivalent tohttp_request_durations.histogram.seconds
). (Pitfall: If we extend the list of allowed types later, we might hit strings that have been used as units somewhere.)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea this deserves another quick proposal, I like reserved words for Prometheus, might work for majority for vendors, but ties PromQL to prometheus metric types.. which is perhaps already done with certain functions? Not to that degree 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would see
{"http.server.duration"}~seconds.histogram
as the most "obvious" syntax.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't see this as an argument specifically about this syntax. You could also say that with the original proposal,
foo_seconds
becomesfoo{__unit__="seconds", __type__="histogram"
}.Or in other words: If the original metric has unit and/or type in its name, we wouldn't need to change the name, but could still add the labels. (i.e. what's written below as mitigation), which is true for the original proposal, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The additional suffixes don't have to be used, unless you need to resolve an ambiguity or you miss the old-style suffixes. This solution is mostly meant for the latter. And in the case of ambiguity, it is still simpler than writing the verbose label syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder how safe this is. If PromQL does not hide the return, new labels will suddenly appear in users' queries.
I'm also not sure about the potential impact other than dashboards/alerts having new labels, could that be a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm definitely open to this alternative. I like having the option of showing
__type__
and__unit__
in the UI, but if it has to wait for Prom 4.0 because it is breaking, that seems like a significant downside.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it could be also merge of both, where PromQL only returns those labels based on something e.g. on conflict or when type or unit is selectors.
The downside is that we now have to really deeply design groupings and joins.. should they use those labels or not. With the current proposal the answer is trivial and straightforward. This helps with creating alerts and recording rules etc
I think I would be keen to try the current approach e.g. behind feature flag and see who will use it and how to tell more. cc @juliusv @roidelapluie WDYT from UX point of view for non-UI things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, I like the idea to model OTel's notion of "identifying metadata" as labels in Prometheus (to be distinguished from "non-identifying metadata", which is closer to Prometheus's original metadata idea).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately OTel resource attributes are all considered identifying today (correct me if I'm wrong David).
There is work in progress to replace resource attributes with a new thing called
Entity
. Entity will be able to distinguish, using otel lingo,Identifying
fromDescriptive
attributes, but this will take some years to get popular in the industryThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OTel resource attributes are all considered identifying today, but we don't treat them all as identifying when translating from OTel -> Prometheus. Instead, we just treat service.name + service.instance.id + service.namespace as identifying. That is correct if the OTLP originates from an OTel-instrumented application, but not if it wasn't (e.g. comes from an otel collector receiver).
Anyways, what I had in mind was more things that users don't think of as "labels", like schema url, or maybe scope name + version. I think resource attributes make sense as "real" labels, rather than metadata.