Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Add type and unit metadata labels #39

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Open
Changes from 11 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
179 changes: 179 additions & 0 deletions proposals/2024-09-25_metadata-labels.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,179 @@
## Type and Unit Metadata Labels

* **Owners:**
* David Ashpole [@dashpole](https://github.com/dashpole)

* **Implementation Status:** `Not implemented`

* **Related Issues and PRs:**
* https://github.com/open-telemetry/opentelemetry-specification/issues/2497

* **Other docs or links:**
* Survey Results: https://opentelemetry.io/blog/2024/prometheus-compatibility-survey/
* Slack thread: https://cloud-native.slack.com/archives/C01AUBA4PFE/p1726399373207819
* Doc with Options: https://docs.google.com/document/d/1t4ARkyOoI4lLNdKb0ixbUz7k7Mv_eCiq7sRKHAGZ9vg
* Prometheus PoC: https://github.com/prometheus/prometheus/compare/main...dashpole:prometheus:type_and_unit_labels

This document proposes adding the metric type and unit as labels on metrics.

## Why

Per [dev-summit consensus](https://docs.google.com/document/d/1uurQCi5iVufhYHGlBZ8mJMK_freDFKPG0iYBQqJ9fvA/edit#bookmark=id.q6upqm7itl24), we would like to avoid adding type and unit suffixes to metric names when translating from OpenTelemetry to Prometheus. These suffixes are currently required by the [compatibility specification](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/compatibility/prometheus_and_openmetrics.md). However, OpenTelemetry currently considers the metric type and unit identifying, whereas Prometheus does not. Simply removing suffixes might result in "collisions" between distinct OpenTelemetry metrics which have the same name, but different types (less commonly) or units. Additionally, it is possible in Prometheus to metrics with the same name, but different value types (float64 vs native histogram).
dashpole marked this conversation as resolved.
Show resolved Hide resolved

### Pitfalls of the current solution

Roughly half of OpenTelemetry users surveyed preferred keeping metric names unmodified when translating to Prometheus. With our goal of being the default choice to store OpenTelemetry metrics, we should find a way to preserve metric names without introducing potential issues for users.

## Goals

Goals and use cases for the solution as proposed in [How](#how):

* [Required] OpenTelemetry users can query for the original names of their metrics.
* [Required] Users can filter by the metric type or unit in PromQL queries.
* [Required] PromQL returns a warning when querying across metrics with different units or types, or when using "inappropriate" operations for a metric type.
* [Nice to have] OpenTelemetry users can grep for the original names in the text exposition.

### Audience

This document is for Prometheus server maintainers, PromQL maintainers, and anyone interested in furthering compatibility between Prometheus and OpenTelemetry.

## Non-Goals

* Prometheus will not attempt to auto-convert between units when there is a conflict.
* Prometheus will not attempt to auto-convert between types (e.g. native histogram vs float series).
* Prometheus client libraries will not allow mixing metrics with the same name, but different type or unit in the exposition format. See potential future extensions.

## User Experience

When querying for a metric, users can filter for a type or unit by specifying a filter on the `__unit__` or `__type__` labels, which use the reserved `__` prefix to ensure they do not collide with user-provided labels.

For example:

* `my_metric{}` returns all series with any type or unit, including `__type__` and `__unit__` labels.
dashpole marked this conversation as resolved.
Show resolved Hide resolved
* `my_metric{__unit__="seconds", __type__="counter"}` returns only series with the specified type and unit.
dashpole marked this conversation as resolved.
Show resolved Hide resolved

When a query for a metric returns multiple metrics with a different `__type__` or `__unit__` label, but the same `__name__`, users see a warning in the UI.

Users don't see the `__type__` or `__unit__` labels in the Prometheus UI next to other labels by default.

Users see no difference to exposition formats.
dashpole marked this conversation as resolved.
Show resolved Hide resolved

## How

### PromQL Changes

When a query for a metric returns multiple metrics with a different `__type__` or `__unit__` label, but the same `__name__`, an info annotation will be returned with the PromQL response, which is otherwise unmodified.
dashpole marked this conversation as resolved.
Show resolved Hide resolved

Aggregations and label matches ignore `__unit__` and `__type__` and any operation removes the `__unit__` and `__type__` label (with the exception of `label_replace`).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify the intention here: are you suggesting that __unit__ and __type__ would be treated the same as __name__ is? (assuming delayed name removal is enabled)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say the treatment should be exactly as __name__ (to keep things simple and consistent), so early removal by default and delayed removal if delayed removal is enabled. (Which is just "for now" as I think, if the experiment is successful, we should consider a "smart" change of the type label (and maybe even for the unit label, where it makes sense).


### Prometheus UI Changes

When displaying a metric's labels in the table or in the graph views, the UI will hide labels starting with `__` (double underscore) by default, similar to the current handling of `__name__`. A "Show System Labels" check-box will be added, which shows hidden labels when checked.

### Prometheus Server Ingestion

When receiving OTLP or PRW 2.0, or when scraping the text, OM, or proto formats, the type and unit of the metric are added as the `__type__` and `__unit__` labels, if not already present.
ArthurSens marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For exposition formats, I guess it means this would allow:

TYPE metric counter
UNIT metric seconds
HELP ...
metric{__type__="gauge"}

Ideally I do opposite to what's written, so TYPE and UNIT wins here, or this could be blocked even by spec. Essentially I wonder about efficiency of parsers here.

For ingestion protocols, at least it feels metadata should override it (not what is written now) so it's easier for everyone.

Any further concerns around perf of parsing here for this @bboreham?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about what would be on the /federate endpoint if the Prometheus server scraped conflicting metrics from applications. Our choices are:

  1. [current] Only #UNIT metadata, and no __unit__ labels.
  2. Only __unit__ labels, and no #UNIT metadata.
  3. [proposal] Mixed __unit__ labels, and a #UNIT with one of the units the metric has.
  • 1 will result in collisions, but maybe that is acceptable for the /federate endpoint. It isn't a regression from what we have today.
  • 2 works, but would be breaking for existing users that are consuming #UNIT.
  • 3 is confusing because of the mixing, but I wanted to leave that open as an option.

I'm OK with disallowing mixing __unit__ and #UNIT, since we could relax that in the future if needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed offline, and:

  1. would epic to follow 1 and allow multiple the same named metric for different types and units in OM, let's try proposing that, it would help massively with exporters and federate
  2. type and unit from metadata is primary (in case of mix)
  3. Should we allow type and unit at all in exposition, SHOULD NOT maybe?

Copy link
Member

@ArthurSens ArthurSens Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too fond of adding __unit__ and __type__ as labels in the exposition format. I think they should stay as a single line at the top of the metric family like it is today.

The Unit/Type will be the same for all time series in a metric family. If they become labels in the exposition format, it will just be repeated information for all time series. It increases the payload, and we waste CPU time parsing it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have "Users see no difference to exposition formats." stated above, but if you put the type and especially the unit in the # UNIT part then you need them in the metric name as well, defeating the purpose. You're other option is to put them into the series on each line.

Looking at /metrics is for humans and follows what you see is what you get in terms of showing what you can query - implying that if you can query by __type__ or the API/UI returns the __type__ then you should put them on each line.

And the implication of that is that there needs to be at least one more efficient exposition format that is not so verbose for machines to read.

So I'm not sure this would work without either putting the new labels on every line or modifying OpenMetrics.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course the above argument doesn't stand if we follow @beorn7 's advice and not show __type__, __unit__ on the query API (/query , /query_range) , although I would love to have it on the metadata queries endpoint like (/series) for Grafana to be able to know what's a float series and what's a native histogram.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should show __type__ and __unit__ in the query APIs. I'm just saying, in the zeroth step, we should remove those labels in PromQL operations (like the name is removed).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with showing __type__ and __unit__ in query APIs, just like we show __name__.

We have "Users see no difference to exposition formats." stated above, but if you put the type and especially the unit in the # UNIT part then you need them in the metric name as well, defeating the purpose. You're other option is to put them into the series on each line.

That's a really good catch! This is something we'll probably need to tackle with OM 2.0. Is this what you had in mind with this issue @bwplotka prometheus/OpenMetrics#280?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "Users see no difference to exposition formats", I meant that this proposal does not change how metrics are currently exposed in exposition formats. I'll leave that for follow-up discussions in prometheus/OpenMetrics#280 and elsewhere. I updated the language in 0a545c8.

I am open to ways of querying for type and unit that align better with the existing #TYPE and #UNIT comments, if we want to preserve the "what you see is what you get" experience of Prometheus. We do support querying for __name__, so there is at least some precedence.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thought about differentiating native histograms and classic histograms:

As Prometheus doesn't really have a notion of "classic histogram time series", but instead breaks a classic histogram (and a summary) into a number of series, I think these series should have their own distinct types (rather than labeling them all as "histogram" or "summary), which would also remove the dependency on magic suffixes.

More concretely, I propose the following label values for the __type__ label:

label value metric type original OpenMetrics type notes
counter counter Counter
total counter Counter alias for counter
gauge gauge Gauge
unknown unknown Unknown equivalent to a missing/empty __type__ label
info info Info would be great if we could somehow mark the data labels and the identifying labels, story for another day…
sum sum component of a classic histogram or a summary Histogram or Summary this would support a combination of summary and (classic) histogram
count count component of a classic histogram or a summary Histogram or Summary this would support a combination of summary and (classic) histogram
gsum sum component of a classic gauge histogram GaugeHistogram
gcount count component of a classic gauge histogram GaugeHistogram
quantile quantile of a summary Summary must have a quantile label
bucket bucket of a classic histogram Histogram must have an le label
gbucket bucket of a classic gauge histogram GaugeHistogram must have an le label
histogram native counter histogram Histogram for any schema (i.e. includes NHCB)
ghistogram native gauge histogram GaugeHistogram for any schema (i.e. includes NHCB)

I'm currently unsure how Prometheus handles OM's stateset type and left it out.

About the total alias: The idea is to treat total and counter completely equivalent. If we went for the . idea sketched out below, a legacy metric like http_requests_total could now be written as http_requests.total, and even OTel counters could be addressed by a syntax that almost looks like they got a "total" suffix.

Similarly, we could switch to not attaching _count/_sum/_bucket anymore when ingesting classic histograms or summaries, avoiding the naming collision issue. What was addressed as request_duration_count or request_duration_sum would now become request_duration.count and request_duration.sum.

We could even go wild and fall back to retrieving the count and sum from a native histogram if the users specifies request_duration.count or request_duration.sum, but these metrics don't exist directly, while a native histogram request_duration.histogram` exists.

About the combination of summary and classic histogram: This is something you could already trick into Prometheus by just providing a combination of metrics like foo{quantile="..."}, foo_bucket{le="..."}, foo_count, foo_sum. I don't see a reason to forbid it. And I would even say we should support it to not break setups that used that trick.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As much as I like these, the one confusing thing would be that we already have "type" as metric family type with less choices. Mentioned here , also in exposition formats (e.g. Prometheus, OM1.0).

I'm assuming @dashpole just meant to have the OpenMetrics types in the __type__ label?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, we could switch to not attaching _count/_sum/_bucket anymore when ingesting classic histograms or summaries, avoiding the naming collision issue. What was addressed as request_duration_count or request_duration_sum would now become request_duration.count and request_duration.sum.

For existing use cases, it would be a breaking change. So I think we would need to provide a new API in the instrumentation library to make it explicit that you don't want suffixes (or migrate to NHCB conversion which already solves the problem). But some will stay on classic histograms, which means you have to keep supporting classic histograms for a long time. So this doesn't seem viable and the new labels solve some collisions anyway.

We could even go wild and fall back to retrieving the count and sum from a native histogram if the users specifies request_duration.count or request_duration.sum, but these metrics don't exist directly, while a native histogram request_duration.histogram` exists.

I think the "fall back" is not worth the effort. Either the user wants to stay on classic histograms, in which case it already works, no changes necessary to queries/dashboards. Or they want to switch to or introduce NHCB/OTEL/etc where the new syntax works. Also I'd hate to see PromQL try to "guess" what I meant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we had "native summaries" and then got completely rid of classic histograms and "classic summaries", we could do a perfect mapping, but the existence of the "flattened" classic histograms and "classic summaries" is a deviation from the data model of the exposition formats. Using magic suffixes and label names to mark them was flawed from the beginning. Solving this via appropriate type labels would finally give us a solid solution.

I don't think we need to stick 1:1 to OM v1 types (at least one of which isn't even supported by Prometheus AFAIK), and even less so to a non-formal description of metric types in the docs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For existing use cases, it would be a breaking change. So I think we would need to provide a new API in the instrumentation library to make it explicit that you don't want suffixes

This is about ingestion, not about exposition. Instrumentation libraries can stay as they are. Ingestors can decide if they want old world or new world.

I think the "fall back" is not worth the effort.

It's not so much a fallback. It's a way to avoid the function call (histogram_sum, histogram_count), which has implication because of PromQL syntax (e.g. cannot use a range selector on the outcome of a function). We could use different operators for it, but since the outcome is exactly the same, why would we?

Copy link
Contributor Author

@dashpole dashpole Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming @dashpole just meant to have the OpenMetrics types in the type label?

Yes, that was my intent. I could see an argument for extending this to preserve OTLP types (e.g. updowncounter), but that is a different discussion.

My primary concern with using those "distinct types" is that people being able to use them in queries is an important use-case, and I think requires users to learn the new concept.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can hammer out the precise strings for the type label later, but I would like to point out that the exposition formats do not make a difference between classic and native histograms, while we really want to mark them differently in Prometheus. We already ran into problem with tooling like query editors that cannot tell if a histogram is native or classic before seeing the actual time series.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll add addressing collisions between native and classic histograms as an explicit requirement/goal of the proposal.


PRW 1.0 is omitted because metadata is sent separately from timeseries, making it infeasible to add the labels at ingestion time.

Users can modify the type and unit of a metric at ingestion time by using `metric_relabel_configs`, and relabeling the `__type__` and `__unit__` labels.

### Implementation Plan

#### Milestone 1: Feature flag for adding labels

Add a feature flag: `--enable-feature=identifying-type-and-unit`. When enabled, `__type__` and `__unit__` labels are added when receiving OTLP or PRW 2.0, or when scraping the text, OM, or proto formats. Implement any changes required to allow relabeling `__type__` and `__unit__` in `metric_relabel_configs`.

#### Milestone 2: UI and PromQL changes

During this stage, implement the UI and PromQL changes above. Iterate based on feedback. Changes should have no effect unless the identifying-type-and-unit feature flag is enabled.

#### Milestone 3: Add NoNameChanges option for OTLP translation

Add an option, `NoNameChanges` for the OTLP translation strategy. When enabled, it disables UTF-8 sanitization and the addition of suffixes. In the documentation for this option, recommend that `--enable-feature=identifying-type-and-unit` is enabled.

## Alternatives

### “Real” Type and Unit suffixes in **name**
dashpole marked this conversation as resolved.
Show resolved Hide resolved
dashpole marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is amazing, just noticed this thanks

Funny how we ended up with similar idea https://docs.google.com/document/d/10Z1XKeQXxJAc_jKW0qEC8G4krlPTmE89k31FJfPVbro/edit?tab=t.0#heading=h.1zpzuhp77mpk

I will update my doc to this

bwplotka marked this conversation as resolved.
Show resolved Hide resolved

Similar to suffixes of _<unit> and _<type/total> but make it an explicit suffix using a delimiter not currently permitted in metric names. Specifying suffixes is optional when querying for a metric. When the type or unit suffix is omitted from a query, it would (design TBD) return results which include any type or unit suffix which exists for that name.

NOTE: Dot for units and slash for type is just one example, there might be better operators/characters to use.
dashpole marked this conversation as resolved.
Show resolved Hide resolved

Writing queries that include the type and unit would be recommended as a best-practice by the community.

For example:

* Querying for foo/histogram would return results that include both foo.seconds/histogram and foo.milliseconds/histogram.
* Querying for foo.seconds would return results that include both foo.seconds/histogram and foo.seconds/counter.
* Querying for http_server_duration would return results that include both foo.seconds/histogram and foo.milliseconds/counter.
* Querying for an OpenTelemetry metric, such as http.server.duration, with suffixes would require querying for ”http.server.duration”.seconds/histogram. Note that suffixes are outside of quotes.

This solution is not chosen because:

* Requires PromQL changes (intrusive), touches on “dot” operator ideas.
dashpole marked this conversation as resolved.
Show resolved Hide resolved
* Adding suffixes outside of quotes looks strange: {“http.server.duration”.seconds/histogram}
dashpole marked this conversation as resolved.
Show resolved Hide resolved
* Rolling this out would be breaking for existing Prometheus users: E.g. {foo_seconds} becomes {foo.seconds/histogram}. Could this be part of OM 2.0?
dashpole marked this conversation as resolved.
Show resolved Hide resolved
* Mitigation: users just stay with {foo_seconds.seconds/histogram}
* Users might be surprised by, or dislike the additional suffixes and delimiters in the metric name results
* Mitigation: Opt-in for query engines?
Comment on lines +121 to +122
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The additional suffixes don't have to be used, unless you need to resolve an ambiguity or you miss the old-style suffixes. This solution is mostly meant for the latter. And in the case of ambiguity, it is still simpler than writing the verbose label syntax.


### Type and Unit in complex value types

Unlike other metric types, native histograms are a complex type and contain many fields in a single value. The metric type and unit could be added as fields of a complex value type in a similar way.

This solution is not chosen because:

* Requires intrusive changes to all formats (text, proto, etc.).
* Requires new PromQL syntax for querying the type and unit.

### "Hide" __type__ and __unit__ labels in PromQL, instead of UI
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how safe this is. If PromQL does not hide the return, new labels will suddenly appear in users' queries.

I'm also not sure about the potential impact other than dashboards/alerts having new labels, could that be a problem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm definitely open to this alternative. I like having the option of showing __type__ and __unit__ in the UI, but if it has to wait for Prom 4.0 because it is breaking, that seems like a significant downside.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it could be also merge of both, where PromQL only returns those labels based on something e.g. on conflict or when type or unit is selectors.

The downside is that we now have to really deeply design groupings and joins.. should they use those labels or not. With the current proposal the answer is trivial and straightforward. This helps with creating alerts and recording rules etc

I think I would be keen to try the current approach e.g. behind feature flag and see who will use it and how to tell more. cc @juliusv @roidelapluie WDYT from UX point of view for non-UI things?


Existing UIs don't handle the `__type__` and `__unit__` labels. To mitigate this, PromQL could omit the `__type__` and `__unit__` labels from the query response. Doing this would avoid requiring UIs to update to handle the new labels.

This solution is not chosen because:

* It deviates from the current handling of the `__name__` label.
* We expect metadata, like type and unit to be useful to display in the UI, and want to enable these use-cases.
* It should be a small amount of effort to hide these labels.

### Omit __unit__ label from counting series for summaries and histograms

The unit of a `_count` or `_bucket` series of a histogram is always `1`, since these are counts. Applying a `__unit__` label of `seconds`, for example, is confusing because the series themselves measure a counts of observations. Alternatively, we could only apply `__unit__` labels to the `_sum` series of histograms and summaries, and the quantile series of summaries.

However, this would re-introduce part of the problem we are trying to solve. Histogram buckets' `le` label's value is in the units of the metric. If the unit is not included in the labels of the histogram bucket series, series with `le=1000` in seconds, for example, would collide with series with `le=1000` in milliseconds. We could include the unit in the `le` label (e.g. `le=1000s`), but that would be much more disruptive to users without much additional benefit.

The `_count` series of histograms and summaries could omit the `__unit__` label without this consequence, since the count does not have any relation to the unit. This proposal includes the `__unit__` label for consistency so that users can always query for metrics that have a specific unit.

## Potential Future Extensions

### Handle __type__ and __unit__ in PromQL operations

Initially, aggregations and label matches will ignore `__unit__` and `__type__` and all PromQL operations remove the `__unit__` and `__type__` label (with the exception of `label_replace`). Over time, we can update each function to keep these labels by implementing the appropriate logic. For example, adding two gauges together should yeild a gauge.

### __type__ and __unit__ from client libraries

One obvious extension of this proposal would be for Prometheus clients to start sending `__type__` and `__unit__` labels with the exposition format. This would:

* Allow mixing metrics with the same name, but different types and units in the same endpoint (an explicit non-goal of this proposal).

This is excluded from this proposal because:

* OpenTelemetry users can use [Views](https://opentelemetry.io/docs/specs/otel/metrics/sdk/#view) to resolve collisions. That should be "good enough".
* Doing this would require some changes to client libraries, which is significantly more work, and is harder to experiment with.
* There can be conflicts between `# TYPE` and `# UNIT` metadata for a metric, and `__type__` and `__unit__` labels. This adds complexity to understanding the exposition format, and requires establishing rules for dealing with conflicts.
ArthurSens marked this conversation as resolved.
Show resolved Hide resolved

### More metadata labels
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I like the idea to model OTel's notion of "identifying metadata" as labels in Prometheus (to be distinguished from "non-identifying metadata", which is closer to Prometheus's original metadata idea).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately OTel resource attributes are all considered identifying today (correct me if I'm wrong David).

There is work in progress to replace resource attributes with a new thing called Entity. Entity will be able to distinguish, using otel lingo, Identifying from Descriptive attributes, but this will take some years to get popular in the industry

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTel resource attributes are all considered identifying today, but we don't treat them all as identifying when translating from OTel -> Prometheus. Instead, we just treat service.name + service.instance.id + service.namespace as identifying. That is correct if the OTLP originates from an OTel-instrumented application, but not if it wasn't (e.g. comes from an otel collector receiver).

Anyways, what I had in mind was more things that users don't think of as "labels", like schema url, or maybe scope name + version. I think resource attributes make sense as "real" labels, rather than metadata.


OpenTelemetry has lots of other metadata that may be a good fit for this "metadata label" pattern. For example, OpenTelemetry's scope name and version, or the schema URL are technically identifying for a time series, but are unlikely to be something that we want to display prominently in the UI.

If the pattern of adding `__type__` and `__unit__` works well for this metadata, we could consider making the pattern more generic:

* UIs should hide _all_ labels starting with `__` by default, not just `__name__`, `__type__`, and `__unit__`.
* Introduce a mechanism to allow OTel libraries to provide additional metadata labels. However, this has the potential to introduce collisions, since `__` has been reserved thus far. Maybe a specific allowlist (e.g. `__otel`-prefixed labels) could work.

## Action Plan

* [ ] Feature flag for adding labels
* [ ] UI and PromQL changes
* [ ] Add NoNameChanges option for OTLP translation
Loading