Add rollup guide #27008

Open: wants to merge 6 commits into master
Changes from 1 commit
21 changes: 7 additions & 14 deletions content/en/dashboards/functions/rollup.md
@@ -2,6 +2,10 @@
title: Rollup
aliases:
- /graphing/functions/rollup/
further_reading:
- link: "/dashboards/guide/rollup-cardinality-vizualizations"
tag: "Documentation"
text: "Understanding rollup function and cardinality in visualizations"
---

Every metric query is inherently aggregated. However, appending the `.rollup()` function at the end of a query allows you to perform custom [time aggregation][1] that overrides the defaults. This function enables you to define:
@@ -77,20 +81,9 @@ Rollups should usually be avoided in [monitor][5] queries, because of the possib

If your monitors are unexpectedly evaluating in a "No Data" status, consider reviewing your settings for rollups and evaluation windows. For instance, if a monitor has a 4-minute rollup and a 20-minute evaluation window, it produces one data point every 4 minutes, leading to a maximum of 5 data points within the window. If the "Require Full Window" option is enabled, the evaluation may result in "No Data" because the window is not fully populated.
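As a rough sanity check, the arithmetic in this example can be written as a small helper. This is a hypothetical sketch, not Datadog's monitor evaluation logic: `max_points_in_window` is an illustrative name, and it assumes one rolled-up point per rollup interval with no gaps or boundary effects.

```python
def max_points_in_window(rollup_seconds: int, window_seconds: int) -> int:
    """Upper bound on rolled-up data points that fit in an evaluation window.

    Hypothetical helper: assumes exactly one point per rollup interval and
    ignores how the monitor evaluator aligns interval boundaries.
    """
    if rollup_seconds <= 0:
        raise ValueError("rollup_seconds must be positive")
    return window_seconds // rollup_seconds


# 4-minute rollup in a 20-minute evaluation window (the example in the text).
print(max_points_in_window(4 * 60, 20 * 60))  # 5
# A 1-minute rollup leaves far more points to fill the same window.
print(max_points_in_window(1 * 60, 20 * 60))  # 20
```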

## Other functions

{{< whatsnext desc="Consult the other available functions:" >}}
{{< nextlink href="/dashboards/functions/algorithms" >}}Algorithmic: Implement anomaly or outlier detection on your metric.{{< /nextlink >}}
{{< nextlink href="/dashboards/functions/arithmetic" >}}Arithmetic: Perform arithmetic operations on your metric.{{< /nextlink >}}
{{< nextlink href="/dashboards/functions/count" >}}Count: Count non-zero or non-null values of your metric.{{< /nextlink >}}
{{< nextlink href="/dashboards/functions/exclusion" >}}Exclusion: Exclude certain values of your metric.{{< /nextlink >}}
{{< nextlink href="/dashboards/functions/interpolation" >}}Interpolation: Fill or set default values for your metric.{{< /nextlink >}}
{{< nextlink href="/dashboards/functions/rank" >}}Rank: Select only a subset of metrics.{{< /nextlink >}}
{{< nextlink href="/dashboards/functions/rate" >}}Rate: Calculate custom derivative over your metric.{{< /nextlink >}}
{{< nextlink href="/dashboards/functions/regression" >}}Regression: Apply a machine learning function to your metric.{{< /nextlink >}}
{{< nextlink href="/dashboards/functions/smoothing" >}}Smoothing: Smooth your metric variations.{{< /nextlink >}}
{{< nextlink href="/dashboards/functions/timeshift" >}}Timeshift: Shift your metric data points along the timeline.{{< /nextlink >}}
{{< /whatsnext >}}

## Further reading

{{< partial name="whats-next/whats-next.html" >}}

[1]: /dashboards/functions/#add-a-function
[2]: /metrics/faq/rollup-for-distributions-with-percentiles/
1 change: 1 addition & 0 deletions content/en/dashboards/guide/_index.md
@@ -41,6 +41,7 @@ cascade:

{{< whatsnext desc="Functions:" >}}
{{< nextlink href="/dashboards/guide/how-weighted-works" >}}How does weighted() work?{{< /nextlink >}}
{{< nextlink href="/dashboards/guide/rollup-cardinality-vizualizations" >}}Understanding rollup function and cardinality in visualizations{{< /nextlink >}}
{{< /whatsnext >}}

{{< whatsnext desc="Deprecated APIs:" >}}
@@ -0,0 +1,68 @@
---
title: Understanding Rollup Function and Cardinality in Visualizations
further_reading:
- link: "/dashboards/functions/rollup/"
tag: "Documentation"
text: "Learn more about the Rollup function"
---

## Overview

Visualizations in data analysis often rely on aggregation functions to summarize data over time. One common challenge arises when using the rollup function alongside distinct or unique cardinality measures.

The interaction between the rollup function and cardinality measures can produce unexpected results in visualizations. Understanding this interaction helps you interpret graphs accurately: by aligning your expectations with how rollups compute their results, and by writing queries that ask the question you actually mean, you can draw reliable insights from your data.

This document explains how the rollup function operates, particularly in the context of cardinality, and provides best practices for interpreting visualization results accurately.

## Understanding cardinality in timeseries

**Cardinality**
: The number of tag values associated with a tag key for a metric.

More generally, cardinality is the number of unique elements in a dataset. For timeseries data, this often means counting distinct users, sessions, or events within time frames such as hours or days.

A common misconception with visualizations is expecting the sum of distinct counts over short intervals to match the distinct count over a longer period. This is often not the case, because distinct counts are not additive across time intervals.

### Example: Distinct user counts

Consider a scenario where you track distinct users visiting a website. Each day you observe 100 unique users, so the daily counts add up to 700 over a week. However, the actual number of distinct users over the entire week might be only 400, because many users visit the site on multiple days. The discrepancy arises because each time frame (here, each day) counts its unique users independently: a user who visits on several days is counted once per day in the sum, but only once in a single weekly count.
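The following minimal Python sketch reproduces these numbers with hypothetical data, chosen so that each day sees 100 users and consecutive days overlap by 50 users. The `daily_visitors` helper and the user IDs are illustrative only; the point is that summing per-day distinct counts differs from taking the distinct count of the union.

```python
def daily_visitors(day: int) -> set[int]:
    # Hypothetical: on day d, user IDs 50*d through 50*d + 99 visit (100 users),
    # so each day shares 50 users with the previous day.
    return set(range(50 * day, 50 * day + 100))


days = [daily_visitors(d) for d in range(7)]

# Sum of the seven daily distinct counts: repeat visitors are counted once per day.
sum_of_daily_distinct = sum(len(users) for users in days)

# Distinct count over the whole week: the union counts each user exactly once.
weekly_distinct = len(set().union(*days))

print(sum_of_daily_distinct, weekly_distinct)  # 700 400
```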

## Rollup functionality and unexpected results

When aggregating data with the rollup function, the results can be counterintuitive. For example, the sum of hourly distinct user counts can exceed the number of distinct users over the full day. This is because a user who appears in several hourly buckets is counted once per bucket, but only once across the entire day.
Contributor

Suggested change
When aggregating data using the rollup function, the results can be counterintuitive. For example, the sum of hourly distinct user counts can exceed the count of distinct users over a full day. This is because users appearing in multiple hourly buckets are counted once per bucket, but only once across the entire day.


### Implications for visualizations

By default, visualizations show the sum of rollup values across intervals, which can differ from a single value computed over the entire time frame. For instance, a graph of hourly rollups over one day might add up to 125 users, while a query that rolls up the whole day into one value returns 121 users. The difference comes from users who visited in more than one hour: each is counted once in every hourly bucket they appear in, but only once in the daily rollup.
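To make the 125 versus 121 gap concrete, here is a minimal Python sketch over a hypothetical one-day visit log: 121 distinct users each visit once, and four of them return in a later hour. The data and variable names are illustrative, not taken from a real query; the gap between the two results equals the number of extra hourly buckets that returning users appear in.

```python
from collections import defaultdict

# Hypothetical visit log of (hour_of_day, user_id) pairs for one day:
# 121 distinct users each visit once, and users 0-3 return one hour later.
visits = [(uid % 24, uid) for uid in range(121)]
visits += [(uid % 24 + 1, uid) for uid in range(4)]

# Hourly rollup: distinct users per hourly bucket.
users_by_hour = defaultdict(set)
for hour, uid in visits:
    users_by_hour[hour].add(uid)

sum_of_hourly_counts = sum(len(users) for users in users_by_hour.values())

# Daily rollup: one distinct count over the whole day.
daily_distinct = len({uid for _, uid in visits})

print(sum_of_hourly_counts, daily_distinct)  # 125 121
```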
Contributor

There are a few terms here like "scalar value" and "direct query" that I'm not familiar with, so I'm not 100% I understand this example. I wonder if we could make it sound a little more like the previous example? Maybe something like:

Suggested change
By default, visualizations show the sum of rollup values across intervals, which can lead to discrepancies between the sum and a scalar value representing the entire time frame. For instance, a graph might display a sum of 125 users who visited your website in an hourly rollup, while a direct query shows 121 users over the same period. This is due to sessions or users being counted multiple times across hourly buckets but only once in the daily rollup.

Reading it again, I think I'm confused because both of these numbers are over an hour-long period, so I'm not sure where the daily rollup part comes in 🤔


## Rollups with averages and cardinality

Averages involving cardinality can also present challenges.

For example, the hourly proportion of distinct users who encountered no errors may consistently be high, as high as 99.5%. Computed over a whole week, the same proportion can be noticeably lower, such as 97.5%.

This discrepancy occurs because a user who hits an error in any single hour counts as an affected user for the entire week. Over a longer period, errors accumulate across more distinct users, so the share of users who never encountered an error shrinks.
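A minimal Python sketch of this effect, using hypothetical data in which four hourly buckets stand in for the week: the same 10 users are active every hour, and in each hour a different user hits an error. Averaging the hourly error-free ratios gives a much higher figure than computing the error-free ratio over the whole period.

```python
from statistics import mean

# Hypothetical data: 10 users are active in each of 4 hours, and in hour h
# only user h hits an error (a different user each hour).
ACTIVE_USERS = set(range(10))
errors_by_hour = [{h} for h in range(4)]

# Per-hour share of active users with no error, then averaged across hours.
hourly_ratios = [
    (len(ACTIVE_USERS) - len(errored)) / len(ACTIVE_USERS)
    for errored in errors_by_hour
]
avg_hourly = mean(hourly_ratios)

# Whole-period share: a user is affected if they hit an error in ANY hour.
users_with_any_error = set().union(*errors_by_hour)
period_ratio = (len(ACTIVE_USERS) - len(users_with_any_error)) / len(ACTIVE_USERS)

print(f"{avg_hourly:.0%} {period_ratio:.0%}")  # 90% 60%
```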

### Example calculation

Suppose a site records 18,000 error events that affect 13,000 distinct users in a week. Spread over the 168 hours of the week, that averages to about 107 errors per hour (18,000 / 168), which is at most about 107 affected users out of roughly 20,000 users per hour, an hourly error rate of about 0.5%. Over the whole week, however, every one of the 13,000 users who encountered an error counts as affected, so weekly aggregation reveals a higher error rate.

At the weekly scale, every user who hit an error in any hour is counted as affected, so the share of affected users is higher than the low rate seen in any single hour.
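The hourly side of this example is simple arithmetic, sketched below in Python with the figures from the text. It assumes, as a simplification, that each error event within an hour hits a different user, so the hourly error count is an upper bound on affected users per hour. The weekly rate is not computed here because the example does not state the total number of distinct users over the week.

```python
HOURS_PER_WEEK = 7 * 24          # 168
ERROR_EVENTS_PER_WEEK = 18_000   # from the example in the text
USERS_PER_HOUR = 20_000          # from the example in the text

# Assumption: each error event in a given hour hits a different user, so the
# hourly error count is an upper bound on affected users per hour.
errors_per_hour = ERROR_EVENTS_PER_WEEK / HOURS_PER_WEEK
hourly_error_rate = errors_per_hour / USERS_PER_HOUR

print(
    f"{errors_per_hour:.0f} errors/hour, "
    f"{hourly_error_rate:.2%} error rate, "
    f"{1 - hourly_error_rate:.1%} error-free"
)  # 107 errors/hour, 0.54% error rate, 99.5% error-free
```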

## Solutions and best practices
Member

This section feels pretty vague and not particularly helpful to me, but I'm not sure how I would improve it.

I'm also not the target audience for this, so I'll trust your judgment.

Contributor

My 2¢ is that this section feels more theoretical than you would expect after reading a heading like "solutions and best practices."

I'm wondering if we could remove it altogether, since I think all three of these points are already covered earlier in the topic. Otherwise, maybe we could introduce some more examples here with some specific calculations for readers to follow along with, so the advice feels more actionable. If you do the latter, I might incorporate it into the above sections because I think it could aid the conceptual understanding - I don't think I'd collect it at the end, in case readers don't scroll that far 🙂

This one is tough to get right! It's necessarily way into the weeds because it's a counterintuitive nuance that users very understandably struggle with, and that's super difficult to explain away.

Contributor Author

Thanks! Yea, that makes sense. It seems like a repeat of what was already covered so I'll remove.


### Understanding data rollups

Interpreting rollup results requires a clear understanding of how data is aggregated. To see the full picture, calculate distinct counts manually over the intervals you care about and compare them with the rollup output. This helps explain discrepancies and interpret the data accurately.
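One way to do this manual check is a small bucketing helper like the Python sketch below. `distinct_counts_per_bucket` is a hypothetical name, the event data is illustrative, and aligning buckets to multiples of the interval from the Unix epoch is an assumption that may not match how a given rollup aligns its intervals.

```python
from collections import defaultdict
from collections.abc import Hashable, Iterable


def distinct_counts_per_bucket(
    events: Iterable[tuple[int, Hashable]], bucket_seconds: int
) -> dict[int, int]:
    """Count distinct users in each fixed-width time bucket.

    events: (unix_timestamp_seconds, user_id) pairs.
    bucket_seconds: bucket width, analogous to a rollup interval.
    Returns {bucket_start_timestamp: distinct_user_count}, sorted by time.
    """
    buckets: dict[int, set] = defaultdict(set)
    for ts, user in events:
        start = ts - ts % bucket_seconds  # align to an epoch-based boundary
        buckets[start].add(user)
    return {start: len(users) for start, users in sorted(buckets.items())}


events = [(0, "a"), (1800, "b"), (3600, "a")]
hourly = distinct_counts_per_bucket(events, 3600)   # {0: 2, 3600: 1}
daily = distinct_counts_per_bucket(events, 86400)   # {0: 2}
print(sum(hourly.values()), sum(daily.values()))    # 3 2
```

Comparing the sum of the hourly counts with the single daily count shows the same overcounting effect described earlier: user `"a"` appears in two hourly buckets but only once in the daily bucket.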

### Aligning expectations with reality

Align your expectations with how distinct counts behave in rollups. The sum of distinct counts over individual intervals does not necessarily equal the distinct count over a longer period. When you interpret a visualization, treat each interval's count as independent of the others.

### Clarifying queries

Phrase the question your query answers precisely, so that results are more intuitive. For example, if you want to know how many unique users visited at least once during the week, query the distinct count over the whole week instead of adding up how many users visited each day. Framing the question this way sets accurate expectations for the result.

## Further reading

{{< partial name="whats-next/whats-next.html" >}}