DataDog · estherk15 · Jan 7, 2025 · Jan 8, 2025 · Jan 22, 2025 · Jan 22, 2025
@@ -2,6 +2,10 @@
 title: Rollup
 aliases:
     - /graphing/functions/rollup/
+further_reading:
+- link: "/dashboards/guide/rollup-cardinality-vizualizations"
+  tag: "Documentation"
+  text: "Understanding rollup function and cardinality in visualizations"
 ---
 
 Every metric query is inherently aggregated. However, appending the `.rollup()` function at the end of a query allows you to perform custom [time aggregation][1] that overrides the defaults. This function enables you to define:
@@ -77,20 +81,9 @@ Rollups should usually be avoided in [monitor][5] queries, because of the possib
 
 If your monitors are unexpectedly evaluating in a "No Data" status, consider reviewing your settings for rollups and evaluation windows. For instance, if a monitor has a 4-minute rollup and a 20-minute evaluation window, it produces one data point every 4 minutes, leading to a maximum of 5 data points within the window. If the "Require Full Window" option is enabled, the evaluation may result in "No Data" because the window is not fully populated.
 
-## Other functions
-
-{{< whatsnext desc="Consult the other available functions:" >}}
-    {{< nextlink href="/dashboards/functions/algorithms" >}}Algorithmic: Implement Anomaly or Outlier detection on your metric.{{< /nextlink >}}
-    {{< nextlink href="/dashboards/functions/arithmetic" >}}Arithmetic: Perform Arithmetic operation on your metric.  {{< /nextlink >}}
-    {{< nextlink href="/dashboards/functions/count" >}}Count: Count non zero or non null value of your metric. {{< /nextlink >}}
-    {{< nextlink href="/dashboards/functions/exclusion" >}}Exclusion: Exclude certain values of your metric.{{< /nextlink >}}
-    {{< nextlink href="/dashboards/functions/interpolation" >}}Interpolation: Fill or set default values for your metric.{{< /nextlink >}}
-    {{< nextlink href="/dashboards/functions/rank" >}}Rank: Select only a subset of metrics. {{< /nextlink >}}
-    {{< nextlink href="/dashboards/functions/rate" >}}Rate: Calculate custom derivative over your metric.{{< /nextlink >}}
-    {{< nextlink href="/dashboards/functions/regression" >}}Regression: Apply some machine learning function to your metric.{{< /nextlink >}}
-    {{< nextlink href="/dashboards/functions/smoothing" >}}Smoothing: Smooth your metric variations.{{< /nextlink >}}
-    {{< nextlink href="/dashboards/functions/timeshift" >}}Timeshift: Shift your metric data point along the timeline. {{< /nextlink >}}
-{{< /whatsnext >}}
+ ## Further reading
+
+ {{< partial name="whats-next/whats-next.html" >}}
 
 [1]: /dashboards/functions/#add-a-function
 [2]: /metrics/faq/rollup-for-distributions-with-percentiles/

@@ -41,6 +41,7 @@ cascade:
 
 {{< whatsnext desc="Functions:" >}}
     {{< nextlink href="/dashboards/guide/how-weighted-works" >}}How does weighted() work?{{< /nextlink >}}
+    {{< nextlink href="/dashboards/guide/rollup-cardinality-vizualizations" >}}Understanding rollup function and cardinality in visualizations{{< /nextlink >}}
 {{< /whatsnext >}}
 
 {{< whatsnext desc="Deprecated APIs:" >}}

@@ -0,0 +1,68 @@
+---
+title: Understanding Rollup Function and Cardinality in Visualizations
+further_reading:
+- link: "/dashboards/functions/rollup/"
+  tag: "Documentation"
+  text: "Learn more about the Rollup function"
+---
+
+## Overview
+
+Visualizations in data analysis often rely on aggregation functions to summarize data over time. One common challenge arises when using the rollup function alongside distinct or unique cardinality measures. 
+
+The interaction between rollup functions and cardinality measures can lead to unexpected results when visualizing data. To interpret visualizations accurately, you need to understand these nuances. By aligning expectations with the nature of rollup results and writing clear queries, you can gain valuable insights from your data.
+
+This document explains how the rollup function operates, particularly in the context of cardinality, and provides best practices on how to interpret visualization results accurately.
+
+## Understanding cardinality in timeseries
+
+**Cardinality**   
+: The number of tag values associated with a tag key for a metric.
+
+Cardinality refers to counting unique elements within a dataset. When applied to timeseries data, this often involves counting distinct users, sessions, or events within time frames, such as hours or days. 
+
+A common misconception with visualizations occurs when the sum of distinct counts over short intervals is expected to match the distinct count over a longer period. This is often not the case due to the nature of cardinality.
+
+### Example: Distinct user counts
+
+Consider a scenario where you track distinct users visiting a website. Each day, you observe 100 unique users, totaling 700 across a week. However, the actual number of distinct users over the entire week might be 400, as many users visit the site on multiple days. This discrepancy arises because each time frame (such as each day) independently counts unique users, which inflates the sum when compared to a single, longer rollup time frame.
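The arithmetic in this example can be reproduced with a minimal Python sketch. The visit log below is hypothetical (the user IDs and the overlap pattern are assumptions chosen so that the totals match the 100, 700, and 400 figures above), and it uses plain sets rather than any Datadog query.

```python
# Hypothetical visit log: day index -> set of user IDs seen that day.
# Each day has 100 distinct users, and consecutive days share 50 of them.
visits_by_day = {
    day: {f"user-{(day * 50 + i) % 400}" for i in range(100)}
    for day in range(7)
}

# Summing the daily distinct counts counts returning users once per day.
sum_of_daily = sum(len(users) for users in visits_by_day.values())

# The weekly distinct count takes the union, so each user counts once.
weekly_distinct = len(set().union(*visits_by_day.values()))

print(sum_of_daily, weekly_distinct)  # 700 400
```

The gap between the two numbers is exactly the number of repeat appearances: every user who shows up on more than one day adds to the daily sum but not to the weekly union.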
+
+## Rollup functionality and unexpected results
+
+When aggregating data using the rollup function, the results can be counterintuitive. For example, the sum of hourly distinct user counts can exceed the count of distinct users over a full day. This is because users appearing in multiple hourly buckets are counted once per bucket, but only once across the entire day.
+
+### Implications for visualizations
+
+By default, visualizations show the sum of rollup values across intervals, which can lead to discrepancies between the sum and a scalar value representing the entire time frame. For instance, a graph might display a sum of 125 users who visited your website in an hourly rollup, while a direct query shows 121 users over the same period. This is due to sessions or users being counted multiple times across hourly buckets but only once in the daily rollup.
+
+## Rollups with averages and cardinality
+
+Averages involving cardinality can also present challenges. 
+
+For example, the hourly average for the proportion of distinct users without errors may consistently appear high, such as 99.5%. Yet the same proportion calculated over a week can be lower, such as 97.5%.
+
+This discrepancy arises because the weekly calculation counts each distinct user once across the whole week. Over that longer period, a user has more opportunities to encounter at least one error, so a larger share of the week's distinct users is counted as affected than in any single hour.
+
+### Example calculation
+
+Suppose a site records 18,000 error events affecting 13,000 distinct users in a week. Averaged over the 168 hours in a week, that is about 107 error events per hour (18,000 / 168), so roughly 107 error-affected users out of 20,000 total users in a typical hour, or an hourly error rate of about 0.5%. Over the full week, however, the rate is calculated from the 13,000 distinct affected users against the week's distinct users, which reveals a higher error rate because more users encounter at least one error across the entire week.
+
+When errors are aggregated at a weekly scale, the share of users affected by errors appears higher because more users experience at least one error over the extended duration, in contrast to the lower average seen hourly.
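The calculation above can be written out as a short Python sketch. The text does not state how many distinct users visit during the week, so `weekly_distinct_users` below is an assumption, picked only so that the weekly figure lands on the 97.5% mentioned earlier; the other inputs come from the example.

```python
HOURS_PER_WEEK = 7 * 24  # 168

# Figures taken from the example above.
weekly_error_events = 18_000
weekly_affected_users = 13_000
hourly_total_users = 20_000

# ASSUMPTION: not given in the text; chosen for illustration only.
weekly_distinct_users = 520_000

# Hourly view: average error events per hour against hourly users.
avg_errors_per_hour = weekly_error_events / HOURS_PER_WEEK
hourly_error_free = 1 - avg_errors_per_hour / hourly_total_users

# Weekly view: distinct affected users against distinct weekly users.
weekly_error_free = 1 - weekly_affected_users / weekly_distinct_users

print(f"errors per hour: {avg_errors_per_hour:.0f}")   # 107
print(f"hourly error-free: {hourly_error_free:.1%}")   # 99.5%
print(f"weekly error-free: {weekly_error_free:.1%}")   # 97.5%
```

The two percentages use different denominators and different deduplication windows, which is why neither can be derived by averaging the other.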
+
+## Solutions and best practices
+
+### Understanding data rollups
+
+Interpreting rollup results requires a clear understanding of how data is aggregated. To grasp the full picture, manually calculate distinct counts over desired intervals and compare them to rollup outputs. This approach clarifies discrepancies and aids in accurate data interpretation.
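One way to do this manual check is sketched below in plain Python. The function name `distinct_counts`, the event format (epoch seconds paired with a user ID), and the sample events are all hypothetical; the sketch illustrates the comparison and is not how Datadog computes rollups internally.

```python
from collections import defaultdict


def distinct_counts(events, bucket_seconds):
    """Count distinct users per time bucket.

    events: iterable of (epoch_seconds, user_id) pairs.
    Returns {bucket_start: distinct_user_count}, ordered by bucket start.
    """
    buckets = defaultdict(set)
    for ts, user in events:
        buckets[ts - ts % bucket_seconds].add(user)
    return {start: len(users) for start, users in sorted(buckets.items())}


# Hypothetical events spanning three hourly buckets.
events = [
    (0, "alice"), (1_800, "bob"),       # hour 0
    (3_600, "alice"),                   # hour 1
    (7_200, "carol"), (7_300, "alice")  # hour 2
]

hourly = distinct_counts(events, 3_600)
sum_of_hourly = sum(hourly.values())
overall_distinct = len({user for _, user in events})

print(hourly)                           # {0: 2, 3600: 1, 7200: 2}
print(sum_of_hourly, overall_distinct)  # 5 3
```

Here `alice` appears in all three hourly buckets, so she contributes three to the hourly sum but only one to the overall distinct count.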
+
+### Aligning expectations with reality
+
+Align expectations with how distinct counts function in rollups. Recognize that the sum of distinct counts over individual intervals does not necessarily equal the distinct count over a longer aggregation period. Instead, treat each interval's count as distinct only within that interval, and apply this understanding to interpret visualizations correctly.
+
+### Clarifying queries
+
+Rephrase queries to highlight the data's behavior to make results more intuitive. For example, instead of asking how many users visited each day, ask how many unique users visited at least once during the entire week. This perspective helps manage expectations and aligns them with the data's inherent characteristics.
+
+## Further reading
+
+{{< partial name="whats-next/whats-next.html" >}}