
Add rollup guide #27008

Open · wants to merge 6 commits into master
Conversation

@estherk15 (Contributor) commented Jan 7, 2025

What does this PR do? What is the motivation?

  • Add a new guide to understand rollup functions and cardinality
  • DOCS-9413

Merge instructions

Merge readiness:

  • Ready for merge

Merge queue is enabled in this repo. To have it automatically merged after it receives the required reviews, create the PR (from a branch that follows the <yourname>/description naming convention) and then add the following PR comment:

/merge

Additional notes

@estherk15 estherk15 added the WORK IN PROGRESS No review needed, it's a wip ;) label Jan 7, 2025
@estherk15 estherk15 requested a review from a team as a code owner January 7, 2025 23:00
@github-actions github-actions bot added the Guide Content impacting a guide label Jan 7, 2025
@estherk15 estherk15 removed the WORK IN PROGRESS No review needed, it's a wip ;) label Jan 8, 2025
@estherk15 estherk15 requested a review from edanaher January 8, 2025 14:18
@edanaher (Member) left a comment:

Thank you for writing this up! It keeps the spirit of my original document, but feels much more polished, professional, and organized.

I do have a couple suggestions; if you disagree, I'd be happy to have a conversation either here in the PR or on a short call if you think that would be more productive.


When aggregating errors at a weekly scale, the total count of errors appears higher as more users experience errors over the extended duration, contrasting with the lower average seen hourly.

## Solutions and best practices
Member commented:
This section feels pretty vague and not particularly helpful to me, but I'm not sure how I would improve it.

I'm also not the target audience for this, so I'll trust your judgment.

Contributor commented:

My 2¢ is that this section feels more theoretical than you would expect after reading a heading like "solutions and best practices."

I'm wondering if we could remove it altogether, since I think all three of these points are already covered earlier in the topic. Otherwise, maybe we could introduce some more examples here with some specific calculations for readers to follow along with, so the advice feels more actionable. If you do the latter, I might incorporate it into the above sections because I think it could aid the conceptual understanding - I don't think I'd collect it at the end, in case readers don't scroll that far 🙂

This one is tough to get right! It's necessarily way into the weeds because it's a counterintuitive nuance that users very understandably struggle with, and that's super difficult to explain away.

@estherk15 (author) replied:
Thanks! Yea, that makes sense. It seems like a repeat of what was already covered so I'll remove.

@cswatt (Contributor) commented Jan 8, 2025

Added DOCS-9842 to track review.

@cswatt cswatt added the editorial review Waiting on a more in-depth review label Jan 8, 2025

### Example calculation

Suppose, over the course of a week, 2,000 users on a site experiences a total of 6,000 error events, while the remaining 22,000 users don't experience any errors. Since a user's multiple errors may occur nearly simultaneously or in different hours, there could be an average of as many as 35 users experiencing errors per hour or as few as 11.
Contributor commented:
Suggested change:
- Suppose, over the course of a week, 2,000 users on a site experiences a total of 6,000 error events, while the remaining 22,000 users don't experience any errors. Since a user's multiple errors may occur nearly simultaneously or in different hours, there could be an average of as many as 35 users experiencing errors per hour or as few as 11.
+ Suppose, over the course of a week, 2,000 users on a site experience a total of 6,000 error events, while the remaining 22,000 users don't experience any errors. Since a user's multiple errors may occur nearly simultaneously or in different hours, there could be an average of as many as 35 users experiencing errors per hour or as few as 11.

Contributor commented:
I found this a bit hard to follow - I noticed that the heading says "calculation," but it's hard to follow where the numbers are coming from (35 or 11?) because I don't actually know how these values are being calculated. It's likely obvious for someone who has the intuitive math skills that I lack, but I worry that the cognitive load for this section is pretty high, and that makes it harder to see what the takeaway should be.

I wonder if it would be easier to follow if we introduced the takeaway first, then supported it with numbers, so the focus doesn't get bogged down in the details?
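For what it's worth, the two bounds quoted in the excerpt can be reproduced from its stated totals. A minimal sketch, assuming a 168-hour week (the excerpt doesn't state this explicitly, so treat it as an inference; all figures are the excerpt's hypothetical ones):

```python
# Reproducing the "as many as 35 ... as few as 11" bounds
# from the excerpt's totals (hypothetical data, 168-hour week assumed).
errors_per_week = 6_000
users_with_errors = 2_000
hours_per_week = 7 * 24  # 168

# Upper bound: if every error in a given hour comes from a different user,
# up to errors-per-hour distinct users can appear in each hourly bucket.
max_users_per_hour = errors_per_week // hours_per_week   # 35

# Lower bound: if each affected user's errors all land in a single hour,
# only users-per-hour distinct users appear per hourly bucket on average.
min_users_per_hour = users_with_errors // hours_per_week  # 11
```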



This disparity arises from multiple user visits being counted over a week, leading to a higher likelihood of users encountering errors over that period. See the following illustrative example for more context on this disparity.

### Error rate variation and user interactions case study
@estherk15 (author) commented:
@janine-c I tried to simplify this section, but essentially it's just a more in-depth example that builds on the one above. Does it make the issue more confusing, and would it be better left out of the document?

Contributor replied:
I don't think you need two separate sections! I like that you have a more in-depth example that illustrates a bunch of pitfalls at once. I think you could probably replace the simpler one with the chonkier one :)

@estherk15 estherk15 requested a review from janine-c January 22, 2025 20:19
@janine-c (Contributor) left a comment:
Hey Esther, this looks great! It feels a lot simpler and less theoretical. I have a few remaining questions because this is so far outside of my domain, but feel free to ignore them if you think that someone who's already in this section of the docs will already understand what's happening there.

Your topic feels overall a lot more practical and easy to follow compared to the last draft I looked at - nicely done! I think users will come away from it feeling a lot more empowered to understand their data 👏🏻


## Rollup functionality and unexpected results

When aggregating data using the rollup function, the results can be counterintuitive. For example, the sum of hourly distinct user counts can exceed the count of distinct users over a full day. This is because users appearing in multiple hourly buckets are counted once per bucket but only once across the entire day.
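As an aside, the claim in this excerpt is easy to demonstrate with a toy example (the user names and buckets below are hypothetical, not from the guide):

```python
# Toy illustration: the sum of per-hour distinct user counts exceeds the
# distinct count over the whole day, because "alice", "bob", and "carol"
# each appear in more than one hourly bucket.
hourly_buckets = {
    "09:00": {"alice", "bob"},
    "10:00": {"bob", "carol"},
    "11:00": {"alice", "carol", "dave"},
}

# Counted once per bucket: 2 + 2 + 3 = 7
sum_of_hourly_counts = sum(len(users) for users in hourly_buckets.values())

# Counted once across the day: {alice, bob, carol, dave} = 4
daily_distinct_count = len(set.union(*hourly_buckets.values()))
```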
Contributor commented:

Suggested change:
- When aggregating data using the rollup function, the results can be counterintuitive. For example, the sum of hourly distinct user counts can exceed the count of distinct users over a full day. This is because users appearing in multiple hourly buckets are counted once per bucket but only once across the entire day.
+ When aggregating data using the rollup function, the results can be counterintuitive. For example, the sum of hourly distinct user counts can exceed the count of distinct users over a full day. This is because users appearing in multiple hourly buckets are counted once per bucket, but only once across the entire day.


### Implications for visualizations

Visualizations by default show the sum of rollup values across intervals, which can lead to discrepancies between the sum and a scalar value representing the entire time frame. For instance, a graph might display a sum of 125 for hourly rollups, while a direct query shows 121 for the same period. This is due to sessions or users being counted multiple times across hourly buckets but only once in the daily rollup.
Contributor commented:
There are a few terms here like "scalar value" and "direct query" that I'm not familiar with, so I'm not 100% sure I understand this example. I wonder if we could make it sound a little more like the previous example? Maybe something like:

Suggested change:
- Visualizations by default show the sum of rollup values across intervals, which can lead to discrepancies between the sum and a scalar value representing the entire time frame. For instance, a graph might display a sum of 125 for hourly rollups, while a direct query shows 121 for the same period. This is due to sessions or users being counted multiple times across hourly buckets but only once in the daily rollup.
+ By default, visualizations show the sum of rollup values across intervals, which can lead to discrepancies between the sum and a scalar value representing the entire time frame. For instance, a graph might display a sum of 125 users who visited your website in an hourly rollup, while a direct query shows 121 users over the same period. This is due to sessions or users being counted multiple times across hourly buckets but only once in the daily rollup.

Reading it again, I think I'm confused because both of these numbers are over an hour-long period, so I'm not sure where the daily rollup part comes in 🤔



Comment on lines +39 to +41
Consider a scenario where 2,000 users experience 6,000 errors in a week, while 22,000 users face no errors. Daily error rates fluctuate, with hourly figures ranging from 11 to 35 users facing errors. Additionally, on an hourly basis, there are around 1,000 distinct users encountering errors weekly, reflecting an error rate of 0.11% to 0.35%.

In contrast, over the week, 2,000 out of 24,000 users encounter errors, accounting for an 8.3% error rate—much higher than the hourly observation.
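The weekly figure in the quoted lines follows directly from the stated totals; a short sketch using the excerpt's hypothetical numbers:

```python
# Working through the excerpt's weekly error rate (hypothetical figures).
users_with_errors = 2_000
users_without_errors = 22_000
total_users = users_with_errors + users_without_errors  # 24,000

# 2,000 / 24,000 ≈ 0.083, i.e. roughly 8.3%
weekly_error_rate = users_with_errors / total_users

# The hourly rates quoted in the excerpt are far lower because the
# affected users are spread across 168 hourly buckets, while the
# weekly rollup counts each affected user exactly once.
```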
Contributor commented:

Could you explain this part:

Daily error rates fluctuate, with hourly figures ranging from 11 to 35 users facing errors. Additionally, on an hourly basis, there are around 1,000 distinct users encountering errors weekly

I can't intuit the difference between an hourly figure and the number on an hourly basis. Or how on an hourly basis, users encounter errors weekly? 😵 It might need some more dumbing down - I wonder if the difference here is that there's a difference between how the time frames are counted, and how users can choose to have that data display? Or maybe it's something else?
