
Add rollup guide #27008

Open · wants to merge 6 commits into master
Conversation

@estherk15 (Contributor) commented Jan 7, 2025

What does this PR do? What is the motivation?

  • Add a new guide to understand rollup functions and cardinality
  • DOCS-9413

Merge instructions

Merge readiness:

  • Ready for merge

Merge queue is enabled in this repo. To have it automatically merged after it receives the required reviews, create the PR (from a branch that follows the <yourname>/description naming convention) and then add the following PR comment:

/merge

Additional notes

@estherk15 estherk15 added the WORK IN PROGRESS No review needed, it's a wip ;) label Jan 7, 2025
@estherk15 estherk15 requested a review from a team as a code owner January 7, 2025 23:00
@github-actions github-actions bot added the Guide Content impacting a guide label Jan 7, 2025
@estherk15 estherk15 removed the WORK IN PROGRESS No review needed, it's a wip ;) label Jan 8, 2025
@estherk15 estherk15 requested a review from edanaher January 8, 2025 14:18
@edanaher (Member) left a comment:

Thank you for writing this up! It keeps the spirit of my original document, but feels much more polished, professional, and organized.

I do have a couple suggestions; if you disagree, I'd be happy to have a conversation either here in the PR or on a short call if you think that would be more productive.


When aggregating errors at a weekly scale, the total count of errors appears higher as more users experience errors over the extended duration, contrasting with the lower average seen hourly.

## Solutions and best practices
Member commented:
This section feels pretty vague and not particularly helpful to me, but I'm not sure how I would improve it.

I'm also not the target audience for this, so I'll trust your judgment.

Contributor commented:

My 2¢ is that this section feels more theoretical than you would expect after reading a heading like "solutions and best practices."

I'm wondering if we could remove it altogether, since I think all three of these points are already covered earlier in the topic. Otherwise, maybe we could introduce some more examples here with some specific calculations for readers to follow along with, so the advice feels more actionable. If you do the latter, I might incorporate it into the above sections because I think it could aid the conceptual understanding - I don't think I'd collect it at the end, in case readers don't scroll that far 🙂

This one is tough to get right! It's necessarily way into the weeds because it's a counterintuitive nuance that users very understandably struggle with, and that's super difficult to explain away.

@estherk15 (author) replied:
Thanks! Yea, that makes sense. It seems like a repeat of what was already covered so I'll remove.

@cswatt (Contributor) commented Jan 8, 2025

Added DOCS-9842 to track review.

@cswatt cswatt added the editorial review Waiting on a more in-depth review label Jan 8, 2025

### Example calculation

Suppose, over the course of a week, 2,000 users on a site experiences a total of 6,000 error events, while the remaining 22,000 users don't experience any errors. Since a user's multiple errors may occur nearly simultaneously or in different hours, there could be an average of as many as 35 users experiencing errors per hour or as few as 11.
Contributor commented:
Suggested change:
- Suppose, over the course of a week, 2,000 users on a site experiences a total of 6,000 error events, while the remaining 22,000 users don't experience any errors. Since a user's multiple errors may occur nearly simultaneously or in different hours, there could be an average of as many as 35 users experiencing errors per hour or as few as 11.
+ Suppose, over the course of a week, 2,000 users on a site experience a total of 6,000 error events, while the remaining 22,000 users don't experience any errors. Since a user's multiple errors may occur nearly simultaneously or in different hours, there could be an average of as many as 35 users experiencing errors per hour or as few as 11.

Contributor commented:
I found this a bit hard to follow - I noticed that the heading says "calculation," but it's hard to follow where the numbers are coming from (35 or 11?) because I don't actually know how these values are being calculated. It's likely obvious for someone who has the intuitive math skills that I lack, but I worry that the cognitive load for this section is pretty high, and that makes it harder to see what the takeaway should be.

I wonder if it would be easier to follow if we introduced the takeaway first, then supported it with numbers, so the focus doesn't get bogged down in the details?
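For what it's worth, the two bounds quoted in the excerpt can be reproduced from its stated totals. A minimal sketch, assuming a 168-hour week (the excerpt doesn't state this explicitly, so treat it as an inference; all figures are the excerpt's hypothetical ones):

```python
# Reproducing the "as many as 35 ... as few as 11" bounds
# from the excerpt's totals (hypothetical data, 168-hour week assumed).
errors_per_week = 6_000
users_with_errors = 2_000
hours_per_week = 7 * 24  # 168

# Upper bound: if every error in a given hour comes from a different user,
# up to errors-per-hour distinct users can appear in each hourly bucket.
max_users_per_hour = errors_per_week // hours_per_week   # 35

# Lower bound: if each affected user's errors all land in a single hour,
# only users-per-hour distinct users appear per hourly bucket on average.
min_users_per_hour = users_with_errors // hours_per_week  # 11
```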



This disparity arises from multiple user visits being counted over a week, leading to a higher likelihood of users encountering errors over that period. See the following illustrative example for more context on this disparity.

### Error rate variation and user interactions case study
@estherk15 (author) commented:
@janine-c I tried to simplify this section, but essentially it's just a more in-depth example that builds on the one above. Does it make the issue more confusing, and would it be better left out of the document?

Contributor replied:
I don't think you need two separate sections! I like that you have a more in-depth example that illustrates a bunch of pitfalls at once. I think you could probably replace the simpler one with the chonkier one :)

@estherk15 estherk15 requested a review from janine-c January 22, 2025 20:19
@janine-c (Contributor) left a comment:
Hey Esther, this looks great! It feels a lot simpler and less theoretical. I have a few remaining questions because this is so far outside of my domain, but feel free to ignore them if you think that someone who's already in this section of the docs will already understand what's happening there.

Your topic feels overall a lot more practical and easy to follow compared to the last draft I looked at - nicely done! I think users will come away from it feeling a lot more empowered to understand their data 👏🏻


## Rollup functionality and unexpected results

When aggregating data using the rollup function, the results can be counterintuitive. For example, the sum of hourly distinct user counts can exceed the count of distinct users over a full day. This is because users appearing in multiple hourly buckets are counted once per bucket but only once across the entire day.
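As an aside, the claim in this excerpt is easy to demonstrate with a toy example (the user names and buckets below are hypothetical, not from the guide):

```python
# Toy illustration: the sum of per-hour distinct user counts exceeds the
# distinct count over the whole day, because "alice", "bob", and "carol"
# each appear in more than one hourly bucket.
hourly_buckets = {
    "09:00": {"alice", "bob"},
    "10:00": {"bob", "carol"},
    "11:00": {"alice", "carol", "dave"},
}

# Counted once per bucket: 2 + 2 + 3 = 7
sum_of_hourly_counts = sum(len(users) for users in hourly_buckets.values())

# Counted once across the day: {alice, bob, carol, dave} = 4
daily_distinct_count = len(set.union(*hourly_buckets.values()))
```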
Contributor commented:

Suggested change:
- When aggregating data using the rollup function, the results can be counterintuitive. For example, the sum of hourly distinct user counts can exceed the count of distinct users over a full day. This is because users appearing in multiple hourly buckets are counted once per bucket but only once across the entire day.
+ When aggregating data using the rollup function, the results can be counterintuitive. For example, the sum of hourly distinct user counts can exceed the count of distinct users over a full day. This is because users appearing in multiple hourly buckets are counted once per bucket, but only once across the entire day.


### Implications for visualizations

Visualizations by default show the sum of rollup values across intervals, which can lead to discrepancies between the sum and a scalar value representing the entire time frame. For instance, a graph might display a sum of 125 for hourly rollups, while a direct query shows 121 for the same period. This is due to sessions or users being counted multiple times across hourly buckets but only once in the daily rollup.
Contributor commented:
There are a few terms here like "scalar value" and "direct query" that I'm not familiar with, so I'm not 100% sure I understand this example. I wonder if we could make it sound a little more like the previous example? Maybe something like:

Suggested change:
- Visualizations by default show the sum of rollup values across intervals, which can lead to discrepancies between the sum and a scalar value representing the entire time frame. For instance, a graph might display a sum of 125 for hourly rollups, while a direct query shows 121 for the same period. This is due to sessions or users being counted multiple times across hourly buckets but only once in the daily rollup.
+ By default, visualizations show the sum of rollup values across intervals, which can lead to discrepancies between the sum and a scalar value representing the entire time frame. For instance, a graph might display a sum of 125 users who visited your website in an hourly rollup, while a direct query shows 121 users over the same period. This is due to sessions or users being counted multiple times across hourly buckets but only once in the daily rollup.

Reading it again, I think I'm confused because both of these numbers are over an hour-long period, so I'm not sure where the daily rollup part comes in 🤔



Comment on lines +39 to +41
Consider a scenario where 2,000 users experience 6,000 errors in a week, while 22,000 users face no errors. Daily error rates fluctuate, with hourly figures ranging from 11 to 35 users facing errors. Additionally, on an hourly basis, there are around 1,000 distinct users encountering errors weekly, reflecting an error rate of 0.11% to 0.35%.

In contrast, over the week, 2,000 out of 24,000 users encounter errors, accounting for an 8.3% error rate—much higher than the hourly observation.
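The weekly figure in the quoted lines follows directly from the stated totals; a short sketch using the excerpt's hypothetical numbers:

```python
# Working through the excerpt's weekly error rate (hypothetical figures).
users_with_errors = 2_000
users_without_errors = 22_000
total_users = users_with_errors + users_without_errors  # 24,000

# 2,000 / 24,000 ≈ 0.083, i.e. roughly 8.3%
weekly_error_rate = users_with_errors / total_users

# The hourly rates quoted in the excerpt are far lower because the
# affected users are spread across 168 hourly buckets, while the
# weekly rollup counts each affected user exactly once.
```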
Contributor commented:

Could you explain this part:

Daily error rates fluctuate, with hourly figures ranging from 11 to 35 users facing errors. Additionally, on an hourly basis, there are around 1,000 distinct users encountering errors weekly

I can't intuit the difference between an hourly figure and the number on an hourly basis. Or how on an hourly basis, users encounter errors weekly? 😵 It might need some more dumbing down - I wonder if the difference here is that there's a difference between how the time frames are counted, and how users can choose to have that data display? Or maybe it's something else?
