Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Classification user groups migrations and models #23

Merged
merged 12 commits into from
Aug 23, 2023

Conversation

yuenmichelle1
Copy link
Collaborator

As mentioned in #21, splitting up changes to

  1. migrations and models
  2. a follow up PR with the logic.

Designs found here:

https://www.figma.com/file/qbqbmR3t5XV6eKcpuRj7mG/Group-Stats?type=design&node-id=0-1&mode=design

Complexity of Similar Looking Continuous Aggregates and Argument for Why They Are Needed

You may have noticed the following continuous aggregates that look similar to a user based continuous aggregates:
For eg.

  • UserGroupClassificationCounts::DailyGroupUserClassificationCount looks similar to UserClassificationCounts::DailyUserClassificationCount OR
  • UserGroupClassificationCounts::DailyGroupUserProjectClassificationCount looks similar to UserClassificationCounts::DailyUserProjectClassificationCount OR
  • UserGroupClassificationCounts::DailyGroupUserWorkflowClassificationCount looks similar to UserClassificationCounts::DailyUserWorkflowClassificationCount

and you may be wondering: Why query one as opposed to the other?
Reason/s

  1. UserClassificationCounts::DailyUserClassificationCount (any continuous aggregate with model beginning with UserClassificationCounts) does not focus on WHEN a user has joined a group/left a group.
  2. On the flip side, if we wanted to query for just user stats via /classifications/users/:id , we cannot query from any continuous aggregate with model beginning with UserGroupClassificationCounts, because not every user belongs to a user_group.

Possible Issues

With current implementation and complexity of the problem we want to solve, we do at least 3 queries per callout.

  1. Query To grab time bucketed counts
  2. Query to grab Active Users
  3. Query to grab project contributions

In order for accuracy, this heavily relies on Continuous Aggregates to be in sync. (If one is out of sync, data will look a little off and we would need to manually refresh continuous aggregate) .

  • There is another way of doing this as opposed to doing 3 queries, we could query from a more detailed continuous aggregate to grab all 3 information and then we won't have to worry about synchronization of Continuous aggregates. (In this case, UserGroupClassificationCounts::DailyGroupUserProjectClassificationCount, HOWEVER, when serializing data to get response that we want, we would have to loop through this array of active records and do a lot of group_bys within our loops. This gets even complicated and inefficient when we think of our big corporate sponsors (a corporate sponsor that has done 8 million+ classifications since 2018).
    • I chose the 3 queries route because of the heavy looping that would have needed to get done if we chose ^ path, but also it felt more readable in the long run.

db/schema.rb Show resolved Hide resolved
@yuenmichelle1 yuenmichelle1 marked this pull request as ready for review August 1, 2023 21:27
Copy link
Member

@zwolf zwolf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, still reviewing #24 & #25.

@yuenmichelle1 yuenmichelle1 merged commit 962afad into main Aug 23, 2023
3 checks passed
@yuenmichelle1 yuenmichelle1 deleted the classification_user_groups-migrations-and-models branch September 15, 2023 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants