
feat(crons): Record historic check-in volume counts #79448

Open

wants to merge 1 commit into master from evanpurkhiser/feat-crons-record-historic-check-in-volume-counts
Conversation

evanpurkhiser
Member

Part of GH-79328

@evanpurkhiser evanpurkhiser requested a review from a team as a code owner October 21, 2024 19:56
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Oct 21, 2024
@evanpurkhiser evanpurkhiser force-pushed the evanpurkhiser/feat-crons-record-historic-check-in-volume-counts branch from 535e3b2 to f5759f0 Compare October 21, 2024 20:27
have had some data-loss (due to an incident) and would want to tick our
clock in a mode where misses and time-outs are created as "unknown".
"""
redis_client = redis.redis_clusters.get(settings.SENTRY_MONITORS_REDIS_CLUSTER)
Member

This is using rc-default-cache in US. I wonder if we should try to migrate to our own cluster so that we're more isolated.
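For illustration only, a sketch of what pointing this at a dedicated cluster might look like in a self-hosted sentry.conf.py; the "crons" cluster name and host are hypothetical, not part of this PR:

# sentry.conf.py (sketch, assumed layout): register a separate Redis cluster
# and point the monitors code at it instead of the shared default cluster.
SENTRY_OPTIONS["redis.clusters"] = {
    "default": {"hosts": {0: {"host": "127.0.0.1", "port": 6379}}},
    # Hypothetical dedicated cluster for crons volume history.
    "crons": {"hosts": {0: {"host": "redis-crons.internal", "port": 6379}}},
}
SENTRY_MONITORS_REDIS_CLUSTER = "crons"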

MONITOR_VOLUME_HISTORY = "sentry.monitors.volume_history:{}"

# We record 30 days worth of historical data for each minute of check-ins.
MONITOR_VOLUME_RETENTION = timedelta(days=30)
Member
Since we keep a key for every minute, this will be about 43k keys (30 days × 1,440 minutes per day = 43,200). Could be a good reason to have our own cluster.
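For context, a rough sketch of how a per-minute key could be derived from the MONITOR_VOLUME_HISTORY pattern above; the helper name and the exact key format are assumptions, not taken from the PR:

from datetime import datetime

def _make_volume_key(ts: datetime) -> str:
    # Floor the timestamp to the minute so every check-in within that minute
    # increments the same counter; 30 days of retention keeps ~43,200 keys alive.
    minute = ts.replace(second=0, microsecond=0)
    return MONITOR_VOLUME_HISTORY.format(int(minute.timestamp()))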

Comment on lines +101 to +102
pipeline.incr(key, amount=1)
pipeline.expire(key, MONITOR_VOLUME_RETENTION)
Member
We could probabilistically set the expire on only ~10% of writes to nearly halve the commands per check-in (from two down to about 1.1). Probably no need if we do the batching option, though.
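Illustrative only, assuming the change would sit in the same loop as the snippet above:

import random

pipeline.incr(key, amount=1)
# Refresh the TTL on roughly 10% of writes: per check-in this drops the
# command count from two to about 1.1, at the cost that a key seeing very
# few increments may go several writes before any TTL is set.
if random.random() < 0.1:
    pipeline.expire(key, MONITOR_VOLUME_RETENTION)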

@@ -937,6 +937,7 @@ def process_checkin_group(items: list[CheckinItem]):
completely serially.
"""
for item in items:
update_check_in_volume(item.ts)
Member
Since we're doing this here, I think it'd likely be better to just pass all the timestamps to the function in bulk. The single process method can just pass a list of one item.

Actually, it'd be even better to just do this in process_batch. Just store all the timestamps after we decode the messages, and then call bulk_update_check_in_volume with all of them
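A rough sketch of what that bulk variant might look like; the signature and the minute-bucketed key format are assumptions, and only the module-level constants come from the diff above:

from collections import Counter
from collections.abc import Sequence
from datetime import datetime

def bulk_update_check_in_volume(timestamps: Sequence[datetime]) -> None:
    # Bucket the timestamps per minute, then issue one INCRBY/EXPIRE pair per
    # bucket in a single pipeline instead of one round trip per check-in.
    redis_client = redis.redis_clusters.get(settings.SENTRY_MONITORS_REDIS_CLUSTER)
    counts = Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)

    pipeline = redis_client.pipeline()
    for minute, count in counts.items():
        key = MONITOR_VOLUME_HISTORY.format(int(minute.timestamp()))
        pipeline.incr(key, amount=count)
        pipeline.expire(key, MONITOR_VOLUME_RETENTION)
    pipeline.execute()

process_batch could then collect item.ts for every decoded message and make a single call per batch, so the number of Redis round trips no longer scales with the number of check-ins.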

Member Author

ah yeah good call
