Add Sentry monitoring for cron jobs #2275

Jongmassey · 2024-12-30T14:25:17Z

fixes #2257

deploy/bin/run_cron.sh

mikerkelly · 2025-01-09T13:35:09Z

This is in draft and you asked in Slack for an in-flight review. What's left to do?

The simplified approach you point to in #2279 looks like it would have worked okay but I agree that DRY is preferable for the actual HTTP API interactions as there's a bit of processing and logic to do, which is more manageable and extensible when centralized. I really like automating the deployment steps and getting the schedule from config.

How have you tested what you've done so far? I guess for bash scripts automated testing will be a bit tricky?

I like the cron jobs docs added to DEPLOY.md. Can this also include a link to the sentry page and docs and briefly explain how we will do the plumbing in the Sentry web UI once this is deployed? I think that would be part of a full redeploy and could save someone some learning time if we record it now, while we're doing it the first time. I think this is using the HTTP Sentry API as documented here?

How does this compare to our approach in Job Server? I think we use the SDK/CLI approach there but to be honest I haven't understood how it works there from looking for a few minutes.

Jongmassey · 2025-01-13T22:32:48Z

I think we use the SDK/CLI approach there but to be honest I haven't understood how it works there from looking for a few minutes.

We do use the Python SDK since Job Server has its own job scheduling system (why yes, that is another overload for job) which is all Python and so we can use the (far nicer) Python SDK.

In this case it's all bash scripts so making HTTP requests with CURL is the most universally compatible approach. If we were, for example, to write a small python wrapper for the SDK we couldn't necessarily rely on the SDK being installed in every python environment these bash scripts get called from.

Jongmassey · 2025-01-13T22:44:05Z

How have you tested what you've done so far? I guess for bash scripts automated testing will be a bit tricky?

It's tricky for sure. I've tested by calling run_crons with a dummy monitor/script name and then deleting the resultant monitor from the Sentry web UI as not to cause notification noise when it's not run again on a schedule.

From poking at the web UI, it's possible to create a monitor with a failure threshold of 720 missed "check-ins", a missed "check in" threshold of 525,600 mins and a check-in timeout of 40,320 mins so having a test that hits a designated test monitor with these configured values should hopefully reduce/avoid notification noise for the team.

mikerkelly

Looks good now, good job. Made some minor comments on docs.

deploy/bin/import_latest_release.sh

deploy/bin/sentry_cron_functions.sh

deploy/bin/run_cron.sh

Write key sentry-specific functions into own .sh file and source into scripts to be monitored, allowing for them to be tested independently. Testing the HTTP actions requires an out-of-test-process accessible HTTP server (e.g. pytest_httpserver) rather than, for example `responses` and so our usual approach to mocking HTTP requests doesn't work here. Assuming the URL and the params are constructed correctly (which these tests test) then this is a relatively low-risk thing to leave un-tested.

The three separate cronfiles only differ in the arg with which the import_latest_release script is called and their schedules. These can all be consolidated into a single cronfile with a line for each invocation. This simplifies the deployment to /etc/cron.d/ and allows for this to be scripted and the variables to be templated, ensuring consistency of deployment.

This previously was a manual task described in a comment in `import_latest_release.sh`, make it automatic.

Jongmassey force-pushed the Jongmassey/sentry-monitor-crons branch 6 times, most recently from c1e8c97 to 045a62e Compare January 2, 2025 19:59

mikerkelly reviewed Jan 9, 2025

View reviewed changes