Install Sentry #5
We'll still need to call `Sentry.init` somewhere, but I don't think the right place to do that exists yet.
One Sentry-related thing I haven't done yet is add this repository's name to https://github.com/alphagov/govuk-infrastructure/blob/main/terraform/deployments/sentry/locals.tf. I'm not sure what it does exactly, but I've spotted it on my travels so, FWIW, it's on my list of things to look into.
spec/exception_handler_spec.rb (outdated)

```ruby
end
end

describe ".report" do
```
This might be better named `propagate`? I keep checking the difference between `report` and `capture`.
I agree with you that `report` and `capture` are bad names, and the more I look at them, the more I agree that `ExceptionHandler.propagate` and `ExceptionHandler.capture` are probably a better pair. But I hope we can do even better 😆
collect (outdated)

```diff
 require "govuk_sli_collector"

-GovukSliCollector.call
+ExceptionHandler.report { GovukSliCollector.call }
```
Is the `report` method needed at all? Unhandled errors should be logged in Sentry anyway.
I hoped that the Sentry gem might work that way but it didn't happen in my integration testing. Maybe I've missed a configuration option, but I haven't seen one in the docs so far.
I wonder if it's because you needed to call `Sentry.init` somewhere first?
Possibly just add a call to `Sentry.init` underneath the requires in this file, to see if that works.
I'll try that now. I thought I had already, but I can only find deploys featuring all the other combinations of `Sentry.init` and raising an exception on purpose 😬
Here's how I've just tested it...
I deployed this branch https://github.com/alphagov/govuk-sli-collector/tree/install-sentry-test-handling-unhandled (the significant changes are https://github.com/alphagov/govuk-sli-collector/blob/install-sentry-test-handling-unhandled/collect#L8-L14 and https://github.com/alphagov/govuk-sli-collector/blob/install-sentry-test-handling-unhandled/lib/govuk_sli_collector.rb#L8).
I confirmed that the error was logged in the container https://argo.eks.integration.govuk.digital/applications/govuk-jobs?orphaned=false&resource=&node=%2FPod%2Fapps%2Fgovuk-sli-collector-28366080-l9v5d&tab=logs (and appears in Logit https://kibana.logit.io/s/42f4d2d5-e9ce-451f-8ffc-cdb25bd624f8/app/discover#/doc/filebeat-*/filebeat-2023.12.07?id=Su8CRYwB5HoHRDiIDzvo).
But I can only see the existing errors in Sentry (https://govuk.sentry.io/issues/?groupStatsPeriod=auto&project=4506338071150592&query=&referrer=issue-list&statsPeriod=7d) with no increase in their event counts.
Do you need the DSN? Where is that set? 😕

```ruby
Sentry.init do |config|
  config.dsn = ''
end
```
It gets that from the container's environment. (I don't know if I did a good job of it, but I tried to explain this in the commit that calls `Sentry.init`, 074c091, in case it was a bit mysterious.)
(BTW, I've only ever called `Sentry.init` without a config block on all versions of this branch, so the fact that any errors reach Sentry at all shows that the environment variables work.)
Ah, I see, it reads them from the env vars if available.
The Sentry docs do say that unhandled errors should be logged, and I can't see anything wrong with the install-sentry-test-handling-unhandled branch, though: https://docs.sentry.io/product/sentry-basics/integrate-backend/capturing-errors/#unhandled-errors
collect (outdated)

```diff
 require "govuk_sli_collector"

-GovukSliCollector.call
+ExceptionHandler.report { GovukSliCollector.call }
```
What kind of errors would not be swallowed, i.e. would be raised outside of `PublishingLatencySli`?
You're right. Currently, without any code changes, there's nothing that can break that wouldn't already be broken.
Having this was useful while I was making recent changes, so I figured it makes sense to leave it to cover future changes, too. Does it feel too "future use case" to you?
For sending exceptions to Sentry. One method sends and silences the exception; the other sends and re-raises it. `Sentry.init` needed to be called before we used any Sentry methods, so while it's a little bit gross, this ended up feeling like the right place. Our Helm chart already includes `SENTRY_DSN`, `SENTRY_CURRENT_ENV` and `SENTRY_RELEASE` in our job's environment, by the way, so Sentry should be configured out of the box. (Otherwise, we'd supply configuration to `Sentry.init`.)
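For reference, here's a minimal sketch of what the explicit equivalent might look like if those environment variables weren't already provided. It assumes the `sentry-ruby` gem's `Sentry.init` block with `config.dsn`, `config.environment` and `config.release`; the helper names `sentry_settings_from` and `init_sentry_explicitly` are hypothetical, and mapping `SENTRY_CURRENT_ENV` to `environment` reflects my understanding of the gem's defaults rather than anything confirmed in this PR.

```ruby
# Hypothetical helper: collects the Sentry settings that the Helm chart puts
# in the job's environment. Pure Ruby, so it can be checked without the gem.
def sentry_settings_from(env)
  {
    dsn: env["SENTRY_DSN"],
    environment: env["SENTRY_CURRENT_ENV"],
    release: env["SENTRY_RELEASE"],
  }
end

# Hypothetical explicit configuration, only needed if the gem couldn't read
# these values from ENV itself. Assumes `require "sentry-ruby"` has been done.
def init_sentry_explicitly(env = ENV)
  settings = sentry_settings_from(env)
  Sentry.init do |config|
    config.dsn = settings[:dsn]
    config.environment = settings[:environment] if settings[:environment]
    config.release = settings[:release] if settings[:release]
  end
end
```

Keeping the env-reading part separate from `Sentry.init` means the mapping can be unit-tested without loading Sentry at all.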
Nothing's changed, I just spotted that this isn't needed
I might merge these commits later, but I'm attempting to minimise noise in the commits where I'm actually doing something
Send exceptions to Sentry and don't let them propagate any further. I imagine this will allow us to add new SLI classes without any one class preventing the others from collecting their data.
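A minimal sketch of that "send and silence" behaviour, assuming a block-taking `capture` method as named in review. It's not the code in this PR: the reporter is injected as a lambda so the example runs without the `sentry-ruby` gem (by default it would call `Sentry.capture_exception`), and `FailingSli`/`WorkingSli` are hypothetical stand-ins for SLI classes.

```ruby
# Sketch only, assuming the propagate/capture naming from review; this is
# not the ExceptionHandler code in this PR.
module ExceptionHandler
  # Sentry is only referenced when this lambda is called, so the sketch loads
  # without the sentry-ruby gem; real code would `require "sentry-ruby"`.
  DEFAULT_REPORTER = ->(error) { Sentry.capture_exception(error) }

  # Runs the block, reports any StandardError and swallows it (returns nil),
  # so the caller carries on with whatever comes next.
  def self.capture(reporter: DEFAULT_REPORTER)
    yield
  rescue StandardError => e
    reporter.call(e)
    nil
  end
end

# Hypothetical SLI classes, used only to show one failure not stopping others.
class FailingSli
  def self.call
    raise "boom"
  end
end

class WorkingSli
  def self.call
    :collected
  end
end

# Usage: each SLI is wrapped separately, so FailingSli can't block WorkingSli.
reported = []
results = [FailingSli, WorkingSli].map do |sli|
  ExceptionHandler.capture(reporter: ->(e) { reported << e.message }) { sli.call }
end
# results  => [nil, :collected]
# reported => ["boom"]
```

Wrapping each SLI class individually, rather than the whole run, is what gives the isolation described in the commit message.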
But allow them to propagate so that they can result in a non-zero exit code and so that they're also sent to Logit
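The re-raising counterpart can be sketched the same way, again assuming the `propagate` name suggested in review and an injected reporter in place of a direct `Sentry.capture_exception` call. A bare `raise` inside `rescue` re-raises the original exception, so the script still exits non-zero and the error still reaches the container logs (and so Logit).

```ruby
# Sketch only, assuming the propagate/capture naming from review; this is
# not the ExceptionHandler code in this PR.
module ExceptionHandler
  # Sentry is only referenced when this lambda is called; real code would
  # `require "sentry-ruby"` first.
  DEFAULT_REPORTER = ->(error) { Sentry.capture_exception(error) }

  # Runs the block, reports any StandardError, then re-raises it unchanged.
  # Rescuing StandardError (not Exception) means `exit` and Ctrl-C are not
  # reported as errors.
  def self.propagate(reporter: DEFAULT_REPORTER)
    yield
  rescue StandardError => e
    reporter.call(e)
    raise
  end
end

# In `collect`, this would wrap the top-level call:
#   ExceptionHandler.propagate { GovukSliCollector.call }
```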
Branch updated from 7eea586 to 781c6e3.
Thank you for the renaming, as well as the well-named commits! I'd still like to know why unhandled errors aren't being logged, but I'm happy to approve!
The container that this project runs in now* uses the `latest`-tagged image from ECR. Our `build-and-push-image` action only tags its built image as `latest` if it's built from the latest commit on `main`**. This does mean that we can't deploy an arbitrary branch or commit, but since we only deploy to production, that _might not_ be a bad thing. At least, there seems to be plenty we can test locally before needing to deploy a change, we have tooling (#5) for error handling and reporting, and there's no reason we can't practice something closer to trunk-based development going forward. *https://github.com/alphagov/govuk-helm-charts/blob/456388977d052d7d50c26f550e24fc11166149a4/charts/govuk-sli-collector/values.yaml#L10 **https://github.com/alphagov/govuk-infrastructure/blob/a046a2c0da8f002e9e4507cd9d14310f2c1669a9/.github/workflows/build-and-push-image.yml#L76
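To make the tagging rule concrete, here's a hypothetical predicate restating it: an image gets the `latest` tag only when it was built from `main` and from the commit currently at the head of `main`. The method name, its inputs and the `refs/heads/main` ref format are assumptions for illustration; the real logic lives in the linked `build-and-push-image.yml` workflow and may be expressed quite differently.

```ruby
# Hypothetical: restates the rule described above, not the workflow's code.
MAIN_REF = "refs/heads/main"

# True only when the image was built from main AND from main's current head,
# so builds from feature branches or older main commits never become latest.
def tag_as_latest?(ref:, sha:, main_head_sha:)
  ref == MAIN_REF && sha == main_head_sha
end

# Usage:
#   tag_as_latest?(ref: "refs/heads/main", sha: "abc123", main_head_sha: "abc123")            # => true
#   tag_as_latest?(ref: "refs/heads/some-feature-branch", sha: "abc123", main_head_sha: "abc123") # => false
```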
https://trello.com/c/jDSDGNyn/814-measure-and-record-our-publishing-latency-sli
I've tested this in integration (https://govuk.sentry.io/issues/?project=4506338071150592&query=is%3Aresolved&referrer=issue-list&statsPeriod=90d)