[Incident] CarbonPlan AWS hub had running infrastructure we didn't track #1666
After-action report
These sections should be filled out once we've resolved the incident and know what happened.

Timeline
A short list of dates / times and major updates, with links to relevant comments in the issue for more context.

All times in CET

2022-08-30 - 10:45am
We receive a FreshDesk request asking us to look into an abnormally high cloud bill on AWS for CarbonPlan. We investigate the grafana dashboards at

11:30am
We decide we don't see anything too abnormal in grafana. A team member tried to log in to the AWS console for CarbonPlan, but noted that it wasn't present in 2i2c's AWS console website. We weren't sure how to access the CarbonPlan cluster. We decided that we need to wait for a team member to wake up because they were the last ones to touch the infrastructure.

2022-08-31 07:00
A team member who had set this up before logged in to the AWS console for CarbonPlan and discovered that there was an old cluster running from an earlier iteration of the deployment. The old iteration had used In that process, we had to create a new cluster with There was also a

08:00
We entirely decommissioned the old cluster, and are now monitoring for changes in cloud costs to see how much this will save.

What went wrong
Things that could have gone better. Ideally these should result in concrete

Follow-up actions
Process improvements
Documentation improvements
Technical improvements
I think all the checkboxes detailed in the top comment were completed, so closing this one now.

I think @choldgraf suggested to them that we check back on usage in a week, and see how it went - so this isn't done yet.

Thanks for the clarification, @yuvipanda!

I believe that we can close this one, we are following up on the response to AWS here:
Summary
We migrated CarbonPlan's cloud infrastructure away from a bespoke kops-based cluster, and towards an AWS-native eksctl cluster. In the process, some of the old kops cluster infrastructure was not properly shut down and got lost during the transition. It was running in the background in a "semi-running" state and incurred a regular amount of cloud costs over time even though nobody was using it.

About a year later a member of CarbonPlan asked us to look into an abnormally high cloud bill, and we discovered this running infrastructure.
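
One way to catch leftovers like this earlier is to periodically compare what is running in the account against the clusters we believe we operate. The following is only a minimal sketch of that comparison, not something taken from our tooling: the `Instance` record, the `cluster-name` tag key, and the cluster names are all hypothetical, and a real check would first need to pull the instance inventory from the cloud provider.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical inventory record for one cloud VM."""
    instance_id: str
    state: str
    tags: dict = field(default_factory=dict)


def find_untracked(instances, tracked_clusters, tag_key="cluster-name"):
    """Return running instances whose cluster tag is missing or not tracked.

    `tag_key` is an assumed tagging convention, not a real kops or eksctl tag.
    """
    return [
        inst
        for inst in instances
        if inst.state == "running"
        and inst.tags.get(tag_key) not in tracked_clusters
    ]


# Illustrative data only: one tracked cluster, plus leftovers from an old one.
instances = [
    Instance("i-new-1", "running", {"cluster-name": "new-cluster"}),
    Instance("i-old-1", "running", {"cluster-name": "old-cluster"}),
    Instance("i-old-2", "stopped", {"cluster-name": "old-cluster"}),
    Instance("i-untagged", "running"),
]

untracked = find_untracked(instances, tracked_clusters={"new-cluster"})
for inst in untracked:
    print(inst.instance_id, inst.tags)
```

With the sample data, the running instance from the old cluster and the untagged instance are flagged, while the stopped one is skipped. Stopped instances can still carry storage costs, so a fuller audit would probably want to look at those too.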
Impact on users
Important information
Tasks and updates
After-action report template