Add steps for decommissioning London production env #579

Closed
wants to merge 1 commit into from
1 change: 1 addition & 0 deletions Gemfile.lock
@@ -208,6 +208,7 @@ GEM
PLATFORMS
aarch64-linux
arm64-darwin-21
arm64-darwin-23
x86_64-darwin-21
x86_64-linux

3 changes: 2 additions & 1 deletion source/index.html.md
@@ -113,7 +113,8 @@ title: PaaS Team Manual

### Upcoming Plans

- [Decomissioning of Production Ireland](upcoming_plans/ireland_decomissioning/)
- [Decommissioning of Production Ireland](upcoming_plans/ireland_decomissioning/)
- [Decommissioning of Production London](upcoming_plans/london_decomissioning/)

## Architecture decision records

78 changes: 78 additions & 0 deletions source/upcoming_plans/london_decomissioning.html.md
@@ -0,0 +1,78 @@
---
title: Production London Decommissioning
---

# Production London Decommissioning

## Introduction

This document is a guide to decommissioning the London production environment (prod-lon) after the Ireland production environment has been decommissioned.

This is only a guide, based on the steps on [the Ireland decommissioning page](../ireland_decomissioning). Some steps are likely missing, and the process may need to be adapted as you go.

## Pre-checks

Before starting the decommissioning process, ensure that the following checks have been completed:

- [ ] Ensure all tenants have been migrated off the environment.
- [ ] Ensure all final bills have been sent. Decommissioning the environment will stop the billing process.
- [ ] Ensure Logit graphs show no traffic to the environment other than normal platform traffic.
- [ ] Ensure all user applications have been removed or stopped.
- [ ] Ensure all user services have been removed from the environment. PaaS services will be removed as part of the decommissioning process. [The pipeline script](https://github.com/alphagov/paas-cf/blob/main/scripts/unbind-and-delete-all-services.sh) will try to remove all services; however, it may fail if a service is not ready for removal (for example, if an S3 bucket is not empty).
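
The app and service checks above can be partly scripted. This is a minimal sketch, assuming you are logged in to the London Cloud Foundry API as an admin with `cf login` and that `jq` is installed; the function names are hypothetical. An admin user also sees platform-owned apps and services, so a non-zero count is a prompt for manual review rather than proof that tenants remain.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for the pre-checks, not part of paas-cf.
# Assumes `cf login` as an admin on the London API and jq on the PATH.
# Defining these functions has no side effects; nothing runs on source.

# Print pagination.total_results for a CF v3 list endpoint such as /v3/apps.
v3_total() {
  local path="$1"
  cf curl "${path}?per_page=1" | jq -r '.pagination.total_results'
}

# Print a one-line summary; return non-zero if anything remains or the
# count could not be read (e.g. jq printed "null" after an API error).
report_remaining() {
  local label="$1" count="$2"
  case "$count" in
    ''|*[!0-9]*)
      echo "${label}: could not read count"
      return 1
      ;;
  esac
  if [ "$count" -eq 0 ]; then
    echo "${label}: none remaining"
  else
    echo "${label}: ${count} remaining, review before continuing"
    return 1
  fi
}

# Run both checks and report every result before returning.
precheck_prod_lon() {
  local status=0
  report_remaining "apps" "$(v3_total /v3/apps)" || status=1
  report_remaining "service instances" "$(v3_total /v3/service_instances)" || status=1
  return "$status"
}
```

Source the file and run `precheck_prod_lon`; it prints both counts and returns non-zero if either is above zero, so the remaining apps and services can be reviewed by hand.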

## Before decommissioning

Before decommissioning the environment, ensure that the following steps have been completed:

- [ ] Take a pg_dump of the billing database and store it somewhere safe. We will also have a final RDS snapshot, but a separate dump is useful in case of post-decommissioning billing queries.
- [ ] Take a pg_dump of the audit database and store it somewhere safe. This contains all the Cloud Foundry events since the auditor was deployed.
- [ ] Remove [all peers from the Terraform config](https://github.com/alphagov/paas-cf/blob/main/terraform/prod-lon.vpc_peering.json). Set it to `[]` to ensure removal. Merge and deploy.
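
The steps above can be sketched as a few helpers. This is a minimal sketch, assuming the billing and audit connection strings are available to you and that paas-cf is checked out locally; how you reach the RDS instances is not covered, and all function names are hypothetical. The custom dump format is used so that `pg_restore --list` can confirm each archive is readable before it is stored.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for the "Before decommissioning" steps.
# Connection strings and the local paas-cf checkout path are supplied by you.

# Build a dated dump path, e.g. backups/prod-lon-billing-2024-05-01.dump.
dump_path() {
  local dir="$1" name="$2" day="$3"
  printf '%s/prod-lon-%s-%s.dump\n' "$dir" "$name" "$day"
}

# Dump one database in pg_dump's custom format (the connection string is
# passed as the dbname argument), then list the archive's table of
# contents to check it can be read back.
dump_database() {
  local conn="$1" out="$2"
  pg_dump --format=custom --file="$out" "$conn" &&
    pg_restore --list "$out" >/dev/null
}

# Replace the prod-lon peering list with an empty JSON array so Terraform
# removes every peer on the next deploy.
clear_vpc_peers() {
  local repo="$1"
  printf '[]\n' > "${repo}/terraform/prod-lon.vpc_peering.json"
}
```

For example, `dump_database "$BILLING_DB_URL" "$(dump_path backups billing "$(date -u +%F)")"`, where `BILLING_DB_URL` is a hypothetical variable holding the billing database connection string.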

## Decommissioning PaaS London

- [ ] Extract the Pingdom credentials [from paas-credentials](https://github.com/alphagov/paas-credentials/tree/main/pingdom.com). Log in to Pingdom and remove the London checks.
- [ ] Remove [protection for the prod-lon environment](https://github.com/alphagov/paas-cf/blob/main/scripts/unbind-and-delete-all-services.sh#L51). Merge to main.
- [ ] Add `$(eval export ENABLE_DESTROY=true)` [to the prod-lon section in the paas-cf Makefile](https://github.com/alphagov/paas-cf/blob/3efbd129cd1b6914f75a2391d35ab701cf8774a9/Makefile#L320). Merge to main.
- [ ] Announce your intention to decommission the environment in #cyber-security-notifications (Slack) using an 'Action Notification', and get approval from a team member.
- [ ] Run `gds aws paas-prod-admin -- make prod-lon pipelines` to push the destroy pipeline to Concourse.
- [ ] Start the 'destroy-cloudfoundry' pipeline [from Concourse](https://deployer.cloud.service.gov.uk/).
  - Note: the Terraform destroy is likely to fail on S3 buckets that are not empty. Empty them manually and re-run the Concourse job, or add `force_destroy` to the Terraform bucket resource if it is missing.
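
If the destroy job stops on a non-empty bucket, something like the sketch below can empty it before re-running the job. It assumes AWS CLI credentials from `gds aws paas-prod-admin --` and that you take the bucket name from the Terraform error; the function names are hypothetical. `aws s3 rm --recursive` only removes current objects, so check versioning first: for a versioned bucket, old versions and delete markers also have to go, and adding `force_destroy` to the bucket may be simpler.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for emptying a bucket that blocks the
# destroy job. Run under `gds aws paas-prod-admin --` for credentials.

# Delete every current object in the bucket.
empty_bucket() {
  local bucket="$1"
  # Guard against an empty argument turning into "s3://".
  if [ -z "$bucket" ]; then
    echo "usage: empty_bucket BUCKET_NAME" >&2
    return 2
  fi
  # Current objects only: a versioned bucket keeps old object versions and
  # delete markers, which this command does not remove.
  aws s3 rm "s3://${bucket}" --recursive
}

# Print the bucket's versioning status: "Enabled" or "Suspended", or
# "None" if versioning has never been enabled.
bucket_versioning() {
  aws s3api get-bucket-versioning --bucket "$1" --query Status --output text
}
```

Check `bucket_versioning` first; if it prints `None`, `empty_bucket` should be enough before re-running the failed job.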

= DO NOT CONTINUE UNTIL THE DESTROY PIPELINE HAS COMPLETED SUCCESSFULLY =

- [ ] Add `$(eval export ENABLE_DESTROY=true)` to the prod-lon section [in the paas-bootstrap Makefile](https://github.com/alphagov/paas-bootstrap/blob/5be4d2f09635d2d51200206a5f1cc33e41766bba/Makefile#L139). Merge to main.
- [ ] Spin up a production London Vagrant VM with `gds aws paas-prod-admin -- make prod-lon deployer-concourse bootstrap`.
- [ ] Start the `destroy-bosh-concourse` pipeline from the Concourse on the Vagrant VM. Ensure it runs to completion successfully.
- [ ] Remove the Vagrant VM with `gds aws paas-prod-admin -- make prod-lon deployer-concourse bootstrap-destroy`.


## Post Decommissioning Checks

- [ ] Check the AWS console for orphaned resources in London; enabling the Resource Explorer in London can help find them. Check:
  - [ ] EC2
  - [ ] EBS volumes
  - [ ] EBS snapshots
  - [ ] ELBs
  - [ ] CloudFront (remember this is global; it won't be empty, so check with care)
  - [ ] S3 (remember this is global; check with care). The prod-lon state bucket will probably still be there and can now be removed.
  - [ ] RDS
  - [ ] RDS snapshots (we expect to still have snapshots)
  - [ ] SQS
  - [ ] EIPs
  - [ ] AMIs (BOSH might have left a few behind). Clean these up.
  - [ ] CloudWatch
  - [ ] ElastiCache Redis caches
- [ ] Check the Aiven project `paas-cf-prod` and ensure there are no services whose names begin with `prod-lon-`.
- [ ] Check AWS billing in the following days to see we aren't being charged for anything unexpected in the London region.

- [ ] Database snapshots will no longer expire automatically. We will need to remove them all once we are happy we don't need them. Leave them for a few weeks before cleaning up.
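
Several of the post-decommissioning checks can be scripted as read-only helpers. This is a sketch, not part of paas-cf: it assumes AWS credentials from `gds aws paas-prod-admin --` and an authenticated Aiven CLI (`avn`), and all function names are hypothetical. It only covers regional resources in London (`eu-west-2`); CloudFront and S3 are global and still need the careful manual check.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical read-only helpers for the post-decommissioning
# checks. Nothing here deletes anything.

PROD_LON_REGION=eu-west-2

# Count IDs in AWS CLI text output, relying on word splitting. The CLI
# prints "None" when the queried key is absent, which counts as zero.
count_ids() {
  local ids="$1" n=0 id
  for id in $ids; do
    [ "$id" = "None" ] || n=$((n + 1))
  done
  echo "$n"
}

# Print "<label>: <count>" for one "aws <service> <subcommand>" call;
# any extra arguments are passed through to the AWS CLI.
check() {
  local label="$1" service="$2" subcommand="$3" query="$4"
  shift 4
  local out
  if ! out=$(aws "$service" "$subcommand" --region "$PROD_LON_REGION" \
      --query "$query" --output text "$@"); then
    echo "${label}: query failed"
    return 1
  fi
  echo "${label}: $(count_ids "$out")"
}

inventory_prod_lon() {
  check "EC2 instances" ec2 describe-instances 'Reservations[].Instances[].InstanceId'
  check "EBS volumes" ec2 describe-volumes 'Volumes[].VolumeId'
  check "EBS snapshots" ec2 describe-snapshots 'Snapshots[].SnapshotId' --owner-ids self
  check "Classic ELBs" elb describe-load-balancers 'LoadBalancerDescriptions[].LoadBalancerName'
  check "ALBs/NLBs" elbv2 describe-load-balancers 'LoadBalancers[].LoadBalancerName'
  check "RDS instances" rds describe-db-instances 'DBInstances[].DBInstanceIdentifier'
  check "RDS snapshots" rds describe-db-snapshots 'DBSnapshots[].DBSnapshotIdentifier'
  check "SQS queues" sqs list-queues 'QueueUrls'
  check "Elastic IPs" ec2 describe-addresses 'Addresses[].PublicIp'
  check "AMIs" ec2 describe-images 'Images[].ImageId' --owners self
  check "CloudWatch alarms" cloudwatch describe-alarms 'MetricAlarms[].AlarmName'
  check "CloudWatch log groups" logs describe-log-groups 'logGroups[].logGroupName'
  check "ElastiCache clusters" elasticache describe-cache-clusters 'CacheClusters[].CacheClusterId'
}

# Keep only lines whose first field (the Aiven service name) starts with
# "prod-lon-"; prints nothing when there are none.
prod_lon_only() {
  grep '^prod-lon-' || true
}

# Read "<snapshot-id> <create-time>" lines and print the IDs created before
# the cutoff date (YYYY-MM-DD). Compares the ISO 8601 date prefix as a
# string, which assumes UTC timestamps.
snapshots_before() {
  local cutoff="$1" id created
  while read -r id created; do
    if [[ "${created:0:10}" < "$cutoff" ]]; then
      echo "$id"
    fi
  done
}
```

`inventory_prod_lon` prints a count per resource type; anything non-zero, apart from the RDS snapshots we expect to keep, needs a look. `avn service list --project paas-cf-prod | prod_lon_only` should print nothing. When the snapshots are due for cleanup, and assuming the ones left behind are manual snapshots, `aws rds describe-db-snapshots --region eu-west-2 --snapshot-type manual --query 'DBSnapshots[].[DBSnapshotIdentifier,SnapshotCreateTime]' --output text | snapshots_before 2024-07-01` lists candidates (the date is only an example); review the list before removing each with `aws rds delete-db-snapshot --db-snapshot-identifier <id>`.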


## Post Decommissioning Clean Up

- [ ] Archive documentation as needed. This includes:
  - [ ] [team-manual](https://github.com/alphagov/paas-team-manual)
  - [ ] [product-pages](https://github.com/alphagov/paas-product-pages)
  - [ ] [paas-tech-docs](https://github.com/alphagov/paas-tech-docs)
- [ ] Archive the [paas-credentials](https://github.com/alphagov/paas-credentials) repository.
- [ ] Remove PaaS people from the [GDS PagerDuty config](https://github.com/alphagov/gds-pagerduty-config)