diff --git a/source/manual/alerts/aws-rds-storage.html.md b/source/manual/alerts/aws-rds-storage.html.md
deleted file mode 100644
index 50da6f9..0000000
--- a/source/manual/alerts/aws-rds-storage.html.md
+++ /dev/null
@@ -1,13 +0,0 @@
----
-owner_slack: "#govuk-2ndline-tech"
-title: AWS RDS Instance Storage Utilization
-parent: "/manual.html"
-layout: manual_layout
-section: Icinga alerts
----
-
-This alert relates to disk usage of our databases (RDS) in AWS being higher than we would expect. To check the current usage.
-
-- [Access the AWS web console][] and view the statistics.
-
-[Access the AWS web console]: https://eu-west-1.console.aws.amazon.com/rds/home?region=eu-west-1
diff --git a/source/manual/how-to-escalate-to-AWS-support.html.md b/source/manual/how-to-escalate-to-AWS-support.html.md
deleted file mode 100644
index 00f7317..0000000
--- a/source/manual/how-to-escalate-to-AWS-support.html.md
+++ /dev/null
@@ -1,17 +0,0 @@
----
-owner_slack: "#ott-core"
-title: How to raise a support ticket with AWS
-
-section: AWS
-layout: manual_layout
-type: learn
-parent: "/manual.html"
----
-
-Sign in to the AWS Management Console.
-
-`gds aws -l`
-
-Where `` maps to an AWS account and IAM role you have permissions to assume into. This should be the AWS account that the problem is associated with e.g. `govuk-production-poweruser`.
-
-Then follow steps 2-6 in the official AWS docs: [Creating a support case](https://docs.aws.amazon.com/awssupport/latest/user/case-management.html#creating-a-support-case)
diff --git a/source/manual/how-to-rollback-training-models.html.md b/source/manual/how-to-rollback-training-models.html.md
new file mode 100644
index 0000000..5978aa5
--- /dev/null
+++ b/source/manual/how-to-rollback-training-models.html.md
@@ -0,0 +1,60 @@
+---
+owner_slack: "#trade-tariff-infrastructure"
+title: Rolling back training models
+section: Deployment
+layout: manual_layout
+parent: "/manual.html"
+---
+
+This document describes how to roll back training models that have been deployed with Serverless to
+our AWS environments.
+
+## Rollback process
+
+1. Identify the model you want to roll back to. This could be the previous model or a specific version of the model.
+2. Update the [search-config.toml][search-config] file to point to the model you want to roll back to.
+3. Open a PR with the changes to the `search-config.toml` file.
+4. Once the PR is merged, the changes will be deployed to the `fpo-search` environment.
+
+### Identifying the model to roll back to
+
+Each model version can have several iterations, and the latest iteration is the one that is actively deployed.
+
+Inactive model versions (e.g. previous iterations, or those not in an active development branch) are stored in the model S3 bucket.
+
+#### Easiest solution
+
+If you know which change introduced the failure, you can check the commit history of the [search-config.toml][search-config] file to see which model version was in use at that time.
+
+The model version in use immediately before that change is usually the one you want.
+
+#### List recent iterations of models
+
+In the [FPO search lambda][fpo-search-lambda-repo] repository, run the following command to list the most recent iterations of the last n models:
+
+> Note: You will need to have the AWS CLI installed and configured with the correct permissions.
+
+```bash
+.circleci/bin/getversions
+```
+
+This produces output like the following:
+
+```bash
+1.5.1-6b3c782
+1.6.0-262a174
+1.7.0-4595f36
+```
+
+You can find the currently deployed model version by making a GET request to the healthcheck endpoint (this example queries the development environment):
+
+```bash
+curl --silent -X GET https://search.dev.trade-tariff.service.gov.uk/healthcheck -H 'Content-Type: application/json' | jq -r '.model_version'
+
+1.7.0-4595f36
+```
+
+Then review the release notes and benchmarks for these models to decide which one to roll back to.
+
+[search-config]: https://github.com/trade-tariff/trade-tariff-lambdas-fpo-search/blob/main/search-config.toml
+[fpo-search-lambda-repo]: https://github.com/trade-tariff/trade-tariff-lambdas-fpo-search