Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document backup strategy in a draft ADR #2246

Merged
merged 2 commits into from
Dec 19, 2024

Conversation

mikerkelly
Copy link
Contributor

@mikerkelly mikerkelly commented Dec 15, 2024

Partially fixes #2151.

This documents the current context and decisions after changes from #2151. I'm proposing that we keep this in draft until backup-related issues in the Onboarding OpenCodelists initiative are resolved or deferred. Then the outcomes can be combined, which will be easier to follow in future than multiple updates in a short period of time. E.g., #2225, #2128, #2153.

@Jongmassey
Copy link
Contributor

It's a bit meta but in other projects where we use ADRs we record the architectural decision to use ADRs as adr0000 eg, do we want to do this here, too?

@mikerkelly
Copy link
Contributor Author

we record the architectural decision to use ADRs as adr0000 eg, do we want to do this here, too?

It's a nice idea, but I don't like that it's not a good example of an ADR. It doesn't include the kind of detail a real one would. And a documentation choice is almost by definition not a decision about architecture but rather process.

I think I'll add a README in the directory and update DEVELOPERS.md to point at that.

@mikerkelly mikerkelly force-pushed the mikerkelly/docs/adr-backup-policy branch from 633aece to d05cb09 Compare December 17, 2024 11:19
@mikerkelly mikerkelly enabled auto-merge December 19, 2024 15:18
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the write-up of what a decision is and specifically how we are using ADRs here. I wonder if some of this could be made into guidance for the wider BI. That said, I think a README in the same directory as the ADRs is more likely to be read than something in the team manual.

## Consequences

The core database can be straightforwardly restored from the local nightly
backups, limiting core data loss to one day if the droplet file system is not lost.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be difficult to restore the core database from a droplet backup. As sqlite is a single file, I believe that if the droplet backup is taken at the same time as a process is writing to it, then we can't guarantee consistency of the data. It might be worth checking that and in practice, I think it's quite unlikely given the time the droplet backup is likely to be taken. However, that does make the locally stored backups more important (those could be restored from a droplet backup if needed)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sqlite .backup command the ADR suggests takes care of this, by taking a full db lock before doing the copy.

Additionally, it handles the case where there are multiple files related to the db, such as when WAL mode is enabled and there is an additional WAL file.

Copy link
Contributor Author

@mikerkelly mikerkelly Dec 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQLite, in any journal_mode other than memory or off, writes transactions to additional files before they are committed. The core OpenCodelists database uses wal mode.

The SQLIte backup API ensures that when completed "the backup database contains a consistent and up-to-date snapshot of the original." The nightly backup process described will use this approach. If writes are performed on the source database during the backup process they will slightly delay the process but be included in the backup database file.

If the DigitalOcean snapshot for the backup is taken while a transaction is being written the eventual droplet backup may contain a corrupt or inconsistent database. I don't know if the droplet backup process does or could do something to ensure this cannot happen, such as entering "maintenance mode" before taking the backup. If not, we might have to restore the latest daily backup after restoring the droplet to get the database in a usable state. And it might be prudent to do this in case of usable but inconsistent data.

Note that this would increase the maximum data loss to 7 days plus the lag between taking the last daily backup and taking the weekly snapshot. If nothing else, I suggest we should make that lag short.

I can look at this when I'm back in and do whatever's appropriate including updating the ADR.

Copy link
Contributor Author

@mikerkelly mikerkelly Dec 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems sensible to use wal in the core database as reads may sometimes be much more common than writes and it helps address concurrency concerns. I checked the mode by downloading and restoring a backup locally and querying pragma journal_mode; giving wal. However I'm a little concerned that I cannot see where this is configured in the repo. I've only looked for a couple of minutes. If it's not captured in code or in manual deployment instructions, it should be.
Cross-posted to #2251. #2268 should fix.

Copy link
Contributor

@lucyb lucyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for writing this up. I think it's likely to be extremely helpful. I've made some comments, but am happy to leave it to you to address them and merge it.

backups, limiting core data loss to one day if the droplet file system is not lost.

The coding system and mapping databases can only be restored from the droplet
backups or recreated from source. Their read-only permissions partially mitigate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that all of the coding system and mapping databases have read-only file permissions. I also don't think that we're creating a read-only database connection.

continuing weekly droplet backups.

## Consequences

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any consequences to compressing the core database backup? I assume we're doing it to save disk space?

@mikerkelly mikerkelly merged commit 0172a78 into main Dec 19, 2024
6 checks passed
@mikerkelly mikerkelly deleted the mikerkelly/docs/adr-backup-policy branch December 19, 2024 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use sqlite cli to backup core database
4 participants