SPIKE Investigate the use of Digital Ocean Block/Object Storage for the databases #2225
I would be very surprised if this was true. Block storage is essentially a networked file system, albeit one implemented very differently from NFS. Unless something has drastically changed in cloud land, I would expect it to be measurably slower — though possibly at an acceptable level, especially for reads, and especially if we tune SQLite settings like mmap and cache sizes to keep data in the filesystem page cache. So I would encourage performance testing to be done early (perhaps enabling SQLite OTel telemetry, if not already?), as it may provide a good early failure signal.

For example, there were recently discussions about how N+1 queries are not nearly as much of a problem with SQLite as they are with Postgres; that may be less true with the volume mounted on block storage.

FWIW, the model I've seen used with SQLite in the cloud is to keep the database on fast local storage and use backups or Litestream off-host for durability. Some other ideas:
Regarding system configuration, we may be able to create and manage Block Storage volumes with Ansible. However, we ended up not using Ansible to manage the DO assets, which is basically the creation of instances. Instead, we documented the use of the
Much of the following may be questions for the spike to answer. The issue reads to me as if we currently expect Block Storage to be a better fit than object storage. Is that right, and why?

Needing to restore from a droplet backup would be costly for the core DB, losing up to a week of user-generated content. Having nightly core backups on a separate file system would reduce that potential data loss, and I think the team agreed we want to achieve that.

I'm less certain about the value of backing up the coding system DBs. If we lost dokku3 entirely, would our approach to standing it back up be restoring from a droplet backup? Would that be faster than deploying from scratch? Each version DB is read-only, and new versions are added less than monthly, so only new DBs would be lost — and those would be easy to reproduce while the work and source are recent. Or are the envisaged coding system database backups mainly to guard against accidental deletion or data corruption on dokku3? In that case it might be more convenient to restore from a networked file system than to restore a whole droplet.

We could also consider local backups if dokku3 had more space. Might we ever want to upgrade to the next-sized droplet anyway for other reasons (e.g. general compute/network speed), in which case the space concern with a local backup would be less relevant? I haven't found which droplet plan we're currently on, but if dokku3 is a basic droplet, going from a 160GiB SSD to 320GiB would roughly double monthly costs from ~40GBP to ~80GBP, I think. Block Storage is ~8GBP for 100GiB, around 80% cheaper than that upgrade. Object storage is ~4GBP for 250GiB — the most economical option for pure storage.
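The rough cost comparison above can be sanity-checked with quick arithmetic. The figures below are the approximate GBP prices quoted in this comment, not current Digital Ocean list prices:

```python
# Approximate monthly prices quoted above (GBP); illustrative only.
droplet_160gib = 40.0       # assumed current basic droplet tier
droplet_320gib = 80.0       # next tier: storage doubles, cost doubles
block_100gib = 8.0          # DO Block Storage volume, 100 GiB
spaces_250gib = 4.0         # DO Spaces object storage base plan, 250 GiB

upgrade_cost = droplet_320gib - droplet_160gib  # +40 GBP for +160 GiB

# 100 GiB of Block Storage vs the +40 GBP droplet upgrade.
saving = 1 - block_100gib / upgrade_cost
print(f"Block Storage vs upgrade: {saving:.0%} cheaper")  # ~80%

# Per-GiB comparison across all three options.
print(f"Droplet upgrade: {upgrade_cost / 160:.3f} GBP/GiB")
print(f"Block Storage:   {block_100gib / 100:.3f} GBP/GiB")
print(f"Spaces:          {spaces_250gib / 250:.3f} GBP/GiB")
```

On these numbers Spaces is cheapest per GiB, but only Block Storage presents as an ordinary filesystem, which matters if we ever want to run live SQLite databases (not just backups) from it.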
Is the anticipated improvement being on a separate file system, hence safer (in some sense)? Are there other benefits? The costings above suggest that either block or object storage is quite cheap, so space becomes less of a concern. Some further questions:

- Could we take core DB backups more frequently than nightly, if sufficiently cheap and fast, to improve the recovery point frequency? (Older backups could then be condensed.) Or does backup need to stay outside "business hours" because it impacts performance/availability?
- Does the type of file system allow more de-duplication, as discussed in the blog post mentioned in #2151?
- Would that "kind of incremental" backup be much different in terms of speed? Do we know how much space it would save? (Maybe not pressing if we have more cheap space.)
I'd still like us to try standing up a restored droplet from a backup and to think through how we would restore production from it — at some point, though maybe now is not the time. The resulting documentation could help rapid response during an incident. Even if that backup is the "backup backup", it would be good to be sure we can use it if we need it.
Thanks both of you for the excellent points. I suggest we restrict this spike to moving over just the backups of the core application database for now, with a view to moving over the backups of the coding system databases (when we have them), and potentially the "live" databases, in future. The core app database is backed up by a cron job that we could plug into Sentry to monitor any performance changes, which should answer some of the concerns.
Thanks, Lucy, that makes sense. It'll be easier to do the spike with a limited scope, and easier to move coding system backups to networked storage if the storage is already available. The choice of networked storage should take into account that it may later be extended to coding system backups.
Another option discussed elsewhere is moving to a managed server-based database, such as Digital Ocean's managed Postgres. Either approach would avoid losing live data if the droplet were lost, which seems more likely than cloud storage going down. It's unclear whether our concurrency needs are high enough to justify switching from SQLite.
We think that using Digital Ocean Block Storage will be a cost-effective way of increasing storage capacity, while also providing us with faster database access and improved backups.
In particular, we've identified the following problems:
There are potentially other solutions to this, such as migrating to a managed Postgres service or storing our backups on Digital Ocean Spaces object storage (i.e. their S3 equivalent). So this spike should demonstrate whether or not Block Storage is the better solution.
Questions to answer