Improve concurrency requirements to avoid "database is locked" errors #2251
Comments
There's a SQLite detail here that might be relevant. tl;dr: it turns out everyone's been holding SQLite wrong for years, even Simon Willison. Switching the SQL connection to IMMEDIATE transaction mode may be the core of the solution to "database is locked" errors. Essentially, without it you will get locked errors that ignore the timeout/retry mechanism altogether.
We did this with Airlock: https://github.com/opensafely-core/airlock/blob/main/airlock/settings.py#L168
Relevant Slack discussion from 8 months ago, with more detail: https://bennettoxford.slack.com/archives/C01T2HACV3K/p1711836958649859
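To illustrate the point about the timeout being ignored, here is a minimal sketch using only Python's standard `sqlite3` module (not Django, and not the Airlock code). The table name `t`, the file name and the 5 second `TIMEOUT` are made up for the demo. It assumes the default rollback-journal mode. The first function shows a deferred transaction that reads and then tries to write while another connection holds the write lock: it fails straight away rather than waiting. The second shows `BEGIN IMMEDIATE` waiting for a busy writer and then succeeding.

```python
import os
import sqlite3
import tempfile
import threading
import time

TIMEOUT = 5.0  # seconds; hypothetical value standing in for the configured timeout


def connect(path):
    # isolation_level=None: the demo issues BEGIN/COMMIT itself.
    return sqlite3.connect(path, timeout=TIMEOUT, isolation_level=None)


def deferred_upgrade_fails_fast(path):
    """Return (error message, seconds waited) for a read-then-write transaction."""
    reader = connect(path)
    writer = connect(path)
    try:
        reader.execute("BEGIN")  # deferred: no lock taken yet
        reader.execute("SELECT count(*) FROM t").fetchall()  # takes a read lock
        writer.execute("BEGIN IMMEDIATE")  # another connection takes the write lock
        start = time.monotonic()
        try:
            reader.execute("INSERT INTO t VALUES (1)")  # read -> write upgrade
            message = None
        except sqlite3.OperationalError as exc:
            message = str(exc)
        return message, time.monotonic() - start
    finally:
        writer.close()  # closing rolls back the open transaction
        reader.close()


def immediate_waits_for_lock(path, hold_for=0.3):
    """Return True if BEGIN IMMEDIATE waits for a busy writer, then succeeds."""
    lock_held = threading.Event()

    def busy_writer():
        conn = connect(path)
        conn.execute("BEGIN IMMEDIATE")
        conn.execute("INSERT INTO t VALUES (2)")
        lock_held.set()
        time.sleep(hold_for)  # hold the write lock for a while
        conn.execute("COMMIT")
        conn.close()

    thread = threading.Thread(target=busy_writer)
    thread.start()
    lock_held.wait(timeout=TIMEOUT)
    conn = connect(path)
    try:
        conn.execute("BEGIN IMMEDIATE")  # retries until the writer commits
        conn.execute("INSERT INTO t VALUES (3)")
        conn.execute("COMMIT")
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        conn.close()
        thread.join()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite3")
        setup = connect(path)
        setup.execute("CREATE TABLE t (x INTEGER)")
        setup.close()
        message, waited = deferred_upgrade_fails_fast(path)
        succeeded = immediate_waits_for_lock(path)
    return message, waited, succeeded


if __name__ == "__main__":
    print(demo())
```

The design point is that with IMMEDIATE mode the write lock is requested at the start of the transaction, where SQLite can safely wait and retry, instead of part-way through, where it cannot.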
Ah ha, this is the extra feature that upgrading to Django 5.1 will give us. Thanks for remembering that Simon, and for discovering it in the first place. From Simon's comment in that thread:
Thanks Simon, that's extremely interesting and useful. That fills in a lot of useful detail on the second bulleted item. I've created #2261 for this. As Lucy says, it's partially blocked by #2115. It seems sensible to use
Ah yes, I'm guessing this was just manually enabled at some point and then it persists as a database setting. The right way to do this is with the new
If you're blocked on upgrading Django for whatever reason, we made a backport of these features for Airlock. We've since removed this, having upgraded to 5.1, but the PR which originally added it is here:
Happy to answer questions on this if that's helpful.
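If the setting in question is the journal mode (an assumption; the comment above does not name it), the persistence is easy to check. The sketch below uses only the standard `sqlite3` module with a throwaway file and a made-up table `t`: it enables WAL on one connection, closes it, and reads the mode back from a fresh connection.

```python
import os
import sqlite3
import tempfile


def journal_mode(path):
    """Return the journal mode reported by a fresh connection to `path`."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("PRAGMA journal_mode").fetchone()[0]
    finally:
        conn.close()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite3")
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.commit()
        before = journal_mode(path)  # mode before anything is changed
        conn.execute("PRAGMA journal_mode=WAL")  # stored in the database file
        conn.close()
        after = journal_mode(path)  # read back from a brand new connection
    return before, after


if __name__ == "__main__":
    print(demo())
```

Running `journal_mode()` against a copy of the real database file would show whether WAL is already enabled there.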
As a point of comparison for later monitoring, there were about 175 errors.
#2252 has been done as it was cheap, and it's possible it can help in some cases (although possibly not in the cases where IMMEDIATE mode would help). #2268 has been raised for triage for the SQLite config and I hope we can do that soon.
We might reconsider the timeouts across a request's lifecycle if this issue is resolved. Improving the speed and load of some views (e.g. the builder) might also be relevant to that discussion.
Why are we doing this?
There are regular Sentry alerts for several Issues where requests fail with "OperationalError: database is locked" errors. The root cause is that our level of concurrency is too high for SQLite to handle reliably as currently configured. The database is locked while code writes to it. Requests making updates can therefore fail after a long timeout, which is poor UX and could lead to those updates being lost.
We know from Honeycomb that several of our views can respond slowly, taking even tens of seconds. Possibly they are holding a database lock for an extended period. Interacting with the builder to build a codelist or to search for codes can both write to the database, so updates that need the lock occur frequently. Possibly a queue of updates builds up, some of them time out before being processed, and the request fails.
How will we know when it's done?
There are fewer instances of "OperationalError: database is locked" errors, as monitored through Sentry.
This Issue might have a number of sub-issues.
What are we doing?
According to https://docs.djangoproject.com/en/4.2/ref/databases/#database-is-locked-errors:
According to the SQLite docs:
It's unclear if our concurrency needs are fundamentally so high that we should switch to another database backend or if we can resolve the issue by changing configuration or rewriting our code.
Possible actions:
- The timeout in opencodelists/settings.py. We could try increasing it further to 60 or 90 seconds and monitoring how this improves the situation.
- IMMEDIATE transaction mode and PRAGMAs #2268
- Explore other configuration options that can help, such as write-ahead log mode if we don't already use it.
- We know the builder sends a request to the server every time a checkbox is clicked, which may be multiple times a second; it should probably batch such requests. We know we want to improve or replace the builder in general.
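As a rough sketch of how the first three actions could look together in a Django 5.1+ settings file: the `transaction_mode` and `init_command` options for the SQLite backend were added in Django 5.1. Everything else here is an assumption for illustration, not the project's actual configuration: the `BASE_DIR` placeholder, the database file name, the 60 second timeout (one of the values suggested above) and the choice of WAL as the only PRAGMA.

```python
from pathlib import Path

# Placeholder; a real settings module defines its own base directory.
BASE_DIR = Path.cwd()

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": BASE_DIR / "db.sqlite3",  # hypothetical file name
        "OPTIONS": {
            # How long (in seconds) a connection waits for a lock before
            # raising "database is locked".
            "timeout": 60,
            # Django 5.1+: start every transaction with BEGIN IMMEDIATE so
            # the write lock is requested up front, where waiting works.
            "transaction_mode": "IMMEDIATE",
            # Django 5.1+: SQL run on each new connection. Which PRAGMAs
            # to set is an open question in this issue; WAL is one example.
            "init_command": "PRAGMA journal_mode=WAL;",
        },
    }
}
```

On Django versions before 5.1 these two options are not available, which is why the upgrade (or a backport like the one mentioned in the comments) is a prerequisite.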