-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Large project.yaml
can lead to timeouts on "run jobs" page
#4617
Comments
I can see some initial profiling work was done the last time the performance of this page was raised. But that's no longer representative of the current situation. I've done some basic profiling locally of loading the Total time to return HTML was 48 seconds on my laptop. I suspect that on the server, which is not particularly powerful and has more to do than satisfy a single request, this takes even longer leading to the timeout errors. The time breaks down as follows (so roughly 27s parsing the YAML, 6s validating and 15s rendering HTML): The slow YAML parsing time is a consequence of doing the parsing in pure Python, which we had good reason for at the time. Switching the YAML library back from The slow template rendering time is a consequence of having many slippers components in the hot loop and the high overhead of calling in to each component. It may be possible to inline some of the code here, or restructure the page in some way to reduce this. The previous concerns about the calls out to the Github or OpenCodelists APIs don't seem to be particularly relevant to the overall runtime. Footnotes
|
Yet another place were our current method for distributing opensafely cli puts limitations on us (we need a pure python solution). But if it also takes 20+s to parse parsing a 6k job project.yaml on every invocation of |
There may be some quick (if slightly dirty) wins here. One would be for the It's obviously not ideal to be using different parsers in different parts of the system. But there's nothing particularly exotic about the YAML used in pipeline files, so I can't see it would cause problems in practice. Of course this doesn't help with running things locally, nor with the problem of submitting lots of jobs in one go. But it would (hopefully) buy us enough of a speed up that it becomes at least physically possible to run jobs from a very large project file, which seems to me the major blocker at the moment. |
Thanks for this excellent analysis, Dave. The proposed fix for the pipeline library seems like a pragmatic approach that will have some significant benefits. I don't think there's anything we should do on Job Server in the very short term, so I'm going to remove this ticket from our board for now, but keep it open. |
Removing tech-support label as I don't think we need actively follow this up or get back further to the reporter (the tech-support process uses this label for tracking such tickets). We have captured all information needed to resolve the issue and the reporter isn't expecting further responses. |
A user reported getting the big blue Dokku error page:
when trying to access this URL:
http://jobs.opensafely.org/comparing-disparities-in-rsv-influenza-and-covid-19/disparities-comparison-rsv-flu-c19/run-jobs/
The cause seems to be this very large (programatically generated)
project.yaml
file with 6,874 actions in it:https://github.com/opensafely/disparities-comparison/blob/6cba2f6f1609f26dc4ee5698f51afcc7e9be3840/project.yaml
In the short term, the user has been advised to reduce the number of actions to the minimum needed to get started with the project.
It's possible that we decide we just don't support this many actions (the user expressed surprise at the number, so possibly they don't actually need all of them in any case). But if so it would be good to fail more gracefully and give a more explanatory error.
Related issue:
Original Slack thread:
https://bennettoxford.slack.com/archives/C01D7H9LYKB/p1726760871032629
Subsequent discussion:
https://bennettoxford.slack.com/archives/C069SADHP1Q/p1726761557795209
Sentry error:
https://ebm-datalab.sentry.io/issues/5874686309/
The text was updated successfully, but these errors were encountered: