More specific stdio parsing #63
Comments
Would the goal be to report the OOM to the job executor for re-running? Or is there a better way to report OOM to the user based on the job executor?
Yes. Rerunning can even be triggered automatically if the Galaxy admin has configured a job resubmission schema.
I don't think so. The user will see the message if no resubmission is configured. Then the user has to ask the admin for more memory for the corresponding tool.
That's pretty cool! I don't think we have a good way to represent this at the moment. Since QIIME 2 actions are generally run in-process, there's also not a good way to even handle SIGKILL. That means a mapping of exit codes wouldn't have any immediate use to us outside of Galaxy (and for trappable signals and normal exit codes, it's entirely in the purview of the plugin to handle and respond to). @Oddant1 do you know if Parsl has any mechanism to care about these for tasks? I'm not sure what we would do in the event we saw this anyhow. It's also important to us architecturally that plugins not know of the interface running them, so we'd need some unified reason to represent this exit code mapping (i.e. there won't be anything like a "Galaxy metadata" section where we could stick this information). I am going to tentatively close this as out of scope for us at the moment.
@ebolyen, Parsl has some mechanism for keeping track of the status of its tasks, and it also has a built-in retry system, but I think it's a bit more naive than Galaxy's (I believe it just tries the exact same thing again and hopes whatever went wrong last time doesn't this time).
We got a report that dada2 crashed on some instance with the following in the stderr
The forums seem to suggest that this may indicate an out of memory (OOM) condition, but I did not check.
Depending on the job runner that is used, this may be detected automatically by Galaxy, e.g. if SLURM is used.
But the Galaxy tool could also be annotated to help with detecting such cases: https://docs.galaxyproject.org/en/master/dev/schema.html#tool-stdio
I was wondering if we can accommodate this by maintaining (manually curated) macro(s) that we can include in the autogenerated tools.
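To make the idea a bit more concrete, here is a rough sketch of how a tool generator might emit such a `<stdio>` block. It assumes the `exit_code` and `regex` elements and the `fatal_oom` level described in the linked Galaxy schema docs. The function name `build_oom_stdio`, the placeholder regex, and the choice of exit code 137 (128 + 9, what a shell reports after SIGKILL, which is often but not always the OOM killer) are my own assumptions, not anything q2galaxy does today. The real stderr message would have to be filled in from the curated list.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder: substitute the curated stderr message(s) here.
DEFAULT_PATTERNS = ("REPLACE_WITH_CURATED_OOM_MESSAGE",)

# Assumption: 137 = 128 + 9, the status a shell reports for a SIGKILLed process.
DEFAULT_EXIT_CODES = ("137",)


def build_oom_stdio(patterns=DEFAULT_PATTERNS, exit_codes=DEFAULT_EXIT_CODES):
    """Build a <stdio> element that flags likely OOM failures as fatal_oom."""
    stdio = ET.Element("stdio")
    for code in exit_codes:
        ET.SubElement(
            stdio, "exit_code",
            range=code,
            level="fatal_oom",
            description="Process was killed, possibly out of memory",
        )
    for pattern in patterns:
        ET.SubElement(
            stdio, "regex",
            match=pattern,
            source="stderr",
            level="fatal_oom",
            description="Out of memory message found in stderr",
        )
    return stdio


if __name__ == "__main__":
    # Print the fragment so it can be pasted into a macro or tool XML.
    print(ET.tostring(build_oom_stdio(), encoding="unicode"))
```

The resulting element could be wrapped in a macro and expanded in each generated tool, or written directly into the tool XML. Either way the curated patterns live in one place.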