More specific stdio parsing #63

Closed
bernt-matthias opened this issue Apr 10, 2024 · 4 comments
Comments

@bernt-matthias
Contributor

We got a report that dada2 crashed on some instance with the following in the stderr:

raise Exception("An error was encountered while running DADA2"
Exception: An error was encountered while running DADA2 in R (return code -9), please inspect stdout and stderr to learn more.

The forums seem to suggest that this may indicate an out-of-memory (OOM) kill (a return code of -9 means the process was killed by SIGKILL); I did not check.

Depending on the job runner that is used, this may be detected automatically by Galaxy, e.g. if SLURM is used.

But the Galaxy tool could also be annotated to help with detecting such cases: https://docs.galaxyproject.org/en/master/dev/schema.html#tool-stdio

I was wondering if we could accommodate this by maintaining (manually curated) macro(s) that we include in the autogenerated tools.
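
For illustration, a minimal sketch of what such a curated macro could look like, assuming the stderr line above is a reliable OOM signature (the regex text, macro name, and memory description are assumptions; `fatal_oom` is the level the linked stdio schema uses to flag memory-related failures):

```xml
<macros>
    <xml name="dada2_stdio">
        <stdio>
            <!-- Assumed signature: map the reported "return code -9" stderr line
                 to an out-of-memory error so Galaxy can treat it accordingly. -->
            <regex match="return code -9"
                   source="stderr"
                   level="fatal_oom"
                   description="DADA2 was killed, likely out of memory (OOM)" />
            <!-- Any other non-zero exit code remains a plain fatal error. -->
            <exit_code range="1:" level="fatal" />
        </stdio>
    </xml>
</macros>
```

An autogenerated tool could then pull this in via `<expand macro="dada2_stdio"/>` where its `<stdio>` block would normally go.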

@ebolyen
Member

ebolyen commented May 30, 2024

Would the goal be to report the OOM to the job executor for re-running? Or is there a better way to report OOM to the user based on the job executor?

@bernt-matthias
Contributor Author

> Would the goal be to report the OOM to the job executor for re-running?

Yes. Rerunning can even be triggered automatically if the Galaxy admin has configured job resubmission.

> Or is there a better way to report OOM to the user based on the job executor?

I don't think so. The user will see the message if no resubmission is configured; then the user has to ask the admin for more memory for the corresponding tool.
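
For context, a trimmed sketch of what such a resubmission setup can look like in a Galaxy `job_conf.xml` (destination ids, memory sizes, and runner details are hypothetical, and runner plugin definitions are omitted):

```xml
<job_conf>
    <destinations default="slurm_8g">
        <destination id="slurm_8g" runner="slurm">
            <param id="nativeSpecification">--mem=8G</param>
            <!-- If Galaxy detects a memory-related failure (e.g. a fatal_oom
                 stdio match), rerun the job on a larger destination. -->
            <resubmit condition="memory_limit_reached" destination="slurm_64g" />
        </destination>
        <destination id="slurm_64g" runner="slurm">
            <param id="nativeSpecification">--mem=64G</param>
        </destination>
    </destinations>
</job_conf>
```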

@ebolyen
Member

ebolyen commented Jun 10, 2024

That's pretty cool! I don't think we have a good way to represent this at the moment.

Since QIIME 2 actions are generally run in-process, there's also not a good way to even handle SIGKILL, which means that a mapping of exit codes wouldn't have any immediate use to us outside of Galaxy (for trappable signals and normal exit codes, it's entirely within the purview of the plugin to handle and respond to them).

@Oddant1 do you know if Parsl has any mechanism to care about these for tasks? I'm not sure what we would do in the event we saw this anyhow.

It's also important to us architecturally that plugins not know about the interface running them, so we'd need some unified reason to represent this exit code mapping (i.e. there won't be anything like a "Galaxy metadata" section where we could stick this information).

I am going to tentatively close this as out of scope for us at the moment.

@ebolyen closed this as not planned on Jun 10, 2024
@ebolyen removed their assignment on Jun 10, 2024
@Oddant1
Member

Oddant1 commented Jun 10, 2024

@ebolyen, Parsl has some mechanism for keeping track of the status of its tasks, and it also has a built-in retry system, but I think it's a bit more naive than Galaxy's (I believe it just tries the exact same thing again and hopes whatever went wrong last time doesn't happen this time).
