HPCC-32839 Fix Thor aborting race
When a Thor workunit was aborted in k8s, there was a race
condition which could cause the job to continue and fail with
an unrelated spurious/confusing error.

Signed-off-by: Jake Smith <[email protected]>
jakesmith committed Oct 22, 2024
1 parent 1e4ca26 commit 316444e
Showing 1 changed file with 15 additions and 3 deletions.
18 changes: 15 additions & 3 deletions common/workunit/workunit.cpp
@@ -3687,7 +3687,8 @@ EnumMapping priorityClasses[] = {
 
 const char * getWorkunitStateStr(WUState state)
 {
-    dbgassertex(state < WUStateSize);
+    if (state >= WUStateSize)
+        return "unknown workunit state";
     return states[state].str; // MORE - should be using getEnumText, or need to take steps to ensure values remain contiguous and in order.
 }
 
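The change to getWorkunitStateStr swaps a debug-only assertion for a runtime bounds check, so an out-of-range state yields a fixed string instead of indexing past the table. A minimal, self-contained sketch of that pattern follows; DemoState, demoStateNames and demoStateStr are hypothetical stand-ins and are not part of the platform code.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the real WUState enum; only a few values are shown.
enum DemoState { DemoStateUnknown, DemoStateRunning, DemoStateAborting, DemoStateSize };

// Name table indexed by enum value; it must stay contiguous and in enum order.
static const char * const demoStateNames[] = { "unknown", "running", "aborting" };

// Returns a printable name, falling back to a fixed string for out-of-range values
// rather than asserting (which only fires in debug builds) or reading past the table.
const char * demoStateStr(unsigned state)
{
    if (state >= DemoStateSize)
        return "unknown workunit state";
    return demoStateNames[state];
}
```

With this shape, a caller formatting an error message can pass any state value it read, including one it does not recognise, without a further check.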
@@ -14496,11 +14497,22 @@ void executeThorGraph(const char * graphName, IConstWorkUnit &workunit, const IP
         }
     }
 
+    // NB: check for expected success state (WUStateWait). If any other state, abort.
     {
         Owned<IWorkUnit> w = &workunit.lock();
         WUState state = w->getState();
-        if (WUStateFailed == state)
-            throw makeStringException(0, "Workunit failed");
+        if (WUStateWait != state) // expected state from successful Thor run from above
+        {
+            switch (state)
+            {
+                case WUStateAborting:
+                    throw new WorkflowException(0, "Workunit abort requested", 0, WorkflowException::ABORT, MSGAUD_user);
+                case WUStateFailed:
+                    throw makeStringException(0, "Workunit failed");
+                default:
+                    throw makeStringExceptionV(0, "Workunit failed. Unexpected state: %s", getWorkunitStateStr(state));
+            }
+        }
         w->setState(WUStateRunning);
     }
 #else
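
The second hunk inverts the check: rather than testing only for the failed state, it treats the one expected success state as the pass case and raises a distinct error for an abort request, a failure, or anything else. The sketch below illustrates that control flow in standard C++ under stated assumptions: SketchState, AbortRequested, checkPostRunState and classify are hypothetical names, and std::runtime_error stands in for the platform's own exception types.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins; the real code uses WUState and platform exception types.
enum class SketchState { Wait, Aborting, Failed, Blocked };

// Distinct type so callers can tell a user-requested abort from a genuine failure.
struct AbortRequested : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Throws unless the state is the single expected success value (Wait).
void checkPostRunState(SketchState state)
{
    if (state == SketchState::Wait)
        return;
    switch (state)
    {
        case SketchState::Aborting:
            throw AbortRequested("Workunit abort requested");
        case SketchState::Failed:
            throw std::runtime_error("Workunit failed");
        default:
            throw std::runtime_error("Workunit failed. Unexpected state: "
                                     + std::to_string(static_cast<int>(state)));
    }
}

// Demonstration helper: maps the outcome of the check to a short label.
std::string classify(SketchState state)
{
    try { checkPostRunState(state); return "ok"; }
    catch (const AbortRequested &) { return "abort"; }
    catch (const std::runtime_error &) { return "error"; }
}
```

Checking against the expected state means a state that nobody anticipated stops the job at this point with a message naming it, instead of letting execution continue and fail later for an apparently unrelated reason.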