-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Event watch cancel #196
Event watch cancel #196
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! A couple suggestions inline.
@@ -171,11 +172,14 @@ static int dws_environment_init (flux_plugin_t *p, | |||
} | |||
if (read_future (shell, fut) < 0) { | |||
shell_log_error ("Error reading DW environment from eventlog"); | |||
flux_job_event_watch_cancel (fut); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think any error terminates a streaming RPC, so the flux_job_event_watch_cancel()
is not necessary here.
See RFC 6:
The response stream SHALL consist of zero or more non-error responses, terminated by exactly one error response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The read_future
function could return < 0
for some other reason though, and not because the response stream gave an error response? But I suppose that in any case if the plugin returns -1
the shell will fail and disconnect...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good point, up to you then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's not doing any harm I'll leave it. Thanks for the comments! Setting MWP.
src/shell/plugins/dws_environment.c
Outdated
if ((rc = flux_job_event_watch_cancel (fut)) < 0) | ||
shell_log_error ("flux_job_event_watch_cancel"); | ||
flux_future_destroy (fut); | ||
return 0; | ||
return rc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though an error here is highly unlikely, I wouldn't have failure of flux_job_event_watch_cancel()
cause the function to return failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, fixed and force-pushed.
4d3484b
to
367f910
Compare
Problem: as described in issue flux-framework#195, the dws_environment shell plugin should call flux_job_event_watch_cancel after finding the dws_environment event, otherwise the watch will stay active in the job-info service, which will continue to send responses as new events appear in the eventlog until the shell exits and a disconnect message is sent. Call flux_job_event_watch_cancel. Fixes flux-framework#195.
Problem: the dws_environment shell plugin's errors are less clear than they could be, and do not distinguish the case where the 'dws_environment' event is not posted from the case where the event_watch is simply slow. Add a check for the 'start' event and error out if it is found before the 'dws_environment' event.
Problem: the dws_environment shell plugin now checks that the 'dws_environment' event posts before the 'start' event. However, this breaks some tests which allowed the 'dws_environment' event to post afterwards. Change tests to make sure the 'dws_environment' event posts before 'start'.
Problem: two coral2_dws.py log messages do not format all their arguments. Fix them.
367f910
to
aef7b00
Compare
Problem: as described in issue #195, the dws_environment shell
plugin should call flux_job_event_watch_cancel after finding the
dws_environment event, otherwise the watch will stay active in
the job-info service, which will continue to send responses as new
events appear in the eventlog until the shell exits and a
disconnect message is sent.
Call flux_job_event_watch_cancel.
Fixes #195.