Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Event watch cancel #196

Merged
merged 4 commits into from
Aug 9, 2024
Merged

Conversation

jameshcorbett
Copy link
Member

Problem: as described in issue #195, the dws_environment shell
plugin should call flux_job_event_watch_cancel after finding the
dws_environment event, otherwise the watch will stay active in
the job-info service, which will continue to send responses as new
events appear in the eventlog until the shell exits and a
disconnect message is sent.

Call flux_job_event_watch_cancel.

Fixes #195.

Copy link
Contributor

@grondo grondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! A couple suggestions inline.

@@ -171,11 +172,14 @@ static int dws_environment_init (flux_plugin_t *p,
}
if (read_future (shell, fut) < 0) {
shell_log_error ("Error reading DW environment from eventlog");
flux_job_event_watch_cancel (fut);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think any error terminates a streaming RPC, so the flux_job_event_watch_cancel() is not necessary here.

See RFC 6:

The response stream SHALL consist of zero or more non-error responses, terminated by exactly one error response.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The read_future function could return < 0 for some other reason though, and not because the response stream gave an error response? But I suppose that in any case if the plugin returns -1 the shell will fail and disconnect...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point, up to you then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's not doing any harm I'll leave it. Thanks for the comments! Setting MWP.

Comment on lines 179 to 182
if ((rc = flux_job_event_watch_cancel (fut)) < 0)
shell_log_error ("flux_job_event_watch_cancel");
flux_future_destroy (fut);
return 0;
return rc;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though an error here is highly unlikely, I wouldn't have failure of flux_job_event_watch_cancel() cause the function to return failure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, fixed and force-pushed.

Problem: as described in issue flux-framework#195, the dws_environment shell
plugin should call flux_job_event_watch_cancel after finding the
dws_environment event, otherwise the watch will stay active in
the job-info service, which will continue to send responses as new
events appear in the eventlog until the shell exits and a
disconnect message is sent.

Call flux_job_event_watch_cancel.

Fixes flux-framework#195.
Problem: the dws_environment shell plugin's errors are less clear
than they could be, and do not distinguish the case where
the 'dws_environment' event is not posted from the case where the
event_watch is simply slow.

Add a check for the 'start' event and error out if it is found
before the 'dws_environment' event.
Problem: the dws_environment shell plugin now checks that the
'dws_environment' event posts before the 'start' event. However,
this breaks some tests which allowed the 'dws_environment' event
to post afterwards.

Change tests to make sure the 'dws_environment' event posts
before 'start'.
Problem: two coral2_dws.py log messages do not format all their
arguments.

Fix them.
@mergify mergify bot merged commit 05a8c62 into flux-framework:master Aug 9, 2024
8 checks passed
@jameshcorbett jameshcorbett deleted the event-watch-cancel branch August 10, 2024 00:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dws_environment shell plugin should cancel eventlog watch
2 participants