Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add timeout for node execution #66

Merged
merged 6 commits into from
Oct 3, 2023
Merged

Add timeout for node execution #66

merged 6 commits into from
Oct 3, 2023

Conversation

dbwiddis
Copy link
Member

@dbwiddis dbwiddis commented Sep 30, 2023

Description

Adds a timeout setting for each node's processing of the Workflow step's execute() method in the ProcessNode.

Some steps should complete quickly (e.g., create index) while others may take more time to complete (upload/deploy model).

Presently implementing that in the user template (See #45 where I proposed this and nobody objected), with a default to 10s if omitted.

  • It is not required to be entered in the template. It's an option.
  • This location is temporary, it can be moved anywhere else with a simple code change. See [FEATURE] Define required input and provided output keys #67 for one proposal where to move it.
  • Open to any other ideas, but...
  • I want to keep functionality similar to existing REST APIs where users can add a "timeout" param. We have multiple APIs and need to enable multiple timeouts that the user can change.
  • Selecting a timeout of 0 means no timeout.

I considered a global timeout, but decided against it as the "sum of node timeouts" effectively enforces that. I also considered a timeout by workflow type, tied to the default timeouts for their implemented APIs.

Other changes needed along the way:

  • Switched from executor to thread pool to pass around from create components and used for scheduled execution
  • Got rid of test thread leaks by using thread pool properly
  • Added test for plugin class
  • Got rid of singleton step factory and process sorter classes since they didn't play nice with tests (a singleton is still injected when we eventually get to transport actions)
  • Added missing javadocs creating warnings for JDK18+

Issues Resolved

Fixes #42 properly
Fixes #45

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@codecov
Copy link

codecov bot commented Sep 30, 2023

Codecov Report

Merging #66 (869cb43) into main (0ed717d) will increase coverage by 4.74%.
The diff coverage is 87.80%.

@@             Coverage Diff              @@
##               main      #66      +/-   ##
============================================
+ Coverage     86.93%   91.68%   +4.74%     
- Complexity      163      167       +4     
============================================
  Files            13       13              
  Lines           574      565       -9     
  Branches         75       78       +3     
============================================
+ Hits            499      518      +19     
+ Misses           52       24      -28     
  Partials         23       23              
Files Coverage Δ
.../opensearch/flowframework/FlowFrameworkPlugin.java 100.00% <100.00%> (+100.00%) ⬆️
...g/opensearch/flowframework/model/WorkflowNode.java 90.54% <ø> (ø)
...ch/flowframework/workflow/WorkflowStepFactory.java 64.70% <100.00%> (+8.70%) ⬆️
.../flowframework/workflow/WorkflowProcessSorter.java 97.10% <80.00%> (+4.24%) ⬆️
...opensearch/flowframework/workflow/ProcessNode.java 93.47% <88.46%> (+34.78%) ⬆️

Copy link
Member

@owaiskazi19 owaiskazi19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall

Copy link
Member

@owaiskazi19 owaiskazi19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am aware that we are trying to map timeouts similar to REST handler timeouts for nodes but we should think about the right place for them rather taking it as an input from the user. For now, this change looks good.

Copy link
Member

@joshpalis joshpalis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Dan for the changes. Lgtm, though I'm curious if the sum of all the node timeouts should be the timeout value of the provision API. I think it could be an option, but perhaps it might be better to have the workflow execution run after the workflow_id response is sent back to the user.

Signed-off-by: Daniel Widdis <[email protected]>
@dbwiddis dbwiddis merged commit 28326bd into opensearch-project:main Oct 3, 2023
10 checks passed
@dbwiddis dbwiddis deleted the timeout branch October 3, 2023 21:38
opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 3, 2023
* Add timeout for node execution

Signed-off-by: Daniel Widdis <[email protected]>

* Properly implement delays using OpenSearch ThreadPool

Signed-off-by: Daniel Widdis <[email protected]>

* Add coverage for plugin class, make test threshold dynamic

Signed-off-by: Daniel Widdis <[email protected]>

* Tests don't like singletons

Signed-off-by: Daniel Widdis <[email protected]>

* More thorough ProcessNode testing

Signed-off-by: Daniel Widdis <[email protected]>

* Util method for timeout parsing

Signed-off-by: Daniel Widdis <[email protected]>

---------

Signed-off-by: Daniel Widdis <[email protected]>
(cherry picked from commit 28326bd)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@dbwiddis
Copy link
Member Author

dbwiddis commented Oct 3, 2023

I am aware that we are trying to map timeouts similar to REST handler timeouts for nodes but we should think about the right place for them rather taking it as an input from the user. For now, this change looks good.

I'll move them to the step JSON definition file described in #67

dbwiddis pushed a commit that referenced this pull request Oct 4, 2023
Add timeout for node execution (#66)

* Add timeout for node execution



* Properly implement delays using OpenSearch ThreadPool



* Add coverage for plugin class, make test threshold dynamic



* Tests don't like singletons



* More thorough ProcessNode testing



* Util method for timeout parsing



---------


(cherry picked from commit 28326bd)

Signed-off-by: Daniel Widdis <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x backport PRs to 2.x branch
Projects
None yet
3 participants