-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SRVKS-1171] [RELEASE-1.12] Graceful shutdown #130
[SRVKS-1171] [RELEASE-1.12] Graceful shutdown #130
Conversation
skonto
commented
Apr 29, 2024
- Backport of Gracefully drain connections when stopping the gateway knative-extensions/net-kourier#1203
…sions#1203) * Fix gateway's preStop hook: curl does not exist (anymore?) in envoy image * Before stopping the gateway, wait until requests are finished on all public listeners (and exit anyway if it exceeds terminationGracePeriodSeconds) * Drain listeners with appropriate endpoint * Simpler drain + sleep * Remove PARENT_SHUTDOWN_TIME_SECONDS and terminationGracePeriodSeconds * Use a perl script (no need to open the admin HTTP interface!) * Use bash instead of perl in preStop hook * Review @skonto comments: use socket address for admin cluster * [WIP] add graceful shutdown test and tweak CI to just run that test * [WIP] Fix gracefulshutdown_test.go * [WIP] try to fix race condition and lint * [WIP] use initialTimeout + debug * [WIP] fix gracefulshutdown_test.go logic * [WIP] refacto and add some comments to clarify * [WIP] fix lint * [WIP] reintroduce kind-e2e-upgrade.yaml * [WIP] add test case when request takes a little longer than the drain time * [WIP] fix compilation issue * [WIP] FIx compilation issue (again) * [WIP] hopefully fix data race * [WIP] refacto and hopefully fix race condition (use sync.Map) * [WIP] fix compilation issue * [WIP] Handle EOF * [WIP] check gateway pod has been removed + manual debugging * [WIP] debugging * [WIP] more debugging * [WIP] more debugging * [WIP] increase livenessProbe failure threshold as I'm not sure it should return EOF * [WIP] remove debugging related stuff * Revert all unnecessary changes made for testing * Revert unnecessary change (livenessProbe) * Scale to 1 replica * Typo * Run gracefulshutdown test first (speed up feedback loop) * Add a comment for terminationGracePeriodSeconds * Don't update deployment twice Patch env and terminationGracePeriodSeconds at the same time * Fix bad patch * Run gracefulshutdown test at the end - avoids conflicts with other tests - change gracefulshutdown test to delete all gateway pods * Fix gracefulshutdown test * Fix gracefulshutdown test * Lint
/hold I will test downstream at the S-O side. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## release-v1.12 #130 +/- ##
=================================================
+ Coverage 60.63% 62.31% +1.67%
=================================================
Files 24 24
Lines 2002 1632 -370
=================================================
- Hits 1214 1017 -197
+ Misses 726 553 -173
Partials 62 62 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ReToCode, skonto The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/cherry-pick release-v1.14 |
@ReToCode: once the present PR merges, I will cherry-pick it on top of release-v1.14 in a new PR and assign it to you. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@@ -526,7 +542,9 @@ spec: | |||
memory: 200Mi | |||
limits: | |||
cpu: "1" | |||
memory: 500Mi | |||
memory: 800Mi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bringing this from the upstream. @ReToCode do we need to update any docs if we bring this in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, we have not documented limits, just what is a typical use with a certain amount of service. Maybe as a release note to raise awareness?
/unhold Tests passed. |
553b97a
into
openshift-knative:release-v1.12
@ReToCode: #130 failed to apply on top of branch "release-v1.14":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
* [SRVKS-1171] [RELEASE-1.12] Graceful shutdown (#130) * Gracefully drain connections when stopping the gateway (knative-extensions#1203) * Fix gateway's preStop hook: curl does not exist (anymore?) in envoy image * Before stopping the gateway, wait until requests are finished on all public listeners (and exit anyway if it exceeds terminationGracePeriodSeconds) * Drain listeners with appropriate endpoint * Simpler drain + sleep * Remove PARENT_SHUTDOWN_TIME_SECONDS and terminationGracePeriodSeconds * Use a perl script (no need to open the admin HTTP interface!) * Use bash instead of perl in preStop hook * Review @skonto comments: use socket address for admin cluster * [WIP] add graceful shutdown test and tweak CI to just run that test * [WIP] Fix gracefulshutdown_test.go * [WIP] try to fix race condition and lint * [WIP] use initialTimeout + debug * [WIP] fix gracefulshutdown_test.go logic * [WIP] refacto and add some comments to clarify * [WIP] fix lint * [WIP] reintroduce kind-e2e-upgrade.yaml * [WIP] add test case when request takes a little longer than the drain time * [WIP] fix compilation issue * [WIP] FIx compilation issue (again) * [WIP] hopefully fix data race * [WIP] refacto and hopefully fix race condition (use sync.Map) * [WIP] fix compilation issue * [WIP] Handle EOF * [WIP] check gateway pod has been removed + manual debugging * [WIP] debugging * [WIP] more debugging * [WIP] more debugging * [WIP] increase livenessProbe failure threshold as I'm not sure it should return EOF * [WIP] remove debugging related stuff * Revert all unnecessary changes made for testing * Revert unnecessary change (livenessProbe) * Scale to 1 replica * Typo * Run gracefulshutdown test first (speed up feedback loop) * Add a comment for terminationGracePeriodSeconds * Don't update deployment twice Patch env and terminationGracePeriodSeconds at the same time * Fix bad patch * Run gracefulshutdown test at the end - avoids conflicts with other tests - change gracefulshutdown test to delete all gateway pods * Fix gracefulshutdown test * Fix gracefulshutdown test * Lint * run hack/update-deps.sh * update openshift files --------- Co-authored-by: norbjd <[email protected]> * update deps --------- Co-authored-by: norbjd <[email protected]>