
Operator refactor to control pods + pvcs directly instead of statefulsets #1149

Merged: 19 commits merged into main on Sep 11, 2023

Conversation

ikreymer (Member) commented Sep 6, 2023

Refactoring operator to control pods directly, instead of statefulsets, fixes #1147

This PR includes a number of optimizations for the operator (a rough sketch of the direct pod-control approach follows the list below):

  • Pods can now reach Completed individually, unlike with a StatefulSet - e.g. if 3 pods are running and the first one finishes, a StatefulSet keeps all 3 running until all 3 are done, whereas with this setup the first finished pod can simply remain in the Completed state.
  • Fixed shutdown order - crawler pods now correctly shut down before redis pods, by switching to background deletion.
  • The redis pod stays inactive until a crawler pod first becomes active, and is deactivated again after no crawler pods have been active for 60 seconds.
  • Job deletion starts as soon as the post-finish crawl operations have run.
  • Post-crawl operations get their own redis connection, since the one used during the sync response is being cleaned up in the finalizer.
  • The finalizer ignores requests with an incorrect state (returns 400 if the crawl is reported as not finished when it has already finished).
  • Current resource usage is added to the status.
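
Not from the PR itself, but as a rough illustration of what "controlling pods directly" means here: a minimal, hypothetical metacontroller-style sync hook that returns crawler pods as children, so an individual pod can finish as Completed without being recreated. The function names, pod spec, and parent/children shapes below are assumptions for illustration, not the actual backend/btrixcloud/operator.py code.

```python
# Hypothetical sketch of a metacontroller "sync" hook returning pods
# directly instead of a StatefulSet. All names/fields are assumptions.

def make_crawler_pod(crawl_id: str, index: int) -> dict:
    """Build a bare pod spec for crawler instance `index` of a crawl."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"crawl-{crawl_id}-{index}"},
        "spec": {
            # OnFailure lets a finished pod stay in Completed instead of
            # being restarted, unlike a StatefulSet replica
            "restartPolicy": "OnFailure",
            "containers": [
                {"name": "crawler", "image": "webrecorder/browsertrix-crawler"}
            ],
        },
    }


def sync(parent: dict, children: dict) -> dict:
    """Return desired child pods for a crawl job parent object."""
    crawl_id = parent["metadata"]["name"]
    scale = parent.get("spec", {}).get("scale", 1)

    desired = [make_crawler_pod(crawl_id, i) for i in range(scale)]

    # observed children are assumed to be grouped under "Pod.v1";
    # count pods that already finished successfully
    completed = sum(
        1
        for pod in children.get("Pod.v1", {}).values()
        if pod.get("status", {}).get("phase") == "Succeeded"
    )
    return {"status": {"completedPods": completed}, "children": desired}
```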

ikreymer and others added 14 commits September 1, 2023 00:27
set 'max_crawl_scale' in values.yaml to indicate the maximum possible scale, used to create crawl-instance-{0, N}
priority classes, each with a lower priority than the previous one.
This allows crawl instance 0 to preempt crawls with more instances (and lower priorities),
eg. the 2nd instance of a crawl can preempt the 3rd instance of another, and a new crawl (1st instance)
can preempt the 2nd instance of another crawl.
(A rough sketch of this priority scheme follows the commit list below.)
- ensure redis pod is deleted last
- start deletion in background as soon as crawl is done
- operator may call finalizer with old state: if not finished but in finalizer, attempt to
cancel, and throw 400 if already canceled
- recreate redis in finalizer from yaml to avoid change event
- support reconciling desired and actual scale
- if desired scale is lower, attempt to gracefully shut down each instance
via the new redis 'stopone' key
- once each instance above the desired scale exits successfully, adjust
status.scale down to clean up pods; also clean up per-instance redis
state when scaling down
…have been running for >60 seconds, not immediately
add placeholder for adding podmetrics as related resources
fix canceled condition
- async add_crawl_errors_to_db() call creates its own redis connection, as the other one is supposed to be closed
by the caller (a rough sketch of this follows the commit list below)
- remove unneeded 'sync_db_state_if_finished'
- delete job after crawl finished tasks
- log if crawl finished but not yet deleted on next update
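
As a rough illustration of the separate-connection point in the add_crawl_errors_to_db() commit above - the key layout and the db API here are assumptions, not the actual btrixcloud code:

```python
# Minimal sketch: a background post-crawl task opens its own redis
# connection, since the connection used during the sync response is
# closed by the caller/finalizer. Key name and `db` API are assumed.

from redis import asyncio as aioredis


async def add_crawl_errors_to_db(redis_url: str, crawl_id: str, db) -> None:
    """Copy crawler error entries from redis into the database."""
    redis = aioredis.from_url(redis_url, decode_responses=True)
    try:
        # hypothetical key layout: one list of error strings per crawl
        errors = await redis.lrange(f"{crawl_id}:errors", 0, -1)
        if errors:
            await db.add_errors(crawl_id, errors)
    finally:
        await redis.close()
```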
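
And a rough sketch of the priority-class scheme from the first commit in the list above: one PriorityClass per crawl instance index, each with a lower value, so instance 0 of any crawl can preempt higher-numbered instances of other crawls. The names and base value are assumptions, not taken from the actual Helm chart:

```python
# Hypothetical sketch of per-instance priority classes; names and values
# are assumptions, not the actual chart output.

BASE_PRIORITY = 1000  # assumed base priority value


def make_priority_classes(max_crawl_scale: int) -> list[dict]:
    """Build PriorityClass objects crawl-instance-0 .. crawl-instance-(N-1)."""
    return [
        {
            "apiVersion": "scheduling.k8s.io/v1",
            "kind": "PriorityClass",
            "metadata": {"name": f"crawl-instance-{i}"},
            "value": BASE_PRIORITY - i,  # each successive instance ranks lower
            "preemptionPolicy": "PreemptLowerPriority",
        }
        for i in range(max_crawl_scale)
    ]
```

A pod for crawler instance i would then set priorityClassName to crawl-instance-{i}, so e.g. the 2nd instance of one crawl (index 1) can preempt the 3rd instance of another (index 2), and a brand-new crawl's 1st instance (index 0) can preempt either.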
ikreymer requested a review from tw4l on September 7, 2023 19:59
ikreymer marked this pull request as ready for review on September 7, 2023 20:00
ikreymer commented Sep 7, 2023

Should be ready for review -- there are a bunch of changes / optimizations to make the operator more robust and to prepare for possible autoscaling. Let me know if you have any questions @tw4l

- pods are explicitly deleted if spec.restartTime != status.restartTime, which then updates status.restartTime
- use force_restart to remove pods for one sync response to force deletion
- update to latest metacontroller v4.11.0
- add --restartOnError flag for crawler
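
As a rough illustration of the restart handling in these commits - the function and field names are assumptions, not the actual operator code:

```python
# Hypothetical sketch: when spec.restartTime differs from the recorded
# status.restartTime, return no child pods for one sync response so the
# existing pods are deleted, then record the new value so the pods are
# recreated on the next sync. Names are assumed for illustration.

def handle_restart(spec: dict, status: dict, desired_pods: list[dict]) -> list[dict]:
    """Drop desired pods for one cycle if a restart was requested."""
    requested = spec.get("restartTime")
    applied = status.get("restartTime")

    if requested and requested != applied:
        # force_restart: emit no pods this cycle so existing ones get deleted
        status["restartTime"] = requested
        return []

    return desired_pods
```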
tw4l (Member) left a comment

Testing locally, if I set the crawl scale high and then set it back to 1 via the frontend, it looks like the second and third instances of the crawler don't get the interrupt signal and continue crawling, although I can only see that there are no longer screencasting messages in the second/third crawler pod and I can only see the first instance in the UI.

Edit: Looks like I was mistaken. It looks like only pod crawl-<crawler-id>-0 issues screencasting and status update messages after scaling back down, and the WACZ produced by that crawler is significantly smaller than the other, so this may be working as intended after all. Makes sense that the crawler would remain active so we can get the WACZ at the end.

ikreymer commented Sep 8, 2023

> Testing locally, if I set the crawl scale high and then set it back to 1 via the frontend, it looks like the second and third instances of the crawler don't get the interrupt signal and continue crawling, although I can only see that there are no longer screencasting messages in the second/third crawler pod and I can only see the first instance in the UI.
>
> Edit: Looks like I was mistaken. It looks like only pod crawl-<crawler-id>-0 issues screencasting and status update messages after scaling back down, and the WACZ produced by that crawler is significantly smaller than the other, so this may be working as intended after all. Makes sense that the crawler would remain active so we can get the WACZ at the end.

This is a good question and should be documented here. Previously, with the StatefulSet, we automatically scaled down, which sent the interrupt signal to the pods being scaled down. However, there is a bit of a risk: if any of those pods fail for any reason (the upload fails, or they get evicted before they finish and upload), they will not be restarted again, which could lead to data loss. With this setup, I wanted to be a bit more careful, and instead request each instance to be stopped via the <id>:stopone redis key (introduced in webrecorder/browsertrix-crawler#366), which waits until that pod has stopped. If it gets interrupted, it will restart again, so effectively we don't actually scale down until the pods finish with 'Completed' (may need to look at the error handling here again in case a pod is never able to finish..)
But I think that leads to a safer, even though more complicated, shutdown process.
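
To illustrate the idea (a hypothetical sketch, not the actual operator.py code - the pod names, the per-instance state key, and the set-based use of the stopone key are all assumptions; the real stopone semantics are defined in webrecorder/browsertrix-crawler#366):

```python
# Rough sketch of the graceful scale-down described above: instances above
# the desired scale are asked to stop via the "<crawl_id>:stopone" redis
# key, and status.scale is only lowered once those pods have Completed.
# An async redis client (e.g. redis.asyncio) and all key layouts are assumed.

async def reconcile_scale(redis, crawl_id: str, desired: int,
                          actual: int, pods: dict) -> int:
    """Return the new status.scale after requesting graceful shutdowns."""
    new_scale = actual

    # shrink from the top: only drop the highest-numbered instance once it
    # has finished, so a still-running pod is never removed prematurely
    for i in range(actual - 1, desired - 1, -1):
        pod = pods.get(f"crawl-{crawl_id}-{i}")
        phase = (pod or {}).get("status", {}).get("phase")

        if pod is None or phase == "Succeeded":
            if new_scale == i + 1:
                # instance finished (WACZ uploaded); clean up its
                # per-instance redis state and drop it from the scale
                await redis.delete(f"{crawl_id}:state:{i}")  # hypothetical key
                new_scale = i
        else:
            # ask this instance to stop gracefully; if it crashes or is
            # evicted first, it restarts and tries again, so no data is lost
            await redis.sadd(f"{crawl_id}:stopone", f"crawl-{crawl_id}-{i}")

    return new_scale
```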

ikreymer and others added 4 commits September 8, 2023 08:50
cancel crawl test: just wait until page is found, not necessarily done
…additional logging for failed crawls, if enabled

- print logs for the default container
- also print pod status on failure
- use mark_finished(... 'canceled') for canceled crawls
- tests: also check other finished states to avoid getting stuck in an infinite loop if a crawl fails
- tests: disable disk utilization check, which adds unpredictability to crawl testing!
ikreymer merged commit ad9bca2 into main on Sep 11, 2023
4 checks passed
ikreymer deleted the op-pod branch on September 11, 2023 17:38
Successfully merging this pull request may close these issues.

[Feature]: Make operator more robust by control pods directly