Operator refactor to control pods + pvcs directly instead of statefulsets #1149

Merged: 19 commits, Sep 11, 2023

Commits on Sep 1, 2023

  1. convert operator to control pods/pvcs directly

    add redis_storage param
    ikreymer committed Sep 1, 2023
    deed92d
  2. update templates

    ikreymer committed Sep 1, 2023
    cf5c194

Commits on Sep 2, 2023

  1. ad6ffce
  2. 278fb82

Commits on Sep 3, 2023

  1. priority classes: add priority classes for each additional crawl replica

    set 'max_crawl_scale' in values.yaml to indicate the max possible scale, used to create crawl-instance-{0, N}
    priority classes, each with a lower priority than the one before
    this allows crawl instance 0 to preempt crawls with more instances (and lower priorities)
    e.g. the 2nd instance of a crawl can preempt the 3rd instance of another, and a new crawl (1st instance)
    can preempt the 2nd instance of another crawl
    ikreymer committed Sep 3, 2023
    e28f0b3
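
The priority-class scheme from commit e28f0b3 can be sketched roughly as follows. This is a minimal illustration in Python, not the chart's actual template: `build_priority_classes`, `can_preempt`, and `base_priority` are hypothetical names, and it assumes that a `max_crawl_scale` of N yields classes `crawl-instance-0` through `crawl-instance-(N-1)`.

```python
def build_priority_classes(max_crawl_scale: int, base_priority: int = 0) -> list[dict]:
    """Return one Kubernetes PriorityClass manifest per crawl instance index.

    Hypothetical helper: the naming follows the commit message, the
    priority values are an assumption chosen only to be strictly decreasing.
    """
    if max_crawl_scale < 1:
        raise ValueError("max_crawl_scale must be at least 1")

    classes = []
    for index in range(max_crawl_scale):
        classes.append(
            {
                "apiVersion": "scheduling.k8s.io/v1",
                "kind": "PriorityClass",
                "metadata": {"name": f"crawl-instance-{index}"},
                # Each additional instance gets a strictly lower priority.
                "value": base_priority - index,
                "globalDefault": False,
                "description": f"Priority for crawl instance #{index}",
            }
        )
    return classes


def can_preempt(incoming_index: int, running_index: int) -> bool:
    """A pod may preempt another only if its instance index is lower,
    i.e. its priority class value is higher."""
    return incoming_index < running_index
```

With this ordering, a new crawl's first instance (index 0) outranks any other crawl's second instance (index 1), which matches the preemption behavior described in the commit message.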

Commits on Sep 5, 2023

  1. refactor finalizing:

    - ensure the redis pod is deleted last
    - start deletion in the background as soon as the crawl is done
    - operator may call the finalizer with old state: if the crawl is not finished but is in the finalizer,
    attempt to cancel, and throw 400 if already canceled
    - recreate redis in the finalizer from yaml to avoid a change event
    ikreymer committed Sep 5, 2023
    ac5ee06
  2. scale work:

    - support reconciling desired and actual scale
    - if the desired scale is lower, attempt to gracefully shut down each extra instance
    via the new redis 'stopone' key
    - once each instance above the desired scale has exited successfully, adjust
    status.scale down to clean up pods
    - also clean up redis per-instance state when scaling down
    ikreymer committed Sep 5, 2023
    6aeaa97
  3. redis: pause redis container (set initRedis to false) if no crawlers have been running for >60 seconds, not immediately

    ikreymer committed Sep 5, 2023
    d7202a6
  4. 51aa164
  5. resource usage: add resources to crawljob status

    add placeholder for adding podmetrics as related resources
    fix canceled condition
    ikreymer committed Sep 5, 2023
    59052e2
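
The scale-down flow from the "scale work" commit (6aeaa97) can be sketched as a small reconcile step. This is an illustration under stated assumptions, not the operator's code: `CrawlScale`, `stop_key`, and `reconcile_scale` are hypothetical names, a plain dict stands in for redis, and the `instance-{index}:stopone` key format is invented for the example (only the `stopone` key name comes from the commit message).

```python
from dataclasses import dataclass, field


@dataclass
class CrawlScale:
    desired: int  # scale requested in the spec
    actual: int   # scale currently recorded in status (pods exist for 0..actual-1)
    exited_ok: set = field(default_factory=set)  # instance indices that exited successfully
    redis: dict = field(default_factory=dict)    # stand-in for per-instance redis state


def stop_key(index: int) -> str:
    # Hypothetical key layout; the real key format is not given in the commit.
    return f"instance-{index}:stopone"


def reconcile_scale(state: CrawlScale) -> int:
    """Run one reconcile pass and return the scale to report in status."""
    if state.desired >= state.actual:
        # Scaling up (or unchanged): new pods can simply be created.
        state.actual = state.desired
        return state.actual

    extra = range(state.desired, state.actual)

    # Ask every instance above the desired scale to stop gracefully.
    for index in extra:
        state.redis[stop_key(index)] = "1"

    # Only lower status.scale once all extra instances have exited cleanly,
    # then remove their per-instance redis state.
    if all(index in state.exited_ok for index in extra):
        for index in extra:
            state.redis.pop(stop_key(index), None)
            state.exited_ok.discard(index)
        state.actual = state.desired

    return state.actual
```

Keeping status.scale unchanged until the extra instances exit means their pods are not removed while they are still shutting down.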

Commits on Sep 6, 2023

  1. cleanup:

    - async add_crawl_errors_to_db() call creates its own redis connection, as the other one is supposed to be
    closed by the caller
    - remove unneeded 'sync_db_state_if_finished'
    - delete job after crawl finished tasks
    - log if crawl finished but not yet deleted on next update
    ikreymer committed Sep 6, 2023
    b95365c
  2. 5302fad
  3. d356b14

Commits on Sep 7, 2023

  1. 0b49bbc

Commits on Sep 8, 2023

  1. switch pod childPolicy to OnDelete to avoid recreate loops in edge cases:

    - pods are explicitly deleted if spec.restartTime != status.restartTime, after which status.restartTime is updated
    - use force_restart to omit pods from one sync response to force their deletion
    - update to latest metacontroller v4.11.0
    - add --restartOnError flag for crawler
    ikreymer committed Sep 8, 2023
    36990be
  2. Update backend/btrixcloud/operator.py

    Co-authored-by: Tessa Walsh
    ikreymer and tw4l authored Sep 8, 2023
    0c6f24c
  3. Update backend/btrixcloud/operator.py

    Co-authored-by: Tessa Walsh
    ikreymer and tw4l authored Sep 8, 2023
    fb95872
  4. improve logging for tests

    cancel crawl test: just wait until page is found, not necessarily done
    ikreymer committed Sep 8, 2023
    263c5e2
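
The restart handling from commit 36990be can be sketched as a sync-hook helper. This is a hedged illustration, not the code in operator.py: `sync_restart` is a hypothetical function, and it assumes a metacontroller-style sync response of the form `{"status": ..., "children": ...}` in which children left out of the response are deleted by the controller.

```python
def sync_restart(spec: dict, status: dict, desired_children: list) -> dict:
    """Build one sync response, dropping pods once when a restart is requested."""
    requested = spec.get("restartTime")

    # A restart is pending when the spec carries a restartTime that the
    # status has not yet acknowledged.
    force_restart = bool(requested) and requested != status.get("restartTime")

    new_status = dict(status)
    if force_restart:
        # Acknowledge the restart so the next sync returns the pods again.
        new_status["restartTime"] = requested
        # Omit pods from this single response to force their deletion;
        # other children (e.g. PVCs) are kept.
        children = [c for c in desired_children if c.get("kind") != "Pod"]
    else:
        children = list(desired_children)

    return {"status": new_status, "children": children}
```

Because status.restartTime is updated in the same response, the pods are omitted for exactly one sync and recreated on the next, which is what avoids a delete and recreate loop.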

Commits on Sep 10, 2023

  1. - add 'fail_crawl()' to be used for failing a crawl, which handles additional logging for failed crawls, if enabled
    
    - print logs: print logs for default container
    - also print pod status on failure
    - use mark_finished(... 'canceled') for canceled crawl
    - tests: also check other finished states to avoid getting stuck in an infinite loop if the crawl fails
    - tests: disable disk utilization check, which adds unpredictability to crawl testing!
    ikreymer committed Sep 10, 2023
    fa444d1
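
The test change in commit fa444d1 (checking all finished states so a failed crawl cannot stall the test) can be sketched as a polling helper. This is an assumption-laden example, not the repository's test code: `wait_until_finished`, `FINISHED_STATES`, and the state names other than "canceled" are hypothetical.

```python
import time

# Hypothetical set of terminal states; only "canceled" appears in the commit text.
FINISHED_STATES = {"complete", "failed", "canceled"}


def wait_until_finished(get_state, expected="complete", interval=1.0, max_polls=300):
    """Poll get_state() until the crawl reaches any terminal state.

    Stops on every finished state, not only the expected one, so a failed
    or canceled crawl ends the wait instead of looping forever.
    """
    for _ in range(max_polls):
        state = get_state()
        if state in FINISHED_STATES:
            if state != expected:
                raise AssertionError(f"crawl finished as {state!r}, expected {expected!r}")
            return state
        time.sleep(interval)
    raise TimeoutError(f"crawl did not finish after {max_polls} polls")
```

The `max_polls` bound is an extra safeguard added for this sketch; it covers the case where the crawl never reaches a terminal state at all.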