Additional logging for CI #1156

Closed
wants to merge 41 commits into from

Conversation

ikreymer
Member

@ikreymer ikreymer commented Sep 8, 2023

Additional logging for debugging failed crawls on CI:

  • Print the previous container logs for failed pods
  • Also print pod status
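A minimal sketch of what such a CI logging step could look like, assuming `kubectl` is on the PATH and the crawler pods live in a namespace called `crawlers`. The namespace and all helper names here are hypothetical and not taken from this PR. `kubectl logs --previous` prints the log of the previous container instance of a pod, which is typically where the error from a crashed crawler ends up after a restart.

```python
import subprocess
from typing import List

NAMESPACE = "crawlers"  # assumption: replace with the namespace used on CI


def previous_logs_cmd(pod: str, namespace: str = NAMESPACE) -> List[str]:
    """Command that prints logs from the previous container instance of a pod."""
    return ["kubectl", "logs", "--previous", "-n", namespace, pod]


def pod_status_cmd(namespace: str = NAMESPACE) -> List[str]:
    """Command that lists pods with their status, restart count and node."""
    return ["kubectl", "get", "pods", "-n", namespace, "-o", "wide"]


def run_and_print(cmd: List[str]) -> int:
    """Run a command and echo its output without raising, so that the
    debug logging cannot itself fail the CI job."""
    print("$ " + " ".join(cmd))
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    except FileNotFoundError:
        print(f"{cmd[0]}: command not found")
        return 127
    print(result.stdout)
    if result.stderr:
        print(result.stderr)
    return result.returncode


def log_failed_pods(failed_pods: List[str], namespace: str = NAMESPACE) -> None:
    """Print pod status first, then the previous container log of each failed pod."""
    run_and_print(pod_status_cmd(namespace))
    for pod in failed_pods:
        run_and_print(previous_logs_cmd(pod, namespace))
```

Calling `log_failed_pods(["some-crawl-pod-0"])` from a failing test's teardown would print the status table followed by the previous log of that pod; the pod name is a placeholder.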

ikreymer and others added 17 commits September 1, 2023 00:27
set 'max_crawl_scale' in values.yaml to indicate the max possible scale; it is used to create the crawl-instance-{0, N}
priority classes, each with a lower priority than the previous one
this allows crawl instance 0 to preempt crawls with more instances (and lower priorities)
e.g. the 2nd instance of a crawl can preempt the 3rd instance of another, and a new crawl (1st instance)
can preempt the 2nd instance of another crawl
- ensure redis pod is deleted last
- start deletion in background as soon as crawl is done
- operator may call finalizer with old state: if not finished but in finalizer, attempt to
cancel, and throw 400 if already canceled
- recreate redis in finalizer from yaml to avoid change event
- support reconciling desired and actual scale
- if desired scale is lower, attempt to gracefully shutdown each instance
via new redis 'stopone' key
- once each instance above the desired scale has exited successfully, adjust
status.scale down to clean up pods. also clean up per-instance redis
state when scaling down
…have been running for >60 seconds, not immediately
add placeholder for adding podmetrics as related resources
fix canceled condition
- async add_crawl_errors_to_db() call creates its own redis connection, as other one is supposed to be closed
by caller
- remove unneeded 'sync_db_state_if_finished'
- delete job after crawl finished tasks
- log if crawl finished but not yet deleted on next update
- pods explicitly deleted if spec.restartTime != status.restartTime, then updates status.restartTime
- use force_restart to remove pods for one sync response to force deletion
- update to latest metacontroller v4.11.0
- add --restartOnError flag for crawler
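The priority-class scheme from the first commit in this list can be illustrated with a short sketch. It builds one Kubernetes `PriorityClass` manifest per crawl instance index, with instance 0 getting the highest value so that it can preempt higher-numbered instances of other crawls. The base value, the step of 1, the function name, and the assumption that indices run from 0 to `max_crawl_scale - 1` are illustrative guesses, not taken from the chart in this PR.

```python
from typing import Dict, List

BASE_PRIORITY = -1000  # hypothetical base value, not taken from the chart


def crawl_priority_classes(
    max_crawl_scale: int, base_priority: int = BASE_PRIORITY
) -> List[Dict]:
    """Build one PriorityClass manifest per crawl instance index.

    Instance 0 gets the highest value; each later instance gets a
    strictly lower one, so it can be preempted by lower-numbered
    instances of other crawls.
    """
    if max_crawl_scale < 1:
        raise ValueError("max_crawl_scale must be at least 1")
    return [
        {
            "apiVersion": "scheduling.k8s.io/v1",
            "kind": "PriorityClass",
            "metadata": {"name": f"crawl-instance-{i}"},
            "value": base_priority - i,
            "globalDefault": False,
            "description": f"Priority for crawl instance {i}",
        }
        for i in range(max_crawl_scale)
    ]
```

With `max_crawl_scale` set to 3, this yields `crawl-instance-0`, `crawl-instance-1` and `crawl-instance-2`, in decreasing order of priority.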
@ikreymer ikreymer requested a review from tw4l September 8, 2023 16:51
cancel crawl test: just wait until page is found, not necessarily done
- print previous log
- print pod statuses for failed crawls
to ensure we get the original log (previous container logs may not be available)
@ikreymer
Member Author

The changes in this branch have been merged into #1149.

@ikreymer ikreymer closed this Sep 11, 2023
@ikreymer ikreymer deleted the more-failed-logging branch September 11, 2023 15:57