support auto delete WorkloadRebalancer when time up #4894

Merged: 1 commit into karmada-io:master from the reschedule-delete branch, May 27, 2024

chaosi-zju
Member

@chaosi-zju chaosi-zju commented Apr 30, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Support automatic deletion of a WorkloadRebalancer once its TTL expires, modeled on the Kubernetes feature Automatic Cleanup for Finished Jobs.

Introduces the field ttlSecondsAfterFinished, which limits the lifetime of a WorkloadRebalancer that has finished execution
(finished execution means that every target workload has finished with a result of Successful or Failed).

  • If this field is set, ttlSecondsAfterFinished after the WorkloadRebalancer finishes, it is eligible to be automatically deleted.
  • If this field is unset, the WorkloadRebalancer won't be automatically deleted.
  • If this field is set to zero, the WorkloadRebalancer becomes eligible to be deleted immediately after it finishes.

Corner cases considered:

  • case 1: if a new target workload is added to the WorkloadRebalancer before ttlSecondsAfterFinished expires,
    the finish time of the WorkloadRebalancer is refreshed, so the expiry time is refreshed too and the deletion is deferred.
  • case 2: if ttlSecondsAfterFinished is modified before it expires,
    the deletion should be performed according to the latest ttlSecondsAfterFinished.
  • case 3: if the WorkloadRebalancer is modified between the moment the latest object is fetched and checked and the moment it is deleted,
    the pending delete action should be aborted.

Key implementation points:

  • A WorkloadRebalancer is judged as finished only if it meets two requirements:
    • all expected workloads have finished with a result of Successful or Failed.
    • a new field named ObservedGeneration in the Status of the WorkloadRebalancer equals .metadata.generation,
      which guards against the case where the WorkloadRebalancer has been updated but the controller has not yet refreshed its Status.
  • When a WorkloadRebalancer is created or updated, add it to the workqueue and calculate its expiry time;
    if it has not expired yet, call workqueue.AddAfter() to re-enqueue it.
  • Before deleting the WorkloadRebalancer, do a final sanity check: use the latest WorkloadRebalancer fetched directly
    from the API server, rather than the object from the lister cache, to see whether the TTL has truly expired.
  • When deleting the WorkloadRebalancer, confirm that the resourceVersion of the object being deleted is as expected,
    to guard against corner case 3 above.

Which issue(s) this PR fixes:

Fixes part of #4840

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

support auto delete WorkloadRebalancer when time up

@karmada-bot karmada-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Apr 30, 2024
@karmada-bot karmada-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Apr 30, 2024
@chaosi-zju
Member Author

/hold

for #4875 #4860

@karmada-bot karmada-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 30, 2024
@chaosi-zju chaosi-zju force-pushed the reschedule-delete branch 2 times, most recently from 9b1e1a5 to b016c25 Compare April 30, 2024 08:22
@chaosi-zju chaosi-zju force-pushed the reschedule-delete branch 2 times, most recently from 4606fb1 to 7b02e4e Compare April 30, 2024 09:00
@chaosi-zju chaosi-zju force-pushed the reschedule-delete branch 3 times, most recently from 55d9849 to 5ca7500 Compare May 9, 2024 12:59
@chaosi-zju
Member Author

/hold cancel

@karmada-bot karmada-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 9, 2024
@karmada-bot karmada-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 9, 2024
@codecov-commenter

codecov-commenter commented May 9, 2024

Codecov Report

Attention: Patch coverage is 71.92982%, with 16 lines in your changes missing coverage. Please review.

Project coverage is 53.34%. Comparing base (d465fcd) to head (a8b4050).
Report is 2 commits behind head on master.

Files | Patch % | Lines
...orkloadrebalancer/workloadrebalancer_controller.go | 71.92% | 9 Missing and 7 partials ⚠️

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #4894   +/-   ##
=======================================
  Coverage   53.34%   53.34%           
=======================================
  Files         252      252           
  Lines       20481    20531   +50     
=======================================
+ Hits        10926    10953   +27     
- Misses       8834     8853   +19     
- Partials      721      725    +4     
Flag | Coverage Δ
unittests | 53.34% <71.92%> (+<0.01%) ⬆️

@chaosi-zju chaosi-zju force-pushed the reschedule-delete branch 2 times, most recently from 47b6e50 to 28b3332 Compare May 13, 2024 01:25
@chaosi-zju
Member Author

CC @RainbowMango, can the review of this PR begin?

@RainbowMango
Member

/assign

By the way.

If this field is set, ttlSecondsAfterFinished after the WorkloadRebalancer finishes, it is eligible to be automatically deleted.

ttlSecondsAfterFinished --> ttlMinutesAfterFinished in the PR description.

@chaosi-zju chaosi-zju force-pushed the reschedule-delete branch 3 times, most recently from b1dce42 to c2f38ca Compare May 15, 2024 09:39
@karmada-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 24, 2024
@RainbowMango
Member

/hold
Please @XiShanYongYe-Chang take another look, particularly the e2e part.

@chaosi-zju
Member Author

chaosi-zju commented May 24, 2024

Please @XiShanYongYe-Chang take another look, particularly the e2e part.

@XiShanYongYe-Chang not only the e2e tests, but also the unit tests

@karmada-bot karmada-bot removed the lgtm Indicates that a PR is ready to be merged. label May 24, 2024
@chaosi-zju
Member Author

@XiShanYongYe-Chang
Member

Ok~
/assign

@chaosi-zju chaosi-zju force-pushed the reschedule-delete branch 2 times, most recently from 67968da to eb84176 Compare May 25, 2024 08:06
Member

@XiShanYongYe-Chang XiShanYongYe-Chang left a comment

Thanks~
Generally OK.

test/e2e/framework/workloadrebalancer.go (resolved)
@chaosi-zju chaosi-zju force-pushed the reschedule-delete branch 2 times, most recently from 262b69c to 015a571 Compare May 25, 2024 09:40
@RainbowMango
Member

[FAILED] in [DeferCleanup (Each)] - /home/runner/work/karmada/karmada/test/e2e/framework/workloadrebalancer.go:46 @ 05/25/24 10:13:54.839

It seems related, see the logs here.

[FAILED] Unexpected error:
      <*errors.StatusError | 0xc0005c8d20>: 
      workloadrebalancers.apps.karmada.io "rebalancer-lhnmp" not found
      {
          ErrStatus: {
              TypeMeta: {Kind: "", APIVersion: ""},
              ListMeta: {
                  SelfLink: "",
                  ResourceVersion: "",
                  Continue: "",
                  RemainingItemCount: nil,
              },
              Status: "Failure",
              Message: "workloadrebalancers.apps.karmada.io \"rebalancer-lhnmp\" not found",
              Reason: "NotFound",
              Details: {
                  Name: "rebalancer-lhnmp",
                  Group: "apps.karmada.io",
                  Kind: "workloadrebalancers",
                  UID: "",
                  Causes: nil,
                  RetryAfterSeconds: 0,
              },
              Code: 404,
          },
      }
  occurred
  In [DeferCleanup (Each)] at: /home/runner/work/karmada/karmada/test/e2e/framework/workloadrebalancer.go:46 @ 05/25/24 10:13:54.839

@chaosi-zju
Member Author

CI problem fixed. Are there any further comments?

@XiShanYongYe-Chang
Member

/lgtm
Thanks

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label May 27, 2024
@RainbowMango
Member

/hold cancel

@karmada-bot karmada-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 27, 2024
@karmada-bot karmada-bot merged commit be1a4fb into karmada-io:master May 27, 2024
13 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
5 participants