---
title: Using Workload Rebalancer to achieve a fresh Rescheduling
---

In general, once the replicas of a workload are scheduled, the scheduling result stays inert and the replica
distribution will not change. Even if rescheduling is triggered by modifying replicas or placement, the scheduler
maintains the existing replica distribution as closely as possible, only making minimal adjustments when necessary,
which minimizes disruptions and preserves the balance across clusters.

However, in some scenarios, users want a way to actively trigger a fresh rescheduling, which disregards the previous
assignment entirely and establishes an entirely new replica distribution across clusters.

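Karmada provides the `WorkloadRebalancer` API for this purpose. As a quick preview, a minimal manifest (using the
same resource names as the walkthrough later in this guide) looks roughly like this:

```yaml
# minimal preview of the WorkloadRebalancer API used in the Example section below
apiVersion: apps.karmada.io/v1alpha1
kind: WorkloadRebalancer
metadata:
  name: demo
spec:
  workloads:                  # resources whose replicas should be freshly rescheduled
    - apiVersion: apps/v1
      kind: Deployment
      name: demo-deploy-1
      namespace: default
```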
## Applicable Scenarios

### Scenario 1

In a cluster failover scenario, replicas are distributed across the member1 and member2 clusters; however, they would
all migrate to the member2 cluster if the member1 cluster failed.

As a cluster administrator, I hope the replicas are redistributed across both clusters when the member1 cluster
recovers, so that the resources of the member1 cluster are re-utilized and high availability is preserved.

### Scenario 2

In application-level failover, low-priority applications may be preempted, shrinking from multiple clusters to a
single cluster because cluster resources are in short supply
(refer to [Application-level Failover](https://karmada.io/docs/next/userguide/failover/application-failover#why-application-level-failover-is-required)).

As a user, I hope the replicas of low-priority applications can be redistributed to multiple clusters when cluster
resources become sufficient again, to ensure the high availability of the application.

### Scenario 3

With the `Aggregated` schedule type, replicas may still be distributed across multiple clusters due to resource
constraints.

As a user, I hope the replicas can be redistributed in an aggregated manner once a single cluster has sufficient
resources to accommodate all of them, so that the application better meets actual business requirements.

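For reference, the `Aggregated` strategy corresponds to the `replicaDivisionPreference` field in the policy's
placement; a sketch of the relevant fields (only the placement snippet is shown, the rest of the policy is omitted):

```yaml
# placement snippet: pack replicas into as few clusters as possible
placement:
  replicaScheduling:
    replicaSchedulingType: Divided
    replicaDivisionPreference: Aggregated
```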
### Scenario 4

In a disaster-recovery scenario, replicas are migrated from the primary cluster to the backup cluster when the
primary cluster fails.

As a cluster administrator, I hope the replicas can migrate back when the primary cluster is restored, so that:

1. the disaster-recovery mode is restored, ensuring the reliability and stability of the cluster federation.
2. the cost of the backup cluster is saved.

## Prerequisites

### Karmada has been installed

We can install Karmada by referring to the [quick-start](https://github.com/karmada-io/karmada#quick-start), or
directly run the `hack/local-up-karmada.sh` script, which is also used to run our E2E cases.

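For a local test environment, a typical sequence looks like the following sketch (it assumes Docker and kind are
available on your machine, as required by the script):

```bash
# clone the Karmada repo and bring up a local control plane with member clusters
git clone https://github.com/karmada-io/karmada.git
cd karmada
hack/local-up-karmada.sh
```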
## Example

### Step 1: create a Deployment and a ClusterRole

You should first prepare a Deployment named `demo-deploy-1` and a ClusterRole named `demo-role`, and propagate them
to the member clusters with a ClusterPropagationPolicy.

To achieve this, you can create a new file `/tmp/deployments-and-services.yaml` and copy the text below into it:

<details>
<summary>/tmp/deployments-and-services.yaml</summary>

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deploy-1
  labels:
    app: test
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo-deploy-1
  template:
    metadata:
      labels:
        app: demo-deploy-1
    spec:
      terminationGracePeriodSeconds: 0
      containers:
        - image: nginx
          name: demo-deploy-1
          resources:
            limits:
              cpu: 10m
              memory: 10Mi
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: demo-role
rules:
  - apiGroups:
      - '*'
    resources:
      - '*'
    verbs:
      - '*'
---
apiVersion: policy.karmada.io/v1alpha1
kind: ClusterPropagationPolicy
metadata:
  name: default-pp
spec:
  placement:
    clusterTolerations:
      # tolerate the taint added in Step 2 for 0 seconds, so replicas are evicted promptly once it appears
      - effect: NoExecute
        key: workload-rebalancer-test
        operator: Exists
        tolerationSeconds: 0
    clusterAffinity:
      clusterNames:
        - member1
        - member2
    replicaScheduling:
      replicaDivisionPreference: Weighted
      replicaSchedulingType: Divided
      weightPreference:
        dynamicWeight: AvailableReplicas
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: demo-deploy-1
      namespace: default
    - apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      name: demo-role
```
</details>

Then run the following command to create these resources:

```bash
kubectl --context karmada-apiserver apply -f /tmp/deployments-and-services.yaml
```

You can check whether this step succeeded as follows:

```bash
$ kubectl --context karmada-apiserver get deploy demo-deploy-1
NAME            READY   UP-TO-DATE   AVAILABLE   AGE
demo-deploy-1   3/3     3            3           3m18s

$ kubectl --context member1 get po
NAME                             READY   STATUS    RESTARTS   AGE
demo-deploy-1-784cd456bf-dv6xw   1/1     Running   0          3m18s
demo-deploy-1-784cd456bf-fgjn7   1/1     Running   0          3m18s

$ kubectl --context member2 get po
NAME                             READY   STATUS    RESTARTS   AGE
demo-deploy-1-784cd456bf-856rf   1/1     Running   0          3m18s

$ kubectl --context karmada-apiserver get clusterrole demo-role
NAME        CREATED AT
demo-role   2024-05-22T11:10:29Z
```

Taking `deployment/demo-deploy-1` as an example, 2 replicas are propagated to the member1 cluster and 1 replica to
the member2 cluster.

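If you would rather check the scheduler's view of this distribution than count pods, you can also inspect the
corresponding ResourceBinding (a sketch; the binding name `demo-deploy-1-deployment` follows the naming visible in
the events shown in Step 5):

```bash
# print the per-cluster replica assignment recorded in the ResourceBinding
kubectl --context karmada-apiserver get resourcebinding demo-deploy-1-deployment -o jsonpath='{.spec.clusters}'
```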
### Step 2: add a `NoExecute` taint to the member1 cluster to mock cluster failover

* Run the following command to add a `NoExecute` taint to the member1 cluster:

  ```bash
  kubectl --context karmada-apiserver patch cluster member1 --type='json' -p '[{"op": "replace", "path": "/spec/taints", "value": [{"key": "workload-rebalancer-test", "effect": "NoExecute"}]}]'
  ```

  Rescheduling will then be triggered by the cluster failover, and all replicas will be propagated to the member2
  cluster, as you can see:

  ```bash
  $ kubectl --context member1 get po
  No resources found in default namespace.
  $ kubectl --context member2 get po
  NAME                             READY   STATUS    RESTARTS   AGE
  demo-deploy-1-784cd456bf-856rf   1/1     Running   0          5m27s
  demo-deploy-1-784cd456bf-b5977   1/1     Running   0          35s
  demo-deploy-1-784cd456bf-pqthv   1/1     Running   0          35s
  ```

* Run the following command to remove the above `NoExecute` taint from the member1 cluster:

  ```bash
  kubectl --context karmada-apiserver patch cluster member1 --type='json' -p '[{"op": "replace", "path": "/spec/taints", "value": []}]'
  ```

  Removing the taint will not change the replica distribution, because the scheduling result is inert: all replicas
  stay in the member2 cluster, as the check below confirms.

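For example, listing the pods in the member1 cluster again should still return nothing:

```bash
$ kubectl --context member1 get po
No resources found in default namespace.
```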
### Step 3: apply a WorkloadRebalancer to trigger rescheduling

Assuming you want to trigger a fresh rescheduling of the above resources, you can create a new file
`/tmp/workload-rebalancer.yaml` and copy the text below into it:

```yaml
apiVersion: apps.karmada.io/v1alpha1
kind: WorkloadRebalancer
metadata:
  name: demo
spec:
  workloads:
    - apiVersion: apps/v1
      kind: Deployment
      name: demo-deploy-1
      namespace: default
    - apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      name: demo-role
    - apiVersion: apps/v1
      kind: Deployment
      name: demo-deploy-2
      namespace: default
```

> Tip: `Deployment/demo-deploy-2` refers to a non-existent resource; it is included to demonstrate how failures are reported.

Then run the following command to apply it:

```bash
kubectl --context karmada-apiserver apply -f /tmp/workload-rebalancer.yaml
```

You will get a `workloadrebalancer.apps.karmada.io/demo created` result, which means the resource was created
successfully.

### Step 4: check the status of WorkloadRebalancer

Run the following command:

```bash
$ kubectl --context karmada-apiserver get workloadrebalancer demo -o yaml
apiVersion: apps.karmada.io/v1alpha1
kind: WorkloadRebalancer
metadata:
  ...
  creationTimestamp: "2024-05-22T11:16:10Z"
  name: demo
  ...
spec:
  ...
status:
  finishTime: "2024-05-22T11:16:10Z"
  observedGeneration: 1
  observedWorkloads:
    - result: Successful
      workload:
        apiVersion: apps/v1
        kind: Deployment
        name: demo-deploy-1
        namespace: default
    - reason: ReferencedBindingNotFound
      result: Failed
      workload:
        apiVersion: apps/v1
        kind: Deployment
        name: demo-deploy-2
        namespace: default
    - result: Successful
      workload:
        apiVersion: rbac.authorization.k8s.io/v1
        kind: ClusterRole
        name: demo-role
```

Thus, you can observe the rescheduling result in the `status.observedWorkloads` field of `workloadrebalancer/demo`.
As you can see, `Deployment/demo-deploy-1` and `ClusterRole/demo-role` were rescheduled successfully, while the
non-existent resource `Deployment/demo-deploy-2` failed with the reason `ReferencedBindingNotFound`.

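If you only want a quick summary of which workloads failed and why, a jsonpath query such as the following sketch
narrows the output:

```bash
# list only the failed entries of status.observedWorkloads together with their reasons
kubectl --context karmada-apiserver get workloadrebalancer demo \
  -o jsonpath='{range .status.observedWorkloads[?(@.result=="Failed")]}{.workload.kind}/{.workload.name}: {.reason}{"\n"}{end}'
```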
### Step 5: observe the real effect of WorkloadRebalancer

Taking `deployment/demo-deploy-1` as an example, you can observe the actual replica propagation status:

```bash
$ kubectl --context member1 get po
NAME                             READY   STATUS    RESTARTS   AGE
demo-deploy-1-784cd456bf-82kt6   1/1     Running   0          89s
demo-deploy-1-784cd456bf-k9fhl   1/1     Running   0          89s
$ kubectl --context member2 get po
NAME                             READY   STATUS    RESTARTS   AGE
demo-deploy-1-784cd456bf-856rf   1/1     Running   0          9m23s
```

As you can see, rescheduling happened: 2 replicas migrated back to the member1 cluster, while the 1 replica in the
member2 cluster stayed unchanged.

Besides, you can observe a scheduling event emitted by the `default-scheduler`, such as:

```bash
$ kubectl --context karmada-apiserver describe deployment demo-deploy-1
...
Events:
  Type    Reason                  Age                From                                Message
  ----    ------                  ----               ----                                -------
  ...
  Normal  ScheduleBindingSucceed  31s                default-scheduler                   Binding has been scheduled successfully. Result: {member2:2, member1:1}
  Normal  GetDependenciesSucceed  31s                dependencies-distributor            Get dependencies([]) succeed.
  Normal  SyncSucceed             31s                execution-controller                Successfully applied resource(default/demo-deploy-1) to cluster member1
  Normal  AggregateStatusSucceed  31s (x4 over 31s)  resource-binding-status-controller  Update resourceBinding(default/demo-deploy-1-deployment) with AggregatedStatus successfully.
  Normal  SyncSucceed             31s                execution-controller                Successfully applied resource(default/demo-deploy-1) to cluster member2
```
### Step 6: update and auto-clean WorkloadRebalancer

Assuming you want the WorkloadRebalancer resource to be automatically cleaned up in the future, you can simply edit
it and set the `spec.ttlSecondsAfterFinished` field to `300`, like this:

```yaml
apiVersion: apps.karmada.io/v1alpha1
kind: WorkloadRebalancer
metadata:
  name: demo
spec:
  ttlSecondsAfterFinished: 300
  workloads:
    - apiVersion: apps/v1
      kind: Deployment
      name: demo-deploy-1
      namespace: default
    - apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      name: demo-role
    - apiVersion: apps/v1
      kind: Deployment
      name: demo-deploy-2
      namespace: default
```

After you apply this modification, the WorkloadRebalancer resource will be automatically deleted 300 seconds after
it finishes.

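Alternatively, if you prefer not to edit the full manifest, a merge patch like the following sketch achieves the
same result:

```bash
# set the TTL on the existing WorkloadRebalancer instead of re-applying the whole manifest
kubectl --context karmada-apiserver patch workloadrebalancer demo \
  --type merge -p '{"spec": {"ttlSecondsAfterFinished": 300}}'
```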