---
title: Introduce a mechanism to actively trigger rescheduling
authors:
- "@chaosi-zju"
reviewers:
- "@RainbowMango"
- "@chaunceyjiang"
- "TBD"
approvers:
- "@RainbowMango"
- "TBD"

creation-date: 2024-01-30
---

# Introduce a mechanism to actively trigger rescheduling

## Background

According to the current implementation, once the replicas of a workload have been scheduled, the scheduling result
stays inert and the replicas distribution does not change on its own.

However, in some scenarios, users want a way to actively trigger rescheduling.

### Motivation

Assume a user has propagated workloads to member clusters, and replicas migrated away because a member cluster failed.

The user expects a way to trigger rescheduling after the member cluster recovers, so that the replicas can
migrate back.

### Goals

Introduce a mechanism to actively trigger rescheduling of workload resources.

### Applicable scenario

This feature helps in scenarios where neither the `replicas` in the resource template nor the `placement` in the policy
has changed, but the user still wants to actively trigger rescheduling of replicas.

## Proposal

### Overview

This proposal introduces a mechanism to actively trigger rescheduling, which is particularly useful in application
failover scenarios. It is realized by introducing a new API; when this API is called, a new field is set on the
binding so that the scheduler can perceive the need for rescheduling.

### User story

In an application failover scenario, replicas migrate from the primary cluster to the backup cluster when the primary
cluster fails.

As a user, I want to trigger the replicas to migrate back once the cluster recovers, so that I can:

1. Restore the disaster recovery mode to ensure the reliability and stability of the cluster.
2. Save the cost of the backup cluster.

### Notes/Constraints/Caveats

This ability is limited to triggering rescheduling. The scheduling result is recalculated according to the
placement in the current ResourceBinding, and it is not guaranteed to be exactly the same as before the
cluster failure.

> Note: the recalculation is based on the placement in the current `ResourceBinding`, not the "Policy". So if
> the activation preference of your Policy is `Lazy`, rescheduling is still based on the previous `ResourceBinding`
> even if the current Policy has changed.

## Design Details

### API change

* Introduce a new API named `Reschedule` in a new apiGroup `command.karmada.io`:

```go
//revive:disable:exported

// +genclient
// +genclient:nonNamespaced
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// Reschedule represents the desired state of a task which enforces a rescheduling.
type Reschedule struct {
	metav1.TypeMeta
	metav1.ObjectMeta

	// Spec represents the specification of the desired behavior of Reschedule.
	// +required
	Spec RescheduleSpec
}

// RescheduleSpec represents the specification of the desired behavior of Reschedule.
type RescheduleSpec struct {
	// TargetRefPolicy is used to select batches of resources managed by certain policies.
	// +optional
	TargetRefPolicy []PolicySelector

	// TargetRefResource is used to select resources.
	// +optional
	TargetRefResource []ResourceSelector
}

// PolicySelector selects resources by the policy they are bound to.
type PolicySelector struct {
	// Namespace of the target policy.
	// Default is empty, which means inherit from the parent object scope.
	// +optional
	Namespace string

	// Name of the target policy.
	// Default is empty, which means selecting all policies.
	// +optional
	Name string
}

// ResourceSelector selects the target resources.
type ResourceSelector struct {
	// APIVersion represents the API version of the target resources.
	// +required
	APIVersion string

	// Kind represents the Kind of the target resources.
	// +required
	Kind string

	// Namespace of the target resource.
	// Default is empty, which means inherit from the parent object scope.
	// +optional
	Namespace string

	// Name of the target resource.
	// Default is empty, which means selecting all resources.
	// +optional
	Name string

	// A label query over a set of resources.
	// If name is not empty, labelSelector will be ignored.
	// +optional
	LabelSelector *metav1.LabelSelector
}

//revive:enable:exported
```
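
To make the selection rules concrete, here is a minimal, self-contained sketch of how a `ResourceSelector` could be
matched against a resource. The types are simplified stand-ins (a plain `map[string]string` replaces
`*metav1.LabelSelector`), not the actual Karmada implementation; the logic follows the documented semantics: empty
`namespace`/`name` act as wildcards, and the label selector is ignored when `name` is set.

```go
package main

import "fmt"

// ResourceSelector is a simplified stand-in for the proposed type.
type ResourceSelector struct {
	APIVersion string
	Kind       string
	Namespace  string // empty means any namespace
	Name       string // empty means any name
	Labels     map[string]string // stands in for *metav1.LabelSelector
}

// ResourceMeta is a simplified stand-in for a resource's identifying metadata.
type ResourceMeta struct {
	APIVersion string
	Kind       string
	Namespace  string
	Name       string
	Labels     map[string]string
}

// Matches mirrors the documented rules: APIVersion/Kind must match exactly;
// empty Namespace/Name act as wildcards; if Name is set, labels are ignored.
func (s ResourceSelector) Matches(r ResourceMeta) bool {
	if s.APIVersion != r.APIVersion || s.Kind != r.Kind {
		return false
	}
	if s.Namespace != "" && s.Namespace != r.Namespace {
		return false
	}
	if s.Name != "" {
		return s.Name == r.Name // labelSelector ignored when name is set
	}
	for k, v := range s.Labels {
		if r.Labels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	sel := ResourceSelector{APIVersion: "apps/v1", Kind: "Deployment", Namespace: "default", Name: "nginx"}
	res := ResourceMeta{APIVersion: "apps/v1", Kind: "Deployment", Namespace: "default", Name: "nginx"}
	fmt.Println(sel.Matches(res)) // true
}
```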

* Add two new fields, `rescheduleTriggeredAt` (in spec) and `rescheduledAt` (in status), to ResourceBinding/ClusterResourceBinding:

```go
// ResourceBindingSpec represents the expectation of ResourceBinding.
type ResourceBindingSpec struct {
	...
	// RescheduleTriggeredAt is a timestamp representing when the referenced resource was triggered for rescheduling.
	// Only when this timestamp is later than the timestamp in status.rescheduledAt will the rescheduling actually execute.
	//
	// It is represented in RFC3339 form (like '2006-01-02T15:04:05Z') and is in UTC.
	// It is recommended to be populated by the REST handler of the command.karmada.io/Reschedule API.
	// +optional
	RescheduleTriggeredAt metav1.Time `json:"rescheduleTriggeredAt,omitempty"`
	...
}

// ResourceBindingStatus represents the overall status of the strategy as well as the referenced resources.
type ResourceBindingStatus struct {
	...
	// RescheduledAt is a timestamp representing when the scheduler finished a rescheduling.
	// It is represented in RFC3339 form (like '2006-01-02T15:04:05Z') and is in UTC.
	// +optional
	RescheduledAt metav1.Time `json:"rescheduledAt,omitempty"`
	...
}
```

### Example

Assuming there is a Deployment named `nginx` and the user wants to trigger its rescheduling,
they just need to apply the following YAML:

```yaml
apiVersion: command.karmada.io/v1alpha1
kind: Reschedule
metadata:
  name: demo-command
spec:
  targetRefResource:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
      namespace: default
  targetRefPolicy:
    - name: default-pp
      namespace: default
```
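
Since `ResourceSelector` also supports a label query when `name` is omitted, label-based targeting is possible as well.
A hypothetical Reschedule selecting all Deployments in `default` carrying an assumed `app: nginx` label might look like
the following (the lowerCamel `labelSelector` serialization is an assumption, as the proposal shows no JSON tags for
this type):

```yaml
apiVersion: command.karmada.io/v1alpha1
kind: Reschedule
metadata:
  name: demo-command-by-label
spec:
  targetRefResource:
    - apiVersion: apps/v1
      kind: Deployment
      namespace: default
      labelSelector:
        matchLabels:
          app: nginx
```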

Then, they will get a `reschedule.command.karmada.io/demo-command created` result, which means the task has started
(started, not finished). Simultaneously, they will see the new field `spec.rescheduleTriggeredAt` in the binding of the
selected resource set to the current timestamp:

```yaml
apiVersion: work.karmada.io/v1alpha2
kind: ResourceBinding
metadata:
  name: nginx-deployment
  namespace: default
spec:
  rescheduleTriggeredAt: "2024-04-17T15:04:05Z"
  ...
```

Then, rescheduling is in progress. If it succeeds, the `status.rescheduledAt` field of the binding is updated,
which indicates the scheduler finished a rescheduling; if it fails, the scheduler retries.

```yaml
apiVersion: work.karmada.io/v1alpha2
kind: ResourceBinding
metadata:
  name: nginx-deployment
  namespace: default
spec:
  rescheduleTriggeredAt: "2024-04-17T15:04:05Z"
  ...
status:
  rescheduledAt: "2024-04-17T15:04:06Z"
  conditions:
    - ...
    - lastTransitionTime: "2024-03-08T08:53:03Z"
      message: Binding has been scheduled successfully.
      reason: Success
      status: "True"
      type: Scheduled
    - lastTransitionTime: "2024-03-08T08:53:03Z"
      message: All works have been successfully applied
      reason: FullyAppliedSuccess
      status: "True"
      type: FullyApplied
```

Finally, once all works have been successfully applied, the user will observe changes in the actual distribution of the
resource template; the user can also see several recorded events on the resource template, for example:

```shell
$ kubectl --context karmada-apiserver describe deployment demo
...
Events:
  Type    Reason                  Age                From                                Message
  ----    ------                  ----               ----                                -------
  ...
  Normal  ScheduleBindingSucceed  31s                default-scheduler                   Binding has been scheduled successfully.
  Normal  GetDependenciesSucceed  31s                dependencies-distributor            Get dependencies([]) succeed.
  Normal  SyncSucceed             31s                execution-controller                Successfully applied resource(default/demo) to cluster member1
  Normal  AggregateStatusSucceed  31s (x4 over 31s)  resource-binding-status-controller  Update resourceBinding(default/demo-deployment) with AggregatedStatus successfully.
  Normal  SyncSucceed             31s                execution-controller                Successfully applied resource(default/demo1) to cluster member2
```

### Implementation logic

1) Add an aggregated API to karmada-aggregated-apiserver, as described above.

2) Add an aggregated API handler to karmada-aggregated-apiserver, which only implements the `Create` method. It
fetches all referred resources declared in `targetRefResource` or indirectly declared by `targetRefPolicy`, and then
sets the `spec.rescheduleTriggeredAt` field to the current timestamp in the corresponding ResourceBinding.

> This API has no backing resource: it is not stored, keeps no state, requires no idempotency, and implements no
> `Update` or `Delete` method. This is also why a CRD-type API was not chosen.

3) In the scheduling process, add a trigger condition: even if the `Placement` and `Replicas` of a binding are
unchanged, scheduling will be triggered if `spec.rescheduleTriggeredAt` is later than `status.rescheduledAt`. After
scheduling finishes, the scheduler updates `status.rescheduledAt` when writing the binding back.