---
title: Introduce a mechanism to actively trigger rescheduling
authors:
- "@chaosi-zju"
reviewers:
- "@RainbowMango"
- "@chaunceyjiang"
- "TBD"
approvers:
- "@RainbowMango"
- "TBD"

creation-date: 2024-01-30
---

# Introduce a mechanism to actively trigger rescheduling

## Background

According to the current implementation, once the replicas of a workload are scheduled, the result remains inert and
the replicas distribution will not change.

However, in some scenarios, users want a means to actively trigger rescheduling.

### Motivation

Assume the user has propagated workloads to member clusters, and replicas were migrated due to a member cluster failure.

The user expects an approach to trigger rescheduling after the member cluster is restored, so that replicas can
migrate back.

### Goals

Introduce a mechanism to actively trigger rescheduling of workload resources.

### Applicable scenario

This feature might help in a scenario where the `replicas` in the resource template or the `placement` in the policy
has not changed, but the user wants to actively trigger rescheduling of replicas.

## Proposal

### Overview

This proposal aims to introduce a mechanism for actively triggering rescheduling, which benefits a lot in application
failover scenarios. This can be realized by introducing a new API; a new field is marked when this API is called, so
that the scheduler can perceive the need for rescheduling.

### User story

In application failover scenarios, replicas are migrated from the primary cluster to the backup cluster when the
primary cluster fails.

As a user, I want to trigger replicas to migrate back once the cluster is restored, so that:

1. the disaster recovery mode is restored, ensuring the reliability and stability of the cluster.
2. the cost of the backup cluster is saved.

### Notes/Constraints/Caveats

This ability is limited to triggering rescheduling. The scheduling result will be recalculated according to the
Placement in the current ResourceBinding, and the scheduling result is not guaranteed to be exactly the same as before
the cluster failure.

> Note: the recalculation is based on the Placement in the current `ResourceBinding`, not the Policy. So if the
> activation preference of your Policy is `Lazy`, the rescheduling is still based on the previous `ResourceBinding`
> even if the current Policy has been changed.
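
To illustrate the note above, here is a sketch of a policy using the `Lazy` activation preference (the
`activationPreference` field comes from Karmada's lazy-activation feature; the concrete names below are illustrative).
With such a policy, a triggered rescheduling still recalculates against the Placement already recorded in the
`ResourceBinding`, even if the policy itself was edited afterwards:

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: default-pp
  namespace: default
spec:
  # Lazy: policy changes do not take effect on the binding until the
  # resource template itself changes.
  activationPreference: Lazy
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  placement:
    clusterAffinity:
      clusterNames:
        - member1
        - member2
```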

## Design Details

### API change

* Introduce a new API named `Reschedule` into a new apiGroup `command.karmada.io`:

```go
//revive:disable:exported

// +genclient
// +genclient:nonNamespaced
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// Reschedule represents the desired state and status of a task which enforces a rescheduling.
type Reschedule struct {
	metav1.TypeMeta
	metav1.ObjectMeta

	// Spec represents the specification of the desired behavior of Reschedule.
	// +required
	Spec RescheduleSpec
}

// RescheduleSpec represents the specification of the desired behavior of Reschedule.
type RescheduleSpec struct {
	// TargetRefPolicy is used to select batches of resources managed by certain policies.
	// +optional
	TargetRefPolicy []PolicySelector

	// TargetRefResource is used to select resources.
	// +optional
	TargetRefResource []ResourceSelector
}

// PolicySelector selects the resources bound to the target policy.
type PolicySelector struct {
	// Namespace of the target policy.
	// Default is empty, which means inherit from the parent object scope.
	// +optional
	Namespace string

	// Name of the target policy.
	// Default is empty, which means selecting all policies.
	// +optional
	Name string
}

// ResourceSelector selects the target resources.
type ResourceSelector struct {
	// APIVersion represents the API version of the target resources.
	// +required
	APIVersion string

	// Kind represents the Kind of the target resources.
	// +required
	Kind string

	// Namespace of the target resource.
	// Default is empty, which means inherit from the parent object scope.
	// +optional
	Namespace string

	// Name of the target resource.
	// Default is empty, which means selecting all resources.
	// +optional
	Name string

	// A label query over a set of resources.
	// If name is not empty, labelSelector will be ignored.
	// +optional
	LabelSelector *metav1.LabelSelector
}

//revive:enable:exported
```

* Add two new fields, `RescheduleTriggeredAt` in the spec and `RescheduledAt` in the status, to
  ResourceBinding/ClusterResourceBinding:

```go
// ResourceBindingSpec represents the expectation of ResourceBinding.
type ResourceBindingSpec struct {
	...
	// RescheduleTriggeredAt is a timestamp representing when rescheduling of the referenced resource was triggered.
	// Only when this timestamp is later than the timestamp in status.rescheduledAt will the rescheduling actually execute.
	//
	// It is represented in RFC3339 form (like '2006-01-02T15:04:05Z') and is in UTC.
	// It is recommended to be populated by the REST handler of the command.karmada.io/Reschedule API.
	// +optional
	RescheduleTriggeredAt metav1.Time `json:"rescheduleTriggeredAt,omitempty"`
	...
}

// ResourceBindingStatus represents the overall status of the strategy as well as the referenced resources.
type ResourceBindingStatus struct {
	...
	// RescheduledAt is a timestamp representing when the scheduler finished a rescheduling.
	// It is represented in RFC3339 form (like '2006-01-02T15:04:05Z') and is in UTC.
	// +optional
	RescheduledAt metav1.Time `json:"rescheduledAt,omitempty"`
	...
}
```

### Example

Assume there is a Deployment named `nginx` and the user wants to trigger its rescheduling. He just needs to apply the
following YAML:

```yaml
apiVersion: command.karmada.io/v1alpha1
kind: Reschedule
metadata:
  name: demo-command
spec:
  targetRefResource:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
      namespace: default
  targetRefPolicy:
    - name: default-pp
      namespace: default
```
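
A minimal usage sketch (assuming the manifest above is saved as `reschedule.yaml`):

```shell
$ kubectl --context karmada-apiserver apply -f reschedule.yaml
reschedule.command.karmada.io/demo-command created
```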

Then he will get a `reschedule.command.karmada.io/demo-command created` result, which means the task has started
(attention: started, not finished). Simultaneously, he will see the new field `spec.rescheduleTriggeredAt` in the
binding of the selected resource set to the current timestamp:

```yaml
apiVersion: work.karmada.io/v1alpha2
kind: ResourceBinding
metadata:
  name: nginx-deployment
  namespace: default
spec:
  rescheduleTriggeredAt: "2024-04-17T15:04:05Z"
  ...
```

Then, rescheduling is in progress. If it succeeds, the `status.rescheduledAt` field of the binding will be updated,
which indicates that the scheduler has finished a rescheduling; if it fails, the scheduler will retry.

```yaml
apiVersion: work.karmada.io/v1alpha2
kind: ResourceBinding
metadata:
  name: nginx-deployment
  namespace: default
spec:
  rescheduleTriggeredAt: "2024-04-17T15:04:05Z"
  ...
status:
  rescheduledAt: "2024-04-17T15:04:06Z"
  conditions:
    - ...
    - lastTransitionTime: "2024-03-08T08:53:03Z"
      message: Binding has been scheduled successfully.
      reason: Success
      status: "True"
      type: Scheduled
    - lastTransitionTime: "2024-03-08T08:53:03Z"
      message: All works have been successfully applied
      reason: FullyAppliedSuccess
      status: "True"
      type: FullyApplied
```

Finally, once all works have been successfully applied, the user will observe changes in the actual distribution of the
resource template; the user can also see several recorded events on the resource template, like:

```shell
$ kubectl --context karmada-apiserver describe deployment demo
...
Events:
  Type    Reason                  Age                From                                Message
  ----    ------                  ----               ----                                -------
  ...
  Normal  ScheduleBindingSucceed  31s                default-scheduler                   Binding has been scheduled successfully.
  Normal  GetDependenciesSucceed  31s                dependencies-distributor            Get dependencies([]) succeed.
  Normal  SyncSucceed             31s                execution-controller                Successfully applied resource(default/demo) to cluster member1
  Normal  AggregateStatusSucceed  31s (x4 over 31s)  resource-binding-status-controller  Update resourceBinding(default/demo-deployment) with AggregatedStatus successfully.
  Normal  SyncSucceed             31s                execution-controller                Successfully applied resource(default/demo1) to cluster member2
```

### Implementation logic

1) Add an aggregated API into karmada-aggregated-apiserver, as described in detail above.

2) Add an aggregated API handler into karmada-aggregated-apiserver, which only implements the `Create` method. It will
fetch all resources declared in `targetRefResource` or indirectly declared by `targetRefPolicy`, and then set the
`spec.rescheduleTriggeredAt` field to the current timestamp in the corresponding ResourceBinding (a sketch of this
handler follows the list).

> This API has no resource behind it: there is nothing to store, no state, no idempotency concern, and no `Update`
> or `Delete` method needs to be implemented. This is also why we do not choose a CRD-type API.

3) In the scheduling process, add a trigger condition: even if the `Placement` and `Replicas` of a binding are
unchanged, scheduling will be triggered if `spec.rescheduleTriggeredAt` is later than `status.rescheduledAt`. After
scheduling finishes, the scheduler will update `status.rescheduledAt` when writing the binding back (see the second
sketch below).
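
The following is a minimal sketch of the `Create` handler from step 2), assuming a controller-runtime client; the
`rescheduleHandler` type, the `resolveBindings` helper, and the `commandv1alpha1` package path are hypothetical, while
`spec.rescheduleTriggeredAt` is the field introduced above:

```go
package handler

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"

	commandv1alpha1 "github.com/karmada-io/karmada/pkg/apis/command/v1alpha1" // hypothetical package path
	workv1alpha2 "github.com/karmada-io/karmada/pkg/apis/work/v1alpha2"
)

type rescheduleHandler struct {
	client client.Client
}

// Create stamps spec.rescheduleTriggeredAt on every binding matched by the
// Reschedule object, so the scheduler perceives the need for rescheduling.
func (h *rescheduleHandler) Create(ctx context.Context, obj *commandv1alpha1.Reschedule) error {
	bindings, err := h.resolveBindings(ctx, obj.Spec)
	if err != nil {
		return err
	}
	now := metav1.Now()
	for i := range bindings {
		bindings[i].Spec.RescheduleTriggeredAt = now
		if err := h.client.Update(ctx, &bindings[i]); err != nil {
			return err
		}
	}
	return nil
}

// resolveBindings fetches the ResourceBindings selected directly by
// targetRefResource or indirectly by targetRefPolicy (sketch only).
func (h *rescheduleHandler) resolveBindings(ctx context.Context, spec commandv1alpha1.RescheduleSpec) ([]workv1alpha2.ResourceBinding, error) {
	// Implementation omitted: list bindings and match them against the selectors.
	return nil, nil
}
```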
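
And a sketch of the scheduler-side trigger condition from step 3); the function names are hypothetical, while the two
timestamp fields are the ones introduced above:

```go
package scheduler

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	workv1alpha2 "github.com/karmada-io/karmada/pkg/apis/work/v1alpha2"
)

// needsReschedule returns true when the trigger timestamp is later than the
// time the scheduler last finished a rescheduling, even if Placement and
// Replicas are unchanged.
func needsReschedule(rb *workv1alpha2.ResourceBinding) bool {
	return rb.Spec.RescheduleTriggeredAt.After(rb.Status.RescheduledAt.Time)
}

// markRescheduled records completion when the scheduler writes the binding
// back, so needsReschedule becomes false until the next trigger.
func markRescheduled(rb *workv1alpha2.ResourceBinding) {
	rb.Status.RescheduledAt = metav1.Now()
}
```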