Allow users to configure Cluster Agent Priority Classes And Pod Disruption Budgets on rke2 and k3s clusters #13068

Open
6 tasks
eva-vashkevich opened this issue Jan 10, 2025 · 1 comment
Labels: area/clusterprovisioningv2, kind/enhancement, QA/dev-automation (Issues that engineers have written automation around so QA doesn't have to look at this), RFC (Originated or linked to an RFC), size/8 (Size Estimate 8)

eva-vashkevich commented Jan 10, 2025

RFC: Cluster Agent Priority Classes And Pod Disruption Budgets
JIRA: SURE-9007, SURE-8174

This feature will only be supported for node-driver provisioned and custom rke2/k3s clusters.

Feature flag

This feature will be disabled by default. Users can opt in by enabling the cluster-agent-scheduling-customization feature flag.

Global settings

Default configuration can be obtained and configured from the following global settings:
ClusterAgentDefaultPriorityClass for Priority Class:

data: {
    "preemptionPolicy": "PreemptLowerPriority",
    "value": 10000000
}

ClusterAgentDefaultPodDisruptionBudget for Pod Disruption Budget:

data: {
    "minAvailable": 1,
    "maxUnavailable": 0
}

These fields should behave similarly to the rke-metadata-config field, and expose the partial JSON representation for both settings.

Priority Class value

  • Cannot have a value larger than 1 billion or smaller than negative 1 billion.

Priority Class preemptive behavior:

  • Must be a string value equal to 'PreemptLowerPriority' or 'Never'
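
The two Priority Class constraints above could be checked along the following lines. This is only a sketch: the function name `validate_priority_class` and the error messages are hypothetical, and the actual validation in Rancher may be structured differently.

```python
# Bounds and allowed policies taken from the constraints listed above.
MAX_PRIORITY = 1_000_000_000
MIN_PRIORITY = -1_000_000_000
ALLOWED_PREEMPTION_POLICIES = {"PreemptLowerPriority", "Never"}


def validate_priority_class(setting):
    """Return a list of error strings; an empty list means the setting is valid.

    Hypothetical helper: `setting` is the parsed JSON object of the
    global setting, e.g. {"preemptionPolicy": "Never", "value": 100}.
    """
    errors = []

    value = setting.get("value")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(value, int) or isinstance(value, bool):
        errors.append("value must be an integer")
    elif not MIN_PRIORITY <= value <= MAX_PRIORITY:
        errors.append("value must be between -1 billion and 1 billion")

    policy = setting.get("preemptionPolicy")
    if not isinstance(policy, str) or policy not in ALLOWED_PREEMPTION_POLICIES:
        errors.append("preemptionPolicy must be 'PreemptLowerPriority' or 'Never'")

    return errors
```

Returning a list of errors rather than raising on the first failure lets a UI show every problem at once; that is a design choice of this sketch, not something the RFC prescribes.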

Pod Disruption Budget values will have minAvailable and maxUnavailable properties:

  • At most one of these values may be non-zero at a given time. Both can be set to 0, in which case maxUnavailable takes precedence.
  • Each must be a non-negative integer, a string representing a non-negative integer, or a string containing a whole-number percentage (e.g. “50%”, but not “50.5%”).
  • When updating the global setting, setting one field to a value of ‘0’ is equivalent to omitting it from the resulting object.
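
As an illustration of the Pod Disruption Budget rules above, here is a minimal sketch of parsing and normalizing the setting. The helper names (`parse_budget_field`, `normalize_pdb`) are hypothetical. Two points are assumptions rather than statements from the RFC: a percentage of "0%" is treated as zero, and when both fields are zero the result keeps only maxUnavailable.

```python
import re

_INTEGER = re.compile(r"^[0-9]+$")
_PERCENT = re.compile(r"^[0-9]+%$")


def parse_budget_field(raw):
    """Return an int or an 'N%' string; raise ValueError for anything else."""
    if isinstance(raw, bool):
        raise ValueError(f"invalid budget value: {raw!r}")
    if isinstance(raw, int):
        if raw < 0:
            raise ValueError(f"budget value must be non-negative: {raw!r}")
        return raw
    if isinstance(raw, str):
        text = raw.strip()
        if _INTEGER.match(text):
            return int(text)          # "2" becomes 2
        if _PERCENT.match(text):
            return text               # "50%" is kept as a string
    raise ValueError(f"invalid budget value: {raw!r}")


def _is_zero(value):
    if isinstance(value, str):        # percentage such as "50%"
        return int(value[:-1]) == 0   # assumption: "0%" counts as zero
    return value == 0


def normalize_pdb(setting):
    """Validate the setting and drop the zero-valued field."""
    min_available = parse_budget_field(setting.get("minAvailable", 0))
    max_unavailable = parse_budget_field(setting.get("maxUnavailable", 0))
    min_zero, max_zero = _is_zero(min_available), _is_zero(max_unavailable)

    if not min_zero and not max_zero:
        raise ValueError("only one of minAvailable and maxUnavailable may be non-zero")
    if not min_zero:
        return {"minAvailable": min_available}
    # Either only maxUnavailable is set, or both are zero and
    # maxUnavailable takes precedence (assumed output shape).
    return {"maxUnavailable": max_unavailable}
```

With this sketch, `normalize_pdb({"minAvailable": 1, "maxUnavailable": 0})` yields an object containing only minAvailable, which matches the rule that a field set to 0 is equivalent to an omitted one.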

This default configuration will be used during the initial provisioning of downstream clusters.

New fields during provisioning

We will need to add new elements to the cluster configuration page when this feature is enabled. They should expose properties configurable by global settings and will have the same constraints.
These properties would be found in spec.ClusterAgentDeploymentCustomization.SchedulingCustomization. Its value will be a new struct, titled AgentSchedulingCustomization. Example:

spec.ClusterAgentDeploymentCustomization.SchedulingCustomization:
{
    PriorityClass: {
        "preemptionPolicy": "PreemptLowerPriority",
        "value": 10000000
    },
    PodDisruptionBudget: {
        "minAvailable": 1,
        "maxUnavailable": 0
    }
}

These values should be pre-populated using the global default settings when creating new v1.Clusters objects. When editing existing clusters without scheduling customization fields present, these options should be disabled or their values should be empty.
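
The pre-population rule described above could look roughly like this on the form side. This is a sketch under assumptions: the function name `initial_scheduling_customization` is hypothetical, the cluster spec and global settings are modeled as plain dictionaries, and the key names simply mirror the field and setting names used in this description.

```python
import copy


def initial_scheduling_customization(cluster_spec, global_defaults, is_new_cluster):
    """Return the value the form should start with, or None for empty/disabled.

    cluster_spec:    dict standing in for the cluster's spec
    global_defaults: dict keyed by the global setting names
    """
    agent = cluster_spec.get("ClusterAgentDeploymentCustomization") or {}
    existing = agent.get("SchedulingCustomization")
    if existing is not None:
        # The cluster already carries a configuration: show it as-is.
        return copy.deepcopy(existing)

    if is_new_cluster:
        # New clusters start from the global defaults.
        return {
            "PriorityClass": copy.deepcopy(
                global_defaults["ClusterAgentDefaultPriorityClass"]),
            "PodDisruptionBudget": copy.deepcopy(
                global_defaults["ClusterAgentDefaultPodDisruptionBudget"]),
        }

    # Existing cluster without the fields: leave the options empty.
    return None
```

The deep copies keep edits made in the form from leaking back into the shared defaults object.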

If a cluster has previously been configured to use a Priority Class or Pod Disruption Budget, then the new UI fields should be rendered even if the feature has been disabled. In that case, however, they should only allow for the deletion of the configuration. Ideally, a tooltip or short explanation will be added to inform users that they must enable the feature in order to modify the configuration further.
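
One way to express the "deletion only while disabled" rule is a check that compares the old and new configuration. The function below is a hypothetical sketch, not Rancher's actual validation. It assumes that removing just one of the two sub-objects (Priority Class or Pod Disruption Budget) also counts as an allowed deletion, which the description does not state explicitly.

```python
def is_update_allowed(feature_enabled, old, new):
    """Decide whether a change to SchedulingCustomization is permitted.

    old / new: the SchedulingCustomization dict before and after the
    edit, or None when the configuration is absent.
    """
    if feature_enabled:
        return True

    old = old or {}
    new = new or {}
    # With the feature disabled, nothing may be added or changed:
    # every field that remains must already exist with the same value.
    return all(key in old and old[key] == value for key, value in new.items())
```

Under this sketch, leaving the configuration untouched or removing it entirely is always allowed, while adding a new configuration or changing a value is rejected until the feature flag is enabled.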

Please refer to the RFC linked in JIRA for more detail if needed.

Acceptance criteria:

  • The feature flag should be present, and the user should be able to enable and disable it.
  • The global settings should be present, and the user should be able to set them. Constraints should be enforced.
    UI fields:
  • If the feature is enabled, the user should be able to provision a cluster with the new configuration. The default configuration should match the global settings.
  • If the feature is enabled, the user should be able to edit the configuration of an existing cluster.
  • If the feature is not enabled, the user should not be able to create a cluster with this configuration or add it to an existing cluster.
  • If the feature is not enabled, the user should be able to see this configuration and remove it, but should not be able to modify it.
@eva-vashkevich eva-vashkevich added this to the v2.11.0 milestone Jan 10, 2025
@github-actions github-actions bot added the QA/dev-automation Issues that engineers have written automation around so QA doesn't have look at this label Jan 10, 2025
@richard-cox richard-cox added the RFC Originated or linked to an RFC label Jan 13, 2025
@gaktive gaktive added the size/8 Size Estimate 8 label Jan 14, 2025
@richard-cox
@eva-vashkevich Might have brought this up elsewhere, but make sure to review the description once / if the RFC is approved by the RATs team.
