KEP-1421: Make individual NodeFit predicates configurable #1421

Open
ingvagabund opened this issue May 29, 2024 · 3 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

ingvagabund commented May 29, 2024

Is your feature request related to a problem? Please describe.

The NodeFit predicate was introduced to help the descheduler make better eviction decisions and avoid cases where an evicted pod has no feasible node to be rescheduled on. To enable the predicate, the DefaultEvictor plugin provides an optional nodeFit option that each plugin can use. The list of checks has been extended over time, in good faith, to improve eviction decisions. The NodeFit predicate currently consists of the following checks:

  • a pod matches a node selector
  • a pod tolerates taints
  • a pod fits resource requests
  • a node is not marked unschedulable
  • a pod does not violate inter-pod anti-affinity

Some plugins adopted the NodeFit predicate natively by invoking the additional PodFitsAnyOtherNode, PodFitsAnyNode and PodFitsCurrentNode predicates built on top of NodeFit. Nevertheless, there are cases where it is preferable to run only a subset of the existing checks or to disable the checks completely. This is problematic for plugins where the checks cannot be fully disabled.
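To make the discussion concrete, here is a minimal Go sketch of how a NodeFit-style predicate can be modelled as a list of named checks, with a PodFitsAnyOtherNode helper built on top. The Pod, Node and Check types, the check names and the field names are all simplified assumptions made for illustration; this is not the descheduler's actual code, and the inter-pod anti-affinity check is left out for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// Node and Pod are simplified, hypothetical stand-ins for the Kubernetes objects.
type Node struct {
	Name          string
	Labels        map[string]string
	Taints        []string // taint keys only, for brevity
	Unschedulable bool
	FreeCPUMilli  int64
}

type Pod struct {
	Name         string
	NodeName     string // node the pod currently runs on
	NodeSelector map[string]string
	Tolerations  []string // tolerated taint keys
	CPUMilli     int64    // requested CPU
}

// Check is one named NodeFit check; a nil error means the check passed.
type Check struct {
	Name string
	Fn   func(p Pod, n Node) error
}

func matchesNodeSelector(p Pod, n Node) error {
	for k, v := range p.NodeSelector {
		if n.Labels[k] != v {
			return fmt.Errorf("node is missing label %s=%s", k, v)
		}
	}
	return nil
}

func toleratesTaints(p Pod, n Node) error {
	tolerated := make(map[string]bool, len(p.Tolerations))
	for _, t := range p.Tolerations {
		tolerated[t] = true
	}
	for _, t := range n.Taints {
		if !tolerated[t] {
			return fmt.Errorf("taint %s is not tolerated", t)
		}
	}
	return nil
}

func fitsResources(p Pod, n Node) error {
	if p.CPUMilli > n.FreeCPUMilli {
		return errors.New("insufficient cpu")
	}
	return nil
}

func nodeSchedulable(_ Pod, n Node) error {
	if n.Unschedulable {
		return errors.New("node is unschedulable")
	}
	return nil
}

// DefaultChecks is the assumed full set of checks (names are hypothetical).
var DefaultChecks = []Check{
	{Name: "NodeSelector", Fn: matchesNodeSelector},
	{Name: "TaintToleration", Fn: toleratesTaints},
	{Name: "ResourceRequests", Fn: fitsResources},
	{Name: "NodeUnschedulable", Fn: nodeSchedulable},
}

// NodeFit runs every given check and returns the names of the ones that failed.
func NodeFit(p Pod, n Node, checks []Check) []string {
	var failed []string
	for _, c := range checks {
		if err := c.Fn(p, n); err != nil {
			failed = append(failed, c.Name)
		}
	}
	return failed
}

// PodFitsAnyOtherNode reports whether some node other than the pod's
// current one passes all of the given checks.
func PodFitsAnyOtherNode(p Pod, nodes []Node, checks []Check) bool {
	for _, n := range nodes {
		if n.Name == p.NodeName {
			continue
		}
		if len(NodeFit(p, n, checks)) == 0 {
			return true
		}
	}
	return false
}

func main() {
	nodes := []Node{{Name: "a", FreeCPUMilli: 4000}, {Name: "b", FreeCPUMilli: 100}}
	pod := Pod{Name: "web", NodeName: "a", CPUMilli: 500}
	fmt.Println(PodFitsAnyOtherNode(pod, nodes, DefaultChecks))     // all checks
	fmt.Println(PodFitsAnyOtherNode(pod, nodes, DefaultChecks[:2])) // selector and taints only
}
```

In this sketch the caller decides which checks to pass in, so running only a subset is a matter of passing a shorter slice. A plugin that hard-codes the full set gives the administrator no such choice.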

User stories

  • Plugins like RemovePodsViolatingNodeAffinity or RemovePodsViolatingNodeTaints
    have a subset of NodeFit checks enabled natively. These checks cannot be disabled
    without disabling the corresponding plugin. Instead, as an administrator
    I'd like to disable specific checks like "a pod fits resource requests" to get
    as close as possible to disabling all NodeFit checks, so that I can evict pods,
    detect the pending ones, and let cluster autoscalers or other tools reconcile
    the situation.
  • As an administrator I'd like to configure the PodLifetime plugin to check that
    there are nodes with sufficient resources that can accept any evicted pod
    even though the pod's node selector does not match any node. Then, when there are
    too many pending pods due to a node label mismatch, my automation can label
    existing nodes and allocate more resources, or have the multi-cluster scheduler
    reschedule my workload to a different cluster.
  • As an administrator I'd like to make sure the RemovePodsViolatingInterPodAntiAffinity
    plugin evicts pods even when there's currently no node with sufficient resources,
    while still respecting node affinities and taints, so that the cluster autoscaler
    can scale up new nodes when too many Pending pods are observed.
  • As an administrator I'd like to run a different scheduler than the default one.
    For that I might need to disable some of the existing NodeFit checks that
    are no longer valid or might collide with how the non-default scheduler works.
  • As an AI/ML infrastructure administrator I'd like to extend the available NodeFit
    predicates with GPU-oriented checks and enable them only for specific (custom)
    plugins/workloads.
  • As a plugin developer I'd like to specify a list of NodeFit checks that need
    to be disabled: checks that either produce suboptimal evictions or
    are re-implemented by a given plugin.

Describe the solution you'd like
Allow enabling and disabling the individual checks that the NodeFit predicate consists of.
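One possible shape for such a switch, sketched in Go purely as an illustration: the NodeFitArgs struct, its DisabledChecks field and the check names below are hypothetical and do not exist in the descheduler today. The actual API is still to be defined through the proposal.

```go
package main

import "fmt"

// Hypothetical identifiers for the five checks listed in this issue.
const (
	CheckNodeSelector         = "NodeSelector"
	CheckTaintToleration      = "TaintToleration"
	CheckResourceRequests     = "ResourceRequests"
	CheckNodeUnschedulable    = "NodeUnschedulable"
	CheckInterPodAntiAffinity = "InterPodAntiAffinity"
)

// knownChecks lists every check in the order it would run.
var knownChecks = []string{
	CheckNodeSelector,
	CheckTaintToleration,
	CheckResourceRequests,
	CheckNodeUnschedulable,
	CheckInterPodAntiAffinity,
}

// NodeFitArgs is an assumed configuration shape, not an existing descheduler type.
type NodeFitArgs struct {
	// DisabledChecks names the checks that should be skipped.
	DisabledChecks []string
}

// EnabledChecks returns the known checks minus the disabled ones, keeping
// their order. An unknown name is rejected so that typos do not pass silently.
func EnabledChecks(args NodeFitArgs) ([]string, error) {
	known := make(map[string]bool, len(knownChecks))
	for _, c := range knownChecks {
		known[c] = true
	}
	disabled := make(map[string]bool, len(args.DisabledChecks))
	for _, c := range args.DisabledChecks {
		if !known[c] {
			return nil, fmt.Errorf("unknown NodeFit check %q", c)
		}
		disabled[c] = true
	}
	enabled := make([]string, 0, len(knownChecks))
	for _, c := range knownChecks {
		if !disabled[c] {
			enabled = append(enabled, c)
		}
	}
	return enabled, nil
}

func main() {
	// Disable only the resource check, as in the first user story.
	enabled, err := EnabledChecks(NodeFitArgs{DisabledChecks: []string{CheckResourceRequests}})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(enabled)
}
```

A deny-list keeps the current behaviour as the default when nothing is configured. An allow-list would be the other obvious option, and the proposal would need to weigh the two.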

Describe alternatives you've considered
TBD through a proposal.

What version of descheduler are you using?

descheduler version: 0.30.z

Additional context

@ingvagabund ingvagabund added the kind/feature Categorizes issue or PR as related to a new feature. label May 29, 2024
@ingvagabund ingvagabund changed the title KEP-NNNN: KEP-1421: Make individual NodeFit predicates configurable May 29, 2024
@ingvagabund ingvagabund self-assigned this May 29, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 27, 2024
@k8s-triage-robot

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 26, 2024
@ingvagabund ingvagabund added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Sep 26, 2024
@googs1025
Member

/cc
