CPUManager Support (allocate exclusive CPUs to containers) #4319

Open
infinitydon opened this issue Apr 21, 2024 · 14 comments

infinitydon commented Apr 21, 2024

Is your feature request related to a problem? Please describe.

I am currently trying to bring up a single node with the CPUManager static policy, using the following k0s.yaml:

apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  creationTimestamp: null
  name: k0s
spec:
  api:
    address: 192.168.100.63
    k0sApiPort: 9443
    port: 6443
    sans:
    - 192.168.100.63
    - fe80::be24:11ff:fe0c:7c87
  controllerManager: {}
  extensions:
    helm:
      charts: null
      concurrencyLevel: 5
      repositories: null
    storage:
      create_default_storage_class: false
      type: external_storage
  installConfig:
    users:
      etcdUser: etcd
      kineUser: kube-apiserver
      konnectivityUser: konnectivity-server
      kubeAPIserverUser: kube-apiserver
      kubeSchedulerUser: kube-scheduler
  konnectivity:
    adminPort: 8133
    agentPort: 8132
  network:
    calico: null
    clusterDomain: cluster.local
    dualStack: {}
    kubeProxy:
      iptables:
        minSyncPeriod: 0s
        syncPeriod: 0s
      ipvs:
        minSyncPeriod: 0s
        syncPeriod: 0s
        tcpFinTimeout: 0s
        tcpTimeout: 0s
        udpTimeout: 0s
      metricsBindAddress: 0.0.0.0:10249
      mode: iptables
    kuberouter:
      autoMTU: true
      hairpin: Enabled
      ipMasq: false
      metricsPort: 8080
      mtu: 0
      peerRouterASNs: ""
      peerRouterIPs: ""
    nodeLocalLoadBalancing:
      envoyProxy:
        apiServerBindPort: 7443
        konnectivityServerBindPort: 7132
      type: EnvoyProxy
    podCIDR: 10.244.0.0/16
    provider: kuberouter
    serviceCIDR: 10.96.0.0/12
  scheduler: {}
  storage:
    etcd:
      externalCluster: null
      peerAddress: 192.168.100.63
    type: etcd
  telemetry:
    enabled: true
  workerProfiles:
    - name: custom-cpu
      values:
        cpuManagerPolicy: static
        reservedSystemCPUs: "0-5"

I used this command for the installation: k0s install controller --profile custom-cpu --single -c /etc/k0s/k0s.yaml

But k0s adds conflicting parameters that prevent the CPUManager policy from being applied. Below is the resulting kubelet-config.yaml:

apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous: {}
  webhook:
    cacheTTL: 0s
  x509:
    clientCAFile: /var/lib/k0s/pki/ca.crt
authorization:
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
cgroupsPerQOS: true
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
containerRuntimeEndpoint: unix:///run/k0s/containerd.sock
cpuManagerPolicy: static
cpuManagerReconcilePeriod: 0s
eventRecordQPS: 0
evictionPressureTransitionPeriod: 0s
failSwapOn: false
fileCheckFrequency: 0s
httpCheckFrequency: 0s
imageMaximumGCAge: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
kubeReservedCgroup: system.slice
kubeletCgroups: /system.slice/containerd.service
logging:
  flushFrequency: 0
  options:
    json:
      infoBufferSize: "0"
  verbosity: 0
memorySwap: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
registerWithTaints:
- effect: NoSchedule
  key: node-role.kubernetes.io/master
reservedSystemCPUs: 0-5
resolvConf: /run/systemd/resolve/resolv.conf
rotateCertificates: true
runtimeRequestTimeout: 0s
serverTLSBootstrap: true
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
tlsCipherSuites:
- TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
- TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
- TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
- TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
tlsMinVersion: VersionTLS12
volumePluginDir: /usr/libexec/k0s/kubelet-plugins/volume/exec

It is adding kubeReservedCgroup and kubeletCgroups, and these seem to be hard-coded:

grep -r -i kubeReservedCgroup
pkg/component/worker/kubelet.go:        KubeReservedCgroup string
pkg/component/worker/kubelet.go:                KubeReservedCgroup: "system.slice",
pkg/component/worker/kubelet.go:        preparedConfig.KubeReservedCgroup = kubeletConfigData.KubeReservedCgroup
grep -r -i kubeletCgroups
pkg/component/worker/kubelet.go:        KubeletCgroups     string
pkg/component/worker/kubelet.go:                KubeletCgroups:     "/system.slice/containerd.service",
pkg/component/worker/kubelet.go:        preparedConfig.KubeletCgroups = kubeletConfigData.KubeletCgroups

With this, the kubelet daemon cannot start up and fails with the following error:

run.go:74] "command failed" err="failed to validate kubelet configuration, error: invalid configuration: can't use reservedSystemCPUs (--reserved-cpus) with systemReservedCgroup

Describe the solution you would like

Support cpuManagerPolicy and reservedSystemCPUs in the kubelet configuration
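
For reference, based purely on the validation error quoted above, the combination the kubelet itself accepts would look roughly like this minimal sketch; kubeReservedCgroup is left out because kubelet rejects reservedSystemCPUs whenever kubeReservedCgroup or systemReservedCgroup is also set:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
reservedSystemCPUs: 0-5
# Intentionally no kubeReservedCgroup or systemReservedCgroup:
# reservedSystemCPUs is mutually exclusive with them, per the
# kubelet validation error quoted above.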

Describe alternatives you've considered

No response

Additional context

No response

@infinitydon infinitydon added the enhancement New feature or request label Apr 21, 2024
@jnummelin
Member

This relates heavily to the same findings as in #4255. Essentially, we need to figure out a better way to "default" the cgroup settings without hard-coding anything, as we currently do in some places.

@infinitydon
Author

Thanks @jnummelin - should I keep this issue open, or close it in favor of #4255 and use that to track this feature as well?

@twz123
Member

twz123 commented Apr 24, 2024

I think leaving this open is fair, as this is a real blocker, i.e. there's no way to use CPUManager with k0s right now, unfortunately.

(And I consider this a bug, since nobody expected CPUManager not to work with k0s.)

@twz123 twz123 added bug Something isn't working and removed enhancement New feature or request labels Apr 24, 2024
@twz123 twz123 added this to the 1.31 milestone Apr 24, 2024
@infinitydon
Author

@twz123 - Noted, I will leave it open

@ianb-mp
Contributor

ianb-mp commented May 8, 2024

> I think leaving this open is fair, as this is a real blocker, i.e. there's no way to use CPUManager with k0s right now, unfortunately.
>
> (And I consider this a bug, since nobody expected CPUManager not to work with k0s.)

To be clear, CPUManager can be used with k0s, just not together with reservedSystemCPUs. I installed k0s with the argument --kubelet-extra-args='--cpu-manager-policy=static', kubelet is running without error, and I can see log entries for cpu_manager.
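
With the static policy, exclusive cores go to containers in the Guaranteed QoS class that request a whole number of CPUs. A minimal test pod along these lines (name and image are arbitrary) can confirm the pinning:

apiVersion: v1
kind: Pod
metadata:
  name: cpu-pinning-test             # arbitrary name
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9 # any image will do
    resources:
      requests:
        cpu: "2"                     # whole CPUs, required for exclusive cores
        memory: 256Mi
      limits:
        cpu: "2"                     # limits must equal requests (Guaranteed QoS)
        memory: 256Mi

The resulting assignment then shows up in the cpu_manager log entries and in the cpu_manager_state checkpoint file under the kubelet root directory.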

@twz123
Member

twz123 commented May 8, 2024

Ah, good to know. Still, the hard-coded cgroup-related settings in k0s are something that needs to be addressed somehow.

@maxkochubey

maxkochubey commented May 14, 2024

In my case (k0sctl version v0.17.5), --kubelet-extra-args='--cpu-manager-policy=static' was not enough; I also had to set the resource reservation parameters:

installFlags:
- --debug
- --disable-components=konnectivity-server,metrics-server
- --kubelet-extra-args='--cpu-manager-policy=static --kube-reserved=cpu=500m,memory=1Gi --kube-reserved-cgroup=system.slice --kubelet-cgroups=/system.slice/containerd.service'
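
For context, these installFlags live under the host entry in k0sctl.yaml; a sketch with placeholder host details:

apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: k0s-cluster            # placeholder
spec:
  hosts:
  - role: controller+worker
    ssh:
      address: 10.0.0.1        # placeholder
    installFlags:
    - --debug
    - --disable-components=konnectivity-server,metrics-server
    - --kubelet-extra-args='--cpu-manager-policy=static --kube-reserved=cpu=500m,memory=1Gi --kube-reserved-cgroup=system.slice --kubelet-cgroups=/system.slice/containerd.service'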

@ianb-mp
Contributor

ianb-mp commented May 14, 2024

> I also had to set the resource reservation parameters:

Correct! I'm also specifying those (I should have mentioned that in my previous comment).

@turdusmerula

If you look at issue #4234, I found a hack that allows overriding kubelet parameters. The default kubelet-config.yaml overrides some parameters even if you try to pass them directly as extra args. However, you can build your own kubelet-config.yaml file and pass it to kubelet with --kubelet-extra-args=--config=/var/lib/k0s/kubelet-ext-config.yaml.

But you'll then face another problem that I have not yet solved: putting limits on a cgroup and running k0s inside that cgroup works, but kubelet will still use the system limits, and the eviction mechanism does not work as expected.
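
To illustrate the hack, a sketch of such an override file; the values are examples taken from the generated config shown earlier, and the assumption (consistent with the behaviour described above) is that the extra --config flag wins over the one k0s sets, so this file replaces the generated kubelet-config.yaml rather than merging with it:

# /var/lib/k0s/kubelet-ext-config.yaml (example values)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
reservedSystemCPUs: 0-5
# Because this file replaces the generated one, settings k0s normally
# renders have to be repeated here, for example:
containerRuntimeEndpoint: unix:///run/k0s/containerd.sock
clusterDomain: cluster.local
clusterDNS:
- 10.96.0.10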
