Liveness probe failing for the vsphere-csi-node pod #7457

ShylajaDevadiga · 2024-12-30T19:31:54Z

Environmental Info:
RKE2 Version:
rke2 version v1.32.0-rc2+rke2r1

Node(s) CPU architecture, OS, and Version:

$ cat /etc/os-release 
PRETTY_NAME="Ubuntu 24.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.1 LTS (Noble Numbat)"
VERSION_CODENAME=noble

Cluster Configuration:
Single node

Describe the bug:
The liveness probe is failing for the vsphere-csi-node pod.
The kubelet is backing off restarting the node-driver-registrar and vsphere-csi-node containers.

Note: This issue is not seen in v1.31.4+rke2r1.

Steps To Reproduce:
config.yaml

write-kubeconfig-mode: 644
cloud-provider-name: "rancher-vsphere"

Copy the config.yaml

sudo mkdir -p /etc/rancher/rke2 && sudo cp config.yaml /etc/rancher/rke2/config.yaml

Copy vsphere values

sudo mkdir -p /var/lib/rancher/rke2/server/manifests && sudo cp vsphere-values.yaml /var/lib/rancher/rke2/server/manifests/vsphere-values.yaml

Install rke2
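
The report doesn't say how RKE2 was installed. As a sketch of one common way to do this step, the helper below pins the release from this report. The use of the get.rke2.io install script, the `INSTALL_RKE2_VERSION` variable, and the `install_rke2` wrapper name are assumptions, not the reporter's exact procedure.

```shell
# Hypothetical helper: install a pinned RKE2 server release and start it.
# Assumes the upstream install script at https://get.rke2.io and a systemd host.
install_rke2() {
    version="$1"   # e.g. v1.32.0-rc2+rke2r1 (the version from this report)
    curl -sfL https://get.rke2.io \
        | sudo env INSTALL_RKE2_VERSION="$version" sh - \
        && sudo systemctl enable --now rke2-server.service
}

# Usage (not run automatically):
#   install_rke2 "v1.32.0-rc2+rke2r1"
```

Running the same helper with v1.31.4+rke2r1 should give the baseline in which, per the note above, the issue is not seen.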

Expected behavior:
The CSI and CPI pods should be running.

Actual behavior:
The vsphere-csi-node pod is in CrashLoopBackOff state.

Additional context / logs:

$ kubectl describe pod  vsphere-csi-node-l7ds9 -n kube-system 
Name:             vsphere-csi-node-l7ds9
Namespace:        kube-system
Priority:         0
Service Account:  vsphere-csi-node
Node:             sdevadiga-csi-test/10.124.139.186
Start Time:       Mon, 30 Dec 2024 19:15:15 +0000
Labels:           app=vsphere-csi-node
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/version=3.3.1-rancher8
                  controller-revision-hash=f645cccfb
                  helm.sh/chart=rancher-vsphere-csi-3.3.1-rancher800
                  pod-template-generation=4
                  role=vsphere-csi
Annotations:      <none>
Status:           Running
IP:               10.124.139.186
IPs:
  IP:           10.124.139.186
Controlled By:  DaemonSet/vsphere-csi-node
Containers:
  node-driver-registrar:
    Container ID:  containerd://cc5092467e87148265ae4fd43aa2f9ff7c4022bfee43174ed207df895d475ba6
    Image:         rancher/mirrored-sig-storage-csi-node-driver-registrar:v2.12.0
    Image ID:      docker.io/rancher/mirrored-sig-storage-csi-node-driver-registrar@sha256:e0bc3089217d78c7811281e8db2ee84b889e2f1e3cbb50bc79cdfa1a8e44c9ec
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=5
      --csi-address=$(ADDRESS)
      --kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 30 Dec 2024 19:20:52 +0000
      Finished:     Mon, 30 Dec 2024 19:21:22 +0000
    Ready:          False
    Restart Count:  5
    Liveness:       exec [/csi-node-driver-registrar --kubelet-registration-path=/var/lib/kubelet/plugins/csi.vsphere.vmware.com/csi.sock --mode=kubelet-registration-probe] delay=3s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ADDRESS:               /csi/csi.sock
      DRIVER_REG_SOCK_PATH:  /var/lib/kubelet/plugins/csi.vsphere.vmware.com/csi.sock
    Mounts:
      /csi from plugin-dir (rw)
      /registration from registration-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-99bdw (ro)
  vsphere-csi-node:
    Container ID:  containerd://ed4f6cd5be64f142970e60ae2f441044f61f2311c7c4916bcf969a92caa2a5eb
    Image:         rancher/mirrored-cloud-provider-vsphere-csi-release-driver:v3.3.1
    Image ID:      docker.io/rancher/mirrored-cloud-provider-vsphere-csi-release-driver@sha256:adb820cc2e0abe5f89aaa648c9e9bc8f4adab282fe706fa371f10e3476a65bfd
    Port:          9808/TCP
    Host Port:     9808/TCP
    Args:
      --fss-name=internal-feature-states.csi.vsphere.vmware.com
      --fss-namespace=$(CSI_NAMESPACE)
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 30 Dec 2024 19:22:34 +0000
      Finished:     Mon, 30 Dec 2024 19:22:56 +0000
    Ready:          False
    Restart Count:  7
    Liveness:       http-get http://:healthz/healthz delay=10s timeout=5s period=5s #success=1 #failure=3
    Environment:
      NODE_NAME:                           (v1:spec.nodeName)
      CSI_ENDPOINT:                       unix:///csi/csi.sock]
      MAX_VOLUMES_PER_NODE:               59
      X_CSI_MODE:                         node
      X_CSI_SPEC_REQ_VALIDATION:          false
      X_CSI_SPEC_DISABLE_LEN_CHECK:       true
      LOGGER_LEVEL:                       PRODUCTION
      CSI_NAMESPACE:                      kube-system (v1:metadata.namespace)
      NODEGETINFO_WATCH_TIMEOUT_MINUTES:  1
    Mounts:
      /csi from plugin-dir (rw)
      /dev from device-dir (rw)
      /sys/block from blocks-dir (rw)
      /sys/devices from sys-devices-dir (rw)
      /var/lib/kubelet from pods-mount-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-99bdw (ro)
  liveness-probe:
    Container ID:  containerd://a49e1b9948de0c8578e97aff98e9f53a2c7c29073921bbc6e6a3b7ddfaadc8cb
    Image:         rancher/mirrored-sig-storage-livenessprobe:v2.14.0
    Image ID:      docker.io/rancher/mirrored-sig-storage-livenessprobe@sha256:1df5b4f69c87ab95088db81d495c03a9ea06867dee8c77c16937d252558aaf5f
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=4
      --csi-address=/csi/csi.sock
    State:          Running
      Started:      Mon, 30 Dec 2024 19:15:17 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /csi from plugin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-99bdw (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True 
Volumes:
  registration-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins_registry
    HostPathType:  Directory
  plugin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins/csi.vsphere.vmware.com
    HostPathType:  DirectoryOrCreate
  pods-mount-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet
    HostPathType:  Directory
  device-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /dev
    HostPathType:  
  blocks-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/block
    HostPathType:  Directory
  sys-devices-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/devices
    HostPathType:  Directory
  kube-api-access-99bdw:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                             node-role.kubernetes.io/controlplane=true:NoSchedule
                             node-role.kubernetes.io/etcd:NoExecute op=Exists
                             node-role.kubernetes.io/master:NoSchedule op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age                     From               Message
  ----     ------     ----                    ----               -------
  Normal   Scheduled  8m                      default-scheduler  Successfully assigned kube-system/vsphere-csi-node-l7ds9 to sdevadiga-csi-test
  Normal   Pulling    8m                      kubelet            Pulling image "rancher/mirrored-sig-storage-csi-node-driver-registrar:v2.12.0"
  Normal   Pulled     7m59s                   kubelet            Container image "rancher/mirrored-sig-storage-livenessprobe:v2.14.0" already present on machine
  Normal   Pulled     7m59s                   kubelet            Successfully pulled image "rancher/mirrored-sig-storage-csi-node-driver-registrar:v2.12.0" in 1.249s (1.249s including waiting). Image size: 14038309 bytes.
  Normal   Started    7m59s                   kubelet            Started container liveness-probe
  Normal   Created    7m59s                   kubelet            Created container: liveness-probe
  Normal   Started    7m28s (x2 over 7m59s)   kubelet            Started container node-driver-registrar
  Normal   Created    7m28s (x2 over 7m59s)   kubelet            Created container: node-driver-registrar
  Normal   Started    6m55s (x4 over 7m59s)   kubelet            Started container vsphere-csi-node
  Normal   Created    6m55s (x4 over 7m59s)   kubelet            Created container: vsphere-csi-node
  Normal   Pulled     6m55s (x4 over 7m59s)   kubelet            Container image "rancher/mirrored-cloud-provider-vsphere-csi-release-driver:v3.3.1" already present on machine
  Normal   Killing    6m55s (x3 over 7m35s)   kubelet            Container vsphere-csi-node failed liveness probe, will be restarted
  Warning  Unhealthy  6m15s (x15 over 7m45s)  kubelet            Liveness probe failed: Get "http://10.124.139.186:9808/healthz": dial tcp 10.124.139.186:9808: connect: connection refused
  Warning  BackOff    6m (x4 over 6m15s)      kubelet            Back-off restarting failed container vsphere-csi-node in pod vsphere-csi-node-l7ds9_kube-system(f3188d40-6691-41a1-bfcd-c48d7a273b6f)
  Warning  BackOff    2m40s (x19 over 6m57s)  kubelet            Back-off restarting failed container node-driver-registrar in pod vsphere-csi-node-l7ds9_kube-system(f3188d40-6691-41a1-bfcd-c48d7a273b6f)
  Normal   Pulled     2m25s (x5 over 7m28s)   kubelet            Container image "rancher/mirrored-sig-storage-csi-node-driver-registrar:v2.12.0" already present on machine

$ kubectl logs  vsphere-csi-node-l7ds9 -n kube-system 
Defaulted container "node-driver-registrar" out of: node-driver-registrar, vsphere-csi-node, liveness-probe
I1230 19:24:10.035668       1 main.go:150] "Version" version="v1.12.0"
I1230 19:24:10.035724       1 main.go:151] "Running node-driver-registrar" mode=""
I1230 19:24:10.035729       1 main.go:172] "Attempting to open a gRPC connection" csiAddress="/csi/csi.sock"
I1230 19:24:10.035742       1 connection.go:234] "Connecting" address="unix:///csi/csi.sock"
I1230 19:24:20.035866       1 connection.go:253] "Still connecting" address="unix:///csi/csi.sock"
I1230 19:24:30.036754       1 connection.go:253] "Still connecting" address="unix:///csi/csi.sock"
I1230 19:24:40.036769       1 connection.go:253] "Still connecting" address="unix:///csi/csi.sock"
E1230 19:24:40.036914       1 main.go:176] "Error connecting to CSI driver" err="context deadline exceeded"
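
Because `kubectl logs` defaulted to the node-driver-registrar container, the output above doesn't show why the vsphere-csi-node container itself keeps exiting. The following is a minimal sketch for listing the crash-looping containers from `kubectl describe pod` output, so that each one's previous logs can be pulled. The helper name `crashlooping_containers` is hypothetical, and the parsing assumes the indentation layout that `kubectl describe` shows in this report.

```shell
# Hypothetical helper: read `kubectl describe pod` output on stdin and print
# the name of every container whose current state reason is CrashLoopBackOff.
crashlooping_containers() {
    awk '
        /^Containers:/ { in_c = 1; next }          # start of the Containers section
        /^[^ ]/        { in_c = 0 }                # any other top-level key ends it
        in_c && /^  [^ ].*:$/ {                    # two-space indent: a container name
            name = $1; sub(/:$/, "", name)
        }
        in_c && /^      Reason:/ && $2 == "CrashLoopBackOff" { print name }
    '
}

# Usage (not run automatically; pod name taken from this report):
#   kubectl describe pod vsphere-csi-node-l7ds9 -n kube-system | crashlooping_containers
#   kubectl logs vsphere-csi-node-l7ds9 -n kube-system -c vsphere-csi-node --previous
```

The `-c` flag selects a specific container and `--previous` shows the logs of its last terminated instance, which is usually where the reason for the exit appears.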


ShylajaDevadiga · 2024-12-30T20:32:21Z

The vSphere CSI driver doesn't actually support Kubernetes 1.32 at this time. Will hold on validation until upstream puts out a new release.
rancher/vsphere-charts#98

ShylajaDevadiga added the kind/bug label Dec 30, 2024

ShylajaDevadiga added this to the v1.32.0+rke2r1 milestone Dec 30, 2024

caroline-suse-rancher removed this from the v1.32.0+rke2r1 milestone Jan 13, 2025
