Liveness probe failing for the vsphere-csi-node pod #7457

Open
ShylajaDevadiga opened this issue Dec 30, 2024 · 1 comment
Labels
kind/bug Something isn't working

Comments

@ShylajaDevadiga
Contributor

Environmental Info:
RKE2 Version:
rke2 version v1.32.0-rc2+rke2r1

Node(s) CPU architecture, OS, and Version:

$ cat /etc/os-release 
PRETTY_NAME="Ubuntu 24.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.1 LTS (Noble Numbat)"
VERSION_CODENAME=noble

Cluster Configuration:
Single node

Describe the bug:
Liveness probe failing for the vsphere-csi-node pod.
Back-off restarting node-driver-registrar and vsphere-csi-node containers.

Note: This issue is not seen in v1.31.4+rke2r1.

Steps To Reproduce:
config.yaml

write-kubeconfig-mode: 644
cloud-provider-name: "rancher-vsphere"
  1. Copy the config.yaml
sudo mkdir -p /etc/rancher/rke2 && sudo cp config.yaml /etc/rancher/rke2/config.yaml
  2. Copy vsphere values
sudo mkdir -p /var/lib/rancher/rke2/server/manifests && sudo cp vsphere-values.yaml /var/lib/rancher/rke2/server/manifests/vsphere-values.yaml
  3. Install rke2

Expected behavior:
CSI and CPI pods should be running.

Actual behavior:
The vsphere-csi-node pod is in CrashLoopBackOff state.

Additional context / logs:

$ kubectl describe pod  vsphere-csi-node-l7ds9 -n kube-system 
Name:             vsphere-csi-node-l7ds9
Namespace:        kube-system
Priority:         0
Service Account:  vsphere-csi-node
Node:             sdevadiga-csi-test/10.124.139.186
Start Time:       Mon, 30 Dec 2024 19:15:15 +0000
Labels:           app=vsphere-csi-node
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/version=3.3.1-rancher8
                  controller-revision-hash=f645cccfb
                  helm.sh/chart=rancher-vsphere-csi-3.3.1-rancher800
                  pod-template-generation=4
                  role=vsphere-csi
Annotations:      <none>
Status:           Running
IP:               10.124.139.186
IPs:
  IP:           10.124.139.186
Controlled By:  DaemonSet/vsphere-csi-node
Containers:
  node-driver-registrar:
    Container ID:  containerd://cc5092467e87148265ae4fd43aa2f9ff7c4022bfee43174ed207df895d475ba6
    Image:         rancher/mirrored-sig-storage-csi-node-driver-registrar:v2.12.0
    Image ID:      docker.io/rancher/mirrored-sig-storage-csi-node-driver-registrar@sha256:e0bc3089217d78c7811281e8db2ee84b889e2f1e3cbb50bc79cdfa1a8e44c9ec
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=5
      --csi-address=$(ADDRESS)
      --kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 30 Dec 2024 19:20:52 +0000
      Finished:     Mon, 30 Dec 2024 19:21:22 +0000
    Ready:          False
    Restart Count:  5
    Liveness:       exec [/csi-node-driver-registrar --kubelet-registration-path=/var/lib/kubelet/plugins/csi.vsphere.vmware.com/csi.sock --mode=kubelet-registration-probe] delay=3s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ADDRESS:               /csi/csi.sock
      DRIVER_REG_SOCK_PATH:  /var/lib/kubelet/plugins/csi.vsphere.vmware.com/csi.sock
    Mounts:
      /csi from plugin-dir (rw)
      /registration from registration-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-99bdw (ro)
  vsphere-csi-node:
    Container ID:  containerd://ed4f6cd5be64f142970e60ae2f441044f61f2311c7c4916bcf969a92caa2a5eb
    Image:         rancher/mirrored-cloud-provider-vsphere-csi-release-driver:v3.3.1
    Image ID:      docker.io/rancher/mirrored-cloud-provider-vsphere-csi-release-driver@sha256:adb820cc2e0abe5f89aaa648c9e9bc8f4adab282fe706fa371f10e3476a65bfd
    Port:          9808/TCP
    Host Port:     9808/TCP
    Args:
      --fss-name=internal-feature-states.csi.vsphere.vmware.com
      --fss-namespace=$(CSI_NAMESPACE)
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 30 Dec 2024 19:22:34 +0000
      Finished:     Mon, 30 Dec 2024 19:22:56 +0000
    Ready:          False
    Restart Count:  7
    Liveness:       http-get http://:healthz/healthz delay=10s timeout=5s period=5s #success=1 #failure=3
    Environment:
      NODE_NAME:                           (v1:spec.nodeName)
      CSI_ENDPOINT:                       unix:///csi/csi.sock]
      MAX_VOLUMES_PER_NODE:               59
      X_CSI_MODE:                         node
      X_CSI_SPEC_REQ_VALIDATION:          false
      X_CSI_SPEC_DISABLE_LEN_CHECK:       true
      LOGGER_LEVEL:                       PRODUCTION
      CSI_NAMESPACE:                      kube-system (v1:metadata.namespace)
      NODEGETINFO_WATCH_TIMEOUT_MINUTES:  1
    Mounts:
      /csi from plugin-dir (rw)
      /dev from device-dir (rw)
      /sys/block from blocks-dir (rw)
      /sys/devices from sys-devices-dir (rw)
      /var/lib/kubelet from pods-mount-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-99bdw (ro)
  liveness-probe:
    Container ID:  containerd://a49e1b9948de0c8578e97aff98e9f53a2c7c29073921bbc6e6a3b7ddfaadc8cb
    Image:         rancher/mirrored-sig-storage-livenessprobe:v2.14.0
    Image ID:      docker.io/rancher/mirrored-sig-storage-livenessprobe@sha256:1df5b4f69c87ab95088db81d495c03a9ea06867dee8c77c16937d252558aaf5f
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=4
      --csi-address=/csi/csi.sock
    State:          Running
      Started:      Mon, 30 Dec 2024 19:15:17 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /csi from plugin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-99bdw (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True 
Volumes:
  registration-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins_registry
    HostPathType:  Directory
  plugin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/plugins/csi.vsphere.vmware.com
    HostPathType:  DirectoryOrCreate
  pods-mount-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet
    HostPathType:  Directory
  device-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /dev
    HostPathType:  
  blocks-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/block
    HostPathType:  Directory
  sys-devices-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/devices
    HostPathType:  Directory
  kube-api-access-99bdw:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                             node-role.kubernetes.io/controlplane=true:NoSchedule
                             node-role.kubernetes.io/etcd:NoExecute op=Exists
                             node-role.kubernetes.io/master:NoSchedule op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age                     From               Message
  ----     ------     ----                    ----               -------
  Normal   Scheduled  8m                      default-scheduler  Successfully assigned kube-system/vsphere-csi-node-l7ds9 to sdevadiga-csi-test
  Normal   Pulling    8m                      kubelet            Pulling image "rancher/mirrored-sig-storage-csi-node-driver-registrar:v2.12.0"
  Normal   Pulled     7m59s                   kubelet            Container image "rancher/mirrored-sig-storage-livenessprobe:v2.14.0" already present on machine
  Normal   Pulled     7m59s                   kubelet            Successfully pulled image "rancher/mirrored-sig-storage-csi-node-driver-registrar:v2.12.0" in 1.249s (1.249s including waiting). Image size: 14038309 bytes.
  Normal   Started    7m59s                   kubelet            Started container liveness-probe
  Normal   Created    7m59s                   kubelet            Created container: liveness-probe
  Normal   Started    7m28s (x2 over 7m59s)   kubelet            Started container node-driver-registrar
  Normal   Created    7m28s (x2 over 7m59s)   kubelet            Created container: node-driver-registrar
  Normal   Started    6m55s (x4 over 7m59s)   kubelet            Started container vsphere-csi-node
  Normal   Created    6m55s (x4 over 7m59s)   kubelet            Created container: vsphere-csi-node
  Normal   Pulled     6m55s (x4 over 7m59s)   kubelet            Container image "rancher/mirrored-cloud-provider-vsphere-csi-release-driver:v3.3.1" already present on machine
  Normal   Killing    6m55s (x3 over 7m35s)   kubelet            Container vsphere-csi-node failed liveness probe, will be restarted
  Warning  Unhealthy  6m15s (x15 over 7m45s)  kubelet            Liveness probe failed: Get "http://10.124.139.186:9808/healthz": dial tcp 10.124.139.186:9808: connect: connection refused
  Warning  BackOff    6m (x4 over 6m15s)      kubelet            Back-off restarting failed container vsphere-csi-node in pod vsphere-csi-node-l7ds9_kube-system(f3188d40-6691-41a1-bfcd-c48d7a273b6f)
  Warning  BackOff    2m40s (x19 over 6m57s)  kubelet            Back-off restarting failed container node-driver-registrar in pod vsphere-csi-node-l7ds9_kube-system(f3188d40-6691-41a1-bfcd-c48d7a273b6f)
  Normal   Pulled     2m25s (x5 over 7m28s)   kubelet            Container image "rancher/mirrored-sig-storage-csi-node-driver-registrar:v2.12.0" already present on machine
$ kubectl logs  vsphere-csi-node-l7ds9 -n kube-system 
Defaulted container "node-driver-registrar" out of: node-driver-registrar, vsphere-csi-node, liveness-probe
I1230 19:24:10.035668       1 main.go:150] "Version" version="v1.12.0"
I1230 19:24:10.035724       1 main.go:151] "Running node-driver-registrar" mode=""
I1230 19:24:10.035729       1 main.go:172] "Attempting to open a gRPC connection" csiAddress="/csi/csi.sock"
I1230 19:24:10.035742       1 connection.go:234] "Connecting" address="unix:///csi/csi.sock"
I1230 19:24:20.035866       1 connection.go:253] "Still connecting" address="unix:///csi/csi.sock"
I1230 19:24:30.036754       1 connection.go:253] "Still connecting" address="unix:///csi/csi.sock"
I1230 19:24:40.036769       1 connection.go:253] "Still connecting" address="unix:///csi/csi.sock"
E1230 19:24:40.036914       1 main.go:176] "Error connecting to CSI driver" err="context deadline exceeded"
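
The `kubectl logs` output above defaulted to the node-driver-registrar container, so it only shows the registrar timing out on /csi/csi.sock, not why the vsphere-csi-node container itself exits. A minimal sketch of two follow-up checks, assuming shell access to the node and the pod name from this report. The helper name `csi_socket_present` is hypothetical (not part of any tool), and the socket path is the plugin-dir host path from the pod description.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the given path exists and is a Unix socket.
csi_socket_present() {
    [ -S "$1" ]
}

# Host path behind the plugin-dir volume (mounted at /csi in the containers).
SOCK=/var/lib/kubelet/plugins/csi.vsphere.vmware.com/csi.sock

if csi_socket_present "$SOCK"; then
    echo "socket present: $SOCK"
else
    echo "socket missing: $SOCK"
fi

# Logs from the previous (terminated) run of the driver container itself,
# instead of the defaulted node-driver-registrar container.
# Skipped when kubectl is not on PATH.
if command -v kubectl >/dev/null 2>&1; then
    kubectl logs vsphere-csi-node-l7ds9 -n kube-system \
        -c vsphere-csi-node --previous || true
fi
```

If the socket is missing while the driver container keeps exiting, the registrar's "context deadline exceeded" is likely a consequence of the driver never serving on the socket rather than a separate fault.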
ShylajaDevadiga added the kind/bug label Dec 30, 2024
ShylajaDevadiga added this to the v1.32.0+rke2r1 milestone Dec 30, 2024
@ShylajaDevadiga
Contributor Author

CSI doesn't actually support 1.32 at this time. Will hold on validation until upstream puts out a new release.
rancher/vsphere-charts#98

caroline-suse-rancher removed this from the v1.32.0+rke2r1 milestone Jan 13, 2025