[BUG]: csi loses track of pv/pvc after kubernetes upgrade #653

Closed
rajbaratht opened this issue Feb 8, 2023 · 11 comments
Labels
area/csi-unity Issue pertains to the CSI Driver for Dell EMC Unity type/bug Something isn't working. This is the default label associated with a bug issue.

Comments

@rajbaratht

Bug Description

We recently upgraded Kubernetes from 1.22.x to 1.24.x. Afterwards, the pod events reported that the PV could not be mounted because the device was already mounted elsewhere. However, the pod was up and running, and I could see that the PV was mounted in the pod.

Events:
  Type     Reason       Age                    From     Message
  ----     ------       ----                   ----     -------
  Warning  FailedMount  53m (x94 over 24h)     kubelet  Unable to attach or mount volumes: unmounted volumes=[elasticsearch-master], unattached volumes=[elasticsearch-master elastic-certificates esconfig kube-api-access-96tgx]: timed out waiting for the condition
  Warning  FailedMount  9m42s (x957 over 24h)  kubelet  (combined from similar events): MountVolume.MountDevice failed for volume "dev2-p2-6eb2722f70" : rpc error: code = Internal desc =  runid=889223 device already in use and mounted elsewhere. Cannot do private mount
  Warning  FailedMount  3m48s (x72 over 24h)   kubelet  Unable to attach or mount volumes: unmounted volumes=[elasticsearch-master], unattached volumes=[elastic-certificates esconfig kube-api-access-96tgx elasticsearch-master]: timed out waiting for the condition
 kgp
NAME                                                     READY   STATUS    RESTARTS        AGE
elasticsearch-master-0                                   1/1     Running   0               150d
elasticsearch-master-1                                   1/1     Running   0               150d
elasticsearch-master-2                                   1/1     Running   0               150d

Logs

I0203 19:03:09.525732       1 csi_handler.go:751] Can't get nodeID from CSINode d2-cont-wkr6: csinode.storage.k8s.io "d2-cont-wkr6" not found
I0203 19:03:09.525747       1 connection.go:183] GRPC call: /csi.v1.Controller/ControllerUnpublishVolume
I0203 19:03:09.525750       1 connection.go:184] GRPC request: {"node_id":"d2-cont-wkr6,d2-cont-wkr6","volume_id":"dev2-p2-fd48dd5d69-iSCSI-apm00202405493-sv_458"}
I0203 19:03:09.570763       1 connection.go:186] GRPC response: {}
I0203 19:03:09.570794       1 connection.go:187] GRPC error: rpc error: code = NotFound desc =  runid=215 Find Host Failed unable to find host
I0203 19:03:09.570808       1 csi_handler.go:607] Saving detach error to "csi-72e1a9ada51d204e024cbd82911f77bf794d6e96e8aca06887087332302ef70a"
I0203 19:03:09.579845       1 connection.go:186] GRPC response: {}
I0203 19:03:09.579885       1 connection.go:187] GRPC error: rpc error: code = NotFound desc =  runid=216 Find Host Failed unable to find host
I0203 19:03:09.579896       1 csi_handler.go:607] Saving detach error to "csi-7d57b831f561bad271ddd1b008f2f03ec3ceec5f23a5ae204175d7ff41bb274c"
I0203 19:03:09.586158       1 csi_handler.go:618] Saved detach error to "csi-72e1a9ada51d204e024cbd82911f77bf794d6e96e8aca06887087332302ef70a"
I0203 19:03:09.586187       1 csi_handler.go:234] Error processing "csi-72e1a9ada51d204e024cbd82911f77bf794d6e96e8aca06887087332302ef70a": failed to detach: rpc error: code = NotFound desc =  runid=215 Find Host Failed unable to find host
I0203 19:03:09.586196       1 controller.go:167] Ignoring VolumeAttachment "csi-72e1a9ada51d204e024cbd82911f77bf794d6e96e8aca06887087332302ef70a" change
I0203 19:03:09.586906       1 csi_handler.go:618] Saved detach error to "csi-7d57b831f561bad271ddd1b008f2f03ec3ceec5f23a5ae204175d7ff41bb274c"
I0203 19:03:09.586925       1 csi_handler.go:234] Error processing "csi-7d57b831f561bad271ddd1b008f2f03ec3ceec5f23a5ae204175d7ff41bb274c": failed to detach: rpc error: code = NotFound desc =  runid=216 Find Host Failed unable to find host
I0203 19:03:09.589130       1 controller.go:167] Ignoring VolumeAttachment "csi-7d57b831f561bad271ddd1b008f2f03ec3ceec5f23a5ae204175d7ff41bb274c" change
I0203 19:03:09.854661       1 connection.go:186] GRPC response: {}
I0203 19:03:09.854704       1 connection.go:187] GRPC error: rpc error: code = NotFound desc =  runid=217 Find Host Failed unable to find host
I0203 19:03:09.854716       1 csi_handler.go:607] Saving detach error to "csi-15f95304025292f4874ff684e6c263941291a823cdd6e5186232ac42409cd812"
I0203 19:03:09.861916       1 csi_handler.go:618] Saved detach error to "csi-15f95304025292f4874ff684e6c263941291a823cdd6e5186232ac42409cd812"
I0203 19:03:09.861949       1 csi_handler.go:234] Error processing "csi-15f95304025292f4874ff684e6c263941291a823cdd6e5186232ac42409cd812": failed to detach: rpc error: code = NotFound desc =  runid=217 Find Host Failed unable to find host
I0203 19:03:09.862204       1 controller.go:167] Ignoring VolumeAttachment "csi-15f95304025292f4874ff684e6c263941291a823cdd6e5186232ac42409cd812" change
I0203 19:03:09.963513       1 connection.go:186] GRPC response: {}
I0203 19:03:09.963548       1 connection.go:187] GRPC error: rpc error: code = NotFound desc =  runid=218 Find Host Failed unable to find host
I0203 19:03:09.963560       1 csi_handler.go:607] Saving detach error to "csi-a7f28b4dd6da91414bd420040a78aa47138a8a11c15026b9209506e410d60cca"
I0203 19:03:09.969535       1 csi_handler.go:618] Saved detach error to "csi-a7f28b4dd6da91414bd420040a78aa47138a8a11c15026b9209506e410d60cca"
I0203 19:03:09.969560       1 csi_handler.go:234] Error processing "csi-a7f28b4dd6da91414bd420040a78aa47138a8a11c15026b9209506e410d60cca": failed to detach: rpc error: code = NotFound desc =  runid=218 Find Host Failed unable to find host
I0203 19:03:09.969605       1 controller.go:167] Ignoring VolumeAttachment "csi-a7f28b4dd6da91414bd420040a78aa47138a8a11c15026b9209506e410d60cca" change
I0203 19:03:11.077575       1 leaderelection.go:278] successfully renewed lease unity/external-attacher-leader-csi-unity-dellemc-com
I0203 19:03:16.090005       1 leaderelection.go:278] successfully renewed lease unity/external-attacher-leader-csi-unity-dellemc-com
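
For anyone triaging a similar external-attacher log, the sketch below groups the "failed to detach" lines by VolumeAttachment and by error message. It is only an illustration: the regular expressions are inferred from the log lines pasted above, and the function name `summarize_detach_errors` is made up for this example.

```python
import re
from collections import Counter

# Pattern inferred from the csi_handler.go "Error processing" lines above;
# other external-attacher versions may format these lines differently.
DETACH_ERROR = re.compile(
    r'Error processing "(?P<attachment>csi-[0-9a-f]+)": failed to detach: '
    r'rpc error: code = (?P<code>\w+) desc =\s+(?P<desc>.*)$'
)
# The driver prefixes each message with a per-call run id; strip it so
# identical errors group together.
RUNID = re.compile(r'runid=\d+\s*')


def summarize_detach_errors(lines):
    """Return ({attachment: (grpc_code, message)}, Counter of messages)."""
    per_attachment = {}
    reasons = Counter()
    for line in lines:
        match = DETACH_ERROR.search(line)
        if not match:
            continue
        message = RUNID.sub("", match.group("desc")).strip()
        per_attachment[match.group("attachment")] = (match.group("code"), message)
        reasons[message] += 1
    return per_attachment, reasons


# Two lines copied from the log above, plus one unrelated line.
SAMPLE = [
    'I0203 19:03:09.586187       1 csi_handler.go:234] Error processing "csi-72e1a9ada51d204e024cbd82911f77bf794d6e96e8aca06887087332302ef70a": failed to detach: rpc error: code = NotFound desc =  runid=215 Find Host Failed unable to find host',
    'I0203 19:03:09.586925       1 csi_handler.go:234] Error processing "csi-7d57b831f561bad271ddd1b008f2f03ec3ceec5f23a5ae204175d7ff41bb274c": failed to detach: rpc error: code = NotFound desc =  runid=216 Find Host Failed unable to find host',
    'I0203 19:03:11.077575       1 leaderelection.go:278] successfully renewed lease unity/external-attacher-leader-csi-unity-dellemc-com',
]

if __name__ == "__main__":
    attachments, reasons = summarize_detach_errors(SAMPLE)
    for name, (code, message) in attachments.items():
        print(f"{name[:16]}...  {code}: {message}")
    print(reasons.most_common())
```

In the excerpt above, every failing VolumeAttachment reports the same `NotFound` / "Find Host Failed" error, which suggests the driver could no longer resolve the node as a host on the array.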

Screenshots

No response

Additional Environment Information

kg pv
NAME                         CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                              STORAGECLASS   REASON   AGE
dev2-p2-2f18e76e60           1Gi        RWO            Delete           Bound    dapr-system/raft-log-dapr-placement-server-0       unity-iscsi             321d
dev2-p2-31a62a3fff           30Gi       RWO            Delete           Bound    data/elasticsearch-master-elasticsearch-master-1   unity-iscsi             319d
dev2-p2-367f656b8c           8Gi        RWO            Delete           Bound    data/redis-data-redis-replicas-2                   unity-iscsi             321d
dev2-p2-3aed428393           8Gi        RWO            Delete           Bound    data/data-rabbitmq-1                               unity-iscsi             155d
dev2-p2-5990bdfa8c           1Gi        RWO            Delete           Bound    dapr-system/raft-log-dapr-placement-server-1       unity-iscsi             321d
dev2-p2-5eb875ec05           10Gi       RWO            Delete           Bound    data/mongod-data-percona-mongo-psmdb-d-rs1-1       unity-iscsi             229d
dev2-p2-6070947843           10Gi       RWO            Delete           Bound    data/mongod-data-percona-mongo-psmdb-d-rs1-0       unity-iscsi             229d
dev2-p2-6ddd766c95           10Gi       RWO            Delete           Bound    data/mongod-data-percona-mongo-psmdb-d-cfg-0       unity-iscsi             229d
dev2-p2-6eb2722f70           30Gi       RWO            Delete           Bound    data/elasticsearch-master-elasticsearch-master-0   unity-iscsi             319d
dev2-p2-79ec95e9a3           10Gi       RWO            Delete           Bound    data/mongod-data-percona-mongo-psmdb-d-cfg-2       unity-iscsi             229d
dev2-p2-8d3c4ec9c9           8Gi        RWO            Delete           Bound    data/redis-data-redis-master-0                     unity-iscsi             321d
dev2-p2-8f229208fd           10Gi       RWO            Delete           Bound    data/mongod-data-percona-mongo-psmdb-d-cfg-1       unity-iscsi             229d
dev2-p2-932a7f64a4           1Gi        RWO            Delete           Bound    dapr-system/raft-log-dapr-placement-server-2       unity-iscsi             321d
dev2-p2-945d7d68f0           10Gi       RWO            Delete           Bound    data/mongod-data-percona-mongo-psmdb-d-rs0-1       unity-iscsi             229d
dev2-p2-a040514693           10Gi       RWO            Delete           Bound    data/mongod-data-percona-mongo-psmdb-d-rs0-2       unity-iscsi             229d
dev2-p2-c521d6ee82           8Gi        RWO            Delete           Bound    data/redis-data-redis-replicas-0                   unity-iscsi             321d
dev2-p2-e34e0056ea           8Gi        RWO            Delete           Bound    data/data-rabbitmq-2                               unity-iscsi             155d
dev2-p2-e9350a81b3           8Gi        RWO            Delete           Bound    data/data-rabbitmq-0                               unity-iscsi             155d
dev2-p2-efe6ca9547           8Gi        RWO            Delete           Bound    data/redis-data-redis-replicas-1                   unity-iscsi             321d
dev2-p2-fc701c2e79           10Gi       RWO            Delete           Bound    data/mongod-data-percona-mongo-psmdb-d-rs0-0       unity-iscsi             229d
dev2-p2-fcfa11737c           10Gi       RWO            Delete           Bound    data/mongod-data-percona-mongo-psmdb-d-rs1-2       unity-iscsi             229d
dev2-p2-fd48dd5d69           30Gi       RWO            Delete           Bound    data/elasticsearch-master-elasticsearch-master-2   unity-iscsi             319d
dev2-static-squirt-website   350Gi      RWX            Retain           Bound    backend/dev2-static-squirt-website                 smb                     175d
kg pvc -A
NAMESPACE     NAME                                          STATUS   VOLUME                       CAPACITY   ACCESS MODES   STORAGECLASS   AGE
backend       dev2-static-squirt-website                    Bound    dev2-static-squirt-website   350Gi      RWX            smb            175d
dapr-system   raft-log-dapr-placement-server-0              Bound    dev2-p2-2f18e76e60           1Gi        RWO            unity-iscsi    321d
dapr-system   raft-log-dapr-placement-server-1              Bound    dev2-p2-5990bdfa8c           1Gi        RWO            unity-iscsi    321d
dapr-system   raft-log-dapr-placement-server-2              Bound    dev2-p2-932a7f64a4           1Gi        RWO            unity-iscsi    321d
data          data-rabbitmq-0                               Bound    dev2-p2-e9350a81b3           8Gi        RWO            unity-iscsi    155d
data          data-rabbitmq-1                               Bound    dev2-p2-3aed428393           8Gi        RWO            unity-iscsi    155d
data          data-rabbitmq-2                               Bound    dev2-p2-e34e0056ea           8Gi        RWO            unity-iscsi    155d
data          elasticsearch-master-elasticsearch-master-0   Bound    dev2-p2-6eb2722f70           30Gi       RWO            unity-iscsi    319d
data          elasticsearch-master-elasticsearch-master-1   Bound    dev2-p2-31a62a3fff           30Gi       RWO            unity-iscsi    319d
data          elasticsearch-master-elasticsearch-master-2   Bound    dev2-p2-fd48dd5d69           30Gi       RWO            unity-iscsi    319d
data          mongod-data-percona-mongo-psmdb-d-cfg-0       Bound    dev2-p2-6ddd766c95           10Gi       RWO            unity-iscsi    229d
data          mongod-data-percona-mongo-psmdb-d-cfg-1       Bound    dev2-p2-8f229208fd           10Gi       RWO            unity-iscsi    229d
data          mongod-data-percona-mongo-psmdb-d-cfg-2       Bound    dev2-p2-79ec95e9a3           10Gi       RWO            unity-iscsi    229d
data          mongod-data-percona-mongo-psmdb-d-rs0-0       Bound    dev2-p2-fc701c2e79           10Gi       RWO            unity-iscsi    229d
data          mongod-data-percona-mongo-psmdb-d-rs0-1       Bound    dev2-p2-945d7d68f0           10Gi       RWO            unity-iscsi    229d
data          mongod-data-percona-mongo-psmdb-d-rs0-2       Bound    dev2-p2-a040514693           10Gi       RWO            unity-iscsi    229d
data          mongod-data-percona-mongo-psmdb-d-rs1-0       Bound    dev2-p2-6070947843           10Gi       RWO            unity-iscsi    229d
data          mongod-data-percona-mongo-psmdb-d-rs1-1       Bound    dev2-p2-5eb875ec05           10Gi       RWO            unity-iscsi    229d
data          mongod-data-percona-mongo-psmdb-d-rs1-2       Bound    dev2-p2-fcfa11737c           10Gi       RWO            unity-iscsi    229d
data          redis-data-redis-master-0                     Bound    dev2-p2-8d3c4ec9c9           8Gi        RWO            unity-iscsi    321d
data          redis-data-redis-replicas-0                   Bound    dev2-p2-c521d6ee82           8Gi        RWO            unity-iscsi    321d
data          redis-data-redis-replicas-1                   Bound    dev2-p2-efe6ca9547           8Gi        RWO            unity-iscsi    321d
data          redis-data-redis-replicas-2                   Bound    dev2-p2-367f656b8c           8Gi        RWO            unity-iscsi    321d
test          elasticsearch-master-elasticsearch-master-0   Lost     development2-p2-96af118f31   0                         unity-iscsi    159d
test          elasticsearch-master-elasticsearch-master-1   Lost     development2-p2-7c5e89fe4e   0                         unity-iscsi    159d
test          elasticsearch-master-elasticsearch-master-2   Lost     development2-p2-5fbfcc71ad   0                         unity-iscsi    159d

Steps to Reproduce

  1. Upgrade Kubernetes from v1.22 -> v1.23 -> v1.24.
  2. Check the events of any pod that uses a PV.
  3. The events will be populated with warnings and errors stating that the volume is unmounted and that the device is mounted elsewhere.

We had to restart the unity controller and node daemonset and then restart all the nodes in the cluster to fix the issue.
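
The driver-restart part of that workaround could be scripted roughly as below. This is a sketch under assumptions, not a verified procedure: the namespace `unity` is taken from the lease name in the logs, while the resource names `unity-controller` and `unity-node` are guesses that should be checked with `kubectl get deploy,ds -n unity` first. Rebooting the nodes themselves is not covered.

```python
import subprocess

# Assumed names; verify them in your own cluster before running anything.
NAMESPACE = "unity"                          # seen in the lease name in the logs
CONTROLLER = "deployment/unity-controller"   # hypothetical resource name
NODE_PLUGIN = "daemonset/unity-node"         # hypothetical resource name


def restart_commands(namespace=NAMESPACE, resources=(CONTROLLER, NODE_PLUGIN)):
    """Build the kubectl commands that restart each resource and wait for it."""
    commands = []
    for resource in resources:
        commands.append(["kubectl", "rollout", "restart", resource, "-n", namespace])
        commands.append(["kubectl", "rollout", "status", resource, "-n", namespace])
    return commands


def restart_driver(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in restart_commands():
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Defaults to a dry run so nothing is restarted by accident.
    restart_driver(dry_run=True)
```

Restarting the controller before the node plugin mirrors the order mentioned above; whether the order matters is not established in this thread.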

Expected Behavior

After upgrading Kubernetes to v1.24, there should be no volume-related warnings or errors in the pod events.

CSM Driver(s)

CSI driver v2.4

Installation Type

Helm

Container Storage Modules Enabled

No response

Container Orchestrator

Rancher v2.7.1 with Kubernetes Version: v1.22.10

Operating System

RockyLinux 8.6

@rajbaratht rajbaratht added needs-triage Issue requires triage. type/bug Something isn't working. This is the default label associated with a bug issue. labels Feb 8, 2023
@csmbot
Collaborator

csmbot commented Feb 8, 2023

@rajbaratht: Thank you for submitting this issue!

The issue is currently awaiting triage. Please make sure you have given us as much context as possible.

If the maintainers determine this is a relevant issue, they will remove the needs-triage label and assign an appropriate priority label.


We want your feedback! If you have any questions or suggestions regarding our contributing process/workflow, please reach out to us at [email protected].

@rajendraindukuri rajendraindukuri added area/csi-unity Issue pertains to the CSI Driver for Dell EMC Unity and removed needs-triage Issue requires triage. labels Feb 9, 2023
@bandak2

bandak2 commented Feb 9, 2023

Hi @rajbaratht, Thanks for reaching out to us.
While we look into the issue, can you please confirm which RKE version is being used here?
Thanks!

@bandak2

bandak2 commented Feb 9, 2023

It would help to know the RKE version used during the installation of 1.22, and whether a newer RKE version was used for the upgrade (to 1.24).

@rajbaratht
Author

rajbaratht commented Feb 9, 2023

@bandak2 We use RKEv1, and we used the same RKE version before and after the upgrade.

@rajbaratht
Author

@bandak2 I have created an SR with Dell (SR#161977353). I have uploaded logs from before and after the upgrade, along with a video of the issue.

@gashof

gashof commented Feb 9, 2023

Hi bandak2,
I am tracking the issue internally.

@shaynafinocchiaro
Collaborator

@bandak2 @gashof have any updates been made here?

@dell dell deleted a comment from rensyct Mar 23, 2023
@gallacher
Contributor

This issue is currently being investigated.

@bandak2

bandak2 commented Mar 27, 2023

After an initial investigation, this looks like an RKE issue. We have already filed request 3203 with Rancher and will follow up with them to look into it further.

@Tejeev

Tejeev commented Sep 5, 2023

@bandak2 Request 3203 is waiting for a response. Not sure if you've seen it.

@Tejeev

Tejeev commented Sep 25, 2023

@bandak2 Is this still an issue, or did you find a resolution on your side?
cc: @rajbaratht
