MountDevice failed log spamming for statefulset volumes after RKE version upgrade #3203
Comments
Can someone please help us with taking a look at this? |
Hi, can someone please help with the issue? |
This is in relation to our Dell CSI storage driver, where an RKE upgrade causes kubelet to spam mount errors for statefulsets. |
Hi @bandak2, Thank you, |
Hi TJ, The CSI spec our Dell csi-unity storage driver uses is CSI Spec 1.5.
sample statefulset used: `---
apiVersion: apps/v1
|
Hi TJ, Do you have an update on this, or if you need anything from us? |
Hi Keerthi, I'll see if there is any information we can provide him to speed up this part of the process. |
I've been reviewing things, and it may help improve the productivity of our recreations if we could double check the formatting of the statefulset you provided so as to prevent any misunderstandings on our part from propagating through our tests. Please review my edits below for accuracy:
(Took me a sec to figure out but it looks like the code block syntax for github is three back ticks (```)) |
Thanks for the response @Tejeev .
|
Is this correct?
|
It seems the indentation is a little off for the volumeClaimTemplates section. You can use the following, which works for me:
|
Thanks! |
I'm told we're having some difficulty securing access to the specific server hardware (and therefore also the CSI driver) that is being used here. Can you please help us investigate by trying the following:
The key here is to try the above steps on the same hardware with the same CSI drivers, as I think trying it on different hardware with different drivers wouldn't produce useful results for this particular issue. |
Apologies for the delayed response. It was a long weekend.
We'll revert on point 1. |
We tried the upgrade scenario with RKE v1.3.20 from 1.22 through 1.24, and we hit the issue once we landed on 1.24.
RKE v1.3.20 with k8s v1.22.17-rancher1-2
upgrading to v1.23.16-rancher2-2.....
..upgrading from v1.23.16-rancher2-2 to v1.24.13-rancher2-1
We are not sure why the staging path changes once we reach k8s 1.24. |
Hi @bandak2 I've been looking into this issue. As @Tejeev mentioned, it's difficult for us to reproduce because we don't have the Dell hardware required. From going through the code, I understand that this error is happening when |
We're trying this on vanilla k8s and will update our findings from our end. |
@bandak2 , just curious if there is any update on it? |
Hi @snasovich, Thanks for checking. |
There are too many logs to sanitize given that we are including the driver logs. We're reaching out to your team via a different channel where we can share those securely, without having to go through sanitizing all of these. |
The logs for this have been provided through backend channels. Once you have some insights please let us know. |
Hi @bandak2. Thanks for the logs. I've spent some time digging through them, and the only meaningful thing I've found so far is, as you mentioned before, the new mount calls to a different path.
Before v1.24, all the mounts are to the mount point: `mount -t ext4 -o defaults /dev/dm-0 /var/lib/kubelet/plugins/kubernetes.io/csi/pv/krv11506-d05ec878ad/globalmount`
After v1.24, I see that there are still mount calls that use the same mount point, but there are also new ones that use: `mount -t ext4 -o defaults /dev/dm-1 /var/lib/kubelet/plugins/kubernetes.io/csi/csi-unity.dellemc.com/4bb10b984d5ca1bf16a8c79733866361f7061d665cba703eef3307a3426d3b21/globalmount`
I'm not sure if this is a red herring or not, though, because these mounts all seem to succeed eventually. I did however notice that |
Thanks @bfbachmann, I'll check on the link of the similar issue to see if there is anything that stands out from our end of things.
|
While at it, as you can see in the logs, during the k8s upgrade to 1.24, the older path was unpublished and then re-published with the new path. I believe the CO does that. These actions don't seem to happen while upgrading RKE, and the older path stays as the StagingPath. |
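For anyone sifting through similar logs, here is a minimal sketch that sorts staging paths into the two layouts quoted in this thread. It goes only by the shape of those two example paths (a `pv/<name>` segment versus a driver name followed by a 64-character hex segment); it is not taken from kubelet's code, and the function name and labels are hypothetical.

```python
import re

# Layout seen before v1.24 in this thread:
#   .../plugins/kubernetes.io/csi/pv/<pv-name>/globalmount
OLD_LAYOUT = re.compile(
    r"/plugins/kubernetes\.io/csi/pv/(?P<pv>[^/]+)/globalmount$"
)

# Layout seen after v1.24 in this thread:
#   .../plugins/kubernetes.io/csi/<driver-name>/<64 hex chars>/globalmount
# Assumption: the hex segment is always 64 lowercase hex characters,
# as in the single example path quoted above.
NEW_LAYOUT = re.compile(
    r"/plugins/kubernetes\.io/csi/(?P<driver>[^/]+)/(?P<digest>[0-9a-f]{64})/globalmount$"
)


def classify_staging_path(path):
    """Return 'pv-name', 'driver-hash', or 'unknown' for a staging path."""
    if OLD_LAYOUT.search(path):
        return "pv-name"
    if NEW_LAYOUT.search(path):
        return "driver-hash"
    return "unknown"


if __name__ == "__main__":
    samples = [
        "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/krv11506-d05ec878ad/globalmount",
        "/var/lib/kubelet/plugins/kubernetes.io/csi/csi-unity.dellemc.com/"
        "4bb10b984d5ca1bf16a8c79733866361f7061d665cba703eef3307a3426d3b21/globalmount",
    ]
    for sample in samples:
        print(classify_staging_path(sample), sample)
```

Running this over the target paths pulled from the mount calls in a kubelet or driver log should show which volumes are still being staged at the older path after the upgrade.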
Hi @bandak2, did you find anything interesting on the multipath front? |
RKE version
v1.3.18
Docker version
20.10.17-ce
Operating system and kernel
SUSE Linux Enterprise Server 15 SP4 -- Kernel 5.14.21-150400.22-default
Type/provider of hosts
VMware VM
cluster.yml file:
Steps to Reproduce:
Results:
SURE-6124