Guest with 2 distinct VBD sharing the same userdevice - following race condition around VM.revert? #5849

Open
ydirson opened this issue Jul 16, 2024 · 1 comment

Comments

@ydirson
Contributor

ydirson commented Jul 16, 2024

For some reason one of my guests now has 2 CD VBDs with the same userdevice. I understand that this is not supposed to happen, since xe normally gets a DEVICE_ALREADY_EXISTS error from XAPI:

# xe vbd-create vm-uuid=8c222e47-1de3-86b5-d0fa-64c0964026fa device=3 type=CD mode=RO
A device with the name given already exists on the selected VM
device: 3

Despite this I now have VBDs on a guest violating this constraint:

# xe vbd-list vm-uuid=8c222e47-1de3-86b5-d0fa-64c0964026fa userdevice=3
uuid ( RO)             : 70f2e451-e4d9-6ba5-f121-3fab8595835b
          vm-uuid ( RO): 8c222e47-1de3-86b5-d0fa-64c0964026fa
    vm-name-label ( RO): YDI - XCPng 8.3
         vdi-uuid ( RO): 386fd9e9-3778-47d8-ba3c-2abdb5755830
            empty ( RO): false
           device ( RO): 


uuid ( RO)             : 6af8c1c2-44d4-7ece-508e-c6ef78811c74
          vm-uuid ( RO): 8c222e47-1de3-86b5-d0fa-64c0964026fa
    vm-name-label ( RO): YDI - XCPng 8.3
         vdi-uuid ( RO): <not in database>
            empty ( RO): true
           device ( RO): xvdd

xensource.log shows for this VBD creation:

Jul 15 19:02:05 R620-1 xapi: [debug||4729200 HTTPS X.X.X.X->:::80|VM.get_allowed_VBD_devices D:e2ec9afbc29d|audit] VM.get_allowed_VBD_devices: VM = '8c222e47-1de3-86b5-d0fa-64c0964026fa (YDI - XCPng 8.3)'
Jul 15 19:02:05 R620-1 xapi: [debug||4729200 HTTPS X.X.X.X->:::80|VBD.create R:2729074df84e|audit] VBD.create: VM = '8c222e47-1de3-86b5-d0fa-64c0964026fa (YDI - XCPng 8.3)'; VDI = '386fd9e9-3778-47d8-ba3c-2abdb5755830'
Jul 15 19:02:05 R620-1 xapi: [debug||4729200 HTTPS X.X.X.X->:::80|VBD.create R:2729074df84e|vbdops] Checking whether there's a migrate in progress...
Jul 15 19:02:05 R620-1 xapi: [debug||4729200 HTTPS X.X.X.X->:::80|VBD.create R:2729074df84e|vbdops] VBD.create (device = 3; uuid = 70f2e451-e4d9-6ba5-f121-3fab8595835b; ref = OpaqueRef:43accab4-4527-4277-a667-736f4e5a0511)

This matches the XO code that, when asked to insert a CD and after determining that the guest has no CD VBD yet, queries XAPI for allowed VBD devices and creates one. This seems to imply that on this months-old VM, on which I had used that CD VBD tens of times, XAPI this one time hallucinated the lack of a CD VBD for long enough to let the XO XAPI client create a new, conflicting one and insert a VDI in it.
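To make the suspected check-then-act window concrete, here is a toy Python model of one interleaving that would produce two VBDs with the same userdevice. It is only a sketch under a loud assumption that the issue does not confirm: that the revert briefly removes the VM's VBD records and then restores them from the snapshot without going through the uniqueness check. `ToyDb`, `restore_vbd`, `CD_DEVICES` and the uuids are hypothetical names for illustration, not XAPI or XO code.

```python
CD_DEVICES = ("3", "4")  # hypothetical candidate userdevice slots


class ToyDb:
    """In-memory stand-in for the VBD table of one VM (not XAPI code)."""

    def __init__(self):
        self.vbds = []  # list of (uuid, userdevice) tuples

    def allowed_devices(self):
        used = {dev for _, dev in self.vbds}
        return [d for d in CD_DEVICES if d not in used]

    def create_vbd(self, uuid, userdevice):
        # Client-facing path: enforces userdevice uniqueness.
        if any(dev == userdevice for _, dev in self.vbds):
            raise ValueError("DEVICE_ALREADY_EXISTS")
        self.vbds.append((uuid, userdevice))

    def restore_vbd(self, uuid, userdevice):
        # ASSUMPTION: the revert path writes records without that check.
        self.vbds.append((uuid, userdevice))


def racy_interleaving():
    db = ToyDb()
    db.create_vbd("cd-original", "3")
    snapshot = list(db.vbds)

    # Revert, step 1 (assumed): current VBD records are dropped.
    db.vbds.clear()

    # Client, in the window: sees no CD VBD, asks for allowed devices,
    # and creates a new VBD on the first free slot.
    if not db.vbds:
        db.create_vbd("cd-from-client", db.allowed_devices()[0])

    # Revert, step 2 (assumed): snapshot VBDs are written back.
    for uuid, dev in snapshot:
        db.restore_vbd(uuid, dev)
    return db.vbds
```

Calling `racy_interleaving()` leaves the toy table with two distinct VBDs on the same userdevice. Whether XAPI actually goes through these steps would need to be checked against the revert code.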

The symptom for the user since then is that this 2nd CD drive, which had a VDI inserted at creation time, causes that VDI to be automatically inserted in the 1st CD drive every time the VM starts. I guess that falls into "undefined behavior", since we're in a state that's not supposed to be possible?

The log (and existing snapshot timestamps) shows this incident closely follows a VM.revert:

Jul 15 19:01:40 R620-1 xapi: [debug||4728907 HTTPS 172.16.210.100->|Async.VM.revert R:2384787d6e36|xapi_vm_snapshot] Cloning the snapshotted disks

A few additional tests through XO show that, when using VM.revert with the "snapshot before" option activated, there is enough time to request a CD insertion before the notification that VM.revert has ended comes in, and this time the extra VBD gets a distinct userdevice:

[18:53 R620-1 ~]# xe vm-cd-list uuid=8c222e47-1de3-86b5-d0fa-64c0964026fa
CD 0 VBD:
uuid ( RO)             : d214eb3b-51cd-0fbe-7fcc-3f2dedcad6b5
    vm-name-label ( RO): YDI XCPng 8.3
            empty ( RO): false
       userdevice ( RW): 4


CD 0 VDI:
uuid ( RO)             : 386fd9e9-3778-47d8-ba3c-2abdb5755830
       name-label ( RW): xcp-ng-8.3.0-rc1+ydi7.iso
    sr-name-label ( RO): ISOs
     virtual-size ( RO): 649068544


CD 1 VBD:
uuid ( RO)             : cedaa7ab-4cc3-5153-fd1e-0913c16238e2
    vm-name-label ( RO): YDI XCPng 8.3
            empty ( RO): true
       userdevice ( RW): 3

I assume once the race condition is triggered, YMMV.

This is on XAPI 1.249.32 on XCP-ng 8.2.1.
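To check whether a guest ended up in this state, the record-style output that xe prints can be grouped by userdevice. The following is a minimal Python sketch, assuming the output follows the `key ( RO): value` layout shown above with blank lines between records, and that the listing was requested with the `uuid` and `userdevice` fields (for example via `params=uuid,userdevice`). The function names are made up for this example.

```python
import re
from collections import defaultdict

# Matches lines such as "    userdevice ( RW): 3" or "uuid ( RO)   : 70f2...".
FIELD = re.compile(r"^\s*([\w-]+)\s*\(\s*\w+\)\s*:\s?(.*)$")


def parse_records(text):
    """Split xe-style list output into one dict per blank-line-separated record."""
    records, current = [], {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            current[match.group(1)] = match.group(2).strip()
        elif not line.strip() and current:
            records.append(current)
            current = {}
    if current:
        records.append(current)
    return records


def duplicate_userdevices(records):
    """Return {userdevice: [vbd uuids]} for every userdevice used more than once."""
    by_device = defaultdict(list)
    for record in records:
        device = record.get("userdevice")
        if device is not None:
            by_device[device].append(record.get("uuid"))
    return {dev: uuids for dev, uuids in by_device.items() if len(uuids) > 1}
```

Feeding it the captured listing for one VM returns an empty dict for a healthy guest, and the conflicting VBD uuids keyed by userdevice otherwise.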

@edwintorok
Contributor

Unfortunately XAPI doesn't have a transactional database, or support for transactions, so these race conditions are always possible.
In this particular case, "allowed operations"-style locking could be used to forbid further changes to the VM while the revert is running.
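As a rough illustration of that idea, here is a toy Python model in which a VBD creation is refused while a revert is registered as a current operation on the VM. It is a sketch of the locking pattern only, not XAPI code: `ToyVm`, `operation`, `blocked_by` and the operation names are hypothetical, and the model is single-threaded (a real implementation would need the check and the registration to happen atomically).

```python
from contextlib import contextmanager


class OtherOperationInProgress(Exception):
    """Raised when a conflicting operation is already running on the VM."""


class ToyVm:
    def __init__(self):
        self.current_operations = set()
        self.vbds = {}  # userdevice -> VBD uuid

    @contextmanager
    def operation(self, name, blocked_by=()):
        # Refuse to start if any conflicting operation is in flight.
        conflict = self.current_operations & set(blocked_by)
        if conflict:
            raise OtherOperationInProgress(sorted(conflict))
        self.current_operations.add(name)
        try:
            yield
        finally:
            self.current_operations.discard(name)

    def create_vbd(self, uuid, userdevice):
        # VBD creation is not allowed while a revert is running.
        with self.operation("vbd_create", blocked_by=("revert",)):
            if userdevice in self.vbds:
                raise ValueError("DEVICE_ALREADY_EXISTS")
            self.vbds[userdevice] = uuid
```

With this in place, a client calling `create_vbd` inside a `with vm.operation("revert"):` block gets `OtherOperationInProgress` and can retry once the revert has finished, instead of creating a VBD against a half-restored VM.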
