vm-checkpoint task indefinitely stuck while interacting with GC #6032

Open
ydirson opened this issue Oct 2, 2024 · 2 comments

Comments

@ydirson
Contributor

ydirson commented Oct 2, 2024

During a vm-checkpoint on XCP-ng 8.3 (so using xcp-emu-manager), I hit a case where xe vm-checkpoint never returned. According to the logs, xenopsd became unresponsive, but we cannot see why.
The log shows an SR GC running between the start of the checkpoint and its failure, with errors of its own, involving the VDI holding the VM we're attempting to checkpoint.

[10:13 host1 ~]# date
Wed Oct  2 10:13:41 CEST 2024
[10:13 host1 ~]# xe task-list uuid=42646262-60ba-6896-c759-e86f9551ee83 params=name-label,status,progress,created
name-label ( RO)    : VM.checkpoint
        status ( RO): pending
      progress ( RO): 0.056
       created ( RO): 20241001T11:11:07Z
[10:25 host1 ~]# xe vm-list  uuid=069bf5db-5e87-0f51-322d-901f8a01a742 params=VBDs
VBDs (SRO)    : 800c783f-5511-00a9-804e-76de14e89bcf; 0047698a-8f07-6366-7d30-ccaf7f2b5293

[10:25 host1 ~]# xe vbd-list uuid=0047698a-8f07-6366-7d30-ccaf7f2b5293 params=vdi-uuid 
vdi-uuid ( RO)    : 371e7067-d032-49ec-9dd1-552e0c5c68a9
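
To put a number on "never returning", here is a minimal Python sketch that parses the task-list output above and compares the created timestamp with the date output. The helper name parse_xe_params is made up for illustration, and the UTC+2 offset for CEST is hard-coded as an assumption.

```python
from datetime import datetime, timedelta, timezone

# Output copied from the `xe task-list` call above.
XE_OUTPUT = """\
name-label ( RO)    : VM.checkpoint
        status ( RO): pending
      progress ( RO): 0.056
       created ( RO): 20241001T11:11:07Z
"""


def parse_xe_params(text):
    """Parse `xe ... params=...` output into a {name: value} dict (hypothetical helper)."""
    params = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        # Drop the access marker such as "( RO)" or "(SRO)".
        name = key.split("(", 1)[0].strip()
        params[name] = value.strip()
    return params


task = parse_xe_params(XE_OUTPUT)
created = datetime.strptime(task["created"], "%Y%m%dT%H:%M:%SZ").replace(
    tzinfo=timezone.utc
)

# `date` printed "Wed Oct  2 10:13:41 CEST 2024"; CEST is assumed to be UTC+2.
cest = timezone(timedelta(hours=2))
now = datetime(2024, 10, 2, 10, 13, 41, tzinfo=cest)

pending_for = now - created
print(task["status"], pending_for)  # pending 21:02:34
```

So the task had been pending for about 21 hours at the time of the capture, with progress still at 0.056.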

The problem seems to be manifold:

  • possible locking issue allowing the VDI of a checkpointing VM to get involved in a GC
  • ... likely causing the "Failed to read from xenopsd because timeout reached." error reported by emu-manager, although the xenopsd log does not show anything
  • XAPI not noticing that the emu-manager process it called has in fact exited with an error, and keeping the task pending
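
The last point boils down to a missed exit status. As a rough illustration of the behaviour I would expect, here is a Python sketch (not XAPI's actual code, which is written in OCaml): a parent runs a helper, and a non-zero exit or a timeout moves the task out of pending. The names run_helper and fake_helper are hypothetical, and the fake helper only stands in for emu-manager by printing the same error message.

```python
import subprocess
import sys


def run_helper(argv, timeout_s):
    """Run a helper process and map its outcome to a (status, detail) pair."""
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising TimeoutExpired.
        return "failure", "helper timed out"
    if result.returncode != 0:
        # The helper exited with an error: report it instead of staying pending.
        return "failure", result.stderr.strip()
    return "success", ""


# Hypothetical stand-in for emu-manager: writes the error to stderr, exits with 1.
fake_helper = [
    sys.executable,
    "-c",
    "import sys; sys.exit('Failed to read from xenopsd because timeout reached.')",
]

status, detail = run_helper(fake_helper, timeout_s=30)
print(status, "-", detail)
```

The point of the sketch is only that both failure paths end in a final task status; whatever the real call path looks like, the task above never reached one.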

xsensource.log
daemon.log
SMlog

ydirson changed the title from "vm-checkpoint task indefinitely stuck" to "vm-checkpoint task indefinitely stuck while interacting with GC" on Oct 2, 2024
@edwintorok
Contributor

SMGC claims to have finished at 13:11:27, so I'm not sure what it was doing between then and the timeout at 13:13:

Oct  1 13:11:27 host1 systemd[1]: Started Garbage Collector for SR e6e40ee6-0491-0c3f-186c-db3d00a623a7.
Oct  1 13:13:14 host1 emu-manager-4[16196]: Failed to read from xenopsd because timeout reached.

Do you have more logs for that period? The SM log seems to be truncated, or was there really no more activity there?
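
For reference, a small Python sketch that lines up the two log lines above with the task's created timestamp from the original report. It assumes the syslog lines are in host local time (CEST, taken as UTC+2) and that the year is 2024, since syslog omits it. The helper syslog_time is made up for this example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: syslog timestamps are host local time, CEST = UTC+2.
CEST = timezone(timedelta(hours=2))

GC_LINE = "Oct  1 13:11:27 host1 systemd[1]: Started Garbage Collector for SR e6e40ee6-0491-0c3f-186c-db3d00a623a7."
TIMEOUT_LINE = "Oct  1 13:13:14 host1 emu-manager-4[16196]: Failed to read from xenopsd because timeout reached."


def syslog_time(line, year=2024):
    """Parse the 15-character syslog timestamp prefix (hypothetical helper)."""
    stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    return stamp.replace(tzinfo=CEST)


# "created" field of the VM.checkpoint task: 20241001T11:11:07Z
task_created = datetime(2024, 10, 1, 11, 11, 7, tzinfo=timezone.utc)

gc_logged = syslog_time(GC_LINE)
timed_out = syslog_time(TIMEOUT_LINE)

print((gc_logged - task_created).total_seconds())  # 20.0
print((timed_out - gc_logged).total_seconds())     # 107.0
print((timed_out - task_created).total_seconds())  # 127.0
```

Under those assumptions the GC line was logged 20 seconds after the checkpoint task was created, and emu-manager gave up 107 seconds after that, which is the window the missing SM log entries would need to cover.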

@ydirson
Contributor Author

ydirson commented Oct 2, 2024

Hm, I cannot rule out a copy-paste mistake. I will check whether I still have the full log files; otherwise I will upload the logs from the next occurrence (I already saw this 3 times in 2 days, so I'm pretty confident I can reproduce it).
