
Qubes Backup hangs and never finishes if any of the target qubes are running #7411

Open
scallyob opened this issue Apr 2, 2022 · 8 comments
Labels
  • affects-4.1 (This issue affects Qubes OS 4.1.)
  • C: installer
  • C: storage
  • diagnosed (Technical diagnosis has been performed; see issue comments.)
  • P: major (Priority: major. Between "default" and "critical" in severity.)
  • T: bug (Type: bug report. A problem or defect resulting in unintended behavior in something that exists.)

Comments


scallyob commented Apr 2, 2022


Qubes OS release

4.1rc3 and 4.1 final release

Brief summary

Qubes Backup hangs after creating a 30 KB file if the qube being backed up is still running.
If multiple qubes are being backed up and some of them are running, it creates a larger file (presumably containing the non-running ones) before hanging.

Steps to reproduce

  1. Create a new standalone qube based on fedora-34 called backuptest.
  2. Start backuptest.
  3. Run Qubes Backup.
  4. Select just backuptest to be backed up.
  5. Leave "Compress backup" checked.
  6. Click "Next>".
  7. Select the external drive and directory.
  8. Set a password for encryption.
  9. Uncheck "Save settings as default backup profile".
  10. Leave "turn computer off" unchecked.
  11. Click "Next>".
  12. Note the warning that it will back up the state prior to starting the qube.
  13. Click the button to start the backup.
  14. Watch the backup file in a terminal (a rough command-line equivalent of these steps is sketched after this list).
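For reference, this is roughly what the same operation looks like from the dom0 command line. This is only a sketch: the qube name, destination VM, and path are placeholders from my setup, and the exact qvm-backup flags may differ slightly between releases.

    # Create and start the test qube (steps 1-2)
    qvm-create --standalone --template fedora-34 --label red backuptest
    qvm-start backuptest

    # Back up only 'backuptest', compressed, to a directory provided by another
    # VM (here assumed to be 'sys-usb' with the external drive mounted at
    # /media/external). You are prompted for the encryption passphrase.
    qvm-backup --compress --dest-vm sys-usb /media/external/backups backuptest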

Expected behavior

If I shut down backuptest and run the steps above, I see the progress bar move in the Backup tool until it reaches 100%.
This creates a 3.7 GB file that I can verify with the Restore Backup tool.
I would expect the same results when backuptest is running.

Actual behavior

  • The progress bar stays at 0% in the Backup tool.
  • The file never exceeds 30 KB, and its timestamp doesn't change after that in the terminal.
  • The backup never completes; I have to cancel the Backup tool.

scallyob added the P: default (Priority: default. Default priority for new issues, to be replaced given sufficient information.) and T: bug (Type: bug report. A problem or defect resulting in unintended behavior in something that exists.) labels Apr 2, 2022

scallyob commented Apr 2, 2022

By the way, I have been working through this for the last couple of months before posting here: https://forum.qubes-os.org/t/backup-fails-since-upgrade-to-4-1-unless-qubes-are-all-shut-down/8660/15

Originally discovered on a 4.1rc3 install, but I just did a fresh 4.1 install and found it has not been fixed.

DemiMarie added the P: major (Priority: major. Between "default" and "critical" in severity.) and needs diagnosis (Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed.) labels and removed the P: default (Priority: default. Default priority for new issues, to be replaced given sufficient information.) label Apr 2, 2022

rustybird commented Apr 3, 2022

In #7198 (comment) you wrote that you have an ext4 installation. It's intentional that the Qubes backup system will refuse to back up a running VM stored on a deprecated legacy 'file' driver pool (which was never a safe thing to do).

The unintentional part is that when the backup destination is a VM (instead of dom0) the process will indeed just hang for some reason, instead of aborting with the message "Backup error: file pool cannot export running volumes" as it should.

You may want to reinstall R4.1 with one of the three supported automatic partitioning schemes:

  • LVM Thin Provisioning (the default)
  • Btrfs
  • Standard Partition, keeping the preset File System choice of xfs for the partition at /var/lib/qubes
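If you want to confirm what you currently have before reinstalling, you can check which storage driver your pools use from dom0. A rough sketch (the exact qvm-pool invocation varies a little between releases):

    # List storage pools and their drivers. A default LVM install shows
    # 'lvm_thin', a Btrfs/XFS install shows 'file-reflink', and the deprecated
    # legacy driver shows up as 'file'.
    qvm-pool        # on some releases: qvm-pool list  or  qvm-pool -l

    # Show which pool (and therefore which driver) a given qube's volume uses:
    qvm-volume info backuptest:private | grep -i pool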


scallyob commented Apr 4, 2022

I believe I did use the default. I just went through the installer and here is what I remember doing.

When I got to the page entitled "Device Selecting", I selected /dev/sda and /dev/sdb, which I intended to use as a RAID1; then I used "custom" in order to set up the RAID.

The next screen is entitled "New Qubes OS R4.1.0 Installation".
I can't think of any reason why I would have changed it from the default of "LVM Thin Provisioning" there. Then I had to delete the old root partition and recreate it as:
  • type: RAID
  • encrypt: checked
  • RAID level: RAID1
  • format: ext4

When I first attempted installing the release candidate, I tried manually setting up the RAID with fdisk and mdadm but couldn't figure it out. The above seemed to work and is what I repeated the other day when doing a fresh install. But from what you've said, it seems likely that my lack of skill with the RAID install may be the source of my problem? Does what I describe above seem like the wrong way to do it?

@DemiMarie

The next screen is entitled "New Qubes OS R4.1.0 Installation". I can't think of any reason why I would have changed it from the default of "LVM Thin Provisioning" there. Then I had to delete the old root partition and recreate it as:
  • type: RAID
  • encrypt: checked
  • RAID level: RAID1
  • format: ext4

I suspect that is what the installer didn’t handle properly. You were probably hoping that the installer would create a RAID (either using LVM RAID or mdadm) and then do the normal provisioning on top of that. But the manual partitioning probably got it confused and caused it to leave out the LVM layer. That left you stuck with an ext4 filesystem without LVM, so Qubes OS had to resort to the old, crummy file pool. The result is that lots of stuff doesn’t work properly.

(I don’t think I have ever seen a good explanation for why the file pool is no good, so here is mine: The file pool uses Linux’s dm-snapshot driver to provide snapshots. However, dm-snapshot only supports a single origin and a single snapshot, so if one wants to support N revisions, one needs N dm-snapshot layers. Reflinks and LVM thin provisioning, on the other hand, don’t distinguish between origins and snapshots. Both the origin and snapshot are independent of each other, and one can have huge numbers of snapshots with only O(log N) overhead. So it is much more practical to implement nice features this way.)
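To make that concrete, here is a rough illustration with plain LVM commands (hypothetical volume group and volume names, not what Qubes runs internally):

    # With thin provisioning, every snapshot is just another thin volume that
    # shares blocks with the origin, so taking many of them stays cheap:
    lvcreate --type thin-pool -L 100G -n pool00 vg0
    lvcreate --thin -V 20G -n root vg0/pool00
    lvcreate -s -n root-snap1 vg0/root
    lvcreate -s -n root-snap2 vg0/root
    lvcreate -s -n root-snap3 vg0/root
    # Origin and snapshots are peers; writing to any one of them doesn't force
    # a copy-on-write chain through the others. With dm-snapshot (what the
    # legacy 'file' pool builds on), each extra revision needs another snapshot
    # layer stacked on the previous one, and writes have to traverse the stack.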

When I first attempted installing the release candidate, I tried manually setting up the RAID with fdisk and mdadm but couldn't figure it out. The above seemed to work and is what I repeated the other day when doing a fresh install. But from what you've said, it seems likely that my lack of skill with the RAID install may be the source of my problem? Does what I describe above seem like the wrong way to do it?

This is an installer bug. At the very least, the installer should emit a giant warning if it cannot create a non-deprecated pool, with the default action being to not continue.


scallyob commented Apr 4, 2022

OK, thanks for the info @DemiMarie. So I will have to wait for a new release of the installer?

Or is there a way to create a RAID setup with the current installer that won't cause this problem?

@DemiMarie

OK, thanks for the info @DemiMarie. So I will have to wait for a new release of the installer?

Or is there a way to create a RAID setup with the current installer that won't cause this problem?

BTRFS’s built-in RAID should work (subject to the usual BTRFS caveats: avoid RAID5 and RAID6, I/O performance can be unstable, etc.), and selecting XFS or BTRFS instead of ext4 will result in a usable varlibqubes pool. If you want to use dm-raid + LVM thin provisioning, I believe you will need to do it manually.
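For completeness, a very rough sketch of what doing it manually could look like (hypothetical device and volume group names; it skips the EFI/boot partitions and LUKS encryption, so treat the official custom-install docs as authoritative):

    # Assemble a RAID1 array from two partitions, then build LVM thin
    # provisioning on top of it.
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
    pvcreate /dev/md0
    vgcreate qubes_dom0 /dev/md0
    lvcreate --type thin-pool -l 90%FREE -n vm-pool qubes_dom0
    # A Qubes storage pool can then be created on the thin pool (via qvm-pool)
    # instead of falling back to the deprecated 'file' driver on ext4.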

andrewdavidwong added the diagnosed (Technical diagnosis has been performed; see issue comments.) and C: storage labels and removed the C: core and needs diagnosis (Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed.) labels Apr 6, 2022
@scallyob

Reinstalling with BTRFS fixed the backup issue, but it did greatly increase the load and lagging of the system. Maybe the I/O performance issue you mention? I described it in more detail in the forum for reference: https://forum.qubes-os.org/t/unable-to-get-fully-functional-system-when-installing-qubes-4-1-on-a-raid1/10682

@DemiMarie

Reinstalling with BTRFS fixed the backup issue, but it did greatly increase the load and lagging of the system. Maybe the I/O performance issue you mention?

It probably is. BTRFS isn’t known for being fast, and VM disks are one of the worst-case scenarios for it. You could also try XFS, which might be faster.
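If you do try XFS, one thing worth sanity-checking is that the filesystem was created with reflink support, which the reflink-based pool takes advantage of. Something like this from dom0 (assuming /var/lib/qubes is the relevant mount) should show reflink=1:

    # Verify reflink support on the filesystem backing /var/lib/qubes.
    xfs_info /var/lib/qubes | grep -o 'reflink=[01]'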

I described it in more detail in the forum for reference: https://forum.qubes-os.org/t/unable-to-get-fully-functional-system-when-installing-qubes-4-1-on-a-raid1/10682

I will take a look at that in a bit.

andrewdavidwong added the affects-4.1 (This issue affects Qubes OS 4.1.) label Aug 8, 2023
andrewdavidwong removed this from the Release 4.1 updates milestone Aug 13, 2023