Add multipath-tools to openSUSE CI container #2353

Closed
LaszloGombos opened this issue May 20, 2023 · 7 comments · Fixed by #2361
Labels
enhancement Issue adding new functionality good first issue Great issue for getting started with contributing to Dracut opensuse Issue tracker for the OpenSuse distribution test Issues related to testing

Comments

@LaszloGombos
Collaborator

LaszloGombos commented May 20, 2023

I noticed that the multipath module is not tested on the openSUSE CI container because multipath-tools is not installed.
Installing multipath-tools into the container seemed to actually expose some potential issues and dracut CI test failures (see #2258).

@mwilck @aafeijoo-suse Would you support adding multipath-tools to the openSUSE CI container?

@LaszloGombos LaszloGombos added enhancement Issue adding new functionality test Issues related to testing opensuse Issue tracker for the OpenSuse distribution good first issue Great issue for getting started with contributing to Dracut labels May 20, 2023
@mwilck
Contributor

mwilck commented May 23, 2023

makes sense, yes.

@LaszloGombos
Collaborator Author

LaszloGombos commented May 27, 2023

@aafeijoo-suse @mwilck It seems quite a few tests are now failing after c08ae40 - and only on the openSUSE container.

Perhaps the Test 03 failure is the best starting point for understanding why simply installing multipath-tools made these tests fail.

@LaszloGombos LaszloGombos changed the title Add multipath-tools to Opensuse CI container Add multipath-tools to openSUSE CI container May 27, 2023
@mwilck
Contributor

mwilck commented May 30, 2023

> It seems quite a few tests are now failing after c08ae40

I would consider that a test artifact. I have strong evidence that booting opensuse with multipath works reliably in practice.

@LaszloGombos
Collaborator Author

LaszloGombos commented May 30, 2023

> > It seems quite a few tests are now failing after c08ae40
>
> I would consider that a test artifact. I have strong evidence that booting opensuse with multipath works reliably in practice.

Thanks for helping out @mwilck. The failing test cases pass if the multipath dracut module is explicitly omitted (or if the systemd module is omitted). I think these failures are not about whether multipath works when it is expected to (the happy path). They are more about multipath getting involved when it is not expected to, and introducing some kind of side effect even though it is not in use.

Test-03 is not specifically expected to test multipath dracut module.

Here is the link to the openSUSE run - https://github.com/dracutdevs/dracut/actions/runs/5098827739/jobs/9166201077

These tests are passing on fedora (and on arch under similar conditions) - https://github.com/dracutdevs/dracut/actions/runs/5098827739/jobs/9166200202

One difference I have spotted between a passing and a failing run is that on fedora (the passing case) no /etc/multipath.conf file seems to exist. This line appears only in the fedora log, not in the openSUSE log:

/etc/multipath.conf does not exist, blacklisting all devices.

I also noticed that in the fedora container the "/etc/multipath/" directory exists, but in openSUSE it does not. In addition, the openSUSE container seems to include 56-multipath.rules, but fedora does not.
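
The differences listed above could be checked in each container with a small script along these lines. This is only a sketch: the function name `report_multipath_files` is made up for illustration, and the udev rules directories it searches are assumed locations, not something taken from the CI setup.

```shell
#!/bin/sh
# Report multipath-related files under a given root directory.
# report_multipath_files is a hypothetical helper; the udev rules
# directories searched below are assumed locations.
report_multipath_files() {
    root="${1:-}"
    for path in /etc/multipath.conf /etc/multipath; do
        if [ -e "$root$path" ]; then
            echo "present: $path"
        else
            echo "missing: $path"
        fi
    done
    found=no
    for dir in /usr/lib/udev/rules.d /lib/udev/rules.d /etc/udev/rules.d; do
        if [ -e "$root$dir/56-multipath.rules" ]; then
            echo "present: $dir/56-multipath.rules"
            found=yes
        fi
    done
    [ "$found" = yes ] || echo "missing: 56-multipath.rules"
}
```

Calling `report_multipath_files` with no argument inspects the running container; passing a directory inspects an unpacked root tree instead, which makes it easy to diff the output from two containers.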

At this point my guess is that somehow this is about systemd service dependencies - as omitting systemd but including multipath on openSUSE container also resolves the issue.

Just to state the obvious - it could very well be that both fedora (and arch) and the tests have the bugs and openSUSE is acting as expected.

@LaszloGombos
Collaborator Author

As an additional data point: a PR that disables the multipath dracut module by default (including it only when it is explicitly needed) seems to make all CI tests pass on openSUSE (including tests 3, 13 and 14) - #2382

@mwilck
Contributor

mwilck commented Jun 14, 2023

> Here is the link to the openSUSE run - https://github.com/dracutdevs/dracut/actions/runs/5098827739/jobs/9166201077

The problem may be related to the fact that this test features a separate file system for /usr.

2023-05-27T12:47:28.4380294Z [   50.571367][  T451] BTRFS: device label dracut devid 1 transid 8 /dev/dm-2 scanned by (udev-worker) (451)
2023-05-27T12:47:28.5143238Z [   50.647501][  T442] BTRFS: device label dracutusr devid 1 transid 12 /dev/dm-1 scanned by (udev-worker) (442)
2023-05-27T12:47:29.6036754Z [  OK  ] Found device /dev/disk/by-label/dracut.
...
2023-05-27T12:47:29.8331781Z [  OK  ] Finished dracut initqueue hook.
2023-05-27T12:47:29.8439815Z [  OK  ] Reached target Initrd Root Device.
...
2023-05-27T12:47:31.2589086Z [  OK  ] Finished File System Check on /dev/disk/by-label/dracut.
2023-05-27T12:47:31.3234001Z          Mounting /sysroot...
2023-05-27T12:47:31.6333728Z [  OK  ] Reached target Initrd Root File System.
2023-05-27T12:47:31.7015347Z          Starting Mountpoints Configured in the Real Root...
...
2023-05-27T12:47:44.2518909Z [  OK  ] Finished Mountpoints Configured in the Real Root.
2023-05-27T12:47:44.2913991Z          Stopping Device-Mapper Multipath Device Controller...
2023-05-27T12:47:44.3514920Z [  OK  ] Stopped Device-Mapper Multipath Device Controller.
2023-05-27T12:47:46.5645581Z [  *** ] A start job is running for /dev/dis…by-id/ata-disk_usr (2s / no limit)

I am not sure what the block device layout is supposed to be. But the above log shows that two dm devices holding the file systems dracut and dracutusr had been detected by the kernel. This suggests that multipath had worked correctly (we'd need to enable udev.log-priority=debug to see this more clearly). It mounted sysroot successfully and finished initrd-parse-etc.service ("Mountpoints Configured in the Real Root"). The latter then adds a dependency on the /usr device. But the symlink /dev/disk/by-id/ata-disk_usr is missing, even though the dracutusr device above had been found by the kernel, and dracut times out waiting for it. This looks like a udev issue.
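
To make the ordering of these events easier to see in a full CI log, a filter like the following could be used. It is a rough sketch: `boot_timeline` is a hypothetical helper name, the patterns are simply taken from the log lines quoted above, and it assumes every line starts with the GitHub Actions timestamp prefix shown there.

```shell
#!/bin/sh
# Print the boot events relevant to this analysis, in log order, with
# the GitHub Actions timestamp prefix removed.
# boot_timeline is a hypothetical helper name; the patterns are taken
# from the log lines quoted above.
boot_timeline() {
    sed -E 's/^[0-9]+-[0-9]{2}-[0-9]{2}T[0-9:.]+Z //' "$1" |
        grep -E 'BTRFS: device label|Reached target Initrd Root|Multipath Device Controller|A start job is running'
}
```

Running `boot_timeline` on a downloaded job log should show whether the "Stopping Device-Mapper Multipath Device Controller" line comes before or after the start job for the /usr device begins waiting.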

We can see that multipathd is being stopped before the ata-disk_usr device has been detected. With multipath.socket missing in the initrd (we removed recently, didn't we?) this could cause multipath-related udev rules to mis-classify device. Note that multipathd is stopped before waiting for the /usr device, whereas all other essential services (in particular, udevd) are still running and aren't stopped before the timeout occurs. The only directives in multipathd.service that I can think of that could cause this are

Before=initrd-cleanup.service
Conflicts=initrd-cleanup.service

introduced by 3c244c7. Perhaps we should try removing these directives (for testing only) and see if it improves matters. The idea of 3c244c7 (stop multipath before cleaning up udev db) is not entirely wrong, but note that multipathd never writes to the udev db and thus can't corrupt it. If anything, udevd itself should be terminated before the cleanup service runs.

OTOH, the message "Starting Cleaning Up and Shutting Down Daemons" occurs only after the timeout. So it's rather unlikely that this was causing multipathd to be stopped. But if it wasn't that, what else?
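
For the test-only experiment of removing the two directives, something like the following could produce a modified copy of the unit file. This is a sketch under assumptions: `strip_initrd_cleanup_deps` is a hypothetical helper name, and it assumes the directives sit on their own lines exactly as quoted above. Where the unit file lives in the tree is left to the caller.

```shell
#!/bin/sh
# Print a copy of a unit file without the initrd-cleanup.service
# ordering/conflict directives, for test purposes only.
# strip_initrd_cleanup_deps is a hypothetical helper name; it assumes
# each directive is on its own line, exactly as quoted above.
strip_initrd_cleanup_deps() {
    sed -e '/^Before=initrd-cleanup\.service$/d' \
        -e '/^Conflicts=initrd-cleanup\.service$/d' "$1"
}
```

For example, `strip_initrd_cleanup_deps multipathd.service > multipathd.service.test` leaves all other directives untouched. Editing a copy rather than using a drop-in is deliberate: as far as I know, systemd drop-ins can only add dependencies, not remove them.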

@LaszloGombos
Collaborator Author

LaszloGombos commented Oct 29, 2023

> The problem may be related to the fact that this test features a separate file system for /usr.

Test 13 and Test 14 do not have /usr on a separate filesystem, and yet a similar issue is observed with those tests.

--- a/test/TEST-13-ENC-RAID-LVM/test.sh
+++ b/test/TEST-13-ENC-RAID-LVM/test.sh
@@ -147,7 +147,7 @@ test_setup() {
         chmod 0600 "$initdir"/etc/key
     )
     "$DRACUT" -l -i "$TESTDIR"/overlay / \
-        -o "plymouth network kernel-network-modules" \
+        -o "plymouth network kernel-network-modules multipath" \
         -a "debug" \

Related PR: #2499

Maybe for now we can just disable the multipath module for these failing tests.
