Stage0: Initrd not discovered when booting? #5048

Open
CookieComputing opened this issue Jan 23, 2025 · 5 comments

@CookieComputing
CookieComputing commented Jan 23, 2025

Hi folks, I'm running into some issues (or perhaps I've missed some documentation somewhere) when trying to run the stage0 bootloader in our stack with SEV-SNP, and I'm wondering if anyone has run into something similar!

I'm trying to use direct boot, which appears to have been supported in #4189. However, with SEV-SNP enabled the kernel does not seem to load the initrd and instead complains that it cannot boot without a root device (I expected the kernel to load the initrd, which would then find the root device).

I've attached two files with some logs detailing the issue I'm seeing:

No SEV-SNP

cvm_stage0_no_sev_snp.txt

Launched with:

/tmp/cvm/qemu-system-x86_64 -cpu EPYC-v4 -smp 4 -m 3584 -enable-kvm -nographic -drive if=virtio,format=qcow2,file=/tmp/processvm-agentLV4zdm/processvm.qcow2 -machine q35  -drive if=virtio,format=raw,file=/tmp/ciseed.iso -bios /packages/cvm/stage0_bin -kernel /cvm/launch/cvm_vmlinuz -initrd /cvm/launch/layer_no_integrity.cpio.gz -append "console=ttyS0 audit=0 biosdevname=0 net.ifnames=0" -object memory-backend-memfd,id=ram1,size=3584M,share=true,reserve=false -vga none

With SEV SNP

cvm_stage0_with_sev_snp.txt

Launched with:

/tmp/cvm/qemu-system-x86_64 -cpu EPYC-v4 -smp 4 -m 3584 -enable-kvm -nographic -drive if=virtio,format=qcow2,file=/tmp/processvm-agentLV4zdm/processvm.qcow2 -machine q35  -drive if=virtio,format=raw,file=/tmp/ciseed.iso -bios /packages/cvm/stage0_bin -kernel /cvm/launch/cvm_vmlinuz -initrd /cvm/launch/layer_no_integrity.cpio.gz -append "console=ttyS0 audit=0 biosdevname=0 net.ifnames=0 -drive if=virtio,format=raw,file=/tmp/cvm_data.iso"  -object memory-backend-memfd,id=ram1,size=3584M,share=true,reserve=false -machine q35,confidential-guest-support=sev0,memory-backend=ram1 -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,author-key-enabled=false -vga none

Note both were launched with q35, but I recall also seeing some similar issues with microvm, see:
cvm_stage0_with_sev_snp_microvm.txt

From what I can tell from the logs, the launch with SEV-SNP does read the initrd file, puts something inside zero_page (https://github.com/project-oak/oak/blob/main/stage0/src/lib.rs#L174-L179), and jumps directly into the kernel, as expected:

stage0 INFO: jumping to kernel at 0x0000000002000200
[    0.000000] Linux version 6.11.2-0_fbk1_ga4cb149fd9cd (clang version 15.0.7 (Red Hat 15.0.7-2.el9), LLD 15.0.7) #1 SMP Wed Oct 16 08:16:41 PDT 2024
...

However, when launching without SEV-SNP, I notice this log:

[    3.785277] Trying to unpack rootfs image as initramfs...

Whereas there is no equivalent log when I launch with the SEV-SNP arguments.
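To make that comparison repeatable across the attached logs, here is a minimal sketch that looks for the unpack message in a saved console log. The function name is my own, and the marker string is simply copied from the line quoted above.

```python
# Marker copied from the kernel log line seen in the run without SEV-SNP.
UNPACK_MARKER = "Trying to unpack rootfs image as initramfs"


def initramfs_unpack_lines(log_text):
    """Return the log lines that report the initramfs being unpacked."""
    return [line for line in log_text.splitlines() if UNPACK_MARKER in line]
```

Running it over the text of each log file (for example `initramfs_unpack_lines(open("cvm_stage0_with_sev_snp.txt").read())`) returns an empty list when the kernel never reached the unpack step.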

I've also noticed that #4189 (comment) launches with a bin/initramfs file, whereas we're loading in with a layer.cpio.gz initrd file, which is obviously compressed. Not sure if this is an issue, since I think set_initial_ram_disk() should just load the initrd into the page, which the kernel then converts into an initramfs? Not sure if I've misunderstood that part. We're basically just running an upstream 6.11 kernel with some light patches that do not affect the boot process.
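To rule out the file format itself, a small sketch like the one below can identify what the initrd starts with by checking its leading magic bytes. The helper name and the table are my own. The list covers common compression formats plus uncompressed cpio, and which of them a given kernel can actually unpack depends on its build configuration.

```python
# Leading magic bytes of common initrd container/compression formats.
# Assumption: this table is illustrative, not the kernel's own detection logic.
MAGICS = [
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\xfd\x37\x7a\x58\x5a\x00", "xz"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
    (b"\x02\x21\x4c\x18", "lz4 (legacy frame)"),
    (b"\x89\x4c\x5a\x4f", "lzo"),
    (b"070701", "cpio newc (uncompressed)"),
    (b"070702", "cpio newc with CRC (uncompressed)"),
]


def identify_initrd(header):
    """Guess the initrd format from the first few bytes of the file."""
    for magic, name in MAGICS:
        if header.startswith(magic):
            return name
    return "unknown"
```

For example, `identify_initrd(open("/cvm/launch/layer_no_integrity.cpio.gz", "rb").read(8))` reports what the file actually begins with, regardless of its extension.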

Any support would be greatly appreciated, thanks!

@conradgrobler
Collaborator

Hi, my best guess from looking at the logs is that the difference in behaviour is caused by the different append statements (causing different kernel command-lines) rather than whether AMD SEV-SNP is enabled or not.

-append "console=ttyS0 audit=0 biosdevname=0 net.ifnames=0"

vs

-append "console=ttyS0 audit=0 biosdevname=0 net.ifnames=0 -drive if=virtio,format=raw,file=/tmp/cvm_data.iso"

A compressed initrd file should not make a difference as long as it is a format the kernel understands, since Stage 0 just writes it to memory and passes it to the kernel unmodified.
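A quick way to catch this kind of quoting slip is to split the launch command the way a shell would and inspect what actually ends up as the -append value. The sketch below is only a heuristic with helper names of my own choosing: it flags any kernel command-line token that starts with a dash, which could in principle also match legitimate tokens such as a bare `--`.

```python
import shlex


def append_value(command):
    """Return the single argument that follows -append after shell-style splitting."""
    argv = shlex.split(command)
    return argv[argv.index("-append") + 1]


def stray_qemu_flags(kernel_cmdline):
    """Heuristic: kernel command-line tokens that look like QEMU options."""
    return [tok for tok in kernel_cmdline.split()
            if tok.startswith("-") and len(tok) > 1]
```

Applied to the SEV-SNP launch command above, `stray_qemu_flags(append_value(command))` picks out `-drive`, showing that the drive option was swallowed into the kernel command line instead of being passed to QEMU.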

@CookieComputing
Author

CookieComputing commented Jan 24, 2025

Thanks for the suggestion! Unfortunately, I think that might've been a red herring, as another run with the corrected -append line seemed to also run into the same issue: cvm_stage0_with_sev_snp_corrected_cmdline.txt

From that file, it definitely seems like our other partitions are present (vda2 is our rootfs), but it's strange that initrd is for some reason not triggering. Adding root=PARTUUID=7e60d416-9d8b-456f-a5e5-013616a4b808 (the UUID of vda2) will let us boot directly to the disk, but this skips some critical boot procedures in our initrd, so we're hoping to figure out what's going on with our setup

@conradgrobler
Collaborator

Looking at this in more detail, it looks like there is a compatibility problem between the QEMU virtio drivers and SEV-SNP. The kernel seems to find the initial RAM file system, but fails when trying to mount it.

Which version of QEMU are you using? I would recommend trying the latest version of QEMU. You could also try the following patch, which was needed for virtio-net-pci. It might also be needed for virtio disks (I haven't tested that, since we don't expose any disk devices to the guest): dingelish/qemu@876e262

It might also be worth checking what you are running on the host. All our successful testing was with the latest release of Debian stable and the latest upstream kernel release.

@conradgrobler
Collaborator

I missed the bit in your answer above that you can boot using the disk. That would mean my suggestion above that it might be a virtio compatibility issue is incorrect.

It still seems like it is finding the initial RAM disk, but cannot start it because it tries to mount it to the root and fails. I am not sure what the cause would be. Have you tried other kernels (like ours)?

@CookieComputing
Author

CookieComputing commented Jan 24, 2025

> Which version of QEMU are you using? I would recommend trying the latest version of QEMU

Just in case, I'll mention that we're using one of the hyperscaler QEMU versions: https://git.centos.org/rpms/qemu/tree/c9s-sig-hyperscale.

This is basically just upstream 9.1+ QEMU, with some SNP patches applied. It might not be the issue as you mentioned.

> It still seems like it is finding the initial RAM disk, but cannot start it because it tries to mount it to the root and fails. I am not sure what the cause would be. Have you tried other kernels (like ours)?

I'll try out your kernel to see if it's an issue with ours, but our kernel should essentially just be the upstream 6.11 with light security patches. I can try later versions of that as well and report back!
