Add private pool and keepalive test to VMM, device tree updates #740
base: main
Conversation
Functionally it seems fine, but I'd like to get others' thoughts on exposing such a raw protocol detail in high-level APIs. It doesn't sit well with me; I think we probably need to abstract it a bit more.
@@ -12,6 +13,7 @@ use mesh::rpc::RpcSend;
pub async fn service_underhill(
    vm_send: &mesh::Sender<VmRpc>,
    send: &mesh::Sender<GuestEmulationRequest>,
    flags: SaveGuestVtl2StateFlags,
To me, this feels like a layering/abstraction violation: it seems odd to specify the raw GET protocol in such a high-level function. What are other people's thoughts on this?
Hmmm, let me think; it didn't feel that way to me...
At least to me, these properties seem device-specific. We generally try to abstract things at the VMM level so that we don't even refer to Hyper-V-specific concepts, so it seems really wrong to use a device-specific protocol crate here.
I.e., should we add some capabilities field to wherever GuestEmulationRequest lives?
How do we anticipate this struct changing in the future? Currently it only has the one bit, are we thinking we'll be adding many more? What might they be?
Or should we be exposing something more like an nvme_keepalive: bool here? Or maybe a unified device_keepalive that will also control MANA someday? Do we anticipate any scenarios where one will be kept alive and the other won't?
This is a weird function, since from what I understand it's only really used in OpenVMM and petri, right?
I wouldn't be so sure that this is a layering violation in and of itself.
I'd be more concerned if we saw these sorts of concepts leaking into various "core" OpenVMM and OpenHCL codepaths, but I don't think this helper falls under that bucket.
Our vision is that more bits will be added in the future. For now we could certainly declare a bool, but having such a specific NVMe-related bool feels less intuitive than a bitmask. Also, work has already started on MANA keepalive, so more changes are coming.
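The bitmask approach described above could be sketched roughly as follows. This is a hypothetical illustration, not the actual OpenVMM definition: the type name mirrors `SaveGuestVtl2StateFlags` from the diff, but the bit layout and builder methods are assumptions.

```rust
/// Hypothetical sketch of a bitmask-style flags type. New capabilities
/// (e.g. a MANA keepalive bit) can be added later without changing any
/// function signatures that accept this type.
#[derive(Copy, Clone, Debug, Default, PartialEq, Eq)]
pub struct SaveGuestVtl2StateFlags(u64);

impl SaveGuestVtl2StateFlags {
    // Bit position is an assumption for illustration only.
    const NVME_KEEPALIVE: u64 = 1 << 0;
    // Future bits would be added here, e.g. MANA_KEEPALIVE = 1 << 1.

    pub fn new() -> Self {
        Self(0)
    }

    /// Builder-style setter, matching the `with_enable_nvme_keepalive`
    /// call seen in the test snippet below.
    pub fn with_enable_nvme_keepalive(self, enable: bool) -> Self {
        if enable {
            Self(self.0 | Self::NVME_KEEPALIVE)
        } else {
            Self(self.0 & !Self::NVME_KEEPALIVE)
        }
    }

    pub fn enable_nvme_keepalive(&self) -> bool {
        self.0 & Self::NVME_KEEPALIVE != 0
    }
}
```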
@@ -494,6 +495,12 @@ fn build_device_tree(
        .end_node()?;
    }

// Indicate that NVMe keep-alive feature is supported by this VMM.
Don't we want this to be configurable, or do we want to always advertise that we can do this?
It follows the current Windows-side implementation. This particular property is always present starting with a specific OS version; other properties are configurable.
Keep-alive is actually triggered only when both this property exists and capabilities_flags allows it.
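The two-condition gating described here could be sketched as below. All names and the capability bit position are hypothetical; the real check lives in OpenHCL's servicing path, not in a free function like this.

```rust
/// Hypothetical sketch: NVMe keep-alive only engages when the device
/// tree advertises VMM support *and* the host's capability flags allow
/// it. Either one alone is not sufficient.
fn nvme_keepalive_enabled(dt_advertises_support: bool, capabilities_flags: u64) -> bool {
    // Bit position is an assumption for illustration only.
    const CAP_NVME_KEEPALIVE: u64 = 1 << 0;
    dt_advertises_support && (capabilities_flags & CAP_NVME_KEEPALIVE != 0)
}
```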
I think we want the ability to test with hosts that do not support keepalive, so that we can make sure the old teardown/restart flow also works, since we have hosts that will operate in that mode too. It seems like this should be configurable?
If we want some sort of live-migration downgrade test then I guess it should be configurable.
The original meaning of this property is to tell VTL2 that this specific version of the VMM (OpenVMM) does support keepalive.
E.g. it is also unconditionally hardcoded in the AH2025 worker process, but doesn't exist in anything older than AH2025.
@@ -172,7 +173,11 @@ impl PetriVmOpenVmm {
    );
    petri_vm_fn!(
        /// Restarts OpenHCL.
        pub async fn restart_openhcl(&mut self, new_openhcl: ArtifactHandle<impl petri_artifacts_common::tags::IsOpenhclIgvm>) -> anyhow::Result<()>
        pub async fn restart_openhcl(
Is it just me, or is there a weird spacing/line ending here?
This is a macro, so the clippy/fmt task ignores it. After adding another parameter the line would be too long. @smalis-msft, what do you think?
Ahh right, it's in a macro... I think GitHub is just rendering it weirdly for me then.
@@ -12,6 +12,7 @@ use anyhow::Context;
use async_trait::async_trait;
use futures::FutureExt;
use futures_concurrency::future::Race;
use get_protocol::SaveGuestVtl2StateFlags;
Again, this seems like an abstraction/layering violation.
Would you prefer test-specific definitions similar to declare_artifacts / LATEST_LINUX_DIRECT_TEST_X64, for example?
Leaking this protocol type into the user-facing petri API certainly seems less palatable to me vs. the other use case in the hvlite_helper crate...
I would suggest having restart_openhcl accept a more petri-specific OpenhclServicingFlags, or something like that, rather than exposing this low-level protocol type directly.
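The suggested petri-level wrapper might look something like this sketch. `OpenhclServicingFlags` is the name proposed in the comment, but the fields and the conversion are assumptions; in the real code the conversion would target the protocol crate's flags type rather than a bare u64.

```rust
/// Hypothetical petri-facing flags: test authors set plain bools, and
/// only this conversion layer knows about the GET wire representation.
#[derive(Copy, Clone, Debug, Default)]
pub struct OpenhclServicingFlags {
    /// Keep the NVMe driver (and its private pool) alive across servicing.
    pub enable_nvme_keepalive: bool,
}

impl OpenhclServicingFlags {
    /// Lower the high-level flags to a raw wire value. The bit position
    /// is an assumption for illustration only.
    pub fn into_wire(self) -> u64 {
        let mut raw = 0u64;
        if self.enable_nvme_keepalive {
            raw |= 1 << 0;
        }
        raw
    }
}
```

With this shape, tests and petri callers never import the get_protocol crate directly.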
openhcl_servicing_core(
    config,
    LATEST_LINUX_DIRECT_TEST_X64,
    SaveGuestVtl2StateFlags::new().with_enable_nvme_keepalive(false),
No need to specify false; that's what new() does.
@@ -160,7 +160,7 @@ pub mod ged {
    /// Wait for VTL2 to start VTL0.
    WaitForVtl0Start(Rpc<(), Result<(), Vtl0StartError>>),
    /// Save VTL2 state.
    SaveGuestVtl2State(Rpc<(), Result<(), SaveRestoreError>>),
    SaveGuestVtl2State(Rpc<u64, Result<(), SaveRestoreError>>),
Please define a proper struct (bitfield struct?) for this payload, vs. sending a raw (and presently undocumented) u64 payload.
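A documented payload type for the RPC, as requested above, could be sketched like this. The type name, field, and bit layout are all hypothetical; the point is only that the u64 in `Rpc<u64, ...>` gains a self-describing definition with explicit conversions.

```rust
/// Hypothetical documented payload for SaveGuestVtl2State, replacing a
/// bare u64. Each bit gets a named, documented field.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct GuestServicingFlags {
    /// Keep the NVMe driver alive across the save/restore cycle.
    pub nvme_keepalive: bool,
}

impl GuestServicingFlags {
    /// Decode from the raw wire representation (bit 0 is an assumption).
    pub fn from_bits(bits: u64) -> Self {
        Self {
            nvme_keepalive: bits & 1 != 0,
        }
    }

    /// Encode to the raw wire representation.
    pub fn into_bits(self) -> u64 {
        self.nvme_keepalive as u64
    }
}
```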
Start bringing up missing coverage for private pool and NVMe keepalive in the VMM test suite.
This is not a complete end-to-end test yet, but it brings the necessary infrastructure changes.
Update device tree properties to sync with Windows host changes.