On Tue, Apr 09, 2024, Peter Gonda wrote:
Add GHCB management functionality similar to the ucall management. Allows for selftest vCPUs to acquire GHCBs for their usage.
Do we actually need a dedicated pool of GHCBs? The conundrum with ucalls is that we have no place in the guest to store the pointer on all architectures. Or rather, we were too lazy to find one. :-)
But for SEV-ES, we have MSR_AMD64_SEV_ES_GHCB, and any test that clobbers that obviously can't use ucalls anyways. Argh, but we can't get a value in that MSR from the host.
Hmm, that seems like a KVM flaw. KVM should advertise its supported GHCB protocol to *userspace*, but userspace should control the actual information exposed to the guest.
Oof, and doesn't SNP support effectively *require* version 2? I.e. shouldn't the KVM patch[*] that adds support for the AP reset MSR protocol bump the version? The GHCB spec very cleary states that that's v2+.
And what if userspace wants to advertise v2 to an SEV-ES guest? KVM shouldn't switch *all* SEV-ES guests to v2, without a way back. And the GHCB spec clearly states that some of the features are in scope for SEV-ES, e.g.
In addition to hypervisor feature advertisement, version 2 provides:
SEV-ES enhancements: o GHCB Format Version 2: ▪ The addition of the XSS MSR value (if supported) when CPUID 0xD is requested. ▪ The shared area specified in the GHCB SW_SCRATCH field must reside in the GHCB SharedBuffer area of the GHCB. o MSR protocol support for AP reset hold.
So I'm pretty sure KVM needs to let userspace set the initial value for MSR_AMD64_SEV_ES_GHCB. I suppose we could do that indirectly, e.g. through a capability. Hrm, but even if userspace can set the initial value, KVM would need to parse the min/max version so that KVM knows what *it* should support, which means that throwing in a GPA for selftests would screw things up.
Blech. Why do CPU folks insist on using ascending version numbers to bundle features?
Anyways, back to selftests. Since we apparently can't stuff MSR_AMD64_SEV_ES_GHCB from host userspace, what if we instead use a trampoline? Instead having vcpu_arch_set_entry_point() point directly at guest_code, point it at a trampoline for SEV-ES guests, and then have the trampoline set MSR_AMD64_SEV_ES_GHCB to the vCPU-specific GHCB before invoking guest_code().
Then we just need a register to stuff the GHCB into. Ah, and the actual guest entry point. GPRs are already part of selftest's "ABI", since they're set by vcpu_args_set(). And this is all 64-bit only, so we can use r10+.
Ugh, the complication is that the trampoline would need to save/restore RAX, RCX, and RDX in order to preserve the values from vcpu_args_set(), but that's just annoying, not hard. And I think it'd be less painful overall than having to create a GHCB pool?
In rough pseudo-asm, something like this?
static void vcpu_sev_es_guest_trampoline(void) { asm volatile(<save rax, rcx, rdx> "mov %%r15d, %%eax\n\t" "shr %%r15, $32\n\t" "mov %%r15d, %%eax\n\t" "mov $MSR_AMD64_SEV_ES_GHCB, %%ecx\n\t" <restore rax, rcx, rdx> "jmp %%r14") }
[*] https://lore.kernel.org/all/20240421180122.1650812-3-michael.roth@amd.com