This series introduces a new ioctl, KVM_HYPERV_SET_TLB_FLUSH_INHIBIT. It allows hypervisors to inhibit remote TLB flushes of a vCPU that originate from Hyper-V hyper-calls (namely HvFlushVirtualAddressSpace(Ex) and HvFlushVirtualAddressList(Ex)). It is required to implement the HvTranslateVirtualAddress hyper-call as part of the ongoing effort to emulate VSM within KVM and QEMU. That hyper-call requires several new KVM APIs, one of which is KVM_HYPERV_SET_TLB_FLUSH_INHIBIT.
Once the inhibit flag is set, any processor attempting to flush the TLB of the marked vCPU with a Hyper-V hyper-call is suspended until the flag is cleared again. While suspended, the flushing vCPU does not run at all: it neither receives events nor executes other code. It wakes up once the inhibit flag of the vCPU it is waiting on is cleared. This behaviour is specified in Microsoft's "Hypervisor Top Level Functional Specification" (TLFS).
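The semantics described above can be sketched as a small user-space toy model. This is not the code from this series: `struct toy_vcpu` and the `toy_*` helpers are hypothetical names, and the real implementation has to deal with locking, wake-ups and multiple targets, all of which are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the KVM data structures. */
struct toy_vcpu {
	bool tlb_flush_inhibit;            /* set and cleared through the ioctl */
	const struct toy_vcpu *waiting_on; /* non-NULL while suspended */
	unsigned int pending_flushes;      /* flush requests queued for this vCPU */
};

/*
 * Model of a flush hyper-call issued by @caller against @target.
 * Returns true if the flush was queued, false if @caller was suspended
 * because @target currently inhibits TLB flushes.
 */
static bool toy_flush_tlb(struct toy_vcpu *caller, struct toy_vcpu *target)
{
	assert(caller != NULL && target != NULL);

	if (target->tlb_flush_inhibit) {
		caller->waiting_on = target;
		return false;
	}

	target->pending_flushes++;
	return true;
}

static bool toy_is_suspended(const struct toy_vcpu *vcpu)
{
	return vcpu->waiting_on != NULL;
}

/* Clear the inhibit flag on @target and wake every vCPU waiting on it. */
static void toy_clear_inhibit(struct toy_vcpu *target,
			      struct toy_vcpu *vcpus, size_t nr_vcpus)
{
	target->tlb_flush_inhibit = false;

	for (size_t i = 0; i < nr_vcpus; i++) {
		if (vcpus[i].waiting_on == target)
			vcpus[i].waiting_on = NULL;
	}
}
```

In this model, a caller that hits an inhibited target queues nothing and stays suspended until `toy_clear_inhibit()` is called for that target; only a later retry of the flush actually queues the request.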
The vCPU blocks execution during the suspension, which makes the suspension transparent to the hypervisor. An alternative design to what is proposed here would be to exit from the Hyper-V hyper-call to user space upon finding an inhibited vCPU. We decided against it, to allow for a simpler and more performant implementation. Exiting to user space would create an additional synchronisation burden and make the resulting code more complex. Additionally, since the suspension is specific to Hyper-V events, such an exit wouldn't provide any functional benefits.
The TLFS specifies that the instruction pointer is not moved during the suspension, so upon unsuspending the hyper-call is re-executed. This means that, if the vCPU encounters another inhibited TLB and is resuspended, any pending events and interrupts are still handled first. This is identical to the vCPU receiving such events right before the hyper-call.
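The re-execution behaviour can be illustrated with another toy model. Everything here is an assumption made for illustration: `struct toy_cpu`, `toy_step()` and the event counters are hypothetical, and the instruction length is only a stand-in for "the hyper-call instruction is skipped on completion".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative length of the hyper-call instruction (VMCALL is 3 bytes). */
#define TOY_HYPERCALL_INSN_LEN 3

enum toy_result { TOY_COMPLETED, TOY_SUSPENDED };

/* Toy model only: hypothetical names, not the KVM data structures. */
struct toy_cpu {
	unsigned long rip;             /* points at the hyper-call instruction */
	unsigned int pending_events;   /* events that arrived while suspended */
	unsigned int delivered_events; /* events the guest has received */
};

/*
 * One attempt to run the flush hyper-call at cpu->rip.
 * Pending events are delivered before the instruction executes, exactly
 * as if they had arrived right before the hyper-call.
 */
static enum toy_result toy_step(struct toy_cpu *cpu, bool target_inhibited)
{
	assert(cpu != NULL);

	cpu->delivered_events += cpu->pending_events;
	cpu->pending_events = 0;

	if (target_inhibited)
		return TOY_SUSPENDED; /* rip untouched: re-executed on wake-up */

	cpu->rip += TOY_HYPERCALL_INSN_LEN;
	return TOY_COMPLETED;
}
```

Calling `toy_step()` repeatedly with `target_inhibited` set leaves `rip` unchanged while still draining pending events; only the first call with the target no longer inhibited advances `rip` past the hyper-call.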
Inhibiting TLB flushes is necessary to securely implement intercepts. These allow a higher "Virtual Trust Level" (VTL) to react to a lower VTL accessing restricted memory. In such an intercept, the higher VTL may want to emulate a memory access in software. If another processor flushes the TLB during that operation, incorrect behaviour can result.
The patch series includes basic testing of the ioctl and suspension state. All previously passing KVM selftests and KVM unit tests still pass.
Series overview:
- 1: Document the new ioctl
- 2: Implement the suspension state
- 3: Update TLB flush hyper-call in preparation
- 4-5: Implement the ioctl
- 6: Add traces
- 7: Implement testing
As the suspension state is transparent to the hypervisor, testing is complicated. The current version waits for a fixed time interval to give the vCPU time to enter the hyper-call and get suspended. Ideas for improving this are very welcome.
This series, alongside my series [1] implementing KVM_TRANSLATE2, the series by Nicolas Saenz Julienne [2] implementing the core building blocks for VSM and the accompanying QEMU implementation [3], is capable of booting Windows Server 2019 with VSM/CredentialGuard enabled.
All three series are also available on GitHub [4].
[1] https://lore.kernel.org/linux-kernel/20240910152207.38974-1-nikwip@amazon.de...
[2] https://lore.kernel.org/linux-hyperv/20240609154945.55332-1-nsaenz@amazon.co...
[3] https://github.com/vianpl/qemu/tree/vsm/next
[4] https://github.com/vianpl/linux/tree/vsm/next
Best,
Nikolas
Nikolas Wipper (7):
  KVM: Add API documentation for KVM_HYPERV_SET_TLB_FLUSH_INHIBIT
  KVM: x86: Implement Hyper-V's vCPU suspended state
  KVM: x86: Check vCPUs before enqueuing TLB flushes in kvm_hv_flush_tlb()
  KVM: Introduce KVM_HYPERV_SET_TLB_FLUSH_INHIBIT
  KVM: x86: Implement KVM_HYPERV_SET_TLB_FLUSH_INHIBIT
  KVM: x86: Add trace events to track Hyper-V suspensions
  KVM: selftests: Add tests for KVM_HYPERV_SET_TLB_FLUSH_INHIBIT
 Documentation/virt/kvm/api.rst                |  41 +++
 arch/x86/include/asm/kvm_host.h               |   5 +
 arch/x86/kvm/hyperv.c                         |  86 +++++-
 arch/x86/kvm/hyperv.h                         |  17 ++
 arch/x86/kvm/trace.h                          |  39 +++
 arch/x86/kvm/x86.c                            |  41 ++-
 include/uapi/linux/kvm.h                      |  15 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../kvm/x86_64/hyperv_tlb_flush_inhibit.c     | 274 ++++++++++++++++++
 9 files changed, 503 insertions(+), 16 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush_inhibit.c