The upcoming new Idle HLT Intercept feature allows for the HLT instruction execution by a vCPU to be intercepted by the hypervisor only if there are no pending V_INTR and V_NMI events for the vCPU. When the vCPU is expected to service the pending V_INTR and V_NMI events, the Idle HLT intercept won’t trigger. The feature allows the hypervisor to determine if the vCPU is actually idle and reduces wasteful VMEXITs.
The Idle HLT intercept feature is used for enlightened guests who wish to securely handle the events. When an enlightened guest does a HLT while an interrupt is pending, hypervisor will not have a way to figure out whether the guest needs to be re-entered or not. The Idle HLT intercept feature allows the HLT execution only if there are no pending V_INTR and V_NMI events.
Presence of the Idle HLT Intercept feature is indicated via CPUID function Fn8000_000A_EDX[30].
Document for the Idle HLT intercept feature is available at [1].
This series is based on kvm-x86/next (eb723766b103) + [2].
Testing Done: - Tested the functionality for the Idle HLT intercept feature using selftest ipi_hlt_test. - Tested on normal, SEV, SEV-ES, SEV-SNP guest for the Idle HLT intercept functionality. - Tested the Idle HLT intercept functionality on nested guest.
v5 -> v6 - Incorporated Neeraj's review comments on selftest.
v4 -> v5 - Incorporated Sean's review comments on nested Idle HLT intercept support. - Make svm_idle_hlt_test independent of the Idle HLT to run on all hardware.
v3 -> v4 - Drop the patches to add vcpu_get_stat() into a new series [2]. - Added nested Idle HLT intercept support.
v2 -> v3 - Incorporated Andrew's suggestion to structure vcpu_stat_types in a way that each architecture can share the generic types and also provide its own.
v1 -> v2 - Did changes in svm_idle_hlt_test based on the review comments from Sean. - Added an enum based approach to get binary stats in vcpu_get_stat() which doesn't use string to get stat data based on the comments from Sean. - Added safe_halt() and cli() helpers based on the comments from Sean.
[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024, Vol 2, 15.9 Instruction Intercepts (Table 15-7: IDLE_HLT). https://bugzilla.kernel.org/attachment.cgi?id=306251
[2]: https://lore.kernel.org/kvm/ee027335-f1b9-4637-bc79-27a610c1ab08@amd.com/T/#...
--- V5: https://lore.kernel.org/kvm/20250103081828.7060-1-manali.shukla@amd.com/ V4: https://lore.kernel.org/kvm/20241022054810.23369-1-manali.shukla@amd.com/ V3: https://lore.kernel.org/kvm/20240528041926.3989-4-manali.shukla@amd.com/T/ V2: https://lore.kernel.org/kvm/20240501145433.4070-1-manali.shukla@amd.com/ V1: https://lore.kernel.org/kvm/20240307054623.13632-1-manali.shukla@amd.com/
Manali Shukla (3): x86/cpufeatures: Add CPUID feature bit for Idle HLT intercept KVM: SVM: Add Idle HLT intercept support KVM: selftests: Add self IPI HLT test
arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/svm.h | 1 + arch/x86/include/uapi/asm/svm.h | 2 + arch/x86/kvm/svm/svm.c | 13 ++- tools/testing/selftests/kvm/Makefile.kvm | 1 + .../selftests/kvm/include/x86/processor.h | 1 + tools/testing/selftests/kvm/ipi_hlt_test.c | 81 +++++++++++++++++++ 7 files changed, 97 insertions(+), 3 deletions(-) create mode 100644 tools/testing/selftests/kvm/ipi_hlt_test.c
base-commit: eb723766b1030a23c38adf2348b7c3d1409d11f0 prerequisite-patch-id: cb345fc0d814a351df2b5788b76eee0eef9de549 prerequisite-patch-id: 71806f400cffe09f47d6231cb072cbdbd540de1b prerequisite-patch-id: 9ea0412aab7ecd8555fcee3e9609dbfe8456d47b prerequisite-patch-id: 3504df50cdd33958456f2e56139d76867273525c prerequisite-patch-id: 674e56729a56cc487cb85be1a64ef561eb7bac8a prerequisite-patch-id: 48e87354f9d6e6bd121ca32ab73cd0d7f1dce74f prerequisite-patch-id: b32c21df6522a7396baa41d62bcad9479041d97a prerequisite-patch-id: 0ff4b504e982db7c1dfa8ec6ac485c92a89f4af8 prerequisite-patch-id: 509018dc2fc1657debc641544e86f5a92d04bc1a
From: Manali Shukla Manali.Shukla@amd.com
The Idle HLT Intercept feature allows for the HLT instruction execution by a vCPU to be intercepted by the hypervisor only if there are no pending events (V_INTR and V_NMI) for the vCPU. When the vCPU is expected to service the pending events (V_INTR and V_NMI), the Idle HLT intercept won’t trigger. The feature allows the hypervisor to determine if the vCPU is idle and reduces wasteful VMEXITs.
In addition to the aforementioned use case, the Idle HLT intercept feature is also used for enlightened guests who aim to securely manage events without the hypervisor’s awareness. If a HLT occurs while a virtual event is pending and the hypervisor is unaware of this pending event (as could be the case with enlightened guests), the absence of the Idle HLT intercept feature could result in a vCPU being suspended indefinitely.
Presence of Idle HLT intercept feature for guests is indicated via CPUID function 0x8000000A_EDX[30].
Signed-off-by: Manali Shukla Manali.Shukla@amd.com Acked-by: Borislav Petkov (AMD) bp@alien8.de --- arch/x86/include/asm/cpufeatures.h | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 645aa360628d..4fef0533f764 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -386,6 +386,7 @@ #define X86_FEATURE_V_SPEC_CTRL (15*32+20) /* "v_spec_ctrl" Virtual SPEC_CTRL */ #define X86_FEATURE_VNMI (15*32+25) /* "vnmi" Virtual NMI */ #define X86_FEATURE_SVME_ADDR_CHK (15*32+28) /* SVME addr check */ +#define X86_FEATURE_IDLE_HLT (15*32+30) /* IDLE HLT intercept */
/* Intel-defined CPU features, CPUID level 0x00000007:0 (ECX), word 16 */ #define X86_FEATURE_AVX512VBMI (16*32+ 1) /* "avx512vbmi" AVX512 Vector Bit Manipulation instructions*/
base-commit: eb723766b1030a23c38adf2348b7c3d1409d11f0 prerequisite-patch-id: cb345fc0d814a351df2b5788b76eee0eef9de549 prerequisite-patch-id: 71806f400cffe09f47d6231cb072cbdbd540de1b prerequisite-patch-id: 9ea0412aab7ecd8555fcee3e9609dbfe8456d47b prerequisite-patch-id: 3504df50cdd33958456f2e56139d76867273525c prerequisite-patch-id: 674e56729a56cc487cb85be1a64ef561eb7bac8a prerequisite-patch-id: 48e87354f9d6e6bd121ca32ab73cd0d7f1dce74f prerequisite-patch-id: b32c21df6522a7396baa41d62bcad9479041d97a prerequisite-patch-id: 0ff4b504e982db7c1dfa8ec6ac485c92a89f4af8 prerequisite-patch-id: 509018dc2fc1657debc641544e86f5a92d04bc1a
From: Manali Shukla Manali.Shukla@amd.com
The hypervisor can intercept the HLT instruction by setting the HLT-Intercept Bit in VMCB, causing a VMEXIT. This can be wasteful if there are pending V_INTR and V_NMI events, as the hypervisor must then initiate a VMRUN to handle them.
If the HLT-Intercept Bit is cleared and the vCPU executes HLT while there are pending V_INTR and V_NMI events, the hypervisor won’t detect them, potentially causing indefinite suspension of the vCPU. This poses a problem for enlightened guests who wish to securely handle the events.
For Secure AVIC scenarios, if a guest does a HLT while an interrupt is pending (in IRR), the hypervisor does not have a way to figure out whether the guest needs to be re-entered, as it cannot read the guest backing page. The Idle HLT intercept feature allows the hypervisor to intercept HLT execution only if there are no pending V_INTR and V_NMI events.
There are two use cases for the Idle HLT intercept feature: - Secure VMs that wish to handle pending events securely without exiting to the hypervisor on HLT (Secure AVIC). - Optimization for all the VMs to avoid a wasteful VMEXIT during HLT when there are pending events.
On discovering the Idle HLT Intercept, the KVM hypervisor, Sets the Idle HLT Intercept bit (bit (6), offset 0x14h) in the VMCB. When the Idle HLT Intercept bit is set, HLT Intercept bit (bit (0), offset 0xFh) should be cleared.
Before entering the HLT state, the HLT instruction performs checks in following order: - The HLT intercept check, if set, it unconditionally triggers SVM_EXIT_HLT (0x78). - The Idle HLT intercept check, if set and there are no pending V_INTR or V_NMI events, triggers SVM_EXIT_IDLE_HLT (0xA6).
Details about the Idle HLT intercept feature can be found in AMD APM [1].
[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024, Vol 2, 15.9 Instruction Intercepts (Table 15-7: IDLE_HLT). https://bugzilla.kernel.org/attachment.cgi?id=306250
Signed-off-by: Manali Shukla Manali.Shukla@amd.com Reviewed-by: Nikunj A Dadhania nikunj@amd.com --- arch/x86/include/asm/svm.h | 1 + arch/x86/include/uapi/asm/svm.h | 2 ++ arch/x86/kvm/svm/svm.c | 13 ++++++++++--- 3 files changed, 13 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h index 2b59b9951c90..992050cb83d0 100644 --- a/arch/x86/include/asm/svm.h +++ b/arch/x86/include/asm/svm.h @@ -116,6 +116,7 @@ enum { INTERCEPT_INVPCID, INTERCEPT_MCOMMIT, INTERCEPT_TLBSYNC, + INTERCEPT_IDLE_HLT = 166, };
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h index 1814b413fd57..ec1321248dac 100644 --- a/arch/x86/include/uapi/asm/svm.h +++ b/arch/x86/include/uapi/asm/svm.h @@ -95,6 +95,7 @@ #define SVM_EXIT_CR14_WRITE_TRAP 0x09e #define SVM_EXIT_CR15_WRITE_TRAP 0x09f #define SVM_EXIT_INVPCID 0x0a2 +#define SVM_EXIT_IDLE_HLT 0x0a6 #define SVM_EXIT_NPF 0x400 #define SVM_EXIT_AVIC_INCOMPLETE_IPI 0x401 #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS 0x402 @@ -224,6 +225,7 @@ { SVM_EXIT_CR4_WRITE_TRAP, "write_cr4_trap" }, \ { SVM_EXIT_CR8_WRITE_TRAP, "write_cr8_trap" }, \ { SVM_EXIT_INVPCID, "invpcid" }, \ + { SVM_EXIT_IDLE_HLT, "idle-halt" }, \ { SVM_EXIT_NPF, "npf" }, \ { SVM_EXIT_AVIC_INCOMPLETE_IPI, "avic_incomplete_ipi" }, \ { SVM_EXIT_AVIC_UNACCELERATED_ACCESS, "avic_unaccelerated_access" }, \ diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 7640a84e554a..456d841298f7 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -1297,8 +1297,12 @@ static void init_vmcb(struct kvm_vcpu *vcpu) svm_set_intercept(svm, INTERCEPT_MWAIT); }
- if (!kvm_hlt_in_guest(vcpu->kvm)) - svm_set_intercept(svm, INTERCEPT_HLT); + if (!kvm_hlt_in_guest(vcpu->kvm)) { + if (cpu_feature_enabled(X86_FEATURE_IDLE_HLT)) + svm_set_intercept(svm, INTERCEPT_IDLE_HLT); + else + svm_set_intercept(svm, INTERCEPT_HLT); + }
control->iopm_base_pa = iopm_base; control->msrpm_base_pa = __sme_set(__pa(svm->msrpm)); @@ -3342,6 +3346,7 @@ static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = { [SVM_EXIT_CR4_WRITE_TRAP] = cr_trap, [SVM_EXIT_CR8_WRITE_TRAP] = cr_trap, [SVM_EXIT_INVPCID] = invpcid_interception, + [SVM_EXIT_IDLE_HLT] = kvm_emulate_halt, [SVM_EXIT_NPF] = npf_interception, [SVM_EXIT_RSM] = rsm_interception, [SVM_EXIT_AVIC_INCOMPLETE_IPI] = avic_incomplete_ipi_interception, @@ -3504,7 +3509,7 @@ int svm_invoke_exit_handler(struct kvm_vcpu *vcpu, u64 exit_code) return interrupt_window_interception(vcpu); else if (exit_code == SVM_EXIT_INTR) return intr_interception(vcpu); - else if (exit_code == SVM_EXIT_HLT) + else if (exit_code == SVM_EXIT_HLT || exit_code == SVM_EXIT_IDLE_HLT) return kvm_emulate_halt(vcpu); else if (exit_code == SVM_EXIT_NPF) return npf_interception(vcpu); @@ -5225,6 +5230,8 @@ static __init void svm_set_cpu_caps(void) if (vnmi) kvm_cpu_cap_set(X86_FEATURE_VNMI);
+ kvm_cpu_cap_check_and_set(X86_FEATURE_IDLE_HLT); + /* Nested VM can receive #VMEXIT instead of triggering #GP */ kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK); }
On Tue, Jan 28, 2025, Manali Shukla wrote:
From: Manali Shukla Manali.Shukla@amd.com
The hypervisor can intercept the HLT instruction by setting the HLT-Intercept Bit in VMCB, causing a VMEXIT. This can be wasteful if there are pending V_INTR and V_NMI events, as the hypervisor must then initiate a VMRUN to handle them.
If the HLT-Intercept Bit is cleared and the vCPU executes HLT while there are pending V_INTR and V_NMI events, the hypervisor won’t detect them, potentially causing indefinite suspension of the vCPU. This poses a problem for enlightened guests who wish to securely handle the events.
For Secure AVIC scenarios, if a guest does a HLT while an interrupt is pending (in IRR), the hypervisor does not have a way to figure out whether the guest needs to be re-entered, as it cannot read the guest backing page. The Idle HLT intercept feature allows the hypervisor to intercept HLT execution only if there are no pending V_INTR and V_NMI events.
There are two use cases for the Idle HLT intercept feature:
- Secure VMs that wish to handle pending events securely without exiting to the hypervisor on HLT (Secure AVIC).
I honestly don't see any reason to talk about Secure AVIC. It takes a *lot* of background reading to understand how Secure AVIC actually works with Idle HLT. Critically, it requires a *reduction* in acceleration relative to what the APM calls "legacy" AVIC (lol), in that cross-vCPU IPIs are *never* accelerated. Ignoring device posted interrupts, lack of IPI virtualization means that KVM always has visibility into *new* interrupts. The only blind spot is self-IPI and interrupts that were already made pending by KVM.
- Optimization for all the VMs to avoid a wasteful VMEXIT during HLT when there are pending events.
On discovering the Idle HLT Intercept, the KVM hypervisor, Sets the Idle HLT Intercept bit (bit (6), offset 0x14h) in the VMCB. When the Idle HLT Intercept bit is set, HLT Intercept bit (bit (0), offset 0xFh) should be cleared.
Before entering the HLT state, the HLT instruction performs checks in following order:
- The HLT intercept check, if set, it unconditionally triggers SVM_EXIT_HLT (0x78).
- The Idle HLT intercept check, if set and there are no pending V_INTR or V_NMI events, triggers SVM_EXIT_IDLE_HLT (0xA6).
Details about the Idle HLT intercept feature can be found in AMD APM [1].
This is not a useful changelog. While I do care about AMD's motivation for adding Idle HLT, it's far, far more important that the changelog cover (a) what is (and isn't) being changed in KVM, (b) what edge cases are (or aren't) being handled, and (c) why the KVM implementation is correct.
[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024, Vol 2, 15.9 Instruction Intercepts (Table 15-7: IDLE_HLT). https://bugzilla.kernel.org/attachment.cgi?id=306250
*sigh*
The inscrutable reference to the APM is longer than the table entry. Just copy the table entry.
When both HLT and Idle HLT intercepts are active at the same time, the HLT intercept takes priority. This intercept occurs only if a virtual interrupt is not pending (V_INTR or V_NMI).
Signed-off-by: Manali Shukla Manali.Shukla@amd.com Reviewed-by: Nikunj A Dadhania nikunj@amd.com
arch/x86/include/asm/svm.h | 1 + arch/x86/include/uapi/asm/svm.h | 2 ++ arch/x86/kvm/svm/svm.c | 13 ++++++++++--- 3 files changed, 13 insertions(+), 3 deletions(-)
...
@@ -5225,6 +5230,8 @@ static __init void svm_set_cpu_caps(void) if (vnmi) kvm_cpu_cap_set(X86_FEATURE_VNMI);
kvm_cpu_cap_check_and_set(X86_FEATURE_IDLE_HLT);
I am 99% certain this is wrong. Or at the very least, severly lacking an explanation of why it's correct. If L1 enables Idle HLT but not HLT interception, then it is KVM's responsibility to NOT exit to L1 if there is a pending V_IRQ or V_NMI.
Yeah, it's buggy. But, it's buggy in part because *existing* KVM support is buggy. If L1 disables HLT exiting, but it's enabled in KVM, then KVM will run L2 with HLT exiting and so it becomes KVM's responsibility to check for valid L2 wake events prior to scheduling out the vCPU if L2 executes HLT. E.g. nVMX handles this by reading vmcs02.GUEST_INTERRUPT_STATUS.RVI as part of vmx_has_nested_events(). I don't see the equivalent in nSVM.
Amusingly, that means Idle HLT is actually a bug fix to some extent. E.g. if there is a pending V_IRQ/V_NMI in vmcb02, then running with Idle HLT will naturally do the right thing, i.e. not hang the vCPU.
Anyways, for now, I think the easiest and best option is to simply skip full nested support for the moment.
I rewrote the changelog as I was going (I didn't expect to go down so many rabbit holes), and ended up with the below. Please review and let me know if I missed any wrinkles and/or got anything wrong.
--- From 05281b3782b7f880b3afd68e0074bf3abf6d55a7 Mon Sep 17 00:00:00 2001 From: Manali Shukla Manali.Shukla@amd.com Date: Tue, 28 Jan 2025 12:48:11 +0000 Subject: [PATCH] KVM: SVM: Add Idle HLT intercept support
Add support for "Idle HLT" interception on AMD CPUs, and enable Idle HLT interception instead of "normal" HLT interception for all VMs for which HLT-exiting is enabled. Idle HLT provides a mild performance boost for all VM types, by avoiding a VM-Exit in the scenario where KVM would immediately "wake" and resume the vCPU.
Idle HLT makes HLT-exiting conditional on the vCPU not having a valid, unmasked interrupt. Specifically, a VM-Exit occurs on execution of HLT if and only if there are no pending V_IRQ or V_NMI events. Note, Idle is a replacement for full HLT interception, i.e. enabling HLT interception would result in all HLT instructions causing unconditional VM-Exits. Per the APM:
When both HLT and Idle HLT intercepts are active at the same time, the HLT intercept takes priority. This intercept occurs only if a virtual interrupt is not pending (V_INTR or V_NMI).
For KVM's use of V_IRQ (also called V_INTR in the APM) to detect interrupt windows, the net effect of enabling Idle HLT is that, if a virtual interupt is pending and unmasked at the time of HLT, the vCPU will take a V_IRQ intercept instead of a HLT intercept.
When AVIC is enabled, Idle HLT works as intended: the vCPU continues unimpeded and services the pending virtual interrupt.
Note, the APM's description of V_IRQ interaction with AVIC is quite confusing, and requires piecing together implied behavior. Per the APM, when AVIC is enabled, V_IRQ *from the VMCB* is ignored:
When AVIC mode is enabled for a virtual processor, the V_IRQ, V_INTR_PRIO, V_INTR_VECTOR, and V_IGN_TPR fields in the VMCB are ignored.
Which seems to contradict the behavior of Idle HLT:
This intercept occurs only if a virtual interrupt is not pending (V_INTR or V_NMI).
What's not explicitly stated is that hardware's internal copy of V_IRQ (and related fields) *are* still active, i.e. are presumably used to cache information from the virtual APIC.
Handle Idle HLT exits as if they were normal HLT exits, e.g. don't try to optimize the handling under the assumption that there isn't a pending IRQ. Irrespective of AVIC, Idle HLT is inherently racy with respect to the vIRR, as KVM can set vIRR bits asychronously.
No changes are required to support KVM's use Idle HLT while running L2. In fact, supporting Idle HLT is actually a bug fix to some extent. If L1 wants to intercept HLT, recalc_intercepts() will enable HLT interception in vmcb02 and forward the intercept to L1 as normal.
But if L1 does not want to intercept HLT, then KVM will run L2 with Idle HLT enabled and HLT interception disabled. If a V_IRQ or V_NMI for L2 becomes pending and L2 executes HLT, then use of Idle HLT will do the right thing, i.e. not #VMEXIT and instead deliver the virtual event. KVM currently doesn't handle this scenario correctly, e.g. doesn't check V_IRQ or V_NMI in vmcs02 as part of kvm_vcpu_has_events().
Do not expose Idle HLT to L1 at this time, as supporting nested Idle HLT is more complex than just enumerating the feature, e.g. requires KVM to handle the aforementioned scenarios of V_IRQ and V_NMI at the time of exit.
Signed-off-by: Manali Shukla Manali.Shukla@amd.com Reviewed-by: Nikunj A Dadhania nikunj@amd.com Link: https://bugzilla.kernel.org/attachment.cgi?id=306250 Link: https://lore.kernel.org/r/20250128124812.7324-3-manali.shukla@amd.com [sean: rewrite changelog, drop nested "support"] Signed-off-by: Sean Christopherson seanjc@google.com --- arch/x86/include/asm/svm.h | 1 + arch/x86/include/uapi/asm/svm.h | 2 ++ arch/x86/kvm/svm/svm.c | 11 ++++++++--- 3 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h index e2fac21471f5..12a9dde1e842 100644 --- a/arch/x86/include/asm/svm.h +++ b/arch/x86/include/asm/svm.h @@ -116,6 +116,7 @@ enum { INTERCEPT_INVPCID, INTERCEPT_MCOMMIT, INTERCEPT_TLBSYNC, + INTERCEPT_IDLE_HLT = 166, };
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h index 1814b413fd57..ec1321248dac 100644 --- a/arch/x86/include/uapi/asm/svm.h +++ b/arch/x86/include/uapi/asm/svm.h @@ -95,6 +95,7 @@ #define SVM_EXIT_CR14_WRITE_TRAP 0x09e #define SVM_EXIT_CR15_WRITE_TRAP 0x09f #define SVM_EXIT_INVPCID 0x0a2 +#define SVM_EXIT_IDLE_HLT 0x0a6 #define SVM_EXIT_NPF 0x400 #define SVM_EXIT_AVIC_INCOMPLETE_IPI 0x401 #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS 0x402 @@ -224,6 +225,7 @@ { SVM_EXIT_CR4_WRITE_TRAP, "write_cr4_trap" }, \ { SVM_EXIT_CR8_WRITE_TRAP, "write_cr8_trap" }, \ { SVM_EXIT_INVPCID, "invpcid" }, \ + { SVM_EXIT_IDLE_HLT, "idle-halt" }, \ { SVM_EXIT_NPF, "npf" }, \ { SVM_EXIT_AVIC_INCOMPLETE_IPI, "avic_incomplete_ipi" }, \ { SVM_EXIT_AVIC_UNACCELERATED_ACCESS, "avic_unaccelerated_access" }, \ diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 7640a84e554a..37e83bde8f9f 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -1297,8 +1297,12 @@ static void init_vmcb(struct kvm_vcpu *vcpu) svm_set_intercept(svm, INTERCEPT_MWAIT); }
- if (!kvm_hlt_in_guest(vcpu->kvm)) - svm_set_intercept(svm, INTERCEPT_HLT); + if (!kvm_hlt_in_guest(vcpu->kvm)) { + if (cpu_feature_enabled(X86_FEATURE_IDLE_HLT)) + svm_set_intercept(svm, INTERCEPT_IDLE_HLT); + else + svm_set_intercept(svm, INTERCEPT_HLT); + }
control->iopm_base_pa = iopm_base; control->msrpm_base_pa = __sme_set(__pa(svm->msrpm)); @@ -3342,6 +3346,7 @@ static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = { [SVM_EXIT_CR4_WRITE_TRAP] = cr_trap, [SVM_EXIT_CR8_WRITE_TRAP] = cr_trap, [SVM_EXIT_INVPCID] = invpcid_interception, + [SVM_EXIT_IDLE_HLT] = kvm_emulate_halt, [SVM_EXIT_NPF] = npf_interception, [SVM_EXIT_RSM] = rsm_interception, [SVM_EXIT_AVIC_INCOMPLETE_IPI] = avic_incomplete_ipi_interception, @@ -3504,7 +3509,7 @@ int svm_invoke_exit_handler(struct kvm_vcpu *vcpu, u64 exit_code) return interrupt_window_interception(vcpu); else if (exit_code == SVM_EXIT_INTR) return intr_interception(vcpu); - else if (exit_code == SVM_EXIT_HLT) + else if (exit_code == SVM_EXIT_HLT || exit_code == SVM_EXIT_IDLE_HLT) return kvm_emulate_halt(vcpu); else if (exit_code == SVM_EXIT_NPF) return npf_interception(vcpu);
base-commit: b9cd96a7ff9cc9ddf95de59d69afb174a9e90c6e
From: Manali Shukla Manali.Shukla@amd.com
The IPI HLT test simulates a scenario where a pending event is present while the HLT instruction is executed.
This test evaluates the idle HLT intercept feature of the AMD architecture, if available. If the feature is not present, the test exercises halt-exits/guest-entry for pending interrupts. It can be extended in the future to include cross-vCPU IPI testing.
Suggested-by: Sean Christopherson seanjc@google.com Signed-off-by: Manali Shukla Manali.Shukla@amd.com --- tools/testing/selftests/kvm/Makefile.kvm | 1 + .../selftests/kvm/include/x86/processor.h | 1 + tools/testing/selftests/kvm/ipi_hlt_test.c | 81 +++++++++++++++++++ 3 files changed, 83 insertions(+) create mode 100644 tools/testing/selftests/kvm/ipi_hlt_test.c
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm index 4277b983cace..d6eda8c19fed 100644 --- a/tools/testing/selftests/kvm/Makefile.kvm +++ b/tools/testing/selftests/kvm/Makefile.kvm @@ -135,6 +135,7 @@ TEST_GEN_PROGS_x86 += steal_time TEST_GEN_PROGS_x86 += kvm_binary_stats_test TEST_GEN_PROGS_x86 += system_counter_offset_test TEST_GEN_PROGS_x86 += pre_fault_memory_test +TEST_GEN_PROGS_x86 += ipi_hlt_test
# Compiled outputs used by test targets TEST_GEN_PROGS_EXTENDED_x86 += x86/nx_huge_pages_test diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h index e28c3e462fa7..fe0ed8d33e37 100644 --- a/tools/testing/selftests/kvm/include/x86/processor.h +++ b/tools/testing/selftests/kvm/include/x86/processor.h @@ -200,6 +200,7 @@ struct kvm_x86_cpu_feature { #define X86_FEATURE_PAUSEFILTER KVM_X86_CPU_FEATURE(0x8000000A, 0, EDX, 10) #define X86_FEATURE_PFTHRESHOLD KVM_X86_CPU_FEATURE(0x8000000A, 0, EDX, 12) #define X86_FEATURE_VGIF KVM_X86_CPU_FEATURE(0x8000000A, 0, EDX, 16) +#define X86_FEATURE_IDLE_HLT KVM_X86_CPU_FEATURE(0x8000000A, 0, EDX, 30) #define X86_FEATURE_SEV KVM_X86_CPU_FEATURE(0x8000001F, 0, EAX, 1) #define X86_FEATURE_SEV_ES KVM_X86_CPU_FEATURE(0x8000001F, 0, EAX, 3) #define X86_FEATURE_PERFMON_V2 KVM_X86_CPU_FEATURE(0x80000022, 0, EAX, 0) diff --git a/tools/testing/selftests/kvm/ipi_hlt_test.c b/tools/testing/selftests/kvm/ipi_hlt_test.c new file mode 100644 index 000000000000..449845fa2a82 --- /dev/null +++ b/tools/testing/selftests/kvm/ipi_hlt_test.c @@ -0,0 +1,81 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2024 Advanced Micro Devices, Inc. + * + */ +#include <kvm_util.h> +#include <processor.h> +#include <test_util.h> +#include "apic.h" + +#define INTR_VECTOR 0x30 +#define NUM_ITERATIONS 1000 + +static bool irq_received; + +/* + * The guest code instruments the scenario where there is a V_INTR pending + * event available while hlt instruction is executed. + */ + +static void guest_code(void) +{ + uint64_t icr_val; + int i; + + x2apic_enable(); + + icr_val = (APIC_DEST_SELF | APIC_INT_ASSERT | INTR_VECTOR); + + for (i = 0; i < NUM_ITERATIONS; i++) { + cli(); + x2apic_write_reg(APIC_ICR, icr_val); + safe_halt(); + GUEST_ASSERT(READ_ONCE(irq_received)); + WRITE_ONCE(irq_received, false); + } + GUEST_DONE(); +} + +static void guest_intr_handler(struct ex_regs *regs) +{ + WRITE_ONCE(irq_received, true); + x2apic_write_reg(APIC_EOI, 0x00); +} + +int main(int argc, char *argv[]) +{ + struct kvm_vm *vm; + struct kvm_vcpu *vcpu; + struct ucall uc; + uint64_t halt_exits; + + TEST_REQUIRE(kvm_has_cap(KVM_CAP_BINARY_STATS_FD)); + + vm = vm_create_with_one_vcpu(&vcpu, guest_code); + + vm_install_exception_handler(vm, INTR_VECTOR, guest_intr_handler); + + vcpu_run(vcpu); + TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO); + + halt_exits = vcpu_get_stat(vcpu, halt_exits); + + switch (get_ucall(vcpu, &uc)) { + case UCALL_ABORT: + REPORT_GUEST_ASSERT(uc); + /* NOT REACHED */ + case UCALL_DONE: + break; + default: + TEST_FAIL("Unknown ucall 0x%lx.", uc.cmd); + } + + if (kvm_cpu_has(X86_FEATURE_IDLE_HLT)) + TEST_ASSERT_EQ(halt_exits, 0); + else + TEST_ASSERT_EQ(halt_exits, NUM_ITERATIONS); + + kvm_vm_free(vm); + return 0; +}
On 1/28/2025 6:18 PM, Manali Shukla wrote:
From: Manali Shukla Manali.Shukla@amd.com
The IPI HLT test simulates a scenario where a pending event is present while the HLT instruction is executed.
This test evaluates the idle HLT intercept feature of the AMD architecture, if available. If the feature is not present, the test exercises halt-exits/guest-entry for pending interrupts. It can be extended in the future to include cross-vCPU IPI testing.
Suggested-by: Sean Christopherson seanjc@google.com Signed-off-by: Manali Shukla Manali.Shukla@amd.com
Reviewed-by: Neeraj Upadhyay Neeraj.Upadhyay@amd.com
- Neeraj
On Tue, Jan 28, 2025, Manali Shukla wrote:
- if (kvm_cpu_has(X86_FEATURE_IDLE_HLT))
Well, shoot. I gave you bad input, and we're stuck.
this_cpu_has() isn't correct, because the part of my previous feedback about needing to check *KVM* support was 100% correct. But kvm_cpu_has() isn't right either, because that checks what KVM supports exposing to the guest, not what KVM itself supports/uses. E.g. even if we add full nested support, the test would fail if nested=0 due to KVM not "supporting" Idle HLT despite using it under the hood.
The lack of a way for KVM to communicate support to the user has come up in the past, e.g. in discussion around /proc/cpuinfo. Sadly, AFAIK there are no (good) ideas on what that should look like.
For now, I'll just skip this patch, even though doing so makes me quite sad.
On 1/28/2025 6:18 PM, Manali Shukla wrote:
The upcoming new Idle HLT Intercept feature allows for the HLT instruction execution by a vCPU to be intercepted by the hypervisor only if there are no pending V_INTR and V_NMI events for the vCPU. When the vCPU is expected to service the pending V_INTR and V_NMI events, the Idle HLT intercept won’t trigger. The feature allows the hypervisor to determine if the vCPU is actually idle and reduces wasteful VMEXITs.
The Idle HLT intercept feature is used for enlightened guests who wish to securely handle the events. When an enlightened guest does a HLT while an interrupt is pending, hypervisor will not have a way to figure out whether the guest needs to be re-entered or not. The Idle HLT intercept feature allows the HLT execution only if there are no pending V_INTR and V_NMI events.
Presence of the Idle HLT Intercept feature is indicated via CPUID function Fn8000_000A_EDX[30].
Document for the Idle HLT intercept feature is available at [1].
This series is based on kvm-x86/next (eb723766b103) + [2].
Testing Done:
- Tested the functionality for the Idle HLT intercept feature using selftest ipi_hlt_test.
- Tested on normal, SEV, SEV-ES, SEV-SNP guest for the Idle HLT intercept functionality.
- Tested the Idle HLT intercept functionality on nested guest.
v5 -> v6
- Incorporated Neeraj's review comments on selftest.
v4 -> v5
- Incorporated Sean's review comments on nested Idle HLT intercept support.
- Make svm_idle_hlt_test independent of the Idle HLT to run on all hardware.
v3 -> v4
- Drop the patches to add vcpu_get_stat() into a new series [2].
- Added nested Idle HLT intercept support.
v2 -> v3
- Incorporated Andrew's suggestion to structure vcpu_stat_types in a way that each architecture can share the generic types and also provide its own.
v1 -> v2
- Did changes in svm_idle_hlt_test based on the review comments from Sean.
- Added an enum based approach to get binary stats in vcpu_get_stat() which doesn't use string to get stat data based on the comments from Sean.
- Added safe_halt() and cli() helpers based on the comments from Sean.
[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024, Vol 2, 15.9 Instruction Intercepts (Table 15-7: IDLE_HLT). https://bugzilla.kernel.org/attachment.cgi?id=306251
V5: https://lore.kernel.org/kvm/20250103081828.7060-1-manali.shukla@amd.com/ V4: https://lore.kernel.org/kvm/20241022054810.23369-1-manali.shukla@amd.com/ V3: https://lore.kernel.org/kvm/20240528041926.3989-4-manali.shukla@amd.com/T/ V2: https://lore.kernel.org/kvm/20240501145433.4070-1-manali.shukla@amd.com/ V1: https://lore.kernel.org/kvm/20240307054623.13632-1-manali.shukla@amd.com/
Manali Shukla (3): x86/cpufeatures: Add CPUID feature bit for Idle HLT intercept KVM: SVM: Add Idle HLT intercept support KVM: selftests: Add self IPI HLT test
arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/svm.h | 1 + arch/x86/include/uapi/asm/svm.h | 2 + arch/x86/kvm/svm/svm.c | 13 ++- tools/testing/selftests/kvm/Makefile.kvm | 1 + .../selftests/kvm/include/x86/processor.h | 1 + tools/testing/selftests/kvm/ipi_hlt_test.c | 81 +++++++++++++++++++ 7 files changed, 97 insertions(+), 3 deletions(-) create mode 100644 tools/testing/selftests/kvm/ipi_hlt_test.c
base-commit: eb723766b1030a23c38adf2348b7c3d1409d11f0 prerequisite-patch-id: cb345fc0d814a351df2b5788b76eee0eef9de549 prerequisite-patch-id: 71806f400cffe09f47d6231cb072cbdbd540de1b prerequisite-patch-id: 9ea0412aab7ecd8555fcee3e9609dbfe8456d47b prerequisite-patch-id: 3504df50cdd33958456f2e56139d76867273525c prerequisite-patch-id: 674e56729a56cc487cb85be1a64ef561eb7bac8a prerequisite-patch-id: 48e87354f9d6e6bd121ca32ab73cd0d7f1dce74f prerequisite-patch-id: b32c21df6522a7396baa41d62bcad9479041d97a prerequisite-patch-id: 0ff4b504e982db7c1dfa8ec6ac485c92a89f4af8 prerequisite-patch-id: 509018dc2fc1657debc641544e86f5a92d04bc1a
A gentle reminder for the review.
-Manali
On 2/10/2025 10:36 AM, Manali Shukla wrote:
On 1/28/2025 6:18 PM, Manali Shukla wrote:
The upcoming new Idle HLT Intercept feature allows for the HLT instruction execution by a vCPU to be intercepted by the hypervisor only if there are no pending V_INTR and V_NMI events for the vCPU. When the vCPU is expected to service the pending V_INTR and V_NMI events, the Idle HLT intercept won’t trigger. The feature allows the hypervisor to determine if the vCPU is actually idle and reduces wasteful VMEXITs.
The Idle HLT intercept feature is used for enlightened guests who wish to securely handle the events. When an enlightened guest does a HLT while an interrupt is pending, hypervisor will not have a way to figure out whether the guest needs to be re-entered or not. The Idle HLT intercept feature allows the HLT execution only if there are no pending V_INTR and V_NMI events.
Presence of the Idle HLT Intercept feature is indicated via CPUID function Fn8000_000A_EDX[30].
Document for the Idle HLT intercept feature is available at [1].
This series is based on kvm-x86/next (eb723766b103) + [2].
Testing Done:
- Tested the functionality for the Idle HLT intercept feature using selftest ipi_hlt_test.
- Tested on normal, SEV, SEV-ES, SEV-SNP guest for the Idle HLT intercept functionality.
- Tested the Idle HLT intercept functionality on nested guest.
v5 -> v6
- Incorporated Neeraj's review comments on selftest.
v4 -> v5
- Incorporated Sean's review comments on nested Idle HLT intercept support.
- Make svm_idle_hlt_test independent of the Idle HLT to run on all hardware.
v3 -> v4
- Drop the patches to add vcpu_get_stat() into a new series [2].
- Added nested Idle HLT intercept support.
v2 -> v3
- Incorporated Andrew's suggestion to structure vcpu_stat_types in a way that each architecture can share the generic types and also provide its own.
v1 -> v2
- Did changes in svm_idle_hlt_test based on the review comments from Sean.
- Added an enum based approach to get binary stats in vcpu_get_stat() which doesn't use string to get stat data based on the comments from Sean.
- Added safe_halt() and cli() helpers based on the comments from Sean.
[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024, Vol 2, 15.9 Instruction Intercepts (Table 15-7: IDLE_HLT). https://bugzilla.kernel.org/attachment.cgi?id=306251
V5: https://lore.kernel.org/kvm/20250103081828.7060-1-manali.shukla@amd.com/ V4: https://lore.kernel.org/kvm/20241022054810.23369-1-manali.shukla@amd.com/ V3: https://lore.kernel.org/kvm/20240528041926.3989-4-manali.shukla@amd.com/T/ V2: https://lore.kernel.org/kvm/20240501145433.4070-1-manali.shukla@amd.com/ V1: https://lore.kernel.org/kvm/20240307054623.13632-1-manali.shukla@amd.com/
Manali Shukla (3): x86/cpufeatures: Add CPUID feature bit for Idle HLT intercept KVM: SVM: Add Idle HLT intercept support KVM: selftests: Add self IPI HLT test
arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/svm.h | 1 + arch/x86/include/uapi/asm/svm.h | 2 + arch/x86/kvm/svm/svm.c | 13 ++- tools/testing/selftests/kvm/Makefile.kvm | 1 + .../selftests/kvm/include/x86/processor.h | 1 + tools/testing/selftests/kvm/ipi_hlt_test.c | 81 +++++++++++++++++++ 7 files changed, 97 insertions(+), 3 deletions(-) create mode 100644 tools/testing/selftests/kvm/ipi_hlt_test.c
base-commit: eb723766b1030a23c38adf2348b7c3d1409d11f0 prerequisite-patch-id: cb345fc0d814a351df2b5788b76eee0eef9de549 prerequisite-patch-id: 71806f400cffe09f47d6231cb072cbdbd540de1b prerequisite-patch-id: 9ea0412aab7ecd8555fcee3e9609dbfe8456d47b prerequisite-patch-id: 3504df50cdd33958456f2e56139d76867273525c prerequisite-patch-id: 674e56729a56cc487cb85be1a64ef561eb7bac8a prerequisite-patch-id: 48e87354f9d6e6bd121ca32ab73cd0d7f1dce74f prerequisite-patch-id: b32c21df6522a7396baa41d62bcad9479041d97a prerequisite-patch-id: 0ff4b504e982db7c1dfa8ec6ac485c92a89f4af8 prerequisite-patch-id: 509018dc2fc1657debc641544e86f5a92d04bc1a
A gentle reminder for the review.
-Manali
A gentle reminder for the review.
-Manali
linux-kselftest-mirror@lists.linaro.org