KVM's implementation of nested SVM treats PAT the same way whether or not nested NPT is enabled: L1 and L2 share a PAT.
This is correct when nested NPT is disabled, but incorrect when it is enabled: with nested NPT, L1 and L2 have independent PATs.
The architectural specification for this separation is unusual. There is a "guest PAT register" that is accessed by references to the PAT MSR in guest mode, but it is different from the (host) PAT MSR. Other resources that have distinct host and guest values have a shared storage location, and the values are swapped on VM-entry/VM-exit.
In https://lore.kernel.org/kvm/20251107201151.3303170-1-jmattson@google.com/, I proposed an implementation that adhered to the architectural specification. It had a few warts. The worst was the necessity of "fixing up" KVM_SET_MSRS when executing KVM_SET_NESTED_STATE if L2 was active and nested NPT was enabled when a snapshot was taken. Aside from Yosry's clarification, no one has responded. I will take silence to imply rejection. That's okay; I wasn't fond of that implementation myself.
The current series treats PAT just like any other resource with distinct host and guest values. There is a single shared storage location (vcpu->arch.pat), and the values are swapped on VM-entry/VM-exit. Though this implementation doesn't precisely follow the architectural specification, the guest-visible behavior is the same as architected.
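In rough pseudo-code, the resulting flow looks like this (an illustrative sketch only, not the literal patched code; vmcb01/vmcb12 stand for the usual KVM structures):

  /* Emulated VMRUN, nested NPT enabled (patches 5-7): */
  vcpu->arch.pat = vmcb12->save.g_pat;    /* load L2's gPAT; vmcb01.g_pat
                                             still holds L1's hPAT */

  /* Emulated #VMEXIT, nested NPT enabled (patches 4 and 8): */
  vmcb12->save.g_pat = vcpu->arch.pat;    /* save L2's gPAT */
  vcpu->arch.pat = vmcb01->save.g_pat;    /* restore L1's hPAT */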
The first three patches ensure that the vmcb01.g_pat value at VMRUN is preserved through virtual SMM and serialization. When NPT is enabled, this field holds the host (L1) hPAT value from emulated VMRUN to emulated #VMEXIT.
The fourth patch restores the (L1) hPAT value from vmcb01.g_pat at emulated #VMEXIT. Note that this step is not architected, but it is required by this implementation, because hPAT and gPAT occupy the same storage location.
The next three patches handle loading vmcb12.g_pat into the (L2) guest PAT register at VMRUN. Most of this behavior is architected, though the architectural specification loads the value into a distinct guest PAT register, leaving the hPAT register unchanged; in this implementation, the value lands in the shared storage location instead.
The eighth patch stores the (L2) guest PAT register into vmcb12.g_pat on emulated #VMEXIT, as architected.
The ninth patch fixes the emulation of WRMSR(IA32_PAT) when nested NPT is enabled.
The tenth patch introduces a new KVM selftest to validate virtualized PAT behavior.
Jim Mattson (10):
  KVM: x86: nSVM: Add g_pat to fields copied by svm_copy_vmrun_state()
  KVM: x86: nSVM: Add VALID_GPAT flag to kvm_svm_nested_state_hdr
  KVM: x86: nSVM: Handle legacy SVM nested state in SET_NESTED_STATE
  KVM: x86: nSVM: Restore L1's PAT on emulated #VMEXIT from L2 to L1
  KVM: x86: nSVM: Cache g_pat in vmcb_save_area_cached
  KVM: x86: nSVM: Add validity check for VMCB12 g_pat
  KVM: x86: nSVM: Set vmcb02.g_pat correctly for nested NPT
  KVM: x86: nSVM: Save gPAT to vmcb12.g_pat on emulated #VMEXIT from L2 to L1
  KVM: x86: nSVM: Fix assignment to IA32_PAT from L2
  KVM: selftests: nSVM: Add svm_nested_pat test
 arch/x86/include/uapi/asm/kvm.h             |   3 +
 arch/x86/kvm/svm/nested.c                   |  74 +++-
 arch/x86/kvm/svm/svm.c                      |  14 +-
 arch/x86/kvm/svm/svm.h                      |   2 +-
 tools/testing/selftests/kvm/Makefile.kvm    |   1 +
 .../selftests/kvm/x86/svm_nested_pat_test.c | 357 ++++++++++++++++++
 6 files changed, 432 insertions(+), 19 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86/svm_nested_pat_test.c
base-commit: f62b64b970570c92fe22503b0cdc65be7ce7fc7c
The vmcb01 g_pat field holds the value of L1's IA32_PAT MSR. To preserve this value through virtual SMM and serialization, add g_pat to the fields copied by svm_copy_vmrun_state().
Signed-off-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/svm/nested.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index f295a41ec659..a0e5bf1aba52 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1090,6 +1090,7 @@ void svm_copy_vmrun_state(struct vmcb_save_area *to_save,
 	to_save->gdtr = from_save->gdtr;
 	to_save->idtr = from_save->idtr;
 	to_save->rflags = from_save->rflags | X86_EFLAGS_FIXED;
+	to_save->g_pat = from_save->g_pat;
 	to_save->efer = from_save->efer;
 	to_save->cr0 = from_save->cr0;
 	to_save->cr3 = from_save->cr3;
Now that svm_copy_vmrun_state() copies the g_pat field, L1's g_pat will be stored in the serialized nested state. Add a 'flags' field to the SVM nested state header, define a VALID_GPAT flag, and start reporting this flag in the serialized nested state populated by KVM_GET_NESTED_STATE.
Note that struct kvm_svm_nested_state_hdr is included in a union padded to 120 bytes, so there is room to add the flags field without changing any offsets.
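For reference, the surrounding layout in arch/x86/include/uapi/asm/kvm.h (abridged) is:

  struct kvm_nested_state {
  	__u16 flags;
  	__u16 format;
  	__u32 size;

  	union {
  		struct kvm_vmx_nested_state_hdr vmx;
  		struct kvm_svm_nested_state_hdr svm;

  		/* Pad the header to 128 bytes. */
  		__u8 pad[120];
  	} hdr;

  	/* (data union follows) */
  };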
Signed-off-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/include/uapi/asm/kvm.h | 3 +++
 arch/x86/kvm/svm/nested.c       | 1 +
 2 files changed, 4 insertions(+)

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 7ceff6583652..18581c4b2511 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -495,6 +495,8 @@ struct kvm_sync_regs {
 
 #define KVM_STATE_VMX_PREEMPTION_TIMER_DEADLINE	0x00000001
 
+#define KVM_STATE_SVM_VALID_GPAT		0x00000001
+
 /* vendor-independent attributes for system fd (group 0) */
 #define KVM_X86_GRP_SYSTEM	0
 #  define KVM_X86_XCOMP_GUEST_SUPP	0
@@ -530,6 +532,7 @@ struct kvm_svm_nested_state_data {
 
 struct kvm_svm_nested_state_hdr {
 	__u64 vmcb_pa;
+	__u32 flags;
 };
 
 /* for KVM_CAP_NESTED_STATE */
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index a0e5bf1aba52..ed24e08d2d21 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1769,6 +1769,7 @@ static int svm_get_nested_state(struct kvm_vcpu *vcpu,
 
 	/* First fill in the header and copy it out. */
 	if (is_guest_mode(vcpu)) {
 		kvm_state.hdr.svm.vmcb_pa = svm->nested.vmcb12_gpa;
+		kvm_state.hdr.svm.flags = KVM_STATE_SVM_VALID_GPAT;
 		kvm_state.size += KVM_STATE_NESTED_SVM_VMCB_SIZE;
 		kvm_state.flags |= KVM_STATE_NESTED_GUEST_MODE;
Previously, KVM didn't record the vmcb01 G_PAT (i.e. the IA32_PAT MSR) in the serialized nested state. It didn't have to, because it ignored the vmcb12 g_pat field entirely. L1 and L2 simply shared the same PAT.
To preserve legacy behavior, copy the current value of the IA32_PAT MSR to the location of the vmcb01 G_PAT in the serialized nested state. (On restore, KVM_SET_MSRS should be called before KVM_SET_NESTED_STATE, so the value of the shared IA32_PAT MSR should be available.)
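For illustration, a hypothetical userspace restore sequence with the expected ordering (vcpu_fd, msrs, and state are placeholders for values captured at save time; error handling omitted):

  /* Restore MSRs first, so vcpu->arch.pat holds the saved PAT value. */
  ioctl(vcpu_fd, KVM_SET_MSRS, msrs);	/* includes MSR_IA32_CR_PAT */

  /* A legacy snapshot without KVM_STATE_SVM_VALID_GPAT falls back to
   * vcpu->arch.pat here. */
  ioctl(vcpu_fd, KVM_SET_NESTED_STATE, state);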
Signed-off-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/svm/nested.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index ed24e08d2d21..c751be470364 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1884,6 +1884,13 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
 	if (((cr0 & X86_CR0_CD) == 0) && (cr0 & X86_CR0_NW))
 		goto out_free;
 
+	/*
+	 * If kvm_state doesn't have a valid saved L1 g_pat, use the
+	 * PAT MSR instead. This preserves the legacy behavior.
+	 */
+	if (!(kvm_state->hdr.svm.flags & KVM_STATE_SVM_VALID_GPAT))
+		save->g_pat = vcpu->arch.pat;
+
 	/*
 	 * Validate host state saved from before VMRUN (see
 	 * nested_svm_check_permissions).
KVM doesn't implement a separate G_PAT register to hold the guest's PAT in guest mode with nested NPT enabled. Consequently, L1's IA32_PAT MSR must be restored on emulated #VMEXIT from L2 to L1.
Note: if L2 uses shadow paging, L1 and L2 share the same IA32_PAT MSR.
Signed-off-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/svm/nested.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index c751be470364..9aec836ac04c 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1292,6 +1292,16 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	kvm_rsp_write(vcpu, vmcb01->save.rsp);
 	kvm_rip_write(vcpu, vmcb01->save.rip);
 
+	/*
+	 * KVM doesn't implement a separate guest PAT
+	 * register. Instead, the guest PAT lives in vcpu->arch.pat
+	 * while in guest mode with nested NPT enabled. Hence, the
+	 * IA32_PAT MSR has to be restored from the vmcb01 g_pat at
+	 * #VMEXIT.
+	 */
+	if (nested_npt_enabled(svm))
+		vcpu->arch.pat = vmcb01->save.g_pat;
+
 	svm->vcpu.arch.dr7 = DR7_FIXED_1;
 	kvm_update_dr7(&svm->vcpu);
The g_pat field from the vmcb12 save state area must be validated. To accommodate validation without time-of-check-to-time-of-use (TOCTTOU) issues, add a g_pat field to the vmcb_save_area_cached struct, and include it in the fields copied by __nested_copy_vmcb_save_to_cache().
Fixes: 3d6368ef580a ("KVM: SVM: Add VMRUN handler")
Signed-off-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/svm/nested.c | 2 ++
 arch/x86/kvm/svm/svm.h    | 1 +
 2 files changed, 3 insertions(+)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 9aec836ac04c..ad9272aae908 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -506,6 +506,8 @@ static void __nested_copy_vmcb_save_to_cache(struct vmcb_save_area_cached *to,
 
 	to->dr6 = from->dr6;
 	to->dr7 = from->dr7;
+
+	to->g_pat = from->g_pat;
 }
 
 void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 7d28a739865f..39138378531e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -145,6 +145,7 @@ struct vmcb_save_area_cached {
 	u64 cr0;
 	u64 dr7;
 	u64 dr6;
+	u64 g_pat;
 };
 
 struct vmcb_ctrl_area_cached {
When nested paging is enabled for VMCB12, an invalid g_pat causes an immediate #VMEXIT with exit code VMEXIT_INVALID, as specified in the APM, volume 2: "Nested Paging and VMRUN/#VMEXIT."
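The validity check reuses KVM's existing kvm_pat_valid() helper, which rejects any PAT entry that is not one of the valid memory types (0, 1, 4, 5, 6, 7). For reference, it looks roughly like this:

  static inline bool kvm_pat_valid(u64 data)
  {
  	if (data & 0xF8F8F8F8F8F8F8F8ull)
  		return false;
  	/* 0, 1, 4, 5, 6, 7 are valid values.  */
  	return (data | ((data & 0x0202020202020202ull) << 1)) == data;
  }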
Fixes: 3d6368ef580a ("KVM: SVM: Add VMRUN handler")
Signed-off-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/svm/nested.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index ad9272aae908..501102625f69 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -369,7 +369,8 @@ static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
 
 /* Common checks that apply to both L1 and L2 state.  */
 static bool __nested_vmcb_check_save(struct kvm_vcpu *vcpu,
-				     struct vmcb_save_area_cached *save)
+				     struct vmcb_save_area_cached *save,
+				     struct vmcb_ctrl_area_cached *control)
 {
 	if (CC(!(save->efer & EFER_SVME)))
 		return false;
@@ -400,6 +401,10 @@ static bool __nested_vmcb_check_save(struct kvm_vcpu *vcpu,
 	if (CC(!kvm_valid_efer(vcpu, save->efer)))
 		return false;
 
+	if (CC((control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) &&
+	       npt_enabled && !kvm_pat_valid(save->g_pat)))
+		return false;
+
 	return true;
 }
 
@@ -407,8 +412,9 @@ static bool nested_vmcb_check_save(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct vmcb_save_area_cached *save = &svm->nested.save;
+	struct vmcb_ctrl_area_cached *ctl = &svm->nested.ctl;
 
-	return __nested_vmcb_check_save(vcpu, save);
+	return __nested_vmcb_check_save(vcpu, save, ctl);
 }
 
 static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu)
@@ -1911,7 +1917,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
 	if (!(save->cr0 & X86_CR0_PG) ||
 	    !(save->cr0 & X86_CR0_PE) ||
 	    (save->rflags & X86_EFLAGS_VM) ||
-	    !__nested_vmcb_check_save(vcpu, &save_cached))
+	    !__nested_vmcb_check_save(vcpu, &save_cached, &ctl_cached))
 		goto out_free;
When nested NPT is enabled in VMCB12, copy the (cached and validated) VMCB12 g_pat field to the IA32_PAT MSR and to the VMCB02 g_pat field. (The latter can be skipped if the VMCB02 g_pat field already has the correct value.)
When NPT is enabled, but nested NPT is disabled, copy L1's IA32_PAT MSR to the VMCB02 g_pat field (L1 and L2 share the same IA32_PAT MSR in this scenario).
When NPT is disabled, the VMCB02 g_pat field is ignored by hardware.
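To summarize the cases handled by the patch below:

  host NPT   nested NPT   vmcb02.g_pat source at emulated VMRUN
  --------   ----------   -------------------------------------
  enabled    enabled      vmcb12.g_pat (cached and validated)
  enabled    disabled     vcpu->arch.pat (shared L1/L2 PAT)
  disabled   (any)        left untouched (ignored by hardware)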
Fixes: 15038e147247 ("KVM: SVM: obey guest PAT")
Signed-off-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/svm/nested.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 501102625f69..90edea73ec58 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -656,9 +656,6 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
 	struct vmcb *vmcb02 = svm->nested.vmcb02.ptr;
 	struct kvm_vcpu *vcpu = &svm->vcpu;
 
-	nested_vmcb02_compute_g_pat(svm);
-	vmcb_mark_dirty(vmcb02, VMCB_NPT);
-
 	/* Load the nested guest state */
 	if (svm->nested.vmcb12_gpa != svm->nested.last_vmcb12_gpa) {
 		new_vmcb12 = true;
@@ -666,6 +663,26 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
 		svm->nested.force_msr_bitmap_recalc = true;
 	}
 
+	if (npt_enabled) {
+		if (nested_npt_enabled(svm)) {
+			/*
+			 * KVM doesn't implement a separate guest PAT
+			 * register. Instead, the guest PAT lives in
+			 * vcpu->arch.pat while in guest mode with
+			 * nested NPT enabled.
+			 */
+			vcpu->arch.pat = svm->nested.save.g_pat;
+			if (unlikely(new_vmcb12 ||
+				     vmcb_is_dirty(vmcb12, VMCB_NPT))) {
+				vmcb02->save.g_pat = svm->nested.save.g_pat;
+				vmcb_mark_dirty(vmcb02, VMCB_NPT);
+			}
+		} else {
+			vmcb02->save.g_pat = vcpu->arch.pat;
+			vmcb_mark_dirty(vmcb02, VMCB_NPT);
+		}
+	}
+
 	if (unlikely(new_vmcb12 || vmcb_is_dirty(vmcb12, VMCB_SEG))) {
 		vmcb02->save.es = vmcb12->save.es;
 		vmcb02->save.cs = vmcb12->save.cs;
According to the APM volume 3 pseudo-code for "VMRUN," when nested paging is enabled in the VMCB, the guest PAT register (gPAT) is saved to the VMCB on #VMEXIT.
KVM doesn't implement a separate gPAT register. Instead, while in guest mode (L2) with nested NPT enabled in vmcb12, the guest PAT is stored in the IA32_PAT MSR.
Save the current IA32_PAT MSR to the vmcb12 g_pat field on emulated #VMEXIT from L2 to L1.
Fixes: 15038e147247 ("KVM: SVM: obey guest PAT")
Signed-off-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/svm/nested.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 90edea73ec58..5fbe730d4c69 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1197,6 +1197,15 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	vmcb12->save.dr6 = svm->vcpu.arch.dr6;
 	vmcb12->save.cpl = vmcb02->save.cpl;
 
+	/*
+	 * KVM stores the guest PAT in the IA32_PAT register while in
+	 * guest mode with nested NPT enabled (rather than in a
+	 * separate G_PAT register). Hence, the IA32_PAT MSR is stored
+	 * in the VMCB12 g_pat field on #VMEXIT.
+	 */
+	if (nested_npt_enabled(svm))
+		vmcb12->save.g_pat = vcpu->arch.pat;
+
 	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) {
 		vmcb12->save.s_cet = vmcb02->save.s_cet;
 		vmcb12->save.isst_addr = vmcb02->save.isst_addr;
In svm_set_msr(), a write to the IA32_PAT MSR may require updating up to two vmcb g_pat fields.
When NPT is disabled, no g_pat fields have to be updated, as they are ignored by hardware.
When NPT is enabled, the current VMCB (either VMCB01 or VMCB02) g_pat field must be updated.
In addition, when in guest mode and nested NPT is disabled, the VMCB01 g_pat field must be updated. In this scenario, L1 and L2 share the same IA32_PAT MSR.
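To summarize the cases (as implemented in the patch below):

  npt_enabled   guest mode   nested NPT   g_pat fields updated
  -----------   ----------   ----------   ------------------------------
  no            (any)        (any)        none (ignored by hardware)
  yes           no           n/a          vmcb01 (the current VMCB)
  yes           yes          yes          vmcb02 (the current VMCB) only
  yes           yes          no           vmcb02 and vmcb01 (shared PAT)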
Fixes: 4995a3685f1b ("KVM: SVM: Use a separate vmcb for the nested L2 guest")
Signed-off-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/svm/nested.c |  9 ---------
 arch/x86/kvm/svm/svm.c    | 14 +++++++++++---
 arch/x86/kvm/svm/svm.h    |  1 -
 3 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 5fbe730d4c69..b9b8d26db8dc 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -640,15 +640,6 @@ static int nested_svm_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
 	return 0;
 }
 
-void nested_vmcb02_compute_g_pat(struct vcpu_svm *svm)
-{
-	if (!svm->nested.vmcb02.ptr)
-		return;
-
-	/* FIXME: merge g_pat from vmcb01 and vmcb12. */
-	svm->nested.vmcb02.ptr->save.g_pat = svm->vmcb01.ptr->save.g_pat;
-}
-
 static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12)
 {
 	bool new_vmcb12 = false;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7041498a8091..74130d67a372 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2933,10 +2933,18 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		if (ret)
 			break;
 
-		svm->vmcb01.ptr->save.g_pat = data;
-		if (is_guest_mode(vcpu))
-			nested_vmcb02_compute_g_pat(svm);
+		if (!npt_enabled)
+			break;
+
+		svm->vmcb->save.g_pat = data;
 		vmcb_mark_dirty(svm->vmcb, VMCB_NPT);
+
+		if (!is_guest_mode(vcpu) || nested_npt_enabled(svm))
+			break;
+
+		svm->vmcb01.ptr->save.g_pat = data;
+		vmcb_mark_dirty(svm->vmcb01.ptr, VMCB_NPT);
+
 		break;
 	case MSR_IA32_SPEC_CTRL:
 		if (!msr->host_initiated &&
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 39138378531e..b25f06ec1c9c 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -801,7 +801,6 @@ void nested_copy_vmcb_control_to_cache(struct vcpu_svm *svm,
 void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
 				    struct vmcb_save_area *save);
 void nested_sync_control_from_vmcb02(struct vcpu_svm *svm);
-void nested_vmcb02_compute_g_pat(struct vcpu_svm *svm);
 void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb);
 
 extern struct kvm_x86_nested_ops svm_nested_ops;
Verify KVM's virtualization of the PAT MSR and, when nested NPT is enabled, the VMCB12 g_pat field and the guest PAT register.
Signed-off-by: Jim Mattson <jmattson@google.com>
---
 tools/testing/selftests/kvm/Makefile.kvm    |   1 +
 .../selftests/kvm/x86/svm_nested_pat_test.c | 357 ++++++++++++++++++
 2 files changed, 358 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86/svm_nested_pat_test.c

diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 33ff81606638..27f8087eafec 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -109,6 +109,7 @@ TEST_GEN_PROGS_x86 += x86/state_test
 TEST_GEN_PROGS_x86 += x86/vmx_preemption_timer_test
 TEST_GEN_PROGS_x86 += x86/svm_vmcall_test
 TEST_GEN_PROGS_x86 += x86/svm_int_ctl_test
+TEST_GEN_PROGS_x86 += x86/svm_nested_pat_test
 TEST_GEN_PROGS_x86 += x86/svm_nested_shutdown_test
 TEST_GEN_PROGS_x86 += x86/svm_nested_soft_inject_test
 TEST_GEN_PROGS_x86 += x86/tsc_scaling_sync
diff --git a/tools/testing/selftests/kvm/x86/svm_nested_pat_test.c b/tools/testing/selftests/kvm/x86/svm_nested_pat_test.c
new file mode 100644
index 000000000000..fa016e65dbf6
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86/svm_nested_pat_test.c
@@ -0,0 +1,357 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * KVM nested SVM PAT test
+ *
+ * Copyright (C) 2026, Google LLC.
+ *
+ * Test that KVM correctly virtualizes the PAT MSR and VMCB g_pat field
+ * for nested SVM guests:
+ *
+ * o With nested NPT disabled:
+ *   - L1 and L2 share the same PAT
+ *   - The vmcb12.g_pat is ignored
+ * o With nested NPT enabled:
+ *   - Invalid g_pat in vmcb12 should cause VMEXIT_INVALID
+ *   - L2 should see vmcb12.g_pat via RDMSR, not L1's PAT
+ *   - L2's writes to PAT should be saved to vmcb12 on exit
+ *   - L1's PAT should be restored after #VMEXIT from L2
+ *   - State save/restore should preserve both L1's and L2's PAT values
+ */
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+#include "svm_util.h"
+
+#define L2_GUEST_STACK_SIZE 256
+
+#define PAT_DEFAULT		0x0007040600070406ULL
+#define L1_PAT_VALUE		0x0007040600070404ULL	/* Change PA0 to WT */
+#define L2_VMCB12_PAT		0x0606060606060606ULL	/* All WB */
+#define L2_PAT_MODIFIED		0x0606060606060604ULL	/* Change PA0 to WT */
+#define INVALID_PAT_VALUE	0x0808080808080808ULL	/* 8 is reserved */
+
+/*
+ * Shared state between L1 and L2 for verification.
+ */
+struct pat_test_data {
+	uint64_t l2_pat_read;
+	uint64_t l2_pat_after_write;
+	uint64_t l1_pat_after_vmexit;
+	uint64_t vmcb12_gpat_after_exit;
+	bool l2_done;
+};
+
+static struct pat_test_data *pat_data;
+
+static void l2_guest_code_npt_disabled(void)
+{
+	pat_data->l2_pat_read = rdmsr(MSR_IA32_CR_PAT);
+	wrmsr(MSR_IA32_CR_PAT, L2_PAT_MODIFIED);
+	pat_data->l2_pat_after_write = rdmsr(MSR_IA32_CR_PAT);
+	pat_data->l2_done = true;
+	vmmcall();
+}
+
+static void l2_guest_code_npt_enabled(void)
+{
+	pat_data->l2_pat_read = rdmsr(MSR_IA32_CR_PAT);
+	wrmsr(MSR_IA32_CR_PAT, L2_PAT_MODIFIED);
+	pat_data->l2_pat_after_write = rdmsr(MSR_IA32_CR_PAT);
+	pat_data->l2_done = true;
+	vmmcall();
+}
+
+static void l2_guest_code_saverestoretest(void)
+{
+	pat_data->l2_pat_read = rdmsr(MSR_IA32_CR_PAT);
+
+	GUEST_SYNC(1);
+	GUEST_ASSERT_EQ(rdmsr(MSR_IA32_CR_PAT), pat_data->l2_pat_read);
+
+	wrmsr(MSR_IA32_CR_PAT, L2_PAT_MODIFIED);
+	pat_data->l2_pat_after_write = rdmsr(MSR_IA32_CR_PAT);
+
+	GUEST_SYNC(2);
+	GUEST_ASSERT_EQ(rdmsr(MSR_IA32_CR_PAT), L2_PAT_MODIFIED);
+
+	pat_data->l2_done = true;
+	vmmcall();
+}
+
+static void l1_svm_code_npt_disabled(struct svm_test_data *svm,
+				     struct pat_test_data *data)
+{
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+	struct vmcb *vmcb = svm->vmcb;
+
+	pat_data = data;
+
+	wrmsr(MSR_IA32_CR_PAT, L1_PAT_VALUE);
+	GUEST_ASSERT_EQ(rdmsr(MSR_IA32_CR_PAT), L1_PAT_VALUE);
+
+	generic_svm_setup(svm, l2_guest_code_npt_disabled,
+			  &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+	vmcb->save.g_pat = L2_VMCB12_PAT;
+
+	vmcb->control.intercept &= ~(1ULL << INTERCEPT_MSR_PROT);
+
+	run_guest(vmcb, svm->vmcb_gpa);
+
+	GUEST_ASSERT_EQ(vmcb->control.exit_code, SVM_EXIT_VMMCALL);
+	GUEST_ASSERT(data->l2_done);
+
+	GUEST_ASSERT_EQ(data->l2_pat_read, L1_PAT_VALUE);
+
+	GUEST_ASSERT_EQ(data->l2_pat_after_write, L2_PAT_MODIFIED);
+
+	data->l1_pat_after_vmexit = rdmsr(MSR_IA32_CR_PAT);
+	GUEST_ASSERT_EQ(data->l1_pat_after_vmexit, L2_PAT_MODIFIED);
+
+	GUEST_DONE();
+}
+
+static void l1_svm_code_invalid_gpat(struct svm_test_data *svm,
+				     struct pat_test_data *data)
+{
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+	struct vmcb *vmcb = svm->vmcb;
+
+	pat_data = data;
+
+	generic_svm_setup(svm, l2_guest_code_npt_enabled,
+			  &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+	vmcb->save.g_pat = INVALID_PAT_VALUE;
+
+	run_guest(vmcb, svm->vmcb_gpa);
+
+	GUEST_ASSERT_EQ(vmcb->control.exit_code, SVM_EXIT_ERR);
+
+	GUEST_ASSERT(!data->l2_done);
+
+	GUEST_DONE();
+}
+
+static void l1_svm_code_npt_enabled(struct svm_test_data *svm,
+				    struct pat_test_data *data)
+{
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+	struct vmcb *vmcb = svm->vmcb;
+	uint64_t l1_pat_before;
+
+	pat_data = data;
+
+	wrmsr(MSR_IA32_CR_PAT, L1_PAT_VALUE);
+	l1_pat_before = rdmsr(MSR_IA32_CR_PAT);
+	GUEST_ASSERT_EQ(l1_pat_before, L1_PAT_VALUE);
+
+	generic_svm_setup(svm, l2_guest_code_npt_enabled,
+			  &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+	vmcb->save.g_pat = L2_VMCB12_PAT;
+
+	vmcb->control.intercept &= ~(1ULL << INTERCEPT_MSR_PROT);
+
+	run_guest(vmcb, svm->vmcb_gpa);
+
+	GUEST_ASSERT_EQ(vmcb->control.exit_code, SVM_EXIT_VMMCALL);
+	GUEST_ASSERT(data->l2_done);
+
+	GUEST_ASSERT_EQ(data->l2_pat_read, L2_VMCB12_PAT);
+
+	GUEST_ASSERT_EQ(data->l2_pat_after_write, L2_PAT_MODIFIED);
+
+	data->vmcb12_gpat_after_exit = vmcb->save.g_pat;
+	GUEST_ASSERT_EQ(data->vmcb12_gpat_after_exit, L2_PAT_MODIFIED);
+
+	data->l1_pat_after_vmexit = rdmsr(MSR_IA32_CR_PAT);
+	GUEST_ASSERT_EQ(data->l1_pat_after_vmexit, L1_PAT_VALUE);
+
+	GUEST_DONE();
+}
+
+static void l1_svm_code_saverestore(struct svm_test_data *svm,
+				    struct pat_test_data *data)
+{
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+	struct vmcb *vmcb = svm->vmcb;
+
+	pat_data = data;
+
+	wrmsr(MSR_IA32_CR_PAT, L1_PAT_VALUE);
+
+	generic_svm_setup(svm, l2_guest_code_saverestoretest,
+			  &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+	vmcb->save.g_pat = L2_VMCB12_PAT;
+	vmcb->control.intercept &= ~(1ULL << INTERCEPT_MSR_PROT);
+
+	run_guest(vmcb, svm->vmcb_gpa);
+
+	GUEST_ASSERT_EQ(vmcb->control.exit_code, SVM_EXIT_VMMCALL);
+	GUEST_ASSERT(data->l2_done);
+
+	GUEST_ASSERT_EQ(rdmsr(MSR_IA32_CR_PAT), L1_PAT_VALUE);
+
+	GUEST_ASSERT_EQ(vmcb->save.g_pat, L2_PAT_MODIFIED);
+
+	GUEST_DONE();
+}
+
+/*
+ * L2 guest code for multiple VM-entry test.
+ * On first VM-entry, read and modify PAT, then VM-exit.
+ * On second VM-entry, verify we see our modified PAT from first VM-entry.
+ */
+static void l2_guest_code_multi_vmentry(void)
+{
+	pat_data->l2_pat_read = rdmsr(MSR_IA32_CR_PAT);
+	wrmsr(MSR_IA32_CR_PAT, L2_PAT_MODIFIED);
+	pat_data->l2_pat_after_write = rdmsr(MSR_IA32_CR_PAT);
+	vmmcall();
+
+	pat_data->l2_pat_read = rdmsr(MSR_IA32_CR_PAT);
+	pat_data->l2_done = true;
+	vmmcall();
+}
+
+static void l1_svm_code_multi_vmentry(struct svm_test_data *svm,
+				      struct pat_test_data *data)
+{
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+	struct vmcb *vmcb = svm->vmcb;
+
+	pat_data = data;
+
+	wrmsr(MSR_IA32_CR_PAT, L1_PAT_VALUE);
+
+	generic_svm_setup(svm, l2_guest_code_multi_vmentry,
+			  &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+	vmcb->save.g_pat = L2_VMCB12_PAT;
+	vmcb->control.intercept &= ~(1ULL << INTERCEPT_MSR_PROT);
+
+	run_guest(vmcb, svm->vmcb_gpa);
+	GUEST_ASSERT_EQ(vmcb->control.exit_code, SVM_EXIT_VMMCALL);
+
+	GUEST_ASSERT_EQ(data->l2_pat_after_write, L2_PAT_MODIFIED);
+
+	GUEST_ASSERT_EQ(vmcb->save.g_pat, L2_PAT_MODIFIED);
+
+	GUEST_ASSERT_EQ(rdmsr(MSR_IA32_CR_PAT), L1_PAT_VALUE);
+
+	vmcb->save.rip += 3; /* vmmcall */
+	run_guest(vmcb, svm->vmcb_gpa);
+
+	GUEST_ASSERT_EQ(vmcb->control.exit_code, SVM_EXIT_VMMCALL);
+	GUEST_ASSERT(data->l2_done);
+
+	GUEST_ASSERT_EQ(data->l2_pat_read, L2_PAT_MODIFIED);
+
+	GUEST_ASSERT_EQ(rdmsr(MSR_IA32_CR_PAT), L1_PAT_VALUE);
+
+	GUEST_DONE();
+}
+
+static void l1_guest_code(struct svm_test_data *svm, struct pat_test_data *data,
+			  int test_num)
+{
+	switch (test_num) {
+	case 0:
+		l1_svm_code_npt_disabled(svm, data);
+		break;
+	case 1:
+		l1_svm_code_invalid_gpat(svm, data);
+		break;
+	case 2:
+		l1_svm_code_npt_enabled(svm, data);
+		break;
+	case 3:
+		l1_svm_code_saverestore(svm, data);
+		break;
+	case 4:
+		l1_svm_code_multi_vmentry(svm, data);
+		break;
+	}
+}
+
+static void run_test(int test_number, const char *test_name, bool npt_enabled,
+		     bool do_save_restore)
+{
+	struct pat_test_data *data_hva;
+	vm_vaddr_t svm_gva, data_gva;
+	struct kvm_x86_state *state;
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+	struct ucall uc;
+
+	pr_info("Testing: %d: %s\n", test_number, test_name);
+
+	vm = vm_create_with_one_vcpu(&vcpu, l1_guest_code);
+	if (npt_enabled)
+		vm_enable_npt(vm);
+
+	vcpu_alloc_svm(vm, &svm_gva);
+
+	data_gva = vm_vaddr_alloc_page(vm);
+	data_hva = addr_gva2hva(vm, data_gva);
+	memset(data_hva, 0, sizeof(*data_hva));
+
+	if (npt_enabled)
+		tdp_identity_map_default_memslots(vm);
+
+	vcpu_args_set(vcpu, 3, svm_gva, data_gva, test_number);
+
+	for (;;) {
+		vcpu_run(vcpu);
+		TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+
+		switch (get_ucall(vcpu, &uc)) {
+		case UCALL_ABORT:
+			REPORT_GUEST_ASSERT(uc);
+			/* NOT REACHED */
+		case UCALL_SYNC:
+			if (do_save_restore) {
+				pr_info("  Save/restore at sync point %ld\n",
+					uc.args[1]);
+				state = vcpu_save_state(vcpu);
+				kvm_vm_release(vm);
+				vcpu = vm_recreate_with_one_vcpu(vm);
+				vcpu_load_state(vcpu, state);
+				kvm_x86_state_cleanup(state);
+			}
+			break;
+		case UCALL_DONE:
+			pr_info("  PASSED\n");
+			kvm_vm_free(vm);
+			return;
+		default:
+			TEST_FAIL("Unknown ucall %lu", uc.cmd);
+		}
+	}
+}
+
+int main(int argc, char *argv[])
+{
+	TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_SVM));
+	TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_NPT));
+	TEST_REQUIRE(kvm_has_cap(KVM_CAP_NESTED_STATE));
+
+	run_test(0, "nested NPT disabled", false, false);
+
+	run_test(1, "invalid g_pat", true, false);
+
+	run_test(2, "nested NPT enabled", true, false);
+
+	run_test(3, "save/restore", true, true);
+
+	run_test(4, "multiple entries", true, false);
+
+	return 0;
+}