Add nested virtualization support for the mediated PMU by combining the MSR interception bitmaps of vmcs01 and vmcs12. Readers may argue that even without this patch, nested virtualization works for the mediated PMU, because L1 sees PerfMon v2 and, if it is Linux, will fall back to the legacy vPMU implementation. However, any assumption made about L1 may be invalid; e.g., L1 may not even be Linux.
If both L0 and L1 pass through PMU MSRs, the correct behavior is to let MSR accesses from L2 reach the hardware MSRs directly, since both L0 and L1 pass through the access.
However, in the current implementation, without any nested handling, KVM always sets the MSR interception bits in vmcs02. As a result, L0 emulates all MSR reads/writes for L2, which causes errors, since the current mediated vPMU never implements set_msr() and get_msr() for any counter access except counter accesses from the VMM side.
Fix the issue by setting up the correct MSR interception for PMU MSRs.
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 arch/x86/kvm/vmx/nested.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index cf557acf91f8..dbec40cb55bc 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -626,6 +626,36 @@ static inline void nested_vmx_set_intercept_for_msr(struct vcpu_vmx *vmx,
 #define nested_vmx_merge_msr_bitmaps_rw(msr) \
 	nested_vmx_merge_msr_bitmaps(msr, MSR_TYPE_RW)
+/*
+ * Disable PMU MSRs interception for nested VM if L0 and L1 are
+ * both mediated vPMU.
+ */
+static void nested_vmx_merge_pmu_msr_bitmaps(struct kvm_vcpu *vcpu,
+					     unsigned long *msr_bitmap_l1,
+					     unsigned long *msr_bitmap_l0)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	int i;
+
+	if (!kvm_mediated_pmu_enabled(vcpu))
+		return;
+
+	for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
+		nested_vmx_merge_msr_bitmaps_rw(MSR_ARCH_PERFMON_EVENTSEL0 + i);
+		nested_vmx_merge_msr_bitmaps_rw(MSR_IA32_PERFCTR0 + i);
+		nested_vmx_merge_msr_bitmaps_rw(MSR_IA32_PMC0 + i);
+	}
+
+	for (i = 0; i < pmu->nr_arch_fixed_counters; i++)
+		nested_vmx_merge_msr_bitmaps_rw(MSR_CORE_PERF_FIXED_CTR0 + i);
+
+	nested_vmx_merge_msr_bitmaps_rw(MSR_CORE_PERF_FIXED_CTR_CTRL);
+	nested_vmx_merge_msr_bitmaps_rw(MSR_CORE_PERF_GLOBAL_CTRL);
+	nested_vmx_merge_msr_bitmaps_read(MSR_CORE_PERF_GLOBAL_STATUS);
+	nested_vmx_merge_msr_bitmaps_write(MSR_CORE_PERF_GLOBAL_OVF_CTRL);
+}
+
 /*
  * Merge L0's and L1's MSR bitmap, return false to indicate that
  * we do not use the hardware.
@@ -717,6 +747,8 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
 	nested_vmx_merge_msr_bitmaps_write(MSR_IA32_PRED_CMD);
 	nested_vmx_merge_msr_bitmaps_write(MSR_IA32_FLUSH_CMD);
+	nested_vmx_merge_pmu_msr_bitmaps(vcpu, msr_bitmap_l1, msr_bitmap_l0);
+
 	kvm_vcpu_unmap(vcpu, &map);
 	vmx->nested.force_msr_bitmap_recalc = false;