On Mon, Mar 24, 2025, Mingwei Zhang wrote:
From: Dapeng Mi dapeng1.mi@linux.intel.com
Introduce enable_mediated_pmu global parameter to control if mediated vPMU can be enabled on KVM level. Even enable_mediated_pmu is set to true in KVM, user space hypervisor still need to enable mediated vPMU explicitly by calling KVM_CAP_PMU_CAPABILITY ioctl. This gives hypervisor flexibility to enable or disable mediated vPMU for each VM.
Mediated vPMU depends on some PMU features on higher PMU version, like PERF_GLOBAL_STATUS_SET MSR in v4+ for Intel PMU. Thus introduce a pmu_ops variable MIN_MEDIATED_PMU_VERSION to indicates the minimum host PMU version which mediated vPMU needs.
Currently enable_mediated_pmu is not exposed to user space as a module parameter until all mediated vPMU code are in place.
Suggested-by: Sean Christopherson seanjc@google.com Co-developed-by: Mingwei Zhang mizhang@google.com Signed-off-by: Mingwei Zhang mizhang@google.com Signed-off-by: Dapeng Mi dapeng1.mi@linux.intel.com
arch/x86/kvm/pmu.c | 3 ++- arch/x86/kvm/pmu.h | 11 +++++++++ arch/x86/kvm/svm/pmu.c | 1 + arch/x86/kvm/vmx/capabilities.h | 3 ++- arch/x86/kvm/vmx/pmu_intel.c | 5 ++++ arch/x86/kvm/vmx/vmx.c | 3 ++- arch/x86/kvm/x86.c | 44 ++++++++++++++++++++++++++++++--- arch/x86/kvm/x86.h | 1 + 8 files changed, 64 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c index 75e9cfc689f8..4f455afe4009 100644 --- a/arch/x86/kvm/pmu.c +++ b/arch/x86/kvm/pmu.c @@ -775,7 +775,8 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu) pmu->pebs_data_cfg_rsvd = ~0ull; bitmap_zero(pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX);
- if (!vcpu->kvm->arch.enable_pmu)
- if (!vcpu->kvm->arch.enable_pmu ||
(!lapic_in_kernel(vcpu) && enable_mediated_pmu))
This check belongs in KVM_CAP_PMU_CAPABILITY, i.e. KVM needs to reject enabling a mediated PMU without an in-kernel local APIC, not silently drop the PMU.
return;
kvm_pmu_call(refresh)(vcpu); diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h index ad89d0bd6005..dd45a0c6be74 100644 --- a/arch/x86/kvm/pmu.h +++ b/arch/x86/kvm/pmu.h @@ -45,6 +45,7 @@ struct kvm_pmu_ops { const u64 EVENTSEL_EVENT; const int MAX_NR_GP_COUNTERS; const int MIN_NR_GP_COUNTERS;
- const int MIN_MEDIATED_PMU_VERSION;
I like the idea, but simply checking the PMU version is insufficient on Intel, i.e. just add a callback.
}; void kvm_pmu_ops_update(const struct kvm_pmu_ops *pmu_ops); @@ -63,6 +64,12 @@ static inline bool kvm_pmu_has_perf_global_ctrl(struct kvm_pmu *pmu) return pmu->version > 1; } +static inline bool kvm_mediated_pmu_enabled(struct kvm_vcpu *vcpu)
kvm_vcpu_has_mediated_pmu() to align with e.g. guest_cpu_cap_has(), and because kvm_mediated_pmu_enabled() sounds like a VM-scoped or module-scoped helper.
+{
- return vcpu->kvm->arch.enable_pmu &&
This is superfluous, pmu->version should never be non-zero without the PMU being enabled at the VM level.
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c index 77012b2eca0e..425e93d4b1c6 100644 --- a/arch/x86/kvm/vmx/pmu_intel.c +++ b/arch/x86/kvm/vmx/pmu_intel.c @@ -739,4 +739,9 @@ struct kvm_pmu_ops intel_pmu_ops __initdata = { .EVENTSEL_EVENT = ARCH_PERFMON_EVENTSEL_EVENT, .MAX_NR_GP_COUNTERS = KVM_MAX_NR_INTEL_GP_COUNTERS, .MIN_NR_GP_COUNTERS = 1,
- /*
* Intel mediated vPMU support depends on
* MSR_CORE_PERF_GLOBAL_STATUS_SET which is supported from 4+.
*/
- .MIN_MEDIATED_PMU_VERSION = 4,
}; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 00ac94535c21..a4b5b6455c7b 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7916,7 +7916,8 @@ static __init u64 vmx_get_perf_capabilities(void) if (boot_cpu_has(X86_FEATURE_PDCM)) rdmsrl(MSR_IA32_PERF_CAPABILITIES, host_perf_cap);
- if (!cpu_feature_enabled(X86_FEATURE_ARCH_LBR)) {
- if (!cpu_feature_enabled(X86_FEATURE_ARCH_LBR) &&
x86_perf_get_lbr(&vmx_lbr_caps);!enable_mediated_pmu) {
/*
There's a bit too much going on in this patch. I think it makes sense to split the vendor chunks out to separate patches, so that each can elaborate on the exact requirements.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 72995952978a..1ebe169b88b6 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -188,6 +188,14 @@ bool __read_mostly enable_pmu = true; EXPORT_SYMBOL_GPL(enable_pmu); module_param(enable_pmu, bool, 0444); +/*
- Enable/disable mediated passthrough PMU virtualization.
- Don't expose it to userspace as a module paramerter until
- all mediated vPMU code is in place.
- */
No need for the comment, documenting this in the changelog is sufficient.
+bool __read_mostly enable_mediated_pmu; +EXPORT_SYMBOL_GPL(enable_mediated_pmu);
bool __read_mostly eager_page_split = true; module_param(eager_page_split, bool, 0644); @@ -6643,9 +6651,28 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, break; mutex_lock(&kvm->lock);
if (!kvm->created_vcpus) {
kvm->arch.enable_pmu = !(cap->args[0] & KVM_PMU_CAP_DISABLE);
r = 0;
/*
* To keep PMU configuration "simple", setting vPMU support is
* disallowed if vCPUs are created, or if mediated PMU support
* was already enabled for the VM.
*/
if (!kvm->created_vcpus &&
(!enable_mediated_pmu || !kvm->arch.enable_pmu)) {
bool pmu_enable = !(cap->args[0] & KVM_PMU_CAP_DISABLE);
if (enable_mediated_pmu && pmu_enable) {
Local APIC check goes here.
char *err_msg = "Fail to enable mediated vPMU, " \
"please disable system wide perf events or nmi_watchdog " \
"(echo 0 > /proc/sys/kernel/nmi_watchdog).\n";
r = perf_get_mediated_pmu();
if (r)
kvm_err("%s", err_msg);
#define MEDIATED_PMU_MSG "Fail to enable mediated vPMU, disable system wide perf events and nmi_watchdog.\n"
r = perf_create_mediated_pmu(); if (r) kvm_err(MEDIATED_PMU_MSG);
} else
r = 0;
if (!r)
} mutex_unlock(&kvm->lock); break;kvm->arch.enable_pmu = pmu_enable;
@@ -12723,7 +12750,14 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) kvm->arch.default_tsc_khz = max_tsc_khz ? : tsc_khz; kvm->arch.apic_bus_cycle_ns = APIC_BUS_CYCLE_NS_DEFAULT; kvm->arch.guest_can_read_msr_platform_info = true;
- kvm->arch.enable_pmu = enable_pmu;
- /*
* PMU virtualization is opt-in when mediated PMU support is enabled.
* KVM_CAP_PMU_CAPABILITY ioctl must be called explicitly to enable
* mediated vPMU. For legacy perf-based vPMU, its behavior isn't changed,
* KVM_CAP_PMU_CAPABILITY ioctl is optional.
*/
Again, too much extraneous info, the exception proves the rule. I.e. by calling out that mediated PMU is special, it's clear the rule is that PMUs are enabled by default in the !mediated case.
/* * Userspace must explicitly opt-in to PMU virtualization when mediated * PMU support is enabled (see KVM_CAP_PMU_CAPABILITY). */
- kvm->arch.enable_pmu = enable_pmu && !enable_mediated_pmu;
So I tried to run a QEMU with this and it failed, because QEMU expected the PMU to be enabled and tried to write to PMU MSRs. I haven't dug through the QEMU code, but I assume that QEMU rightly expects that passing in PMU in CPUID when KVM_GET_SUPPORTED_CPUID says its supported will result in the VM having a PMU.
I.e. by trying to get cute with backwards compatibility, I think we broke backwards compatiblity. At this point, I'm leaning toward making the module param off-by-default, but otherwise not messing with the behavior of kvm->arch.enable_pmu. Not sure if that has implications for KVM_PMU_CAP_DISABLE though.