This series introduces support in KVM and the ARM PMUv3 driver for partitioning PMU counters into two separate ranges by taking advantage of the MDCR_EL2.HPMN register field.
A partitioned PMU would allow KVM guests direct access to a subset of PMU functionality, greatly reducing the overhead of performance monitoring in guests.
While this feature could be accepted on its own merits, in practice there is a lot more to be done before it is fully useful, so I'm sending this as an RFC for now.
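As a rough illustration of the two ranges, the sketch below computes counter bitmasks from an HPMN value. It assumes the architectural convention that event counters 0..HPMN-1 are the ones a guest can reach and HPMN..N-1 stay with the host; the helper names are hypothetical and are not functions from this series, and the cycle counter is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Bits [hi:lo] set, similar in spirit to the kernel's GENMASK(). */
static uint64_t mask_range(unsigned int hi, unsigned int lo)
{
	assert(hi < 64 && lo <= hi);
	return (~UINT64_C(0) >> (63 - hi)) & (~UINT64_C(0) << lo);
}

/*
 * Hypothetical helper: event counters the guest may access directly.
 * hpmn == 0 leaves the guest with no event counters at all.
 */
static uint64_t guest_counter_mask(unsigned int hpmn)
{
	return hpmn ? mask_range(hpmn - 1, 0) : 0;
}

/*
 * Hypothetical helper: event counters kept for the host, i.e.
 * hpmn..nr_counters-1. Assumes at most 31 event counters.
 */
static uint64_t host_counter_mask(unsigned int hpmn, unsigned int nr_counters)
{
	assert(hpmn <= nr_counters && nr_counters <= 31);
	return hpmn < nr_counters ? mask_range(nr_counters - 1, hpmn) : 0;
}
```

With six event counters and HPMN set to 4, the guest mask covers counters 0-3 and the host mask covers counters 4-5; the two masks never overlap.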
v3:
* Include cpucap definition for FEAT_HPMN0 to allow for setting HPMN to 0
* Include PMU header cleanup provided by Marc [1] with some minor changes so compilation works
* Pull functions out of pmu-emul.c that aren't specific to the emulated PMU. This and the previous item aren't strictly needed but they provide a nicer starting point.
* As suggested by Oliver, start a file for partitioned PMU functions and move the reserved_host_counters parameter and MDCR handling into KVM, so the driver does not have to know about it and we need fewer hacks to keep the driver working on 32-bit ARM. The separation is not complete because the driver still needs to start and stop the host counters all at once and must toggle MDCR_EL2.HPME to do that. Introduce kvm_pmu_host_counters_{enable,disable}() functions to handle this and define them as no-ops on 32-bit ARM.
* As suggested by Oliver, don't limit PMCR.N on the emulated PMU. This value will be read correctly once the right traps are disabled to use the partitioned PMU.
v2: https://lore.kernel.org/kvm/20250208020111.2068239-1-coltonlewis@google.com/
v1: https://lore.kernel.org/kvm/20250127222031.3078945-1-coltonlewis@google.com/
[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h...
Colton Lewis (7):
  arm64: cpufeature: Add cap for HPMN0
  arm64: Generate sign macro for sysreg Enums
  KVM: arm64: Reorganize PMU functions
  KVM: arm64: Introduce module param to partition the PMU
  perf: arm_pmuv3: Generalize counter bitmasks
  perf: arm_pmuv3: Keep out of guest counter partition
  KVM: arm64: selftests: Reword selftests error

Marc Zyngier (1):
  KVM: arm64: Cleanup PMU includes
 arch/arm/include/asm/arm_pmuv3.h              |   2 +
 arch/arm64/include/asm/arm_pmuv3.h            |   2 +-
 arch/arm64/include/asm/kvm_host.h             | 199 +++++++-
 arch/arm64/include/asm/kvm_pmu.h              |  47 ++
 arch/arm64/kernel/cpufeature.c                |   8 +
 arch/arm64/kvm/Makefile                       |   2 +-
 arch/arm64/kvm/arm.c                          |   1 -
 arch/arm64/kvm/debug.c                        |  10 +-
 arch/arm64/kvm/hyp/include/hyp/switch.h       |   1 +
 arch/arm64/kvm/pmu-emul.c                     | 464 +-----------------
 arch/arm64/kvm/pmu-part.c                     |  63 +++
 arch/arm64/kvm/pmu.c                          | 454 +++++++++++++++++
 arch/arm64/kvm/sys_regs.c                     |   2 +
 arch/arm64/tools/cpucaps                      |   1 +
 arch/arm64/tools/gen-sysreg.awk               |   1 +
 arch/arm64/tools/sysreg                       |   6 +-
 drivers/perf/arm_pmuv3.c                      |  73 ++-
 include/kvm/arm_pmu.h                         | 204 --------
 include/linux/perf/arm_pmu.h                  |  16 +-
 include/linux/perf/arm_pmuv3.h                |  27 +-
 .../selftests/kvm/arm64/vpmu_counter_access.c |   2 +-
 virt/kvm/kvm_main.c                           |   1 +
 22 files changed, 882 insertions(+), 704 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_pmu.h
 create mode 100644 arch/arm64/kvm/pmu-part.c
 delete mode 100644 include/kvm/arm_pmu.h
base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
--
2.48.1.601.g30ceb7b040-goog
Add a capability for FEAT_HPMN0, which indicates whether MDCR_EL2.HPMN can be set to 0, i.e. zero counters reserved for the guest.
This required changing HPMN0 to an UnsignedEnum in arch/arm64/tools/sysreg because otherwise not all the macros needed to add it to the arm64_features[] capability table are generated.
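To make the macro dependency concrete, here is a simplified, self-contained model of a token-pasting capability initializer. Every name prefixed with DEMO_ or demo_ is a hypothetical stand-in, not the kernel's actual definition; the point is only that an initializer built by pasting `reg`, `field`, and a suffix together fails to compile if any one of the per-field macros (here the `_SIGNED` one) is missing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the macros a generator emits per field. */
#define DEMO_REG_HPMN0_SHIFT	60
#define DEMO_REG_HPMN0_WIDTH	4
#define DEMO_REG_HPMN0_SIGNED	false	/* absent for fields without a sign macro */
#define DEMO_REG_HPMN0_IMP	1

struct demo_cap {
	unsigned int field_pos;
	unsigned int field_width;
	bool sign;
	int min_value;
};

/* Token pasting: all four names above must exist or this will not build. */
#define DEMO_CPUID_FIELDS(reg, field, min)		\
	.field_pos = reg##_##field##_SHIFT,		\
	.field_width = reg##_##field##_WIDTH,		\
	.sign = reg##_##field##_SIGNED,			\
	.min_value = reg##_##field##_##min

static const struct demo_cap hpmn0_cap = {
	DEMO_CPUID_FIELDS(DEMO_REG, HPMN0, IMP)
};

/* Extract an unsigned field and compare it against the minimum value. */
static bool demo_matches(const struct demo_cap *cap, unsigned long long reg)
{
	unsigned long long mask = (1ULL << cap->field_width) - 1;
	int val = (int)((reg >> cap->field_pos) & mask);

	assert(!cap->sign);	/* this sketch handles unsigned fields only */
	return val >= cap->min_value;
}
```

Deleting the `DEMO_REG_HPMN0_SIGNED` line turns the initializer into a compile error, which is the kind of failure the commit message describes.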
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
 arch/arm64/kernel/cpufeature.c | 8 ++++++++
 arch/arm64/tools/cpucaps       | 1 +
 arch/arm64/tools/sysreg        | 6 +++---
 3 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 4eb7c6698ae4..396327b4da7d 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -538,6 +538,7 @@ static const struct arm64_ftr_bits ftr_id_mmfr0[] = {
 };
 
 static const struct arm64_ftr_bits ftr_id_aa64dfr0[] = {
+	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_HPMN0_SHIFT, 4, 0),
 	S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_DoubleLock_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_PMSVer_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64DFR0_EL1_CTX_CMPs_SHIFT, 4, 0),
@@ -2842,6 +2843,13 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.matches = has_cpuid_feature,
 		ARM64_CPUID_FIELDS(ID_AA64MMFR0_EL1, FGT, IMP)
 	},
+	{
+		.desc = "Hypervisor PMU Partitioning 0 Guest Counters",
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.capability = ARM64_HAS_HPMN0,
+		.matches = has_cpuid_feature,
+		ARM64_CPUID_FIELDS(ID_AA64DFR0_EL1, HPMN0, IMP)
+	},
 #ifdef CONFIG_ARM64_SME
 	{
 		.desc = "Scalable Matrix Extension",
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 1e65f2fb45bd..9242e460ebe6 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -38,6 +38,7 @@ HAS_GIC_CPUIF_SYSREGS
 HAS_GIC_PRIO_MASKING
 HAS_GIC_PRIO_RELAXED_SYNC
 HAS_HCR_NV1
+HAS_HPMN0
 HAS_HCX
 HAS_LDAPR
 HAS_LPA2
diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index 762ee084b37c..35aa5f6476b9 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -1240,9 +1240,9 @@ EndEnum
 EndSysreg
 
 Sysreg	ID_AA64DFR0_EL1	3	0	0	5	0
-Enum	63:60	HPMN0
-	0b0000	UNPREDICTABLE
-	0b0001	DEF
+UnsignedEnum	63:60	HPMN0
+	0b0000	NI
+	0b0001	IMP
 EndEnum
 UnsignedEnum	59:56	ExtTrcBuff
 	0b0000	NI
There's no reason Enums shouldn't be equivalent to UnsignedEnums, so generate the sign macro for them as well and explicitly specify that they are unsigned.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
 arch/arm64/tools/gen-sysreg.awk | 1 +
 1 file changed, 1 insertion(+)
diff --git a/arch/arm64/tools/gen-sysreg.awk b/arch/arm64/tools/gen-sysreg.awk
index 1a2afc9fdd42..a227f73cd31e 100755
--- a/arch/arm64/tools/gen-sysreg.awk
+++ b/arch/arm64/tools/gen-sysreg.awk
@@ -306,6 +306,7 @@ END {
 	parse_bitdef(reg, field, $2)
 
 	define_field(reg, field, msb, lsb)
+	define_field_sign(reg, field, "false")
 
 	next
 }
From: Marc Zyngier <maz@kernel.org>
asm/kvm_host.h includes kvm/arm_pmu.h, which includes perf/arm_pmuv3.h, which includes asm/arm_pmuv3.h, which in turn includes asm/kvm_host.h. This causes compilation problems when trying to use anything defined in any of these headers in any of the others.
Reorganize these tangled headers. In particular:
* Move the declarations defining the interface between KVM and PMU to their own header, asm/kvm_pmu.h, which can be used without the problem described above.
* Delete kvm/arm_pmu.h. These functions are mostly internal to KVM and should go in asm/kvm_host.h.
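The general technique behind such a lightweight interface header can be sketched in a single file: forward-declare the types, provide stubs when the feature is compiled out, and use a macro for any helper that dereferences a type that is still incomplete at that point. All names below (demo_event_attr, DEMO_CONFIG_KVM, and so on) are hypothetical, and this is only one plausible reading of why a helper might be written as a macro rather than an inline function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ---- what a small interface header might contain ---- */
struct demo_event_attr;			/* forward declaration, no heavy include */

#ifdef DEMO_CONFIG_KVM
bool demo_set_pmuserenr(uint64_t val);	/* real implementation lives elsewhere */
#else
static inline bool demo_set_pmuserenr(uint64_t val)
{
	(void)val;
	return false;			/* stub when the feature is compiled out */
}
#endif

/*
 * A macro is only expanded at its point of use, so it may dereference a
 * type that is still incomplete here. An inline function could not.
 */
#define demo_counter_deferred(attr, has_vhe) \
	(!(has_vhe) && (attr)->exclude_host)

/* Pointer-only use is fine with the incomplete type. */
static inline bool demo_attr_present(const struct demo_event_attr *attr)
{
	return attr != NULL;
}

/* ---- what the heavy header defining the type might contain ---- */
struct demo_event_attr {
	bool exclude_host;
};

/* Sample instance used to exercise the macro once the type is complete. */
static const struct demo_event_attr demo_guest_only = { .exclude_host = true };
```

Because the interface part needs nothing beyond the forward declaration, it can be included from either side of the former cycle without pulling the other side in.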
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
 arch/arm64/include/asm/arm_pmuv3.h      |   2 +-
 arch/arm64/include/asm/kvm_host.h       | 198 +++++++++++++++++++++--
 arch/arm64/include/asm/kvm_pmu.h        |  38 +++++
 arch/arm64/kvm/arm.c                    |   1 -
 arch/arm64/kvm/debug.c                  |   1 +
 arch/arm64/kvm/hyp/include/hyp/switch.h |   1 +
 arch/arm64/kvm/pmu-emul.c               |  30 ++--
 arch/arm64/kvm/pmu.c                    |   2 +
 arch/arm64/kvm/sys_regs.c               |   2 +
 include/kvm/arm_pmu.h                   | 204 ------------------------
 include/linux/perf/arm_pmu.h            |  14 +-
 virt/kvm/kvm_main.c                     |   1 +
 12 files changed, 255 insertions(+), 239 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_pmu.h
 delete mode 100644 include/kvm/arm_pmu.h
diff --git a/arch/arm64/include/asm/arm_pmuv3.h b/arch/arm64/include/asm/arm_pmuv3.h index 8a777dec8d88..32c003a7b810 100644 --- a/arch/arm64/include/asm/arm_pmuv3.h +++ b/arch/arm64/include/asm/arm_pmuv3.h @@ -6,7 +6,7 @@ #ifndef __ASM_PMUV3_H #define __ASM_PMUV3_H
-#include <asm/kvm_host.h> +#include <asm/kvm_pmu.h>
#include <asm/cpufeature.h> #include <asm/sysreg.h> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 7cfa024de4e3..80e5c09790b9 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -14,6 +14,7 @@ #include <linux/arm-smccc.h> #include <linux/bitmap.h> #include <linux/types.h> +#include <linux/irq_work.h> #include <linux/jump_label.h> #include <linux/kvm_types.h> #include <linux/maple_tree.h> @@ -35,7 +36,6 @@
#include <kvm/arm_vgic.h> #include <kvm/arm_arch_timer.h> -#include <kvm/arm_pmu.h>
#define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
@@ -705,6 +705,35 @@ struct vcpu_reset_state { bool reset; };
+struct vncr_tlb; + +#if IS_ENABLED(CONFIG_HW_PERF_EVENTS) + +#define KVM_ARMV8_PMU_MAX_COUNTERS 32 + +struct kvm_pmc { + u8 idx; /* index into the pmu->pmc array */ + struct perf_event *perf_event; +}; + +struct kvm_pmu_events { + u64 events_host; + u64 events_guest; +}; + +struct kvm_pmu { + struct irq_work overflow_work; + struct kvm_pmu_events events; + struct kvm_pmc pmc[KVM_ARMV8_PMU_MAX_COUNTERS]; + int irq_num; + bool created; + bool irq_level; +}; +#else +struct kvm_pmu { +}; +#endif + struct kvm_vcpu_arch { struct kvm_cpu_context ctxt;
@@ -1385,25 +1414,11 @@ void kvm_arch_vcpu_ctxflush_fp(struct kvm_vcpu *vcpu); void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu); void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu);
-static inline bool kvm_pmu_counter_deferred(struct perf_event_attr *attr) -{ - return (!has_vhe() && attr->exclude_host); -} - #ifdef CONFIG_KVM -void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr); -void kvm_clr_pmu_events(u64 clr); -bool kvm_set_pmuserenr(u64 val); void kvm_enable_trbe(void); void kvm_disable_trbe(void); void kvm_tracing_set_el1_configuration(u64 trfcr_while_in_guest); #else -static inline void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr) {} -static inline void kvm_clr_pmu_events(u64 clr) {} -static inline bool kvm_set_pmuserenr(u64 val) -{ - return false; -} static inline void kvm_enable_trbe(void) {} static inline void kvm_disable_trbe(void) {} static inline void kvm_tracing_set_el1_configuration(u64 trfcr_while_in_guest) {} @@ -1555,4 +1570,157 @@ void kvm_set_vm_id_reg(struct kvm *kvm, u32 reg, u64 val); #define kvm_has_s1poe(k) \ (kvm_has_feat((k), ID_AA64MMFR3_EL1, S1POE, IMP))
+#define kvm_vcpu_has_pmu(vcpu) \ + (vcpu_has_feature(vcpu, KVM_ARM_VCPU_PMU_V3)) + +#if IS_ENABLED(CONFIG_HW_PERF_EVENTS) + +DECLARE_STATIC_KEY_FALSE(kvm_arm_pmu_available); + +static __always_inline bool kvm_arm_support_pmu_v3(void) +{ + return static_branch_likely(&kvm_arm_pmu_available); +} + +u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx); +void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val); +u64 kvm_pmu_accessible_counter_mask(struct kvm_vcpu *vcpu); +u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1); +void kvm_pmu_vcpu_init(struct kvm_vcpu *vcpu); +void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu); +void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu); +void kvm_pmu_reprogram_counter_mask(struct kvm_vcpu *vcpu, u64 val); +void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu); +void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu); +bool kvm_pmu_should_notify_user(struct kvm_vcpu *vcpu); +void kvm_pmu_update_run(struct kvm_vcpu *vcpu); +void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val); +void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val); +void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data, + u64 select_idx); +void kvm_vcpu_reload_pmu(struct kvm_vcpu *vcpu); +int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, + struct kvm_device_attr *attr); +int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu, + struct kvm_device_attr *attr); +int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu, + struct kvm_device_attr *attr); +int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu); + +struct kvm_pmu_events *kvm_get_pmu_events(void); +void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu); +void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu); + +/* + * Updates the vcpu's view of the pmu events for this cpu. + * Must be called before every vcpu run after disabling interrupts, to ensure + * that an interrupt cannot fire and update the structure. 
+ */ +#define kvm_pmu_update_vcpu_events(vcpu) \ + do { \ + if (!has_vhe() && kvm_arm_support_pmu_v3()) \ + vcpu->arch.pmu.events = *kvm_get_pmu_events(); \ + } while (0) + +u8 kvm_arm_pmu_get_pmuver_limit(void); +u64 kvm_pmu_evtyper_mask(struct kvm *kvm); +int kvm_arm_set_default_pmu(struct kvm *kvm); +u8 kvm_arm_pmu_get_max_counters(struct kvm *kvm); + +u64 kvm_vcpu_read_pmcr(struct kvm_vcpu *vcpu); +bool kvm_pmu_counter_is_hyp(struct kvm_vcpu *vcpu, unsigned int idx); +void kvm_pmu_nested_transition(struct kvm_vcpu *vcpu); +#else +static inline bool kvm_arm_support_pmu_v3(void) +{ + return false; +} + +static inline u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, + u64 select_idx) +{ + return 0; +} +static inline void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, + u64 select_idx, u64 val) {} +static inline u64 kvm_pmu_accessible_counter_mask(struct kvm_vcpu *vcpu) +{ + return 0; +} +static inline void kvm_pmu_vcpu_init(struct kvm_vcpu *vcpu) {} +static inline void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu) {} +static inline void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu) {} +static inline void kvm_pmu_reprogram_counter_mask(struct kvm_vcpu *vcpu, u64 val) {} +static inline void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu) {} +static inline void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu) {} +static inline bool kvm_pmu_should_notify_user(struct kvm_vcpu *vcpu) +{ + return false; +} +static inline void kvm_pmu_update_run(struct kvm_vcpu *vcpu) {} +static inline void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val) {} +static inline void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val) {} +static inline void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, + u64 data, u64 select_idx) {} +static inline int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, + struct kvm_device_attr *attr) +{ + return -ENXIO; +} +static inline int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu, + struct kvm_device_attr *attr) +{ + return -ENXIO; +} +static inline 
int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu, + struct kvm_device_attr *attr) +{ + return -ENXIO; +} +static inline int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu) +{ + return 0; +} +static inline u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1) +{ + return 0; +} + +static inline void kvm_pmu_update_vcpu_events(struct kvm_vcpu *vcpu) {} +static inline void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu) {} +static inline void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu) {} +static inline void kvm_vcpu_reload_pmu(struct kvm_vcpu *vcpu) {} +static inline u8 kvm_arm_pmu_get_pmuver_limit(void) +{ + return 0; +} +static inline u64 kvm_pmu_evtyper_mask(struct kvm *kvm) +{ + return 0; +} + +static inline int kvm_arm_set_default_pmu(struct kvm *kvm) +{ + return -ENODEV; +} + +static inline u8 kvm_arm_pmu_get_max_counters(struct kvm *kvm) +{ + return 0; +} + +static inline u64 kvm_vcpu_read_pmcr(struct kvm_vcpu *vcpu) +{ + return 0; +} + +static inline bool kvm_pmu_counter_is_hyp(struct kvm_vcpu *vcpu, unsigned int idx) +{ + return false; +} + +static inline void kvm_pmu_nested_transition(struct kvm_vcpu *vcpu) {} + +#endif + #endif /* __ARM64_KVM_HOST_H__ */ diff --git a/arch/arm64/include/asm/kvm_pmu.h b/arch/arm64/include/asm/kvm_pmu.h new file mode 100644 index 000000000000..613cddbdbdd8 --- /dev/null +++ b/arch/arm64/include/asm/kvm_pmu.h @@ -0,0 +1,38 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ + +#ifndef __KVM_PMU_H +#define __KVM_PMU_H + +/* + * Define the interface between the PMUv3 driver and KVM. 
+ */ +struct perf_event_attr; +struct arm_pmu; + +#define kvm_pmu_counter_deferred(attr) \ + ({ \ + !has_vhe() && (attr)->exclude_host; \ + }) + +#ifdef CONFIG_KVM + +void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr); +void kvm_clr_pmu_events(u64 clr); +bool kvm_set_pmuserenr(u64 val); +void kvm_vcpu_pmu_resync_el0(void); +void kvm_host_pmu_init(struct arm_pmu *pmu); + +#else + +static inline void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr) {} +static inline void kvm_clr_pmu_events(u64 clr) {} +static inline bool kvm_set_pmuserenr(u64 val) +{ + return false; +} +static inline void kvm_vcpu_pmu_resync_el0(void) {} +static inline void kvm_host_pmu_init(struct arm_pmu *pmu) {} + +#endif + +#endif diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 646e806c6ca6..efe1ea0c5ac0 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -43,7 +43,6 @@ #include <asm/sections.h>
#include <kvm/arm_hypercalls.h> -#include <kvm/arm_pmu.h> #include <kvm/arm_psci.h>
#include "sys_regs.h" diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c index 0e4c805e7e89..7fb1d9e7180f 100644 --- a/arch/arm64/kvm/debug.c +++ b/arch/arm64/kvm/debug.c @@ -9,6 +9,7 @@
#include <linux/kvm_host.h> #include <linux/hw_breakpoint.h> +#include <linux/perf/arm_pmuv3.h>
#include <asm/debug-monitors.h> #include <asm/kvm_asm.h> diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h index f838a45665f2..53db98dbfd5f 100644 --- a/arch/arm64/kvm/hyp/include/hyp/switch.h +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h @@ -14,6 +14,7 @@ #include <linux/kvm_host.h> #include <linux/types.h> #include <linux/jump_label.h> +#include <linux/perf/arm_pmuv3.h> #include <uapi/linux/psci.h>
#include <kvm/arm_psci.h> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c index 6c5950b9ceac..5bf9f582ca8d 100644 --- a/arch/arm64/kvm/pmu-emul.c +++ b/arch/arm64/kvm/pmu-emul.c @@ -8,11 +8,10 @@ #include <linux/kvm.h> #include <linux/kvm_host.h> #include <linux/list.h> -#include <linux/perf_event.h> #include <linux/perf/arm_pmu.h> +#include <linux/perf/arm_pmuv3.h> #include <linux/uaccess.h> #include <asm/kvm_emulate.h> -#include <kvm/arm_pmu.h> #include <kvm/arm_vgic.h>
#define PERF_ATTR_CFG1_COUNTER_64BIT BIT(0) @@ -26,6 +25,8 @@ static void kvm_pmu_create_perf_event(struct kvm_pmc *pmc); static void kvm_pmu_release_perf_event(struct kvm_pmc *pmc); static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc);
+#define kvm_arm_pmu_irq_initialized(v) ((v)->arch.pmu.irq_num >= VGIC_NR_SGIS) + static struct kvm_vcpu *kvm_pmc_to_vcpu(const struct kvm_pmc *pmc) { return container_of(pmc, struct kvm_vcpu, arch.pmu.pmc[pmc->idx]); @@ -247,6 +248,16 @@ void kvm_pmu_vcpu_init(struct kvm_vcpu *vcpu) pmu->pmc[i].idx = i; }
+static u64 kvm_pmu_implemented_counter_mask(struct kvm_vcpu *vcpu) +{ + u64 val = FIELD_GET(ARMV8_PMU_PMCR_N, kvm_vcpu_read_pmcr(vcpu)); + + if (val == 0) + return BIT(ARMV8_PMU_CYCLE_IDX); + else + return GENMASK(val - 1, 0) | BIT(ARMV8_PMU_CYCLE_IDX); +} + /** * kvm_pmu_vcpu_reset - reset pmu state for cpu * @vcpu: The vcpu pointer @@ -318,16 +329,6 @@ u64 kvm_pmu_accessible_counter_mask(struct kvm_vcpu *vcpu) return mask & ~kvm_pmu_hyp_counter_mask(vcpu); }
-u64 kvm_pmu_implemented_counter_mask(struct kvm_vcpu *vcpu) -{ - u64 val = FIELD_GET(ARMV8_PMU_PMCR_N, kvm_vcpu_read_pmcr(vcpu)); - - if (val == 0) - return BIT(ARMV8_PMU_CYCLE_IDX); - else - return GENMASK(val - 1, 0) | BIT(ARMV8_PMU_CYCLE_IDX); -} - static void kvm_pmc_enable_perf_event(struct kvm_pmc *pmc) { if (!pmc->perf_event) { @@ -775,6 +776,11 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data, kvm_pmu_create_perf_event(pmc); }
+struct arm_pmu_entry { + struct list_head entry; + struct arm_pmu *arm_pmu; +}; + void kvm_host_pmu_init(struct arm_pmu *pmu) { struct arm_pmu_entry *entry; diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c index 0b3adf3e17b4..3affc9074d71 100644 --- a/arch/arm64/kvm/pmu.c +++ b/arch/arm64/kvm/pmu.c @@ -8,6 +8,8 @@ #include <linux/perf/arm_pmu.h> #include <linux/perf/arm_pmuv3.h>
+#include <asm/kvm_pmu.h> + static DEFINE_PER_CPU(struct kvm_pmu_events, kvm_pmu_events);
/* diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index f6cd1ea7fb55..edf6695eed3c 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -17,6 +17,8 @@ #include <linux/mm.h> #include <linux/printk.h> #include <linux/uaccess.h> +#include <linux/irqchip/arm-gic-v3.h> +#include <linux/perf/arm_pmuv3.h>
#include <asm/arm_pmuv3.h> #include <asm/cacheflush.h> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h deleted file mode 100644 index 147bd3ee4f7b..000000000000 --- a/include/kvm/arm_pmu.h +++ /dev/null @@ -1,204 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-only */ -/* - * Copyright (C) 2015 Linaro Ltd. - * Author: Shannon Zhao shannon.zhao@linaro.org - */ - -#ifndef __ASM_ARM_KVM_PMU_H -#define __ASM_ARM_KVM_PMU_H - -#include <linux/perf_event.h> -#include <linux/perf/arm_pmuv3.h> - -#define KVM_ARMV8_PMU_MAX_COUNTERS 32 - -#if IS_ENABLED(CONFIG_HW_PERF_EVENTS) && IS_ENABLED(CONFIG_KVM) -struct kvm_pmc { - u8 idx; /* index into the pmu->pmc array */ - struct perf_event *perf_event; -}; - -struct kvm_pmu_events { - u64 events_host; - u64 events_guest; -}; - -struct kvm_pmu { - struct irq_work overflow_work; - struct kvm_pmu_events events; - struct kvm_pmc pmc[KVM_ARMV8_PMU_MAX_COUNTERS]; - int irq_num; - bool created; - bool irq_level; -}; - -struct arm_pmu_entry { - struct list_head entry; - struct arm_pmu *arm_pmu; -}; - -DECLARE_STATIC_KEY_FALSE(kvm_arm_pmu_available); - -static __always_inline bool kvm_arm_support_pmu_v3(void) -{ - return static_branch_likely(&kvm_arm_pmu_available); -} - -#define kvm_arm_pmu_irq_initialized(v) ((v)->arch.pmu.irq_num >= VGIC_NR_SGIS) -u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx); -void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val); -u64 kvm_pmu_implemented_counter_mask(struct kvm_vcpu *vcpu); -u64 kvm_pmu_accessible_counter_mask(struct kvm_vcpu *vcpu); -u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1); -void kvm_pmu_vcpu_init(struct kvm_vcpu *vcpu); -void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu); -void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu); -void kvm_pmu_reprogram_counter_mask(struct kvm_vcpu *vcpu, u64 val); -void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu); -void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu); -bool 
kvm_pmu_should_notify_user(struct kvm_vcpu *vcpu); -void kvm_pmu_update_run(struct kvm_vcpu *vcpu); -void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val); -void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val); -void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data, - u64 select_idx); -void kvm_vcpu_reload_pmu(struct kvm_vcpu *vcpu); -int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, - struct kvm_device_attr *attr); -int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu, - struct kvm_device_attr *attr); -int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu, - struct kvm_device_attr *attr); -int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu); - -struct kvm_pmu_events *kvm_get_pmu_events(void); -void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu); -void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu); -void kvm_vcpu_pmu_resync_el0(void); - -#define kvm_vcpu_has_pmu(vcpu) \ - (vcpu_has_feature(vcpu, KVM_ARM_VCPU_PMU_V3)) - -/* - * Updates the vcpu's view of the pmu events for this cpu. - * Must be called before every vcpu run after disabling interrupts, to ensure - * that an interrupt cannot fire and update the structure. 
- */ -#define kvm_pmu_update_vcpu_events(vcpu) \ - do { \ - if (!has_vhe() && kvm_arm_support_pmu_v3()) \ - vcpu->arch.pmu.events = *kvm_get_pmu_events(); \ - } while (0) - -u8 kvm_arm_pmu_get_pmuver_limit(void); -u64 kvm_pmu_evtyper_mask(struct kvm *kvm); -int kvm_arm_set_default_pmu(struct kvm *kvm); -u8 kvm_arm_pmu_get_max_counters(struct kvm *kvm); - -u64 kvm_vcpu_read_pmcr(struct kvm_vcpu *vcpu); -bool kvm_pmu_counter_is_hyp(struct kvm_vcpu *vcpu, unsigned int idx); -void kvm_pmu_nested_transition(struct kvm_vcpu *vcpu); -#else -struct kvm_pmu { -}; - -static inline bool kvm_arm_support_pmu_v3(void) -{ - return false; -} - -#define kvm_arm_pmu_irq_initialized(v) (false) -static inline u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, - u64 select_idx) -{ - return 0; -} -static inline void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, - u64 select_idx, u64 val) {} -static inline u64 kvm_pmu_implemented_counter_mask(struct kvm_vcpu *vcpu) -{ - return 0; -} -static inline u64 kvm_pmu_accessible_counter_mask(struct kvm_vcpu *vcpu) -{ - return 0; -} -static inline void kvm_pmu_vcpu_init(struct kvm_vcpu *vcpu) {} -static inline void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu) {} -static inline void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu) {} -static inline void kvm_pmu_reprogram_counter_mask(struct kvm_vcpu *vcpu, u64 val) {} -static inline void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu) {} -static inline void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu) {} -static inline bool kvm_pmu_should_notify_user(struct kvm_vcpu *vcpu) -{ - return false; -} -static inline void kvm_pmu_update_run(struct kvm_vcpu *vcpu) {} -static inline void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val) {} -static inline void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val) {} -static inline void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, - u64 data, u64 select_idx) {} -static inline int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, - struct kvm_device_attr 
*attr) -{ - return -ENXIO; -} -static inline int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu, - struct kvm_device_attr *attr) -{ - return -ENXIO; -} -static inline int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu, - struct kvm_device_attr *attr) -{ - return -ENXIO; -} -static inline int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu) -{ - return 0; -} -static inline u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1) -{ - return 0; -} - -#define kvm_vcpu_has_pmu(vcpu) ({ false; }) -static inline void kvm_pmu_update_vcpu_events(struct kvm_vcpu *vcpu) {} -static inline void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu) {} -static inline void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu) {} -static inline void kvm_vcpu_reload_pmu(struct kvm_vcpu *vcpu) {} -static inline u8 kvm_arm_pmu_get_pmuver_limit(void) -{ - return 0; -} -static inline u64 kvm_pmu_evtyper_mask(struct kvm *kvm) -{ - return 0; -} -static inline void kvm_vcpu_pmu_resync_el0(void) {} - -static inline int kvm_arm_set_default_pmu(struct kvm *kvm) -{ - return -ENODEV; -} - -static inline u8 kvm_arm_pmu_get_max_counters(struct kvm *kvm) -{ - return 0; -} - -static inline u64 kvm_vcpu_read_pmcr(struct kvm_vcpu *vcpu) -{ - return 0; -} - -static inline bool kvm_pmu_counter_is_hyp(struct kvm_vcpu *vcpu, unsigned int idx) -{ - return false; -} - -static inline void kvm_pmu_nested_transition(struct kvm_vcpu *vcpu) {} - -#endif - -#endif diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h index 4b5b83677e3f..35c3a85bee43 100644 --- a/include/linux/perf/arm_pmu.h +++ b/include/linux/perf/arm_pmu.h @@ -13,6 +13,9 @@ #include <linux/platform_device.h> #include <linux/sysfs.h> #include <asm/cputype.h> +#ifdef CONFIG_ARM64 +#include <asm/kvm_pmu.h> +#endif
#ifdef CONFIG_ARM_PMU
@@ -25,6 +28,11 @@ #else #define ARMPMU_MAX_HWEVENTS 33 #endif + +#ifdef CONFIG_ARM +#define kvm_host_pmu_init(_x) { (void)_x; } +#endif + /* * ARM PMU hw_event flags */ @@ -165,12 +173,6 @@ int arm_pmu_acpi_probe(armpmu_init_fn init_fn); static inline int arm_pmu_acpi_probe(armpmu_init_fn init_fn) { return 0; } #endif
-#ifdef CONFIG_KVM -void kvm_host_pmu_init(struct arm_pmu *pmu); -#else -#define kvm_host_pmu_init(x) do { } while(0) -#endif - bool arm_pmu_irq_is_nmi(void);
/* Internal functions only for core arm_pmu code */ diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index faf10671eed2..34455126f5b7 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -49,6 +49,7 @@ #include <linux/lockdep.h> #include <linux/kthread.h> #include <linux/suspend.h> +#include <linux/perf_event.h>
#include <asm/processor.h> #include <asm/ioctl.h>
A lot of functions in pmu-emul.c aren't specific to the emulated PMU implementation. Move them to the more appropriate pmu.c file where shared PMU functions should live.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
 arch/arm64/include/asm/kvm_host.h |   1 +
 arch/arm64/kvm/pmu-emul.c         | 448 -----------------------------
 arch/arm64/kvm/pmu.c              | 450 ++++++++++++++++++++++++++++++
 3 files changed, 451 insertions(+), 448 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 80e5c09790b9..c419c1686418 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -1623,6 +1623,7 @@ void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu); } while (0)
u8 kvm_arm_pmu_get_pmuver_limit(void); +u32 kvm_pmu_event_mask(struct kvm *kvm); u64 kvm_pmu_evtyper_mask(struct kvm *kvm); int kvm_arm_set_default_pmu(struct kvm *kvm); u8 kvm_arm_pmu_get_max_counters(struct kvm *kvm); diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c index 5bf9f582ca8d..faf69244d9ef 100644 --- a/arch/arm64/kvm/pmu-emul.c +++ b/arch/arm64/kvm/pmu-emul.c @@ -16,17 +16,10 @@
#define PERF_ATTR_CFG1_COUNTER_64BIT BIT(0)
-DEFINE_STATIC_KEY_FALSE(kvm_arm_pmu_available); - -static LIST_HEAD(arm_pmus); -static DEFINE_MUTEX(arm_pmus_lock); - static void kvm_pmu_create_perf_event(struct kvm_pmc *pmc); static void kvm_pmu_release_perf_event(struct kvm_pmc *pmc); static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc);
-#define kvm_arm_pmu_irq_initialized(v) ((v)->arch.pmu.irq_num >= VGIC_NR_SGIS) - static struct kvm_vcpu *kvm_pmc_to_vcpu(const struct kvm_pmc *pmc) { return container_of(pmc, struct kvm_vcpu, arch.pmu.pmc[pmc->idx]); @@ -37,46 +30,6 @@ static struct kvm_pmc *kvm_vcpu_idx_to_pmc(struct kvm_vcpu *vcpu, int cnt_idx) return &vcpu->arch.pmu.pmc[cnt_idx]; }
-static u32 __kvm_pmu_event_mask(unsigned int pmuver) -{ - switch (pmuver) { - case ID_AA64DFR0_EL1_PMUVer_IMP: - return GENMASK(9, 0); - case ID_AA64DFR0_EL1_PMUVer_V3P1: - case ID_AA64DFR0_EL1_PMUVer_V3P4: - case ID_AA64DFR0_EL1_PMUVer_V3P5: - case ID_AA64DFR0_EL1_PMUVer_V3P7: - return GENMASK(15, 0); - default: /* Shouldn't be here, just for sanity */ - WARN_ONCE(1, "Unknown PMU version %d\n", pmuver); - return 0; - } -} - -static u32 kvm_pmu_event_mask(struct kvm *kvm) -{ - u64 dfr0 = kvm_read_vm_id_reg(kvm, SYS_ID_AA64DFR0_EL1); - u8 pmuver = SYS_FIELD_GET(ID_AA64DFR0_EL1, PMUVer, dfr0); - - return __kvm_pmu_event_mask(pmuver); -} - -u64 kvm_pmu_evtyper_mask(struct kvm *kvm) -{ - u64 mask = ARMV8_PMU_EXCLUDE_EL1 | ARMV8_PMU_EXCLUDE_EL0 | - kvm_pmu_event_mask(kvm); - - if (kvm_has_feat(kvm, ID_AA64PFR0_EL1, EL2, IMP)) - mask |= ARMV8_PMU_INCLUDE_EL2; - - if (kvm_has_feat(kvm, ID_AA64PFR0_EL1, EL3, IMP)) - mask |= ARMV8_PMU_EXCLUDE_NS_EL0 | - ARMV8_PMU_EXCLUDE_NS_EL1 | - ARMV8_PMU_EXCLUDE_EL3; - - return mask; -} - /** * kvm_pmc_is_64bit - determine if counter is 64bit * @pmc: counter context @@ -467,19 +420,6 @@ void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu) kvm_pmu_update_state(vcpu); }
-/* - * When perf interrupt is an NMI, we cannot safely notify the vcpu corresponding - * to the event. - * This is why we need a callback to do it once outside of the NMI context. - */ -static void kvm_pmu_perf_overflow_notify_vcpu(struct irq_work *work) -{ - struct kvm_vcpu *vcpu; - - vcpu = container_of(work, struct kvm_vcpu, arch.pmu.overflow_work); - kvm_vcpu_kick(vcpu); -} - /* * Perform an increment on any of the counters described in @mask, * generating the overflow if required, and propagate it as a chained @@ -776,78 +716,6 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data, kvm_pmu_create_perf_event(pmc); }
-struct arm_pmu_entry { - struct list_head entry; - struct arm_pmu *arm_pmu; -}; - -void kvm_host_pmu_init(struct arm_pmu *pmu) -{ - struct arm_pmu_entry *entry; - - /* - * Check the sanitised PMU version for the system, as KVM does not - * support implementations where PMUv3 exists on a subset of CPUs. - */ - if (!pmuv3_implemented(kvm_arm_pmu_get_pmuver_limit())) - return; - - mutex_lock(&arm_pmus_lock); - - entry = kmalloc(sizeof(*entry), GFP_KERNEL); - if (!entry) - goto out_unlock; - - entry->arm_pmu = pmu; - list_add_tail(&entry->entry, &arm_pmus); - - if (list_is_singular(&arm_pmus)) - static_branch_enable(&kvm_arm_pmu_available); - -out_unlock: - mutex_unlock(&arm_pmus_lock); -} - -static struct arm_pmu *kvm_pmu_probe_armpmu(void) -{ - struct arm_pmu *tmp, *pmu = NULL; - struct arm_pmu_entry *entry; - int cpu; - - mutex_lock(&arm_pmus_lock); - - /* - * It is safe to use a stale cpu to iterate the list of PMUs so long as - * the same value is used for the entirety of the loop. Given this, and - * the fact that no percpu data is used for the lookup there is no need - * to disable preemption. - * - * It is still necessary to get a valid cpu, though, to probe for the - * default PMU instance as userspace is not required to specify a PMU - * type. In order to uphold the preexisting behavior KVM selects the - * PMU instance for the core during vcpu init. A dependent use - * case would be a user with disdain of all things big.LITTLE that - * affines the VMM to a particular cluster of cores. - * - * In any case, userspace should just do the sane thing and use the UAPI - * to select a PMU type directly. But, be wary of the baggage being - * carried here. 
- */ - cpu = raw_smp_processor_id(); - list_for_each_entry(entry, &arm_pmus, entry) { - tmp = entry->arm_pmu; - - if (cpumask_test_cpu(cpu, &tmp->supported_cpus)) { - pmu = tmp; - break; - } - } - - mutex_unlock(&arm_pmus_lock); - - return pmu; -} - u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1) { unsigned long *bmap = vcpu->kvm->arch.pmu_filter; @@ -904,322 +772,6 @@ void kvm_vcpu_reload_pmu(struct kvm_vcpu *vcpu) kvm_pmu_reprogram_counter_mask(vcpu, mask); }
-int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu) -{ - if (!kvm_vcpu_has_pmu(vcpu)) - return 0; - - if (!vcpu->arch.pmu.created) - return -EINVAL; - - /* - * A valid interrupt configuration for the PMU is either to have a - * properly configured interrupt number and using an in-kernel - * irqchip, or to not have an in-kernel GIC and not set an IRQ. - */ - if (irqchip_in_kernel(vcpu->kvm)) { - int irq = vcpu->arch.pmu.irq_num; - /* - * If we are using an in-kernel vgic, at this point we know - * the vgic will be initialized, so we can check the PMU irq - * number against the dimensions of the vgic and make sure - * it's valid. - */ - if (!irq_is_ppi(irq) && !vgic_valid_spi(vcpu->kvm, irq)) - return -EINVAL; - } else if (kvm_arm_pmu_irq_initialized(vcpu)) { - return -EINVAL; - } - - /* One-off reload of the PMU on first run */ - kvm_make_request(KVM_REQ_RELOAD_PMU, vcpu); - - return 0; -} - -static int kvm_arm_pmu_v3_init(struct kvm_vcpu *vcpu) -{ - if (irqchip_in_kernel(vcpu->kvm)) { - int ret; - - /* - * If using the PMU with an in-kernel virtual GIC - * implementation, we require the GIC to be already - * initialized when initializing the PMU. - */ - if (!vgic_initialized(vcpu->kvm)) - return -ENODEV; - - if (!kvm_arm_pmu_irq_initialized(vcpu)) - return -ENXIO; - - ret = kvm_vgic_set_owner(vcpu, vcpu->arch.pmu.irq_num, - &vcpu->arch.pmu); - if (ret) - return ret; - } - - init_irq_work(&vcpu->arch.pmu.overflow_work, - kvm_pmu_perf_overflow_notify_vcpu); - - vcpu->arch.pmu.created = true; - return 0; -} - -/* - * For one VM the interrupt type must be same for each vcpu. - * As a PPI, the interrupt number is the same for all vcpus, - * while as an SPI it must be a separate number per vcpu. 
- */ -static bool pmu_irq_is_valid(struct kvm *kvm, int irq) -{ - unsigned long i; - struct kvm_vcpu *vcpu; - - kvm_for_each_vcpu(i, vcpu, kvm) { - if (!kvm_arm_pmu_irq_initialized(vcpu)) - continue; - - if (irq_is_ppi(irq)) { - if (vcpu->arch.pmu.irq_num != irq) - return false; - } else { - if (vcpu->arch.pmu.irq_num == irq) - return false; - } - } - - return true; -} - -/** - * kvm_arm_pmu_get_max_counters - Return the max number of PMU counters. - * @kvm: The kvm pointer - */ -u8 kvm_arm_pmu_get_max_counters(struct kvm *kvm) -{ - struct arm_pmu *arm_pmu = kvm->arch.arm_pmu; - - /* - * The arm_pmu->cntr_mask considers the fixed counter(s) as well. - * Ignore those and return only the general-purpose counters. - */ - return bitmap_weight(arm_pmu->cntr_mask, ARMV8_PMU_MAX_GENERAL_COUNTERS); -} - -static void kvm_arm_set_pmu(struct kvm *kvm, struct arm_pmu *arm_pmu) -{ - lockdep_assert_held(&kvm->arch.config_lock); - - kvm->arch.arm_pmu = arm_pmu; - kvm->arch.pmcr_n = kvm_arm_pmu_get_max_counters(kvm); -} - -/** - * kvm_arm_set_default_pmu - No PMU set, get the default one. - * @kvm: The kvm pointer - * - * The observant among you will notice that the supported_cpus - * mask does not get updated for the default PMU even though it - * is quite possible the selected instance supports only a - * subset of cores in the system. This is intentional, and - * upholds the preexisting behavior on heterogeneous systems - * where vCPUs can be scheduled on any core but the guest - * counters could stop working. 
- */ -int kvm_arm_set_default_pmu(struct kvm *kvm) -{ - struct arm_pmu *arm_pmu = kvm_pmu_probe_armpmu(); - - if (!arm_pmu) - return -ENODEV; - - kvm_arm_set_pmu(kvm, arm_pmu); - return 0; -} - -static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int pmu_id) -{ - struct kvm *kvm = vcpu->kvm; - struct arm_pmu_entry *entry; - struct arm_pmu *arm_pmu; - int ret = -ENXIO; - - lockdep_assert_held(&kvm->arch.config_lock); - mutex_lock(&arm_pmus_lock); - - list_for_each_entry(entry, &arm_pmus, entry) { - arm_pmu = entry->arm_pmu; - if (arm_pmu->pmu.type == pmu_id) { - if (kvm_vm_has_ran_once(kvm) || - (kvm->arch.pmu_filter && kvm->arch.arm_pmu != arm_pmu)) { - ret = -EBUSY; - break; - } - - kvm_arm_set_pmu(kvm, arm_pmu); - cpumask_copy(kvm->arch.supported_cpus, &arm_pmu->supported_cpus); - ret = 0; - break; - } - } - - mutex_unlock(&arm_pmus_lock); - return ret; -} - -int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr) -{ - struct kvm *kvm = vcpu->kvm; - - lockdep_assert_held(&kvm->arch.config_lock); - - if (!kvm_vcpu_has_pmu(vcpu)) - return -ENODEV; - - if (vcpu->arch.pmu.created) - return -EBUSY; - - switch (attr->attr) { - case KVM_ARM_VCPU_PMU_V3_IRQ: { - int __user *uaddr = (int __user *)(long)attr->addr; - int irq; - - if (!irqchip_in_kernel(kvm)) - return -EINVAL; - - if (get_user(irq, uaddr)) - return -EFAULT; - - /* The PMU overflow interrupt can be a PPI or a valid SPI. 
*/ - if (!(irq_is_ppi(irq) || irq_is_spi(irq))) - return -EINVAL; - - if (!pmu_irq_is_valid(kvm, irq)) - return -EINVAL; - - if (kvm_arm_pmu_irq_initialized(vcpu)) - return -EBUSY; - - kvm_debug("Set kvm ARM PMU irq: %d\n", irq); - vcpu->arch.pmu.irq_num = irq; - return 0; - } - case KVM_ARM_VCPU_PMU_V3_FILTER: { - u8 pmuver = kvm_arm_pmu_get_pmuver_limit(); - struct kvm_pmu_event_filter __user *uaddr; - struct kvm_pmu_event_filter filter; - int nr_events; - - /* - * Allow userspace to specify an event filter for the entire - * event range supported by PMUVer of the hardware, rather - * than the guest's PMUVer for KVM backward compatibility. - */ - nr_events = __kvm_pmu_event_mask(pmuver) + 1; - - uaddr = (struct kvm_pmu_event_filter __user *)(long)attr->addr; - - if (copy_from_user(&filter, uaddr, sizeof(filter))) - return -EFAULT; - - if (((u32)filter.base_event + filter.nevents) > nr_events || - (filter.action != KVM_PMU_EVENT_ALLOW && - filter.action != KVM_PMU_EVENT_DENY)) - return -EINVAL; - - if (kvm_vm_has_ran_once(kvm)) - return -EBUSY; - - if (!kvm->arch.pmu_filter) { - kvm->arch.pmu_filter = bitmap_alloc(nr_events, GFP_KERNEL_ACCOUNT); - if (!kvm->arch.pmu_filter) - return -ENOMEM; - - /* - * The default depends on the first applied filter. - * If it allows events, the default is to deny. - * Conversely, if the first filter denies a set of - * events, the default is to allow. 
- */ - if (filter.action == KVM_PMU_EVENT_ALLOW) - bitmap_zero(kvm->arch.pmu_filter, nr_events); - else - bitmap_fill(kvm->arch.pmu_filter, nr_events); - } - - if (filter.action == KVM_PMU_EVENT_ALLOW) - bitmap_set(kvm->arch.pmu_filter, filter.base_event, filter.nevents); - else - bitmap_clear(kvm->arch.pmu_filter, filter.base_event, filter.nevents); - - return 0; - } - case KVM_ARM_VCPU_PMU_V3_SET_PMU: { - int __user *uaddr = (int __user *)(long)attr->addr; - int pmu_id; - - if (get_user(pmu_id, uaddr)) - return -EFAULT; - - return kvm_arm_pmu_v3_set_pmu(vcpu, pmu_id); - } - case KVM_ARM_VCPU_PMU_V3_INIT: - return kvm_arm_pmu_v3_init(vcpu); - } - - return -ENXIO; -} - -int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr) -{ - switch (attr->attr) { - case KVM_ARM_VCPU_PMU_V3_IRQ: { - int __user *uaddr = (int __user *)(long)attr->addr; - int irq; - - if (!irqchip_in_kernel(vcpu->kvm)) - return -EINVAL; - - if (!kvm_vcpu_has_pmu(vcpu)) - return -ENODEV; - - if (!kvm_arm_pmu_irq_initialized(vcpu)) - return -ENXIO; - - irq = vcpu->arch.pmu.irq_num; - return put_user(irq, uaddr); - } - } - - return -ENXIO; -} - -int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr) -{ - switch (attr->attr) { - case KVM_ARM_VCPU_PMU_V3_IRQ: - case KVM_ARM_VCPU_PMU_V3_INIT: - case KVM_ARM_VCPU_PMU_V3_FILTER: - case KVM_ARM_VCPU_PMU_V3_SET_PMU: - if (kvm_vcpu_has_pmu(vcpu)) - return 0; - } - - return -ENXIO; -} - -u8 kvm_arm_pmu_get_pmuver_limit(void) -{ - u64 tmp; - - tmp = read_sanitised_ftr_reg(SYS_ID_AA64DFR0_EL1); - tmp = cpuid_feature_cap_perfmon_field(tmp, - ID_AA64DFR0_EL1_PMUVer_SHIFT, - ID_AA64DFR0_EL1_PMUVer_V3P5); - return FIELD_GET(ARM64_FEATURE_MASK(ID_AA64DFR0_EL1_PMUVer), tmp); -} - /** * kvm_vcpu_read_pmcr - Read PMCR_EL0 register for the vCPU * @vcpu: The vcpu pointer diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c index 3affc9074d71..85b5cb432c4f 100644 --- a/arch/arm64/kvm/pmu.c +++ b/arch/arm64/kvm/pmu.c 
@@ -10,6 +10,17 @@
#include <asm/kvm_pmu.h>
+#define kvm_arm_pmu_irq_initialized(v) ((v)->arch.pmu.irq_num >= VGIC_NR_SGIS) + +struct arm_pmu_entry { + struct list_head entry; + struct arm_pmu *arm_pmu; +}; + +DEFINE_STATIC_KEY_FALSE(kvm_arm_pmu_available); + +static LIST_HEAD(arm_pmus); +static DEFINE_MUTEX(arm_pmus_lock); static DEFINE_PER_CPU(struct kvm_pmu_events, kvm_pmu_events);
/* @@ -211,3 +222,442 @@ void kvm_vcpu_pmu_resync_el0(void)
kvm_make_request(KVM_REQ_RESYNC_PMU_EL0, vcpu); } + +void kvm_host_pmu_init(struct arm_pmu *pmu) +{ + struct arm_pmu_entry *entry; + + /* + * Check the sanitised PMU version for the system, as KVM does not + * support implementations where PMUv3 exists on a subset of CPUs. + */ + if (!pmuv3_implemented(kvm_arm_pmu_get_pmuver_limit())) + return; + + mutex_lock(&arm_pmus_lock); + + entry = kmalloc(sizeof(*entry), GFP_KERNEL); + if (!entry) + goto out_unlock; + + entry->arm_pmu = pmu; + list_add_tail(&entry->entry, &arm_pmus); + + if (list_is_singular(&arm_pmus)) + static_branch_enable(&kvm_arm_pmu_available); + +out_unlock: + mutex_unlock(&arm_pmus_lock); +} + +static struct arm_pmu *kvm_pmu_probe_armpmu(void) +{ + struct arm_pmu *tmp, *pmu = NULL; + struct arm_pmu_entry *entry; + int cpu; + + mutex_lock(&arm_pmus_lock); + + /* + * It is safe to use a stale cpu to iterate the list of PMUs so long as + * the same value is used for the entirety of the loop. Given this, and + * the fact that no percpu data is used for the lookup there is no need + * to disable preemption. + * + * It is still necessary to get a valid cpu, though, to probe for the + * default PMU instance as userspace is not required to specify a PMU + * type. In order to uphold the preexisting behavior KVM selects the + * PMU instance for the core during vcpu init. A dependent use + * case would be a user with disdain of all things big.LITTLE that + * affines the VMM to a particular cluster of cores. + * + * In any case, userspace should just do the sane thing and use the UAPI + * to select a PMU type directly. But, be wary of the baggage being + * carried here. + */ + cpu = raw_smp_processor_id(); + list_for_each_entry(entry, &arm_pmus, entry) { + tmp = entry->arm_pmu; + + if (cpumask_test_cpu(cpu, &tmp->supported_cpus)) { + pmu = tmp; + break; + } + } + + mutex_unlock(&arm_pmus_lock); + + return pmu; +} + + +/** + * kvm_arm_pmu_get_max_counters - Return the max number of PMU counters. 
+ * @kvm: The kvm pointer + */ +u8 kvm_arm_pmu_get_max_counters(struct kvm *kvm) +{ + struct arm_pmu *arm_pmu = kvm->arch.arm_pmu; + + /* + * The arm_pmu->cntr_mask considers the fixed counter(s) as well. + * Ignore those and return only the general-purpose counters. + */ + return bitmap_weight(arm_pmu->cntr_mask, ARMV8_PMU_MAX_GENERAL_COUNTERS); +} + +static void kvm_arm_set_pmu(struct kvm *kvm, struct arm_pmu *arm_pmu) +{ + lockdep_assert_held(&kvm->arch.config_lock); + + kvm->arch.arm_pmu = arm_pmu; + kvm->arch.pmcr_n = kvm_arm_pmu_get_max_counters(kvm); +} + +/** + * kvm_arm_set_default_pmu - No PMU set, get the default one. + * @kvm: The kvm pointer + * + * The observant among you will notice that the supported_cpus + * mask does not get updated for the default PMU even though it + * is quite possible the selected instance supports only a + * subset of cores in the system. This is intentional, and + * upholds the preexisting behavior on heterogeneous systems + * where vCPUs can be scheduled on any core but the guest + * counters could stop working. 
+ */ +int kvm_arm_set_default_pmu(struct kvm *kvm) +{ + struct arm_pmu *arm_pmu = kvm_pmu_probe_armpmu(); + + if (!arm_pmu) + return -ENODEV; + + kvm_arm_set_pmu(kvm, arm_pmu); + return 0; +} + +static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int pmu_id) +{ + struct kvm *kvm = vcpu->kvm; + struct arm_pmu_entry *entry; + struct arm_pmu *arm_pmu; + int ret = -ENXIO; + + lockdep_assert_held(&kvm->arch.config_lock); + mutex_lock(&arm_pmus_lock); + + list_for_each_entry(entry, &arm_pmus, entry) { + arm_pmu = entry->arm_pmu; + if (arm_pmu->pmu.type == pmu_id) { + if (kvm_vm_has_ran_once(kvm) || + (kvm->arch.pmu_filter && kvm->arch.arm_pmu != arm_pmu)) { + ret = -EBUSY; + break; + } + + kvm_arm_set_pmu(kvm, arm_pmu); + cpumask_copy(kvm->arch.supported_cpus, &arm_pmu->supported_cpus); + ret = 0; + break; + } + } + + mutex_unlock(&arm_pmus_lock); + return ret; +} + + +/* + * For one VM the interrupt type must be same for each vcpu. + * As a PPI, the interrupt number is the same for all vcpus, + * while as an SPI it must be a separate number per vcpu. + */ +static bool pmu_irq_is_valid(struct kvm *kvm, int irq) +{ + unsigned long i; + struct kvm_vcpu *vcpu; + + kvm_for_each_vcpu(i, vcpu, kvm) { + if (!kvm_arm_pmu_irq_initialized(vcpu)) + continue; + + if (irq_is_ppi(irq)) { + if (vcpu->arch.pmu.irq_num != irq) + return false; + } else { + if (vcpu->arch.pmu.irq_num == irq) + return false; + } + } + + return true; +} + +/* + * When perf interrupt is an NMI, we cannot safely notify the vcpu corresponding + * to the event. + * This is why we need a callback to do it once outside of the NMI context. 
+ */ +static void kvm_pmu_perf_overflow_notify_vcpu(struct irq_work *work) +{ + struct kvm_vcpu *vcpu; + + vcpu = container_of(work, struct kvm_vcpu, arch.pmu.overflow_work); + kvm_vcpu_kick(vcpu); +} + +static int kvm_arm_pmu_v3_init(struct kvm_vcpu *vcpu) +{ + if (irqchip_in_kernel(vcpu->kvm)) { + int ret; + + /* + * If using the PMU with an in-kernel virtual GIC + * implementation, we require the GIC to be already + * initialized when initializing the PMU. + */ + if (!vgic_initialized(vcpu->kvm)) + return -ENODEV; + + if (!kvm_arm_pmu_irq_initialized(vcpu)) + return -ENXIO; + + ret = kvm_vgic_set_owner(vcpu, vcpu->arch.pmu.irq_num, + &vcpu->arch.pmu); + if (ret) + return ret; + } + + init_irq_work(&vcpu->arch.pmu.overflow_work, + kvm_pmu_perf_overflow_notify_vcpu); + + vcpu->arch.pmu.created = true; + return 0; +} + +int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu) +{ + if (!kvm_vcpu_has_pmu(vcpu)) + return 0; + + if (!vcpu->arch.pmu.created) + return -EINVAL; + + /* + * A valid interrupt configuration for the PMU is either to have a + * properly configured interrupt number and using an in-kernel + * irqchip, or to not have an in-kernel GIC and not set an IRQ. + */ + if (irqchip_in_kernel(vcpu->kvm)) { + int irq = vcpu->arch.pmu.irq_num; + /* + * If we are using an in-kernel vgic, at this point we know + * the vgic will be initialized, so we can check the PMU irq + * number against the dimensions of the vgic and make sure + * it's valid. 
+ */ + if (!irq_is_ppi(irq) && !vgic_valid_spi(vcpu->kvm, irq)) + return -EINVAL; + } else if (kvm_arm_pmu_irq_initialized(vcpu)) { + return -EINVAL; + } + + /* One-off reload of the PMU on first run */ + kvm_make_request(KVM_REQ_RELOAD_PMU, vcpu); + + return 0; +} + +static u32 __kvm_pmu_event_mask(unsigned int pmuver) +{ + switch (pmuver) { + case ID_AA64DFR0_EL1_PMUVer_IMP: + return GENMASK(9, 0); + case ID_AA64DFR0_EL1_PMUVer_V3P1: + case ID_AA64DFR0_EL1_PMUVer_V3P4: + case ID_AA64DFR0_EL1_PMUVer_V3P5: + case ID_AA64DFR0_EL1_PMUVer_V3P7: + return GENMASK(15, 0); + default: /* Shouldn't be here, just for sanity */ + WARN_ONCE(1, "Unknown PMU version %d\n", pmuver); + return 0; + } +} + +u32 kvm_pmu_event_mask(struct kvm *kvm) +{ + u64 dfr0 = kvm_read_vm_id_reg(kvm, SYS_ID_AA64DFR0_EL1); + u8 pmuver = SYS_FIELD_GET(ID_AA64DFR0_EL1, PMUVer, dfr0); + + return __kvm_pmu_event_mask(pmuver); +} + +u64 kvm_pmu_evtyper_mask(struct kvm *kvm) +{ + u64 mask = ARMV8_PMU_EXCLUDE_EL1 | ARMV8_PMU_EXCLUDE_EL0 | + kvm_pmu_event_mask(kvm); + + if (kvm_has_feat(kvm, ID_AA64PFR0_EL1, EL2, IMP)) + mask |= ARMV8_PMU_INCLUDE_EL2; + + if (kvm_has_feat(kvm, ID_AA64PFR0_EL1, EL3, IMP)) + mask |= ARMV8_PMU_EXCLUDE_NS_EL0 | + ARMV8_PMU_EXCLUDE_NS_EL1 | + ARMV8_PMU_EXCLUDE_EL3; + + return mask; +} + +int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr) +{ + struct kvm *kvm = vcpu->kvm; + + lockdep_assert_held(&kvm->arch.config_lock); + + if (!kvm_vcpu_has_pmu(vcpu)) + return -ENODEV; + + if (vcpu->arch.pmu.created) + return -EBUSY; + + switch (attr->attr) { + case KVM_ARM_VCPU_PMU_V3_IRQ: { + int __user *uaddr = (int __user *)(long)attr->addr; + int irq; + + if (!irqchip_in_kernel(kvm)) + return -EINVAL; + + if (get_user(irq, uaddr)) + return -EFAULT; + + /* The PMU overflow interrupt can be a PPI or a valid SPI. 
*/ + if (!(irq_is_ppi(irq) || irq_is_spi(irq))) + return -EINVAL; + + if (!pmu_irq_is_valid(kvm, irq)) + return -EINVAL; + + if (kvm_arm_pmu_irq_initialized(vcpu)) + return -EBUSY; + + kvm_debug("Set kvm ARM PMU irq: %d\n", irq); + vcpu->arch.pmu.irq_num = irq; + return 0; + } + case KVM_ARM_VCPU_PMU_V3_FILTER: { + u8 pmuver = kvm_arm_pmu_get_pmuver_limit(); + struct kvm_pmu_event_filter __user *uaddr; + struct kvm_pmu_event_filter filter; + int nr_events; + + /* + * Allow userspace to specify an event filter for the entire + * event range supported by PMUVer of the hardware, rather + * than the guest's PMUVer for KVM backward compatibility. + */ + nr_events = __kvm_pmu_event_mask(pmuver) + 1; + + uaddr = (struct kvm_pmu_event_filter __user *)(long)attr->addr; + + if (copy_from_user(&filter, uaddr, sizeof(filter))) + return -EFAULT; + + if (((u32)filter.base_event + filter.nevents) > nr_events || + (filter.action != KVM_PMU_EVENT_ALLOW && + filter.action != KVM_PMU_EVENT_DENY)) + return -EINVAL; + + if (kvm_vm_has_ran_once(kvm)) + return -EBUSY; + + if (!kvm->arch.pmu_filter) { + kvm->arch.pmu_filter = bitmap_alloc(nr_events, GFP_KERNEL_ACCOUNT); + if (!kvm->arch.pmu_filter) + return -ENOMEM; + + /* + * The default depends on the first applied filter. + * If it allows events, the default is to deny. + * Conversely, if the first filter denies a set of + * events, the default is to allow. 
+ */ + if (filter.action == KVM_PMU_EVENT_ALLOW) + bitmap_zero(kvm->arch.pmu_filter, nr_events); + else + bitmap_fill(kvm->arch.pmu_filter, nr_events); + } + + if (filter.action == KVM_PMU_EVENT_ALLOW) + bitmap_set(kvm->arch.pmu_filter, filter.base_event, filter.nevents); + else + bitmap_clear(kvm->arch.pmu_filter, filter.base_event, filter.nevents); + + return 0; + } + case KVM_ARM_VCPU_PMU_V3_SET_PMU: { + int __user *uaddr = (int __user *)(long)attr->addr; + int pmu_id; + + if (get_user(pmu_id, uaddr)) + return -EFAULT; + + return kvm_arm_pmu_v3_set_pmu(vcpu, pmu_id); + } + case KVM_ARM_VCPU_PMU_V3_INIT: + return kvm_arm_pmu_v3_init(vcpu); + } + + return -ENXIO; +} + +int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr) +{ + switch (attr->attr) { + case KVM_ARM_VCPU_PMU_V3_IRQ: { + int __user *uaddr = (int __user *)(long)attr->addr; + int irq; + + if (!irqchip_in_kernel(vcpu->kvm)) + return -EINVAL; + + if (!kvm_vcpu_has_pmu(vcpu)) + return -ENODEV; + + if (!kvm_arm_pmu_irq_initialized(vcpu)) + return -ENXIO; + + irq = vcpu->arch.pmu.irq_num; + return put_user(irq, uaddr); + } + } + + return -ENXIO; +} + + +int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr) +{ + switch (attr->attr) { + case KVM_ARM_VCPU_PMU_V3_IRQ: + case KVM_ARM_VCPU_PMU_V3_INIT: + case KVM_ARM_VCPU_PMU_V3_FILTER: + case KVM_ARM_VCPU_PMU_V3_SET_PMU: + if (kvm_vcpu_has_pmu(vcpu)) + return 0; + } + + return -ENXIO; +} + +u8 kvm_arm_pmu_get_pmuver_limit(void) +{ + u64 tmp; + + tmp = read_sanitised_ftr_reg(SYS_ID_AA64DFR0_EL1); + tmp = cpuid_feature_cap_perfmon_field(tmp, + ID_AA64DFR0_EL1_PMUVer_SHIFT, + ID_AA64DFR0_EL1_PMUVer_V3P5); + return FIELD_GET(ARM64_FEATURE_MASK(ID_AA64DFR0_EL1_PMUVer), tmp); +}
For PMUv3, the register MDCR_EL2.HPMN partitions the PMU counters into two ranges where counters 0..HPMN-1 are accessible by EL1 and, if allowed, EL0 while counters HPMN..N are only accessible by EL2.
Introduce a module parameter in KVM to set this register. The name reserved_host_counters reflects the intent to reserve some counters for the host so the guest may eventually be allowed direct access to a subset of PMU functionality for increased performance.
Track HPMN and whether the pmu is partitioned in struct arm_pmu because both KVM and the PMUv3 driver will need to know that to handle guests correctly.
Due to the difficulty this feature would create for the driver running at EL1 on the host, partitioning is only allowed in VHE mode. Working on nVHE mode would require a hypercall for every register access because the counters reserved for the host by HPMN are now only accessible to EL2.
The parameter is only configurable at boot time. Making it configurable on a running system would be dangerous, because it is difficult to know for certain that no counters are in use anywhere and that it is therefore safe to reprogram HPMN.
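To make the split concrete, here is a minimal userspace sketch of the two counter ranges as bitmasks. It is not code from this series: the helper names low_bits(), guest_counter_mask() and host_counter_mask() are hypothetical, and the sketch takes the host range to be counters HPMN up to nr_counters-1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask with bits [0, n) set, for n in 0..32. */
uint32_t low_bits(unsigned int n)
{
	assert(n <= 32);
	return n == 32 ? UINT32_MAX : ((UINT32_C(1) << n) - 1);
}

/* Counters 0..hpmn-1: the range EL1 (and EL0, if allowed) may access. */
uint32_t guest_counter_mask(unsigned int hpmn)
{
	return low_bits(hpmn);
}

/* Counters hpmn..nr_counters-1: the range only EL2 may access. */
uint32_t host_counter_mask(unsigned int nr_counters, unsigned int hpmn)
{
	assert(hpmn <= nr_counters);
	return low_bits(nr_counters) & ~low_bits(hpmn);
}
```

With six event counters and HPMN = 4, guest_counter_mask(4) is 0x0f and host_counter_mask(6, 4) is 0x30.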
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
 arch/arm64/include/asm/kvm_pmu.h |  4 +++
 arch/arm64/kvm/Makefile          |  2 +-
 arch/arm64/kvm/debug.c           |  9 ++++--
 arch/arm64/kvm/pmu-part.c        | 47 ++++++++++++++++++++++++++++++++
 arch/arm64/kvm/pmu.c             |  2 ++
 include/linux/perf/arm_pmu.h     |  2 ++
 6 files changed, 62 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm64/kvm/pmu-part.c
diff --git a/arch/arm64/include/asm/kvm_pmu.h b/arch/arm64/include/asm/kvm_pmu.h
index 613cddbdbdd8..174b7f376d95 100644
--- a/arch/arm64/include/asm/kvm_pmu.h
+++ b/arch/arm64/include/asm/kvm_pmu.h
@@ -22,6 +22,10 @@ bool kvm_set_pmuserenr(u64 val);
 void kvm_vcpu_pmu_resync_el0(void);
 void kvm_host_pmu_init(struct arm_pmu *pmu);

+u8 kvm_pmu_get_reserved_counters(void);
+u8 kvm_pmu_hpmn(u8 nr_counters);
+void kvm_pmu_partition(struct arm_pmu *pmu);
+
 #else

 static inline void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr) {}
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 3cf7adb2b503..065a6b804c84 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -25,7 +25,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
 	 vgic/vgic-mmio-v3.o vgic/vgic-kvm-device.o \
 	 vgic/vgic-its.o vgic/vgic-debug.o

-kvm-$(CONFIG_HW_PERF_EVENTS) += pmu-emul.o pmu.o
+kvm-$(CONFIG_HW_PERF_EVENTS) += pmu-emul.o pmu-part.o pmu.o
 kvm-$(CONFIG_ARM64_PTR_AUTH) += pauth.o
 kvm-$(CONFIG_PTDUMP_STAGE2_DEBUGFS) += ptdump.o
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index 7fb1d9e7180f..b5ac5a213877 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -31,15 +31,18 @@
  */
 static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu)
 {
+	u8 counters = *host_data_ptr(nr_event_counters);
+	u8 hpmn = kvm_pmu_hpmn(counters);
+
 	preempt_disable();

 	/*
 	 * This also clears MDCR_EL2_E2PB_MASK and MDCR_EL2_E2TB_MASK
 	 * to disable guest access to the profiling and trace buffers
 	 */
-	vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN,
-					 *host_data_ptr(nr_event_counters));
-	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
+	vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN, hpmn);
+	vcpu->arch.mdcr_el2 |= (MDCR_EL2_HPMD |
+				MDCR_EL2_TPM |
 				MDCR_EL2_TPMS |
 				MDCR_EL2_TTRF |
 				MDCR_EL2_TPMCR |
diff --git a/arch/arm64/kvm/pmu-part.c b/arch/arm64/kvm/pmu-part.c
new file mode 100644
index 000000000000..e74fecc67e37
--- /dev/null
+++ b/arch/arm64/kvm/pmu-part.c
@@ -0,0 +1,47 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025 Google LLC
+ * Author: Colton Lewis <coltonlewis@google.com>
+ */
+
+#include <linux/kvm_host.h>
+#include <linux/perf/arm_pmu.h>
+
+#include <asm/kvm_pmu.h>
+
+static u8 reserved_host_counters __read_mostly;
+
+module_param(reserved_host_counters, byte, 0);
+MODULE_PARM_DESC(reserved_host_counters,
+		 "Partition the PMU into host and guest counters");
+
+u8 kvm_pmu_get_reserved_counters(void)
+{
+	return reserved_host_counters;
+}
+
+u8 kvm_pmu_hpmn(u8 nr_counters)
+{
+	if (reserved_host_counters >= nr_counters) {
+		if (this_cpu_has_cap(ARM64_HAS_HPMN0))
+			return 0;
+
+		return 1;
+	}
+
+	return nr_counters - reserved_host_counters;
+}
+
+void kvm_pmu_partition(struct arm_pmu *pmu)
+{
+	u8 nr_counters = *host_data_ptr(nr_event_counters);
+	u8 hpmn = kvm_pmu_hpmn(nr_counters);
+
+	if (hpmn < nr_counters) {
+		pmu->hpmn = hpmn;
+		pmu->partitioned = true;
+	} else {
+		pmu->hpmn = nr_counters;
+		pmu->partitioned = false;
+	}
+}
diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
index 85b5cb432c4f..7169c1a24dd6 100644
--- a/arch/arm64/kvm/pmu.c
+++ b/arch/arm64/kvm/pmu.c
@@ -243,6 +243,8 @@ void kvm_host_pmu_init(struct arm_pmu *pmu)
 	entry->arm_pmu = pmu;
 	list_add_tail(&entry->entry, &arm_pmus);

+	kvm_pmu_partition(pmu);
+
 	if (list_is_singular(&arm_pmus))
 		static_branch_enable(&kvm_arm_pmu_available);

diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 35c3a85bee43..ee4fc2e26bff 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -125,6 +125,8 @@ struct arm_pmu {

 	/* Only to be used by ACPI probing code */
 	unsigned long acpi_cpuid;
+	u8 hpmn; /* MDCR_EL2.HPMN: counter partition pivot */
+	bool partitioned;
 };

 #define to_arm_pmu(p) (container_of(p, struct arm_pmu, pmu))
Colton Lewis <coltonlewis@google.com> writes:
For PMUv3, the register MDCR_EL2.HPMN partitions the PMU counters into two ranges where counters 0..HPMN-1 are accessible by EL1 and, if allowed, EL0 while counters HPMN..N are only accessible by EL2.
Introduce a module parameter in KVM to set this register. The name reserved_host_counters reflects the intent to reserve some counters for the host so the guest may eventually be allowed direct access to a subset of PMU functionality for increased performance.
Track HPMN and whether the pmu is partitioned in struct arm_pmu because both KVM and the PMUv3 driver will need to know that to handle guests correctly.
Due to the difficulty this feature would create for the driver running at EL1 on the host, partitioning is only allowed in VHE mode. Working on nVHE mode would require a hypercall for every register access because the counters reserved for the host by HPMN are now only accessible to EL2.
The parameter is only configurable at boot time. Making it configurable on a running system would be dangerous, because it is difficult to know for certain that no counters are in use anywhere and that it is therefore safe to reprogram HPMN.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
 arch/arm64/include/asm/kvm_pmu.h |  4 +++
 arch/arm64/kvm/Makefile          |  2 +-
 arch/arm64/kvm/debug.c           |  9 ++++--
 arch/arm64/kvm/pmu-part.c        | 47 ++++++++++++++++++++++++++++++++
 arch/arm64/kvm/pmu.c             |  2 ++
 include/linux/perf/arm_pmu.h     |  2 ++
 6 files changed, 62 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm64/kvm/pmu-part.c
diff --git a/arch/arm64/include/asm/kvm_pmu.h b/arch/arm64/include/asm/kvm_pmu.h
index 613cddbdbdd8..174b7f376d95 100644
--- a/arch/arm64/include/asm/kvm_pmu.h
+++ b/arch/arm64/include/asm/kvm_pmu.h
@@ -22,6 +22,10 @@ bool kvm_set_pmuserenr(u64 val);
 void kvm_vcpu_pmu_resync_el0(void);
 void kvm_host_pmu_init(struct arm_pmu *pmu);

+u8 kvm_pmu_get_reserved_counters(void);
+u8 kvm_pmu_hpmn(u8 nr_counters);
+void kvm_pmu_partition(struct arm_pmu *pmu);
+
 #else

 static inline void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr) {}
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 3cf7adb2b503..065a6b804c84 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -25,7 +25,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
 	 vgic/vgic-mmio-v3.o vgic/vgic-kvm-device.o \
 	 vgic/vgic-its.o vgic/vgic-debug.o

-kvm-$(CONFIG_HW_PERF_EVENTS) += pmu-emul.o pmu.o
+kvm-$(CONFIG_HW_PERF_EVENTS) += pmu-emul.o pmu-part.o pmu.o
 kvm-$(CONFIG_ARM64_PTR_AUTH) += pauth.o
 kvm-$(CONFIG_PTDUMP_STAGE2_DEBUGFS) += ptdump.o
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index 7fb1d9e7180f..b5ac5a213877 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -31,15 +31,18 @@
  */
 static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu)
 {
+	u8 counters = *host_data_ptr(nr_event_counters);
+	u8 hpmn = kvm_pmu_hpmn(counters);
+
 	preempt_disable();

 	/*
 	 * This also clears MDCR_EL2_E2PB_MASK and MDCR_EL2_E2TB_MASK
 	 * to disable guest access to the profiling and trace buffers
 	 */
-	vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN,
-					 *host_data_ptr(nr_event_counters));
-	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
+	vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN, hpmn);
+	vcpu->arch.mdcr_el2 |= (MDCR_EL2_HPMD |
+				MDCR_EL2_TPM |
 				MDCR_EL2_TPMS |
 				MDCR_EL2_TTRF |
 				MDCR_EL2_TPMCR |
diff --git a/arch/arm64/kvm/pmu-part.c b/arch/arm64/kvm/pmu-part.c
new file mode 100644
index 000000000000..e74fecc67e37
--- /dev/null
+++ b/arch/arm64/kvm/pmu-part.c
@@ -0,0 +1,47 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025 Google LLC
+ * Author: Colton Lewis <coltonlewis@google.com>
+ */
+
+#include <linux/kvm_host.h>
+#include <linux/perf/arm_pmu.h>
+
+#include <asm/kvm_pmu.h>
+
+static u8 reserved_host_counters __read_mostly;
+
+module_param(reserved_host_counters, byte, 0);
+MODULE_PARM_DESC(reserved_host_counters,
+		 "Partition the PMU into host and guest counters");
+
+u8 kvm_pmu_get_reserved_counters(void)
+{
+	return reserved_host_counters;
+}
+
+u8 kvm_pmu_hpmn(u8 nr_counters)
+{
+	if (reserved_host_counters >= nr_counters) {
+		if (this_cpu_has_cap(ARM64_HAS_HPMN0))
+			return 0;
+
+		return 1;
+	}
+
+	return nr_counters - reserved_host_counters;
+}
+
+void kvm_pmu_partition(struct arm_pmu *pmu)
+{
+	u8 nr_counters = *host_data_ptr(nr_event_counters);
+	u8 hpmn = kvm_pmu_hpmn(nr_counters);
+
+	if (hpmn < nr_counters) {
+		pmu->hpmn = hpmn;
+		pmu->partitioned = true;
+	} else {
+		pmu->hpmn = nr_counters;
+		pmu->partitioned = false;
+	}
+}
There should be a VHE check in here. I thought I wouldn't need it with moving MDCR_EL2 writes out of the driver but I just remembered there are two spots in patch 7 I still need to write that register.
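As a rough illustration of where such a check could sit, here is a userspace model of kvm_pmu_hpmn() and kvm_pmu_partition() from the quoted patch. Everything in it is an assumption for illustration: struct pmu_model, model_hpmn() and model_partition() are hypothetical names, the module parameter and the HPMN0 capability are passed in as plain arguments, and the vhe flag stands in for whatever VHE test the kernel code would actually use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields the patch adds to struct arm_pmu. */
struct pmu_model {
	uint8_t hpmn;
	bool partitioned;
};

/* Same clamping as kvm_pmu_hpmn() in the patch, with inputs made explicit. */
uint8_t model_hpmn(uint8_t nr_counters, uint8_t reserved, bool has_hpmn0)
{
	if (reserved >= nr_counters)
		return has_hpmn0 ? 0 : 1;

	return nr_counters - reserved;
}

/* Same shape as kvm_pmu_partition(), plus an assumed VHE gate. */
void model_partition(struct pmu_model *pmu, uint8_t nr_counters,
		     uint8_t reserved, bool has_hpmn0, bool vhe)
{
	uint8_t hpmn = model_hpmn(nr_counters, reserved, has_hpmn0);

	assert(pmu);

	/* Without VHE, fall through to the unpartitioned case. */
	if (vhe && hpmn < nr_counters) {
		pmu->hpmn = hpmn;
		pmu->partitioned = true;
	} else {
		pmu->hpmn = nr_counters;
		pmu->partitioned = false;
	}
}
```

In this model a non-VHE host always ends up with hpmn equal to nr_counters and partitioned false, regardless of the reserved count.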
diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
index 85b5cb432c4f..7169c1a24dd6 100644
--- a/arch/arm64/kvm/pmu.c
+++ b/arch/arm64/kvm/pmu.c
@@ -243,6 +243,8 @@ void kvm_host_pmu_init(struct arm_pmu *pmu)
 	entry->arm_pmu = pmu;
 	list_add_tail(&entry->entry, &arm_pmus);

+	kvm_pmu_partition(pmu);
+
 	if (list_is_singular(&arm_pmus))
 		static_branch_enable(&kvm_arm_pmu_available);

diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 35c3a85bee43..ee4fc2e26bff 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -125,6 +125,8 @@ struct arm_pmu {

 	/* Only to be used by ACPI probing code */
 	unsigned long acpi_cpuid;
+	u8 hpmn; /* MDCR_EL2.HPMN: counter partition pivot */
+	bool partitioned;
 };

 #define to_arm_pmu(p) (container_of(p, struct arm_pmu, pmu))
2.48.1.601.g30ceb7b040-goog
On 13/02/2025 6:03 pm, Colton Lewis wrote:
For PMUv3, the register MDCR_EL2.HPMN partitions the PMU counters into two ranges where counters 0..HPMN-1 are accessible by EL1 and, if allowed, EL0 while counters HPMN..N are only accessible by EL2.
Introduce a module parameter in KVM to set this register. The name reserved_host_counters reflects the intent to reserve some counters for the host so the guest may eventually be allowed direct access to a subset of PMU functionality for increased performance.
Track HPMN and whether the pmu is partitioned in struct arm_pmu because both KVM and the PMUv3 driver will need to know that to handle guests correctly.
Due to the difficulty this feature would create for the driver running at EL1 on the host, partitioning is only allowed in VHE mode. Working on nVHE mode would require a hypercall for every register access because the counters reserved for the host by HPMN are now only accessible to EL2.
The parameter is only configurable at boot time. Making the parameter configurable on a running system is dangerous due to the difficulty of knowing for sure no counters are in use anywhere so it is safe to reprogram HPMN.
Hi Colton,
For some high level feedback for the RFC, it probably makes sense to include the other half of the feature at the same time. I think there is a risk that it requires something slightly different than what's here and there ends up being some churn.
Other than that I think it looks ok apart from some minor code review nits.
I was also thinking about how BRBE interacts with this. Alex has done some analysis that finds that it's difficult to use BRBE in guests with virtualized counters due to the fact that BRBE freezes on any counter overflow, rather than just guest ones. That leaves the guest with branch blackout windows in the delay between a host counter overflowing and the interrupt being taken and BRBE being restarted.
But with HPMN, BRBE does allow freeze on overflow of only one partition or the other (or both, but I don't think we'd want that) e.g.:
RNXCWF: If EL2 is implemented, a BRBE freeze event occurs when all of the following are true:
* BRBCR_EL1.FZP is 1.
* Generation of Branch records is not paused.
* PMOVSCLR_EL0[(MDCR_EL2.HPMN-1):0] is nonzero.
* The PE is in a BRBE Non-prohibited region.
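For illustration only, the quoted rule can be read as a predicate over the overflow status bits below HPMN. This is a minimal userspace sketch, not kernel or architecture code: the function name and its parameters are hypothetical, it models only the four listed conditions, and it assumes HPMN is at most 31.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the quoted rule: a freeze event needs FZP set,
 * record generation not paused, an overflow flag set somewhere in
 * PMOVSCLR_EL0[HPMN-1:0], and the PE in a non-prohibited region.
 */
static bool brbe_freeze_event(bool fzp, bool paused, uint64_t pmovsclr,
			      unsigned int hpmn, bool non_prohibited)
{
	/* Bits [hpmn-1:0]; the range is empty when hpmn is 0. */
	uint64_t guest_bits = (UINT64_C(1) << hpmn) - 1;

	return fzp && !paused && (pmovsclr & guest_bits) != 0 &&
	       non_prohibited;
}
```

With HPMN set to 4, an overflow on counter 0 satisfies the third condition while an overflow on counter 4 does not, which is the per-partition behaviour described above.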
Unfortunately that means we could only let guests use BRBE with a partitioned PMU, which would massively reduce flexibility if hosts have to lose counters just so the guest can use BRBE.
I don't know if this is a stupid idea, but instead of having a fixed number for the partition, wouldn't it be nice if we could trap and increment HPMN on the first guest use of a counter, then decrement it on guest exit depending on what's still in use? The host would always assign its counters from the top down, and guests go bottom up if they want PMU passthrough. Maybe it's too complicated or won't work for various reasons, but because of BRBE the counter partitioning changes go from an optimization to almost a necessity.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
 arch/arm64/include/asm/kvm_pmu.h |  4 +++
 arch/arm64/kvm/Makefile          |  2 +-
 arch/arm64/kvm/debug.c           |  9 ++++--
 arch/arm64/kvm/pmu-part.c        | 47 ++++++++++++++++++++++++++++++++
 arch/arm64/kvm/pmu.c             |  2 ++
 include/linux/perf/arm_pmu.h     |  2 ++
 6 files changed, 62 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm64/kvm/pmu-part.c

diff --git a/arch/arm64/include/asm/kvm_pmu.h b/arch/arm64/include/asm/kvm_pmu.h
index 613cddbdbdd8..174b7f376d95 100644
--- a/arch/arm64/include/asm/kvm_pmu.h
+++ b/arch/arm64/include/asm/kvm_pmu.h
@@ -22,6 +22,10 @@ bool kvm_set_pmuserenr(u64 val);
 void kvm_vcpu_pmu_resync_el0(void);
 void kvm_host_pmu_init(struct arm_pmu *pmu);
 
+u8 kvm_pmu_get_reserved_counters(void);
+u8 kvm_pmu_hpmn(u8 nr_counters);
+void kvm_pmu_partition(struct arm_pmu *pmu);
+
 #else
 
 static inline void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr) {}
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 3cf7adb2b503..065a6b804c84 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -25,7 +25,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
 	 vgic/vgic-mmio-v3.o vgic/vgic-kvm-device.o \
 	 vgic/vgic-its.o vgic/vgic-debug.o
 
-kvm-$(CONFIG_HW_PERF_EVENTS)  += pmu-emul.o pmu.o
+kvm-$(CONFIG_HW_PERF_EVENTS)  += pmu-emul.o pmu-part.o pmu.o
 kvm-$(CONFIG_ARM64_PTR_AUTH)  += pauth.o
 kvm-$(CONFIG_PTDUMP_STAGE2_DEBUGFS) += ptdump.o
 
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index 7fb1d9e7180f..b5ac5a213877 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -31,15 +31,18 @@
  */
 static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu)
 {
+	u8 counters = *host_data_ptr(nr_event_counters);
+	u8 hpmn = kvm_pmu_hpmn(counters);
+
 	preempt_disable();
Would you not need to use vcpu->cpu here to access host_data? The preempt_disable() after the access seems suspicious. I think you'll end up with the same issue as here:
https://lore.kernel.org/kvmarm/5edb7c69-f548-4651-8b63-1643c5b13dac@linaro.o...
 	/*
 	 * This also clears MDCR_EL2_E2PB_MASK and MDCR_EL2_E2TB_MASK
 	 * to disable guest access to the profiling and trace buffers
 	 */
-	vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN,
-					 *host_data_ptr(nr_event_counters));
-	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
+	vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN, hpmn);
+	vcpu->arch.mdcr_el2 |= (MDCR_EL2_HPMD |
+				MDCR_EL2_TPM |
 				MDCR_EL2_TPMS |
 				MDCR_EL2_TTRF |
 				MDCR_EL2_TPMCR |
diff --git a/arch/arm64/kvm/pmu-part.c b/arch/arm64/kvm/pmu-part.c
new file mode 100644
index 000000000000..e74fecc67e37
--- /dev/null
+++ b/arch/arm64/kvm/pmu-part.c
@@ -0,0 +1,47 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025 Google LLC
+ * Author: Colton Lewis <coltonlewis@google.com>
+ */
+
+#include <linux/kvm_host.h>
+#include <linux/perf/arm_pmu.h>
+
+#include <asm/kvm_pmu.h>
+
+static u8 reserved_host_counters __read_mostly;
+
+module_param(reserved_host_counters, byte, 0);
+MODULE_PARM_DESC(reserved_host_counters,
+		 "Partition the PMU into host and guest counters");
+
+u8 kvm_pmu_get_reserved_counters(void)
+{
+	return reserved_host_counters;
+}
+
+u8 kvm_pmu_hpmn(u8 nr_counters)
+{
+	if (reserved_host_counters >= nr_counters) {
+		if (this_cpu_has_cap(ARM64_HAS_HPMN0))
+			return 0;
+
+		return 1;
+	}
+
+	return nr_counters - reserved_host_counters;
+}
+
+void kvm_pmu_partition(struct arm_pmu *pmu)
+{
+	u8 nr_counters = *host_data_ptr(nr_event_counters);
+	u8 hpmn = kvm_pmu_hpmn(nr_counters);
+
+	if (hpmn < nr_counters) {
+		pmu->hpmn = hpmn;
+		pmu->partitioned = true;
Looks like Rob's point about pmu->partitioned being duplicate data stands again. On the previous version you mentioned that saving it was to avoid reading PMCR.N, but now it's not based on PMCR.N anymore.
Thanks James
Hi James,
Thanks for the review.
James Clark <james.clark@linaro.org> writes:
On 13/02/2025 6:03 pm, Colton Lewis wrote:
For PMUv3, the register MDCR_EL2.HPMN partitions the PMU counters into two ranges where counters 0..HPMN-1 are accessible by EL1 and, if allowed, EL0 while counters HPMN..N are only accessible by EL2.
Introduce a module parameter in KVM to set this register. The name reserved_host_counters reflects the intent to reserve some counters for the host so the guest may eventually be allowed direct access to a subset of PMU functionality for increased performance.
Track HPMN and whether the pmu is partitioned in struct arm_pmu because both KVM and the PMUv3 driver will need to know that to handle guests correctly.
Due to the difficulty this feature would create for the driver running at EL1 on the host, partitioning is only allowed in VHE mode. Working on nVHE mode would require a hypercall for every register access because the counters reserved for the host by HPMN are now only accessible to EL2.
The parameter is only configurable at boot time. Making the parameter configurable on a running system is dangerous due to the difficulty of knowing for sure no counters are in use anywhere so it is safe to reprogram HPMN.
Hi Colton,
For some high level feedback for the RFC, it probably makes sense to include the other half of the feature at the same time. I think there is a risk that it requires something slightly different than what's here and there ends up being some churn.
I agree. That's what I'm working on now. I just wanted an iteration or two in public so I'm not building on something that needs drastic change later.
Other than that I think it looks ok apart from some minor code review nits.
Thank you
I was also thinking about how BRBE interacts with this. Alex has done some analysis that finds that it's difficult to use BRBE in guests with virtualized counters due to the fact that BRBE freezes on any counter overflow, rather than just guest ones. That leaves the guest with branch blackout windows in the delay between a host counter overflowing and the interrupt being taken and BRBE being restarted.
But with HPMN, BRBE does allow freeze on overflow of only one partition or the other (or both, but I don't think we'd want that) e.g.:
RNXCWF: If EL2 is implemented, a BRBE freeze event occurs when all of the following are true:
- BRBCR_EL1.FZP is 1.
- Generation of Branch records is not paused.
- PMOVSCLR_EL0[(MDCR_EL2.HPMN-1):0] is nonzero.
- The PE is in a BRBE Non-prohibited region.
Unfortunately that means we could only let guests use BRBE with a partitioned PMU, which would massively reduce flexibility if hosts have to lose counters just so the guest can use BRBE.
I don't know if this is a stupid idea, but instead of having a fixed number for the partition, wouldn't it be nice if we could trap and increment HPMN on the first guest use of a counter, then decrement it on guest exit depending on what's still in use? The host would always assign its counters from the top down, and guests go bottom up if they want PMU passthrough. Maybe it's too complicated or won't work for various reasons, but because of BRBE the counter partitioning changes go from an optimization to almost a necessity.
This is a cool idea that would enable useful things. I can think of a few potential problems.
1. Partitioning will give guests direct access to some PMU counter registers. There is no reliable way for KVM to determine what is in use from that state. A counter that is disabled at guest exit might only be so temporarily, which could lead to a lot of thrashing allocating and deallocating counters.
2. HPMN affects reads of PMCR_EL0.N, which is the standard way to determine how many counters there are. If HPMN starts as a low number, guests have no way of knowing there are more counters available. Dynamically changing the counters available could be confusing for guests.
3. If guests were aware they could write beyond HPMN and get the counters allocated to them, nothing stops them from writing at counter N and taking as many counters as possible to starve the host.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
 arch/arm64/include/asm/kvm_pmu.h |  4 +++
 arch/arm64/kvm/Makefile          |  2 +-
 arch/arm64/kvm/debug.c           |  9 ++++--
 arch/arm64/kvm/pmu-part.c        | 47 ++++++++++++++++++++++++++++++++
 arch/arm64/kvm/pmu.c             |  2 ++
 include/linux/perf/arm_pmu.h     |  2 ++
 6 files changed, 62 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm64/kvm/pmu-part.c

diff --git a/arch/arm64/include/asm/kvm_pmu.h b/arch/arm64/include/asm/kvm_pmu.h
index 613cddbdbdd8..174b7f376d95 100644
--- a/arch/arm64/include/asm/kvm_pmu.h
+++ b/arch/arm64/include/asm/kvm_pmu.h
@@ -22,6 +22,10 @@ bool kvm_set_pmuserenr(u64 val);
 void kvm_vcpu_pmu_resync_el0(void);
 void kvm_host_pmu_init(struct arm_pmu *pmu);
 
+u8 kvm_pmu_get_reserved_counters(void);
+u8 kvm_pmu_hpmn(u8 nr_counters);
+void kvm_pmu_partition(struct arm_pmu *pmu);
+
 #else
 
 static inline void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr) {}
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 3cf7adb2b503..065a6b804c84 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -25,7 +25,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
 	 vgic/vgic-mmio-v3.o vgic/vgic-kvm-device.o \
 	 vgic/vgic-its.o vgic/vgic-debug.o
 
-kvm-$(CONFIG_HW_PERF_EVENTS)  += pmu-emul.o pmu.o
+kvm-$(CONFIG_HW_PERF_EVENTS)  += pmu-emul.o pmu-part.o pmu.o
 kvm-$(CONFIG_ARM64_PTR_AUTH)  += pauth.o
 kvm-$(CONFIG_PTDUMP_STAGE2_DEBUGFS) += ptdump.o
 
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index 7fb1d9e7180f..b5ac5a213877 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -31,15 +31,18 @@
  */
 static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu)
 {
+	u8 counters = *host_data_ptr(nr_event_counters);
+	u8 hpmn = kvm_pmu_hpmn(counters);
+
 	preempt_disable();
Would you not need to use vcpu->cpu here to access host_data? The preempt_disable() after the access seems suspicious. I think you'll end up with the same issue as here:
https://lore.kernel.org/kvmarm/5edb7c69-f548-4651-8b63-1643c5b13dac@linaro.o...
I think that's right. I should use the host_data for vcpu->cpu
 	/*
 	 * This also clears MDCR_EL2_E2PB_MASK and MDCR_EL2_E2TB_MASK
 	 * to disable guest access to the profiling and trace buffers
 	 */
-	vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN,
-					 *host_data_ptr(nr_event_counters));
-	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
+	vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN, hpmn);
+	vcpu->arch.mdcr_el2 |= (MDCR_EL2_HPMD |
+				MDCR_EL2_TPM |
 				MDCR_EL2_TPMS |
 				MDCR_EL2_TTRF |
 				MDCR_EL2_TPMCR |
diff --git a/arch/arm64/kvm/pmu-part.c b/arch/arm64/kvm/pmu-part.c
new file mode 100644
index 000000000000..e74fecc67e37
--- /dev/null
+++ b/arch/arm64/kvm/pmu-part.c
@@ -0,0 +1,47 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025 Google LLC
+ * Author: Colton Lewis <coltonlewis@google.com>
+ */
+
+#include <linux/kvm_host.h>
+#include <linux/perf/arm_pmu.h>
+
+#include <asm/kvm_pmu.h>
+
+static u8 reserved_host_counters __read_mostly;
+
+module_param(reserved_host_counters, byte, 0);
+MODULE_PARM_DESC(reserved_host_counters,
+		 "Partition the PMU into host and guest counters");
+
+u8 kvm_pmu_get_reserved_counters(void)
+{
+	return reserved_host_counters;
+}
+
+u8 kvm_pmu_hpmn(u8 nr_counters)
+{
+	if (reserved_host_counters >= nr_counters) {
+		if (this_cpu_has_cap(ARM64_HAS_HPMN0))
+			return 0;
+
+		return 1;
+	}
+
+	return nr_counters - reserved_host_counters;
+}
+
+void kvm_pmu_partition(struct arm_pmu *pmu)
+{
+	u8 nr_counters = *host_data_ptr(nr_event_counters);
+	u8 hpmn = kvm_pmu_hpmn(nr_counters);
+
+	if (hpmn < nr_counters) {
+		pmu->hpmn = hpmn;
+		pmu->partitioned = true;
Looks like Rob's point about pmu->partitioned being duplicate data stands again. On the previous version you mentioned that saving it was to avoid reading PMCR.N, but now it's not based on PMCR.N anymore.
I will make it a function instead so the meaning of hpmn < nr_counters is clear.
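As a rough illustration of that direction, the HPMN choice from the patch and a derived "is partitioned" predicate might look like the userspace model below. This is a sketch under stated assumptions, not the planned kernel code: the model_* names are hypothetical, and the has_hpmn0 argument stands in for the this_cpu_has_cap(ARM64_HAS_HPMN0) check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of the HPMN selection in kvm_pmu_hpmn(): reserving every
 * counter (or more) leaves the guest with 0 counters if FEAT_HPMN0 is
 * available, else 1. "reserved" stands in for the module parameter.
 */
static uint8_t model_hpmn(uint8_t nr_counters, uint8_t reserved,
			  bool has_hpmn0)
{
	if (reserved >= nr_counters)
		return has_hpmn0 ? 0 : 1;

	return nr_counters - reserved;
}

/* Hypothetical predicate replacing a cached "partitioned" flag. */
static bool model_is_partitioned(uint8_t hpmn, uint8_t nr_counters)
{
	return hpmn < nr_counters;
}
```

With six counters and two reserved for the host this yields HPMN = 4 and a partitioned PMU; with nothing reserved, HPMN equals the counter count and the predicate is false.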
On 25/03/2025 6:32 pm, Colton Lewis wrote:
Hi James,
Thanks for the review.
James Clark <james.clark@linaro.org> writes:
On 13/02/2025 6:03 pm, Colton Lewis wrote:
For PMUv3, the register MDCR_EL2.HPMN partitions the PMU counters into two ranges where counters 0..HPMN-1 are accessible by EL1 and, if allowed, EL0 while counters HPMN..N are only accessible by EL2.
Introduce a module parameter in KVM to set this register. The name reserved_host_counters reflects the intent to reserve some counters for the host so the guest may eventually be allowed direct access to a subset of PMU functionality for increased performance.
Track HPMN and whether the pmu is partitioned in struct arm_pmu because both KVM and the PMUv3 driver will need to know that to handle guests correctly.
Due to the difficulty this feature would create for the driver running at EL1 on the host, partitioning is only allowed in VHE mode. Working on nVHE mode would require a hypercall for every register access because the counters reserved for the host by HPMN are now only accessible to EL2.
The parameter is only configurable at boot time. Making the parameter configurable on a running system is dangerous due to the difficulty of knowing for sure no counters are in use anywhere so it is safe to reprogram HPMN.
Hi Colton,
For some high level feedback for the RFC, it probably makes sense to include the other half of the feature at the same time. I think there is a risk that it requires something slightly different than what's here and there ends up being some churn.
I agree. That's what I'm working on now. I just wanted an iteration or two in public so I'm not building on something that needs drastic change later.
Other than that I think it looks ok apart from some minor code review nits.
Thank you
I was also thinking about how BRBE interacts with this. Alex has done some analysis that finds that it's difficult to use BRBE in guests with virtualized counters due to the fact that BRBE freezes on any counter overflow, rather than just guest ones. That leaves the guest with branch blackout windows in the delay between a host counter overflowing and the interrupt being taken and BRBE being restarted.
But with HPMN, BRBE does allow freeze on overflow of only one partition or the other (or both, but I don't think we'd want that) e.g.:
RNXCWF: If EL2 is implemented, a BRBE freeze event occurs when all of the following are true:
* BRBCR_EL1.FZP is 1.
* Generation of Branch records is not paused.
* PMOVSCLR_EL0[(MDCR_EL2.HPMN-1):0] is nonzero.
* The PE is in a BRBE Non-prohibited region.
Unfortunately that means we could only let guests use BRBE with a partitioned PMU, which would massively reduce flexibility if hosts have to lose counters just so the guest can use BRBE.
I don't know if this is a stupid idea, but instead of having a fixed number for the partition, wouldn't it be nice if we could trap and increment HPMN on the first guest use of a counter, then decrement it on guest exit depending on what's still in use? The host would always assign its counters from the top down, and guests go bottom up if they want PMU passthrough. Maybe it's too complicated or won't work for various reasons, but because of BRBE the counter partitioning changes go from an optimization to almost a necessity.
This is a cool idea that would enable useful things. I can think of a few potential problems.
- Partitioning will give guests direct access to some PMU counter
registers. There is no reliable way for KVM to determine what is in use from that state. A counter that is disabled at guest exit might only be so temporarily, which could lead to a lot of thrashing allocating and deallocating counters.
- HPMN affects reads of PMCR_EL0.N, which is the standard way to
determine how many counters there are. If HPMN starts as a low number, guests have no way of knowing there are more counters available. Dynamically changing the counters available could be confusing for guests.
Yes I was expecting that PMCR would have to be trapped and N reported to be the number of physical counters rather than how many are in the guest partition.
- If guests were aware they could write beyond HPMN and get the
counters allocated to them, nothing stops them from writing at counter N and taking as many counters as possible to starve the host.
Is that much different than how it is now with virtualized PMUs? As in, the guest can use all of the counters and the host's events will have to contend with them.
You can still have a module param, except it's more of a limit to the size of the partition rather than fixing it upfront. The default value would be the max number of counters, allowing the most flexibility for the common use case where it's unlikely that both host and guests are contending for all counters. But if you really want to make sure the host doesn't get starved you can set it to a lower value.
All this does sound a bit like it could be done on top of the simple partitioning though. And it's mainly for making BRBE more accessible, and I'm not 100% convinced that the blackout windows are that big of a problem. We could say BRBE may have some holes if the host happens to be using counters at the same time, and if you want to be certain of no holes, use a host with partitioned counters.
James
On Wed, Mar 26, 2025 at 05:38:34PM +0000, James Clark wrote:
On 25/03/2025 6:32 pm, Colton Lewis wrote:
I don't know if this is a stupid idea, but instead of having a fixed number for the partition, wouldn't it be nice if we could trap and increment HPMN on the first guest use of a counter, then decrement it on guest exit depending on what's still in use? The host would always assign its counters from the top down, and guests go bottom up if they want PMU passthrough. Maybe it's too complicated or won't work for various reasons, but because of BRBE the counter partitioning changes go from an optimization to almost a necessity.
This is a cool idea that would enable useful things. I can think of a few potential problems.
- Partitioning will give guests direct access to some PMU counter
registers. There is no reliable way for KVM to determine what is in use from that state. A counter that is disabled at guest exit might only be so temporarily, which could lead to a lot of thrashing allocating and deallocating counters.
KVM must always have a reliable way to determine if the PMU is in use. Checking whether there's any counter in the vPMU for which kvm_pmu_counter_is_enabled() is true would do the trick...
Generally speaking, I would like to see the guest/host context switch in KVM modeled in a way similar to the debug registers, where the vPMU registers are loaded onto hardware lazily if either:
1) The above definition of an in-use PMU is satisfied
2) The guest accessed a PMU register since the last vcpu_load()
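The two conditions above could be modeled roughly as follows. This is only a userspace sketch of the decision, not KVM code: struct model_vpmu and both of its fields are hypothetical, and the enabled-counter bitmask stands in for iterating the vPMU with kvm_pmu_counter_is_enabled().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU state, for illustration only. */
struct model_vpmu {
	uint64_t enabled_counters;	/* bit i set: counter i is enabled */
	bool accessed_since_load;	/* guest touched a PMU register */
};

/* Load guest PMU state onto hardware only if either condition holds. */
static bool model_vpmu_needs_load(const struct model_vpmu *vpmu)
{
	bool in_use = vpmu->enabled_counters != 0;	/* condition 1 */

	return in_use || vpmu->accessed_since_load;	/* condition 2 */
}
```

A vCPU with no enabled counters and no PMU access since the last load would skip the register load entirely in this model, which is the lazy behaviour being asked for.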
- HPMN affects reads of PMCR_EL0.N, which is the standard way to
determine how many counters there are. If HPMN starts as a low number, guests have no way of knowing there are more counters available. Dynamically changing the counters available could be confusing for guests.
Yes I was expecting that PMCR would have to be trapped and N reported to be the number of physical counters rather than how many are in the guest partition.
I'm not sure this is aligned with the spirit of the feature.
Colton's aim is to minimize the overheads of trapping the PMU *and* relying on the perf subsystem for event scheduling. To do dynamic partitioning as you've described, KVM would need to unconditionally trap the PMU registers so it can pack the guest counters into the guest partition. We cannot assume the VM will allocate counters sequentially.
Dynamic counter allocation can be had with the existing PMU implementation. The partitioned PMU is an alternative userspace can select, not a replacement for what we already have.
Thanks, Oliver
On 26/03/2025 8:40 pm, Oliver Upton wrote:
On Wed, Mar 26, 2025 at 05:38:34PM +0000, James Clark wrote:
On 25/03/2025 6:32 pm, Colton Lewis wrote:
I don't know if this is a stupid idea, but instead of having a fixed number for the partition, wouldn't it be nice if we could trap and increment HPMN on the first guest use of a counter, then decrement it on guest exit depending on what's still in use? The host would always assign its counters from the top down, and guests go bottom up if they want PMU passthrough. Maybe it's too complicated or won't work for various reasons, but because of BRBE the counter partitioning changes go from an optimization to almost a necessity.
This is a cool idea that would enable useful things. I can think of a few potential problems.
- Partitioning will give guests direct access to some PMU counter
registers. There is no reliable way for KVM to determine what is in use from that state. A counter that is disabled at guest exit might only be so temporarily, which could lead to a lot of thrashing allocating and deallocating counters.
KVM must always have a reliable way to determine if the PMU is in use. Checking whether there's any counter in the vPMU for which kvm_pmu_counter_is_enabled() is true would do the trick...
Generally speaking, I would like to see the guest/host context switch in KVM modeled in a way similar to the debug registers, where the vPMU registers are loaded onto hardware lazily if either:
1) The above definition of an in-use PMU is satisfied
2) The guest accessed a PMU register since the last vcpu_load()
- HPMN affects reads of PMCR_EL0.N, which is the standard way to
determine how many counters there are. If HPMN starts as a low number, guests have no way of knowing there are more counters available. Dynamically changing the counters available could be confusing for guests.
Yes I was expecting that PMCR would have to be trapped and N reported to be the number of physical counters rather than how many are in the guest partition.
I'm not sure this is aligned with the spirit of the feature.
Colton's aim is to minimize the overheads of trapping the PMU *and* relying on the perf subsystem for event scheduling. To do dynamic partitioning as you've described, KVM would need to unconditionally trap the PMU registers so it can pack the guest counters into the guest partition. We cannot assume the VM will allocate counters sequentially.
Yeah I agree, requiring cooperation from the guest probably makes it a non-starter.
Dynamic counter allocation can be had with the existing PMU implementation. The partitioned PMU is an alternative userspace can select, not a replacement for what we already have.
Thanks, Oliver
It's just a shame that it doesn't look like there's a way to make BRBE work properly in guests with the existing implementation. Maybe we're stuck with only allowing it in a partition for now.
Thanks James
These bitmasks are valid for enable and interrupt registers as well as overflow registers. Generalize the names.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
 include/linux/perf/arm_pmuv3.h | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
index d698efba28a2..c2448477c37f 100644
--- a/include/linux/perf/arm_pmuv3.h
+++ b/include/linux/perf/arm_pmuv3.h
@@ -223,16 +223,23 @@
 				 ARMV8_PMU_PMCR_X | ARMV8_PMU_PMCR_DP | \
 				 ARMV8_PMU_PMCR_LC | ARMV8_PMU_PMCR_LP)
 
+/*
+ * Counter bitmask layouts for overflow, enable, and interrupts
+ */
+#define ARMV8_PMU_CNT_MASK_P		GENMASK(30, 0)
+#define ARMV8_PMU_CNT_MASK_C		BIT(31)
+#define ARMV8_PMU_CNT_MASK_F		BIT_ULL(32) /* arm64 only */
+#define ARMV8_PMU_CNT_MASK_ALL		(ARMV8_PMU_CNT_MASK_P | \
+					 ARMV8_PMU_CNT_MASK_C | \
+					 ARMV8_PMU_CNT_MASK_F)
 /*
  * PMOVSR: counters overflow flag status reg
  */
-#define ARMV8_PMU_OVSR_P		GENMASK(30, 0)
-#define ARMV8_PMU_OVSR_C		BIT(31)
-#define ARMV8_PMU_OVSR_F		BIT_ULL(32) /* arm64 only */
+#define ARMV8_PMU_OVSR_P		ARMV8_PMU_CNT_MASK_P
+#define ARMV8_PMU_OVSR_C		ARMV8_PMU_CNT_MASK_C
+#define ARMV8_PMU_OVSR_F		ARMV8_PMU_CNT_MASK_F
 /* Mask for writable bits is both P and C fields */
-#define ARMV8_PMU_OVERFLOWED_MASK	(ARMV8_PMU_OVSR_P | ARMV8_PMU_OVSR_C | \
-					ARMV8_PMU_OVSR_F)
-
+#define ARMV8_PMU_OVERFLOWED_MASK	ARMV8_PMU_CNT_MASK_ALL
 /*
  * PMXEVTYPER: Event selection reg
  */
If the PMU is partitioned, keep the driver out of the guest counter partition and only use the host counter partition. Partitioning is defined by the MDCR_EL2.HPMN register field and saved in cpu_pmu->hpmn. The range 0..HPMN-1 is accessible by EL1 and EL0 while HPMN..PMCR.N is reserved for EL2.
Define some macros that take HPMN as an argument and construct mutually exclusive bitmaps for testing which partition a particular counter is in. Note that despite their different position in the bitmap, the cycle and instruction counters are always in the guest partition.
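To make the idea of mutually exclusive partition bitmaps concrete, here is a minimal userspace sketch. It is an assumption-laden illustration rather than the macros from this patch: the model_* helpers are hypothetical names, only the general-purpose event counters are modeled (the cycle and instruction counter bits are left out), and both arguments are assumed to be at most 31.

```c
#include <assert.h>
#include <stdint.h>

/* Bits [hpmn-1:0]: event counters on the guest side of the pivot. */
static uint64_t model_guest_part(unsigned int hpmn)
{
	return (UINT64_C(1) << hpmn) - 1;
}

/* Bits [nr_counters-1:hpmn]: event counters reserved for the host. */
static uint64_t model_host_part(unsigned int hpmn, unsigned int nr_counters)
{
	uint64_t all = (UINT64_C(1) << nr_counters) - 1;

	return all & ~model_guest_part(hpmn);
}
```

For example, with six event counters and HPMN = 4, the guest mask covers counters 0 to 3 and the host mask covers counters 4 and 5, with no bit set in both.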
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
 arch/arm/include/asm/arm_pmuv3.h |  2 +
 arch/arm64/include/asm/kvm_pmu.h |  5 +++
 arch/arm64/kvm/pmu-part.c        | 16 +++++++
 drivers/perf/arm_pmuv3.c         | 73 +++++++++++++++++++++++++++-----
 include/linux/perf/arm_pmuv3.h   |  8 ++++
 5 files changed, 94 insertions(+), 10 deletions(-)

diff --git a/arch/arm/include/asm/arm_pmuv3.h b/arch/arm/include/asm/arm_pmuv3.h
index 2ec0e5e83fc9..dadd4ddf51af 100644
--- a/arch/arm/include/asm/arm_pmuv3.h
+++ b/arch/arm/include/asm/arm_pmuv3.h
@@ -227,6 +227,8 @@ static inline bool kvm_set_pmuserenr(u64 val)
 }
 
 static inline void kvm_vcpu_pmu_resync_el0(void) {}
+static inline void kvm_pmu_host_counters_enable(void) {}
+static inline void kvm_pmu_host_counters_disable(void) {}
 
 /* PMU Version in DFR Register */
 #define ARMV8_PMU_DFR_VER_NI        0
diff --git a/arch/arm64/include/asm/kvm_pmu.h b/arch/arm64/include/asm/kvm_pmu.h
index 174b7f376d95..8f25754fde47 100644
--- a/arch/arm64/include/asm/kvm_pmu.h
+++ b/arch/arm64/include/asm/kvm_pmu.h
@@ -25,6 +25,8 @@ void kvm_host_pmu_init(struct arm_pmu *pmu);
 u8 kvm_pmu_get_reserved_counters(void);
 u8 kvm_pmu_hpmn(u8 nr_counters);
 void kvm_pmu_partition(struct arm_pmu *pmu);
+void kvm_pmu_host_counters_enable(void);
+void kvm_pmu_host_counters_disable(void);
 
 #else
 
@@ -37,6 +39,9 @@ static inline bool kvm_set_pmuserenr(u64 val)
 static inline void kvm_vcpu_pmu_resync_el0(void) {}
 static inline void kvm_host_pmu_init(struct arm_pmu *pmu) {}
 
+static inline void kvm_pmu_host_counters_enable(void) {}
+static inline void kvm_pmu_host_counters_disable(void) {}
+
 #endif
 
 #endif
diff --git a/arch/arm64/kvm/pmu-part.c b/arch/arm64/kvm/pmu-part.c
index e74fecc67e37..51da65c678f9 100644
--- a/arch/arm64/kvm/pmu-part.c
+++ b/arch/arm64/kvm/pmu-part.c
@@ -45,3 +45,19 @@ void kvm_pmu_partition(struct arm_pmu *pmu)
 		pmu->partitioned = false;
 	}
 }
+
+void kvm_pmu_host_counters_enable(void)
+{
+	u64 mdcr = read_sysreg(mdcr_el2);
+
+	mdcr |= MDCR_EL2_HPME;
+	write_sysreg(mdcr, mdcr_el2);
+}
+
+void kvm_pmu_host_counters_disable(void)
+{
+	u64 mdcr = read_sysreg(mdcr_el2);
+
+	mdcr &= ~MDCR_EL2_HPME;
+	write_sysreg(mdcr, mdcr_el2);
+}
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 0e360feb3432..442dcff56d5b 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -730,15 +730,19 @@ static void armv8pmu_disable_event_irq(struct perf_event *event)
 	armv8pmu_disable_intens(BIT(event->hw.idx));
 }
 
-static u64 armv8pmu_getreset_flags(void)
+static u64 armv8pmu_getreset_flags(struct arm_pmu *cpu_pmu)
 {
 	u64 value;
 
 	/* Read */
 	value = read_pmovsclr();
 
+	if (cpu_pmu->partitioned)
+		value &= ARMV8_PMU_HOST_CNT_PART(cpu_pmu->hpmn);
+	else
+		value &= ARMV8_PMU_OVERFLOWED_MASK;
+
 	/* Write to clear flags */
-	value &= ARMV8_PMU_OVERFLOWED_MASK;
 	write_pmovsclr(value);
 
 	return value;
@@ -765,6 +769,18 @@ static void armv8pmu_disable_user_access(void)
 	update_pmuserenr(0);
 }
 
+static bool armv8pmu_is_guest_part(struct arm_pmu *cpu_pmu, u8 idx)
+{
+	return cpu_pmu->partitioned &&
+		(BIT(idx) & ARMV8_PMU_GUEST_CNT_PART(cpu_pmu->hpmn));
+}
+
+static bool armv8pmu_is_host_part(struct arm_pmu *cpu_pmu, u8 idx)
+{
+	return !cpu_pmu->partitioned ||
+		(BIT(idx) & ARMV8_PMU_HOST_CNT_PART(cpu_pmu->hpmn));
+}
+
 static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
 {
 	int i;
@@ -773,6 +789,8 @@ static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
 	if (is_pmuv3p9(cpu_pmu->pmuver)) {
 		u64 mask = 0;
 		for_each_set_bit(i, cpuc->used_mask, ARMPMU_MAX_HWEVENTS) {
+			if (armv8pmu_is_guest_part(cpu_pmu, i))
+				continue;
 			if (armv8pmu_event_has_user_read(cpuc->events[i]))
 				mask |= BIT(i);
 		}
@@ -781,6 +799,8 @@ static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
 		/* Clear any unused counters to avoid leaking their contents */
 		for_each_andnot_bit(i, cpu_pmu->cntr_mask, cpuc->used_mask,
 				    ARMPMU_MAX_HWEVENTS) {
+			if (armv8pmu_is_guest_part(cpu_pmu, i))
+				continue;
 			if (i == ARMV8_PMU_CYCLE_IDX)
 				write_pmccntr(0);
 			else if (i == ARMV8_PMU_INSTR_IDX)
@@ -825,8 +845,10 @@ static void armv8pmu_start(struct arm_pmu *cpu_pmu)
 	else
 		armv8pmu_disable_user_access();
 
-	/* Enable all counters */
-	armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_E);
+	if (cpu_pmu->partitioned)
+		kvm_pmu_host_counters_enable();
+	else
+		armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_E);
 
 	kvm_vcpu_pmu_resync_el0();
 }
@@ -834,7 +856,10 @@ static void armv8pmu_start(struct arm_pmu *cpu_pmu)
 static void armv8pmu_stop(struct arm_pmu *cpu_pmu)
 {
 	/* Disable all counters */
-	armv8pmu_pmcr_write(armv8pmu_pmcr_read() & ~ARMV8_PMU_PMCR_E);
+	if (cpu_pmu->partitioned)
+		kvm_pmu_host_counters_disable();
+	else
+		armv8pmu_pmcr_write(armv8pmu_pmcr_read() & ~ARMV8_PMU_PMCR_E);
 }
 
 static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
@@ -848,7 +873,7 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
 	/*
 	 * Get and reset the IRQ flags
 	 */
-	pmovsr = armv8pmu_getreset_flags();
+	pmovsr = armv8pmu_getreset_flags(cpu_pmu);
 
 	/*
 	 * Did an overflow occur?
@@ -906,6 +931,8 @@ static int armv8pmu_get_single_idx(struct pmu_hw_events *cpuc,
 	int idx;
 
 	for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV8_PMU_MAX_GENERAL_COUNTERS) {
+		if (armv8pmu_is_guest_part(cpu_pmu, idx))
+			continue;
 		if (!test_and_set_bit(idx, cpuc->used_mask))
 			return idx;
 	}
@@ -922,6 +949,8 @@ static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc,
 	 * the lower idx must be even.
 	 */
 	for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV8_PMU_MAX_GENERAL_COUNTERS) {
+		if (armv8pmu_is_guest_part(cpu_pmu, idx))
+			continue;
 		if (!(idx & 0x1))
 			continue;
 		if (!test_and_set_bit(idx, cpuc->used_mask)) {
@@ -944,6 +973,7 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
 
 	/* Always prefer to place a cycle counter into the cycle counter. */
 	if ((evtype == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) &&
+	    !cpu_pmu->partitioned &&
 	    !armv8pmu_event_get_threshold(&event->attr)) {
 		if (!test_and_set_bit(ARMV8_PMU_CYCLE_IDX, cpuc->used_mask))
 			return ARMV8_PMU_CYCLE_IDX;
@@ -959,6 +989,7 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
 	 * may not know how to handle it.
 	 */
 	if ((evtype == ARMV8_PMUV3_PERFCTR_INST_RETIRED) &&
+	    !cpu_pmu->partitioned &&
 	    !armv8pmu_event_get_threshold(&event->attr) &&
 	    test_bit(ARMV8_PMU_INSTR_IDX, cpu_pmu->cntr_mask) &&
 	    !armv8pmu_event_want_user_access(event)) {
@@ -970,7 +1001,7 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
 	 * Otherwise use events counters
 	 */
 	if (armv8pmu_event_is_chained(event))
-		return armv8pmu_get_chain_idx(cpuc, cpu_pmu);
+		return armv8pmu_get_chain_idx(cpuc, cpu_pmu);
 	else
 		return armv8pmu_get_single_idx(cpuc, cpu_pmu);
 }
@@ -1062,6 +1093,16 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
 	return 0;
 }
 
+static void armv8pmu_reset_host_counters(struct arm_pmu *cpu_pmu)
+{
+	int idx;
+
+	for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV8_PMU_MAX_GENERAL_COUNTERS) {
+		if (armv8pmu_is_host_part(cpu_pmu, idx))
+			armv8pmu_write_evcntr(idx, 0);
+	}
+}
+
 static void armv8pmu_reset(void *info)
 {
 	struct arm_pmu *cpu_pmu = (struct arm_pmu *)info;
@@ -1069,6 +1110,9 @@ static void armv8pmu_reset(void *info)
 
 	bitmap_to_arr64(&mask, cpu_pmu->cntr_mask, ARMPMU_MAX_HWEVENTS);
 
+	if (cpu_pmu->partitioned)
+		mask &= ARMV8_PMU_HOST_CNT_PART(cpu_pmu->hpmn);
+
 	/* The counter and interrupt enable registers are unknown at reset. */
 	armv8pmu_disable_counter(mask);
 	armv8pmu_disable_intens(mask);
@@ -1076,11 +1120,20 @@ static void armv8pmu_reset(void *info)
 	/* Clear the counters we flip at guest entry/exit */
 	kvm_clr_pmu_events(mask);
 
+
+	pmcr = ARMV8_PMU_PMCR_LC;
+
 	/*
-	 * Initialize & Reset PMNC. Request overflow interrupt for
-	 * 64 bit cycle counter but cheat in armv8pmu_write_counter().
+	 * Initialize & Reset PMNC. Request overflow interrupt for 64
+	 * bit cycle counter but cheat in armv8pmu_write_counter().
+	 *
+	 * When partitioned, there is no single bit to reset only the
+	 * host counters. so reset them individually.
 	 */
-	pmcr = ARMV8_PMU_PMCR_P | ARMV8_PMU_PMCR_C | ARMV8_PMU_PMCR_LC;
+	if (cpu_pmu->partitioned)
+		armv8pmu_reset_host_counters(cpu_pmu);
+	else
+		pmcr = ARMV8_PMU_PMCR_P | ARMV8_PMU_PMCR_C;
/* Enable long event counter support where available */ if (armv8pmu_has_long_event(cpu_pmu)) diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h index c2448477c37f..3a5eac11e54d 100644 --- a/include/linux/perf/arm_pmuv3.h +++ b/include/linux/perf/arm_pmuv3.h @@ -240,6 +240,14 @@ #define ARMV8_PMU_OVSR_F ARMV8_PMU_CNT_MASK_F /* Mask for writable bits is both P and C fields */ #define ARMV8_PMU_OVERFLOWED_MASK ARMV8_PMU_CNT_MASK_ALL + +/* Masks for guest and host counter partitions */ +#define ARMV8_PMU_HPMN_CNT_MASK(N) GENMASK((N) - 1, 0) +#define ARMV8_PMU_GUEST_CNT_PART(N) (ARMV8_PMU_HPMN_CNT_MASK(N) | \ + ARMV8_PMU_CNT_MASK_C | \ + ARMV8_PMU_CNT_MASK_F) +#define ARMV8_PMU_HOST_CNT_PART(N) (ARMV8_PMU_CNT_MASK_ALL & \ + ~ARMV8_PMU_GUEST_CNT_PART(N)) /* * PMXEVTYPER: Event selection reg */
On 13/02/2025 6:03 pm, Colton Lewis wrote:
> If the PMU is partitioned, keep the driver out of the guest counter
> partition and only use the host counter partition. Partitioning is
> defined by the MDCR_EL2.HPMN register field and saved in
> cpu_pmu->hpmn. The range 0..HPMN-1 is accessible by EL1 and EL0 while
> HPMN..PMCR.N is reserved for EL2.
>
> Define some macros that take HPMN as an argument and construct
> mutually exclusive bitmaps for testing which partition a particular
> counter is in. Note that despite their different position in the
> bitmap, the cycle and instruction counters are always in the guest
> partition.
>
> Signed-off-by: Colton Lewis <coltonlewis@google.com>
> ---
>  arch/arm/include/asm/arm_pmuv3.h |  2 +
>  arch/arm64/include/asm/kvm_pmu.h |  5 +++
>  arch/arm64/kvm/pmu-part.c        | 16 +++++++
>  drivers/perf/arm_pmuv3.c         | 73 +++++++++++++++++++++++++++-----
>  include/linux/perf/arm_pmuv3.h   |  8 ++++
>  5 files changed, 94 insertions(+), 10 deletions(-)
>
> diff --git a/arch/arm/include/asm/arm_pmuv3.h b/arch/arm/include/asm/arm_pmuv3.h
> index 2ec0e5e83fc9..dadd4ddf51af 100644
> --- a/arch/arm/include/asm/arm_pmuv3.h
> +++ b/arch/arm/include/asm/arm_pmuv3.h
> @@ -227,6 +227,8 @@ static inline bool kvm_set_pmuserenr(u64 val)
>  }
>
>  static inline void kvm_vcpu_pmu_resync_el0(void) {}
> +static inline void kvm_pmu_host_counters_enable(void) {}
> +static inline void kvm_pmu_host_counters_disable(void) {}
>
>  /* PMU Version in DFR Register */
>  #define ARMV8_PMU_DFR_VER_NI        0
> diff --git a/arch/arm64/include/asm/kvm_pmu.h b/arch/arm64/include/asm/kvm_pmu.h
> index 174b7f376d95..8f25754fde47 100644
> --- a/arch/arm64/include/asm/kvm_pmu.h
> +++ b/arch/arm64/include/asm/kvm_pmu.h
> @@ -25,6 +25,8 @@ void kvm_host_pmu_init(struct arm_pmu *pmu);
>  u8 kvm_pmu_get_reserved_counters(void);
>  u8 kvm_pmu_hpmn(u8 nr_counters);
>  void kvm_pmu_partition(struct arm_pmu *pmu);
> +void kvm_pmu_host_counters_enable(void);
> +void kvm_pmu_host_counters_disable(void);
>
>  #else
>
> @@ -37,6 +39,9 @@ static inline bool kvm_set_pmuserenr(u64 val)
>
>  static inline void kvm_vcpu_pmu_resync_el0(void) {}
>  static inline void kvm_host_pmu_init(struct arm_pmu *pmu) {}
> +static inline void kvm_pmu_host_counters_enable(void) {}
> +static inline void kvm_pmu_host_counters_disable(void) {}
> +
>  #endif
>
>  #endif
> diff --git a/arch/arm64/kvm/pmu-part.c b/arch/arm64/kvm/pmu-part.c
> index e74fecc67e37..51da65c678f9 100644
> --- a/arch/arm64/kvm/pmu-part.c
> +++ b/arch/arm64/kvm/pmu-part.c
> @@ -45,3 +45,19 @@ void kvm_pmu_partition(struct arm_pmu *pmu)
>  		pmu->partitioned = false;
>  	}
>  }
> +
> +void kvm_pmu_host_counters_enable(void)
> +{
> +	u64 mdcr = read_sysreg(mdcr_el2);
> +
> +	mdcr |= MDCR_EL2_HPME;
> +	write_sysreg(mdcr, mdcr_el2);
> +}
> +
> +void kvm_pmu_host_counters_disable(void)
> +{
> +	u64 mdcr = read_sysreg(mdcr_el2);
> +
> +	mdcr &= ~MDCR_EL2_HPME;
> +	write_sysreg(mdcr, mdcr_el2);
> +}
> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
> index 0e360feb3432..442dcff56d5b 100644
> --- a/drivers/perf/arm_pmuv3.c
> +++ b/drivers/perf/arm_pmuv3.c
> @@ -730,15 +730,19 @@ static void armv8pmu_disable_event_irq(struct perf_event *event)
>  	armv8pmu_disable_intens(BIT(event->hw.idx));
>  }
>
> -static u64 armv8pmu_getreset_flags(void)
> +static u64 armv8pmu_getreset_flags(struct arm_pmu *cpu_pmu)
>  {
>  	u64 value;
>
>  	/* Read */
>  	value = read_pmovsclr();
>
> +	if (cpu_pmu->partitioned)
> +		value &= ARMV8_PMU_HOST_CNT_PART(cpu_pmu->hpmn);
> +	else
> +		value &= ARMV8_PMU_OVERFLOWED_MASK;
> +
>  	/* Write to clear flags */
> -	value &= ARMV8_PMU_OVERFLOWED_MASK;
>  	write_pmovsclr(value);
>
>  	return value;
> @@ -765,6 +769,18 @@ static void armv8pmu_disable_user_access(void)
>  	update_pmuserenr(0);
>  }
>
> +static bool armv8pmu_is_guest_part(struct arm_pmu *cpu_pmu, u8 idx)
> +{
> +	return cpu_pmu->partitioned &&
> +		(BIT(idx) & ARMV8_PMU_GUEST_CNT_PART(cpu_pmu->hpmn));
> +}
> +
> +static bool armv8pmu_is_host_part(struct arm_pmu *cpu_pmu, u8 idx)
> +{
> +	return !cpu_pmu->partitioned ||
> +		(BIT(idx) & ARMV8_PMU_HOST_CNT_PART(cpu_pmu->hpmn));
> +}
> +
>  static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
>  {
>  	int i;
> @@ -773,6 +789,8 @@ static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
>  	if (is_pmuv3p9(cpu_pmu->pmuver)) {
>  		u64 mask = 0;
>  		for_each_set_bit(i, cpuc->used_mask, ARMPMU_MAX_HWEVENTS) {
> +			if (armv8pmu_is_guest_part(cpu_pmu, i))
> +				continue;

Hi Colton,
Is it possible to keep the guest bits out of used_mask and cntr_mask in the first place? Then all these loops don't need to have the logic for is_guest_part()/is_host_part().
That leads me to wonder about updating the printout:
hw perfevents: enabled with armv8_pmuv3_0 PMU driver, 7 (0,8000003f) counters available
It might be a bit confusing if that doesn't quite reflect reality anymore.
Thanks
James
James Clark <james.clark@linaro.org> writes:

> Hi Colton,
>
> Is it possible to keep the guest bits out of used_mask and cntr_mask in
> the first place? Then all these loops don't need to have the logic for
> is_guest_part()/is_host_part().

It should be possible.

> That leads me to wonder about updating the printout:
>
>   hw perfevents: enabled with armv8_pmuv3_0 PMU driver, 7 (0,8000003f)
>   counters available
>
> It might be a bit confusing if that doesn't quite reflect reality anymore.

Good point.
It's possible the host has that many counters, but HPMN restricts us from using them.

Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
 tools/testing/selftests/kvm/arm64/vpmu_counter_access.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c b/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c
index f16b3b27e32e..b5bc18b7528d 100644
--- a/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c
+++ b/tools/testing/selftests/kvm/arm64/vpmu_counter_access.c
@@ -609,7 +609,7 @@ static void run_pmregs_validity_test(uint64_t pmcr_n)
  */
 static void run_error_test(uint64_t pmcr_n)
 {
-	pr_debug("Error test with pmcr_n %lu (larger than the host)\n", pmcr_n);
+	pr_debug("Error test with pmcr_n %lu (larger than the host allows)\n", pmcr_n);

 	test_create_vpmu_vm_with_pmcr_n(pmcr_n, true);
 	destroy_vpmu_vm();