On Wed, Mar 26, 2025 at 05:38:34PM +0000, James Clark wrote:
On 25/03/2025 6:32 pm, Colton Lewis wrote:
I don't know if this is a stupid idea, but instead of having a fixed number for the partition, wouldn't it be nice if we could trap and increment HPMN on the first guest use of a counter, then decrement it on guest exit depending on what's still in use? The host would always assign its counters from the top down, and guests go bottom up if they want PMU passthrough. Maybe it's too complicated or won't work for various reasons, but because of BRBE the counter partitioning changes go from an optimization to almost a necessity.
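A minimal C sketch of the dynamic-HPMN idea. It assumes a hypothetical struct pmu_partition in which counters [0, hpmn) belong to the guest and [hpmn, N) belong to the host. partition_grow() and partition_shrink() are illustrative names, not KVM functions, and the model ignores architectural limits on which HPMN values are legal. Note that partition_shrink() simply trusts the guest in-use bitmap it is given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of a dynamically sized guest partition.
 * Counters [0, hpmn) belong to the guest, [hpmn, nr_counters) to the host.
 * Guests allocate bottom up and the host top down, so the boundary can
 * move without either side renumbering its counters.
 */
struct pmu_partition {
	unsigned int nr_counters;   /* event counters implemented (<= 31) */
	unsigned int hpmn;          /* value that would go into MDCR_EL2.HPMN */
	uint32_t host_in_use;       /* bitmap of counters the host is using */
};

/* Trap on the guest's first touch of counter idx: grow HPMN to cover it. */
static bool partition_grow(struct pmu_partition *p, unsigned int idx)
{
	assert(idx < p->nr_counters);
	if (idx < p->hpmn)
		return true;            /* already inside the guest partition */

	/* Every counter in [hpmn, idx] must be free of host events. */
	uint32_t span = ((1u << (idx + 1)) - 1) & ~((1u << p->hpmn) - 1);
	if (p->host_in_use & span)
		return false;           /* would collide with host counters */

	p->hpmn = idx + 1;
	return true;
}

/* On guest exit: shrink HPMN to just above the highest counter still in use. */
static void partition_shrink(struct pmu_partition *p, uint32_t guest_in_use)
{
	unsigned int hpmn = p->hpmn;

	while (hpmn > 0 && !(guest_in_use & (1u << (hpmn - 1))))
		hpmn--;
	p->hpmn = hpmn;
}
```

For example, with six counters and the host holding counter 5, a guest touching counter 2 grows HPMN to 3, while a guest touching counter 5 is refused.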
This is a cool idea that would enable useful things. I can think of a few potential problems.
- Partitioning will give guests direct access to some PMU counter registers. There is no reliable way for KVM to determine what is in use from that state. A counter that is disabled at guest exit might only be disabled temporarily, which could lead to a lot of thrashing as counters are allocated and deallocated.
KVM must always have a reliable way to determine if the PMU is in use. Checking whether kvm_pmu_counter_is_enabled() is true for any counter in the vPMU would do the trick...
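The in-use definition can be sketched as a scan over the vPMU. This assumes a simplified, hypothetical vpmu_state holding the guest's PMCR_EL0 and PMCNTENSET_EL0. vpmu_counter_is_enabled() is a stand-in for the idea behind kvm_pmu_counter_is_enabled(), not its actual implementation, and it ignores EL2 controls such as MDCR_EL2.HPME.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PMCR_E     (1u << 0)   /* PMCR_EL0.E: global counter enable */
#define CYCLE_IDX  31          /* PMCCNTR_EL0 is bit 31 of PMCNTENSET_EL0 */

/* Hypothetical snapshot of the guest's view of the vPMU registers. */
struct vpmu_state {
	uint32_t pmcr;         /* guest PMCR_EL0 */
	uint32_t cntenset;     /* guest PMCNTENSET_EL0 */
	unsigned int nr;       /* event counters exposed to the guest */
};

/* Simplified check: global enable plus the per-counter enable bit. */
static bool vpmu_counter_is_enabled(const struct vpmu_state *s, unsigned int idx)
{
	assert(idx < s->nr || idx == CYCLE_IDX);
	return (s->pmcr & PMCR_E) && (s->cntenset & (1u << idx));
}

/* The PMU is "in use" if any event counter or the cycle counter is enabled. */
static bool vpmu_in_use(const struct vpmu_state *s)
{
	for (unsigned int i = 0; i < s->nr; i++)
		if (vpmu_counter_is_enabled(s, i))
			return true;
	return vpmu_counter_is_enabled(s, CYCLE_IDX);
}
```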
Generally speaking, I would like to see the guest/host context switch in KVM modeled in a way similar to the debug registers, where the vPMU registers are loaded onto hardware lazily if either:
1) The above definition of an in-use PMU is satisfied
2) The guest accessed a PMU register since the last vcpu_load()
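A rough model of the lazy load policy, loosely following the debug-register approach described above. struct vcpu_pmu_ctx and the ctx_*() helpers are hypothetical, and in_use is assumed to come from an in-use check like the one described earlier in this reply.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-vCPU bookkeeping for lazy vPMU context switching. */
struct vcpu_pmu_ctx {
	bool in_use;     /* result of the "any counter enabled" check */
	bool accessed;   /* guest touched a PMU register since vcpu_load() */
	bool loaded;     /* vPMU registers currently live on hardware */
};

/* vcpu_load(): forget earlier accesses and load only if the PMU is in use. */
static void ctx_vcpu_load(struct vcpu_pmu_ctx *c)
{
	c->accessed = false;
	c->loaded = c->in_use;   /* otherwise leave PMU accesses trapping */
}

/* Trapped guest PMU access while the context is not loaded. */
static void ctx_pmu_trap(struct vcpu_pmu_ctx *c)
{
	assert(!c->loaded);      /* accesses only trap while not loaded */
	c->accessed = true;
	c->loaded = true;        /* load now; no more traps until next vcpu_load() */
}

/* The two conditions from the list, combined into one predicate. */
static bool ctx_should_load(const struct vcpu_pmu_ctx *c)
{
	return c->in_use || c->accessed;
}
```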
- HPMN affects reads of PMCR_EL0.N, which is the standard way to determine how many counters there are. If HPMN starts as a low number, guests have no way of knowing there are more counters available. Dynamically changing the number of available counters could be confusing for guests.
Yes, I was expecting that PMCR would have to be trapped and N reported as the number of physical counters rather than the number in the guest partition.
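Architecturally, when EL2 is implemented and enabled, EL1 and EL0 reads of PMCR_EL0.N return the value of MDCR_EL2.HPMN. The sketch below contrasts that untrapped view with a trapped read in which KVM reports the physical count. guest_read_pmcr() is a hypothetical helper, not a KVM function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PMCR_N_SHIFT  11
#define PMCR_N_MASK   (0x1fu << PMCR_N_SHIFT)   /* PMCR_EL0.N, bits [15:11] */

/*
 * Hypothetical model of the guest-visible PMCR_EL0.
 * Untrapped: hardware substitutes MDCR_EL2.HPMN for N at EL1/EL0.
 * Trapped:   KVM emulates the read and reports the physical counter count.
 */
static uint32_t guest_read_pmcr(uint32_t hw_pmcr, unsigned int hpmn, bool trapped)
{
	unsigned int phys_n = (hw_pmcr & PMCR_N_MASK) >> PMCR_N_SHIFT;
	unsigned int n = trapped ? phys_n : hpmn;

	assert(hpmn <= phys_n);
	return (hw_pmcr & ~PMCR_N_MASK) | ((uint32_t)n << PMCR_N_SHIFT);
}
```

The trapped path costs an exit on every guest PMCR_EL0 access, while the untrapped path costs nothing but shows the guest only its own partition.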
I'm not sure this is aligned with the spirit of the feature.
Colton's aim is to minimize the overheads of trapping the PMU *and* relying on the perf subsystem for event scheduling. To do dynamic partitioning as you've described, KVM would need to unconditionally trap the PMU registers so it can pack the guest counters into the guest partition. We cannot assume the VM will allocate counters sequentially.
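To see why packing needs unconditional traps, consider a guest that programs counters 4 and 0 while its partition holds only two counters. A hypothetical remap table, filled in order of first use, could squeeze them into slots 0 and 1, but only if every counter register access traps so KVM can translate the index. struct counter_map and map_translate() are illustrative only, and the sketch does not bound allocation by the partition size.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_COUNTERS  31
#define UNMAPPED      0xff

/*
 * Hypothetical remap table: guest counter index -> physical slot inside
 * the guest partition. Slots are handed out in order of first use, so
 * sparse guest indices still pack densely from slot 0 upward.
 */
struct counter_map {
	uint8_t phys[MAX_COUNTERS];   /* UNMAPPED if the guest counter has no slot */
	unsigned int used;            /* slots handed out so far */
};

static void map_init(struct counter_map *m)
{
	for (unsigned int i = 0; i < MAX_COUNTERS; i++)
		m->phys[i] = UNMAPPED;
	m->used = 0;
}

/* Translate a trapped guest access, allocating the next slot on first use. */
static unsigned int map_translate(struct counter_map *m, unsigned int guest_idx)
{
	assert(guest_idx < MAX_COUNTERS);
	if (m->phys[guest_idx] == UNMAPPED)
		m->phys[guest_idx] = (uint8_t)m->used++;
	return m->phys[guest_idx];
}
```

Without the trap, the guest's index names the physical counter directly, so there is no point at which KVM could apply this translation.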
Dynamic counter allocation can be had with the existing PMU implementation. The partitioned PMU is an alternative userspace can select, not a replacement for what we already have.
Thanks, Oliver