On 5/15/2025 8:19 AM, Sean Christopherson wrote:
The shortlog is wildly inaccurate. KVM is not simply checking, KVM is actively disabling RDPMC interception. *That* needs to be the focus of the shortlog and changelog.
Sure.
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c index 92c742ead663..6ad71752be4b 100644 --- a/arch/x86/kvm/pmu.c +++ b/arch/x86/kvm/pmu.c @@ -604,6 +604,40 @@ int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data) return 0; } +inline bool kvm_rdpmc_in_guest(struct kvm_vcpu *vcpu)
Strongly prefer kvm_need_rdpmc_intercept(), e.g. to follow vmx_need_pf_intercept(), and because it makes the users more obviously correct. The "in_guest" terminology from kvm_{hlt,mwait,pause,cstate}_in_guest() isn't great, but at least in those flows it's not awful because they are very direct reflections of knobs that control interception, whereas this helper is making a variety of runtime checks.
Sure.
+{
- struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
- if (!kvm_mediated_pmu_enabled(vcpu))
return false;
- /*
* VMware allows access to these Pseduo-PMCs even when read via RDPMC
* in Ring3 when CR4.PCE=0.
*/
- if (enable_vmware_backdoor)
return false;
- /*
* FIXME: In theory, perf metrics is always combined with fixed
* counter 3. it's fair enough to compare the guest and host
* fixed counter number and don't need to check perf metrics
* explicitly. However kvm_pmu_cap.num_counters_fixed is limited
* KVM_MAX_NR_FIXED_COUNTERS (3) as fixed counter 3 is not
* supported now. perf metrics is still needed to be checked
* explicitly here. Once fixed counter 3 is supported, the perf
* metrics checking can be removed.
*/
And then what happens when hardware supported fixed counter #4? KVM has the same problem, and we can't check for features that KVM doesn't know about.
The entire problem is that this code is checking for *KVM* support, but what the guest can see and access needs to be checked against *hardware* support. Handling that is simple, just take a snapshot of the host PMU capabilities before KVM generates kvm_pmu_cap, and use the unadulterated snapshot here (and everywhere else with similar checks).
Yes. That's correct. Whether disabling intercept should check against HW instead of KVM PMU capability since host perf subsystem may hide some PMU features.
- return pmu->nr_arch_gp_counters == kvm_pmu_cap.num_counters_gp &&
pmu->nr_arch_fixed_counters == kvm_pmu_cap.num_counters_fixed &&
vcpu_has_perf_metrics(vcpu) == kvm_host_has_perf_metrics() &&
pmu->counter_bitmask[KVM_PMC_GP] ==
(BIT_ULL(kvm_pmu_cap.bit_width_gp) - 1) &&
pmu->counter_bitmask[KVM_PMC_FIXED] ==
(BIT_ULL(kvm_pmu_cap.bit_width_fixed) - 1);
+} @@ -212,6 +212,18 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu) bitmap_set(pmu->all_valid_pmc_idx, 0, pmu->nr_arch_gp_counters); } +static void amd_pmu_refresh(struct kvm_vcpu *vcpu) +{
- struct vcpu_svm *svm = to_svm(vcpu);
- __amd_pmu_refresh(vcpu);
To better communicate the roles of the two paths to refresh():
amd_pmu_refresh_capabilities(vcpu);
amd_pmu_refresh_controls(vcpu);
Ditto for Intel.
Sure.