On Tue, Oct 22, 2024, Adrian Hunter wrote:
On 22/10/24 19:30, Sean Christopherson wrote:
LOL, yeah, this needs to be burned with fire. It's wildly broken. So for stable@,
It doesn't seem wildly broken. Just the VMM passing invalid CPUID and KVM not validating it.
Heh, I agree with "just", but unfortunately "just ... not validating" a large swath of userspace inputs is pretty wildly broken. More importantly, it's not easy to fix. E.g. KVM could require the inputs to exactly match hardware, but that creates an ABI that I'm not entirely sure is desirable in the long term.
Although the CPUID ABI does not really change. KVM does not support emulating Intel PT, so accepting CPUID that the hardware cannot support seems like a bit of a lie.
But it's not all or nothing, e.g. KVM should support exposing fewer address ranges than are supported by hardware, so that the same virtual CPU model can be run on different generations of hardware.
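As a rough illustration of that "subset of hardware" rule, here is a minimal sketch, not actual KVM code. It assumes the caller already has the EAX output of CPUID.(EAX=14H,ECX=1) for both the guest-requested and the host values; the number of configurable address ranges lives in EAX[2:0]. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=14H,ECX=1):EAX[2:0] = number of configurable address ranges. */
#define PT_NUM_ADDR_RANGES_MASK 0x7u

/* Hypothetical helper: extract the address range count from leaf 0x14.1 EAX. */
static inline unsigned int pt_num_addr_ranges(uint32_t leaf14_1_eax)
{
	return leaf14_1_eax & PT_NUM_ADDR_RANGES_MASK;
}

/*
 * Hypothetical check: accept a guest configuration that exposes the same
 * number of address ranges as the host, or fewer, and reject anything the
 * hardware cannot back.
 */
static bool pt_guest_addr_ranges_valid(uint32_t guest_eax, uint32_t host_eax)
{
	return pt_num_addr_ranges(guest_eax) <= pt_num_addr_ranges(host_eax);
}
```

With a check of this shape, a vCPU model that advertises two ranges could run on hosts with two or four ranges, while a model asking for four ranges on a two-range host would be rejected instead of silently accepted.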
Aren't there other features that KVM does not support if the hardware support is not there?
Many. But either features are one-off things without configurable properties, or KVM does the right thing (usually). E.g. nested virtualization heavily relies on hardware, and has a plethora of knobs, but KVM (usually) honors and validates the configuration provided by userspace.
To some degree, a testing and debugging feature does not have to be available in 100% of cases because it can still be useful when it is available.
I don't disagree, but "works on my machine" is how KVM has gotten into so many messes with such features. I also don't necessarily disagree with supporting a very limited subset of use cases, but I want such support to come as a well-defined package with proper guard rails, docs, and ideally tests.
I'll post a patch to hide the module param if CONFIG_BROKEN=n (and will omit stable@ for the previous patch).
Going forward, if someone actually cares about virtualizing PT enough to want to fix KVM's mess, then they can put in the effort to fix all the bugs, write all the tests, and in general clean up the implementation to meet KVM's current standards. E.g. KVM's usage of intel_pt_validate_cap() instead of KVM's guest CPUID and capabilities infrastructure needs to go.
The problem below seems to be caused by not validating against the *host* CPUID. KVM's CPUID information seems to be invalid.
Yes.
My vote is to queue the current code for removal, and revisit support after the mediated PMU has landed. Because I don't see any point in supporting Intel PT without a mediated PMU, as host/guest mode really only makes sense if the entire PMU is being handed over to the guest.
Why?
To simplify the implementation, and because I don't see how virtualizing Intel PT without also enabling the mediated PMU makes any sense.
Conceptually, KVM's PT implementation is very, very similar to the mediated PMU. They both effectively give the guest control of hardware when the vCPU starts running, and take back control when the vCPU stops running.
If KVM allows Intel PT without the mediated PMU, then KVM and perf have to support two separate implementations for the same model. If virtualizing Intel PT is allowed if and only if the mediated PMU is enabled, then .handle_intel_pt_intr() goes away. And on the flip side, it becomes super obvious that host usage of Intel PT needs to be mutually exclusive with the mediated PMU.
And forgo being able to trace mediated passthrough with Intel PT ;-)
It can't work, generally. Anything that generates a ToPA PMI will go sideways. In the worst case scenario, the spurious PMI could crash the guest.
And when the mediated PMU supports PEBS, that would likely break too.