On 1/25/23 23:09, Jim Mattson wrote:
The topology leaves returned by KVM_GET_SUPPORTED_CPUID *for over a decade* have been passed through unmodified from the host. They have never made sense for KVM_SET_CPUID2, with the unlikely exception of a whole-host VM.
True, unfortunately people have not read the nonexistent documentation and they are:
1) rarely adjusting correctly all of 0xB, 0x1F and 0x8000001E;
2) never bounding CPUID[EAX=0].EAX to a known CPUID leaf, resulting for example in inconsistencies between 0xB and 0x1F.
*But* (2) should not be needed unless you care about maintaining homogeneous CPUID within a VM migration pool. For something like kvmtool, having to do (2) would be a workaround for the bug that this patch fixes.
Our VMM populates the topology of the guest CPUID table on its own, as any VMM must. However, it uses the host topology (which KVM_GET_SUPPORTED_CPUID has been providing pass-through *for over a decade*) to see if the requested guest topology is possible.
Ok, thanks; this is useful to know.
Changing a long-established ABI in a way that breaks userspace applications is a bad practice. I didn't think we, as a community, did that. I didn't realize that we were only catering to open source implementations here.
We aren't. But the open source implementations provide some guidance as to how the API is being used in the wild, and what the pitfalls are.
You wrote it yourself: any VMM must either populate the topology on its own, or possibly fill it with zeros. Returning a value that is extremely unlikely to be used is worse in pretty much every way (apart from not breaking your VMM, of course).
With a total of six known users (QEMU, crosvm, kvmtool, firecracker, rust-vmm, and the Google VMM), KVM is damned if it reverts the patch and damned if it doesn't. There is a tension between fixing the one VMM that was using KVM_GET_SUPPORTED_CPUID correctly and now breaks loudly, and fixing 3-4 that were silently broken and are now fixed. I will probably send a patch to crosvm, though.
The VMM being _proprietary_ doesn't really matter, however it does matter to me that it is not _public_: it is only used within Google, and the breakage is neither hard to fix in the VMM nor hard to temporarily avoid by reverting the patch in the Google kernel.
Paolo