On Wed, Oct 04, 2023 at 04:11:52AM -0300, Leonardo Bras wrote:
So this patch is supposed to fix migration of a VM from a host with a pre-ad856280ddea (OLD) kernel to a host with ad856280ddea + your set (NEW). Right?
Let's get the scenario here, where all machines are the same:
1 - VM created on OLD kernel with a host-supported xfeature F, which is not guest-supported.
2 - VM is migrated to a NEW kernel/host, and KVM_SET_XSAVE is issued with xfeature F.
3 - VM will be migrated to another host; qemu requests KVM_GET_XSAVE, which returns only guest-supported xfeatures, and this is passed to the next host.
4 - VM will be started on the 3rd host with only guest-supported xfeatures, meaning xfeature F is filtered out, which is not good, because the VM will have fewer features than it had at boot.
This is what I was (trying) to convey earlier...
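To make the scenario above concrete, here is a toy model of it. This is only a sketch: xfeature sets are plain bitmasks, the bit chosen for F is arbitrary, and the function names are hypothetical stand-ins, not KVM or QEMU interfaces. It assumes, as the scenario does, that the OLD kernel returns every host-supported xfeature while the NEW kernel returns only guest-supported ones.

```python
# Illustrative model only: xfeature sets are plain bitmasks, and the
# function names below are hypothetical, not KVM or QEMU APIs.
X87 = 1 << 0
SSE = 1 << 1
F = 1 << 9      # stand-in for "xfeature F"; the bit position is arbitrary

HOST_SUPPORTED = X87 | SSE | F   # all three hosts are identical
GUEST_SUPPORTED = X87 | SSE      # F is not exposed to the guest


def get_xsave_old(saved):
    """OLD behaviour as assumed in the scenario: host-supported bits pass through."""
    return saved & HOST_SUPPORTED


def get_xsave_new(saved):
    """NEW behaviour as assumed in the scenario: only guest-supported bits are returned."""
    return saved & GUEST_SUPPORTED


# 1 - VM created on the OLD host; its saved state carries F.
state_on_old = get_xsave_old(HOST_SUPPORTED)
# 2 - Migrated to a NEW host, which receives the state including F.
state_on_new = state_on_old
# 3/4 - The NEW host saves the state for the third host, trimmed.
state_on_third = get_xsave_new(state_on_new)

dropped = state_on_new & ~state_on_third
print(f"dropped between hop 2 and hop 3: {dropped:#x}")
```

In this model the only bit lost between the second and third host is F, which is exactly the concern in step 4.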
See Sean's response here: https://lore.kernel.org/all/ZRMHY83W%2FVPjYyhy@google.com/
I'll copy the pertinent part of his very detailed response inline:
KVM *must* "trim" features when servicing KVM_GET_XSAVE{2}, because that's been KVM's ABI for a very long time, and userspace absolutely relies on that functionality to ensure that a VM can be migrated within a pool of heterogenous systems so long as the features that are *exposed* to the guest are supported on all platforms.
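As I read it, the reasoning is roughly the following. Again this is a toy sketch with made-up bitmasks and a hypothetical can_load() helper, not how KVM actually validates state: if the saved state carried a host-only feature, a destination lacking that feature could not accept it, even though the guest never saw the feature.

```python
# Hypothetical model: bitmasks for feature sets; not real KVM code.
AVX = 1 << 2
EXTRA = 1 << 9   # supported by host A only, never exposed to the guest

HOST_A = AVX | EXTRA
HOST_B = AVX
GUEST_EXPOSED = AVX


def can_load(state, host_supported):
    """A destination can accept the state only if it supports every bit in it."""
    return state & ~host_supported == 0


untrimmed = HOST_A                  # what host A would save without trimming
trimmed = HOST_A & GUEST_EXPOSED    # what host A saves when trimming

print("untrimmed loads on B:", can_load(untrimmed, HOST_B))
print("trimmed loads on B:", can_load(trimmed, HOST_B))
```

Under these assumptions only the trimmed state can move from host A to host B, which is the heterogeneous-pool property the quoted text says userspace relies on.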
My 2 cents: as an outsider less familiar with the KVM code, I find it hard to understand the contract here with the guest/userspace. The fundamental question seems to be whether "superfluous" features, i.e. host-supported features beyond what the guest is actually capable of using, can be removed between the time the guest boots and the time it terminates, across however many live migrations happen in between.
Ultimately, this problem is not really fixable if said features cannot be removed.
Is there an RFC or document which captures expectations of this form?