On Mon, Dec 29, 2025 at 11:45 PM Sean Christopherson seanjc@google.com wrote:
The fix works only because the userspace XFD[18] must be '0' since the kernel never re-disables XFD features after they are enabled.
Yes, this is why I considered XFD[18]=1 to be a bug.
Which is probably fine in practice since re-disabling a component for a guest task would need to force the guest FPU back into an init state as well, but I don't love the complexity.
This also creates a nasty, subtle asymmetry in KVM's ABI.
I find this second argument more convincing; I preferred the complexity of an extra field to track guest xfd, to having to patch xstate_bv.
(Initially I was worried also about mismatches between xstate_bv and xcomp_bv but those are find; When the compacted format is in use xcomp_bv[62:0] is simply EDX:EAX & XCR0).
Lastly, the fix is effectively papering over another bug, which I'm pretty sure is the underlying issue that was originally encountered.
Yes, I agree this is the most likely scenario. Whether it's papering over it or otherwise, it depends on what you consider the invariants to be.
So, given that KVM's effective ABI is to record XSTATE_BV[i]=0 if XFD[i]==1, I vote to fix this by emulating that behavior when stuffing XFD in fpu_update_guest_xfd(), and then manually closing the hole Paolo found in fpu_copy_uabi_to_guest_fpstate().
I disagree with changing the argument from const void* to void*. Let's instead treat it as a KVM backwards-compatibility quirk:
union fpregs_state *xstate = (union fpregs_state *)guest_xsave->region; xstate->xsave.header.xfeatures &= ~vcpu->arch.guest_fpu.fpstate->xfd;
It keeps the kernel/ API const as expected and if anything I'd consider adding a WARN to fpu_copy_uabi_to_guest_fpstate(), basically asserting that there would be no #NM on the subsequent restore.
@@ -319,10 +319,25 @@ EXPORT_SYMBOL_FOR_KVM(fpu_enable_guest_xfd_features); #ifdef CONFIG_X86_64 void fpu_update_guest_xfd(struct fpu_guest *guest_fpu, u64 xfd) {
struct fpstate *fpstate = guest_fpu->fpstate;fpregs_lock();
guest_fpu->fpstate->xfd = xfd;if (guest_fpu->fpstate->in_use)xfd_update_state(guest_fpu->fpstate);
fpstate->xfd = xfd;if (fpstate->in_use)xfd_update_state(fpstate);/** If the guest's FPU state is NOT resident in hardware, clear disabled* components in XSTATE_BV as attempting to load disabled components* will generate #NM _in the host_, and KVM's ABI is that saving guest* XSAVE state should see XSTATE_BV[i]=0 if XFD[i]=1.** If the guest's FPU state is in hardware, simply do nothing as XSAVE* itself saves XSTATE_BV[i] as 0 if XFD[i]=1.
s/saves/(from fpu_swap_kvm_fpstate) will save/
*/if (xfd && test_thread_flag(TIF_NEED_FPU_LOAD))fpstate->regs.xsave.header.xfeatures &= ~xfd;
No objections to this part. I'll play with this to adjust the selftests either tomorrow or, more likely, on January 2nd, and send a v2 that also includes the change from preemption_disabled to irqs_disabled.
I take it that you don't have any qualms with the new fpu_load_guest_fpstate function, but let me know if you prefer to have it in a separate submission destined to 6.20 only.
Paolo