On Thu, Aug 18, 2022 at 2:19 PM Sean Christopherson <seanjc@google.com> wrote:
On Thu, Aug 18, 2022, Kyle Huey wrote:
On Thu, Aug 18, 2022 at 3:57 AM Thomas Gleixner <tglx@linutronix.de> wrote:
On Mon, Aug 08 2022 at 07:15, Kyle Huey wrote:
When management of the PKRU register was moved away from XSTATE, emulation of PKRU's existence in XSTATE was added for APIs that read XSTATE, but not for APIs that write XSTATE. This can be seen by running gdb and executing `p $pkru`, `set $pkru = 42`, and `p $pkru`. On affected kernels (5.14+) the write to the PKRU register (which gdb performs through ptrace) is ignored.
There are three relevant APIs: PTRACE_SETREGSET with NT_X86_XSTATE, sigreturn, and KVM_SET_XSAVE. KVM_SET_XSAVE has its own special handling to make PKRU writes take effect (in fpu_copy_uabi_to_guest_fpstate). Push that down into copy_uabi_to_xstate and have PTRACE_SETREGSET with NT_X86_XSTATE and sigreturn pass in pointers to the appropriate PKRU value.
This also adds code to initialize the PKRU value to the hardware init value (namely 0) if the PKRU bit is not set in the XSTATE header to match XRSTOR. This is a change to the current KVM_SET_XSAVE behavior.
You are stating a fact here, but provide 0 justification why this is correct.
Well, the justification is that this *is* the behavior we want for ptrace/sigreturn, and it's very likely that the existing KVM_SET_XSAVE behavior in this edge case is an oversight rather than intentional. In the absence of confirmation that KVM wants the existing behavior (the KVM mailing list and maintainer are CC'd), one correct code path is better than one correct code path and one buggy code path.
Sorry, I missed the KVM-relevant flags.
Hrm, the current behavior has been KVM ABI for a very long time.
It's definitely odd because all other components will be initialized due to their bits being cleared in the header during kvm_load_guest_fpu(), and it probably wouldn't cause problems in practice as most VMMs likely do "all or nothing" loads. But, in theory, userspace could save/restore a subset of guest XSTATE and rely on the kernel not overwriting guest PKRU when its bit is cleared in the header.
This seems extremely conservative, but ok. As you note, PKRU is the only XSTATE component you could theoretically do this subset save/restore with in the KVM ABI since all the others really do have their hardware behavior.
All that said, I don't see any reason to force KVM to change at this time, it's trivial enough to handle KVM's oddities while providing sane behavior for others. Nullify the pointer in the guest path and then update copy_uabi_to_xstate() to play nice with a NULL pointer, e.g.
	/*
	 * Nullify @vpkru to preserve its current value if PKRU's bit isn't set
	 * in the header.  KVM's odd ABI is to leave PKRU untouched in this
	 * case (all other components are eventually re-initialized).
	 */
	if (!(kstate->regs.xsave.header.xfeatures & XFEATURE_MASK_PKRU))
		vpkru = NULL;
You meant ustate->... here (since this is before the copy now), but yes, ok, I will do that.
	return copy_uabi_from_kernel_to_xstate(kstate, ustate, vpkru);
- Kyle
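
For readers following along, the behavior agreed on in this thread can be modeled outside the kernel. The following is only a minimal userspace sketch, not the actual patch: the struct and every `sketch_` name are hypothetical stand-ins, and the real copy_uabi_to_xstate() handles far more than PKRU. It assumes PKRU is XSAVE component 9 and that the header check is done on the userspace buffer (ustate), as corrected above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PKRU is XSAVE state component 9, so it is bit 9 of the header bitmap. */
#define SKETCH_XFEATURE_MASK_PKRU (1ULL << 9)

/* Hypothetical stand-in for the parts of a userspace XSAVE buffer we need. */
struct sketch_uabi_xstate {
	uint64_t xfeatures;	/* header bitmap of components present */
	uint32_t pkru;		/* PKRU component payload */
};

/*
 * Models the PKRU handling pushed down into the common copy routine:
 *  - NULL @pkru: the caller does not want PKRU touched at all.
 *  - header bit set: take the value from the buffer.
 *  - header bit clear: use the hardware init value (0), matching XRSTOR.
 */
static void sketch_update_pkru(const struct sketch_uabi_xstate *ustate,
			       uint32_t *pkru)
{
	if (!pkru)
		return;

	if (ustate->xfeatures & SKETCH_XFEATURE_MASK_PKRU)
		*pkru = ustate->pkru;
	else
		*pkru = 0;
}

/*
 * Models the guest (KVM_SET_XSAVE) path: nullify the pointer when PKRU's
 * bit is clear in the header so the existing guest PKRU value is preserved.
 */
static void sketch_guest_update_pkru(const struct sketch_uabi_xstate *ustate,
				     uint32_t *vpkru)
{
	if (!(ustate->xfeatures & SKETCH_XFEATURE_MASK_PKRU))
		vpkru = NULL;

	sketch_update_pkru(ustate, vpkru);
}
```

With the header bit clear, the ptrace/sigreturn-style call resets PKRU to 0, while the guest-style call leaves the previous value in place. That is the only behavioral difference between the two paths in this model.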