On Tue, Feb 15, 2022 at 07:36:44AM -0800, Brian Geffon wrote:
There are two issues with PKRU handling prior to 5.13. The first is that when eagerly switching PKRU we check that current is not a kernel thread as kernel threads will never use PKRU. It's possible that this_cpu_read_stable() on current_task (ie. get_current()) is returning an old cached value. By forcing the read with this_cpu_read() the correct task is used. Without this it's possible when switching from a kernel thread to a userspace thread that we'll still observe the PF_KTHREAD flag and never restore the PKRU. And as a result this issue only occurs when switching from a kernel thread to a userspace thread, switching from a non kernel thread works perfectly fine because all we consider in that situation is the flags from some other non kernel task and the next fpu is passed in to switch_fpu_finish().
Without reloading the value finish_fpu_load() after being inlined into __switch_to() uses a stale value of current:
ba1: 8b 35 00 00 00 00 mov 0x0(%rip),%esi ba7: f0 41 80 4d 01 40 lock orb $0x40,0x1(%r13) bad: e9 00 00 00 00 jmp bb2 <__switch_to+0x1eb> bb2: 41 f6 45 3e 20 testb $0x20,0x3e(%r13) bb7: 75 1c jne bd5 <__switch_to+0x20e>
By using this_cpu_read() and avoiding the cached value the compiler does insert an additional load instruction and observes the correct value now:
ba1: 8b 35 00 00 00 00 mov 0x0(%rip),%esi ba7: f0 41 80 4d 01 40 lock orb $0x40,0x1(%r13) bad: e9 00 00 00 00 jmp bb2 <__switch_to+0x1eb> bb2: 65 48 8b 05 00 00 00 mov %gs:0x0(%rip),%rax bb9: 00 bba: f6 40 3e 20 testb $0x20,0x3e(%rax) bbe: 75 1c jne bdc <__switch_to+0x215>
The second issue is when using write_pkru() we only write to the xstate when the feature bit is set because get_xsave_addr() returns NULL when the feature bit is not set. This is problematic as the CPU is free to clear the feature bit when it observes the xstate in the init state, this behavior seems to be documented a few places throughout the kernel. If the bit was cleared then in write_pkru() we would happily write to PKRU without ever updating the xstate, and the FPU restore on return to userspace would load the old value agian.
Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state") Signed-off-by: Brian Geffon bgeffon@google.com Signed-off-by: Willis Kung williskung@google.com Tested-by: Willis Kung williskung@google.com
arch/x86/include/asm/fpu/internal.h | 2 +- arch/x86/include/asm/pgtable.h | 14 ++++++++++---- 2 files changed, 11 insertions(+), 5 deletions(-)
<formletter>
This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html for how to do this properly.
</formletter>