There are two issues with PKRU handling prior to 5.13. The first is that when eagerly switching PKRU we check that current is not a kernel thread, as kernel threads will never use PKRU. It's possible that this_cpu_read_stable() on current_task (i.e. get_current()) is returning an old cached value. By forcing the read with this_cpu_read() the correct task is used. Without this, when switching from a kernel thread to a userspace thread, it's possible that we'll still observe the PF_KTHREAD flag and never restore the PKRU. As a result this issue only occurs when switching from a kernel thread to a userspace thread; switching from a non kernel thread works fine, because in that situation all we consider are the flags from some other non kernel task, and the next fpu is passed in to switch_fpu_finish().
Without reloading the value, switch_fpu_finish(), after being inlined into __switch_to(), uses a stale value of current:
 ba1:	8b 35 00 00 00 00    	mov    0x0(%rip),%esi
 ba7:	f0 41 80 4d 01 40    	lock orb $0x40,0x1(%r13)
 bad:	e9 00 00 00 00       	jmp    bb2 <__switch_to+0x1eb>
 bb2:	41 f6 45 3e 20       	testb  $0x20,0x3e(%r13)
 bb7:	75 1c                	jne    bd5 <__switch_to+0x20e>
By using this_cpu_read() and avoiding the cached value, the compiler inserts an additional load instruction and observes the correct value:
 ba1:	8b 35 00 00 00 00    	mov    0x0(%rip),%esi
 ba7:	f0 41 80 4d 01 40    	lock orb $0x40,0x1(%r13)
 bad:	e9 00 00 00 00       	jmp    bb2 <__switch_to+0x1eb>
 bb2:	65 48 8b 05 00 00 00 	mov    %gs:0x0(%rip),%rax
 bb9:	00
 bba:	f6 40 3e 20          	testb  $0x20,0x3e(%rax)
 bbe:	75 1c                	jne    bdc <__switch_to+0x215>
The second issue is that in write_pkru() we only write to the xstate when the feature bit is set, because get_xsave_addr() returns NULL when the feature bit is not set. This is problematic because the CPU is free to clear the feature bit when it observes the xstate in the init state; this behavior is documented in a few places throughout the kernel. If the bit was cleared then write_pkru() would happily write to PKRU without ever updating the xstate, and the FPU restore on return to userspace would load the old value again.
Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state")
Signed-off-by: Brian Geffon <bgeffon@google.com>
Signed-off-by: Willis Kung <williskung@google.com>
Tested-by: Willis Kung <williskung@google.com>
---
 arch/x86/include/asm/fpu/internal.h |  2 +-
 arch/x86/include/asm/pgtable.h      | 14 ++++++++++----
 2 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 03b3de491b5e..540bda5bdd28 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -598,7 +598,7 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
 	 * PKRU state is switched eagerly because it needs to be valid before we
 	 * return to userland e.g. for a copy_to_user() operation.
 	 */
-	if (!(current->flags & PF_KTHREAD)) {
+	if (!(this_cpu_read(current_task)->flags & PF_KTHREAD)) {
 		/*
 		 * If the PKRU bit in xsave.header.xfeatures is not set,
 		 * then the PKRU component was in init state, which means
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 9e71bf86d8d0..aa381b530de0 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -140,16 +140,22 @@ static inline void write_pkru(u32 pkru)
 	if (!boot_cpu_has(X86_FEATURE_OSPKE))
 		return;

-	pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
-
 	/*
 	 * The PKRU value in xstate needs to be in sync with the value that is
 	 * written to the CPU. The FPU restore on return to userland would
 	 * otherwise load the previous value again.
 	 */
 	fpregs_lock();
-	if (pk)
-		pk->pkru = pkru;
+	/*
+	 * The CPU is free to clear the feature bit when the xstate is in the
+	 * init state. For this reason, we need to make sure the feature bit is
+	 * reset when we're explicitly writing to pkru. If we did not then we
+	 * would write to pkru and it would not be saved on a context switch.
+	 */
+	current->thread.fpu.state.xsave.header.xfeatures |= XFEATURE_MASK_PKRU;
+	pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
+	BUG_ON(!pk);
+	pk->pkru = pkru;
 	__write_pkru(pkru);
 	fpregs_unlock();
 }
Hi Brian,
On Tue, Feb 15, 2022 at 7:37 AM Brian Geffon bgeffon@google.com wrote:
There are two issues with PKRU handling prior to 5.13. The first is that
Nice catch and work. One question, though: From the above, it seems like this patch only applies to kernels earlier than v5.13 or, more specifically, to v5.4.y and v5.10.y. Is this correct, or should it be applied to the upstream kernel and to all applicable stable releases?
Thanks, Guenter
On Tue, Feb 15, 2022 at 10:57 AM Guenter Roeck groeck@google.com wrote:
Hi Brian,
On Tue, Feb 15, 2022 at 7:37 AM Brian Geffon bgeffon@google.com wrote:
There are two issues with PKRU handling prior to 5.13. The first is that
Nice catch and work. One question, though: From the above, it seems like this patch only applies to kernels earlier than v5.13 or, more specifically, to v5.4.y and v5.10.y. Is this correct, or should it be applied to the upstream kernel and to all applicable stable releases ?
This only applies before 5.13, so 5.10.y and 5.4.y. PKRU was decoupled from xstate in a long series from Thomas Gleixner; the first commit that would have fixed this in 5.13 is:
commit 954436989cc550dd91aab98363240c9c0a4b7e23
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Jun 23 14:02:21 2021 +0200
x86/fpu: Remove PKRU handling from switch_fpu_finish()
On Tue, Feb 15, 2022 at 8:20 AM Brian Geffon bgeffon@google.com wrote:
On Tue, Feb 15, 2022 at 10:57 AM Guenter Roeck groeck@google.com wrote:
Hi Brian,
On Tue, Feb 15, 2022 at 7:37 AM Brian Geffon bgeffon@google.com wrote:
There are two issues with PKRU handling prior to 5.13. The first is that
Nice catch and work. One question, though: From the above, it seems like this patch only applies to kernels earlier than v5.13 or, more specifically, to v5.4.y and v5.10.y. Is this correct, or should it be applied to the upstream kernel and to all applicable stable releases ?
This only applies before 5.13, so 5.10.y and 5.4.y, the behavior decoupled PKRU from xstate in a long series from Thomas Gleixner, but the first commit where this would have been fixed in 5.13 would be:
Thought so. You'll have to explain that clearly in both subject and description to get the patch accepted to the affected stable releases.
Thanks, Guenter
On 2/15/22 08:19, Brian Geffon wrote:
This only applies before 5.13, so 5.10.y and 5.4.y, the behavior decoupled PKRU from xstate in a long series from Thomas Gleixner, but the first commit where this would have been fixed in 5.13 would be:
commit 954436989cc550dd91aab98363240c9c0a4b7e23
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Jun 23 14:02:21 2021 +0200
x86/fpu: Remove PKRU handling from switch_fpu_finish()
Could you also describe why the >=5.13 fix for this isn't suitable for older kernels?
On Tue, Feb 15, 2022 at 07:36:44AM -0800, Brian Geffon wrote:
<formletter>
This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html for how to do this properly.
</formletter>
On 2/15/22 07:36, Brian Geffon wrote:
There are two issues with PKRU handling prior to 5.13.
Are you sure both of these issues were introduced by 0cecca9d03c? I'm surprised that the get_xsave_addr() issue is not older.
Should this be two patches?
The first is that when eagerly switching PKRU we check that current
Don't forget to write in imperative mood. No "we's", please.
https://www.kernel.org/doc/html/latest/process/maintainer-tip.html
This goes for changelogs and comments too.
is not a kernel thread as kernel threads will never use PKRU. It's possible that this_cpu_read_stable() on current_task (ie. get_current()) is returning an old cached value. By forcing the read with this_cpu_read() the correct task is used. Without this it's possible when switching from a kernel thread to a userspace thread that we'll still observe the PF_KTHREAD flag and never restore the PKRU. And as a result this issue only occurs when switching from a kernel thread to a userspace thread, switching from a non kernel thread works perfectly fine because all we consider in that situation is the flags from some other non kernel task and the next fpu is passed in to switch_fpu_finish().
It makes *sense* that there would be a place in the context switch code where 'current' is wonky, but I never realized this. This seems really fragile, but *also* trivially detectable.
Is the PKRU code really the only code to use 'current' in a buggy way like this?
The second issue is when using write_pkru() we only write to the xstate when the feature bit is set because get_xsave_addr() returns NULL when the feature bit is not set. This is problematic as the CPU is free to clear the feature bit when it observes the xstate in the init state, this behavior seems to be documented a few places throughout the kernel. If the bit was cleared then in write_pkru() we would happily write to PKRU without ever updating the xstate, and the FPU restore on return to userspace would load the old value agian.
^ again
It's probably worth noting that the AMD init tracker is a lot more aggressive than Intel's. On Intel, I think XRSTOR is the only way to get back to the init state. You're obviously hitting this on AMD.
It's also *very* unlikely that PKRU gets back to a value of 0. I think we added a selftest for this case in later kernels.
That helps explain why this bug hung around for so long.
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 03b3de491b5e..540bda5bdd28 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -598,7 +598,7 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
 	 * PKRU state is switched eagerly because it needs to be valid before we
 	 * return to userland e.g. for a copy_to_user() operation.
 	 */
- if (!(current->flags & PF_KTHREAD)) {
+	if (!(this_cpu_read(current_task)->flags & PF_KTHREAD)) {
This really deserves a specific comment.
 		/*
 		 * If the PKRU bit in xsave.header.xfeatures is not set,
 		 * then the PKRU component was in init state, which means
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 9e71bf86d8d0..aa381b530de0 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -140,16 +140,22 @@ static inline void write_pkru(u32 pkru)
 	if (!boot_cpu_has(X86_FEATURE_OSPKE))
 		return;

-	pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
-
 	/*
 	 * The PKRU value in xstate needs to be in sync with the value that is
 	 * written to the CPU. The FPU restore on return to userland would
 	 * otherwise load the previous value again.
 	 */
 	fpregs_lock();
-	if (pk)
-		pk->pkru = pkru;
+	/*
+	 * The CPU is free to clear the feature bit when the xstate is in the
+	 * init state. For this reason, we need to make sure the feature bit is
+	 * reset when we're explicitly writing to pkru. If we did not then we
+	 * would write to pkru and it would not be saved on a context switch.
+	 */
+	current->thread.fpu.state.xsave.header.xfeatures |= XFEATURE_MASK_PKRU;
I don't think we need to describe how the init optimization works again. I'm also not sure it's worth mentioning context switches here. It's a wider problem than that. Maybe:
	/*
	 * All fpregs will be XRSTOR'd from this buffer before returning
	 * to userspace. Ensure that XRSTOR does not init PKRU and that
	 * get_xsave_addr() will work.
	 */
+	pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
+	BUG_ON(!pk);
A BUG_ON() a line before a NULL pointer dereference doesn't tend to do much good.
+	pk->pkru = pkru;
 	__write_pkru(pkru);
 	fpregs_unlock();
}
Hi Dave, Thank you for taking a look at this.
On Tue, Feb 15, 2022 at 12:13 PM Dave Hansen dave.hansen@intel.com wrote:
On 2/15/22 07:36, Brian Geffon wrote:
There are two issues with PKRU handling prior to 5.13.
Are you sure both of these issues were introduced by 0cecca9d03c? I'm surprised that the get_xsave_addr() issue is not older.
Should this be two patches?
You're right, the get_xsave_addr() issue is much older than the eager reloading of PKRU. I'll split this out into two patches.
The first is that when eagerly switching PKRU we check that current
Don't forget to write in imperative mood. No "we's", please.
https://www.kernel.org/doc/html/latest/process/maintainer-tip.html
This goes for changelogs and comments too.
This will be corrected in future patches.
It makes *sense* that there would be a place in the context switch code where 'current' is wonky, but I never realized this. This seems really fragile, but *also* trivially detectable.
Is the PKRU code really the only code to use 'current' in a buggy way like this?
Yes, because the remaining code in __switch_to() references the next task as next_p rather than current. Technically, it might be more correct to pass next_p to switch_fpu_finish(); what do you think? This may make sense since we're also passing the next fpu anyway.
Brian
On 2/15/22 09:50, Brian Geffon wrote:
is not a kernel thread as kernel threads will never use PKRU. It's possible that this_cpu_read_stable() on current_task (ie. get_current()) is returning an old cached value. By forcing the read with this_cpu_read() the correct task is used. Without this it's possible when switching from a kernel thread to a userspace thread that we'll still observe the PF_KTHREAD flag and never restore the PKRU. And as a result this issue only occurs when switching from a kernel thread to a userspace thread, switching from a non kernel thread works perfectly fine because all we consider in that situation is the flags from some other non kernel task and the next fpu is passed in to switch_fpu_finish().
It makes *sense* that there would be a place in the context switch code where 'current' is wonky, but I never realized this. This seems really fragile, but *also* trivially detectable.
Is the PKRU code really the only code to use 'current' in a buggy way like this?
Yes, because the remaining code in __switch_to() references the next task as next_p rather than current. Technically, it might be more correct to pass next_p to switch_fpu_finish(); what do you think? This may make sense since we're also passing the next fpu anyway.
Yeah, passing next_p instead of next_fpu makes a lot of sense to me.
When eagerly switching PKRU in switch_fpu_finish() it checks that current is not a kernel thread, as kernel threads will never use PKRU. It's possible that this_cpu_read_stable() on current_task (i.e. get_current()) is returning an old cached value. To resolve this, reference next_p directly rather than relying on current.
As written, it's possible when switching from a kernel thread to a userspace thread to observe a cached PF_KTHREAD flag and never restore the PKRU. As a result, this issue only occurs when switching from a kernel thread to a userspace thread; switching from a non-kernel thread works perfectly fine, because all that is considered in that situation are the flags from some other non-kernel task, and the next fpu is passed in to switch_fpu_finish().
This behavior only exists between 5.2 and 5.13 when it was fixed by a rewrite decoupling PKRU from xstate, in: commit 954436989cc5 ("x86/fpu: Remove PKRU handling from switch_fpu_finish()")
Unfortunately backporting the fix from 5.13 is probably not realistic as it's part of a 60+ patch series which rewrites most of the PKRU handling.
Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state")
Signed-off-by: Brian Geffon <bgeffon@google.com>
Signed-off-by: Willis Kung <williskung@google.com>
Tested-by: Willis Kung <williskung@google.com>
Cc: stable@vger.kernel.org # v5.4.x
Cc: stable@vger.kernel.org # v5.10.x
---
 arch/x86/include/asm/fpu/internal.h | 13 ++++++++-----
 arch/x86/kernel/process_32.c        |  6 ++----
 arch/x86/kernel/process_64.c        |  6 ++----
 3 files changed, 12 insertions(+), 13 deletions(-)
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 03b3de491b5e..5ed702e2c55f 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -560,9 +560,11 @@ static inline void __fpregs_load_activate(void)
  * The FPU context is only stored/restored for a user task and
  * PF_KTHREAD is used to distinguish between kernel and user threads.
  */
-static inline void switch_fpu_prepare(struct fpu *old_fpu, int cpu)
+static inline void switch_fpu_prepare(struct task_struct *prev, int cpu)
 {
-	if (static_cpu_has(X86_FEATURE_FPU) && !(current->flags & PF_KTHREAD)) {
+	struct fpu *old_fpu = &prev->thread.fpu;
+
+	if (static_cpu_has(X86_FEATURE_FPU) && !(prev->flags & PF_KTHREAD)) {
 		if (!copy_fpregs_to_fpstate(old_fpu))
 			old_fpu->last_cpu = -1;
 		else
@@ -581,10 +583,11 @@ static inline void switch_fpu_prepare(struct fpu *old_fpu, int cpu)
  * Load PKRU from the FPU context if available. Delay loading of the
  * complete FPU state until the return to userland.
  */
-static inline void switch_fpu_finish(struct fpu *new_fpu)
+static inline void switch_fpu_finish(struct task_struct *next)
 {
 	u32 pkru_val = init_pkru_value;
 	struct pkru_state *pk;
+	struct fpu *next_fpu = &next->thread.fpu;
 	if (!static_cpu_has(X86_FEATURE_FPU))
 		return;
@@ -598,7 +601,7 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
 	 * PKRU state is switched eagerly because it needs to be valid before we
 	 * return to userland e.g. for a copy_to_user() operation.
 	 */
-	if (!(current->flags & PF_KTHREAD)) {
+	if (!(next->flags & PF_KTHREAD)) {
 		/*
 		 * If the PKRU bit in xsave.header.xfeatures is not set,
 		 * then the PKRU component was in init state, which means
@@ -607,7 +610,7 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
 		 * in memory is not valid. This means pkru_val has to be
 		 * set to 0 and not to init_pkru_value.
 		 */
-		pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
+		pk = get_xsave_addr(&next_fpu->state.xsave, XFEATURE_PKRU);
 		pkru_val = pk ? pk->pkru : 0;
 	}
 	__write_pkru(pkru_val);
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index b8ceec4974fe..352f876950ab 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -229,14 +229,12 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 {
 	struct thread_struct *prev = &prev_p->thread,
 			     *next = &next_p->thread;
-	struct fpu *prev_fpu = &prev->fpu;
-	struct fpu *next_fpu = &next->fpu;
 	int cpu = smp_processor_id();
/* never put a printk in __switch_to... printk() calls wake_up*() indirectly */
 	if (!test_thread_flag(TIF_NEED_FPU_LOAD))
-		switch_fpu_prepare(prev_fpu, cpu);
+		switch_fpu_prepare(prev_p, cpu);
 	/*
 	 * Save away %gs. No need to save %fs, as it was saved on the
@@ -292,7 +290,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
this_cpu_write(current_task, next_p);
-	switch_fpu_finish(next_fpu);
+	switch_fpu_finish(next_p);
 	/* Load the Intel cache allocation PQR MSR. */
 	resctrl_sched_in();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index da3cc3a10d63..633788362906 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -505,15 +505,13 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 {
 	struct thread_struct *prev = &prev_p->thread;
 	struct thread_struct *next = &next_p->thread;
-	struct fpu *prev_fpu = &prev->fpu;
-	struct fpu *next_fpu = &next->fpu;
 	int cpu = smp_processor_id();
WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ENTRY) && this_cpu_read(irq_count) != -1);
 	if (!test_thread_flag(TIF_NEED_FPU_LOAD))
-		switch_fpu_prepare(prev_fpu, cpu);
+		switch_fpu_prepare(prev_p, cpu);
 	/* We must save %fs and %gs before load_TLS() because
 	 * %fs and %gs may be cleared by load_TLS().
@@ -565,7 +563,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	this_cpu_write(current_task, next_p);
 	this_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));
-	switch_fpu_finish(next_fpu);
+	switch_fpu_finish(next_p);
 	/* Reload sp0. */
 	update_task_stack(next_p);
On Tue, Feb 15, 2022 at 11:22:33AM -0800, Brian Geffon wrote:
When eagerly switching PKRU in switch_fpu_finish() it checks that current is not a kernel thread as kernel threads will never use PKRU. It's possible that this_cpu_read_stable() on current_task (ie. get_current()) is returning an old cached value. To resolve this reference next_p directly rather than relying on current.
As written it's possible when switching from a kernel thread to a userspace thread to observe a cached PF_KTHREAD flag and never restore the PKRU. And as a result this issue only occurs when switching from a kernel thread to a userspace thread, switching from a non kernel thread works perfectly fine because all that is considered in that situation are the flags from some other non kernel task and the next fpu is passed in to switch_fpu_finish().
This behavior only exists between 5.2 and 5.13 when it was fixed by a rewrite decoupling PKRU from xstate, in: commit 954436989cc5 ("x86/fpu: Remove PKRU handling from switch_fpu_finish()")
Unfortunately backporting the fix from 5.13 is probably not realistic as it's part of a 60+ patch series which rewrites most of the PKRU handling.
Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state") Signed-off-by: Brian Geffon bgeffon@google.com Signed-off-by: Willis Kung williskung@google.com Tested-by: Willis Kung williskung@google.com Cc: stable@vger.kernel.org # v5.4.x Cc: stable@vger.kernel.org # v5.10.x
arch/x86/include/asm/fpu/internal.h | 13 ++++++++----- arch/x86/kernel/process_32.c | 6 ++---- arch/x86/kernel/process_64.c | 6 ++---- 3 files changed, 12 insertions(+), 13 deletions(-)
So this is ONLY for 5.4.y and 5.10.y? I'm really really loath to take non-upstream changes as 95% of the time (really) it goes wrong.
How was this tested, and what do the maintainers of this subsystem think? And will you be around to fix the bugs in this when they are found?
And finally, what's wrong with 60+ patches to backport to fix a severe issue? What's preventing that from happening? Did you try it and see what exactly is involved?
thanks,
greg k-h
On Tue, Feb 15, 2022 at 2:45 PM Greg KH gregkh@linuxfoundation.org wrote:
On Tue, Feb 15, 2022 at 11:22:33AM -0800, Brian Geffon wrote:
When eagerly switching PKRU in switch_fpu_finish() it checks that current is not a kernel thread as kernel threads will never use PKRU. It's possible that this_cpu_read_stable() on current_task (ie. get_current()) is returning an old cached value. To resolve this reference next_p directly rather than relying on current.
As written it's possible when switching from a kernel thread to a userspace thread to observe a cached PF_KTHREAD flag and never restore the PKRU. And as a result this issue only occurs when switching from a kernel thread to a userspace thread, switching from a non kernel thread works perfectly fine because all that is considered in that situation are the flags from some other non kernel task and the next fpu is passed in to switch_fpu_finish().
This behavior only exists between 5.2 and 5.13 when it was fixed by a rewrite decoupling PKRU from xstate, in: commit 954436989cc5 ("x86/fpu: Remove PKRU handling from switch_fpu_finish()")
Unfortunately backporting the fix from 5.13 is probably not realistic as it's part of a 60+ patch series which rewrites most of the PKRU handling.
Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state") Signed-off-by: Brian Geffon bgeffon@google.com Signed-off-by: Willis Kung williskung@google.com Tested-by: Willis Kung williskung@google.com Cc: stable@vger.kernel.org # v5.4.x Cc: stable@vger.kernel.org # v5.10.x
arch/x86/include/asm/fpu/internal.h | 13 ++++++++----- arch/x86/kernel/process_32.c | 6 ++---- arch/x86/kernel/process_64.c | 6 ++---- 3 files changed, 12 insertions(+), 13 deletions(-)
So this is ONLY for 5.4.y and 5.10.y? I'm really really loath to take non-upstream changes as 95% of the time (really) it goes wrong.
That's correct, this bug was introduced in 5.2 and that code was completely refactored in 5.13 indirectly fixing it.
How was this tested, and what do the maintainers of this subsystem think? And will you be around to fix the bugs in this when they are found?
This has been trivial to reproduce. I've used a small repro which I've put here: https://gist.github.com/bgaff/9f8cbfc8dd22e60f9492e4f0aff8f04f , and I also was able to reproduce this using the protection_keys self tests on an 11th Gen Core i5-1135G7. I'm happy to commit to addressing any bugs that may appear. I'll see what the maintainers say, but there is also a smaller fix that just involves using this_cpu_read() in switch_fpu_finish() for this specific issue, although that approach isn't as clean.
And finally, what's wrong with 60+ patches to backport to fix a severe issue? What's preventing that from happening? Did you try it and see what exactly is involved?
It was quite a substantial rewrite of that code with fixes layered on since.
thanks,
greg k-h
On 2/15/22 13:32, Brian Geffon wrote:
How was this tested, and what do the maintainers of this subsystem think? And will you be around to fix the bugs in this when they are found?
This has been trivial to reproduce, I've used a small repro which I've put here: https://gist.github.com/bgaff/9f8cbfc8dd22e60f9492e4f0aff8f04f , I also was able to reproduce this using the protection_keys self tests on a 11th Gen Core i5-1135G7.
I've got an i7-1165G7, but I'm not seeing any failures on a 5.11 distro kernel.
How long does this take for you to reproduce?
On Tue, Feb 15, 2022 at 4:42 PM Dave Hansen dave.hansen@intel.com wrote:
On 2/15/22 13:32, Brian Geffon wrote:
How was this tested, and what do the maintainers of this subsystem think? And will you be around to fix the bugs in this when they are found?
This has been trivial to reproduce, I've used a small repro which I've put here: https://gist.github.com/bgaff/9f8cbfc8dd22e60f9492e4f0aff8f04f , I also was able to reproduce this using the protection_keys self tests on a 11th Gen Core i5-1135G7.
I've got an i7-1165G7, but I'm not seeing any failures on a 5.11 distro kernel.
How long does this take for you to reproduce?
It reproduces for me in just a few seconds. I'm not sure what could be different about my setup, I've tested on a vanilla 5.10 kernel and it reproduced the same way.
Brian
On Tue, Feb 15, 2022 at 4:42 PM Dave Hansen dave.hansen@intel.com wrote:
On 2/15/22 13:32, Brian Geffon wrote:
How was this tested, and what do the maintainers of this subsystem think? And will you be around to fix the bugs in this when they are found?
This has been trivial to reproduce, I've used a small repro which I've put here: https://gist.github.com/bgaff/9f8cbfc8dd22e60f9492e4f0aff8f04f , I also was able to reproduce this using the protection_keys self tests on a 11th Gen Core i5-1135G7.
I've got an i7-1165G7, but I'm not seeing any failures on a 5.11 distro kernel.
Hi Dave, I suspect the reason you're not seeing it is toolchain related, I'm building with clang 14.0.0 and it produces the sequence of instructions which use the cached value. Let me know if there is anything I can do to help you investigate this further.
Brian
On Tue, Feb 15, 2022 at 09:01:54PM -0500, Brian Geffon wrote:
On Tue, Feb 15, 2022 at 4:42 PM Dave Hansen dave.hansen@intel.com wrote:
On 2/15/22 13:32, Brian Geffon wrote:
How was this tested, and what do the maintainers of this subsystem think? And will you be around to fix the bugs in this when they are found?
This has been trivial to reproduce, I've used a small repro which I've put here: https://gist.github.com/bgaff/9f8cbfc8dd22e60f9492e4f0aff8f04f , I also was able to reproduce this using the protection_keys self tests on a 11th Gen Core i5-1135G7.
I've got an i7-1165G7, but I'm not seeing any failures on a 5.11 distro kernel.
Hi Dave, I suspect the reason you're not seeing it is toolchain related, I'm building with clang 14.0.0 and it produces the sequence of instructions which use the cached value. Let me know if there is anything I can do to help you investigate this further.
Do older versions of clang have this problem?
thanks,
greg k-h
On Tue, Feb 15, 2022 at 04:32:48PM -0500, Brian Geffon wrote:
On Tue, Feb 15, 2022 at 2:45 PM Greg KH gregkh@linuxfoundation.org wrote:
On Tue, Feb 15, 2022 at 11:22:33AM -0800, Brian Geffon wrote:
When eagerly switching PKRU in switch_fpu_finish() it checks that current is not a kernel thread as kernel threads will never use PKRU. It's possible that this_cpu_read_stable() on current_task (ie. get_current()) is returning an old cached value. To resolve this reference next_p directly rather than relying on current.
As written it's possible when switching from a kernel thread to a userspace thread to observe a cached PF_KTHREAD flag and never restore the PKRU. And as a result this issue only occurs when switching from a kernel thread to a userspace thread, switching from a non kernel thread works perfectly fine because all that is considered in that situation are the flags from some other non kernel task and the next fpu is passed in to switch_fpu_finish().
This behavior only exists between 5.2 and 5.13 when it was fixed by a rewrite decoupling PKRU from xstate, in: commit 954436989cc5 ("x86/fpu: Remove PKRU handling from switch_fpu_finish()")
Unfortunately backporting the fix from 5.13 is probably not realistic as it's part of a 60+ patch series which rewrites most of the PKRU handling.
Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state") Signed-off-by: Brian Geffon bgeffon@google.com Signed-off-by: Willis Kung williskung@google.com Tested-by: Willis Kung williskung@google.com Cc: stable@vger.kernel.org # v5.4.x Cc: stable@vger.kernel.org # v5.10.x
arch/x86/include/asm/fpu/internal.h | 13 ++++++++----- arch/x86/kernel/process_32.c | 6 ++---- arch/x86/kernel/process_64.c | 6 ++---- 3 files changed, 12 insertions(+), 13 deletions(-)
So this is ONLY for 5.4.y and 5.10.y? I'm really really loath to take non-upstream changes as 95% of the time (really) it goes wrong.
That's correct, this bug was introduced in 5.2 and that code was completely refactored in 5.13 indirectly fixing it.
What series of commits contain that work?
And again, why not just take them? What is wrong with that if this is such a big issue?
How was this tested, and what do the maintainers of this subsystem think? And will you be around to fix the bugs in this when they are found?
This has been trivial to reproduce, I've used a small repro which I've put here: https://gist.github.com/bgaff/9f8cbfc8dd22e60f9492e4f0aff8f04f , I also was able to reproduce this using the protection_keys self tests on a 11th Gen Core i5-1135G7. I'm happy to commit to addressing any bugs that may appear. I'll see what the maintainers say, but there is also a smaller fix that just involves using this_cpu_read() in switch_fpu_finish() for this specific issue, although that approach isn't as clean.
Can you add the test to the in-kernel tests so that we make sure it is fixed and never comes back?
thanks,
greg k-h
On Wed, Feb 16, 2022 at 5:05 AM Greg KH gregkh@linuxfoundation.org wrote:
On Tue, Feb 15, 2022 at 04:32:48PM -0500, Brian Geffon wrote:
On Tue, Feb 15, 2022 at 2:45 PM Greg KH gregkh@linuxfoundation.org wrote:
On Tue, Feb 15, 2022 at 11:22:33AM -0800, Brian Geffon wrote:
When eagerly switching PKRU in switch_fpu_finish() it checks that current is not a kernel thread as kernel threads will never use PKRU. It's possible that this_cpu_read_stable() on current_task (ie. get_current()) is returning an old cached value. To resolve this reference next_p directly rather than relying on current.
As written it's possible when switching from a kernel thread to a userspace thread to observe a cached PF_KTHREAD flag and never restore the PKRU. And as a result this issue only occurs when switching from a kernel thread to a userspace thread, switching from a non kernel thread works perfectly fine because all that is considered in that situation are the flags from some other non kernel task and the next fpu is passed in to switch_fpu_finish().
This behavior only exists between 5.2 and 5.13 when it was fixed by a rewrite decoupling PKRU from xstate, in: commit 954436989cc5 ("x86/fpu: Remove PKRU handling from switch_fpu_finish()")
Unfortunately backporting the fix from 5.13 is probably not realistic as it's part of a 60+ patch series which rewrites most of the PKRU handling.
Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state") Signed-off-by: Brian Geffon bgeffon@google.com Signed-off-by: Willis Kung williskung@google.com Tested-by: Willis Kung williskung@google.com Cc: stable@vger.kernel.org # v5.4.x Cc: stable@vger.kernel.org # v5.10.x
arch/x86/include/asm/fpu/internal.h | 13 ++++++++----- arch/x86/kernel/process_32.c | 6 ++---- arch/x86/kernel/process_64.c | 6 ++---- 3 files changed, 12 insertions(+), 13 deletions(-)
So this is ONLY for 5.4.y and 5.10.y? I'm really really loath to take non-upstream changes as 95% of the time (really) it goes wrong.
That's correct, this bug was introduced in 5.2 and that code was completely refactored in 5.13 indirectly fixing it.
Hi Greg,
What series of commits contain that work?
This is the series, https://lore.kernel.org/all/20210623120127.327154589@linutronix.de/, it does quite a bit of cleanup to correct the larger problem of having pkru merged into xstate.
And again, why not just take them? What is wrong with that if this is such a big issue?
Anything is possible I suppose but looking through the series it seems that it's not going to apply cleanly so we're going to end up with something that, if we made it work, would look very different and would touch a much larger cross section of code. If the concern here is risk of things going wrong, attempting to pull back or cherry-pick parts of this series seems riskier. However, if we don't attempt to pull back all those patches I will likely be proposing at least one more small patch for 5.4 and 5.10 to fix PKRU in these kernels because right now it's broken, particularly on AMD processors as Dave mentioned.
How was this tested, and what do the maintainers of this subsystem think? And will you be around to fix the bugs in this when they are found?
This has been trivial to reproduce, I've used a small repro which I've put here: https://gist.github.com/bgaff/9f8cbfc8dd22e60f9492e4f0aff8f04f , I also was able to reproduce this using the protection_keys self tests on a 11th Gen Core i5-1135G7. I'm happy to commit to addressing any bugs that may appear. I'll see what the maintainers say, but there is also a smaller fix that just involves using this_cpu_read() in switch_fpu_finish() for this specific issue, although that approach isn't as clean.
Can you add the test to the in-kernel tests so that we make sure it is fixed and never comes back?
I'm already able to reproduce it with the kernel selftests. To be honest, I'm not sure why this hasn't been reported yet; I could be doing something horribly wrong. But it seems the likely reason is that my compiler is doing what it's allowed to do, which is optimize the load of current_task. current -> get_current() -> this_cpu_read_stable(current_task) is allowed to read a cached value. Perhaps gcc is just not taking advantage of that optimization, I'm not sure. But writing to current_task and then immediately reading it back via this_cpu_read_stable() can be problematic for this reason, and the disassembled code shows that this is happening.
Brian
thanks,
greg k-h
On 2/16/22 02:05, Greg KH wrote:
How was this tested, and what do the maintainers of this subsystem think? And will you be around to fix the bugs in this when they are found?
This has been trivial to reproduce, I've used a small repro which I've put here: https://gist.github.com/bgaff/9f8cbfc8dd22e60f9492e4f0aff8f04f , I also was able to reproduce this using the protection_keys self tests on a 11th Gen Core i5-1135G7. I'm happy to commit to addressing any bugs that may appear. I'll see what the maintainers say, but there is also a smaller fix that just involves using this_cpu_read() in switch_fpu_finish() for this specific issue, although that approach isn't as clean.
Can you add the test to the in-kernel tests so that we make sure it is fixed and never comes back?
It would be great if Brian could confirm this. But, I'm 99% sure that this can be reproduced in the vm/protection_keys.c selftest, if you run it for long enough.
The symptom here is corruption of the PKRU register. I created *lots* of bugs like this during protection keys development so the selftest keeps a shadow copy of the register to specifically watch for corruption.
It's _plausible_ that no one ever ran the pkey selftests with a clang-compiled kernel for long enough to hit this issue.
On Wed, Feb 16, 2022 at 10:16 AM Dave Hansen dave.hansen@intel.com wrote:
On 2/16/22 02:05, Greg KH wrote:
How was this tested, and what do the maintainers of this subsystem think? And will you be around to fix the bugs in this when they are found?
This has been trivial to reproduce, I've used a small repro which I've put here: https://gist.github.com/bgaff/9f8cbfc8dd22e60f9492e4f0aff8f04f , I also was able to reproduce this using the protection_keys self tests on a 11th Gen Core i5-1135G7. I'm happy to commit to addressing any bugs that may appear. I'll see what the maintainers say, but there is also a smaller fix that just involves using this_cpu_read() in switch_fpu_finish() for this specific issue, although that approach isn't as clean.
Can you add the test to the in-kernel tests so that we make sure it is fixed and never comes back?
It would be great if Brian could confirm this. But, I'm 99% sure that this can be reproduced in the vm/protection_keys.c selftest, if you run it for long enough.
Hi Dave, Yes, this is reproduced by the protection keys self tests. If your kernel was compiled in a way which caches current_task when read via this_cpu_read_stable(), then when switching from a kernel thread to a user thread you will observe this behavior, so the only situation where it's timing related is waiting for that switch from a kernel to user thread. If your kernel is compiled in a way which does not cache the value of current_task, then you will never observe this behavior. My understanding is that it's perfectly valid for the compiler to produce that code.
The symptom here is corruption of the PKRU register. I created *lots* of bugs like this during protection keys development so the selftest keeps a shadow copy of the register to specifically watch for corruption.
It's _plausible_ that no one ever ran the pkey selftests with a clang-compiled kernel for long enough to hit this issue.
For ChromeOS we use clang and when I tested a vanilla 5.10 kernel _built with clang_ it also reproduced, so I suspect you're right that the self tests just haven't been run against clang built kernels all that often.
How would you and Greg KH like to proceed with this? I'm happy to help however I can.
Brian
On 2/17/22 05:31, Brian Geffon wrote:
How would you and Greg KH like to proceed with this? I'm happy to help however I can.
If I could wave a magic wand, I'd just apply the whole FPU rewrite to stable.
My second choice would be to stop managing PKRU with XSAVE. x86_pkru_load() uses WRPKRU instead of XSAVE and keeps the task's PKRU in task->pkru instead of the XSAVE buffer. Doing that will take some care, including pulling XFEATURE_PKRU out of the feature mask (RFBM) at XRSTOR. I _think_ that can be done in a manageable set of patches which will keep stable close to mainline. I recognize that more bugs might get introduced in the process which are unique to stable.
If you give that a shot and realize that it's not feasible to do a subset, then we can fall back to the minimal fix. I'm not asking for a multi-month engineering effort here. Maybe an hour or two to see if it's really as scary as it looks.
On Thu, Feb 17, 2022 at 11:44 AM Dave Hansen dave.hansen@intel.com wrote:
On 2/17/22 05:31, Brian Geffon wrote:
How would you and Greg KH like to proceed with this? I'm happy to help however I can.
If I could wave a magic wand, I'd just apply the whole FPU rewrite to stable.
My second choice would be to stop managing PKRU with XSAVE. x86_pkru_load() uses WRPKRU instead of XSAVE and keeps the task's PKRU in task->pkru instead of the XSAVE buffer. Doing that will take some care, including pulling XFEATURE_PKRU out of the feature mask (RFBM) at XRSTOR. I _think_ that can be done in a manageable set of patches which will keep stable close to mainline. I recognize that more bugs might get introduced in the process which are unique to stable.
Hi Dave, I did take some time to look through that series and pick out what I think is the minimum set that would pull out PKRU from xstate, that list is:
9782a712eb x86/fpu: Add PKRU storage outside of task XSAVE buffer
784a46618f x86/pkeys: Move read_pkru() and write_pkru()
ff7ebff47c59 x86/pkru: Provide pkru_write_default()
739e2eec0f x86/pkru: Provide pkru_get_init_value()
fa8c84b77a x86/cpu: Write the default PKRU value when enabling PKE
72a6c08c44 x86/pkru: Remove xstate fiddling from write_pkru()
2ebe81c6d8 x86/fpu: Dont restore PKRU in fpregs_restore_userspace()
71ef453355 x86/kvm: Avoid looking up PKRU in XSAVE buffer
954436989c x86/fpu: Remove PKRU handling from switch_fpu_finish()
e84ba47e31 x86/fpu: Hook up PKRU into ptrace()
30a304a138 x86/fpu: Mask PKRU from kernel XRSTOR[S] operations
0e8c54f6b2c x86/fpu: Don't store PKRU in xstate in fpu_reset_fpstate()
The majority of these don't apply cleanly to 5.4, and there are some other patches we'd have to pull back too that moved code and some of the those won't be needed for 5.10 though. TBH, I'm not sure it makes sense to try to do this just given the fact that most do not cleanly apply.
Brian
If you give that a shot and realize that it's not feasible to do a subset, then we can fall back to the minimal fix. I'm not asking for a multi-month engineering effort here. Maybe an hour or two to see if it's really as scary as it looks.
On 2/15/22 11:22, Brian Geffon wrote:
When eagerly switching PKRU in switch_fpu_finish() it checks that current is not a kernel thread as kernel threads will never use PKRU. It's possible that this_cpu_read_stable() on current_task (ie. get_current()) is returning an old cached value. To resolve this reference next_p directly rather than relying on current.
As written it's possible when switching from a kernel thread to a userspace thread to observe a cached PF_KTHREAD flag and never restore the PKRU. And as a result this issue only occurs when switching from a kernel thread to a userspace thread, switching from a non kernel thread works perfectly fine because all that is considered in that situation are the flags from some other non kernel task and the next fpu is passed in to switch_fpu_finish().
This behavior only exists between 5.2 and 5.13 when it was fixed by a rewrite decoupling PKRU from xstate, in: commit 954436989cc5 ("x86/fpu: Remove PKRU handling from switch_fpu_finish()")
Unfortunately backporting the fix from 5.13 is probably not realistic as it's part of a 60+ patch series which rewrites most of the PKRU handling.
Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state") Signed-off-by: Brian Geffon bgeffon@google.com Signed-off-by: Willis Kung williskung@google.com Tested-by: Willis Kung williskung@google.com Cc: stable@vger.kernel.org # v5.4.x Cc: stable@vger.kernel.org # v5.10.x
I don't like forking the stable code from mainline. But I also think that backporting the FPU reworking that changed the PKRU handling is likely to cause more bugs in stable than it fixes.
This fix is at least isolated to the protection keys code.
Acked-by: Dave Hansen dave.hansen@linux.intel.com
On Thu, Feb 24, 2022 at 07:16:17AM -0800, Dave Hansen wrote:
On 2/15/22 11:22, Brian Geffon wrote:
When eagerly switching PKRU in switch_fpu_finish() it checks that current is not a kernel thread as kernel threads will never use PKRU. It's possible that this_cpu_read_stable() on current_task (ie. get_current()) is returning an old cached value. To resolve this reference next_p directly rather than relying on current.
As written it's possible when switching from a kernel thread to a userspace thread to observe a cached PF_KTHREAD flag and never restore the PKRU. And as a result this issue only occurs when switching from a kernel thread to a userspace thread, switching from a non kernel thread works perfectly fine because all that is considered in that situation are the flags from some other non kernel task and the next fpu is passed in to switch_fpu_finish().
This behavior only exists between 5.2 and 5.13 when it was fixed by a rewrite decoupling PKRU from xstate, in: commit 954436989cc5 ("x86/fpu: Remove PKRU handling from switch_fpu_finish()")
Unfortunately backporting the fix from 5.13 is probably not realistic as it's part of a 60+ patch series which rewrites most of the PKRU handling.
Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state") Signed-off-by: Brian Geffon bgeffon@google.com Signed-off-by: Willis Kung williskung@google.com Tested-by: Willis Kung williskung@google.com Cc: stable@vger.kernel.org # v5.4.x Cc: stable@vger.kernel.org # v5.10.x
I don't like forking the stable code from mainline. But I also think that backporting the FPU reworking that changed the PKRU handling is likely to cause more bugs in stable than it fixes.
This fix is at least isolated to the protection keys code.
Acked-by: Dave Hansen dave.hansen@linux.intel.com
now queued up, thanks.
greg k-h
On Tue, Feb 15, 2022 at 9:13 AM Dave Hansen dave.hansen@intel.com wrote:
On 2/15/22 07:36, Brian Geffon wrote:
There are two issues with PKRU handling prior to 5.13.
Are you sure both of these issues were introduced by 0cecca9d03c? I'm surprised that the get_xsave_addr() issue is not older.
Should this be two patches?
The first is that when eagerly switching PKRU we check that current
Don't forget to write in imperative mood. No "we's", please.
https://www.kernel.org/doc/html/latest/process/maintainer-tip.html
This goes for changelogs and comments too.
is not a kernel thread as kernel threads will never use PKRU. It's possible that this_cpu_read_stable() on current_task (ie. get_current()) is returning an old cached value. By forcing the read with this_cpu_read() the correct task is used. Without this it's possible when switching from a kernel thread to a userspace thread that we'll still observe the PF_KTHREAD flag and never restore the PKRU. And as a result this issue only occurs when switching from a kernel thread to a userspace thread, switching from a non kernel thread works perfectly fine because all we consider in that situation is the flags from some other non kernel task and the next fpu is passed in to switch_fpu_finish().
It makes *sense* that there would be a place in the context switch code where 'current' is wonky, but I never realized this. This seems really fragile, but *also* trivially detectable.
Is the PKRU code really the only code to use 'current' in a buggy way like this?
The second issue is when using write_pkru() we only write to the xstate when the feature bit is set because get_xsave_addr() returns NULL when the feature bit is not set. This is problematic as the CPU is free to clear the feature bit when it observes the xstate in the init state, this behavior seems to be documented a few places throughout the kernel. If the bit was cleared then in write_pkru() we would happily write to PKRU without ever updating the xstate, and the FPU restore on return to userspace would load the old value agian.
^ again
It's probably worth noting that the AMD init tracker is a lot more aggressive than Intel's. On Intel, I think XRSTOR is the only way to get back to the init state. You're obviously hitting this on AMD.
Brian should correct me here, but I think we have seen this with one specific Intel CPU.
Brian, would it make sense to list the affected CPU model(s), or at least the ones where we have observed the problem?
Thanks, Guenter
It's also *very* unlikely that PKRU gets back to a value of 0. I think we added a selftest for this case in later kernels.
That helps explain why this bug hung around for so long.
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 03b3de491b5e..540bda5bdd28 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -598,7 +598,7 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
	 * PKRU state is switched eagerly because it needs to be valid before we
	 * return to userland e.g. for a copy_to_user() operation.
	 */
-	if (!(current->flags & PF_KTHREAD)) {
+	if (!(this_cpu_read(current_task)->flags & PF_KTHREAD)) {
This really deserves a specific comment.
	/*
	 * If the PKRU bit in xsave.header.xfeatures is not set,
	 * then the PKRU component was in init state, which means
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 9e71bf86d8d0..aa381b530de0 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -140,16 +140,22 @@ static inline void write_pkru(u32 pkru)
	if (!boot_cpu_has(X86_FEATURE_OSPKE))
		return;

-	pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
	/*
	 * The PKRU value in xstate needs to be in sync with the value that is
	 * written to the CPU. The FPU restore on return to userland would
	 * otherwise load the previous value again.
	 */
	fpregs_lock();
-	if (pk)
-		pk->pkru = pkru;
+	/*
+	 * The CPU is free to clear the feature bit when the xstate is in the
+	 * init state. For this reason, we need to make sure the feature bit is
+	 * reset when we're explicitly writing to pkru. If we did not then we
+	 * would write to pkru and it would not be saved on a context switch.
+	 */
+	current->thread.fpu.state.xsave.header.xfeatures |= XFEATURE_MASK_PKRU;
I don't think we need to describe how the init optimization works again. I'm also not sure it's worth mentioning context switches here. It's a wider problem than that. Maybe:
	/*
	 * All fpregs will be XRSTOR'd from this buffer before returning
	 * to userspace. Ensure that XRSTOR does not init PKRU and that
	 * get_xsave_addr() will work.
	 */
+	pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
+	BUG_ON(!pk);
A BUG_ON() a line before a NULL pointer dereference doesn't tend to do much good.
+	pk->pkru = pkru;
	__write_pkru(pkru);
	fpregs_unlock();
}
On Tue, Feb 15, 2022 at 4:14 PM Guenter Roeck groeck@google.com wrote:
On Tue, Feb 15, 2022 at 9:13 AM Dave Hansen dave.hansen@intel.com wrote:
On 2/15/22 07:36, Brian Geffon wrote:
There are two issues with PKRU handling prior to 5.13.
Are you sure both of these issues were introduced by 0cecca9d03c? I'm surprised that the get_xsave_addr() issue is not older.
Should this be two patches?
The first is that when eagerly switching PKRU we check that current
Don't forget to write in imperative mood. No "we's", please.
https://www.kernel.org/doc/html/latest/process/maintainer-tip.html
This goes for changelogs and comments too.
is not a kernel thread as kernel threads will never use PKRU. It's possible that this_cpu_read_stable() on current_task (ie. get_current()) is returning an old cached value. By forcing the read with this_cpu_read() the correct task is used. Without this it's possible when switching from a kernel thread to a userspace thread that we'll still observe the PF_KTHREAD flag and never restore the PKRU. And as a result this issue only occurs when switching from a kernel thread to a userspace thread, switching from a non kernel thread works perfectly fine because all we consider in that situation is the flags from some other non kernel task and the next fpu is passed in to switch_fpu_finish().
It makes *sense* that there would be a place in the context switch code where 'current' is wonky, but I never realized this. This seems really fragile, but *also* trivially detectable.
Is the PKRU code really the only code to use 'current' in a buggy way like this?
The second issue is when using write_pkru() we only write to the xstate when the feature bit is set because get_xsave_addr() returns NULL when the feature bit is not set. This is problematic as the CPU is free to clear the feature bit when it observes the xstate in the init state, this behavior seems to be documented a few places throughout the kernel. If the bit was cleared then in write_pkru() we would happily write to PKRU without ever updating the xstate, and the FPU restore on return to userspace would load the old value agian.
^ again
It's probably worth noting that the AMD init tracker is a lot more aggressive than Intel's. On Intel, I think XRSTOR is the only way to get back to the init state. You're obviously hitting this on AMD.
Brian should correct me here, but I think we have seen this with one specific Intel CPU.
Brian, would it make sense to list the affected CPU model(s), or at least the ones where we have observed the problem?
The only CPU I have access to at the moment with OSPKE is an 11th Gen Core i5-1135G7, so that's the only one I've observed it on. I can try to search around for other hardware.
Brian
Thanks, Guenter
It's also *very* unlikely that PKRU gets back to a value of 0. I think we added a selftest for this case in later kernels.
That helps explain why this bug hung around for so long.
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 03b3de491b5e..540bda5bdd28 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -598,7 +598,7 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
	 * PKRU state is switched eagerly because it needs to be valid before we
	 * return to userland e.g. for a copy_to_user() operation.
	 */
-	if (!(current->flags & PF_KTHREAD)) {
+	if (!(this_cpu_read(current_task)->flags & PF_KTHREAD)) {
This really deserves a specific comment.
	/*
	 * If the PKRU bit in xsave.header.xfeatures is not set,
	 * then the PKRU component was in init state, which means
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 9e71bf86d8d0..aa381b530de0 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -140,16 +140,22 @@ static inline void write_pkru(u32 pkru)
	if (!boot_cpu_has(X86_FEATURE_OSPKE))
		return;

-	pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
	/*
	 * The PKRU value in xstate needs to be in sync with the value that is
	 * written to the CPU. The FPU restore on return to userland would
	 * otherwise load the previous value again.
	 */
	fpregs_lock();
-	if (pk)
-		pk->pkru = pkru;
+	/*
+	 * The CPU is free to clear the feature bit when the xstate is in the
+	 * init state. For this reason, we need to make sure the feature bit is
+	 * reset when we're explicitly writing to pkru. If we did not then we
+	 * would write to pkru and it would not be saved on a context switch.
+	 */
+	current->thread.fpu.state.xsave.header.xfeatures |= XFEATURE_MASK_PKRU;
I don't think we need to describe how the init optimization works again. I'm also not sure it's worth mentioning context switches here. It's a wider problem than that. Maybe:
	/*
	 * All fpregs will be XRSTOR'd from this buffer before returning
	 * to userspace. Ensure that XRSTOR does not init PKRU and that
	 * get_xsave_addr() will work.
	 */
+	pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
+	BUG_ON(!pk);
A BUG_ON() a line before a NULL pointer dereference doesn't tend to do much good.
+	pk->pkru = pkru;
	__write_pkru(pkru);
	fpregs_unlock();
}