Hi Andy,
On Tue, Nov 9, 2021 at 9:58 AM Andy Lutomirski luto@kernel.org wrote:
Here's an excerpt from an old email that I, perhaps unwisely, sent to Dave but not to a public list:
static inline void write_pkru(u32 pkru) { struct pkru_state *pk;
if (!boot_cpu_has(X86_FEATURE_OSPKE)) return; pk = get_xsave_addr(¤t->thread.fpu.state.xsave,
XFEATURE_PKRU);
/* * The PKRU value in xstate needs to be in sync with the value
that is * written to the CPU. The FPU restore on return to userland would * otherwise load the previous value again. */ fpregs_lock(); if (pk) pk->pkru = pkru;
^^^ else we just write to the PKRU register but leave XINUSE[PKRU] clear on return to usermode? That seems... unwise.
__write_pkru(pkru); fpregs_unlock();
}
I bet you're hitting exactly this bug. The fix ended up being a whole series of patches, but the gist of it is that the write_pkru() slow path needs to set the xfeature bit in the xsave buffer and then do the write. It should be possible to make a little patch to do just this in a couple lines of code.
I think you've got the right idea, the following patch does seem to fix the problem on this CPU, this is based on 5.13. It seems the changes to asm/pgtable.h were not enough, I also had to modify fpu/internal.h to get it working properly.
From e5e184d68ac6ca93c3cd2cc88d61af3260d1c014 Mon Sep 17 00:00:00 2001
From: Brian Geffon bgeffon@google.com Date: Tue, 9 Nov 2021 17:08:25 +0000 Subject: [PATCH] x86/fpu: Set XFEATURE_PKRU when writing to pkru
On kernels prior to 5.14 the write_pkru path could end up writing to the pkru register without updating the corresponding state.
Signed-off-by: Brian Geffon bgeffon@google.com --- arch/x86/include/asm/fpu/internal.h | 22 ++++++++++------------ arch/x86/include/asm/pgtable.h | 5 +++-- 2 files changed, 13 insertions(+), 14 deletions(-)
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h index 16bf4d4a8159..ed2ce7d1afeb 100644 --- a/arch/x86/include/asm/fpu/internal.h +++ b/arch/x86/include/asm/fpu/internal.h @@ -564,18 +564,16 @@ static inline void switch_fpu_finish(struct fpu *new_fpu) * PKRU state is switched eagerly because it needs to be valid before we * return to userland e.g. for a copy_to_user() operation. */ - if (!(current->flags & PF_KTHREAD)) { - /* - * If the PKRU bit in xsave.header.xfeatures is not set, - * then the PKRU component was in init state, which means - * XRSTOR will set PKRU to 0. If the bit is not set then - * get_xsave_addr() will return NULL because the PKRU value - * in memory is not valid. This means pkru_val has to be - * set to 0 and not to init_pkru_value. - */ - pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU); - pkru_val = pk ? pk->pkru : 0; - } + /* + * If the PKRU bit in xsave.header.xfeatures is not set, + * then the PKRU component was in init state, which means + * XRSTOR will set PKRU to 0. If the bit is not set then + * get_xsave_addr() will return NULL because the PKRU value + * in memory is not valid. This means pkru_val has to be + * set to 0 and not to init_pkru_value. + */ + pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU); + pkru_val = pk ? pk->pkru : 0; __write_pkru(pkru_val); }
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index b1099f2d9800..d00fc2df4cfe 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -137,18 +137,19 @@ static inline u32 read_pkru(void) static inline void write_pkru(u32 pkru) { struct pkru_state *pk; + struct fpu *fpu = ¤t->thread.fpu;
if (!boot_cpu_has(X86_FEATURE_OSPKE)) return;
- pk = get_xsave_addr(¤t->thread.fpu.state.xsave, XFEATURE_PKRU); - /* * The PKRU value in xstate needs to be in sync with the value that is * written to the CPU. The FPU restore on return to userland would * otherwise load the previous value again. */ + fpu->state.xsave.header.xfeatures |= XFEATURE_MASK_PKRU; fpregs_lock(); + pk = get_xsave_addr(&fpu->state.xsave, XFEATURE_PKRU); if (pk) pk->pkru = pkru; __write_pkru(pkru);