On Tue, Nov 9, 2021 at 1:58 PM Brian Geffon bgeffon@google.com wrote:
Hi Andy,
On Tue, Nov 9, 2021 at 9:58 AM Andy Lutomirski luto@kernel.org wrote:
Here's an excerpt from an old email that I, perhaps unwisely, sent to Dave but not to a public list:
static inline void write_pkru(u32 pkru) { struct pkru_state *pk;
if (!boot_cpu_has(X86_FEATURE_OSPKE)) return; pk = get_xsave_addr(¤t->thread.fpu.state.xsave,
XFEATURE_PKRU);
/* * The PKRU value in xstate needs to be in sync with the value
that is * written to the CPU. The FPU restore on return to userland would * otherwise load the previous value again. */ fpregs_lock(); if (pk) pk->pkru = pkru;
^^^ else we just write to the PKRU register but leave XINUSE[PKRU] clear on return to usermode? That seems... unwise.
__write_pkru(pkru); fpregs_unlock();
}
I bet you're hitting exactly this bug. The fix ended up being a whole series of patches, but the gist of it is that the write_pkru() slow path needs to set the xfeature bit in the xsave buffer and then do the write. It should be possible to make a little patch to do just this in a couple lines of code.
I think you've got the right idea, the following patch does seem to fix the problem on this CPU, this is based on 5.13. It seems the changes to asm/pgtable.h were not enough, I also had to modify fpu/internal.h to get it working properly.
Actually, it seems that only the changes to fpu/internal.h seem necessary. I guess the switch_fpu_finish explains how it's reverting to the initial value.
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h index 16bf4d4a8159..ed2ce7d1afeb 100644 --- a/arch/x86/include/asm/fpu/internal.h +++ b/arch/x86/include/asm/fpu/internal.h @@ -564,18 +564,16 @@ static inline void switch_fpu_finish(struct fpu *new_fpu) * PKRU state is switched eagerly because it needs to be valid before we * return to userland e.g. for a copy_to_user() operation. */ - if (!(current->flags & PF_KTHREAD)) { - /* - * If the PKRU bit in xsave.header.xfeatures is not set, - * then the PKRU component was in init state, which means - * XRSTOR will set PKRU to 0. If the bit is not set then - * get_xsave_addr() will return NULL because the PKRU value - * in memory is not valid. This means pkru_val has to be - * set to 0 and not to init_pkru_value. - */ - pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU); - pkru_val = pk ? pk->pkru : 0; - } + /* + * If the PKRU bit in xsave.header.xfeatures is not set, + * then the PKRU component was in init state, which means + * XRSTOR will set PKRU to 0. If the bit is not set then + * get_xsave_addr() will return NULL because the PKRU value + * in memory is not valid. This means pkru_val has to be + * set to 0 and not to init_pkru_value. + */ + pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU); + pkru_val = pk ? pk->pkru : 0; __write_pkru(pkru_val); }