Hi Thomas,
We (ChromeOS) have run into an issue which we believe is related to the following errata on 11th Gen Intel Core CPUs:
"TGL034 A SYSENTER FOLLOWING AN XSAVE OR A VZEROALL MAY LEAD TO UNEXPECTED SYSTEM BEHAVIOR" [1]
Essentially we notice that the value returned by a RDPKRU instruction will flip after some amount of time when running on kernels earlier than 5.14. I have a simple repro that can be used [2].
After a little digging it appears a lot of work was done to refactor that code and I bisected to the following commit which fixes the issue:
commit 954436989cc550dd91aab98363240c9c0a4b7e23 Author: Thomas Gleixner tglx@linutronix.de Date: Wed Jun 23 14:02:21 2021 +0200
x86/fpu: Remove PKRU handling from switch_fpu_finish()
I backported this patch to 5.4 and it does appear to fix the issue because it avoids XSAVE. However, I have no idea if it's actually fixing anything or if the behavior is working as intended. So we're curious, does it make sense to pull back that patch, would that patch be enough? Any guidance here would be appreciated because this does seem broken (because of how it was previously implemented) for those CPUs prior to 5.14, which is why I'm CCing stable@.
Thanks in advance, Brian
1. https://cdrdv2.intel.com/v1/dl/getContent/631123?explicitVersion=true 2. https://gist.github.com/bgaff/9f8cbfc8dd22e60f9492e4f0aff8f04f
... adding LKML and x86@
On 11/8/21 9:37 AM, Brian Geffon wrote:
We (ChromeOS) have run into an issue which we believe is related to the following errata on 11th Gen Intel Core CPUs:
"TGL034 A SYSENTER FOLLOWING AN XSAVE OR A VZEROALL MAY LEAD TO UNEXPECTED SYSTEM BEHAVIOR" [1]
I'm struggling to figure out what that has to do with PKRU, though. I don't think that erratum is related at all to the issue you're seeing.
Essentially we notice that the value returned by a RDPKRU instruction will flip after some amount of time when running on kernels earlier than 5.14. I have a simple repro that can be used [2].
What does it flip to, btw? Can you dump the whole register state?
After a little digging it appears a lot of work was done to refactor that code and I bisected to the following commit which fixes the issue:
commit 954436989cc550dd91aab98363240c9c0a4b7e23 Author: Thomas Gleixner tglx@linutronix.de Date: Wed Jun 23 14:02:21 2021 +0200
x86/fpu: Remove PKRU handling from switch_fpu_finish()
I backported this patch to 5.4 and it does appear to fix the issue because it avoids XSAVE. However, I have no idea if it's actually fixing anything or if the behavior is working as intended. So we're curious, does it make sense to pull back that patch, would that patch be enough? Any guidance here would be appreciated because this does seem broken (because of how it was previously implemented) for those CPUs prior to 5.14, which is why I'm CCing stable@.
I suspect what you're seeing is that the:
- __write_pkru(pkru_val);
in that commit was somehow writing a bad value which was read out of the XSAVE buffer. That commit stops reading PKRU out of the XSAVE buffer, which probably has bad state. Just backporting this patch won't do you any good. You'll need to also backport the stuff that stops using the XSAVE buffer for PKRU in the first place.
The code doesn't bite you until the task context switches. It probably has to switch to some pkey-using task and then back to your test app. I'd randomly guess that your test app is getting a "leaked" PKRU from another app. It's _probably_ not a stale PKRU value (like from reading a PKRU!=0 value from the XSAVE buffer when XSTATE_BV[PKRU]=0) because your test app should have PKRU=0 set at all times.
Is KVM active on your test system?
One more thing... Does the protection_keys kernel selftest hit any errors on this same setup? It does a lot of PKRU sanity checking and I'm a bit surprised it hasn't caught something yet.
One more thing... Does the protection_keys kernel selftest hit any errors on this same setup? It does a lot of PKRU sanity checking and I'm a bit surprised it hasn't caught something yet.
Hi Dave,
This issue does reproduce with the self tests too, my simple test program also fails consistently [1], all it does is spin executing RDPKRU waiting for a context switch to clobber the value.
$ ./test unexpected value on iteration 3772082 value:0x55555554 expected:0x55555550
========================== self tests:
$ ./protection_keys_64 has pkeys: 1 startup pkey_reg: 0000000055555550 WARNING: not run as root, can not do hugetlb test test 0 PASSED (iteration 1) test 1 PASSED (iteration 1) test 2 PASSED (iteration 1) test 3 PASSED (iteration 1) test 4 PASSED (iteration 1) test 5 PASSED (iteration 1) protection_keys_64: pkey-helpers.h:127: _read_pkey_reg: Assertion `pkey_reg == shadow_pkey_reg' failed. Aborted (core dumped)
$ uname -a Linux localhost 5.13.0-17189-g62fb9874f5da #12 SMP PREEMPT Tue Nov 9 01:29:44 UTC 2021 x86_64 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz GenuineIntel GNU/Linux
Let me know if I can provide anything else, I'm happy to help troubleshoot this.
Thanks, Brian
1. https://gist.github.com/bgaff/e4b5457ab1cf5126fea6327666c63441
On 11/8/21 5:47 PM, Brian Geffon wrote:
One more thing... Does the protection_keys kernel selftest hit any errors on this same setup? It does a lot of PKRU sanity checking and I'm a bit surprised it hasn't caught something yet.
Hi Dave,
This issue does reproduce with the self tests too, my simple test program also fails consistently [1], all it does is spin executing RDPKRU waiting for a context switch to clobber the value.
$ ./test unexpected value on iteration 3772082 value:0x55555554 expected:0x55555550
Well, gosh, it's making it back to the software init value. If you do:
echo 0x15555554 > /sys/kernel/debug/x86/init_pkru
do you end up with 0x15555554 as the value?
The other thing you can try is to turn on all the /sys/kernel/debug/tracing/events/x86_fpu tracepoints, pin your test program to one CPU, then dump the trace buffer out for that CPU. That probably won't tell us too much for PKRU since it's generally not ever in its init state. But, another test would be to use XRSTOR to *get* it into its init state then see how long it stays there.
Another thing you could do is figure out if pkeys _ever_ worked on that hardware. If so, a (long) bisect could figure out what broke it between its introduction and the 5.13 kernel that you've been testing. A random 5.11-based distro kernel that I have running on a Cascade Lake Xeon doesn't seem to have any issues.
Does your system have any more XSAVE support than mine?
kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers' kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers' kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers' kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask' kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256' kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256' kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys User registers' kernel: [ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256 kernel: [ 0.000000] x86/fpu: xstate_offset[5]: 832, xstate_sizes[5]: 64 kernel: [ 0.000000] x86/fpu: xstate_offset[6]: 896, xstate_sizes[6]: 512 kernel: [ 0.000000] x86/fpu: xstate_offset[7]: 1408, xstate_sizes[7]: 1024 kernel: [ 0.000000] x86/fpu: xstate_offset[9]: 2432, xstate_sizes[9]: 8 kernel: [ 0.000000] x86/fpu: Enabled xstate features 0x2e7, context size is 2440 bytes, using 'compacted' format.
Hi Dave,
On Tue, Nov 9, 2021 at 1:49 AM Dave Hansen dave.hansen@intel.com wrote:
Well, gosh, it's making it back to the software init value. If you do:
echo 0x15555554 > /sys/kernel/debug/x86/init_pkru
do you end up with 0x15555554 as the value?
What's interesting is that writing to init_pkru fails with -EINVAL for me, and I've traced it down to get_xsave_addr() returning NULL on the following check:
/* * This assumes the last 'xsave*' instruction to * have requested that 'xfeature_nr' be saved. * If it did not, we might be seeing and old value * of the field in the buffer. * * This can happen because the last 'xsave' did not * request that this feature be saved (unlikely) * or because the "init optimization" caused it * to not be saved. */ if (!(xsave->header.xfeatures & BIT_ULL(xfeature_nr))) return NULL;
And that's why I thought this was possibly related to that erratum that I shared before.
Does your system have any more XSAVE support than mine?
kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers' kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers' kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers' kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask' kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256' kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256' kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys User registers' kernel: [ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256 kernel: [ 0.000000] x86/fpu: xstate_offset[5]: 832, xstate_sizes[5]: 64 kernel: [ 0.000000] x86/fpu: xstate_offset[6]: 896, xstate_sizes[6]: 512 kernel: [ 0.000000] x86/fpu: xstate_offset[7]: 1408, xstate_sizes[7]: 1024 kernel: [ 0.000000] x86/fpu: xstate_offset[9]: 2432, xstate_sizes[9]: 8 kernel: [ 0.000000] x86/fpu: Enabled xstate features 0x2e7, context size is 2440 bytes, using 'compacted' format.
No, it's pretty much identical
INFO kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers' INFO kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers' INFO kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers' INFO kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask' INFO kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256' INFO kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256' INFO kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys User registers' INFO kernel: [ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256 INFO kernel: [ 0.000000] x86/fpu: xstate_offset[5]: 832, xstate_sizes[5]: 64 INFO kernel: [ 0.000000] x86/fpu: xstate_offset[6]: 896, xstate_sizes[6]: 512 INFO kernel: [ 0.000000] x86/fpu: xstate_offset[7]: 1408, xstate_sizes[7]: 1024 INFO kernel: [ 0.000000] x86/fpu: xstate_offset[9]: 2432, xstate_sizes[9]: 8 INFO kernel: [ 0.000000] x86/fpu: Enabled xstate features 0x2e7, context size is 2440 bytes, using 'compacted' format.
On Tue, Nov 9, 2021 at 8:43 AM Brian Geffon bgeffon@google.com wrote:
What's interesting is that writing to init_pkru fails with -EINVAL for me, and I've traced it down to get_xsave_addr() returning NULL on the following check:
/*
- This assumes the last 'xsave*' instruction to
- have requested that 'xfeature_nr' be saved.
- If it did not, we might be seeing and old value
- of the field in the buffer.
- This can happen because the last 'xsave' did not
- request that this feature be saved (unlikely)
- or because the "init optimization" caused it
- to not be saved.
*/ if (!(xsave->header.xfeatures & BIT_ULL(xfeature_nr))) return NULL;
Sorry, I should have probably also shared the value of xfeatures at this point is 0x3: which appears to be: (X86_FEATURE_FPU | X86_FEATURE_XMM)
Brian
On Tue, Nov 9, 2021, at 5:43 AM, Brian Geffon wrote:
Hi Dave,
On Tue, Nov 9, 2021 at 1:49 AM Dave Hansen dave.hansen@intel.com wrote:
Well, gosh, it's making it back to the software init value. If you do:
echo 0x15555554 > /sys/kernel/debug/x86/init_pkru
do you end up with 0x15555554 as the value?
What's interesting is that writing to init_pkru fails with -EINVAL for me, and I've traced it down to get_xsave_addr() returning NULL on the following check:
/*
- This assumes the last 'xsave*' instruction to
- have requested that 'xfeature_nr' be saved.
- If it did not, we might be seeing and old value
- of the field in the buffer.
- This can happen because the last 'xsave' did not
- request that this feature be saved (unlikely)
- or because the "init optimization" caused it
- to not be saved.
*/ if (!(xsave->header.xfeatures & BIT_ULL(xfeature_nr))) return NULL;
Here's an excerpt from an old email that I, perhaps unwisely, sent to Dave but not to a public list:
static inline void write_pkru(u32 pkru) { struct pkru_state *pk;
if (!boot_cpu_has(X86_FEATURE_OSPKE)) return;
pk = get_xsave_addr(¤t->thread.fpu.state.xsave, XFEATURE_PKRU);
/* * The PKRU value in xstate needs to be in sync with the value that is * written to the CPU. The FPU restore on return to userland would * otherwise load the previous value again. */ fpregs_lock(); if (pk) pk->pkru = pkru;
^^^ else we just write to the PKRU register but leave XINUSE[PKRU] clear on return to usermode? That seems... unwise.
__write_pkru(pkru); fpregs_unlock(); }
I bet you're hitting exactly this bug. The fix ended up being a whole series of patches, but the gist of it is that the write_pkru() slow path needs to set the xfeature bit in the xsave buffer and then do the write. It should be possible to make a little patch to do just this in a couple lines of code.
Hi Andy,
On Tue, Nov 9, 2021 at 9:58 AM Andy Lutomirski luto@kernel.org wrote:
Here's an excerpt from an old email that I, perhaps unwisely, sent to Dave but not to a public list:
static inline void write_pkru(u32 pkru) { struct pkru_state *pk;
if (!boot_cpu_has(X86_FEATURE_OSPKE)) return; pk = get_xsave_addr(¤t->thread.fpu.state.xsave,
XFEATURE_PKRU);
/* * The PKRU value in xstate needs to be in sync with the value
that is * written to the CPU. The FPU restore on return to userland would * otherwise load the previous value again. */ fpregs_lock(); if (pk) pk->pkru = pkru;
^^^ else we just write to the PKRU register but leave XINUSE[PKRU] clear on return to usermode? That seems... unwise.
__write_pkru(pkru); fpregs_unlock();
}
I bet you're hitting exactly this bug. The fix ended up being a whole series of patches, but the gist of it is that the write_pkru() slow path needs to set the xfeature bit in the xsave buffer and then do the write. It should be possible to make a little patch to do just this in a couple lines of code.
I think you've got the right idea, the following patch does seem to fix the problem on this CPU, this is based on 5.13. It seems the changes to asm/pgtable.h were not enough, I also had to modify fpu/internal.h to get it working properly.
From e5e184d68ac6ca93c3cd2cc88d61af3260d1c014 Mon Sep 17 00:00:00 2001
From: Brian Geffon bgeffon@google.com Date: Tue, 9 Nov 2021 17:08:25 +0000 Subject: [PATCH] x86/fpu: Set XFEATURE_PKRU when writing to pkru
On kernels prior to 5.14 the write_pkru path could end up writing to the pkru register without updating the corresponding state.
Signed-off-by: Brian Geffon bgeffon@google.com --- arch/x86/include/asm/fpu/internal.h | 22 ++++++++++------------ arch/x86/include/asm/pgtable.h | 5 +++-- 2 files changed, 13 insertions(+), 14 deletions(-)
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h index 16bf4d4a8159..ed2ce7d1afeb 100644 --- a/arch/x86/include/asm/fpu/internal.h +++ b/arch/x86/include/asm/fpu/internal.h @@ -564,18 +564,16 @@ static inline void switch_fpu_finish(struct fpu *new_fpu) * PKRU state is switched eagerly because it needs to be valid before we * return to userland e.g. for a copy_to_user() operation. */ - if (!(current->flags & PF_KTHREAD)) { - /* - * If the PKRU bit in xsave.header.xfeatures is not set, - * then the PKRU component was in init state, which means - * XRSTOR will set PKRU to 0. If the bit is not set then - * get_xsave_addr() will return NULL because the PKRU value - * in memory is not valid. This means pkru_val has to be - * set to 0 and not to init_pkru_value. - */ - pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU); - pkru_val = pk ? pk->pkru : 0; - } + /* + * If the PKRU bit in xsave.header.xfeatures is not set, + * then the PKRU component was in init state, which means + * XRSTOR will set PKRU to 0. If the bit is not set then + * get_xsave_addr() will return NULL because the PKRU value + * in memory is not valid. This means pkru_val has to be + * set to 0 and not to init_pkru_value. + */ + pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU); + pkru_val = pk ? pk->pkru : 0; __write_pkru(pkru_val); }
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index b1099f2d9800..d00fc2df4cfe 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -137,18 +137,19 @@ static inline u32 read_pkru(void) static inline void write_pkru(u32 pkru) { struct pkru_state *pk; + struct fpu *fpu = ¤t->thread.fpu;
if (!boot_cpu_has(X86_FEATURE_OSPKE)) return;
- pk = get_xsave_addr(¤t->thread.fpu.state.xsave, XFEATURE_PKRU); - /* * The PKRU value in xstate needs to be in sync with the value that is * written to the CPU. The FPU restore on return to userland would * otherwise load the previous value again. */ + fpu->state.xsave.header.xfeatures |= XFEATURE_MASK_PKRU; fpregs_lock(); + pk = get_xsave_addr(&fpu->state.xsave, XFEATURE_PKRU); if (pk) pk->pkru = pkru; __write_pkru(pkru);
On Tue, Nov 9, 2021 at 1:58 PM Brian Geffon bgeffon@google.com wrote:
Hi Andy,
On Tue, Nov 9, 2021 at 9:58 AM Andy Lutomirski luto@kernel.org wrote:
Here's an excerpt from an old email that I, perhaps unwisely, sent to Dave but not to a public list:
static inline void write_pkru(u32 pkru) { struct pkru_state *pk;
if (!boot_cpu_has(X86_FEATURE_OSPKE)) return; pk = get_xsave_addr(¤t->thread.fpu.state.xsave,
XFEATURE_PKRU);
/* * The PKRU value in xstate needs to be in sync with the value
that is * written to the CPU. The FPU restore on return to userland would * otherwise load the previous value again. */ fpregs_lock(); if (pk) pk->pkru = pkru;
^^^ else we just write to the PKRU register but leave XINUSE[PKRU] clear on return to usermode? That seems... unwise.
__write_pkru(pkru); fpregs_unlock();
}
I bet you're hitting exactly this bug. The fix ended up being a whole series of patches, but the gist of it is that the write_pkru() slow path needs to set the xfeature bit in the xsave buffer and then do the write. It should be possible to make a little patch to do just this in a couple lines of code.
I think you've got the right idea, the following patch does seem to fix the problem on this CPU, this is based on 5.13. It seems the changes to asm/pgtable.h were not enough, I also had to modify fpu/internal.h to get it working properly.
Actually, it seems that only the changes to fpu/internal.h seem necessary. I guess the switch_fpu_finish explains how it's reverting to the initial value.
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h index 16bf4d4a8159..ed2ce7d1afeb 100644 --- a/arch/x86/include/asm/fpu/internal.h +++ b/arch/x86/include/asm/fpu/internal.h @@ -564,18 +564,16 @@ static inline void switch_fpu_finish(struct fpu *new_fpu) * PKRU state is switched eagerly because it needs to be valid before we * return to userland e.g. for a copy_to_user() operation. */ - if (!(current->flags & PF_KTHREAD)) { - /* - * If the PKRU bit in xsave.header.xfeatures is not set, - * then the PKRU component was in init state, which means - * XRSTOR will set PKRU to 0. If the bit is not set then - * get_xsave_addr() will return NULL because the PKRU value - * in memory is not valid. This means pkru_val has to be - * set to 0 and not to init_pkru_value. - */ - pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU); - pkru_val = pk ? pk->pkru : 0; - } + /* + * If the PKRU bit in xsave.header.xfeatures is not set, + * then the PKRU component was in init state, which means + * XRSTOR will set PKRU to 0. If the bit is not set then + * get_xsave_addr() will return NULL because the PKRU value + * in memory is not valid. This means pkru_val has to be + * set to 0 and not to init_pkru_value. + */ + pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU); + pkru_val = pk ? pk->pkru : 0; __write_pkru(pkru_val); }
On 11/9/21 10:58 AM, Brian Geffon wrote:
Hi Andy,
On Tue, Nov 9, 2021 at 9:58 AM Andy Lutomirski luto@kernel.org wrote:
Here's an excerpt from an old email that I, perhaps unwisely, sent to Dave but not to a public list:
static inline void write_pkru(u32 pkru) { struct pkru_state *pk;
if (!boot_cpu_has(X86_FEATURE_OSPKE)) return; pk = get_xsave_addr(¤t->thread.fpu.state.xsave,
XFEATURE_PKRU);
/* * The PKRU value in xstate needs to be in sync with the value
that is * written to the CPU. The FPU restore on return to userland would * otherwise load the previous value again. */ fpregs_lock(); if (pk) pk->pkru = pkru;
^^^ else we just write to the PKRU register but leave XINUSE[PKRU] clear on return to usermode? That seems... unwise.
__write_pkru(pkru); fpregs_unlock();
}
I bet you're hitting exactly this bug. The fix ended up being a whole series of patches, but the gist of it is that the write_pkru() slow path needs to set the xfeature bit in the xsave buffer and then do the write. It should be possible to make a little patch to do just this in a couple lines of code.
I think you've got the right idea, the following patch does seem to fix the problem on this CPU, this is based on 5.13. It seems the changes to asm/pgtable.h were not enough, I also had to modify fpu/internal.h to get it working properly.
From e5e184d68ac6ca93c3cd2cc88d61af3260d1c014 Mon Sep 17 00:00:00 2001
From: Brian Geffon bgeffon@google.com Date: Tue, 9 Nov 2021 17:08:25 +0000 Subject: [PATCH] x86/fpu: Set XFEATURE_PKRU when writing to pkru
On kernels prior to 5.14 the write_pkru path could end up writing to the pkru register without updating the corresponding state.
Signed-off-by: Brian Geffon bgeffon@google.com
arch/x86/include/asm/fpu/internal.h | 22 ++++++++++------------ arch/x86/include/asm/pgtable.h | 5 +++-- 2 files changed, 13 insertions(+), 14 deletions(-)
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h index 16bf4d4a8159..ed2ce7d1afeb 100644 --- a/arch/x86/include/asm/fpu/internal.h +++ b/arch/x86/include/asm/fpu/internal.h @@ -564,18 +564,16 @@ static inline void switch_fpu_finish(struct fpu *new_fpu) * PKRU state is switched eagerly because it needs to be valid before we * return to userland e.g. for a copy_to_user() operation. */
if (!(current->flags & PF_KTHREAD)) {
/*
* If the PKRU bit in xsave.header.xfeatures is not set,
* then the PKRU component was in init state, which means
* XRSTOR will set PKRU to 0. If the bit is not set then
* get_xsave_addr() will return NULL because the PKRU value
* in memory is not valid. This means pkru_val has to be
* set to 0 and not to init_pkru_value.
*/
pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
pkru_val = pk ? pk->pkru : 0;
}
/*
* If the PKRU bit in xsave.header.xfeatures is not set,
* then the PKRU component was in init state, which means
* XRSTOR will set PKRU to 0. If the bit is not set then
* get_xsave_addr() will return NULL because the PKRU value
* in memory is not valid. This means pkru_val has to be
* set to 0 and not to init_pkru_value.
*/
pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
pkru_val = pk ? pk->pkru : 0; __write_pkru(pkru_val);
}
This hunk doesn't make any sense to me. new_fpu should be for 'current', and if 'current' is a PF_KTHREAD, it shouldn't be using PKRU.
Why does this hunk matter?
index b1099f2d9800..d00fc2df4cfe 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -137,18 +137,19 @@ static inline u32 read_pkru(void) static inline void write_pkru(u32 pkru) { struct pkru_state *pk;
struct fpu *fpu = ¤t->thread.fpu; if (!boot_cpu_has(X86_FEATURE_OSPKE)) return;
pk = get_xsave_addr(¤t->thread.fpu.state.xsave, XFEATURE_PKRU);
/* * The PKRU value in xstate needs to be in sync with the value that is * written to the CPU. The FPU restore on return to userland would * otherwise load the previous value again. */
fpu->state.xsave.header.xfeatures |= XFEATURE_MASK_PKRU;
This needs to be inside fpregs_lock(). This task can get preempted at any time and the xfeatures bit is not stable.
fpregs_lock();
pk = get_xsave_addr(&fpu->state.xsave, XFEATURE_PKRU); if (pk) pk->pkru = pkru; __write_pkru(pkru);
I still don't think this quite matches up with the symptoms that you are seeing. This fix would ensure that we don't forget to update the XSAVE buffer when it is in the init state. But, if we did that, we would see PKRU *going* to the init state: all 0's. What you were seeing instead was it going from 0x55555550 to 0x55555554, not 0x0.
I don't doubt that this makes the symptoms away, I just don't think this really explains what's going on thoroughly enough.
linux-stable-mirror@lists.linaro.org