Hi Thomas,
We (ChromeOS) have run into an issue that we believe is related to the following erratum on 11th Gen Intel Core CPUs:
"TGL034 A SYSENTER FOLLOWING AN XSAVE OR A VZEROALL MAY LEAD TO UNEXPECTED SYSTEM BEHAVIOR" [1]
Essentially, we see the value returned by the RDPKRU instruction flip after some amount of time when running on kernels earlier than 5.14. I have a simple repro [2].
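For anyone who wants to see the shape of the check without opening [2], here is a minimal sketch of reading PKRU in a loop and counting how often the value differs from the first read. This is not the repro in [2]: the helper names (pkru_usable, read_pkru, count_pkru_changes) are made up for illustration, and it assumes x86-64 with GCC or Clang. It does not try to trigger the erratum; it only shows how a changed value would be noticed.

```c
#include <cpuid.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if the OS has enabled protection keys,
 * i.e. CPUID.(EAX=7,ECX=0):ECX.OSPKE (bit 4) is set. RDPKRU raises #UD
 * otherwise, so this must be checked first.
 */
static int pkru_usable(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx >> 4) & 1;
}

/*
 * Hypothetical helper: read PKRU. RDPKRU (opcode 0F 01 EE) requires
 * ECX = 0 and returns the register in EAX, clearing EDX.
 */
static uint32_t read_pkru(void)
{
    uint32_t eax, edx;

    __asm__ volatile(".byte 0x0f, 0x01, 0xee"
                     : "=a"(eax), "=d"(edx)
                     : "c"(0));
    (void)edx;
    return eax;
}

/*
 * Hypothetical helper: poll PKRU 'iterations' times and count how many
 * reads differ from the initial value. Returns -1 if PKRU is unusable.
 */
static long count_pkru_changes(unsigned long iterations)
{
    uint32_t expected;
    long changes = 0;
    unsigned long i;

    if (!pkru_usable())
        return -1;

    expected = read_pkru();
    for (i = 0; i < iterations; i++) {
        if (read_pkru() != expected)
            changes++;
    }
    return changes;
}
```

Since nothing in this sketch writes PKRU, any nonzero count from count_pkru_changes() would mean the value changed underneath the process.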
After a little digging, it appears that a lot of work was done to refactor that code, and I bisected to the following commit, which fixes the issue:

commit 954436989cc550dd91aab98363240c9c0a4b7e23
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Jun 23 14:02:21 2021 +0200

    x86/fpu: Remove PKRU handling from switch_fpu_finish()

I backported this patch to 5.4 and it does appear to fix the issue because it avoids XSAVE. However, I have no idea whether it is actually fixing anything or whether the previous behavior was working as intended. So we're curious: does it make sense to backport that patch, and would that patch alone be enough? Any guidance here would be appreciated, because this does seem broken (because of how it was previously implemented) on those CPUs with kernels prior to 5.14, which is why I'm CCing stable@.

Thanks in advance,
Brian

[1] https://cdrdv2.intel.com/v1/dl/getContent/631123?explicitVersion=true
[2] https://gist.github.com/bgaff/9f8cbfc8dd22e60f9492e4f0aff8f04f