Today PAT can't be used without MTRR being available, unless MTRR is at least configured via CONFIG_MTRR and the system is running as a Xen PV guest. In that case PAT is automatically available via the hypervisor, but the PAT MSR can't be modified by the kernel and MTRR is disabled.
The same applies to a kernel built with no MTRR support: it won't allow using the PAT MSR, even though there is no technical reason for that, other than that setting up PAT the same way on all CPUs (a requirement of the processor's cache management) relies on some MTRR-specific code.
Fix all of that by:

- moving the function needed by PAT from MTRR-specific code one level up
- reworking the init sequences of MTRR and PAT to be more similar to each
  other without calling PAT from MTRR code
- removing the dependency of PAT on MTRR
While working on that I discovered two minor bugs in MTRR code, which are fixed, too.
Changes in V2:
- complete rework of the patches based on comments by Boris Petkov
- added several patches to the series
Juergen Gross (10):
  x86/mtrr: fix MTRR fixup on APs
  x86/mtrr: remove unused cyrix_set_all() function
  x86/mtrr: replace use_intel() with a local flag
  x86: move some code out of arch/x86/kernel/cpu/mtrr
  x86/mtrr: split generic_set_all()
  x86/mtrr: remove set_all callback from struct mtrr_ops
  x86/mtrr: simplify mtrr_bp_init()
  x86/mtrr: let cache_aps_delayed_init replace mtrr_aps_delayed_init
  x86/mtrr: add a stop_machine() handler calling only cache_cpu_init()
  x86: decouple pat and mtrr handling
 arch/x86/include/asm/cacheinfo.h   |  14 +++
 arch/x86/include/asm/memtype.h     |   5 +-
 arch/x86/include/asm/mtrr.h        |  12 +--
 arch/x86/kernel/cpu/cacheinfo.c    | 159 +++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/common.c       |   3 +-
 arch/x86/kernel/cpu/mtrr/cyrix.c   |  34 ------
 arch/x86/kernel/cpu/mtrr/generic.c | 120 ++++------------------
 arch/x86/kernel/cpu/mtrr/mtrr.c    | 158 +++++----------------------
 arch/x86/kernel/cpu/mtrr/mtrr.h    |   5 -
 arch/x86/kernel/setup.c            |  14 +--
 arch/x86/kernel/smpboot.c          |   9 +-
 arch/x86/mm/pat/memtype.c          | 127 +++++++---------------
 arch/x86/power/cpu.c               |   3 +-
 13 files changed, 274 insertions(+), 389 deletions(-)
When booting or resuming the system MTRR state is saved on the boot processor and then this state is loaded into MTRRs of all other cpus. During update of the MTRRs the MTRR mechanism needs to be disabled by writing the related MSR. The old contents of this MSR are saved in a set of static variables and later those static variables are used to restore the MSR.
In case the MSR contents need to be modified on a cpu due to the MSR not having been initialized properly by the BIOS, the related update function is modifying the static variables accordingly.
Unfortunately the MTRR state update is usually running on all cpus at the same time, so using just one set of static variables for all cpus is racy in case the MSR contents differ across cpus.
Fix that by using percpu variables for saving the MSR contents.
Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
---
I thought adding a "Fixes:" tag for the kernel's initial git commit would
maybe be entertaining, but without being really helpful. The percpu
variables were preferred over on-stack ones in order to avoid more code
churn in followup patches decoupling PAT from MTRR support.

V2:
- new patch
---
 arch/x86/kernel/cpu/mtrr/generic.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 558108296f3c..3d185fcf08ca 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -679,7 +679,8 @@ static bool set_mtrr_var_ranges(unsigned int index, struct mtrr_var_range *vr)
 	return changed;
 }
 
-static u32 deftype_lo, deftype_hi;
+static DEFINE_PER_CPU(u32, deftype_lo);
+static DEFINE_PER_CPU(u32, deftype_hi);
 
 /**
  * set_mtrr_state - Set the MTRR state for this CPU.
@@ -691,6 +692,7 @@ static unsigned long set_mtrr_state(void)
 {
 	unsigned long change_mask = 0;
 	unsigned int i;
+	u32 *lo = this_cpu_ptr(&deftype_lo);
 
 	for (i = 0; i < num_var_ranges; i++) {
 		if (set_mtrr_var_ranges(i, &mtrr_state.var_ranges[i]))
@@ -704,10 +706,10 @@ static unsigned long set_mtrr_state(void)
 	 * Set_mtrr_restore restores the old value of MTRRdefType,
 	 * so to set it we fiddle with the saved value:
 	 */
-	if ((deftype_lo & 0xff) != mtrr_state.def_type
-	    || ((deftype_lo & 0xc00) >> 10) != mtrr_state.enabled) {
+	if ((*lo & 0xff) != mtrr_state.def_type
+	    || ((*lo & 0xc00) >> 10) != mtrr_state.enabled) {
 
-		deftype_lo = (deftype_lo & ~0xcff) | mtrr_state.def_type |
+		*lo = (*lo & ~0xcff) | mtrr_state.def_type |
 			     (mtrr_state.enabled << 10);
 		change_mask |= MTRR_CHANGE_MASK_DEFTYPE;
 	}
@@ -729,6 +731,8 @@ static DEFINE_RAW_SPINLOCK(set_atomicity_lock);
 static void prepare_set(void) __acquires(set_atomicity_lock)
 {
 	unsigned long cr0;
+	u32 *lo = this_cpu_ptr(&deftype_lo);
+	u32 *hi = this_cpu_ptr(&deftype_hi);
 
 	/*
 	 * Note that this is not ideal
@@ -763,10 +767,10 @@ static void prepare_set(void) __acquires(set_atomicity_lock)
 	flush_tlb_local();
 
 	/* Save MTRR state */
-	rdmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
+	rdmsr(MSR_MTRRdefType, *lo, *hi);
 
 	/* Disable MTRRs, and set the default type to uncached */
-	mtrr_wrmsr(MSR_MTRRdefType, deftype_lo & ~0xcff, deftype_hi);
+	mtrr_wrmsr(MSR_MTRRdefType, *lo & ~0xcff, *hi);
 
 	/* Again, only flush caches if we have to. */
 	if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
@@ -775,12 +779,15 @@ static void prepare_set(void) __acquires(set_atomicity_lock)
 
 static void post_set(void) __releases(set_atomicity_lock)
 {
+	u32 *lo = this_cpu_ptr(&deftype_lo);
+	u32 *hi = this_cpu_ptr(&deftype_hi);
+
 	/* Flush TLBs (no need to flush caches - they are disabled) */
 	count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ALL);
 	flush_tlb_local();
 
 	/* Intel (P6) standard MTRRs */
-	mtrr_wrmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
+	mtrr_wrmsr(MSR_MTRRdefType, *lo, *hi);
 
 	/* Enable caches */
 	write_cr0(read_cr0() & ~X86_CR0_CD);
On Sat, Aug 20, 2022 at 11:25:24AM +0200, Juergen Gross wrote:
When booting or resuming the system MTRR state is saved on the boot processor and then this state is loaded into MTRRs of all other cpus. During update of the MTRRs the MTRR mechanism needs to be disabled by writing the related MSR. The old contents of this MSR are saved in a set of static variables and later those static variables are used to restore the MSR.
In case the MSR contents need to be modified on a cpu due to the MSR not having been initialized properly by the BIOS, the related update function is modifying the static variables accordingly.
Unfortunately the MTRR state update is usually running on all cpus at the same time, so using just one set of static variables for all cpus is racy in case the MSR contents differ across cpus.
Fix that by using percpu variables for saving the MSR contents.
Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
I thought adding a "Fixes:" tag for the kernel's initial git commit would maybe be entertaining, but without being really helpful.
So that means I will just do a "best guess" as to how far to backport things. Hopefully I guess well...
thanks,
greg k-h
On Sat, Aug 20, 2022 at 11:25:24AM +0200, Juergen Gross wrote:
When booting or resuming the system MTRR state is saved on the boot processor and then this state is loaded into MTRRs of all other cpus.
s/cpu/CPU/g
Pls check all commit messages.
During update of the MTRRs the MTRR mechanism needs to be disabled by writing the related MSR. The old contents of this MSR are saved in a set of static variables and later those static variables are used to restore the MSR.
In case the MSR contents need to be modified on a cpu due to the MSR not having been initialized properly by the BIOS, the related update function is modifying the static variables accordingly.
Unfortunately the MTRR state update is usually running on all cpus at the same time, so using just one set of static variables for all cpus is racy in case the MSR contents differ across cpus.
Fix that by using percpu variables for saving the MSR contents.
Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
I thought adding a "Fixes:" tag for the kernel's initial git commit would maybe be entertaining, but without being really helpful. The percpu variables were preferred over on-stack ones in order to avoid more code churn in followup patches decoupling PAT from MTRR support.
So if that thing has been broken for so long and no one noticed, we could just as well not backport to stable at all...
V2:
- new patch
 arch/x86/kernel/cpu/mtrr/generic.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 558108296f3c..3d185fcf08ca 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -679,7 +679,8 @@ static bool set_mtrr_var_ranges(unsigned int index, struct mtrr_var_range *vr)
 	return changed;
 }
 
-static u32 deftype_lo, deftype_hi;
+static DEFINE_PER_CPU(u32, deftype_lo);
+static DEFINE_PER_CPU(u32, deftype_hi);
My APM says that the high 32 bits of the MTRRdefType MSR are reserved and MBZ. So you can drop the _hi thing and use 0 everywhere. Or rather a dummy = 0 var because of that damn rdmsr() macro.
Or simply use a
u64 deftype;
use the rdmsrl/wrmsrl() variants and split it when passing to mtrr_wrmsr() because that thing wants 2 32s.
 /**
  * set_mtrr_state - Set the MTRR state for this CPU.
@@ -691,6 +692,7 @@ static unsigned long set_mtrr_state(void)
 {
 	unsigned long change_mask = 0;
 	unsigned int i;
+	u32 *lo = this_cpu_ptr(&deftype_lo);
The tip-tree preferred ordering of variable declarations at the beginning of a function is reverse fir tree order::
	struct long_struct_name *descriptive_name;
	unsigned long foo, bar;
	unsigned int tmp;
	int ret;
The above is faster to parse than the reverse ordering::
	int ret;
	unsigned int tmp;
	unsigned long foo, bar;
	struct long_struct_name *descriptive_name;
And even more so than random ordering::
	unsigned long foo, bar;
	int ret;
	struct long_struct_name *descriptive_name;
	unsigned int tmp;
Please check all your patches.
 	for (i = 0; i < num_var_ranges; i++) {
 		if (set_mtrr_var_ranges(i, &mtrr_state.var_ranges[i]))
@@ -704,10 +706,10 @@ static unsigned long set_mtrr_state(void)
 	 * Set_mtrr_restore restores the old value of MTRRdefType,
 	 * so to set it we fiddle with the saved value:
 	 */
-	if ((deftype_lo & 0xff) != mtrr_state.def_type
-	    || ((deftype_lo & 0xc00) >> 10) != mtrr_state.enabled) {
+	if ((*lo & 0xff) != mtrr_state.def_type
+	    || ((*lo & 0xc00) >> 10) != mtrr_state.enabled) {
 
-		deftype_lo = (deftype_lo & ~0xcff) | mtrr_state.def_type |
+		*lo = (*lo & ~0xcff) | mtrr_state.def_type |
 			     (mtrr_state.enabled << 10);
 		change_mask |= MTRR_CHANGE_MASK_DEFTYPE;
 	}
@@ -729,6 +731,8 @@ static DEFINE_RAW_SPINLOCK(set_atomicity_lock);
 static void prepare_set(void) __acquires(set_atomicity_lock)
 {
 	unsigned long cr0;
+	u32 *lo = this_cpu_ptr(&deftype_lo);
+	u32 *hi = this_cpu_ptr(&deftype_hi);
You don't need the pointers here - this_cpu_read() should be enough.
 	/*
 	 * Note that this is not ideal
@@ -763,10 +767,10 @@ static void prepare_set(void) __acquires(set_atomicity_lock)
 	flush_tlb_local();
 
 	/* Save MTRR state */
-	rdmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
+	rdmsr(MSR_MTRRdefType, *lo, *hi);
 
 	/* Disable MTRRs, and set the default type to uncached */
-	mtrr_wrmsr(MSR_MTRRdefType, deftype_lo & ~0xcff, deftype_hi);
+	mtrr_wrmsr(MSR_MTRRdefType, *lo & ~0xcff, *hi);
 	/* Again, only flush caches if we have to. */
 	if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
@@ -775,12 +779,15 @@ static void prepare_set(void) __acquires(set_atomicity_lock)
 
 static void post_set(void) __releases(set_atomicity_lock)
 {
+	u32 *lo = this_cpu_ptr(&deftype_lo);
+	u32 *hi = this_cpu_ptr(&deftype_hi);
Ditto.
Thx.
On Sun, Aug 21, 2022 at 02:25:59PM +0200, Borislav Petkov wrote:
Fix that by using percpu variables for saving the MSR contents.
Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
I thought adding a "Fixes:" tag for the kernel's initial git commit would maybe be entertaining, but without being really helpful. The percpu variables were preferred over on-stack ones in order to avoid more code churn in followup patches decoupling PAT from MTRR support.
So if that thing has been broken for so long and no one noticed, we could just as well not backport to stable at all...
Yeah, you can't do that.
The whole day today I kept thinking that something's wrong with this here. As in, why hasn't it been reported until now.
You say above:
"... for all cpus is racy in case the MSR contents differ across cpus."
But they don't differ:
"7.7.5 MTRRs in Multi-Processing Environments
In multi-processing environments, the MTRRs located in all processors must characterize memory in the same way. Generally, this means that identical values are written to the MTRRs used by the processors. This also means that values CR0.CD and the PAT must be consistent across processors. Failure to do so may result in coherency violations or loss of atomicity. Processor implementations do not check the MTRR settings in other processors to ensure consistency. It is the responsibility of system software to initialize and maintain MTRR consistency across all processors."
And you can't have different fixed MTRR type on each CPU - that would lead to all kinds of nasty bugs.
And here's from a big fat box:
$ rdmsr -a 0x2ff | uniq -c
    256 c00
All 256 CPUs have the def type set to the same thing.
Now, if all CPUs go write that same deftype_lo variable in the rendezvous handler, the only issue that could happen is if a read sees a partial write. BUT, AFAIK, x86 doesn't tear 32-bit writes so I *think* all CPUs see the same value being corrected by using mtrr_state previously saved on the BSP.
As I said, we should've seen this exploding left and right otherwise...
Thx.
On 21.08.22 23:41, Borislav Petkov wrote:
On Sun, Aug 21, 2022 at 02:25:59PM +0200, Borislav Petkov wrote:
Fix that by using percpu variables for saving the MSR contents.
Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
I thought adding a "Fixes:" tag for the kernel's initial git commit would maybe be entertaining, but without being really helpful. The percpu variables were preferred over on-stack ones in order to avoid more code churn in followup patches decoupling PAT from MTRR support.
So if that thing has been broken for so long and no one noticed, we could just as well not backport to stable at all...
Yeah, you can't do that.
The whole day today I kept thinking that something's wrong with this here. As in, why hasn't it been reported until now.
You say above:
"... for all cpus is racy in case the MSR contents differ across cpus."
But they don't differ:
"7.7.5 MTRRs in Multi-Processing Environments
In multi-processing environments, the MTRRs located in all processors must characterize memory in the same way. Generally, this means that identical values are written to the MTRRs used by the processors. This also means that values CR0.CD and the PAT must be consistent across processors. Failure to do so may result in coherency violations or loss of atomicity. Processor implementations do not check the MTRR settings in other processors to ensure consistency. It is the responsibility of system software to initialize and maintain MTRR consistency across all processors."
And you can't have different fixed MTRR type on each CPU - that would lead to all kinds of nasty bugs.
And here's from a big fat box:
$ rdmsr -a 0x2ff | uniq -c
    256 c00
All 256 CPUs have the def type set to the same thing.
Now, if all CPUs go write that same deftype_lo variable in the rendezvous handler, the only issue that could happen is if a read sees a partial write. BUT, AFAIK, x86 doesn't tear 32-bit writes so I *think* all CPUs see the same value being corrected by using mtrr_state previously saved on the BSP.
As I said, we should've seen this exploding left and right otherwise...
And then there is mtrr_state_warn() in arch/x86/kernel/cpu/mtrr/generic.c which has a comment saying:
/* Some BIOS's are messed up and don't set all MTRRs the same! */
Yes, the chances are slim to hit such a box, but your reasoning suggests I should remove the related code?
Juergen
On Mon, Aug 22, 2022 at 07:17:40AM +0200, Juergen Gross wrote:
And then there is mtrr_state_warn() in arch/x86/kernel/cpu/mtrr/generic.c which has a comment saying:
/* Some BIOS's are messed up and don't set all MTRRs the same! */
That thing also says:
pr_info("mtrr: probably your BIOS does not setup all CPUs.\n");
pr_info("mtrr: corrected configuration.\n");
because it'll go and force on all CPUs the MTRR state it read from the BSP in mtrr_bp_init->get_mtrr_state.
Yes, the chances are slim to hit such a box,
Well, my workstation says:
$ dmesg | grep -i mtrr
[    0.391514] mtrr: your CPUs had inconsistent variable MTRR settings
[    0.395199] mtrr: probably your BIOS does not setup all CPUs.
[    0.399199] mtrr: corrected configuration.
but that's the variable MTRRs.
but your reasoning suggests I should remove the related code?
My reasoning says you should not do anything at all here - works as advertized. :-)
On 22.08.22 10:28, Borislav Petkov wrote:
On Mon, Aug 22, 2022 at 07:17:40AM +0200, Juergen Gross wrote:
And then there is mtrr_state_warn() in arch/x86/kernel/cpu/mtrr/generic.c which has a comment saying:
/* Some BIOS's are messed up and don't set all MTRRs the same! */
That thing also says:
pr_info("mtrr: probably your BIOS does not setup all CPUs.\n");
pr_info("mtrr: corrected configuration.\n");
because it'll go and force on all CPUs the MTRR state it read from the BSP in mtrr_bp_init->get_mtrr_state.
Yes, the chances are slim to hit such a box,
Well, my workstation says:
$ dmesg | grep -i mtrr
[    0.391514] mtrr: your CPUs had inconsistent variable MTRR settings
[    0.395199] mtrr: probably your BIOS does not setup all CPUs.
[    0.399199] mtrr: corrected configuration.
but that's the variable MTRRs.
but your reasoning suggests I should remove the related code?
My reasoning says you should not do anything at all here - works as advertized. :-)
And what about the:
pr_warn("mtrr: your CPUs had inconsistent MTRRdefType settings\n");
This is the case the patch would fix.
Juergen