Today PAT can't be used without MTRR being available, unless MTRR is at
least configured via CONFIG_MTRR and the system is running as a Xen PV
guest. In that case PAT is automatically available via the hypervisor,
but the PAT MSR can't be modified by the kernel and MTRR is disabled.
As an additional complication, the availability of PAT can't be queried
via pat_enabled() in the Xen PV case, because the lack of MTRR support
causes PAT to be reported as disabled. This leads some drivers to believe
that not all cache modes are available, resulting in failures or degraded
functionality.
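As an illustration of the driver-side symptom (a minimal sketch, not
code from this series; map_framebuffer() and its fallback policy are
made up for the example):

#include <linux/io.h>
#include <asm/memtype.h>	/* pat_enabled() */

/*
 * Illustration only: a driver wanting a write-combining mapping falls
 * back to an uncached one when PAT is reported as disabled.  On a Xen
 * PV guest without MTRR support this check fires even though the
 * hypervisor has already set up PAT for the guest.
 */
static void __iomem *map_framebuffer(phys_addr_t phys, size_t size)
{
	if (!pat_enabled())
		return ioremap(phys, size);	/* degraded: UC instead of WC */

	return ioremap_wc(phys, size);
}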
The same applies to a kernel built without MTRR support: it won't allow
the PAT MSR to be used, even though there is no technical reason for
that other than the fact that setting up PAT the same way on all CPUs
(a requirement of the processor's cache management) currently relies on
MTRR-specific code.
Fix all of that by:
- moving the function needed by PAT out of MTRR-specific code one level
  up
- adding a PAT indirection layer supporting the 3 cases "no or disabled
  PAT", "PAT under kernel control", and "PAT under Xen control" (a rough
  sketch of the idea follows after this list)
- removing the dependency of PAT on MTRR
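To make the intent of the indirection layer concrete, here is a rough
sketch of the idea only (the identifiers below are invented for
illustration and are not the ones introduced by this series):

#include <linux/types.h>
#include <asm/msr.h>

/* Sketch only, not the actual implementation. */
enum pat_ctrl {
	PAT_CTRL_DISABLED,	/* no PAT, or disabled on the command line */
	PAT_CTRL_KERNEL,	/* kernel owns and programs the PAT MSR */
	PAT_CTRL_XEN,		/* Xen PV: hypervisor set up PAT, MSR is read-only */
};

static enum pat_ctrl pat_ctrl_state;
static u64 pat_msr_val;		/* value used in the kernel-controlled case */

static void pat_cpu_init(void)
{
	switch (pat_ctrl_state) {
	case PAT_CTRL_KERNEL:
		wrmsrl(MSR_IA32_CR_PAT, pat_msr_val);	/* same value on every CPU */
		break;
	case PAT_CTRL_XEN:
	case PAT_CTRL_DISABLED:
		/* nothing to program: either the hypervisor did it, or PAT is off */
		break;
	}
}

With a split like this, pat_enabled() can report PAT as usable in both
the kernel-controlled and the Xen-controlled case without relying on
any MTRR code.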
Juergen Gross (3):
x86: move some code out of arch/x86/kernel/cpu/mtrr
x86: add wrapper functions for mtrr functions handling also pat
x86: decouple pat and mtrr handling
arch/x86/include/asm/memtype.h | 13 ++-
arch/x86/include/asm/mtrr.h | 27 ++++--
arch/x86/include/asm/processor.h | 10 +++
arch/x86/kernel/cpu/common.c | 123 +++++++++++++++++++++++++++-
arch/x86/kernel/cpu/mtrr/generic.c | 90 ++------------------
arch/x86/kernel/cpu/mtrr/mtrr.c | 58 ++++---------
arch/x86/kernel/cpu/mtrr/mtrr.h | 1 -
arch/x86/kernel/setup.c | 12 +--
arch/x86/kernel/smpboot.c | 8 +-
arch/x86/mm/pat/memtype.c | 127 +++++++++++++++++++++--------
arch/x86/power/cpu.c | 2 +-
arch/x86/xen/enlighten_pv.c | 4 +
12 files changed, 289 insertions(+), 186 deletions(-)
--
2.35.3
These operations are documented as always ordered in
include/asm-generic/bitops/instrumented-atomic.h. Producer-consumer use
cases, where one side needs to ensure that a flag is left pending after
some shared data was updated, rely on this ordering even in the failure
case.
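A minimal sketch of such a producer-consumer pattern (pc_state,
kick_consumer() and handle() are made-up names; this is not the
workqueue code itself):

#include <linux/bitops.h>
#include <linux/compiler.h>

#define PENDING_BIT	0

struct pc_state {
	unsigned long flags;
	int data;
};

static void kick_consumer(struct pc_state *pc);	/* hypothetical wakeup */
static void handle(int val);				/* hypothetical handler */

/* Producer: publish the data first, then try to mark it pending. */
static void produce(struct pc_state *pc, int val)
{
	WRITE_ONCE(pc->data, val);
	/*
	 * Even when the bit is already set (the "failure" case), this RMW
	 * must be fully ordered: whoever later clears PENDING_BIT has to
	 * be guaranteed to also see the data store above.
	 */
	if (!test_and_set_bit(PENDING_BIT, &pc->flags))
		kick_consumer(pc);
}

/* Consumer: clear the flag, then pick up whatever was published. */
static void consume(struct pc_state *pc)
{
	if (test_and_clear_bit(PENDING_BIT, &pc->flags))
		handle(READ_ONCE(pc->data));
}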
This is the case with the workqueue code, which currently suffers from a
reproducible ordering violation on Apple M1 platforms (which are
notoriously out-of-order) that ends up causing the TTY layer to fail to
deliver data to userspace properly under the right conditions. This
change fixes that bug.
Change the documentation to restrict the "no order on failure" story to
the _lock() variant (for which it makes sense), and remove the
early exit from the generic implementation, which is what causes the
missing barrier semantics in that case. With the early exit gone, the
remaining atomic op is fully ordered (including on ARM64 LSE, as of
recent versions of the architecture spec).
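For contrast, a hypothetical trylock-style user of
test_and_set_bit_lock() (LOCK_BIT and do_protected_work() are made up)
shows why unordered failure is acceptable for the _lock() variant: a
failed attempt never enters the critical section, so there is nothing
for the failed RMW to order against:

#include <linux/bitops.h>
#include <linux/errno.h>

#define LOCK_BIT	0

static void do_protected_work(void);	/* hypothetical critical section */

static int try_do_work(unsigned long *word)
{
	if (test_and_set_bit_lock(LOCK_BIT, word))
		return -EBUSY;		/* lost the race: no protected access follows */

	do_protected_work();		/* ordered after the successful acquire */
	clear_bit_unlock(LOCK_BIT, word);
	return 0;
}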
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Fixes: e986a0d6cb36 ("locking/atomics, asm-generic/bitops/atomic.h: Rewrite using atomic_*() APIs")
Fixes: 61e02392d3c7 ("locking/atomic/bitops: Document and clarify ordering semantics for failed test_and_{}_bit()")
Signed-off-by: Hector Martin <marcan@marcan.st>
---
Documentation/atomic_bitops.txt | 2 +-
include/asm-generic/bitops/atomic.h | 6 ------
2 files changed, 1 insertion(+), 7 deletions(-)
diff --git a/Documentation/atomic_bitops.txt b/Documentation/atomic_bitops.txt
index 093cdaefdb37..d8b101c97031 100644
--- a/Documentation/atomic_bitops.txt
+++ b/Documentation/atomic_bitops.txt
@@ -59,7 +59,7 @@ Like with atomic_t, the rule of thumb is:
  - RMW operations that have a return value are fully ordered.
 
  - RMW operations that are conditional are unordered on FAILURE,
-   otherwise the above rules apply. In the case of test_and_{}_bit() operations,
+   otherwise the above rules apply. In the case of test_and_set_bit_lock(),
    if the bit in memory is unchanged by the operation then it is deemed to have
    failed.
 
diff --git a/include/asm-generic/bitops/atomic.h b/include/asm-generic/bitops/atomic.h
index 3096f086b5a3..71ab4ba9c25d 100644
--- a/include/asm-generic/bitops/atomic.h
+++ b/include/asm-generic/bitops/atomic.h
@@ -39,9 +39,6 @@ arch_test_and_set_bit(unsigned int nr, volatile unsigned long *p)
 	unsigned long mask = BIT_MASK(nr);
 
 	p += BIT_WORD(nr);
-	if (READ_ONCE(*p) & mask)
-		return 1;
-
 	old = arch_atomic_long_fetch_or(mask, (atomic_long_t *)p);
 	return !!(old & mask);
 }
@@ -53,9 +50,6 @@ arch_test_and_clear_bit(unsigned int nr, volatile unsigned long *p)
 	unsigned long mask = BIT_MASK(nr);
 
 	p += BIT_WORD(nr);
-	if (!(READ_ONCE(*p) & mask))
-		return 0;
-
 	old = arch_atomic_long_fetch_andnot(mask, (atomic_long_t *)p);
 	return !!(old & mask);
 }
--
2.35.1