From: Guo Ren <guoren(a)linux.alibaba.com>
After use_asid_allocator is enabled, the userspace application will
crash by stale TLB entries. Because only using cpumask_clear_cpu without
local_flush_tlb_all couldn't guarantee CPU's TLB entries were fresh.
Then set_mm_asid would cause the user space application to get a stale
value by stale TLB entry, but set_mm_noasid is okay.
Here is the symptom of the bug:
unhandled signal 11 code 0x1 (coredump)
0x0000003fd6d22524 <+4>: auipc s0,0x70
0x0000003fd6d22528 <+8>: ld s0,-148(s0) # 0x3fd6d92490
=> 0x0000003fd6d2252c <+12>: ld a5,0(s0)
(gdb) i r s0
s0 0x8082ed1cc3198b21 0x8082ed1cc3198b21
(gdb) x /2x 0x3fd6d92490
0x3fd6d92490: 0xd80ac8a8 0x0000003f
The core dump file shows that register s0 is wrong, but the value in
memory is correct. Because 'ld s0, -148(s0)' used a stale mapping entry
in TLB and got a wrong result from an incorrect physical address.
When the task ran on CPU0, which loaded/speculative-loaded the value of
address(0x3fd6d92490), then the first version of the mapping entry was
PTWed into CPU0's TLB.
When the task switched from CPU0 to CPU1 (No local_tlb_flush_all here by
asid), it happened to write a value on the address (0x3fd6d92490). It
caused do_page_fault -> wp_page_copy -> ptep_clear_flush ->
ptep_get_and_clear & flush_tlb_page.
The flush_tlb_page used mm_cpumask(mm) to determine which CPUs need TLB
flush, but CPU0 had cleared the CPU0's mm_cpumask in the previous
switch_mm. So we only flushed the CPU1 TLB and set the second version
mapping of the PTE. When the task switched from CPU1 to CPU0 again, CPU0
still used a stale TLB mapping entry which contained a wrong target
physical address. It raised a bug when the task happened to read that
value.
CPU0 CPU1
- switch 'task' in
- read addr (Fill stale mapping
entry into TLB)
- switch 'task' out (no tlb_flush)
- switch 'task' in (no tlb_flush)
- write addr cause pagefault
do_page_fault() (change to
new addr mapping)
wp_page_copy()
ptep_clear_flush()
ptep_get_and_clear()
& flush_tlb_page()
write new value into addr
- switch 'task' out (no tlb_flush)
- switch 'task' in (no tlb_flush)
- read addr again (Use stale
mapping entry in TLB)
get wrong value from old phyical
addr, BUG!
The solution is to keep all CPUs' footmarks of cpumask(mm) in switch_mm,
which could guarantee to invalidate all stale TLB entries during TLB
flush.
Fixes: 65d4b9c53017 ("RISC-V: Implement ASID allocator")
Signed-off-by: Guo Ren <guoren(a)linux.alibaba.com>
Signed-off-by: Guo Ren <guoren(a)kernel.org>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj(a)bp.renesas.com>
Tested-by: Zong Li <zong.li(a)sifive.com>
Tested-by: Sergey Matyukevich <sergey.matyukevich(a)syntacore.com>
Cc: Anup Patel <apatel(a)ventanamicro.com>
Cc: Palmer Dabbelt <palmer(a)rivosinc.com>
Cc: stable(a)vger.kernel.org
---
arch/riscv/mm/context.c | 30 ++++++++++++++++++++----------
1 file changed, 20 insertions(+), 10 deletions(-)
diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
index 7acbfbd14557..0f784e3d307b 100644
--- a/arch/riscv/mm/context.c
+++ b/arch/riscv/mm/context.c
@@ -205,12 +205,24 @@ static void set_mm_noasid(struct mm_struct *mm)
local_flush_tlb_all();
}
-static inline void set_mm(struct mm_struct *mm, unsigned int cpu)
+static inline void set_mm(struct mm_struct *prev,
+ struct mm_struct *next, unsigned int cpu)
{
- if (static_branch_unlikely(&use_asid_allocator))
- set_mm_asid(mm, cpu);
- else
- set_mm_noasid(mm);
+ /*
+ * The mm_cpumask indicates which harts' TLBs contain the virtual
+ * address mapping of the mm. Compared to noasid, using asid
+ * can't guarantee that stale TLB entries are invalidated because
+ * the asid mechanism wouldn't flush TLB for every switch_mm for
+ * performance. So when using asid, keep all CPUs footmarks in
+ * cpumask() until mm reset.
+ */
+ cpumask_set_cpu(cpu, mm_cpumask(next));
+ if (static_branch_unlikely(&use_asid_allocator)) {
+ set_mm_asid(next, cpu);
+ } else {
+ cpumask_clear_cpu(cpu, mm_cpumask(prev));
+ set_mm_noasid(next);
+ }
}
static int __init asids_init(void)
@@ -264,7 +276,8 @@ static int __init asids_init(void)
}
early_initcall(asids_init);
#else
-static inline void set_mm(struct mm_struct *mm, unsigned int cpu)
+static inline void set_mm(struct mm_struct *prev,
+ struct mm_struct *next, unsigned int cpu)
{
/* Nothing to do here when there is no MMU */
}
@@ -317,10 +330,7 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
*/
cpu = smp_processor_id();
- cpumask_clear_cpu(cpu, mm_cpumask(prev));
- cpumask_set_cpu(cpu, mm_cpumask(next));
-
- set_mm(next, cpu);
+ set_mm(prev, next, cpu);
flush_icache_deferred(next, cpu);
}
--
2.39.2
Commit 8d470a45d1a6 ("panic: add option to dump all CPUs backtraces in panic_print")
introduced a setting for the "panic_print" kernel parameter to allow
users to request a NMI backtrace on panic. Problem is that the panic_print
handling happens after the secondary CPUs are already disabled, hence
this option ended-up being kind of a no-op - kernel skips the NMI trace
in idling CPUs, which is the case of offline CPUs.
Fix it by checking the NMI backtrace bit in the panic_print prior to
the CPU disabling function.
Fixes: 8d470a45d1a6 ("panic: add option to dump all CPUs backtraces in panic_print")
Cc: stable(a)vger.kernel.org
Signed-off-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com>
---
V4:
- Sent as standalone patch, rebased against v6.2-rc7.
V2 / V3:
- New patch, there was no V1 of this one.
Link for V3: https://lore.kernel.org/lkml/20220819221731.480795-12-gpiccoli@igalia.com/
Hi folks, thanks in advance for reviews/comments.
Notice that while at it, I got rid of the "crash_kexec_post_notifiers"
local copy in panic(). This was introduced by commit b26e27ddfd2a
("kexec: use core_param for crash_kexec_post_notifiers boot option"),
but it is not clear from comments or commit message why this local copy
is required.
My understanding is that it's a mechanism to prevent some concurrency,
in case some other CPU modify this variable while panic() is running.
I find it very unlikely, hence I removed it - but if people consider
this copy needed, I can respin this patch and keep it, even providing a
comment about that, in order to be explict about its need.
Let me know your thoughts!
Cheers,
Guilherme
kernel/panic.c | 47 +++++++++++++++++++++++++++--------------------
1 file changed, 27 insertions(+), 20 deletions(-)
diff --git a/kernel/panic.c b/kernel/panic.c
index 463c9295bc28..f45ee88be8a2 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -211,9 +211,6 @@ static void panic_print_sys_info(bool console_flush)
return;
}
- if (panic_print & PANIC_PRINT_ALL_CPU_BT)
- trigger_all_cpu_backtrace();
-
if (panic_print & PANIC_PRINT_TASK_INFO)
show_state();
@@ -243,6 +240,30 @@ void check_panic_on_warn(const char *origin)
origin, limit);
}
+/*
+ * Helper that triggers the NMI backtrace (if set in panic_print)
+ * and then performs the secondary CPUs shutdown - we cannot have
+ * the NMI backtrace after the CPUs are off!
+ */
+static void panic_other_cpus_shutdown(void)
+{
+ if (panic_print & PANIC_PRINT_ALL_CPU_BT)
+ trigger_all_cpu_backtrace();
+
+ /*
+ * Note that smp_send_stop() is the usual SMP shutdown function,
+ * which unfortunately may not be hardened to work in a panic
+ * situation. If we want to do crash dump after notifier calls
+ * and kmsg_dump, we will need architecture dependent extra
+ * bits in addition to stopping other CPUs, hence we rely on
+ * crash_smp_send_stop() for that.
+ */
+ if (!crash_kexec_post_notifiers)
+ smp_send_stop();
+ else
+ crash_smp_send_stop();
+}
+
/**
* panic - halt the system
* @fmt: The text string to print
@@ -258,7 +279,6 @@ void panic(const char *fmt, ...)
long i, i_next = 0, len;
int state = 0;
int old_cpu, this_cpu;
- bool _crash_kexec_post_notifiers = crash_kexec_post_notifiers;
if (panic_on_warn) {
/*
@@ -333,23 +353,10 @@ void panic(const char *fmt, ...)
*
* Bypass the panic_cpu check and call __crash_kexec directly.
*/
- if (!_crash_kexec_post_notifiers) {
+ if (!crash_kexec_post_notifiers)
__crash_kexec(NULL);
- /*
- * Note smp_send_stop is the usual smp shutdown function, which
- * unfortunately means it may not be hardened to work in a
- * panic situation.
- */
- smp_send_stop();
- } else {
- /*
- * If we want to do crash dump after notifier calls and
- * kmsg_dump, we will need architecture dependent extra
- * works in addition to stopping other CPUs.
- */
- crash_smp_send_stop();
- }
+ panic_other_cpus_shutdown();
/*
* Run any panic handlers, including those that might need to
@@ -370,7 +377,7 @@ void panic(const char *fmt, ...)
*
* Bypass the panic_cpu check and call __crash_kexec directly.
*/
- if (_crash_kexec_post_notifiers)
+ if (crash_kexec_post_notifiers)
__crash_kexec(NULL);
console_unblank();
--
2.39.1
Commit 8d470a45d1a6 ("panic: add option to dump all CPUs backtraces in panic_print")
introduced a setting for the "panic_print" kernel parameter to allow
users to request a NMI backtrace on panic. Problem is that the panic_print
handling happens after the secondary CPUs are already disabled, hence
this option ended-up being kind of a no-op - kernel skips the NMI trace
in idling CPUs, which is the case of offline CPUs.
Fix it by checking the NMI backtrace bit in the panic_print prior to
the CPU disabling function.
Fixes: 8d470a45d1a6 ("panic: add option to dump all CPUs backtraces in panic_print")
Cc: stable(a)vger.kernel.org
Signed-off-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com>
---
V5:
- Kept the local version of "crash_kexec_post_notifiers", since
this is standalone fix that should be backported to stable. Hence,
it's not a good idea to mess with it in this patch (thanks Andrew!).
V4:
- Sent as standalone patch, rebased against v6.2-rc7.
- Link: https://lore.kernel.org/lkml/20230210203510.1734835-1-gpiccoli@igalia.com/
kernel/panic.c | 44 ++++++++++++++++++++++++++------------------
1 file changed, 26 insertions(+), 18 deletions(-)
diff --git a/kernel/panic.c b/kernel/panic.c
index 463c9295bc28..e026191a0a07 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -211,9 +211,6 @@ static void panic_print_sys_info(bool console_flush)
return;
}
- if (panic_print & PANIC_PRINT_ALL_CPU_BT)
- trigger_all_cpu_backtrace();
-
if (panic_print & PANIC_PRINT_TASK_INFO)
show_state();
@@ -243,6 +240,30 @@ void check_panic_on_warn(const char *origin)
origin, limit);
}
+/*
+ * Helper that triggers the NMI backtrace (if set in panic_print)
+ * and then performs the secondary CPUs shutdown - we cannot have
+ * the NMI backtrace after the CPUs are off!
+ */
+static void panic_other_cpus_shutdown(bool crash_kexec)
+{
+ if (panic_print & PANIC_PRINT_ALL_CPU_BT)
+ trigger_all_cpu_backtrace();
+
+ /*
+ * Note that smp_send_stop() is the usual SMP shutdown function,
+ * which unfortunately may not be hardened to work in a panic
+ * situation. If we want to do crash dump after notifier calls
+ * and kmsg_dump, we will need architecture dependent extra
+ * bits in addition to stopping other CPUs, hence we rely on
+ * crash_smp_send_stop() for that.
+ */
+ if (!crash_kexec)
+ smp_send_stop();
+ else
+ crash_smp_send_stop();
+}
+
/**
* panic - halt the system
* @fmt: The text string to print
@@ -333,23 +354,10 @@ void panic(const char *fmt, ...)
*
* Bypass the panic_cpu check and call __crash_kexec directly.
*/
- if (!_crash_kexec_post_notifiers) {
+ if (!_crash_kexec_post_notifiers)
__crash_kexec(NULL);
- /*
- * Note smp_send_stop is the usual smp shutdown function, which
- * unfortunately means it may not be hardened to work in a
- * panic situation.
- */
- smp_send_stop();
- } else {
- /*
- * If we want to do crash dump after notifier calls and
- * kmsg_dump, we will need architecture dependent extra
- * works in addition to stopping other CPUs.
- */
- crash_smp_send_stop();
- }
+ panic_other_cpus_shutdown(_crash_kexec_post_notifiers);
/*
* Run any panic handlers, including those that might need to
--
2.39.1
I trust you are staying safe and well, I am Franklin C. James QC. from Glasgow, Scotland. I have an investment proposition for your consideration and more details will be revealed once your interest is indicated.
Yours in service,
Franklin C. James QC.
____________________
Secretary: Phillip Hernandez
--
This email has been checked for viruses by Avast antivirus software.
www.avast.com