Dear stable maintainers,
I would like to report an oops I encountered and request that the patch below be backported to v5.15. The fix is important to avoid a recurring oops in the context of RCU-detected stalls.
Subject: rcu: Avoid tracing a few functions executed in stop machine
Commit: 48f8070f5dd8
Target kernel version: v5.15
Reason for application: To avoid the oops due to rcu_preempt detected stalls on CPUs/tasks
Environment and oops context: The issue was observed in my environment on a 5.15.193 kernel (arm platform). The patch helps avoid the oops below, marked [1] and [2].
log:
root@ls1021atwr:~# uname -r
5.15.93-rt58+ge0f69a158d5b
oops dump stack
** ID_531 main/smp_fsm.c:1884 <inrcu: INFO: rcu_preempt detected stalls on CPUs/tasks: <<< [1]
rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P116/2:b..l
(detected by 1, t=2102 jiffies, g=12741, q=1154)
task:irq/31-arm-irq1 state:D stack: 0 pid: 116 ppid: 2 flags:0x00000000
[<8064b97f>] (__schedule) from [<8064bb01>] (schedule+0x8d/0xc2)
[<8064bb01>] (schedule) from [<8064fa65>] (schedule_timeout+0x6d/0xa0)
[<8064fa65>] (schedule_timeout) from [<804ba353>] (fsl_ifc_run_command+0x6f/0x178)
[<804ba353>] (fsl_ifc_run_command) from [<804ba72f>] (fsl_ifc_cmdfunc+0x203/0x2b8)
[<804ba72f>] (fsl_ifc_cmdfunc) from [<804b135f>] (nand_status_op+0xaf/0xe0)
[<804b135f>] (nand_status_op) from [<804b13b3>] (nand_check_wp+0x23/0x48)
.... < snipped >
Exception stack(0x822bbfb0 to 0x822bbff8)
bfa0: 00000000 00000000 00000000 00000000
bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bfe0: 00000000 00000000 00000000 00000000 00000013 00000000
rcu: rcu_preempt kthread timer wakeup didn't happen for 764 jiffies! g12741 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x1000
rcu: Possible timer handling issue on cpu=0 timer-softirq=1095
rcu: rcu_preempt kthread starved for 765 jiffies! g12741 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x1000 ->cpu=0 <<< [2]
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:D stack: 0 pid: 13 ppid: 2 flags:0x00000000
[<8064b97f>] (__schedule) from [<8064ba03>] (schedule_rtlock+0x1b/0x2e)
[<8064ba03>] (schedule_rtlock) from [<8064ea6f>] (rtlock_slowlock_locked+0x93/0x108)
[<8064ea6f>] (rtlock_slowlock_locked) from [<8064eb1b>] (rt_spin_lock+0x37/0x4a)
[<8064eb1b>] (rt_spin_lock) from [<8021b723>] (__local_bh_disable_ip+0x6b/0x110)
[<8021b723>] (__local_bh_disable_ip) from [<8025a90f>] (del_timer_sync+0x7f/0xe0)
[<8025a90f>] (del_timer_sync) from [<8064fa6b>] (schedule_timeout+0x73/0xa0)
[<8064fa6b>] (schedule_timeout) from [<80254677>] (rcu_gp_fqs_loop+0x8b/0x1bc)
[<80254677>] (rcu_gp_fqs_loop) from [<8025483f>] (rcu_gp_kthread+0x97/0xbc)
[<8025483f>] (rcu_gp_kthread) from [<8022ca67>] (kthread+0xcf/0xe4)
[<8022ca67>] (kthread) from [<80200149>] (ret_from_fork+0x11/0x28)
Exception stack(0x820fffb0 to 0x820ffff8)
ffa0: 00000000 00000000 00000000 00000000
ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
rcu: Stack dump where RCU GP kthread last ran: <<
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
< .. >
Thank you for your time and consideration. Please let me know if you require any additional information.
Best Regards, Ronald Monthero
On Tue, Nov 21, 2023 at 12:09:38AM +1000, Ronald Monthero wrote:
Dear stable maintainers,
I would like to report an oops I encountered and request that the patch below be backported to v5.15. The fix is important to avoid a recurring oops in the context of RCU-detected stalls.
Subject: rcu: Avoid tracing a few functions executed in stop machine
Commit: 48f8070f5dd8
Target kernel version: v5.15
Reason for application: To avoid the oops due to rcu_preempt detected stalls on CPUs/tasks
Environment and oops context: The issue was observed in my environment on a 5.15.193 kernel (arm platform). The patch helps avoid the oops below, marked [1] and [2].
As the patch does not apply cleanly, we need a working and tested backport so we know to apply the correct version.
Can you please provide that as you've obviously already done this?
thanks,
greg k-h
On Sat, Nov 25, 2023 at 2:10 AM Greg KH gregkh@linuxfoundation.org wrote:
As the patch does not apply cleanly, we need a working and tested backport so we know to apply the correct version.
Can you please provide that as you've obviously already done this?
Hi Greg,
Sorry, I noticed my typo: 193 instead of 93. I tested on the 5.15.93-rt58 kernel.
BR, Ronald
On Thu, Nov 30, 2023 at 12:08 AM Ronald Monthero debug.penguin32@gmail.com wrote:
Hi Greg,
Sorry, I noticed my typo: 193 instead of 93. I tested on the 5.15.93-rt58 kernel.
Hi Greg,
I used a 5.15.93 kernel
- on arm32 bit platform I tested with 5.15.93-rt58 (rt kernel), on real hardware - Freescale LS1021A, 32 bit Cortex A7 processor
- on x86_64 platform I tested non rt kernel 5.15.93 - virtual machine - qemu platform
Below is the build log after patch to kernel/rcu/tree.h on x86_64
linux-5.15.93$ make
  CALL    scripts/checksyscalls.sh
  CALL    scripts/atomic/check-atomics.sh
  DESCEND objtool
  DESCEND bpf/resolve_btfids
  CHK     include/generated/compile.h
  CC      kernel/rcu/tree.o <<<
  AR      kernel/rcu/built-in.a <<<
  AR      kernel/built-in.a
  CHK     kernel/kheaders_data.tar.xz
  GEN     .version
  CHK     include/generated/compile.h
  UPD     include/generated/compile.h
  CC      init/version.o
  AR      init/built-in.a
  LD      vmlinux.o
  MODPOST vmlinux.symvers
  MODINFO modules.builtin.modinfo
  GEN     modules.builtin
  LD      .tmp_vmlinux.btf
  BTF     .btf.vmlinux.bin.o
  LD      .tmp_vmlinux.kallsyms1
< snipped >
  BTF [M] sound/usb/usx2y/snd-usb-usx2y.ko
  BTF [M] sound/virtio/virtio_snd.ko
  BTF [M] sound/x86/snd-hdmi-lpe-audio.ko
  BTF [M] sound/xen/snd_xen_front.ko
  BTF [M] virt/lib/irqbypass.ko
linux-5.15.93$
BR, ronald
On Thu, Nov 30, 2023 at 10:07:03PM +1000, Ronald Monthero wrote:
Hi Greg, I used a 5.15.93 kernel
- on arm32 bit platform I tested with 5.15.93-rt58 (rt kernel) , on
real hardware - Freescale LS1021A, 32 bit Cortex A7 processor
- on x86_64 platform I tested non rt kernel 5.15.93 - virtual
machine - qemu platform
Below is the build log after patch to kernel/rcu/tree.h on x86_64
linux-5.15.93$ make
  CALL    scripts/checksyscalls.sh
  CALL    scripts/atomic/check-atomics.sh
  DESCEND objtool
  DESCEND bpf/resolve_btfids
  CHK     include/generated/compile.h
  CC      kernel/rcu/tree.o <<<
  AR      kernel/rcu/built-in.a <<<
  AR      kernel/built-in.a
  CHK     kernel/kheaders_data.tar.xz
  GEN     .version
  CHK     include/generated/compile.h
  UPD     include/generated/compile.h
  CC      init/version.o
  AR      init/built-in.a
  LD      vmlinux.o
  MODPOST vmlinux.symvers
  MODINFO modules.builtin.modinfo
  GEN     modules.builtin
  LD      .tmp_vmlinux.btf
  BTF     .btf.vmlinux.bin.o
  LD      .tmp_vmlinux.kallsyms1
< snipped >
  BTF [M] sound/usb/usx2y/snd-usb-usx2y.ko
  BTF [M] sound/virtio/virtio_snd.ko
  BTF [M] sound/x86/snd-hdmi-lpe-audio.ko
  BTF [M] sound/xen/snd_xen_front.ko
  BTF [M] virt/lib/irqbypass.ko
linux-5.15.93$
I don't understand what you are showing here, sorry.
I do not have a working backport anywhere that I can see, that is what we need. As you seem to have one, can you please submit it?
Also note, if you are using the -rt kernel, that changes lots of stuff that we know nothing about, please work with the -rt kernel developers about that.
thanks,
greg k-h
[upstream commit 48f8070f5dd8]
This backport patch for the v5.15 kernel is derived from upstream commit 48f8070f5dd8. On the 5.15 kernel it fixes the recurring oops seen in the context of RCU-detected stalls, shown below.
log:
root@ls1021atwr:~# uname -r
5.15.93-rt58+ge0f69a158d5b
oops dump stack
** ID_531 main/smp_fsm.c:1884 <inrcu: INFO: rcu_preempt detected stalls on CPUs/tasks: <<< [1]
rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P116/2:b..l
(detected by 1, t=2102 jiffies, g=12741, q=1154)
task:irq/31-arm-irq1 state:D stack: 0 pid:116 ppid:2 flags:0x00000000
[<8064b97f>] (__schedule) from [<8064bb01>] (schedule+0x8d/0xc2)
[<8064bb01>] (schedule) from [<8064fa65>] (schedule_timeout+0x6d/0xa0)
[<8064fa65>] (schedule_timeout) from [<804ba353>] (fsl_ifc_run_command+0x6f/0x178)
[<804ba353>] (fsl_ifc_run_command) from [<804ba72f>] (fsl_ifc_cmdfunc+0x203/0x2b8)
[<804ba72f>] (fsl_ifc_cmdfunc) from [<804b135f>]
.... < snipped >
rcu: rcu_preempt kthread timer wakeup didn't happen for 764 jiffies! g12741 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x1000
rcu: Possible timer handling issue on cpu=0 timer-softirq=1095
rcu: rcu_preempt kthread starved for 765 jiffies! g12741 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x1000 ->cpu=0 <<< [2]
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:D stack: 0 pid: 13 ppid: 2 flags:0x00000000
[<8064b97f>] (__schedule) from [<8064ba03>] (schedule_rtlock+0x1b/0x2e)
[<8064ba03>] (schedule_rtlock) from [<8064ea6f>] (rtlock_slowlock_locked+0x93/0x108)
[<8064ea6f>] (rtlock_slowlock_locked) from [<8064eb1b>]
[<8064eb1b>] (rt_spin_lock) from [<8021b723>] (__local_bh_disable_ip+0x6b/0x110)
[<8021b723>] (__local_bh_disable_ip) from [<8025a90f>] (del_timer_sync+0x7f/0xe0)
[<8025a90f>] (del_timer_sync) from [<8064fa6b>] (schedule_timeout+0x73/0xa0)
Exception stack(0x820fffb0 to 0x820ffff8)
rcu: Stack dump where RCU GP kthread last ran:
...
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
< .. >
Signed-off-by: Ronald Monthero debug.penguin32@gmail.com
---
 kernel/rcu/tree_plugin.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index d070059163d7..36ca6bacd430 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -458,7 +458,7 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
  * be quite short, for example, in the case of the call from
  * rcu_read_unlock_special().
  */
-static void
+static notrace void
 rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
 {
 	bool empty_exp;
@@ -578,7 +578,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
  * is disabled. This function cannot be expected to understand these
  * nuances, so the caller must handle them.
  */
-static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
+static notrace bool rcu_preempt_need_deferred_qs(struct task_struct *t)
 {
 	return (__this_cpu_read(rcu_data.exp_deferred_qs) ||
 		READ_ONCE(t->rcu_read_unlock_special.s)) &&
@@ -592,7 +592,7 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
  * evaluate safety in terms of interrupt, softirq, and preemption
  * disabling.
  */
-static void rcu_preempt_deferred_qs(struct task_struct *t)
+static notrace void rcu_preempt_deferred_qs(struct task_struct *t)
 {
 	unsigned long flags;
 
@@ -922,7 +922,7 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
  * Because there is no preemptible RCU, there can be no deferred quiescent
  * states.
  */
-static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
+static notrace bool rcu_preempt_need_deferred_qs(struct task_struct *t)
 {
 	return false;
 }
On Sat, Dec 02, 2023 at 04:56:16PM +1000, Ronald Monthero wrote:
Signed-off-by: Ronald Monthero debug.penguin32@gmail.com
You lost the signed-off-by and didn't cc: everyone on the original commit, how come?
thanks,
greg k-h
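A rough way to catch the two omissions mentioned above before sending is to grep the patch file for the expected tags. The sketch below is only an illustration: the function name check_backport is hypothetical, the "at least two Signed-off-by lines" rule is an assumption (one from the upstream chain plus the backporter's own), and the accepted upstream-reference formats are the "commit <sha> upstream." and "[ Upstream commit <sha> ]" lines commonly used for stable submissions.

```shell
#!/bin/sh
# Sketch: sanity-check a stable backport patch before sending it.
# check_backport is a hypothetical helper, not part of any kernel tooling.
check_backport() {
    patch=$1
    status=0

    # The upstream commit must be referenced in one of the usual forms.
    if ! grep -Eiq '^(commit [0-9a-f]{12,40} upstream\.?|\[ ?upstream commit [0-9a-f]{12,40} ?\])' "$patch"; then
        echo "missing upstream commit reference"
        status=1
    fi

    # Assumption: the original sign-off chain is kept and the backporter
    # adds their own, so at least two Signed-off-by lines are expected.
    count=$(grep -c '^Signed-off-by: ' "$patch" || true)
    if [ "$count" -lt 2 ]; then
        echo "expected original and backporter Signed-off-by lines, found $count"
        status=1
    fi

    return "$status"
}
```

Calling check_backport on the patch file returns non-zero and prints what is missing. It does not check the Cc list, which still has to be taken from the original commit by hand.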
Greg, ack. Sorry, I missed that block; I will update and send it now.
Thanks,
BR, ronald
On Sat, Dec 2, 2023 at 5:46 PM Greg KH gregkh@linuxfoundation.org wrote:
On Sat, Dec 02, 2023 at 04:56:16PM +1000, Ronald Monthero wrote:
Signed-off-by: Ronald Monthero debug.penguin32@gmail.com
You lost the signed-off-by and didn't cc: everyone on the original commit, how come?
thanks,
greg k-h
[upstream commit 48f8070f5dd8e13148ae4647780a452d53c457a2]
This backport patch for the v5.15 kernel is derived from upstream commit 48f8070f5dd8. On the 5.15 kernel it fixes the recurring oops seen in the context of RCU-detected stalls, shown below.
log:
root@ls1021atwr:~# uname -r
5.15.93-rt58+ge0f69a158d5b
oops dump stack
** ID_531 main/smp_fsm.c:1884 <inrcu: INFO: rcu_preempt detected stalls on CPUs/tasks: <<< [1]
rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P116/2:b..l
(detected by 1, t=2102 jiffies, g=12741, q=1154)
task:irq/31-arm-irq1 state:D stack: 0 pid:116 ppid:2 flags:0x00000000
[<8064b97f>] (__schedule) from [<8064bb01>] (schedule+0x8d/0xc2)
[<8064bb01>] (schedule) from [<8064fa65>] (schedule_timeout+0x6d/0xa0)
[<8064fa65>] (schedule_timeout) from [<804ba353>] (fsl_ifc_run_command+0x6f/0x178)
[<804ba353>] (fsl_ifc_run_command) from [<804ba72f>] (fsl_ifc_cmdfunc+0x203/0x2b8)
[<804ba72f>] (fsl_ifc_cmdfunc) from [<804b135f>]
.... < snipped >
rcu: rcu_preempt kthread timer wakeup didn't happen for 764 jiffies! g12741 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x1000
rcu: Possible timer handling issue on cpu=0 timer-softirq=1095
rcu: rcu_preempt kthread starved for 765 jiffies! g12741 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x1000 ->cpu=0 <<< [2]
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:D stack: 0 pid: 13 ppid: 2 flags:0x00000000
[<8064b97f>] (__schedule) from [<8064ba03>] (schedule_rtlock+0x1b/0x2e)
[<8064ba03>] (schedule_rtlock) from [<8064ea6f>] (rtlock_slowlock_locked+0x93/0x108)
[<8064ea6f>] (rtlock_slowlock_locked) from [<8064eb1b>]
[<8064eb1b>] (rt_spin_lock) from [<8021b723>] (__local_bh_disable_ip+0x6b/0x110)
[<8021b723>] (__local_bh_disable_ip) from [<8025a90f>] (del_timer_sync+0x7f/0xe0)
[<8025a90f>] (del_timer_sync) from [<8064fa6b>] (schedule_timeout+0x73/0xa0)
Exception stack(0x820fffb0 to 0x820ffff8)
rcu: Stack dump where RCU GP kthread last ran:
...
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
< .. >
upstream commit:
Signed-off-by: Patrick Wang patrick.wang.shcn@gmail.com
Acked-by: Steven Rostedt (Google) rostedt@goodmis.org
Signed-off-by: Paul E. McKenney paulmck@kernel.org
Reviewed-by: Neeraj Upadhyay quic_neeraju@quicinc.com

Signed-off-by: Ronald Monthero debug.penguin32@gmail.com
---
 kernel/rcu/tree_plugin.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index d070059163d7..36ca6bacd430 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -458,7 +458,7 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
  * be quite short, for example, in the case of the call from
  * rcu_read_unlock_special().
  */
-static void
+static notrace void
 rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
 {
 	bool empty_exp;
@@ -578,7 +578,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
  * is disabled. This function cannot be expected to understand these
  * nuances, so the caller must handle them.
  */
-static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
+static notrace bool rcu_preempt_need_deferred_qs(struct task_struct *t)
 {
 	return (__this_cpu_read(rcu_data.exp_deferred_qs) ||
 		READ_ONCE(t->rcu_read_unlock_special.s)) &&
@@ -592,7 +592,7 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
  * evaluate safety in terms of interrupt, softirq, and preemption
  * disabling.
  */
-static void rcu_preempt_deferred_qs(struct task_struct *t)
+static notrace void rcu_preempt_deferred_qs(struct task_struct *t)
 {
 	unsigned long flags;
 
@@ -922,7 +922,7 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
  * Because there is no preemptible RCU, there can be no deferred quiescent
  * states.
  */
-static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
+static notrace bool rcu_preempt_need_deferred_qs(struct task_struct *t)
 {
 	return false;
 }
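One possible way to confirm on a running kernel that the functions marked notrace in this patch are no longer visible to ftrace is to look them up in the tracefs list of traceable functions. This is a sketch under assumptions: the helper names is_traceable and report_traceable are hypothetical, tracefs is assumed to be mounted at /sys/kernel/tracing, and a function can also be absent from the list for other reasons (for example, if the compiler inlined it), so "not traceable" alone does not prove the patch is applied.

```shell
#!/bin/sh
# Sketch: check whether ftrace lists the functions touched by the patch.
# The default path assumes tracefs is mounted at /sys/kernel/tracing.
DEFAULT_LIST=/sys/kernel/tracing/available_filter_functions

# Succeeds if FUNC appears as a whole entry in the list file.
# Entries look like "name" or "name [module]", one per line.
is_traceable() {
    func=$1
    list=${2:-$DEFAULT_LIST}
    grep -Eq "^${func}([[:space:]]|\$)" "$list"
}

# Prints one status line per function changed in the backport.
report_traceable() {
    list=${1:-$DEFAULT_LIST}
    for f in rcu_preempt_deferred_qs_irqrestore \
             rcu_preempt_need_deferred_qs \
             rcu_preempt_deferred_qs; do
        if is_traceable "$f" "$list"; then
            echo "$f: traceable"
        else
            echo "$f: not traceable"
        fi
    done
}
```

Running report_traceable as root on the patched kernel would be expected to print "not traceable" for all three names. A saved copy of the list can be passed as the first argument to compare before and after.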
linux-stable-mirror@lists.linaro.org