The KGDB initial breakpoint gets an rcu stall warning after commit a80be428fbc1f1f3bc9ed924 ("rcu: Do not disable GP stall detection in rcu_cpu_stall_reset()").
[   53.452051] rcu: INFO: rcu_preempt self-detected stall on CPU
[   53.487950] rcu: 3-...0: (1 ticks this GP) idle=0e2c/1/0x4000000000000000 softirq=375/375 fqs=8
[   53.528243] rcu: (t=12297 jiffies g=-995 q=1 ncpus=4)
[   53.564840] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc2+ #4848
[   53.603005] Hardware name: Loongson Loongson-3A5000-HV-7A2000-1w-V0.1-CRB/Loongson-LS3A5000-7A2000-1w-CRB-V1.21, BIOS Loongson-UDK2018-V2.0.05099-beta8 08
[   53.682062] pc 9000000000332100 ra 90000000003320f4 tp 90000001000a0000 sp 90000001000a3710
[   53.724934] a0 9000000001d4b488 a1 0000000000000000 a2 0000000000000001 a3 0000000000000000
[   53.768179] a4 9000000001d526c8 a5 90000001000a38f0 a6 000000000000002c a7 0000000000000000
[   53.810751] t0 00000000000002b0 t1 0000000000000004 t2 900000000131c9c0 t3 fffffffffffffffa
[   53.853249] t4 0000000000000080 t5 90000001002ac190 t6 0000000000000004 t7 9000000001912d58
[   53.895684] t8 0000000000000000 u0 90000000013141a0 s9 0000000000000028 s0 9000000001d512f0
[   53.937633] s1 9000000001d51278 s2 90000001000a3798 s3 90000000019fc410 s4 9000000001d4b488
[   53.979486] s5 9000000001d512f0 s6 90000000013141a0 s7 0000000000000078 s8 9000000001d4b450
[   54.021175] ra: 90000000003320f4 kgdb_cpu_enter+0x534/0x640
[   54.060150] ERA: 9000000000332100 kgdb_cpu_enter+0x540/0x640
[   54.098347] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
[   54.136621] PRMD: 0000000c (PPLV0 +PIE +PWE)
[   54.172192] EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
[   54.207838] ECFG: 00071c1c (LIE=2-4,10-12 VS=7)
[   54.242503] ESTAT: 00000800 [INT] (IS=11 ECode=0 EsubCode=0)
[   54.277996] PRID: 0014c011 (Loongson-64bit, Loongson-3A5000-HV)
[   54.313544] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc2+ #4848
[   54.430170] Stack : 0072617764726148 0000000000000000 9000000000223504 90000001000a0000
[   54.472308]         9000000100073a90 9000000100073a98 0000000000000000 9000000100073bd8
[   54.514413]         9000000100073bd0 9000000100073bd0 9000000100073a00 0000000000000001
[   54.556018]         0000000000000001 9000000100073a98 99828271f24e961a 90000001002810c0
[   54.596924]         0000000000000001 0000000000010003 0000000000000000 0000000000000001
[   54.637115]         ffff8000337cdb80 0000000000000001 0000000006360000 900000000131c9c0
[   54.677049]         0000000000000000 0000000000000000 90000000017b4c98 9000000001912000
[   54.716394]         9000000001912f68 9000000001913000 9000000001912f70 00000000000002b0
[   54.754880]         90000000014a8840 0000000000000000 900000000022351c 0000000000000000
[   54.792372]         00000000000002b0 000000000000000c 0000000000000000 0000000000071c1c
[   54.829302]         ...
[   54.859163] Call Trace:
[   54.859165] [<900000000022351c>] show_stack+0x5c/0x180
[   54.918298] [<90000000012f6100>] dump_stack_lvl+0x60/0x88
[   54.949251] [<90000000012dd5d8>] rcu_dump_cpu_stacks+0xf0/0x148
[   54.981116] [<90000000002d2fb8>] rcu_sched_clock_irq+0xb78/0xe60
[   55.012744] [<90000000002e47cc>] update_process_times+0x6c/0xc0
[   55.044169] [<90000000002f65d4>] tick_sched_timer+0x54/0x100
[   55.075488] [<90000000002e5174>] __hrtimer_run_queues+0x154/0x240
[   55.107347] [<90000000002e6288>] hrtimer_interrupt+0x108/0x2a0
[   55.139112] [<9000000000226418>] constant_timer_interrupt+0x38/0x60
[   55.170749] [<90000000002b3010>] __handle_irq_event_percpu+0x50/0x160
[   55.203141] [<90000000002b3138>] handle_irq_event_percpu+0x18/0x80
[   55.235064] [<90000000002b9d54>] handle_percpu_irq+0x54/0xa0
[   55.266241] [<90000000002b2168>] generic_handle_domain_irq+0x28/0x40
[   55.298466] [<9000000000aba95c>] handle_cpu_irq+0x5c/0xa0
[   55.329749] [<90000000012f7270>] handle_loongarch_irq+0x30/0x60
[   55.361476] [<90000000012f733c>] do_vint+0x9c/0x100
[   55.391737] [<9000000000332100>] kgdb_cpu_enter+0x540/0x640
[   55.422440] [<9000000000332b64>] kgdb_handle_exception+0x104/0x180
[   55.452911] [<9000000000232478>] kgdb_loongarch_notify+0x38/0xa0
[   55.481964] [<900000000026b4d4>] notify_die+0x94/0x100
[   55.509184] [<90000000012f685c>] do_bp+0x21c/0x340
[   55.562475] [<90000000003315b8>] kgdb_compiled_break+0x0/0x28
[   55.590319] [<9000000000332e80>] kgdb_register_io_module+0x160/0x1c0
[   55.618901] [<9000000000c0f514>] configure_kgdboc+0x154/0x1c0
[   55.647034] [<9000000000c0f5e0>] kgdboc_probe+0x60/0x80
[   55.674647] [<9000000000c96da8>] platform_probe+0x68/0x100
[   55.702613] [<9000000000c938e0>] really_probe+0xc0/0x340
[   55.730528] [<9000000000c93be4>] __driver_probe_device+0x84/0x140
[   55.759615] [<9000000000c93cdc>] driver_probe_device+0x3c/0x120
[   55.787990] [<9000000000c93e8c>] __device_attach_driver+0xcc/0x160
[   55.817145] [<9000000000c91290>] bus_for_each_drv+0x90/0x100
[   55.845654] [<9000000000c94328>] __device_attach+0xa8/0x1a0
[   55.874145] [<9000000000c925f0>] bus_probe_device+0xb0/0xe0
[   55.902572] [<9000000000c8ec7c>] device_add+0x65c/0x860
[   55.930635] [<9000000000c96704>] platform_device_add+0x124/0x2c0
[   55.959669] [<9000000001452b38>] init_kgdboc+0x58/0xa0
[   55.987677] [<900000000022015c>] do_one_initcall+0x7c/0x1e0
[   56.016134] [<9000000001420f1c>] kernel_init_freeable+0x22c/0x2a0
[   56.045128] [<90000000012f923c>] kernel_init+0x20/0x124
Currently rcu_cpu_stall_reset() sets rcu_state.jiffies_stall to one check period later, i.e. jiffies + rcu_jiffies_till_stall_check(). But jiffies is only updated in the timer interrupt, so by the time kgdb_cpu_enter() begins to run, jiffies may already be nearly one full check period stale. Since all interrupts are disabled during kgdb_cpu_enter(), jiffies will not be updated while it runs. When kgdb_cpu_enter() returns, rcu_state.jiffies_stall may therefore have already timed out.
We could set rcu_state.jiffies_stall to two check periods later, e.g. jiffies + (rcu_jiffies_till_stall_check() * 2), in rcu_cpu_stall_reset() to avoid this problem. But that is not a complete solution, because kgdb may run for an arbitrarily long time in irq-disabled context.
Instead, updating jiffies at the beginning of rcu_cpu_stall_reset() solves all of these cases.
Cc: stable@vger.kernel.org
Fixes: a80be428fbc1f1f3bc9ed924 ("rcu: Do not disable GP stall detection in rcu_cpu_stall_reset()")
Reported-by: Binbin Zhou <zhoubinbin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
---
 kernel/rcu/tree_stall.h | 1 +
 1 file changed, 1 insertion(+)
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index b10b8349bb2a..1c7b540985bf 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -153,6 +153,7 @@ static void panic_on_rcu_stall(void)
  */
 void rcu_cpu_stall_reset(void)
 {
+	do_update_jiffies_64(ktime_get());
 	WRITE_ONCE(rcu_state.jiffies_stall,
 		   jiffies + rcu_jiffies_till_stall_check());
 }
On Mon, Aug 14, 2023 at 10:00:45AM +0800, Huacai Chen wrote:
Would it make sense for there to be a kgdb_cpu_exit()? In that case, the stalls could simply be suppressed at the beginning of the debug session and re-enabled upon exit, as is currently done for sysrq output via rcu_sysrq_start() and rcu_sysrq_end().
Thanx, Paul
Hi, Paul,
On Tue, Aug 15, 2023 at 12:15 AM Paul E. McKenney <paulmck@kernel.org> wrote:
On Mon, Aug 14, 2023 at 10:00:45AM +0800, Huacai Chen wrote:
Thank you for your advice, but that doesn't help. rcu_sysrq_start() and rcu_sysrq_end() suppress the warnings *during* sysrq, but kgdb already produces no warnings during kgdb_cpu_enter(), since it executes in irq-disabled context. What this patch wants to suppress are the warnings *after* kgdb_cpu_enter() returns, which are caused by a very stale jiffies value.
Huacai
Hello, Huacai
Is it possible to set rcu_cpu_stall_suppress to true in dbg_touch_watchdogs() and reset rcu_cpu_stall_suppress at the beginning and end of the RCU grace period? Alternatively, setting rcupdate.rcu_cpu_stall_suppress_at_boot=1 in the boot arguments can suppress RCU stall warnings during boot.
Thanks,
Zqiang
Hi, Qiang,
On Wed, Aug 16, 2023 at 11:16 AM Z qiang <qiang.zhang1211@gmail.com> wrote:
Hi, Paul,
On Tue, Aug 15, 2023 at 12:15 AM Paul E. McKenney paulmck@kernel.org wrote:
On Mon, Aug 14, 2023 at 10:00:45AM +0800, Huacai Chen wrote:
The KGDB initial breakpoint gets an rcu stall warning after commit a80be428fbc1f1f3bc9ed924 ("rcu: Do not disable GP stall detection in rcu_cpu_stall_reset()").
[ 53.452051] rcu: INFO: rcu_preempt self-detected stall on CPU
[ 53.487950] rcu: 3-...0: (1 ticks this GP) idle=0e2c/1/0x4000000000000000 softirq=375/375 fqs=8
[ 53.528243] rcu: (t=12297 jiffies g=-995 q=1 ncpus=4)
[ 53.564840] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc2+ #4848
[ 53.603005] Hardware name: Loongson Loongson-3A5000-HV-7A2000-1w-V0.1-CRB/Loongson-LS3A5000-7A2000-1w-CRB-V1.21, BIOS Loongson-UDK2018-V2.0.05099-beta8 08
[ 53.682062] pc 9000000000332100 ra 90000000003320f4 tp 90000001000a0000 sp 90000001000a3710
[ 53.724934] a0 9000000001d4b488 a1 0000000000000000 a2 0000000000000001 a3 0000000000000000
[ 53.768179] a4 9000000001d526c8 a5 90000001000a38f0 a6 000000000000002c a7 0000000000000000
[ 53.810751] t0 00000000000002b0 t1 0000000000000004 t2 900000000131c9c0 t3 fffffffffffffffa
[ 53.853249] t4 0000000000000080 t5 90000001002ac190 t6 0000000000000004 t7 9000000001912d58
[ 53.895684] t8 0000000000000000 u0 90000000013141a0 s9 0000000000000028 s0 9000000001d512f0
[ 53.937633] s1 9000000001d51278 s2 90000001000a3798 s3 90000000019fc410 s4 9000000001d4b488
[ 53.979486] s5 9000000001d512f0 s6 90000000013141a0 s7 0000000000000078 s8 9000000001d4b450
[ 54.021175] ra: 90000000003320f4 kgdb_cpu_enter+0x534/0x640
[ 54.060150] ERA: 9000000000332100 kgdb_cpu_enter+0x540/0x640
[ 54.098347] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
[ 54.136621] PRMD: 0000000c (PPLV0 +PIE +PWE)
[ 54.172192] EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
[ 54.207838] ECFG: 00071c1c (LIE=2-4,10-12 VS=7)
[ 54.242503] ESTAT: 00000800 [INT] (IS=11 ECode=0 EsubCode=0)
[ 54.277996] PRID: 0014c011 (Loongson-64bit, Loongson-3A5000-HV)
[ 54.313544] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc2+ #4848
[ 54.430170] Stack : 0072617764726148 0000000000000000 9000000000223504 90000001000a0000
[ 54.472308] 9000000100073a90 9000000100073a98 0000000000000000 9000000100073bd8
[ 54.514413] 9000000100073bd0 9000000100073bd0 9000000100073a00 0000000000000001
[ 54.556018] 0000000000000001 9000000100073a98 99828271f24e961a 90000001002810c0
[ 54.596924] 0000000000000001 0000000000010003 0000000000000000 0000000000000001
[ 54.637115] ffff8000337cdb80 0000000000000001 0000000006360000 900000000131c9c0
[ 54.677049] 0000000000000000 0000000000000000 90000000017b4c98 9000000001912000
[ 54.716394] 9000000001912f68 9000000001913000 9000000001912f70 00000000000002b0
[ 54.754880] 90000000014a8840 0000000000000000 900000000022351c 0000000000000000
[ 54.792372] 00000000000002b0 000000000000000c 0000000000000000 0000000000071c1c
[ 54.829302] ...
[ 54.859163] Call Trace:
[ 54.859165] [<900000000022351c>] show_stack+0x5c/0x180
[ 54.918298] [<90000000012f6100>] dump_stack_lvl+0x60/0x88
[ 54.949251] [<90000000012dd5d8>] rcu_dump_cpu_stacks+0xf0/0x148
[ 54.981116] [<90000000002d2fb8>] rcu_sched_clock_irq+0xb78/0xe60
[ 55.012744] [<90000000002e47cc>] update_process_times+0x6c/0xc0
[ 55.044169] [<90000000002f65d4>] tick_sched_timer+0x54/0x100
[ 55.075488] [<90000000002e5174>] __hrtimer_run_queues+0x154/0x240
[ 55.107347] [<90000000002e6288>] hrtimer_interrupt+0x108/0x2a0
[ 55.139112] [<9000000000226418>] constant_timer_interrupt+0x38/0x60
[ 55.170749] [<90000000002b3010>] __handle_irq_event_percpu+0x50/0x160
[ 55.203141] [<90000000002b3138>] handle_irq_event_percpu+0x18/0x80
[ 55.235064] [<90000000002b9d54>] handle_percpu_irq+0x54/0xa0
[ 55.266241] [<90000000002b2168>] generic_handle_domain_irq+0x28/0x40
[ 55.298466] [<9000000000aba95c>] handle_cpu_irq+0x5c/0xa0
[ 55.329749] [<90000000012f7270>] handle_loongarch_irq+0x30/0x60
[ 55.361476] [<90000000012f733c>] do_vint+0x9c/0x100
[ 55.391737] [<9000000000332100>] kgdb_cpu_enter+0x540/0x640
[ 55.422440] [<9000000000332b64>] kgdb_handle_exception+0x104/0x180
[ 55.452911] [<9000000000232478>] kgdb_loongarch_notify+0x38/0xa0
[ 55.481964] [<900000000026b4d4>] notify_die+0x94/0x100
[ 55.509184] [<90000000012f685c>] do_bp+0x21c/0x340
[ 55.562475] [<90000000003315b8>] kgdb_compiled_break+0x0/0x28
[ 55.590319] [<9000000000332e80>] kgdb_register_io_module+0x160/0x1c0
[ 55.618901] [<9000000000c0f514>] configure_kgdboc+0x154/0x1c0
[ 55.647034] [<9000000000c0f5e0>] kgdboc_probe+0x60/0x80
[ 55.674647] [<9000000000c96da8>] platform_probe+0x68/0x100
[ 55.702613] [<9000000000c938e0>] really_probe+0xc0/0x340
[ 55.730528] [<9000000000c93be4>] __driver_probe_device+0x84/0x140
[ 55.759615] [<9000000000c93cdc>] driver_probe_device+0x3c/0x120
[ 55.787990] [<9000000000c93e8c>] __device_attach_driver+0xcc/0x160
[ 55.817145] [<9000000000c91290>] bus_for_each_drv+0x90/0x100
[ 55.845654] [<9000000000c94328>] __device_attach+0xa8/0x1a0
[ 55.874145] [<9000000000c925f0>] bus_probe_device+0xb0/0xe0
[ 55.902572] [<9000000000c8ec7c>] device_add+0x65c/0x860
[ 55.930635] [<9000000000c96704>] platform_device_add+0x124/0x2c0
[ 55.959669] [<9000000001452b38>] init_kgdboc+0x58/0xa0
[ 55.987677] [<900000000022015c>] do_one_initcall+0x7c/0x1e0
[ 56.016134] [<9000000001420f1c>] kernel_init_freeable+0x22c/0x2a0
[ 56.045128] [<90000000012f923c>] kernel_init+0x20/0x124
Currently rcu_cpu_stall_reset() sets rcu_state.jiffies_stall to one check period later, i.e. jiffies + rcu_jiffies_till_stall_check(). But jiffies is only updated in the timer interrupt, so by the time kgdb_cpu_enter() begins to run, jiffies may already lag real time by nearly one RCU check period. Since all interrupts are disabled during kgdb_cpu_enter(), jiffies is not updated there either. As a result, when kgdb_cpu_enter() returns, rcu_state.jiffies_stall may already have timed out.

We could set rcu_state.jiffies_stall to two RCU check periods later, e.g. jiffies + (rcu_jiffies_till_stall_check() * 2), in rcu_cpu_stall_reset() to avoid this problem. But that is not a complete solution, because kgdb may spend a very long time in irq-disabled context.

Instead, updating jiffies at the beginning of rcu_cpu_stall_reset() solves all of these problems.
Would it make sense for there to be a kgdb_cpu_exit()? In that case, the stalls could simply be suppressed at the beginning of the debug session and re-enabled upon exit, as is currently done for sysrq output via rcu_sysrq_start() and rcu_sysrq_end().
Thank you for your advice, but that doesn't help: rcu_sysrq_start() and rcu_sysrq_end() suppress the warnings *during* sysrq, whereas kgdb already produces no warnings during kgdb_cpu_enter(), since it runs in irq-disabled context. This patch instead wants to suppress the warnings that occur *after* kgdb_cpu_enter() returns, caused by a very old jiffies value.
Hello, Huacai
Is it possible to set rcu_cpu_stall_suppress to true in dbg_touch_watchdogs(), and to reset rcu_cpu_stall_suppress at the beginning and end of the RCU grace period?
This is possible, but not the best option: 1) kgdb is not the only caller of rcu_cpu_stall_reset(); 2) it is difficult to find the "end" at which to reset rcu_cpu_stall_suppress.
Or, setting rcupdate.rcu_cpu_stall_suppress_at_boot=1 in the boot args can suppress RCU stalls during boot.
This is also possible, but it suppresses all kinds of stall warnings, which is not what we want.
Huacai
Thanks Zqiang
Huacai
Thanx, Paul
Cc: stable@vger.kernel.org
Fixes: a80be428fbc1f1f3bc9ed924 ("rcu: Do not disable GP stall detection in rcu_cpu_stall_reset()")
Reported-by: Binbin Zhou <zhoubinbin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
---
 kernel/rcu/tree_stall.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index b10b8349bb2a..1c7b540985bf 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -153,6 +153,7 @@ static void panic_on_rcu_stall(void)
  */
 void rcu_cpu_stall_reset(void)
 {
+	do_update_jiffies_64(ktime_get());
 	WRITE_ONCE(rcu_state.jiffies_stall,
 		   jiffies + rcu_jiffies_till_stall_check());
 }
--
2.39.3
Hi, Qiang,
On Wed, Aug 16, 2023 at 11:16 AM Z qiang qiang.zhang1211@gmail.com wrote:
You can replace rcu_state.jiffies_stall update by setting rcu_cpu_stall_suppress in rcu_cpu_stall_reset(), and reset rcu_cpu_stall_suppress in rcu_gp_init() and rcu_gp_cleanup().
Thanks Zqiang
Hi, Qiang,
On Wed, Aug 16, 2023 at 1:09 PM Z qiang qiang.zhang1211@gmail.com wrote:
What's the advantage compared with updating jiffies? Updating jiffies seems more straightforward.
Huacai
Hi, Qiang,
On Wed, Aug 16, 2023 at 1:09 PM Z qiang qiang.zhang1211@gmail.com wrote:
In do_update_jiffies_64() we need to acquire the jiffies_lock raw spinlock, and, as you said, kgdb is not the only caller of rcu_cpu_stall_reset(): rcu_cpu_stall_reset() may also be invoked in NMI context (arch/x86/platform/uv/uv_nmi.c).
Thanks Zqiang
Because rcu_sysrq_start() and rcu_sysrq_end() try to suppress the warnings during sysrq, but kgdb already has no warnings during kgdb_cpu_enter() since it is executed in irq disabled context. Instead, this patch wants to suppress the warnings *after* kgdb_cpu_enter() due to a very old jiffies value.
Hello, Huacai
Is it possible to set rcu_cpu_stall_suppress to true in dbg_touch_watchdogs() and reset rcu_cpu_stall_suppress at the beginning and end of the RCU grace period?
This is possible but not the best: (1) kgdb is not the only caller of rcu_cpu_stall_reset(); (2) it is difficult to find the "end" at which to reset rcu_cpu_stall_suppress.
You can replace the rcu_state.jiffies_stall update by setting rcu_cpu_stall_suppress in rcu_cpu_stall_reset(), and reset rcu_cpu_stall_suppress in rcu_gp_init() and rcu_gp_cleanup().
What's the advantage compared with updating jiffies? Updating jiffies seems more straightforward.
do_update_jiffies_64() needs to acquire the jiffies_lock raw spinlock, and, as you said, kgdb is not the only caller of rcu_cpu_stall_reset(): rcu_cpu_stall_reset() may also be invoked in NMI context (arch/x86/platform/uv/uv_nmi.c).
Resetting rcu_cpu_stall_suppress in rcu_gp_init()/rcu_gp_cleanup() still does not look good to me, because in most cases it performs a useless operation. Moreover, the RCU core is refactored again and again, so something may change in the future.
If do_update_jiffies_64() cannot be used in NMI context, can we consider my old method [1]? https://lore.kernel.org/rcu/CAAhV-H7j9Y=VvRLm8thLw-EX1PGqBA9YfT4G1AN7ucYS=iP...
Of course we should set rcu_state.jiffies_stall large enough, so we can do like this:
 void rcu_cpu_stall_reset(void)
 {
 	WRITE_ONCE(rcu_state.jiffies_stall,
-		   jiffies + rcu_jiffies_till_stall_check());
+		   jiffies + 300 * HZ);
 }
300s is the largest timeout value, and I think 300s is enough here in practice.
Huacai
Thanks Zqiang
Huacai
Thanks Zqiang
Alternatively, setting rcupdate.rcu_cpu_stall_suppress_at_boot=1 in the boot arguments can suppress RCU stall warnings during boot.
This is also possible, but it suppresses all kinds of stall warnings, which is not what we want.
Huacai
Thanks Zqiang
Huacai
>
> 							Thanx, Paul
>
> > Cc: stable@vger.kernel.org
> > Fixes: a80be428fbc1f1f3bc9ed924 ("rcu: Do not disable GP stall detection in rcu_cpu_stall_reset()")
> > Reported-by: Binbin Zhou <zhoubinbin@loongson.cn>
> > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> > ---
> >  kernel/rcu/tree_stall.h | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> > index b10b8349bb2a..1c7b540985bf 100644
> > --- a/kernel/rcu/tree_stall.h
> > +++ b/kernel/rcu/tree_stall.h
> > @@ -153,6 +153,7 @@ static void panic_on_rcu_stall(void)
> >   */
> >  void rcu_cpu_stall_reset(void)
> >  {
> > +	do_update_jiffies_64(ktime_get());
> >  	WRITE_ONCE(rcu_state.jiffies_stall,
> >  		   jiffies + rcu_jiffies_till_stall_check());
> >  }
> > --
> > 2.39.3
If do_update_jiffies_64() cannot be used in NMI context, can we
What about updating jiffies in dbg_touch_watchdogs or adding a wrapper which updates both jiffies and jiffies_stall?
consider my old method [1]? https://lore.kernel.org/rcu/CAAhV-H7j9Y=VvRLm8thLw-EX1PGqBA9YfT4G1AN7ucYS=iP...
Hi, Alan,
On Wed, Aug 16, 2023 at 11:57 PM Alan Huang mmpgouride@gmail.com wrote:
What about updating jiffies in dbg_touch_watchdogs or adding a wrapper which updates both jiffies and jiffies_stall?
This can solve the kgdb problem, but I found that most callers of rcu_cpu_stall_reset() are in irq disabled context so they may meet similar problems. Modifying rcu_cpu_stall_reset() can solve all of them.
But due to the NMI issue, from my point of view, setting jiffies_stall to jiffies + 300*HZ is the best solution now. :)
Huacai
This can solve the kgdb problem, but I found that most callers of rcu_cpu_stall_reset() are in irq disabled context so they may meet
The duration of other contexts where interrupts are disabled may not be as long as in the case of kgdb?
similar problems. Modifying rcu_cpu_stall_reset() can solve all of them.
But due to the NMI issue, from my point of view, setting jiffies_stall to jiffies + 300*HZ is the best solution now. :)
If I understand correctly, the NMI issue is the deadlock issue? If so, given the short duration of the other irq-disabled contexts, it should be OK to just update jiffies in dbg_touch_watchdogs().
Please correct me if anything is wrong. :)
Hi, Alan,
On Thu, Aug 17, 2023 at 12:52 AM Alan Huang mmpgouride@gmail.com wrote:
The duration of other contexts where interrupts are disabled may not be as long as in the case of kgdb?
similar problems. Modifying rcu_cpu_stall_reset() can solve all of them.
But due to the NMI issue, from my point of view, setting jiffies_stall to jiffies + 300*HZ is the best solution now. :)
If I understand correctly, the NMI issue is the deadlock issue? If so, given the short duration of the other irq-disabled contexts, it should be OK to just update jiffies in dbg_touch_watchdogs().
Please correct me if anything is wrong. :)
The timeout value can be configured as short as 3 seconds; in that case the other callers may also have problems.
Huacai
On Wed, Aug 16 2023 at 23:56, Alan Huang wrote:
If do_update_jiffies_64() cannot be used in NMI context, can we
What about updating jiffies in dbg_touch_watchdogs or adding a wrapper which updates both jiffies and jiffies_stall?
What makes dbg_touch_watchdogs() any different?
KGDB can have a breakpoint pretty much anywhere, and therefore also within the jiffies_lock held region.
Thanks,
tglx
On Aug 16, 2023, at 8:29 AM, Huacai Chen chenhuacai@kernel.org wrote:
Hi, Qiang,
On Wed, Aug 16, 2023 at 6:06 PM Z qiang qiang.zhang1211@gmail.com wrote:
Hi, Qiang,
On Wed, Aug 16, 2023 at 1:09 PM Z qiang qiang.zhang1211@gmail.com wrote:
Hi, Qiang,
On Wed, Aug 16, 2023 at 11:16 AM Z qiang qiang.zhang1211@gmail.com wrote:
> > Hi, Paul, > > On Tue, Aug 15, 2023 at 12:15 AM Paul E. McKenney paulmck@kernel.org wrote: >> >> On Mon, Aug 14, 2023 at 10:00:45AM +0800, Huacai Chen wrote: >>> The KGDB initial breakpoint gets an rcu stall warning after commit >>> a80be428fbc1f1f3bc9ed924 ("rcu: Do not disable GP stall detection in >>> rcu_cpu_stall_reset()"). >>> >>> [ 53.452051] rcu: INFO: rcu_preempt self-detected stall on CPU >>> [ 53.487950] rcu: 3-...0: (1 ticks this GP) idle=0e2c/1/0x4000000000000000 softirq=375/375 fqs=8 >>> [ 53.528243] rcu: (t=12297 jiffies g=-995 q=1 ncpus=4) >>> [ 53.564840] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc2+ #4848 >>> [ 53.603005] Hardware name: Loongson Loongson-3A5000-HV-7A2000-1w-V0.1-CRB/Loongson-LS3A5000-7A2000-1w-CRB-V1.21, BIOS Loongson-UDK2018-V2.0.05099-beta8 08 >>> [ 53.682062] pc 9000000000332100 ra 90000000003320f4 tp 90000001000a0000 sp 90000001000a3710 >>> [ 53.724934] a0 9000000001d4b488 a1 0000000000000000 a2 0000000000000001 a3 0000000000000000 >>> [ 53.768179] a4 9000000001d526c8 a5 90000001000a38f0 a6 000000000000002c a7 0000000000000000 >>> [ 53.810751] t0 00000000000002b0 t1 0000000000000004 t2 900000000131c9c0 t3 fffffffffffffffa >>> [ 53.853249] t4 0000000000000080 t5 90000001002ac190 t6 0000000000000004 t7 9000000001912d58 >>> [ 53.895684] t8 0000000000000000 u0 90000000013141a0 s9 0000000000000028 s0 9000000001d512f0 >>> [ 53.937633] s1 9000000001d51278 s2 90000001000a3798 s3 90000000019fc410 s4 9000000001d4b488 >>> [ 53.979486] s5 9000000001d512f0 s6 90000000013141a0 s7 0000000000000078 s8 9000000001d4b450 >>> [ 54.021175] ra: 90000000003320f4 kgdb_cpu_enter+0x534/0x640 >>> [ 54.060150] ERA: 9000000000332100 kgdb_cpu_enter+0x540/0x640 >>> [ 54.098347] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) >>> [ 54.136621] PRMD: 0000000c (PPLV0 +PIE +PWE) >>> [ 54.172192] EUEN: 00000000 (-FPE -SXE -ASXE -BTE) >>> [ 54.207838] ECFG: 00071c1c (LIE=2-4,10-12 VS=7) >>> [ 54.242503] ESTAT: 00000800 [INT] (IS=11 ECode=0 
EsubCode=0) >>> [ 54.277996] PRID: 0014c011 (Loongson-64bit, Loongson-3A5000-HV) >>> [ 54.313544] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc2+ #4848 >>> [ 54.430170] Stack : 0072617764726148 0000000000000000 9000000000223504 90000001000a0000 >>> [ 54.472308] 9000000100073a90 9000000100073a98 0000000000000000 9000000100073bd8 >>> [ 54.514413] 9000000100073bd0 9000000100073bd0 9000000100073a00 0000000000000001 >>> [ 54.556018] 0000000000000001 9000000100073a98 99828271f24e961a 90000001002810c0 >>> [ 54.596924] 0000000000000001 0000000000010003 0000000000000000 0000000000000001 >>> [ 54.637115] ffff8000337cdb80 0000000000000001 0000000006360000 900000000131c9c0 >>> [ 54.677049] 0000000000000000 0000000000000000 90000000017b4c98 9000000001912000 >>> [ 54.716394] 9000000001912f68 9000000001913000 9000000001912f70 00000000000002b0 >>> [ 54.754880] 90000000014a8840 0000000000000000 900000000022351c 0000000000000000 >>> [ 54.792372] 00000000000002b0 000000000000000c 0000000000000000 0000000000071c1c >>> [ 54.829302] ... 
[ 54.859163] Call Trace:
[ 54.859165] [<900000000022351c>] show_stack+0x5c/0x180
[ 54.918298] [<90000000012f6100>] dump_stack_lvl+0x60/0x88
[ 54.949251] [<90000000012dd5d8>] rcu_dump_cpu_stacks+0xf0/0x148
[ 54.981116] [<90000000002d2fb8>] rcu_sched_clock_irq+0xb78/0xe60
[ 55.012744] [<90000000002e47cc>] update_process_times+0x6c/0xc0
[ 55.044169] [<90000000002f65d4>] tick_sched_timer+0x54/0x100
[ 55.075488] [<90000000002e5174>] __hrtimer_run_queues+0x154/0x240
[ 55.107347] [<90000000002e6288>] hrtimer_interrupt+0x108/0x2a0
[ 55.139112] [<9000000000226418>] constant_timer_interrupt+0x38/0x60
[ 55.170749] [<90000000002b3010>] __handle_irq_event_percpu+0x50/0x160
[ 55.203141] [<90000000002b3138>] handle_irq_event_percpu+0x18/0x80
[ 55.235064] [<90000000002b9d54>] handle_percpu_irq+0x54/0xa0
[ 55.266241] [<90000000002b2168>] generic_handle_domain_irq+0x28/0x40
[ 55.298466] [<9000000000aba95c>] handle_cpu_irq+0x5c/0xa0
[ 55.329749] [<90000000012f7270>] handle_loongarch_irq+0x30/0x60
[ 55.361476] [<90000000012f733c>] do_vint+0x9c/0x100
[ 55.391737] [<9000000000332100>] kgdb_cpu_enter+0x540/0x640
[ 55.422440] [<9000000000332b64>] kgdb_handle_exception+0x104/0x180
[ 55.452911] [<9000000000232478>] kgdb_loongarch_notify+0x38/0xa0
[ 55.481964] [<900000000026b4d4>] notify_die+0x94/0x100
[ 55.509184] [<90000000012f685c>] do_bp+0x21c/0x340
[ 55.562475] [<90000000003315b8>] kgdb_compiled_break+0x0/0x28
[ 55.590319] [<9000000000332e80>] kgdb_register_io_module+0x160/0x1c0
[ 55.618901] [<9000000000c0f514>] configure_kgdboc+0x154/0x1c0
[ 55.647034] [<9000000000c0f5e0>] kgdboc_probe+0x60/0x80
[ 55.674647] [<9000000000c96da8>] platform_probe+0x68/0x100
[ 55.702613] [<9000000000c938e0>] really_probe+0xc0/0x340
[ 55.730528] [<9000000000c93be4>] __driver_probe_device+0x84/0x140
[ 55.759615] [<9000000000c93cdc>] driver_probe_device+0x3c/0x120
[ 55.787990] [<9000000000c93e8c>] __device_attach_driver+0xcc/0x160
[ 55.817145] [<9000000000c91290>] bus_for_each_drv+0x90/0x100
[ 55.845654] [<9000000000c94328>] __device_attach+0xa8/0x1a0
[ 55.874145] [<9000000000c925f0>] bus_probe_device+0xb0/0xe0
[ 55.902572] [<9000000000c8ec7c>] device_add+0x65c/0x860
[ 55.930635] [<9000000000c96704>] platform_device_add+0x124/0x2c0
[ 55.959669] [<9000000001452b38>] init_kgdboc+0x58/0xa0
[ 55.987677] [<900000000022015c>] do_one_initcall+0x7c/0x1e0
[ 56.016134] [<9000000001420f1c>] kernel_init_freeable+0x22c/0x2a0
[ 56.045128] [<90000000012f923c>] kernel_init+0x20/0x124

Currently rcu_cpu_stall_reset() sets rcu_state.jiffies_stall to one check
period later, i.e. jiffies + rcu_jiffies_till_stall_check(). But jiffies is
only updated in the timer interrupt, so when kgdb_cpu_enter() begins to run,
nearly one full check period may have already elapsed since the last jiffies
update. Since all interrupts are disabled during kgdb_cpu_enter(), jiffies is
not updated there either. So when kgdb_cpu_enter() returns,
rcu_state.jiffies_stall may already have timed out.

We could set rcu_state.jiffies_stall to two check periods later, e.g. jiffies
+ (rcu_jiffies_till_stall_check() * 2), in rcu_cpu_stall_reset() to avoid
this problem. But that is not a complete solution, because kgdb may spend a
very long time in irq-disabled context.

Instead, updating jiffies at the beginning of rcu_cpu_stall_reset() solves
all variants of the problem.

Would it make sense for there to be a kgdb_cpu_exit()? In that case, the
stalls could simply be suppressed at the beginning of the debug session and
re-enabled upon exit, as is currently done for sysrq output via
rcu_sysrq_start() and rcu_sysrq_end().

Thank you for your advice, but that doesn't help. rcu_sysrq_start() and
rcu_sysrq_end() suppress the warnings *during* sysrq, but kgdb already gets
no warnings during kgdb_cpu_enter(), since it executes in irq-disabled
context. Instead, this patch wants to suppress the warnings *after*
kgdb_cpu_enter() returns, which are caused by a very stale jiffies value.
Hello, Huacai,
Is it possible to set rcu_cpu_stall_suppress to true in dbg_touch_watchdogs(), and reset rcu_cpu_stall_suppress at the beginning and end of the RCU grace period?
This is possible but not the best: 1. kgdb is not the only caller of rcu_cpu_stall_reset(); 2. it is difficult to find the "end" at which to reset rcu_cpu_stall_suppress.
You can replace rcu_state.jiffies_stall update by setting rcu_cpu_stall_suppress in rcu_cpu_stall_reset(), and reset rcu_cpu_stall_suppress in rcu_gp_init() and rcu_gp_cleanup().
What's the advantage compared with updating jiffies? Updating jiffies seems more straightforward.
do_update_jiffies_64() needs to acquire the jiffies_lock raw spinlock. And, like you said, kgdb is not the only caller of rcu_cpu_stall_reset(); rcu_cpu_stall_reset() may also be invoked in NMI context (arch/x86/platform/uv/uv_nmi.c).
Resetting rcu_cpu_stall_suppress in rcu_gp_init()/rcu_gp_cleanup() still doesn't look good to me, because it performs a useless operation in most cases. Moreover, the RCU core is refactored again and again, so something may change in the future.
If do_update_jiffies_64() cannot be used in NMI context,
Can you not make the jiffies update conditional on whether it is called within NMI context?
can we consider my old method [1]? https://lore.kernel.org/rcu/CAAhV-H7j9Y=VvRLm8thLw-EX1PGqBA9YfT4G1AN7ucYS=iP...
Of course we should set rcu_state.jiffies_stall to a large enough value, so we can do it like this:

 void rcu_cpu_stall_reset(void)
 {
 	WRITE_ONCE(rcu_state.jiffies_stall,
-		   jiffies + rcu_jiffies_till_stall_check());
+		   jiffies + 300 * HZ);
 }
300s is the largest timeout value, and I think 300s is enough here in practice.
I dislike that..
Thanks,
- Joel
Huacai
Thanks Zqiang
Or setting rcupdate.rcu_cpu_stall_suppress_at_boot=1 in bootargs can suppress RCU stall warnings during booting.
This is also possible, but it suppresses all kinds of stall warnings, which is not what we want.
Huacai

Thanx, Paul

Cc: stable@vger.kernel.org
Fixes: a80be428fbc1f1f3bc9ed924 ("rcu: Do not disable GP stall detection in rcu_cpu_stall_reset()")
Reported-by: Binbin Zhou <zhoubinbin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
---
 kernel/rcu/tree_stall.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index b10b8349bb2a..1c7b540985bf 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -153,6 +153,7 @@ static void panic_on_rcu_stall(void)
  */
 void rcu_cpu_stall_reset(void)
 {
+	do_update_jiffies_64(ktime_get());
 	WRITE_ONCE(rcu_state.jiffies_stall,
 		   jiffies + rcu_jiffies_till_stall_check());
 }
--
2.39.3
Hi, Joel,
On Thu, Aug 17, 2023 at 3:27 AM Joel Fernandes joel@joelfernandes.org wrote:
Is this acceptable?

void rcu_cpu_stall_reset(void)
{
	unsigned long delta;

	delta = nsecs_to_jiffies(ktime_get_ns() - ktime_get_coarse_ns());

	WRITE_ONCE(rcu_state.jiffies_stall,
		   jiffies + delta + rcu_jiffies_till_stall_check());
}

This can update jiffies_stall without updating jiffies (but has the same effect).
Huacai
On Thu, Aug 17 2023 at 16:06, Huacai Chen wrote:
On Thu, Aug 17, 2023 at 3:27 AM Joel Fernandes joel@joelfernandes.org wrote:
If do_update_jiffies_64() cannot be used in NMI context,
Can you not make the jiffies update conditional on whether it is called within NMI context?
Which solves what? If KGDB has a breakpoint in the jiffies_lock-held region then you still deadlock.
Now you traded the potential deadlock on jiffies_lock for a potential livelock vs. tk_core.seq. Not really an improvement, right?
The only way you can do the above is something like the incomplete and uncompiled below. NMI safe and therefore livelock proof time interfaces exist for a reason.
Thanks,
tglx

---
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -51,6 +51,13 @@ struct tick_sched *tick_get_tick_sched(i
  */
 static ktime_t last_jiffies_update;
 
+unsigned long tick_estimate_stale_jiffies(void)
+{
+	ktime_t delta = ktime_get_mono_fast_ns() - READ_ONCE(last_jiffies_update);
+
+	return delta < 0 ? 0 : div_s64(delta, TICK_NSEC);
+}
+
 /*
  * Must be called with interrupts disabled !
  */
On Thu, Aug 24, 2023 at 12:03:25AM +0200, Thomas Gleixner wrote:
Just for completeness, another approach, with its own advantages and disadvantage, is to add something like ULONG_MAX/4 to rcu_state.jiffies_stall, but also set a counter indicating that this has been done. Then RCU's force-quiescent processing could decrement that counter (if non-zero) and reset rcu_state.jiffies_stall when it does reach zero.
Setting the counter to three should cover most cases, but "live by the heuristic, die by the heuristic". ;-)
It would be good to have some indication when gdb exited, but things like the gdb "next" command can make that "interesting" when applied to a long-running function.
Thanx, Paul
Hi, Paul,
On Thu, Aug 24, 2023 at 6:41 AM Paul E. McKenney paulmck@kernel.org wrote:
The original code added ULONG_MAX/2, so adding ULONG_MAX/4 may not make much difference? The simplest way is adding 300*HZ, but Joel dislikes that.
Huacai
On Thu, Aug 24, 2023 at 10:50:41AM +0800, Huacai Chen wrote:
I am not seeing the ULONG_MAX/2, so could you please point me to that original code?
The advantage of ULONG_MAX/4 over ULONG_MAX/2 is that the time_after() and time_before() macros have ULONG_MAX/4 slop in either direction before giving you the wrong answer. You can get nearly the same result using ULONG_MAX/2, but it requires a bit more care. And even on 32-bit HZ=1000 systems, ULONG_MAX/4 gets you more than 12 days of gdb session or jiffies-update delay before you start getting false positives.
Then things can be reset after (say) 3 calls to rcu_gp_fqs() and also the current reset at the beginning of a grace period, which is in record_gp_stall_check_time().
It would be better if RCU could get notified at both ends of the debug session, but given gdb commands such as "next", along with Thomas's point about gdb breakpoints being pretty much anywhere, this might or might not be so helpful in real life. But worth looking into.
Thanx, Paul
Hi, Paul,
On Thu, Aug 24, 2023 at 7:40 PM Paul E. McKenney paulmck@kernel.org wrote:
Maybe I misunderstand something; by "the original code" I mean the code before commit a80be428fbc1f1f3bc9ed924 ("rcu: Do not disable GP stall detection in rcu_cpu_stall_reset()").
Huacai
On Thu, Aug 24, 2023 at 08:40:00PM +0800, Huacai Chen wrote:
Yes, my suggestion would essentially revert that patch. It would compensate by resetting rcu_state.jiffies_stall after a few calls to rcu_gp_fqs().
Alternatively, we could simply provide a way for gdb users to manually disable RCU CPU stall warnings at the beginning of their debug sessions and to manually re-enable them when they are done.
Thanx, Paul
Hi, Paul,
On Thu, Aug 24, 2023 at 9:24 PM Paul E. McKenney paulmck@kernel.org wrote:
On Thu, Aug 24, 2023 at 08:40:00PM +0800, Huacai Chen wrote:
Hi, Paul, On Thu, Aug 24, 2023 at 7:40 PM Paul E. McKenney paulmck@kernel.org wrote:
On Thu, Aug 24, 2023 at 10:50:41AM +0800, Huacai Chen wrote:
Hi, Paul, On Thu, Aug 24, 2023 at 6:41 AM Paul E. McKenney paulmck@kernel.org wrote:
On Thu, Aug 24, 2023 at 12:03:25AM +0200, Thomas Gleixner wrote:
On Thu, Aug 17 2023 at 16:06, Huacai Chen wrote: > On Thu, Aug 17, 2023 at 3:27 AM Joel Fernandes joel@joelfernandes.org wrote: >> > If do_update_jiffies_64() cannot be used in NMI context, >> >> Can you not make the jiffies update conditional on whether it is >> called within NMI context?
Which solves what? If KGDB has a breakpoint in the jiffies lock held region then you still dead lock.
>> I dislike that.. > Is this acceptable? > > void rcu_cpu_stall_reset(void) > { > unsigned long delta; > > delta = nsecs_to_jiffies(ktime_get_ns() - ktime_get_coarse_ns()); > > WRITE_ONCE(rcu_state.jiffies_stall, > jiffies + delta + rcu_jiffies_till_stall_check()); > } > > This can update jiffies_stall without updating jiffies (but has the > same effect).
Now you traded the potential dead lock on jiffies lock for a potential live lock vs. tk_core.seq. Not really an improvement, right?
The only way you can do the above is something like the incomplete and uncompiled below. NMI safe and therefore livelock proof time interfaces exist for a reason.
Just for completeness, another approach, with its own advantages and disadvantage, is to add something like ULONG_MAX/4 to rcu_state.jiffies_stall, but also set a counter indicating that this has been done. Then RCU's force-quiescent processing could decrement that counter (if non-zero) and reset rcu_state.jiffies_stall when it does reach zero.
Setting the counter to three should cover most cases, but "live by the heuristic, die by the heuristic". ;-)
It would be good to have some indication when gdb exited, but things like the gdb "next" command can make that "interesting" when applied to a long-running function.
The original code adds ULONG_MAX/2, so adding ULONG_MAX/4 may not make much difference? The simplest way is to add 300*HZ, but Joel dislikes that.
I am not seeing the ULONG_MAX/2, so could you please point me to that original code?
Maybe I misunderstand something; by "the original code" I mean the code before commit a80be428fbc1f1f3bc9ed924 ("rcu: Do not disable GP stall detection in rcu_cpu_stall_reset()").
Yes, my suggestion would essentially revert that patch. It would compensate by resetting rcu_state.jiffies_stall after a few calls to rcu_gp_fqs().
Alternatively, we could simply provide a way for gdb users to manually disable RCU CPU stall warnings at the beginning of their debug sessions and to manually re-enable them when they are done.
This problem is not KGDB-specific (though it was first found in the KGDB case), so I want to fix it in the rcu code rather than in the kgdb code.
Huacai
Thanx, Paul
Huacai
The advantage of ULONG_MAX/4 over ULONG_MAX/2 is that the time_after() and time_before() macros have ULONG_MAX/4 slop in either direction before giving you the wrong answer. You can get nearly the same result using ULONG_MAX/2, but it requires a bit more care. And even on 32-bit HZ=1000 systems, ULONG_MAX/4 gets you more than 12 days of gdb session or jiffies-update delay before you start getting false positives.
Then things can be reset after (say) 3 calls to rcu_gp_fqs() and also the current reset at the beginning of a grace period, which is in record_gp_stall_check_time().
On Thu, Aug 24, 2023 at 11:43:04PM +0800, Huacai Chen wrote:
This problem is not KGDB-specific (though it was first found in the KGDB case), so I want to fix it in the rcu code rather than in the kgdb code.
Sure, for example, there is also PowerPC XMON.
But this problem also is not RCU-specific. There are also hardlockups, softlockups, workqueue lockups, networking timeouts, and who knows what all else.
Plus, and again to Thomas's point, gdb breakpoints can happen anywhere. For example, immediately after RCU computes the RCU CPU stall time for a new grace period, and right before it stores it. The gdb callout updates rcu_state.jiffies_stall, but that update is overwritten with a stale value as soon as the system starts back up.
Low probability, to be sure, but there are quite a few places in the kernel right after a read from some timebase or another, and many (perhaps all) of these can see similar stale-time-use problems.
The only way I know of to avoid these sorts of false positives is for the user to manually suppress all timeouts (perhaps using a kernel-boot parameter for your early-boot case), do the gdb work, and then unsuppress all stalls. Even that won't work for networking, because the other system's clock will be running throughout.
In other words, from what I know now, there is no perfect solution. Therefore, there are sharp limits to the complexity of any solution that I will be willing to accept.
Thanx, Paul
Hi, Paul,
On Fri, Aug 25, 2023 at 2:28 AM Paul E. McKenney paulmck@kernel.org wrote:
In other words, from what I know now, there is no perfect solution. Therefore, there are sharp limits to the complexity of any solution that I will be willing to accept.
I think the simplest solution is (I hope Joel will not be angry):
void rcu_cpu_stall_reset(void)
{
	WRITE_ONCE(rcu_state.jiffies_stall, jiffies + 300*HZ);
}
300s is the upper limit of "stall timeout" we can configure (RCU_CPU_STALL_TIMEOUT in kernel/rcu/Kconfig.debug), so it isn't just a "magic number". In practice, 300s is also enough for any normal kgdb operation. And compared to "resetting after a few calls to rcu_gp_fqs()", this simple solution means "automatically resetting after 300s".
If this is completely unacceptable, I prefer Thomas's tick_estimate_stale_jiffies() solution.
Huacai
On Fri, Aug 25, 2023 at 07:15:44PM +0800, Huacai Chen wrote:
I think the simplest solution is (I hope Joel will not be angry):
Not angry at all, just want to help. ;-). The problem is that the 300*HZ solution will also affect the VM workloads, which do a similar reset. Allow me a few days to see if I can take a shot at fixing it slightly differently. I am trying Paul's idea of setting jiffies at a later time. I think it is doable. I think the advantage of doing this is that it will make stall detection more robust in the face of these gaps in the jiffies update. And that solution does not even need us to rely on ktime (and all the issues that come with that).
thanks,
- Joel
On Fri, Aug 25, 2023 at 7:28 PM Joel Fernandes joel@joelfernandes.org wrote:
I wrote a patch similar to Paul's idea and sent it out for review, the advantage being it purely is based on jiffies. Could you try it out and let me know?
thanks,
- Joel
Hi, Joel,
On Sun, Aug 27, 2023 at 11:27 AM Joel Fernandes joel@joelfernandes.org wrote:
On Fri, Aug 25, 2023 at 7:28 PM Joel Fernandes joel@joelfernandes.org wrote:
On Fri, Aug 25, 2023 at 07:15:44PM +0800, Huacai Chen wrote:
Hi, Paul,
On Fri, Aug 25, 2023 at 2:28 AM Paul E. McKenney paulmck@kernel.org wrote:
[..]
> > > > > > On Thu, Aug 17, 2023 at 3:27 AM Joel Fernandes joel@joelfernandes.org wrote: > > > > > >> > If do_update_jiffies_64() cannot be used in NMI context, > > > > > >> > > > > > >> Can you not make the jiffies update conditional on whether it is > > > > > >> called within NMI context? > > > > > > > > > > Which solves what? If KGDB has a breakpoint in the jiffies lock held > > > > > region then you still dead lock. > > > > > > > > > > >> I dislike that.. > > > > > > Is this acceptable? > > > > > > > > > > > > void rcu_cpu_stall_reset(void) > > > > > > { > > > > > > unsigned long delta; > > > > > > > > > > > > delta = nsecs_to_jiffies(ktime_get_ns() - ktime_get_coarse_ns()); > > > > > > > > > > > > WRITE_ONCE(rcu_state.jiffies_stall, > > > > > > jiffies + delta + rcu_jiffies_till_stall_check()); > > > > > > } > > > > > > > > > > > > This can update jiffies_stall without updating jiffies (but has the > > > > > > same effect). > > > > > > > > > > Now you traded the potential dead lock on jiffies lock for a potential > > > > > live lock vs. tk_core.seq. Not really an improvement, right? > > > > > > > > > > The only way you can do the above is something like the incomplete and > > > > > uncompiled below. NMI safe and therefore livelock proof time interfaces > > > > > exist for a reason. > > > > > > > > Just for completeness, another approach, with its own advantages > > > > and disadvantage, is to add something like ULONG_MAX/4 to > > > > rcu_state.jiffies_stall, but also set a counter indicating that this > > > > has been done. Then RCU's force-quiescent processing could decrement > > > > that counter (if non-zero) and reset rcu_state.jiffies_stall when it > > > > does reach zero. > > > > > > > > Setting the counter to three should cover most cases, but "live by the > > > > heuristic, die by the heuristic". 
;-) > > > > > > > > It would be good to have some indication when gdb exited, but things > > > > like the gdb "next" command can make that "interesting" when applied to > > > > a long-running function. > > > > > > The original code is adding ULONG_MAX/2, so adding ULONG_MAX/4 may > > > make no much difference? The simplest way is adding 300*HZ, but Joel > > > dislikes that. > > > > I am not seeing the ULONG_MAX/2, so could you please point me to that > > original code? > > Maybe I misunderstand something, I say the original code means code > before commit a80be428fbc1f1f3bc9ed924 ("rcu: Do not disable GP stall > detection in rcu_cpu_stall_reset()").
Yes, my suggestion would essentially revert that patch. It would compensate by resetting rcu_state.jiffies_stall after a few calls to rcu_gp_fqs().
Alternatively, we could simply provide a way for gdb users to manually disable RCU CPU stall warnings at the beginning of their debug sessions and to manually re-enable them when they are done.
This problem is not KGDB-specific (though it is firstly found in the KGDB case), so I want to fix it in the rcu code rather than in the kgdb code.
Sure, for example, there is also PowerPC XMON.
But this problem also is not RCU-specific. There are also hardlockups, softlockups, workqueue lockups, networking timeouts, and who knows what all else.
Plus, and again to Thomas's point, gdb breakpoints can happen anywhere. For example, immediately after RCU computes the RCU CPU stall time for a new grace period, and right before it stores it. The gdb callout updates rcu_state.jiffies_stall, but that update is overwritten with a stale value as soon as the system starts back up.
Low probabillity, to be sure, but there are quite a few places in the kernel right after a read from some timebase or another, and many (perhaps all) of these can see similar stale-time-use problems.
The only way I know of to avoid these sorts of false positives is for the user to manually suppress all timeouts (perhaps using a kernel-boot parameter for your early-boot case), do the gdb work, and then unsuppress all stalls. Even that won't work for networking, because the other system's clock will be running throughout.
In other words, from what I know now, there is no perfect solution. Therefore, there are sharp limits to the complexity of any solution that I will be willing to accept.
I think the simplest solution is (I hope Joel will not be angry):
Not angry at all, just want to help. ;-) The problem is that the 300*HZ solution will also affect VM workloads, which do a similar reset. Allow me a few days to see if I can take a shot at fixing it slightly differently. I am trying Paul's idea of setting jiffies at a later time. I think it is doable. The advantage of doing this is that it will make stall detection more robust in the face of these gaps in jiffies updates. And that solution does not even need us to rely on ktime (and all the issues that come with that).
I wrote a patch similar to Paul's idea and sent it out for review, the advantage being that it is based purely on jiffies. Could you try it out and let me know?
If you can cc my gmail chenhuacai@gmail.com, that could be better.
I have read your patch; maybe the counter (nr_fqs_jiffies_stall) should be atomic_t, and we should use an atomic operation to decrement its value, because rcu_gp_fqs() can be run concurrently and we may miss the (nr_fqs == 1) condition.
Huacai
thanks,
- Joel
On Sun, Aug 27, 2023 at 1:51 AM Huacai Chen chenhuacai@kernel.org wrote: [..]
> > I wrote a patch similar to Paul's idea and sent it out for review, the
> > advantage being it purely is based on jiffies. Could you try it out
> > and let me know?
>
> If you can cc my gmail chenhuacai@gmail.com, that could be better.
Sure, will do.
> I have read your patch, maybe the counter (nr_fqs_jiffies_stall) should
> be atomic_t and we should use atomic operation to decrement its value.
> Because rcu_gp_fqs() can be run concurrently, and we may miss the
> (nr_fqs == 1) condition.
I don't think so. There is only one place where the RMW operation happens, and rcu_gp_fqs() is called only from the GP kthread, so a concurrent RMW (and hence a lost update) is not possible.
Could you test the patch for the issue you are seeing and provide your Tested-by tag? Thanks,
- Joel
On Sun, Aug 27, 2023 at 06:11:40PM -0400, Joel Fernandes wrote:
> > I have read your patch, maybe the counter (nr_fqs_jiffies_stall) should
> > be atomic_t and we should use atomic operation to decrement its value.
> > Because rcu_gp_fqs() can be run concurrently, and we may miss the
> > (nr_fqs == 1) condition.
>
> I don't think so. There is only 1 place where RMW operation happens and
> rcu_gp_fqs() is called only from the GP kthread. So a concurrent RMW
> (and hence a lost update) is not possible.
Huacai, is your concern that the gdb user might have created a script (for example, printing a variable or two, then automatically continuing), so that breakpoints could happen in quick succession, such that the second breakpoint might run concurrently with rcu_gp_fqs()?
If this can really happen, the point that Joel makes is a good one, namely that rcu_gp_fqs() is single-threaded and (absent rcutorture) runs only once every few jiffies. And gdb breakpoints, even with scripting, should also be rather rare. So if this is an issue, a global lock should do the trick, perhaps even one of the existing locks in the rcu_state structure. The result should then be just as performant/scalable and a lot simpler than use of atomics.
> Could you test the patch for the issue you are seeing and provide your
> Tested-by tag? Thanks,
Either way, testing would of course be very good! ;-)
Thanx, Paul
Hi, Paul and Joel,
On Mon, Aug 28, 2023 at 6:47 PM Paul E. McKenney paulmck@kernel.org wrote:
> Huacai, is your concern that the gdb user might have created a script
> (for example, printing a variable or two, then automatically continuing),
> so that breakpoints could happen in quick succession, such that the
> second breakpoint might run concurrently with rcu_gp_fqs()?
>
> If this can really happen, the point that Joel makes is a good one,
> namely that rcu_gp_fqs() is single-threaded and (absent rcutorture) runs
> only once every few jiffies.  And gdb breakpoints, even with scripting,
> should also be rather rare.  So if this is an issue, a global lock should
> do the trick, perhaps even one of the existing locks in the rcu_state
> structure.  The result should then be just as performant/scalable and a
> lot simpler than use of atomics.
Sorry, I made a mistake. Yes, there is no concurrency issue, and this approach probably works. But I have another question: how do we ensure that there is a jiffies update within three calls to rcu_gp_fqs()? Or in other words, is three also a magic number here?
And I rechecked the commit message of a80be428fbc1f1f3bc9e ("rcu: Do not disable GP stall detection in rcu_cpu_stall_reset()"). I don't know why Sergey said that the original code disables stall detection forever; in fact, it only disables detection for the current GP.
Huacai
On Mon, Aug 28, 2023 at 07:30:43PM +0800, Huacai Chen wrote:
> Sorry, I made a mistake. Yes, there is no concurrent issue, and this
> approach probably works. But I have another problem: how to ensure that
> there is a jiffies update in three calls to rcu_gp_fqs()? Or in other
> word, is three also a magic number here?
Each of the three calls to rcu_gp_fqs() involves a wakeup of, and a context switch to, RCU's grace-period kthread, each of which should be sufficient to update jiffies if it is initially in an out-of-date state. The three is to some extent magic, the idea being to avoid a situation where a currently running FQS re-enables stall warnings immediately after gdb disables them.
Obviously, if your testing shows that some other value works better, please do let us know so that we can update! But we have to start somewhere.
> And I rechecked the commit message of a80be428fbc1f1f3bc9e ("rcu: Do not
> disable GP stall detection in rcu_cpu_stall_reset()"). I don't know why
> Sergey said that the original code disables stall-detection forever, in
> fact it only disables the detection in the current GP.
Well, it does disable stall detection forever in the case where the current grace period lasts forever, which if I recall correctly was the case that Sergey was encountering.
Thanx, Paul
On Mon, Aug 28, 2023 at 03:47:12AM -0700, Paul E. McKenney wrote:
On Sun, Aug 27, 2023 at 06:11:40PM -0400, Joel Fernandes wrote:
> Huacai, is your concern that the gdb user might have created a script
> (for example, printing a variable or two, then automatically continuing),
> so that breakpoints could happen in quick succession, such that the
> second breakpoint might run concurrently with rcu_gp_fqs()?
>
> If this can really happen, the point that Joel makes is a good one,
> namely that rcu_gp_fqs() is single-threaded and (absent rcutorture) runs
> only once every few jiffies.  And gdb breakpoints, even with scripting,
> should also be rather rare.  So if this is an issue, a global lock should
> do the trick, perhaps even one of the existing locks in the rcu_state
> structure.  The result should then be just as performant/scalable and a
> lot simpler than use of atomics.
Thanks Paul and Huacai. I was also thinking that, in the event of such a concurrent breakpoint stalling jiffies updates while the GP thread / rcu_gp_fqs() chugs along, we could make the patch more robust for that situation as follows (diff on top of the previous patch [1]). Thoughts?
Also, if someone sets a breakpoint right after the "nr_fqs == 1" check, then they are kind of asking for it anyway, since the GP kthread getting stalled is an actual reason for RCU stalls (in fact, rcutorture even has a test mode for it :P), and as such the false positive may not be that false. ;-)
Btw apologies for forgetting to CC Thomas on [1] since he is involved in the timekeeping discussions. I relied on "git send-email" to populate the Cc list but did not add Cc: to the patch.
[1] https://lore.kernel.org/all/20230827025349.4161262-1-joel@joelfernandes.org/
---8<-----------------------
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 9273f2318ea1..ffb165a2ef41 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1559,13 +1559,15 @@ static void rcu_gp_fqs(bool first_time)
 	WRITE_ONCE(rcu_state.n_force_qs, rcu_state.n_force_qs + 1);
 
 	WARN_ON_ONCE(nr_fqs > 3);
-	if (nr_fqs) {
+	/* Only countdown nr_fqs for stall purposes if jiffies moves. */
+	if (nr_fqs && jiffies != READ_ONCE(rcu_state.jiffies_last_fqs)) {
 		if (nr_fqs == 1) {
 			WRITE_ONCE(rcu_state.jiffies_stall,
 				   jiffies + rcu_jiffies_till_stall_check());
 		}
 		WRITE_ONCE(rcu_state.nr_fqs_jiffies_stall, --nr_fqs);
 	}
+	WRITE_ONCE(rcu_state.jiffies_last_fqs, jiffies);
 
 	if (first_time) {
 		/* Collect dyntick-idle snapshots. */
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index e9821a8422db..72128e348fa1 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -386,6 +386,8 @@ struct rcu_state {
 						/* in jiffies. */
 	unsigned long jiffies_stall;		/* Time at which to check */
 						/*  for CPU stalls. */
+	unsigned long jiffies_last_fqs;		/* jiffies value at last FQS.
+						   to confirm jiffies moves. */
 	int nr_fqs_jiffies_stall;		/* Number of fqs loops after
 						 * which read jiffies and set
 						 * jiffies_stall. Stall
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index a2fa6b22e248..0ddd22afbc3a 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -160,6 +160,7 @@ void rcu_cpu_stall_reset(void)
 {
 	WRITE_ONCE(rcu_state.nr_fqs_jiffies_stall, 3);
 	WRITE_ONCE(rcu_state.jiffies_stall, ULONG_MAX);
+	WRITE_ONCE(rcu_state.jiffies_last_fqs, 0);
 }
 
 //////////////////////////////////////////////////////////////////////////////
@@ -177,6 +178,7 @@ static void record_gp_stall_check_time(void)
 	smp_mb(); // ->gp_start before ->jiffies_stall and caller's ->gp_seq.
 	WRITE_ONCE(rcu_state.nr_fqs_jiffies_stall, 0);
 	WRITE_ONCE(rcu_state.jiffies_stall, j + j1);
+	WRITE_ONCE(rcu_state.jiffies_last_fqs, 0);
 	rcu_state.jiffies_resched = j + j1 / 2;
 	rcu_state.n_force_qs_gpstart = READ_ONCE(rcu_state.n_force_qs);
 }
On Mon, Aug 28, 2023 at 01:33:48PM +0000, Joel Fernandes wrote:
> Also if someone sets a breakpoint right after the "nr_fqs == 1" check,
> then they are kind of asking for it anyway since the GP kthread getting
> stalled is an actual reason for RCU stalls (infact rcutorture has a test
> mode for it even :P) and as such the false-positive may not be that
> false. ;-)
That would indeed be asking for it. But then again, they might have set a breakpoint elsewhere that had the unintended side-effect of catching the RCU grace-period kthread right at that point.
If that isn't something we are worried about, your original is fine. If it is something we are worried about, I recommend learning from my RCU CPU stall warning experiences and just using a lock. ;-)
Thanx, Paul
On Mon, Aug 28, 2023 at 10:02 PM Paul E. McKenney paulmck@kernel.org wrote:
> That would indeed be asking for it. But then again, they might have set
> a breakpoint elsewhere that had the unintended side-effect of catching
> the RCU grace-period kthread right at that point.
>
> If that isn't something we are worried about, your original is fine. If
> it is something we are worried about, I recommend learning from my RCU
> CPU stall warning experiences and just using a lock. ;-)
I also think the original patch should be OK, but I have another question: what will happen if the current GP ends before nr_fqs_jiffies_stall reaches zero?
Huacai
On Mon, Aug 28, 2023 at 10:37 AM Huacai Chen chenhuacai@kernel.org wrote:
On Mon, Aug 28, 2023 at 10:02 PM Paul E. McKenney paulmck@kernel.org wrote:
On Mon, Aug 28, 2023 at 01:33:48PM +0000, Joel Fernandes wrote:
On Mon, Aug 28, 2023 at 03:47:12AM -0700, Paul E. McKenney wrote:
On Sun, Aug 27, 2023 at 06:11:40PM -0400, Joel Fernandes wrote:
On Sun, Aug 27, 2023 at 1:51 AM Huacai Chen chenhuacai@kernel.org wrote: [..]
> > > > The only way I know of to avoid these sorts of false positives is for > > > > the user to manually suppress all timeouts (perhaps using a kernel-boot > > > > parameter for your early-boot case), do the gdb work, and then unsuppress > > > > all stalls. Even that won't work for networking, because the other > > > > system's clock will be running throughout. > > > > > > > > In other words, from what I know now, there is no perfect solution. > > > > Therefore, there are sharp limits to the complexity of any solution that > > > > I will be willing to accept. > > > I think the simplest solution is (I hope Joel will not angry): > > > > Not angry at all, just want to help. ;-). The problem is the 300*HZ solution > > will also effect the VM workloads which also do a similar reset. Allow me few > > days to see if I can take a shot at fixing it slightly differently. I am > > trying Paul's idea of setting jiffies at a later time. I think it is doable. > > I think the advantage of doing this is it will make stall detection more > > robust in this face of these gaps in jiffie update. And that solution does > > not even need us to rely on ktime (and all the issues that come with that). > > > > I wrote a patch similar to Paul's idea and sent it out for review, the > advantage being it purely is based on jiffies. Could you try it out > and let me know? If you can cc my gmail chenhuacai@gmail.com, that could be better.
Sure, will do.
I have read your patch. Maybe the counter (nr_fqs_jiffies_stall) should be atomic_t, and we should use an atomic operation to decrement its value, because rcu_gp_fqs() can be run concurrently and we may miss the (nr_fqs == 1) condition.
I don't think so. There is only one place where the RMW operation happens, and rcu_gp_fqs() is called only from the GP kthread, so a concurrent RMW (and hence a lost update) is not possible.
Huacai, is your concern that the gdb user might have created a script (for example, printing a variable or two, then automatically continuing), so that breakpoints could happen in quick succession, such that the second breakpoint might run concurrently with rcu_gp_fqs()?
If this can really happen, the point that Joel makes is a good one, namely that rcu_gp_fqs() is single-threaded and (absent rcutorture) runs only once every few jiffies. And gdb breakpoints, even with scripting, should also be rather rare. So if this is an issue, a global lock should do the trick, perhaps even one of the existing locks in the rcu_state structure. The result should then be just as performant/scalable and a lot simpler than use of atomics.
Thanks Paul and Huacai. I was also thinking that, in the event of such a concurrent breakpoint stalling jiffies updates while the GP thread / rcu_gp_fqs() chugs along, we could make the patch more robust for that situation as follows (diff on top of the previous patch [1]). Thoughts?
Also, if someone sets a breakpoint right after the "nr_fqs == 1" check, then they are kind of asking for it anyway, since the GP kthread getting stalled is an actual reason for RCU stalls (in fact, rcutorture even has a test mode for it :P), and as such the false positive may not be that false. ;-)
[Paul] That would indeed be asking for it. But then again, they might have set a breakpoint elsewhere that had the unintended side-effect of catching the RCU grace-period kthread right at that point.
If that isn't something we are worried about, your original is fine. If it is something we are worried about, I recommend learning from my RCU CPU stall warning experiences and just using a lock. ;-)
This sounds good to me.
[Huacai] I also think the original patch should be OK, but I have another question: what will happen if the current GP ends before nr_fqs_jiffies_stall reaches zero?
Nothing should happen. Stall detection only happens when a GP is in progress. If a new GP starts, it resets nr_fqs_jiffies_stall.
Or can you elaborate your concern more?
Thanks.
Hi, Joel,
OK, I will test your patch these days. Maybe putting nr_fqs_jiffies_stall before jiffies_force_qs is better, because I think putting an 'int' between two 'long's wastes space. :)
Huacai
Thanks.
Hi Huacai,
On Mon, Aug 28, 2023 at 11:13 AM Huacai Chen chenhuacai@kernel.org wrote:
[...]
OK, I will test your patch these days. Maybe putting nr_fqs_jiffies_stall before jiffies_force_qs is better, because I think putting an 'int' between two 'long' is wasting space. :)
That's a good point and I'll look into that.
Meanwhile I pushed the patch out to my 6.4 stable tree for testing on my fleet.
Ideally, I'd like to change the stall detection test in rcutorture to actually fail rcutorture if stalls don't happen in time. But at least I verified this manually using rcutorture.
I should also add a documentation patch for stallwarn.rst to document the understandable sensitivity of RCU stall detection to jiffies updates (or lack thereof). Or if you have time, I'd appreciate support on such a patch (not mandatory but I thought it would not hurt to ask).
Looking forward to how your testing goes as well!
thanks,
- Joel
Hi, Joel,
Another point, is it better to replace ULONG_MAX with ULONG_MAX/4 as Paul suggested?
I have tested, it works for KGDB.
Huacai
On Tue, Aug 29, 2023 at 12:08 AM Huacai Chen chenhuacai@kernel.org wrote:
Another point, is it better to replace ULONG_MAX with ULONG_MAX/4 as Paul suggested?
I could do that but I don't feel too strongly about it. I will keep it at ULONG_MAX if it's OK with everyone.
I have tested, it works for KGDB.
Thanks! If you don't mind, I will add your Tested-by tag to the patch and send it out soon. My tests also look good!
- Joel
On Tue, Aug 29, 2023 at 10:46 PM Joel Fernandes joel@joelfernandes.org wrote:
Thanks! If you don't mind, I will add your Tested-by tag to the patch and send it out soon. My tests also look good!
You can add my Tested-by, but Reported-by should be "Binbin Zhou zhoubinbin@loongson.cn"
Huacai
On Wed, Aug 30, 2023 at 12:25:56PM +0800, Huacai Chen wrote:
You can add my Tested-by, but Reported-by should be "Binbin Zhou zhoubinbin@loongson.cn"
Thanks, if/when Paul takes it, he could kindly change the Reported-by, or I could.
thanks,
- Joel
On Mon, Aug 28, 2023 at 11:12:49PM +0800, Huacai Chen wrote:
Maybe putting nr_fqs_jiffies_stall before jiffies_force_qs is better, because I think putting an 'int' between two 'long' is wasting space. :)
Though it is a decent suggestion, moving it before jiffies_force_qs does not make the structure smaller. Trying to outsmart the compiler does not seem like a good idea when it is not actually packing the structure.
Besides, I am not too worried about 4-byte holes, considering the structure is full of them and that ofl_lock is ____cacheline_internodealigned_in_smp, which adds a giant 55-byte hole to the structure. That should keep you up at night if you are worried about 4-byte holes.
And the total size of the structure on a 64-bit build is 3776 bytes so 4 bytes is about 1/10th of a percent. Puny.
So I'd rather leave nr_fqs_jiffies_stall where it is especially considering it is more readable where it is. :-)
thanks,
- Joel
On Fri, Aug 25, 2023 at 07:15:44PM +0800, Huacai Chen wrote:
Hi, Paul,
On Fri, Aug 25, 2023 at 2:28 AM Paul E. McKenney paulmck@kernel.org wrote:
On Thu, Aug 24, 2023 at 11:43:04PM +0800, Huacai Chen wrote:
Hi, Paul,
On Thu, Aug 24, 2023 at 9:24 PM Paul E. McKenney paulmck@kernel.org wrote:
On Thu, Aug 24, 2023 at 08:40:00PM +0800, Huacai Chen wrote:
Hi, Paul,
On Thu, Aug 24, 2023 at 7:40 PM Paul E. McKenney paulmck@kernel.org wrote:
On Thu, Aug 24, 2023 at 10:50:41AM +0800, Huacai Chen wrote:

[Huacai] If do_update_jiffies_64() cannot be used in NMI context,

[Joel] Can you not make the jiffies update conditional on whether it is called within NMI context?

[Thomas] Which solves what? If KGDB has a breakpoint in the jiffies lock held region then you still deadlock.

[Joel] I dislike that..

[Huacai] Is this acceptable?

void rcu_cpu_stall_reset(void)
{
	unsigned long delta;

	delta = nsecs_to_jiffies(ktime_get_ns() - ktime_get_coarse_ns());

	WRITE_ONCE(rcu_state.jiffies_stall,
		   jiffies + delta + rcu_jiffies_till_stall_check());
}

This can update jiffies_stall without updating jiffies (but has the same effect).

[Thomas] Now you traded the potential deadlock on the jiffies lock for a potential livelock vs. tk_core.seq. Not really an improvement, right? The only way you can do the above is something like the incomplete and uncompiled patch below. NMI-safe and therefore livelock-proof time interfaces exist for a reason.

[Paul] Just for completeness, another approach, with its own advantages and disadvantages, is to add something like ULONG_MAX/4 to rcu_state.jiffies_stall, but also set a counter indicating that this has been done. Then RCU's force-quiescent processing could decrement that counter (if non-zero) and reset rcu_state.jiffies_stall when it does reach zero. Setting the counter to three should cover most cases, but "live by the heuristic, die by the heuristic". ;-)

It would be good to have some indication when gdb exited, but things like the gdb "next" command can make that "interesting" when applied to a long-running function.

[Huacai] The original code is adding ULONG_MAX/2, so adding ULONG_MAX/4 may not make much difference? The simplest way is adding 300*HZ, but Joel dislikes that.
I am not seeing the ULONG_MAX/2, so could you please point me to that original code?
Maybe I misunderstand something; by "the original code" I mean the code before commit a80be428fbc1f1f3bc9ed924 ("rcu: Do not disable GP stall detection in rcu_cpu_stall_reset()").
Yes, my suggestion would essentially revert that patch. It would compensate by resetting rcu_state.jiffies_stall after a few calls to rcu_gp_fqs().
Alternatively, we could simply provide a way for gdb users to manually disable RCU CPU stall warnings at the beginning of their debug sessions and to manually re-enable them when they are done.
This problem is not KGDB-specific (though it was first found in the KGDB case), so I want to fix it in the RCU code rather than in the kgdb code.
Sure, for example, there is also PowerPC XMON.
But this problem also is not RCU-specific. There are also hardlockups, softlockups, workqueue lockups, networking timeouts, and who knows what all else.
Plus, and again to Thomas's point, gdb breakpoints can happen anywhere. For example, immediately after RCU computes the RCU CPU stall time for a new grace period, and right before it stores it. The gdb callout updates rcu_state.jiffies_stall, but that update is overwritten with a stale value as soon as the system starts back up.
Low probability, to be sure, but there are quite a few places in the kernel right after a read from some timebase or another, and many (perhaps all) of these can see similar stale-time-use problems.
The only way I know of to avoid these sorts of false positives is for the user to manually suppress all timeouts (perhaps using a kernel-boot parameter for your early-boot case), do the gdb work, and then unsuppress all stalls. Even that won't work for networking, because the other system's clock will be running throughout.
In other words, from what I know now, there is no perfect solution. Therefore, there are sharp limits to the complexity of any solution that I will be willing to accept.
I think the simplest solution is (I hope Joel will not be angry):
void rcu_cpu_stall_reset(void)
{
	WRITE_ONCE(rcu_state.jiffies_stall, jiffies + 300*HZ);
}
300s is the upper limit of "stall timeout" we can configure (RCU_CPU_STALL_TIMEOUT in kernel/rcu/Kconfig.debug), so it isn't just a "magic number". In practice, 300s is also enough for any normal kgdb operation. And compared to "resetting after a few calls to rcu_gp_fqs()", this simple solution means "automatically resetting after 300s".
Please keep in mind that the long-ago choice of 300s did not take things like kernel debuggers into account. But that 300s limit still makes sense in the absence of things like kernel debuggers. So this code is what takes up the difference.
If this is completely unacceptable, I prefer Thomas's tick_estimate_stale_jiffies() solution.
Thomas's tick_estimate_stale_jiffies() does have its merits, but the advantage of ULONG_MAX/4 is that you don't have to care about jiffies being stale.
Thanx, Paul
The advantage of ULONG_MAX/4 over ULONG_MAX/2 is that the time_after() and time_before() macros have ULONG_MAX/4 slop in either direction before giving you the wrong answer. You can get nearly the same result using ULONG_MAX/2, but it requires a bit more care. And even on 32-bit HZ=1000 systems, ULONG_MAX/4 gets you more than 12 days of gdb session or jiffies-update delay before you start getting false positives.
Then things can be reset after (say) 3 calls to rcu_gp_fqs() and also the current reset at the beginning of a grace period, which is in record_gp_stall_check_time().
It would be better if RCU could get notified at both ends of the debug session, but given gdb commands such as "next", along with Thomas's point about gdb breakpoints being pretty much anywhere, this might or might not be so helpful in real life. But worth looking into.
Thanx, Paul
On Thu, Aug 24, 2023 at 04:40:42AM -0700, Paul E. McKenney wrote:
On Thu, Aug 24, 2023 at 10:50:41AM +0800, Huacai Chen wrote:
Hi, Paul,
On Thu, Aug 24, 2023 at 6:41 AM Paul E. McKenney paulmck@kernel.org wrote:
On Thu, Aug 24, 2023 at 12:03:25AM +0200, Thomas Gleixner wrote:
On Thu, Aug 17 2023 at 16:06, Huacai Chen wrote:
On Thu, Aug 17, 2023 at 3:27 AM Joel Fernandes joel@joelfernandes.org wrote:
> If do_update_jiffies_64() cannot be used in NMI context,
Can you not make the jiffies update conditional on whether it is called within NMI context?
Which solves what? If KGDB has a breakpoint in the jiffies lock held region then you still dead lock.
I dislike that..
Is this acceptable?
void rcu_cpu_stall_reset(void)
{
	unsigned long delta;

	delta = nsecs_to_jiffies(ktime_get_ns() - ktime_get_coarse_ns());

	WRITE_ONCE(rcu_state.jiffies_stall,
		   jiffies + delta + rcu_jiffies_till_stall_check());
}
This can update jiffies_stall without updating jiffies (but has the same effect).
Now you traded the potential dead lock on jiffies lock for a potential live lock vs. tk_core.seq. Not really an improvement, right?
The only way you can do the above is something like the incomplete and uncompiled below. NMI safe and therefore livelock proof time interfaces exist for a reason.
Just for completeness, another approach, with its own advantages and disadvantage, is to add something like ULONG_MAX/4 to rcu_state.jiffies_stall, but also set a counter indicating that this has been done. Then RCU's force-quiescent processing could decrement that counter (if non-zero) and reset rcu_state.jiffies_stall when it does reach zero.
Setting the counter to three should cover most cases, but "live by the heuristic, die by the heuristic". ;-)
It would be good to have some indication when gdb exited, but things like the gdb "next" command can make that "interesting" when applied to a long-running function.
The original code is adding ULONG_MAX/2, so adding ULONG_MAX/4 may not make much difference? The simplest way is adding 300*HZ, but Joel dislikes that.
I am not seeing the ULONG_MAX/2, so could you please point me to that original code?
The advantage of ULONG_MAX/4 over ULONG_MAX/2 is that the time_after() and time_before() macros have ULONG_MAX/4 slop in either direction before giving you the wrong answer. You can get nearly the same result using ULONG_MAX/2, but it requires a bit more care. And even on 32-bit HZ=1000 systems, ULONG_MAX/4 gets you more than 12 days of gdb session or jiffies-update delay before you start getting false positives.
Then things can be reset after (say) 3 calls to rcu_gp_fqs() and also the current reset at the beginning of a grace period, which is in record_gp_stall_check_time().
I like Paul's suggestion a lot, except that if someone sets a breakpoint right when jiffies is being reset, we have to come back to doing Thomas's suggestion.
So maybe a combination of Paul's and Thomas's suggestions (using last_jiffies_update with the NMI-safe timestamp read) might work.
It would be better if RCU could get notified at both ends of the debug session, but given gdb commands such as "next", along with Thomas's point about gdb breakpoints being pretty much anywhere, this might or might not be so helpful in real life. But worth looking into.
True. I was curious whether rcu_cpu_stall_reset() can also be called on a tickless kernel before jiffies gets a chance to update, in which case I think your suggestion of biasing the stall time and later resetting it would help a lot in such situations.
thanks,
- Joel
Thanks,
tglx
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -51,6 +51,13 @@ struct tick_sched *tick_get_tick_sched(i
  */
 static ktime_t last_jiffies_update;
 
+unsigned long tick_estimate_stale_jiffies(void)
+{
+	ktime_t delta = ktime_get_mono_fast_ns() - READ_ONCE(last_jiffies_update);
+
+	return delta < 0 ? 0 : div_s64(delta, TICK_NSEC);
+}
+
 /*
  * Must be called with interrupts disabled !
  */
On Thu, Aug 24, 2023 at 01:09:42PM +0000, Joel Fernandes wrote:
I like Paul's suggestion a lot except that if someone sets a breakpoint right when the jiffies is being reset, so then we have to come back to doing Thomas's suggestion.
Please note that ULONG_MAX / 4 allows for jiffies not having been reset for more than 10 days on 32-bit systems and for many millions of years on 64-bit systems. ;-)
So maybe a combination of Paul's and Thomas's suggestions (of using last_jiffies_update with the NMI-safe timestamp read) may work.
I am absolutely not a fan of reworking all of the RCU CPU stall-warning code to use some other timebase, at least not without a very important reason to do so. Nothing mentioned in this thread even comes close to that level of importance.
It would be better if RCU could get notified at both ends of the debug session, but given gdb commands such as "next", along with Thomas's point about gdb breakpoints being pretty much anywhere, this might or might not be so helpful in real life. But worth looking into.
True, I was curious if rcu_cpu_stall_reset() can be called on a tickless kernel as well before jiffies gets a chance to update, in which case I think your suggestion of biasing the stall time and later resetting it would help a lot for such situations.
What code path can possibly invoke rcu_cpu_stall_reset() after an extended full-system nohz_full time period without first doing at least one context switch on the CPU that invokes rcu_cpu_stall_reset()?
Thanx, Paul
thanks,
- Joel
Thanx, Paul
Huacai
Thanx, Paul
Thanks,
tglx
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -51,6 +51,13 @@ struct tick_sched *tick_get_tick_sched(i
  */
 static ktime_t last_jiffies_update;
 
+unsigned long tick_estimate_stale_jiffies(void)
+{
+	ktime_t delta = ktime_get_mono_fast_ns() - READ_ONCE(last_jiffies_update);
+
+	return delta < 0 ? 0 : div_s64(delta, TICK_NSEC);
+}
+
 /*
  * Must be called with interrupts disabled !
  */
Hi, Paul,
On Thu, Aug 24, 2023 at 9:28 PM Paul E. McKenney paulmck@kernel.org wrote:
On Thu, Aug 24, 2023 at 01:09:42PM +0000, Joel Fernandes wrote:
On Thu, Aug 24, 2023 at 04:40:42AM -0700, Paul E. McKenney wrote:
On Thu, Aug 24, 2023 at 10:50:41AM +0800, Huacai Chen wrote:
Hi, Paul,
On Thu, Aug 24, 2023 at 6:41 AM Paul E. McKenney paulmck@kernel.org wrote:
On Thu, Aug 24, 2023 at 12:03:25AM +0200, Thomas Gleixner wrote:
On Thu, Aug 17 2023 at 16:06, Huacai Chen wrote:
> On Thu, Aug 17, 2023 at 3:27 AM Joel Fernandes joel@joelfernandes.org wrote:
>> > If do_update_jiffies_64() cannot be used in NMI context,
>>
>> Can you not make the jiffies update conditional on whether it is
>> called within NMI context?

Which solves what? If KGDB has a breakpoint in the jiffies lock held region then you still deadlock.

>> I dislike that..
>
> Is this acceptable?
>
> void rcu_cpu_stall_reset(void)
> {
> 	unsigned long delta;
>
> 	delta = nsecs_to_jiffies(ktime_get_ns() - ktime_get_coarse_ns());
>
> 	WRITE_ONCE(rcu_state.jiffies_stall,
> 		   jiffies + delta + rcu_jiffies_till_stall_check());
> }
>
> This can update jiffies_stall without updating jiffies (but has the
> same effect).
Now you traded the potential dead lock on jiffies lock for a potential live lock vs. tk_core.seq. Not really an improvement, right?
The only way you can do the above is something like the incomplete and uncompiled below. NMI safe and therefore livelock proof time interfaces exist for a reason.
Just for completeness, another approach, with its own advantages and disadvantage, is to add something like ULONG_MAX/4 to rcu_state.jiffies_stall, but also set a counter indicating that this has been done. Then RCU's force-quiescent processing could decrement that counter (if non-zero) and reset rcu_state.jiffies_stall when it does reach zero.
Setting the counter to three should cover most cases, but "live by the heuristic, die by the heuristic". ;-)
It would be good to have some indication when gdb exited, but things like the gdb "next" command can make that "interesting" when applied to a long-running function.
The original code is adding ULONG_MAX/2, so adding ULONG_MAX/4 may not make much of a difference? The simplest way is adding 300*HZ, but Joel dislikes that.
I am not seeing the ULONG_MAX/2, so could you please point me to that original code?
What code path can possibly invoke rcu_cpu_stall_reset() after an extended full-system nohz_full time period without first doing at least one context switch on the CPU that invokes rcu_cpu_stall_reset()?
In my commit message, the "KGDB initial breakpoint" means the automatic call to kgdb_initial_breakpoint() at system boot. In my test:
1. the "stall timeout" is 21s;
2. when I use "continue" to exit kgdb, the "total jiffies delayed time" is ~40s (which of course causes a stall warning);
3. the "irq disabled time" (nearly the same as the execution time of kgdb_cpu_enter()) is ~12s;
4. this means the "jiffies delayed time" due to the tickless mechanism is ~28s.
So, at least in this case, the tickless mechanism contributes significantly to the jiffies delay.
Huacai
On Fri, Aug 25, 2023 at 12:03 AM Huacai Chen chenhuacai@kernel.org wrote:
In my commit message, the "KGDB initial breakpoint" means the automatic call to kgdb_initial_breakpoint() at system boot. In my test: 1, the "stall timeout" is 21s; 2, when I use "continue" to exit kgdb, the "total jiffies delayed time" is ~40s (of course it will cause stall warning); 3, the "irq disabled time" (nearly the same as execution time of kgdb_cpu_enter()) is ~12s; 4, this means the "jiffies delayed time" due to the tickless mechanism is ~28s.
So, at least in this case, the tickless mechanism contributes much for the jiffies delay.
I'm sorry, there is a typo: the "irq disabled time" is ~28s and the "tickless-caused jiffies delayed time" is ~12s in the above test.
Huacai
On Fri, Aug 25, 2023 at 12:03:31AM +0800, Huacai Chen wrote:
In my commit message, the "KGDB initial breakpoint" means the automatic call to kgdb_initial_breakpoint() at system boot. In my test: 1, the "stall timeout" is 21s; 2, when I use "continue" to exit kgdb, the "total jiffies delayed time" is ~40s (of course it will cause stall warning); 3, the "irq disabled time" (nearly the same as execution time of kgdb_cpu_enter()) is ~12s; 4, this means the "jiffies delayed time" due to the tickless mechanism is ~28s.
So, at least in this case, the tickless mechanism contributes much for the jiffies delay.
Checking kgdb_initial_breakpoint() call points:
o From start_kernel(): This is before interrupts have been enabled on the boot CPU, and so is before the first RCU grace period has started. Also before any CPU is able to enter nohz_full mode. So this is presumably not the code path you are hitting.
o From opt_kgdb_wait(): This is an early_param() function, thus also invoked before interrupts have been enabled on the boot CPU.
o From kgdb_register_io_module(): I don't have a good handle on this set of code paths, but I am guessing that this is where you are hitting your breakpoint.
And in the post-boot cases, if there is no RCU grace period in flight, there will be no RCU CPU stall warning. Otherwise, the RCU grace-period kthread should have been running every few jiffies for the duration of the grace period. If it wasn't, for example, due to being starved by the nohz_full user processes running at real-time priority, you would get RCU CPU stall warnings anyway.
If the RCU grace-period kthread is running, then the system is not in pure system-wide nohz_full mode, and the jiffies updates should be happening. This means that even if you were running in this mode for many weeks on a 32-bit system, the jiffies value should be sane.
Which in turn should mean that this should be fixable by adding ULONG_MAX/4 to rcu_state.jiffies_stall in rcu_cpu_stall_reset(), along with a reset via rcu_jiffies_till_stall_check() after a few calls to rcu_gp_fqs().
Or am I missing something subtle here?
Thanx, Paul
Hi, Thomas,
On Thu, Aug 24, 2023 at 6:03 AM Thomas Gleixner tglx@linutronix.de wrote:
Thank you for your advice. The latest proposal is here [1]; it is very similar to your diff, please take a look.
[1] https://lore.kernel.org/rcu/CAAhV-H5mePbbF8Y3t-JfV+PNP8BEcjKtX4UokzL_vDzyw+2...
Huacai
On Thu, Aug 24 2023 at 10:47, Huacai Chen wrote:
On Thu, Aug 24, 2023 at 6:03 AM Thomas Gleixner tglx@linutronix.de wrote:
Thank you for your advice, now the latest proposal is here [1], this is very similar to your diff, please take a look.
Similar by some definition of similar.
Hello Thomas,
On Thu, Aug 24, 2023 at 12:03:25AM +0200, Thomas Gleixner wrote:
Which solves what? If KGDB has a breakpoint in the jiffies lock held region then you still dead lock.
Yes, we had already discussed that a jiffies update is not possible from here. There are too many threads because different patch revisions were being reviewed in different threads.
Now you traded the potential dead lock on jiffies lock for a potential live lock vs. tk_core.seq. Not really an improvement, right?
The only way you can do the above is something like the incomplete and uncompiled below. NMI safe and therefore livelock proof time interfaces exist for a reason.
Yes, I had already mentioned exactly this issue of not using an NMI-safe interface here: https://lore.kernel.org/all/CAEXW_YT+uw5JodtrqjY0B2xx0J8ukF=FAB9-p5rxgWobSU2... I do like your suggestion of using last_jiffies_update, though (which, as you mentioned, needs to be explored more).
There are too many threads, which makes the discussion hard to follow. Huacai, it would be great if we could keep the discussions in the same thread (say, by passing options like --in-reply-to to the "git send-email" command).
thanks,
- Joel
On Thu, Aug 24, 2023 at 01:21:55PM +0000, Joel Fernandes wrote:
Yes, we had already discussed this that jiffies update is not possible from here. There are too many threads since different patch revisions were being reviewed in different threads.
One of the nice properties of the jiffies counter is its trivially provable NMI safety ;-)
Thanx, Paul
Hi, Joel,
On Thu, Aug 24, 2023 at 9:22 PM Joel Fernandes joel@joelfernandes.org wrote:
There are too many threads which makes the discussion hard to follow. Huacai, it would be great if we can keep the discussions in the same thread (Say for example by passing options like --in-reply-to to "git send-email" command).
I will try my best. Early on, not all of us were involved, and when I thought we were all ready to accept the new solution, I sent a new patch. But unfortunately some others suggested different approaches, which made the patch subject change again and again...
Now I tend to use Thomas's method of introducing and using tick_estimate_stale_jiffies(). But let me wait for some time before sending patches.
Huacai
On Wed, Aug 16 2023 at 18:06, Z. qiang wrote:
On Wed, Aug 16, 2023 at 1:09 PM Z qiang qiang.zhang1211@gmail.com wrote:
You can replace the rcu_state.jiffies_stall update by setting rcu_cpu_stall_suppress in rcu_cpu_stall_reset(), and clear rcu_cpu_stall_suppress in rcu_gp_init() and rcu_gp_cleanup().
What's the advantage compared with updating jiffies? Updating jiffies seems more straightforward.
In do_update_jiffies_64() we need to acquire the jiffies_lock raw spinlock. As you said, kgdb is not the only caller of rcu_cpu_stall_reset(); rcu_cpu_stall_reset() may also be invoked in NMI context (arch/x86/platform/uv/uv_nmi.c).
What's worse is that KGDB can set breakpoints pretty much everywhere and there is no guarantee that the jiffies lock is not held when a breakpoint hits.
Thanks,
tglx