Hi
On 07/31/2018 05:52 PM, Frederic Weisbecker wrote:
Before updating the full nohz tick or the idle time on IRQ exit, we check first if we are not in a nesting interrupt, whether the inner interrupt is a hard or a soft IRQ.
There is a historical reason for that: the dyntick idle mode used to reprogram the tick on IRQ exit, after softirq processing, and there was no point in doing that job in the outer nesting interrupt because the tick update will be performed through the end of the inner interrupt eventually, with even potential new timer updates.
One corner case could show up though: if an idle tick interrupts a softirq executing inline in the idle loop (through a call to local_bh_enable()) after we entered in dynticks mode, the IRQ won't reprogram the tick because it assumes the softirq executes on an inner IRQ-tail. As a result we might put the CPU in sleep mode with the tick completely stopped whereas a timer can still be enqueued. Indeed there is no tick reprogramming in local_bh_enable(). We probably asssumed there was no bh disabled section in idle, although there didn't seem to be debug code ensuring that.
Nowadays the nesting interrupt optimization still stands but only concern full dynticks. The tick is stopped on IRQ exit in full dynticks mode and we want to wait for the end of the inner IRQ to reprogramm the tick. But in_interrupt() doesn't make a difference between softirqs executing on IRQ tail and those executing inline. What was to be considered a corner case in dynticks-idle mode now becomes a serious opportunity for a bug in full dynticks mode: if a tick interrupts a task executing softirq inline, the tick reprogramming will be ignored and we may exit to userspace after local_bh_enable() with an enqueued timer that will never fire.
To fix this, simply keep reprogramming the tick if we are in a hardirq interrupting softirq. We can still figure out a way later to restore this optimization while excluding inline softirq processing.
Reported-by: Anna-Maria Gleixner anna-maria@linutronix.de Signed-off-by: Frederic Weisbecker frederic@kernel.org Cc: Thomas Gleixner tglx@linutronix.de Cc: Ingo Molnar mingo@kernel.org Tested-by: Anna-Maria Gleixner anna-maria@linutronix.de
kernel/softirq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/softirq.c b/kernel/softirq.c index 900dcfe..0980a81 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -386,7 +386,7 @@ static inline void tick_irq_exit(void) /* Make sure that timer wheel updates are propagated */ if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
if (!in_interrupt())
} #endifif (!in_irq()) tick_nohz_irq_exit();
This patch was back ported to the Stable linux-4.14.y and It causes regression - flood of "NOHZ: local_softirq_pending" messages on all TI boards during boot (NFS boot):
[ 4.179796] NOHZ: local_softirq_pending 2c2 in sirq 256 [ 4.185051] NOHZ: local_softirq_pending 2c2 in sirq 256
the same is not reproducible with LKML - seems due to changes in tick-sched.c __tick_nohz_idle_enter()/tick_nohz_irq_exit().
I've generated backtrace from can_stop_idle_tick() (see below) and seems this patch makes tick_nohz_irq_exit() call unconditional in case of nested interrupt:
gic_handle_irq |- irq_exit |- preempt_count_sub(HARDIRQ_OFFSET); <-- [1] |-__do_softirq <irqs enabled> |- gic_handle_irq() |- irq_exit() |- tick_irq_exit() if (!in_irq()) <-- My understanding is that this condition will be always true due to [1] tick_nohz_irq_exit(); |-__tick_nohz_idle_enter() |- can_stop_idle_tick()
Sry, not sure if my conclusion is right and how can it be fixed.
[ 3.842320] NOHZ: local_softirq_pending 40 in sirq 256 [ 3.847485] ------------[ cut here ]------------ [ 3.852133] WARNING: CPU: 0 PID: 0 at kernel/time/tick-sched.c:915 __tick_nohz_idle_enter+0x4b8/0x568 [ 3.861393] Modules linked in: [ 3.864469] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.66-01768-gc26f664-dirty #311 [ 3.872506] Hardware name: Generic DRA74X (Flattened Device Tree) [ 3.878623] Backtrace: [ 3.881091] [<c010c050>] (dump_backtrace) from [<c010c320>] (show_stack+0x18/0x1c) [ 3.888696] r7:00000009 r6:600f0193 r5:00000000 r4:c0c5fca4 [ 3.894386] [<c010c308>] (show_stack) from [<c07ad028>] (dump_stack+0x8c/0xa0) [ 3.901645] [<c07acf9c>] (dump_stack) from [<c012e558>] (__warn+0xec/0x104) [ 3.908638] r7:00000009 r6:c0996d08 r5:00000000 r4:00000000 [ 3.914329] [<c012e46c>] (__warn) from [<c012e628>] (warn_slowpath_null+0x28/0x30) [ 3.921933] r9:00000000 r8:e4e1f7de r7:c0c8c1d8 r6:c0c65180 r5:00000000 r4:eed408e8 [ 3.929715] [<c012e600>] (warn_slowpath_null) from [<c01a9780>] (__tick_nohz_idle_enter+0x4b8/0x568) [ 3.938890] [<c01a92c8>] (__tick_nohz_idle_enter) from [<c01a9c74>] (tick_nohz_irq_exit+0x2c/0x30) [ 3.947890] r10:c0c01f50 r9:c0c00000 r8:ee008000 r7:00000000 r6:c0c01ee0 r5:00000048 [ 3.955752] r4:c0b62afc [ 3.958301] [<c01a9c48>] (tick_nohz_irq_exit) from [<c0133748>] (irq_exit+0xf4/0x144) [ 3.966173] [<c0133654>] (irq_exit) from [<c0181e6c>] (__handle_domain_irq+0x68/0xbc) [ 3.974043] [<c0181e04>] (__handle_domain_irq) from [<c0101508>] (gic_handle_irq+0x44/0x80) [ 3.982432] r9:c0c00000 r8:fa213000 r7:fa212000 r6:c0c01dd0 r5:fa21200c r4:c0c03ff0 [ 3.990211] [<c01014c4>] (gic_handle_irq) from [<c010cf4c>] (__irq_svc+0x6c/0xa8) [ 3.997726] Exception stack(0xc0c01dd0 to 0xc0c01e18) [ 4.002800] 1dc0: 00000000 c0c65180 00000000 00000000 [ 4.011017] 1de0: 00000280 00000013 c0c00000 00000000 ee008000 c0c00000 c0c01f50 c0c01e7c [ 4.019232] 1e00: c0c01e80 c0c01e20 c0133730 c01015dc 600f0113 ffffffff [ 4.025877] r9:c0c00000 r8:ee008000 r7:c0c01e04 r6:ffffffff r5:600f0113 r4:c01015dc [ 4.033659] [<c0101548>] (__do_softirq) from [<c0133730>] (irq_exit+0xdc/0x144) [ 4.041002] r10:c0c01f50 r9:c0c00000 r8:ee008000 r7:00000000 r6:00000000 r5:00000013 [ 4.048863] r4:c0b62afc [ 4.051414] [<c0133654>] (irq_exit) from [<c0181e6c>] (__handle_domain_irq+0x68/0xbc) [ 4.059280] [<c0181e04>] (__handle_domain_irq) from [<c0101508>] (gic_handle_irq+0x44/0x80) [ 4.067670] r9:c0c00000 r8:fa213000 r7:fa212000 r6:c0c01ee0 r5:fa21200c r4:c0c03ff0 [ 4.075448] [<c01014c4>] (gic_handle_irq) from [<c010cf4c>] (__irq_svc+0x6c/0xa8) [ 4.082963] Exception stack(0xc0c01ee0 to 0xc0c01f28) [ 4.088041] 1ee0: 00000001 00000000 fe600000 00000000 c0c00000 c0c03cc8 c0c03c68 c0b623b8 [ 4.096258] 1f00: 00000000 00000000 c0c01f50 c0c01f3c c0c01f1c c0c01f30 c0120f14 c0108ae4 [ 4.104469] 1f20: 600f0013 ffffffff [ 4.107975] r9:c0c00000 r8:00000000 r7:c0c01f14 r6:ffffffff r5:600f0013 r4:c0108ae4 [ 4.115760] [<c0108abc>] (arch_cpu_idle) from [<c07c6a14>] (default_idle_call+0x28/0x34) [ 4.123894] [<c07c69ec>] (default_idle_call) from [<c016e4a4>] (do_idle+0x180/0x214) [ 4.131676] [<c016e324>] (do_idle) from [<c016e7fc>] (cpu_startup_entry+0x20/0x24) [ 4.139283] r10:effff7c0 r9:c0b48a30 r8:c0c64000 r7:c0c03c40 r6:ffffffff r5:00000002 [ 4.147146] r4:000000be [ 4.149698] [<c016e7dc>] (cpu_startup_entry) from [<c07c12e4>] (rest_init+0xd8/0xdc) [ 4.157483] [<c07c120c>] (rest_init) from [<c0b00d8c>] (start_kernel+0x3cc/0x3d8) [ 4.164997] r5:c0c64000 r4:c0c6404c [ 4.168592] [<c0b009c0>] (start_kernel) from [<8000807c>] (0x8000807c) [ 4.175148] ---[ end trace 9c10a64bf81ad3fe ]---