The flapping-irq detector still has a timebomb.
A pathological workload, or test script, can arm the spurious-irq timebomb described in 4f27c00bf80f ("Improve behaviour of spurious IRQ detect")
This leads to irqs being moved the much slower polled mode, despite the actual unhandled-irq rate being well under the 99.9k/100k threshold that the code appears to check.
How? - Queued completion handler, like nvme, servicing events as they appear in the queue, even if the irq corresponding to the event has not yet been seen.
- queues frequently empty, so seeing "spurious" irqs whenever the last events of a threaded handler's while (events_queued()) process_them(); ends with those events' irqs posted while thread was scanning. In this case the while() has consumed last event(s), so next handler says IRQ_NONE.
- In each run of "unhandled" irqs, exactly one IRQ_NONE response is promoted from IRQ_NONE to IRQ_HANDLED, by note_interrupt()'s SPURIOUS_DEFERRED logic.
- Any 2+ unhandled-irq runs will increment irqs_unhandled. The time_after() check in note_interrupt() resets irqs_unhandled to 1 after an idle period, but if irqs are never spaced more than HZ/10 apart, irqs_unhandled keeps growing.
- During processing of long completion queues, the non-threaded handlers will return IRQ_WAKE_THREAD, for potentially thousands of per-event irqs. These bypass note_interrupt()'s irq_count++ logic, so do not count as handled, and do not invoke the flapping-irq logic.
- When the _counted_ irq_count reaches the 100k threshold, it's possible for irqs_unhandled > 99.9k to force a move to polling mode, even though many millions of _WAKE_THREAD irqs have been handled without being counted.
Solution: include IRQ_WAKE_THREAD events in irq_count. Only when IRQ_NONE responses outweigh (IRQ_HANDLED + IRQ_WAKE_THREAD) by the old 99:1 ratio will an irq be moved to polling mode.
Fixes: 4f27c00bf80f ("Improve behaviour of spurious IRQ detect") Cc: stable@vger.kernel.org Signed-off-by: Pete Swain swine@google.com --- kernel/irq/spurious.c | 68 +++++++++++++++++++++---------------------- 1 file changed, 34 insertions(+), 34 deletions(-)
diff --git a/kernel/irq/spurious.c b/kernel/irq/spurious.c index 02b2daf07441..ac596c8dc4b1 100644 --- a/kernel/irq/spurious.c +++ b/kernel/irq/spurious.c @@ -321,44 +321,44 @@ void note_interrupt(struct irq_desc *desc, irqreturn_t action_ret) */ if (!(desc->threads_handled_last & SPURIOUS_DEFERRED)) { desc->threads_handled_last |= SPURIOUS_DEFERRED; - return; - } - /* - * Check whether one of the threaded handlers - * returned IRQ_HANDLED since the last - * interrupt happened. - * - * For simplicity we just set bit 31, as it is - * set in threads_handled_last as well. So we - * avoid extra masking. And we really do not - * care about the high bits of the handled - * count. We just care about the count being - * different than the one we saw before. - */ - handled = atomic_read(&desc->threads_handled); - handled |= SPURIOUS_DEFERRED; - if (handled != desc->threads_handled_last) { - action_ret = IRQ_HANDLED; - /* - * Note: We keep the SPURIOUS_DEFERRED - * bit set. We are handling the - * previous invocation right now. - * Keep it for the current one, so the - * next hardware interrupt will - * account for it. - */ - desc->threads_handled_last = handled; } else { /* - * None of the threaded handlers felt - * responsible for the last interrupt + * Check whether one of the threaded handlers + * returned IRQ_HANDLED since the last + * interrupt happened. * - * We keep the SPURIOUS_DEFERRED bit - * set in threads_handled_last as we - * need to account for the current - * interrupt as well. + * For simplicity we just set bit 31, as it is + * set in threads_handled_last as well. So we + * avoid extra masking. And we really do not + * care about the high bits of the handled + * count. We just care about the count being + * different than the one we saw before. */ - action_ret = IRQ_NONE; + handled = atomic_read(&desc->threads_handled); + handled |= SPURIOUS_DEFERRED; + if (handled != desc->threads_handled_last) { + action_ret = IRQ_HANDLED; + /* + * Note: We keep the SPURIOUS_DEFERRED + * bit set. We are handling the + * previous invocation right now. + * Keep it for the current one, so the + * next hardware interrupt will + * account for it. + */ + desc->threads_handled_last = handled; + } else { + /* + * None of the threaded handlers felt + * responsible for the last interrupt + * + * We keep the SPURIOUS_DEFERRED bit + * set in threads_handled_last as we + * need to account for the current + * interrupt as well. + */ + action_ret = IRQ_NONE; + } } } else { /*