This patch series backports a bunch of patches related IRQ handling with respect to freeing the irq line while IRQ is in flight at CPU or at the hardware level. Recently we saw this issue in serial 8250 driver where the IRQ was being freed while the irq was in flight or not yet delivered to the CPU. As a result the irqchip was going into a wedged state and IRQ was not getting delivered to the cpu. These patches helped fixed the issue in 4.14 kernel. Let us know if more patches need backporting.
Lukas Wunner (2): genirq: Update code comments wrt recycled thread_mask genirq: Synchronize only with single thread on free_irq()
Thomas Gleixner (4): genirq: Delay deactivation in free_irq() genirq: Fix misleading synchronize_irq() documentation genirq: Add optional hardware synchronization for shutdown x86/ioapic: Implement irq_get_irqchip_state() callback
arch/x86/kernel/apic/io_apic.c | 46 ++++++++++++++ kernel/irq/autoprobe.c | 6 +- kernel/irq/chip.c | 6 ++ kernel/irq/cpuhotplug.c | 2 +- kernel/irq/internals.h | 5 ++ kernel/irq/manage.c | 106 ++++++++++++++++++++++----------- 6 files changed, 133 insertions(+), 38 deletions(-)
From: Lukas Wunner lukas@wunner.de
commit 836557bd58e5e65c05c73af9f6ebed885dbfccfc upstream.
Previously a race existed between __free_irq() and __setup_irq() wherein the thread_mask of a just removed action could be handed out to a newly added action and the freed irq thread would then tread on the oneshot mask bit of the newly added irq thread in irq_finalize_oneshot():
time | __free_irq() | raw_spin_lock_irqsave(&desc->lock, flags); | <remove action from linked list> | raw_spin_unlock_irqrestore(&desc->lock, flags); | | __setup_irq() | raw_spin_lock_irqsave(&desc->lock, flags); | <traverse linked list to determine oneshot mask bit> | raw_spin_unlock_irqrestore(&desc->lock, flags); | | irq_thread() of freed irq (__free_irq() waits in synchronize_irq()) | irq_thread_fn() | irq_finalize_oneshot() | raw_spin_lock_irq(&desc->lock); | desc->threads_oneshot &= ~action->thread_mask; | raw_spin_unlock_irq(&desc->lock); v
The race was known at least since 2012 when it was documented in a code comment by commit e04268b0effc ("genirq: Remove paranoid warnons and bogus fixups"). The race itself is harmless as nothing touches any of the potentially freed data after synchronize_irq().
In 2017 the race was close by commit 9114014cf4e6 ("genirq: Add mutex to irq desc to serialize request/free_irq()"), apparently inadvertantly so because the race is neither mentioned in the commit message nor was the code comment updated. Make up for that.
Signed-off-by: Lukas Wunner lukas@wunner.de Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Bjorn Helgaas bhelgaas@google.com Cc: Mika Westerberg mika.westerberg@linux.intel.com Cc: linux-pci@vger.kernel.org Link: https://lkml.kernel.org/r/32fc25aa35ecef4b2692f57687bb7fc2a57230e2.152982829... Signed-off-by: Rishabh Bhatnagar risbhat@amazon.com --- kernel/irq/manage.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c index 914b43f2255b..cb35db00fdf4 100644 --- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c @@ -1030,10 +1030,7 @@ static int irq_thread(void *data) * This is the regular exit path. __free_irq() is stopping the * thread via kthread_stop() after calling * synchronize_irq(). So neither IRQTF_RUNTHREAD nor the - * oneshot mask bit can be set. We cannot verify that as we - * cannot touch the oneshot mask at this point anymore as - * __setup_irq() might have given out currents thread_mask - * again. + * oneshot mask bit can be set. */ task_work_cancel(current, irq_thread_dtor); return 0; @@ -1257,7 +1254,9 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new) /* * Protects against a concurrent __free_irq() call which might wait * for synchronize_irq() to complete without holding the optional - * chip bus lock and desc->lock. + * chip bus lock and desc->lock. Also protects against handing out + * a recycled oneshot thread_mask bit while it's still in use by + * its previous owner. */ mutex_lock(&desc->request_mutex);
From: Lukas Wunner lukas@wunner.de
commit 519cc8652b3a1d3d0a02257afbe9573ad644da26 upstream.
When pciehp is converted to threaded IRQ handling, removal of unplugged devices below a PCIe hotplug port happens synchronously in the IRQ thread. Removal of devices typically entails a call to free_irq() by their drivers.
If those devices share their IRQ with the hotplug port, __free_irq() deadlocks because it calls synchronize_irq() to wait for all hard IRQ handlers as well as all threads sharing the IRQ to finish.
Actually it's sufficient to wait only for the IRQ thread of the removed device, so call synchronize_hardirq() to wait for all hard IRQ handlers to finish, but no longer for any threads. Compensate by rearranging the control flow in irq_wait_for_interrupt() such that the device's thread is allowed to run one last time after kthread_stop() has been called.
kthread_stop() blocks until the IRQ thread has completed. On completion the IRQ thread clears its oneshot thread_mask bit. This is safe because __free_irq() holds the request_mutex, thereby preventing __setup_irq() from handing out the same oneshot thread_mask bit to a newly requested action.
Stack trace for posterity: INFO: task irq/17-pciehp:94 blocked for more than 120 seconds. schedule+0x28/0x80 synchronize_irq+0x6e/0xa0 __free_irq+0x15a/0x2b0 free_irq+0x33/0x70 pciehp_release_ctrl+0x98/0xb0 pcie_port_remove_service+0x2f/0x40 device_release_driver_internal+0x157/0x220 bus_remove_device+0xe2/0x150 device_del+0x124/0x340 device_unregister+0x16/0x60 remove_iter+0x1a/0x20 device_for_each_child+0x4b/0x90 pcie_port_device_remove+0x1e/0x30 pci_device_remove+0x36/0xb0 device_release_driver_internal+0x157/0x220 pci_stop_bus_device+0x7d/0xa0 pci_stop_bus_device+0x3d/0xa0 pci_stop_and_remove_bus_device+0xe/0x20 pciehp_unconfigure_device+0xb8/0x160 pciehp_disable_slot+0x84/0x130 pciehp_ist+0x158/0x190 irq_thread_fn+0x1b/0x50 irq_thread+0x143/0x1a0 kthread+0x111/0x130
Signed-off-by: Lukas Wunner lukas@wunner.de Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Bjorn Helgaas bhelgaas@google.com Cc: Mika Westerberg mika.westerberg@linux.intel.com Cc: linux-pci@vger.kernel.org Link: https://lkml.kernel.org/r/d72b41309f077c8d3bee6cc08ad3662d50b5d22a.152982829... Signed-off by: Rishabh Bhatnagar risbhat@amazon.com --- kernel/irq/manage.c | 33 +++++++++++++++++++++++---------- 1 file changed, 23 insertions(+), 10 deletions(-)
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c index cb35db00fdf4..722aeaae92b1 100644 --- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c @@ -791,9 +791,19 @@ static irqreturn_t irq_forced_secondary_handler(int irq, void *dev_id)
static int irq_wait_for_interrupt(struct irqaction *action) { - set_current_state(TASK_INTERRUPTIBLE); + for (;;) { + set_current_state(TASK_INTERRUPTIBLE);
- while (!kthread_should_stop()) { + if (kthread_should_stop()) { + /* may need to run one last time */ + if (test_and_clear_bit(IRQTF_RUNTHREAD, + &action->thread_flags)) { + __set_current_state(TASK_RUNNING); + return 0; + } + __set_current_state(TASK_RUNNING); + return -1; + }
if (test_and_clear_bit(IRQTF_RUNTHREAD, &action->thread_flags)) { @@ -801,10 +811,7 @@ static int irq_wait_for_interrupt(struct irqaction *action) return 0; } schedule(); - set_current_state(TASK_INTERRUPTIBLE); } - __set_current_state(TASK_RUNNING); - return -1; }
/* @@ -1029,7 +1036,7 @@ static int irq_thread(void *data) /* * This is the regular exit path. __free_irq() is stopping the * thread via kthread_stop() after calling - * synchronize_irq(). So neither IRQTF_RUNTHREAD nor the + * synchronize_hardirq(). So neither IRQTF_RUNTHREAD nor the * oneshot mask bit can be set. */ task_work_cancel(current, irq_thread_dtor); @@ -1253,7 +1260,7 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
/* * Protects against a concurrent __free_irq() call which might wait - * for synchronize_irq() to complete without holding the optional + * for synchronize_hardirq() to complete without holding the optional * chip bus lock and desc->lock. Also protects against handing out * a recycled oneshot thread_mask bit while it's still in use by * its previous owner. @@ -1610,11 +1617,11 @@ static struct irqaction *__free_irq(unsigned int irq, void *dev_id) /* * Drop bus_lock here so the changes which were done in the chip * callbacks above are synced out to the irq chips which hang - * behind a slow bus (I2C, SPI) before calling synchronize_irq(). + * behind a slow bus (I2C, SPI) before calling synchronize_hardirq(). * * Aside of that the bus_lock can also be taken from the threaded * handler in irq_finalize_oneshot() which results in a deadlock - * because synchronize_irq() would wait forever for the thread to + * because kthread_stop() would wait forever for the thread to * complete, which is blocked on the bus lock. * * The still held desc->request_mutex() protects against a @@ -1626,7 +1633,7 @@ static struct irqaction *__free_irq(unsigned int irq, void *dev_id) unregister_handler_proc(irq, action);
/* Make sure it's not being used on another CPU: */ - synchronize_irq(irq); + synchronize_hardirq(irq);
#ifdef CONFIG_DEBUG_SHIRQ /* @@ -1644,6 +1651,12 @@ static struct irqaction *__free_irq(unsigned int irq, void *dev_id) } #endif
+ /* + * The action has already been removed above, but the thread writes + * its oneshot mask bit when it completes. Though request_mutex is + * held across this which prevents __setup_irq() from handing out + * the same bit to a newly requested action. + */ if (action->thread) { kthread_stop(action->thread); put_task_struct(action->thread);
From: Thomas Gleixner tglx@linutronix.de
commit 4001d8e8762f57d418b66e4e668601791900a1dd upstream.
When interrupts are shutdown, they are immediately deactivated in the irqdomain hierarchy. While this looks obviously correct there is a subtle issue:
There might be an interrupt in flight when free_irq() is invoking the shutdown. This is properly handled at the irq descriptor / primary handler level, but the deactivation might completely disable resources which are required to acknowledge the interrupt.
Split the shutdown code and deactivate the interrupt after synchronization in free_irq(). Fixup all other usage sites where this is not an issue to invoke the combined shutdown_and_deactivate() function instead.
This still might be an issue if the interrupt in flight servicing is delayed on a remote CPU beyond the invocation of synchronize_irq(), but that cannot be handled at that level and needs to be handled in the synchronize_irq() context.
Fixes: f8264e34965a ("irqdomain: Introduce new interfaces to support hierarchy irqdomains") Reported-by: Robert Hodaszi Robert.Hodaszi@digi.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Marc Zyngier marc.zyngier@arm.com Link: https://lkml.kernel.org/r/20190628111440.098196390@linutronix.de Signed-off-by: Rishabh Bhatnagar risbhat@amazon.com --- kernel/irq/autoprobe.c | 6 +++--- kernel/irq/chip.c | 6 ++++++ kernel/irq/cpuhotplug.c | 2 +- kernel/irq/internals.h | 1 + kernel/irq/manage.c | 10 ++++++++++ 5 files changed, 21 insertions(+), 4 deletions(-)
diff --git a/kernel/irq/autoprobe.c b/kernel/irq/autoprobe.c index befa671fba64..39ff9ab34e82 100644 --- a/kernel/irq/autoprobe.c +++ b/kernel/irq/autoprobe.c @@ -92,7 +92,7 @@ unsigned long probe_irq_on(void) /* It triggered already - consider it spurious. */ if (!(desc->istate & IRQS_WAITING)) { desc->istate &= ~IRQS_AUTODETECT; - irq_shutdown(desc); + irq_shutdown_and_deactivate(desc); } else if (i < 32) mask |= 1 << i; @@ -129,7 +129,7 @@ unsigned int probe_irq_mask(unsigned long val) mask |= 1 << i;
desc->istate &= ~IRQS_AUTODETECT; - irq_shutdown(desc); + irq_shutdown_and_deactivate(desc); } raw_spin_unlock_irq(&desc->lock); } @@ -171,7 +171,7 @@ int probe_irq_off(unsigned long val) nr_of_irqs++; } desc->istate &= ~IRQS_AUTODETECT; - irq_shutdown(desc); + irq_shutdown_and_deactivate(desc); } raw_spin_unlock_irq(&desc->lock); } diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c index 317fc759de76..1936d86db883 100644 --- a/kernel/irq/chip.c +++ b/kernel/irq/chip.c @@ -293,6 +293,12 @@ void irq_shutdown(struct irq_desc *desc) } irq_state_clr_started(desc); } +} + + +void irq_shutdown_and_deactivate(struct irq_desc *desc) +{ + irq_shutdown(desc); /* * This must be called even if the interrupt was never started up, * because the activation can happen before the interrupt is diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c index 9eb09aef0313..a9f931217a1b 100644 --- a/kernel/irq/cpuhotplug.c +++ b/kernel/irq/cpuhotplug.c @@ -115,7 +115,7 @@ static bool migrate_one_irq(struct irq_desc *desc) */ if (irqd_affinity_is_managed(d)) { irqd_set_managed_shutdown(d); - irq_shutdown(desc); + irq_shutdown_and_deactivate(desc); return false; } affinity = cpu_online_mask; diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h index 5230c47fc43e..3de3821ab5fe 100644 --- a/kernel/irq/internals.h +++ b/kernel/irq/internals.h @@ -78,6 +78,7 @@ extern void __enable_irq(struct irq_desc *desc); extern int irq_startup(struct irq_desc *desc, bool resend, bool force);
extern void irq_shutdown(struct irq_desc *desc); +extern void irq_shutdown_and_deactivate(struct irq_desc *desc); extern void irq_enable(struct irq_desc *desc); extern void irq_disable(struct irq_desc *desc); extern void irq_percpu_enable(struct irq_desc *desc, unsigned int cpu); diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c index 722aeaae92b1..4e4fc7879307 100644 --- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c @@ -14,6 +14,7 @@ #include <linux/module.h> #include <linux/random.h> #include <linux/interrupt.h> +#include <linux/irqdomain.h> #include <linux/slab.h> #include <linux/sched.h> #include <linux/sched/rt.h> @@ -1604,6 +1605,7 @@ static struct irqaction *__free_irq(unsigned int irq, void *dev_id) /* If this was the last handler, shut down the IRQ line: */ if (!desc->action) { irq_settings_clr_disable_unlazy(desc); + /* Only shutdown. Deactivate after synchronize_hardirq() */ irq_shutdown(desc); }
@@ -1673,6 +1675,14 @@ static struct irqaction *__free_irq(unsigned int irq, void *dev_id) * require it to deallocate resources over the slow bus. */ chip_bus_lock(desc); + /* + * There is no interrupt on the fly anymore. Deactivate it + * completely. + */ + raw_spin_lock_irqsave(&desc->lock, flags); + irq_domain_deactivate_irq(&desc->irq_data); + raw_spin_unlock_irqrestore(&desc->lock, flags); + irq_release_resources(desc); chip_bus_sync_unlock(desc); irq_remove_timings(desc);
From: Thomas Gleixner tglx@linutronix.de
commit 1d21f2af8571c6a6a44e7c1911780614847b0253 upstream.
The function might sleep, so it cannot be called from interrupt context. Not even with care.
Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Marc Zyngier marc.zyngier@arm.com Link: https://lkml.kernel.org/r/20190628111440.189241552@linutronix.de Signed-off-by: Rishabh Bhatnagar risbhat@amazon.com --- kernel/irq/manage.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c index 4e4fc7879307..a72d7ae0418b 100644 --- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c @@ -96,7 +96,8 @@ EXPORT_SYMBOL(synchronize_hardirq); * to complete before returning. If you use this function while * holding a resource the IRQ handler may need you will deadlock. * - * This function may be called - with care - from IRQ context. + * Can only be called from preemptible code as it might sleep when + * an interrupt thread is associated to @irq. */ void synchronize_irq(unsigned int irq) {
From: Thomas Gleixner tglx@linutronix.de
commit 62e0468650c30f0298822c580f382b16328119f6 upstream.
free_irq() ensures that no hardware interrupt handler is executing on a different CPU before actually releasing resources and deactivating the interrupt completely in a domain hierarchy.
But that does not catch the case where the interrupt is on flight at the hardware level but not yet serviced by the target CPU. That creates an interesing race condition:
CPU 0 CPU 1 IRQ CHIP
interrupt is raised sent to CPU1 Unable to handle immediately (interrupts off, deep idle delay) mask() ... free() shutdown() synchronize_irq() release_resources() do_IRQ() -> resources are not available
That might be harmless and just trigger a spurious interrupt warning, but some interrupt chips might get into a wedged state.
Utilize the existing irq_get_irqchip_state() callback for the synchronization in free_irq().
synchronize_hardirq() is not using this mechanism as it might actually deadlock unter certain conditions, e.g. when called with interrupts disabled and the target CPU is the one on which the synchronization is invoked. synchronize_irq() uses it because that function cannot be called from non preemtible contexts as it might sleep.
No functional change intended and according to Marc the existing GIC implementations where the driver supports the callback should be able to cope with that core change. Famous last words.
Fixes: 464d12309e1b ("x86/vector: Switch IOAPIC to global reservation mode") Reported-by: Robert Hodaszi Robert.Hodaszi@digi.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Marc Zyngier marc.zyngier@arm.com Tested-by: Marc Zyngier marc.zyngier@arm.com Link: https://lkml.kernel.org/r/20190628111440.279463375@linutronix.de Signed-off-by: Rishabh Bhatnagar risbhat@amazon.com --- kernel/irq/internals.h | 4 ++++ kernel/irq/manage.c | 53 +++++++++++++++++++++++++++--------------- 2 files changed, 38 insertions(+), 19 deletions(-)
diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h index 3de3821ab5fe..a3d7150bb295 100644 --- a/kernel/irq/internals.h +++ b/kernel/irq/internals.h @@ -93,6 +93,10 @@ static inline void irq_mark_irq(unsigned int irq) { } extern void irq_mark_irq(unsigned int irq); #endif
+extern int __irq_get_irqchip_state(struct irq_data *data, + enum irqchip_irq_state which, + bool *state); + extern void init_kstat_irqs(struct irq_desc *desc, int node, int nr);
irqreturn_t __handle_irq_event_percpu(struct irq_desc *desc, unsigned int *flags); diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c index a72d7ae0418b..fb3ea75cc0fb 100644 --- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c @@ -35,8 +35,9 @@ static int __init setup_forced_irqthreads(char *arg) early_param("threadirqs", setup_forced_irqthreads); #endif
-static void __synchronize_hardirq(struct irq_desc *desc) +static void __synchronize_hardirq(struct irq_desc *desc, bool sync_chip) { + struct irq_data *irqd = irq_desc_get_irq_data(desc); bool inprogress;
do { @@ -52,6 +53,20 @@ static void __synchronize_hardirq(struct irq_desc *desc) /* Ok, that indicated we're done: double-check carefully. */ raw_spin_lock_irqsave(&desc->lock, flags); inprogress = irqd_irq_inprogress(&desc->irq_data); + + /* + * If requested and supported, check at the chip whether it + * is in flight at the hardware level, i.e. already pending + * in a CPU and waiting for service and acknowledge. + */ + if (!inprogress && sync_chip) { + /* + * Ignore the return code. inprogress is only updated + * when the chip supports it. + */ + __irq_get_irqchip_state(irqd, IRQCHIP_STATE_ACTIVE, + &inprogress); + } raw_spin_unlock_irqrestore(&desc->lock, flags);
/* Oops, that failed? */ @@ -74,13 +89,18 @@ static void __synchronize_hardirq(struct irq_desc *desc) * Returns: false if a threaded handler is active. * * This function may be called - with care - from IRQ context. + * + * It does not check whether there is an interrupt in flight at the + * hardware level, but not serviced yet, as this might deadlock when + * called with interrupts disabled and the target CPU of the interrupt + * is the current CPU. */ bool synchronize_hardirq(unsigned int irq) { struct irq_desc *desc = irq_to_desc(irq);
if (desc) { - __synchronize_hardirq(desc); + __synchronize_hardirq(desc, false); return !atomic_read(&desc->threads_active); }
@@ -98,13 +118,17 @@ EXPORT_SYMBOL(synchronize_hardirq); * * Can only be called from preemptible code as it might sleep when * an interrupt thread is associated to @irq. + * + * It optionally makes sure (when the irq chip supports that method) + * that the interrupt is not pending in any CPU and waiting for + * service. */ void synchronize_irq(unsigned int irq) { struct irq_desc *desc = irq_to_desc(irq);
if (desc) { - __synchronize_hardirq(desc); + __synchronize_hardirq(desc, true); /* * We made sure that no hardirq handler is * running. Now verify that no threaded handlers are @@ -1635,8 +1659,12 @@ static struct irqaction *__free_irq(unsigned int irq, void *dev_id)
unregister_handler_proc(irq, action);
- /* Make sure it's not being used on another CPU: */ - synchronize_hardirq(irq); + /* + * Make sure it's not being used on another CPU and if the chip + * supports it also make sure that there is no (not yet serviced) + * interrupt in flight at the hardware level. + */ + __synchronize_hardirq(desc, true);
#ifdef CONFIG_DEBUG_SHIRQ /* @@ -2187,7 +2215,6 @@ int irq_get_irqchip_state(unsigned int irq, enum irqchip_irq_state which, { struct irq_desc *desc; struct irq_data *data; - struct irq_chip *chip; unsigned long flags; int err = -EINVAL;
@@ -2197,19 +2224,7 @@ int irq_get_irqchip_state(unsigned int irq, enum irqchip_irq_state which,
data = irq_desc_get_irq_data(desc);
- do { - chip = irq_data_get_irq_chip(data); - if (chip->irq_get_irqchip_state) - break; -#ifdef CONFIG_IRQ_DOMAIN_HIERARCHY - data = data->parent_data; -#else - data = NULL; -#endif - } while (data); - - if (data) - err = chip->irq_get_irqchip_state(data, which, state); + err = __irq_get_irqchip_state(data, which, state);
irq_put_desc_busunlock(desc, flags); return err;
From: Thomas Gleixner tglx@linutronix.de
commit dfe0cf8b51b07e56ded571e3de0a4a9382517231 upstream.
When an interrupt is shut down in free_irq() there might be an inflight interrupt pending in the IO-APIC remote IRR which is not yet serviced. That means the interrupt has been sent to the target CPUs local APIC, but the target CPU is in a state which delays the servicing.
So free_irq() would proceed to free resources and to clear the vector because synchronize_hardirq() does not see an interrupt handler in progress.
That can trigger a spurious interrupt warning, which is harmless and just confuses users, but it also can leave the remote IRR in a stale state because once the handler is invoked the interrupt resources might be freed already and therefore acknowledgement is not possible anymore.
Implement the irq_get_irqchip_state() callback for the IO-APIC irq chip. The callback is invoked from free_irq() via __synchronize_hardirq(). Check the remote IRR bit of the interrupt and return 'in flight' if it is set and the interrupt is configured in level mode. For edge mode the remote IRR has no meaning.
As this is only meaningful for level triggered interrupts this won't cure the potential spurious interrupt warning for edge triggered interrupts, but the edge trigger case does not result in stale hardware state. This has to be addressed at the vector/interrupt entry level seperately.
Fixes: 464d12309e1b ("x86/vector: Switch IOAPIC to global reservation mode") Reported-by: Robert Hodaszi Robert.Hodaszi@digi.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Marc Zyngier marc.zyngier@arm.com Link: https://lkml.kernel.org/r/20190628111440.370295517@linutronix.de Signed-off-by: Rishabh Bhatnagar risbhat@amazon.com --- arch/x86/kernel/apic/io_apic.c | 46 ++++++++++++++++++++++++++++++++++ 1 file changed, 46 insertions(+)
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c index de74bca6a8ff..7d1d393298d0 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -1860,6 +1860,50 @@ static int ioapic_set_affinity(struct irq_data *irq_data, return ret; }
+/* + * Interrupt shutdown masks the ioapic pin, but the interrupt might already + * be in flight, but not yet serviced by the target CPU. That means + * __synchronize_hardirq() would return and claim that everything is calmed + * down. So free_irq() would proceed and deactivate the interrupt and free + * resources. + * + * Once the target CPU comes around to service it it will find a cleared + * vector and complain. While the spurious interrupt is harmless, the full + * release of resources might prevent the interrupt from being acknowledged + * which keeps the hardware in a weird state. + * + * Verify that the corresponding Remote-IRR bits are clear. + */ +static int ioapic_irq_get_chip_state(struct irq_data *irqd, + enum irqchip_irq_state which, + bool *state) +{ + struct mp_chip_data *mcd = irqd->chip_data; + struct IO_APIC_route_entry rentry; + struct irq_pin_list *p; + + if (which != IRQCHIP_STATE_ACTIVE) + return -EINVAL; + + *state = false; + raw_spin_lock(&ioapic_lock); + for_each_irq_pin(p, mcd->irq_2_pin) { + rentry = __ioapic_read_entry(p->apic, p->pin); + /* + * The remote IRR is only valid in level trigger mode. It's + * meaning is undefined for edge triggered interrupts and + * irrelevant because the IO-APIC treats them as fire and + * forget. + */ + if (rentry.irr && rentry.trigger) { + *state = true; + break; + } + } + raw_spin_unlock(&ioapic_lock); + return 0; +} + static struct irq_chip ioapic_chip __read_mostly = { .name = "IO-APIC", .irq_startup = startup_ioapic_irq, @@ -1869,6 +1913,7 @@ static struct irq_chip ioapic_chip __read_mostly = { .irq_eoi = ioapic_ack_level, .irq_set_affinity = ioapic_set_affinity, .irq_retrigger = irq_chip_retrigger_hierarchy, + .irq_get_irqchip_state = ioapic_irq_get_chip_state, .flags = IRQCHIP_SKIP_SET_WAKE, };
@@ -1881,6 +1926,7 @@ static struct irq_chip ioapic_ir_chip __read_mostly = { .irq_eoi = ioapic_ir_ack_level, .irq_set_affinity = ioapic_set_affinity, .irq_retrigger = irq_chip_retrigger_hierarchy, + .irq_get_irqchip_state = ioapic_irq_get_chip_state, .flags = IRQCHIP_SKIP_SET_WAKE, };
On Thu, Sep 29, 2022 at 09:06:45PM +0000, Rishabh Bhatnagar wrote:
This patch series backports a bunch of patches related IRQ handling with respect to freeing the irq line while IRQ is in flight at CPU or at the hardware level. Recently we saw this issue in serial 8250 driver where the IRQ was being freed while the irq was in flight or not yet delivered to the CPU. As a result the irqchip was going into a wedged state and IRQ was not getting delivered to the cpu. These patches helped fixed the issue in 4.14 kernel.
Why is the serial driver freeing an irq while the system is running? Ah, this could happen on a tty hangup, right?
Let us know if more patches need backporting.
What hardware platform were these patches tested on to verify they work properly? And why can't they move to 4.19 or newer if they really need this fix? What's preventing that?
As Amazon doesn't seem to be testing 4.14.y -rc releases, I find it odd that you all did this backport. Is this a kernel that you all care about?
thanks,
greg k-h
On 10/2/22, 8:30 AM, "Greg KH" gregkh@linuxfoundation.org wrote:
CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
On Thu, Sep 29, 2022 at 09:06:45PM +0000, Rishabh Bhatnagar wrote: > This patch series backports a bunch of patches related IRQ handling > with respect to freeing the irq line while IRQ is in flight at CPU > or at the hardware level. > Recently we saw this issue in serial 8250 driver where the IRQ was being > freed while the irq was in flight or not yet delivered to the CPU. As a > result the irqchip was going into a wedged state and IRQ was not getting > delivered to the cpu. These patches helped fixed the issue in 4.14 > kernel.
Why is the serial driver freeing an irq while the system is running? Ah, this could happen on a tty hangup, right? Yes, exactly during tty hangup we see this sequence happening. It doesn't happen on every hangup but can be reproduced within 10 tries. We didn't see the same behavior in 5.10 and hence found these commits.
> Let us know if more patches need backporting.
What hardware platform were these patches tested on to verify they work properly? And why can't they move to 4.19 or newer if they really need this fix? What's preventing that?
As Amazon doesn't seem to be testing 4.14.y -rc releases, I find it odd that you all did this backport. Is this a kernel that you all care about?
These were tested on Intel x86_64 (Xeon Platinum 8259). Amazon linux 2 still supports 4.14 kernel for our customers, so we would need to fix that.
thanks,
greg k-h
(putting my @amazon.com hat on)
On Sun, 2022-10-02 at 17:30 +0200, Greg KH wrote:
On Thu, Sep 29, 2022 at 09:06:45PM +0000, Rishabh Bhatnagar wrote:
This patch series backports a bunch of patches related IRQ handling with respect to freeing the irq line while IRQ is in flight at CPU or at the hardware level. Recently we saw this issue in serial 8250 driver where the IRQ was being freed while the irq was in flight or not yet delivered to the CPU. As a result the irqchip was going into a wedged state and IRQ was not getting delivered to the cpu. These patches helped fixed the issue in 4.14 kernel.
Why is the serial driver freeing an irq while the system is running? Ah, this could happen on a tty hangup, right?
Right. Rishabh answered that separately.
Let us know if more patches need backporting.
What hardware platform were these patches tested on to verify they work properly? And why can't they move to 4.19 or newer if they really need this fix? What's preventing that?
As Amazon doesn't seem to be testing 4.14.y -rc releases, I find it odd that you all did this backport. Is this a kernel that you all care about?
These were tested on a collection of EC2 instances, virtual and metal I believe (Rishabh, please confirm).
Amazon Linux 2 runs 4.14 or 5.10. Unfortunately we still have to support customers running the former.
We'll be including these patches in our releases, we thought it would be nice to have them in -stable as well for the sake of whoever else might be still using this kernel. No huge deal if they don't.
As for testing -rc's, yes, we need to get better at that (and publish what we test). Point taken :-)
Cheers, Ben.
On 10/6/22 8:07 PM, Herrenschmidt, Benjamin wrote:
(putting my @amazon.com hat on)
On Sun, 2022-10-02 at 17:30 +0200, Greg KH wrote:
On Thu, Sep 29, 2022 at 09:06:45PM +0000, Rishabh Bhatnagar wrote:
This patch series backports a bunch of patches related IRQ handling with respect to freeing the irq line while IRQ is in flight at CPU or at the hardware level. Recently we saw this issue in serial 8250 driver where the IRQ was being freed while the irq was in flight or not yet delivered to the CPU. As a result the irqchip was going into a wedged state and IRQ was not getting delivered to the cpu. These patches helped fixed the issue in 4.14 kernel.
Why is the serial driver freeing an irq while the system is running? Ah, this could happen on a tty hangup, right?
Right. Rishabh answered that separately.
Let us know if more patches need backporting.
What hardware platform were these patches tested on to verify they work properly? And why can't they move to 4.19 or newer if they really need this fix? What's preventing that?
As Amazon doesn't seem to be testing 4.14.y -rc releases, I find it odd that you all did this backport. Is this a kernel that you all care about?
These were tested on a collection of EC2 instances, virtual and metal I believe (Rishabh, please confirm).
Yes these patches were tested on multiple virt/metal EC2 instances.
Amazon Linux 2 runs 4.14 or 5.10. Unfortunately we still have to support customers running the former.
We'll be including these patches in our releases, we thought it would be nice to have them in -stable as well for the sake of whoever else might be still using this kernel. No huge deal if they don't.
As for testing -rc's, yes, we need to get better at that (and publish what we test). Point taken :-)
Cheers, Ben.
On 10/9/22 10:50 AM, Bhatnagar, Rishabh wrote:
On 10/6/22 8:07 PM, Herrenschmidt, Benjamin wrote:
(putting my @amazon.com hat on)
On Sun, 2022-10-02 at 17:30 +0200, Greg KH wrote:
On Thu, Sep 29, 2022 at 09:06:45PM +0000, Rishabh Bhatnagar wrote:
This patch series backports a bunch of patches related IRQ handling with respect to freeing the irq line while IRQ is in flight at CPU or at the hardware level. Recently we saw this issue in serial 8250 driver where the IRQ was being freed while the irq was in flight or not yet delivered to the CPU. As a result the irqchip was going into a wedged state and IRQ was not getting delivered to the cpu. These patches helped fixed the issue in 4.14 kernel.
Why is the serial driver freeing an irq while the system is running? Ah, this could happen on a tty hangup, right?
Right. Rishabh answered that separately.
Let us know if more patches need backporting.
What hardware platform were these patches tested on to verify they work properly? And why can't they move to 4.19 or newer if they really need this fix? What's preventing that?
As Amazon doesn't seem to be testing 4.14.y -rc releases, I find it odd that you all did this backport. Is this a kernel that you all care about?
These were tested on a collection of EC2 instances, virtual and metal I believe (Rishabh, please confirm).
Yes these patches were tested on multiple virt/metal EC2 instances.
Amazon Linux 2 runs 4.14 or 5.10. Unfortunately we still have to support customers running the former.
We'll be including these patches in our releases, we thought it would be nice to have them in -stable as well for the sake of whoever else might be still using this kernel. No huge deal if they don't.
As for testing -rc's, yes, we need to get better at that (and publish what we test). Point taken :-)
Cheers, Ben.
Hi Greg
Let us know if you think it would be beneficial to take these backports for 4.14 stable. We can drop this patch set otherwise.
Thanks alot, Rishabh
On Fri, Oct 14, 2022 at 12:00:31PM -0700, Bhatnagar, Rishabh wrote:
On 10/9/22 10:50 AM, Bhatnagar, Rishabh wrote:
On 10/6/22 8:07 PM, Herrenschmidt, Benjamin wrote:
(putting my @amazon.com hat on)
On Sun, 2022-10-02 at 17:30 +0200, Greg KH wrote:
On Thu, Sep 29, 2022 at 09:06:45PM +0000, Rishabh Bhatnagar wrote:
This patch series backports a bunch of patches related IRQ handling with respect to freeing the irq line while IRQ is in flight at CPU or at the hardware level. Recently we saw this issue in serial 8250 driver where the IRQ was being freed while the irq was in flight or not yet delivered to the CPU. As a result the irqchip was going into a wedged state and IRQ was not getting delivered to the cpu. These patches helped fixed the issue in 4.14 kernel.
Why is the serial driver freeing an irq while the system is running? Ah, this could happen on a tty hangup, right?
Right. Rishabh answered that separately.
Let us know if more patches need backporting.
What hardware platform were these patches tested on to verify they work properly? And why can't they move to 4.19 or newer if they really need this fix? What's preventing that?
As Amazon doesn't seem to be testing 4.14.y -rc releases, I find it odd that you all did this backport. Is this a kernel that you all care about?
These were tested on a collection of EC2 instances, virtual and metal I believe (Rishabh, please confirm).
Yes these patches were tested on multiple virt/metal EC2 instances.
Amazon Linux 2 runs 4.14 or 5.10. Unfortunately we still have to support customers running the former.
We'll be including these patches in our releases, we thought it would be nice to have them in -stable as well for the sake of whoever else might be still using this kernel. No huge deal if they don't.
As for testing -rc's, yes, we need to get better at that (and publish what we test). Point taken :-)
Cheers, Ben.
Hi Greg
Let us know if you think it would be beneficial to take these backports for 4.14 stable.
Give me some time after -rc1 is out to review this then as we are swamped right now.
We can drop this patch set otherwise.
You can do whatever you want with your tree :)
thanks,
greg k-h
On Thu, Sep 29, 2022 at 09:06:45PM +0000, Rishabh Bhatnagar wrote:
This patch series backports a bunch of patches related IRQ handling with respect to freeing the irq line while IRQ is in flight at CPU or at the hardware level. Recently we saw this issue in serial 8250 driver where the IRQ was being freed while the irq was in flight or not yet delivered to the CPU. As a result the irqchip was going into a wedged state and IRQ was not getting delivered to the cpu. These patches helped fixed the issue in 4.14 kernel. Let us know if more patches need backporting.
Lukas Wunner (2): genirq: Update code comments wrt recycled thread_mask genirq: Synchronize only with single thread on free_irq()
Thomas Gleixner (4): genirq: Delay deactivation in free_irq() genirq: Fix misleading synchronize_irq() documentation genirq: Add optional hardware synchronization for shutdown x86/ioapic: Implement irq_get_irqchip_state() callback
arch/x86/kernel/apic/io_apic.c | 46 ++++++++++++++ kernel/irq/autoprobe.c | 6 +- kernel/irq/chip.c | 6 ++ kernel/irq/cpuhotplug.c | 2 +- kernel/irq/internals.h | 5 ++ kernel/irq/manage.c | 106 ++++++++++++++++++++++----------- 6 files changed, 133 insertions(+), 38 deletions(-)
A simple build test breaks with this series applied:
ld: kernel/irq/manage.o: in function `__synchronize_hardirq': manage.c:(.text+0x149): undefined reference to `__irq_get_irqchip_state' ld: kernel/irq/manage.o: in function `irq_get_irqchip_state': manage.c:(.text+0x5a2): undefined reference to `__irq_get_irqchip_state' make: *** [Makefile:1038: vmlinux] Error 1
How did you test this?
{sigh}
I'm dropping all of these from my queue. I think you need to just keep this in your device-specific tree as obviously this is not ready to be backported anywhere.
greg k-h
linux-stable-mirror@lists.linaro.org