From: Thomas Gleixner tglx@linutronix.de
commit baedb87d1b53532f81b4bd0387f83b05d4f7eb9a upstream.
Setting interrupt affinity on inactive interrupts is inconsistent when hierarchical irq domains are enabled. The core code should just store the affinity and not call into the irq chip driver for inactive interrupts because the chip drivers may not be in a state to handle such requests.
X86 has a hacky workaround for that but all other irq chips have not which causes problems e.g. on GIC V3 ITS.
Instead of adding more ugly hacks all over the place, solve the problem in the core code. If the affinity is set on an inactive interrupt then:
- Store it in the irq descriptors affinity mask - Update the effective affinity to reflect that so user space has a consistent view - Don't call into the irq chip driver
This is the core equivalent of the X86 workaround and works correctly because the affinity setting is established in the irq chip when the interrupt is activated later on.
Note, that this is only effective when hierarchical irq domains are enabled by the architecture. Doing it unconditionally would break legacy irq chip implementations.
For hierarchial irq domains this works correctly as none of the drivers can have a dependency on affinity setting in inactive state by design.
Remove the X86 workaround as it is not longer required.
Fixes: 02edee152d6e ("x86/apic/vector: Ignore set_affinity call for inactive interrupts") Reported-by: Ali Saidi alisaidi@amazon.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Ali Saidi alisaidi@amazon.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200529015501.15771-1-alisaidi@amazon.com Link: https://lkml.kernel.org/r/877dv2rv25.fsf@nanos.tec.linutronix.de [fllinden@amazon.com - 4.14 never had the x86 workaround, so skip x86 changes] Signed-off-by: Frank van der Linden fllinden@amazon.com --- kernel/irq/manage.c | 37 +++++++++++++++++++++++++++++++++++-- 1 file changed, 35 insertions(+), 2 deletions(-)
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c index 5277949e82e0..ce6fba446814 100644 --- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c @@ -168,9 +168,9 @@ void irq_set_thread_affinity(struct irq_desc *desc) set_bit(IRQTF_AFFINITY, &action->thread_flags); }
+#ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK static void irq_validate_effective_affinity(struct irq_data *data) { -#ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK const struct cpumask *m = irq_data_get_effective_affinity_mask(data); struct irq_chip *chip = irq_data_get_irq_chip(data);
@@ -178,9 +178,19 @@ static void irq_validate_effective_affinity(struct irq_data *data) return; pr_warn_once("irq_chip %s did not update eff. affinity mask of irq %u\n", chip->name, data->irq); -#endif }
+static inline void irq_init_effective_affinity(struct irq_data *data, + const struct cpumask *mask) +{ + cpumask_copy(irq_data_get_effective_affinity_mask(data), mask); +} +#else +static inline void irq_validate_effective_affinity(struct irq_data *data) { } +static inline void irq_init_effective_affinity(struct irq_data *data, + const struct cpumask *mask) { } +#endif + int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask, bool force) { @@ -205,6 +215,26 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask, return ret; }
+static bool irq_set_affinity_deactivated(struct irq_data *data, + const struct cpumask *mask, bool force) +{ + struct irq_desc *desc = irq_data_to_desc(data); + + /* + * If the interrupt is not yet activated, just store the affinity + * mask and do not call the chip driver at all. On activation the + * driver has to make sure anyway that the interrupt is in a + * useable state so startup works. + */ + if (!IS_ENABLED(CONFIG_IRQ_DOMAIN_HIERARCHY) || irqd_is_activated(data)) + return false; + + cpumask_copy(desc->irq_common_data.affinity, mask); + irq_init_effective_affinity(data, mask); + irqd_set(data, IRQD_AFFINITY_SET); + return true; +} + int irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask, bool force) { @@ -215,6 +245,9 @@ int irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask, if (!chip || !chip->irq_set_affinity) return -EINVAL;
+ if (irq_set_affinity_deactivated(data, mask, force)) + return 0; + if (irq_can_move_pcntxt(data)) { ret = irq_do_set_affinity(data, mask, force); } else {
From: Thomas Gleixner tglx@linutronix.de
commit f0c7baca180046824e07fc5f1326e83a8fd150c7 upstream.
John reported that on a RK3288 system the perf per CPU interrupts are all affine to CPU0 and provided the analysis:
"It looks like what happens is that because the interrupts are not per-CPU in the hardware, armpmu_request_irq() calls irq_force_affinity() while the interrupt is deactivated and then request_irq() with IRQF_PERCPU | IRQF_NOBALANCING.
Now when irq_startup() runs with IRQ_STARTUP_NORMAL, it calls irq_setup_affinity() which returns early because IRQF_PERCPU and IRQF_NOBALANCING are set, leaving the interrupt on its original CPU."
This was broken by the recent commit which blocked interrupt affinity setting in hardware before activation of the interrupt. While this works in general, it does not work for this particular case. As contrary to the initial analysis not all interrupt chip drivers implement an activate callback, the safe cure is to make the deferred interrupt affinity setting at activation time opt-in.
Implement the necessary core logic and make the two irqchip implementations for which this is required opt-in. In hindsight this would have been the right thing to do, but ...
Fixes: baedb87d1b53 ("genirq/affinity: Handle affinity setting on inactive interrupts correctly") Reported-by: John Keeping john@metanate.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Marc Zyngier maz@kernel.org Acked-by: Marc Zyngier maz@kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/87blk4tzgm.fsf@nanos.tec.linutronix.de [fllinden@amazon.com - backported to 4.14] Signed-off-by: Frank van der Linden fllinden@amazon.com --- arch/x86/kernel/apic/vector.c | 4 ++++ drivers/irqchip/irq-gic-v3-its.c | 5 ++++- include/linux/irq.h | 12 ++++++++++++ kernel/irq/manage.c | 6 +++++- 4 files changed, 25 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c index b958082c74a7..0e50f8240d41 100644 --- a/arch/x86/kernel/apic/vector.c +++ b/arch/x86/kernel/apic/vector.c @@ -368,6 +368,10 @@ static int x86_vector_alloc_irqs(struct irq_domain *domain, unsigned int virq, irq_data->chip = &lapic_controller; irq_data->chip_data = data; irq_data->hwirq = virq + i; + + /* Don't invoke affinity setter on deactivated interrupts */ + irqd_set_affinity_on_activate(irq_data); + err = assign_irq_vector_policy(virq + i, node, data, info, irq_data); if (err) { diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index d7963dbdb091..a32714f0af64 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -2435,6 +2435,7 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, { msi_alloc_info_t *info = args; struct its_device *its_dev = info->scratchpad[0].ptr; + struct irq_data *irqd; irq_hw_number_t hwirq; int err; int i; @@ -2450,7 +2451,9 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
irq_domain_set_hwirq_and_chip(domain, virq + i, hwirq + i, &its_irq_chip, its_dev); - irqd_set_single_target(irq_desc_get_irq_data(irq_to_desc(virq + i))); + irqd = irq_get_irq_data(virq + i); + irqd_set_single_target(irqd); + irqd_set_affinity_on_activate(irqd); pr_debug("ID:%d pID:%d vID:%d\n", (int)(hwirq + i - its_dev->event_map.lpi_base), (int)(hwirq + i), virq + i); diff --git a/include/linux/irq.h b/include/linux/irq.h index e508bad309b6..881772e948ca 100644 --- a/include/linux/irq.h +++ b/include/linux/irq.h @@ -212,6 +212,8 @@ struct irq_data { * mask. Applies only to affinity managed irqs. * IRQD_SINGLE_TARGET - IRQ allows only a single affinity target * IRQD_DEFAULT_TRIGGER_SET - Expected trigger already been set + * IRQD_AFFINITY_ON_ACTIVATE - Affinity is set on activation. Don't call + * irq_chip::irq_set_affinity() when deactivated. */ enum { IRQD_TRIGGER_MASK = 0xf, @@ -233,6 +235,7 @@ enum { IRQD_MANAGED_SHUTDOWN = (1 << 23), IRQD_SINGLE_TARGET = (1 << 24), IRQD_DEFAULT_TRIGGER_SET = (1 << 25), + IRQD_AFFINITY_ON_ACTIVATE = (1 << 29), };
#define __irqd_to_state(d) ACCESS_PRIVATE((d)->common, state_use_accessors) @@ -377,6 +380,15 @@ static inline bool irqd_is_managed_and_shutdown(struct irq_data *d) return __irqd_to_state(d) & IRQD_MANAGED_SHUTDOWN; }
+static inline void irqd_set_affinity_on_activate(struct irq_data *d) +{ + __irqd_to_state(d) |= IRQD_AFFINITY_ON_ACTIVATE; +} + +static inline bool irqd_affinity_on_activate(struct irq_data *d) +{ + return __irqd_to_state(d) & IRQD_AFFINITY_ON_ACTIVATE; +} #undef __irqd_to_state
static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d) diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c index ce6fba446814..3193be58805c 100644 --- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c @@ -221,12 +221,16 @@ static bool irq_set_affinity_deactivated(struct irq_data *data, struct irq_desc *desc = irq_data_to_desc(data);
/* + * Handle irq chips which can handle affinity only in activated + * state correctly + * * If the interrupt is not yet activated, just store the affinity * mask and do not call the chip driver at all. On activation the * driver has to make sure anyway that the interrupt is in a * useable state so startup works. */ - if (!IS_ENABLED(CONFIG_IRQ_DOMAIN_HIERARCHY) || irqd_is_activated(data)) + if (!IS_ENABLED(CONFIG_IRQ_DOMAIN_HIERARCHY) || + irqd_is_activated(data) || !irqd_affinity_on_activate(data)) return false;
cpumask_copy(desc->irq_common_data.affinity, mask);
Forgot to Cc these to Thomas - let me forward them to ask his input.
Thomas - do these look good to you? 4.14 is a little different in the sense that it didn't have the x86 set-affinity-before-activate workaround. It did have the same failure scenario that was seen on arm64, so the patches are necessary.
The x86 irq allocation loop has some differences for 4.14 x86 too, but I don't see any problems there.
I tested this backport by running it through our regression tests and the reproducer case for the original problem.
Thanks,
- Frank
Frank van der Linden fllinden@amazon.com writes:
Forgot to Cc these to Thomas - let me forward them to ask his input.
Thomas - do these look good to you? 4.14 is a little different in the sense that it didn't have the x86 set-affinity-before-activate workaround. It did have the same failure scenario that was seen on arm64, so the patches are necessary.
The x86 irq allocation loop has some differences for 4.14 x86 too, but I don't see any problems there.
Me neither.
On Mon, Aug 10, 2020 at 08:11:43PM +0000, Frank van der Linden wrote:
From: Thomas Gleixner tglx@linutronix.de
commit baedb87d1b53532f81b4bd0387f83b05d4f7eb9a upstream.
Setting interrupt affinity on inactive interrupts is inconsistent when hierarchical irq domains are enabled. The core code should just store the affinity and not call into the irq chip driver for inactive interrupts because the chip drivers may not be in a state to handle such requests.
X86 has a hacky workaround for that but all other irq chips have not which causes problems e.g. on GIC V3 ITS.
Instead of adding more ugly hacks all over the place, solve the problem in the core code. If the affinity is set on an inactive interrupt then:
- Store it in the irq descriptors affinity mask - Update the effective affinity to reflect that so user space has a consistent view - Don't call into the irq chip driver
This is the core equivalent of the X86 workaround and works correctly because the affinity setting is established in the irq chip when the interrupt is activated later on.
Note, that this is only effective when hierarchical irq domains are enabled by the architecture. Doing it unconditionally would break legacy irq chip implementations.
For hierarchial irq domains this works correctly as none of the drivers can have a dependency on affinity setting in inactive state by design.
Remove the X86 workaround as it is not longer required.
Fixes: 02edee152d6e ("x86/apic/vector: Ignore set_affinity call for inactive interrupts")
Why is this needed for 4.14.y, when this "Fixes:" tag says a commit that showed up in 4.15?
thanks,
greg k-h
On Wed, Aug 19, 2020 at 11:55:37AM +0200, Greg KH wrote:
On Mon, Aug 10, 2020 at 08:11:43PM +0000, Frank van der Linden wrote:
From: Thomas Gleixner tglx@linutronix.de
commit baedb87d1b53532f81b4bd0387f83b05d4f7eb9a upstream.
Setting interrupt affinity on inactive interrupts is inconsistent when hierarchical irq domains are enabled. The core code should just store the affinity and not call into the irq chip driver for inactive interrupts because the chip drivers may not be in a state to handle such requests.
X86 has a hacky workaround for that but all other irq chips have not which causes problems e.g. on GIC V3 ITS.
Instead of adding more ugly hacks all over the place, solve the problem in the core code. If the affinity is set on an inactive interrupt then:
- Store it in the irq descriptors affinity mask - Update the effective affinity to reflect that so user space has a consistent view - Don't call into the irq chip driver
This is the core equivalent of the X86 workaround and works correctly because the affinity setting is established in the irq chip when the interrupt is activated later on.
Note, that this is only effective when hierarchical irq domains are enabled by the architecture. Doing it unconditionally would break legacy irq chip implementations.
For hierarchial irq domains this works correctly as none of the drivers can have a dependency on affinity setting in inactive state by design.
Remove the X86 workaround as it is not longer required.
Fixes: 02edee152d6e ("x86/apic/vector: Ignore set_affinity call for inactive interrupts")
Why is this needed for 4.14.y, when this "Fixes:" tag says a commit that showed up in 4.15?
The issue of set_affinity being called on inactive interrupts, and it not being handled correctly, was already there in 4.14. The commit in the Fixes: tag is the x86 workaround, which came in in 4.15, and is no longer needed.
So we still need the backport of the genirq changes, to fix it in 4.14. This is mostly for arm64 (gicv3), where this issue results in interrupts getting messed up.
- Frank
On Wed, Aug 19, 2020 at 04:05:51PM +0000, Frank van der Linden wrote:
On Wed, Aug 19, 2020 at 11:55:37AM +0200, Greg KH wrote:
On Mon, Aug 10, 2020 at 08:11:43PM +0000, Frank van der Linden wrote:
From: Thomas Gleixner tglx@linutronix.de
commit baedb87d1b53532f81b4bd0387f83b05d4f7eb9a upstream.
Setting interrupt affinity on inactive interrupts is inconsistent when hierarchical irq domains are enabled. The core code should just store the affinity and not call into the irq chip driver for inactive interrupts because the chip drivers may not be in a state to handle such requests.
X86 has a hacky workaround for that but all other irq chips have not which causes problems e.g. on GIC V3 ITS.
Instead of adding more ugly hacks all over the place, solve the problem in the core code. If the affinity is set on an inactive interrupt then:
- Store it in the irq descriptors affinity mask - Update the effective affinity to reflect that so user space has a consistent view - Don't call into the irq chip driver
This is the core equivalent of the X86 workaround and works correctly because the affinity setting is established in the irq chip when the interrupt is activated later on.
Note, that this is only effective when hierarchical irq domains are enabled by the architecture. Doing it unconditionally would break legacy irq chip implementations.
For hierarchial irq domains this works correctly as none of the drivers can have a dependency on affinity setting in inactive state by design.
Remove the X86 workaround as it is not longer required.
Fixes: 02edee152d6e ("x86/apic/vector: Ignore set_affinity call for inactive interrupts")
Why is this needed for 4.14.y, when this "Fixes:" tag says a commit that showed up in 4.15?
The issue of set_affinity being called on inactive interrupts, and it not being handled correctly, was already there in 4.14. The commit in the Fixes: tag is the x86 workaround, which came in in 4.15, and is no longer needed.
So we still need the backport of the genirq changes, to fix it in 4.14. This is mostly for arm64 (gicv3), where this issue results in interrupts getting messed up.
Ok, thanks, both now queued up.
greg k-h
linux-stable-mirror@lists.linaro.org