On Mon, Jun 04, 2018 at 04:11:03PM -0700, Kevin Hilman wrote:
> kernelci.org bot <bot(a)kernelci.org> writes:
>
> > Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.14.y/kernel/v4.1…
> > Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.14.y/kernel/v4.14.47-53…
> >
> > Tree: stable-rc
> > Branch: linux-4.14.y
> > Git Describe: v4.14.47-53-g721adf61fde2
> > Git Commit: 721adf61fde28b9a87a95e45ecf3f5a325e7c76f
> > Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> > Tested: 56 unique boards, 23 SoC families, 14 builds out of 185
> >
> > Boot Regressions Detected:
> >
> > arm64:
> >
> > defconfig:
> > meson-gxl-s905x-khadas-vim:
> > lab-baylibre: failing since 39 days (last pass: v4.14.26-140-g2a1700a4929f - first fail: v4.14.36-184-g3cd53e436ee2)
> >
> > Conflicting Boot Failure Detected: (These likely are not failures as other labs are reporting PASS. Needs review.)
>
> TL;DR; All is well.
>
> The failing board is having a power supply issue and has been taken
> offline for repair. Since the same board is passing fine in another
> lab, it can be ignored.
Thanks for the updates on this, and the 4.9 board breakage.
greg k-h
On Mon, Jun 4, 2018 at 8:33 AM, Thomas Gleixner <tglx(a)linutronix.de> wrote:
> The case that interrupt affinity setting fails with -EBUSY can be handled
> in the kernel completely by using the already available generic pending
> infrastructure.
>
> If irq_chip::set_affinity() fails with -EBUSY, handle it like the
> interrupts for which irq_chip::set_affinity() can only be invoked from
> interrupt context. Copy the new affinity mask to irq_desc::pending_mask and
> set the affinity pending bit. The next interrupt raised for the affected
> irq will check the pending bit and try to set the new affinity from the
> handler. This avoids returning -EBUSY to user space when an affinity change
> is requested while the previous change has not yet been cleaned up. The new
> affinity takes effect when the next interrupt is raised by the device.
>
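For context, a minimal user-space sketch (not part of the patch; the IRQ number and mask are made up) of the kind of affinity write that could previously bounce with -EBUSY while a prior move was still being cleaned up. With the pending mechanism the kernel now defers the update instead of failing the write:

/*
 * Illustrative only: write a new SMP affinity mask the way tools such as
 * irqbalance do.  IRQ 24 and the mask value are hypothetical.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/proc/irq/24/smp_affinity";	/* hypothetical IRQ */
	const char *mask = "4\n";				/* CPU 2 only */
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (write(fd, mask, strlen(mask)) < 0 && errno == EBUSY)
		fprintf(stderr, "affinity change rejected with EBUSY\n");
	close(fd);
	return 0;
}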
> Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup")
> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
> Cc: stable(a)vger.kernel.org
Tested-by: Song Liu <songliubraving(a)fb.com>
> ---
> kernel/irq/manage.c | 37 +++++++++++++++++++++++++++++++++++--
> 1 file changed, 35 insertions(+), 2 deletions(-)
>
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -204,6 +204,39 @@ int irq_do_set_affinity(struct irq_data
> return ret;
> }
>
> +#ifdef CONFIG_GENERIC_PENDING_IRQ
> +static inline int irq_set_affinity_pending(struct irq_data *data,
> + const struct cpumask *dest)
> +{
> + struct irq_desc *desc = irq_data_to_desc(data);
> +
> + irqd_set_move_pending(data);
> + irq_copy_pending(desc, dest);
> + return 0;
> +}
> +#else
> +static inline int irq_set_affinity_pending(struct irq_data *data,
> + const struct cpumask *dest)
> +{
> + return -EBUSY;
> +}
> +#endif
> +
> +static int irq_try_set_affinity(struct irq_data *data,
> + const struct cpumask *dest, bool force)
> +{
> + int ret = irq_do_set_affinity(data, dest, force);
> +
> + /*
> + * In case that the underlying vector management is busy and the
> + * architecture supports the generic pending mechanism then utilize
> + * this to avoid returning an error to user space.
> + */
> + if (ret == -EBUSY && !force)
> + ret = irq_set_affinity_pending(data, dest);
> + return ret;
> +}
> +
> int irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask,
> bool force)
> {
> @@ -214,8 +247,8 @@ int irq_set_affinity_locked(struct irq_d
> if (!chip || !chip->irq_set_affinity)
> return -EINVAL;
>
> - if (irq_can_move_pcntxt(data)) {
> - ret = irq_do_set_affinity(data, mask, force);
> + if (irq_can_move_pcntxt(data) && !irqd_is_setaffinity_pending(data)) {
> + ret = irq_try_set_affinity(data, mask, force);
> } else {
> irqd_set_move_pending(data);
> irq_copy_pending(desc, mask);
>
>
On Mon, Jun 4, 2018 at 8:33 AM, Thomas Gleixner <tglx(a)linutronix.de> wrote:
> apic_ack_edge() is explicitly for handling interrupt affinity cleanup when
> interrupt remapping is not available or disabled.
>
> Remapped interrupts and also some of the platform specific special
> interrupts, e.g. UV, invoke ack_APIC_irq() directly.
>
> To address the issue of failing an affinity update with -EBUSY, the delayed
> affinity mechanism can be reused, but ack_APIC_irq() does not handle
> that. Adding this to ack_APIC_irq() is not possible, because that function
> is also used for exceptions and directly handled interrupts like IPIs.
>
> Create a new function, which just contains the conditional invocation of
> irq_move_irq() and the final ack_APIC_irq(). Making the invocation of
> irq_move_irq() conditional avoids the out of line call if the pending bit
> is not set.
>
> Reuse the new function in apic_ack_edge().
>
> Preparatory change for the real fix.
>
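For illustration, a hypothetical x86 irq_chip that today calls ack_APIC_irq() directly could adopt the new helper simply by pointing its ->irq_ack() callback at apic_ack_irq(); the chip below is made up and only sketches the idea:

/*
 * Sketch only, not part of this patch: a made-up irq_chip showing how
 * callers that invoke ack_APIC_irq() directly (e.g. remapping or UV style
 * chips) can switch to apic_ack_irq() so a pending affinity update is
 * honoured on the next interrupt.
 */
#include <linux/irq.h>
#include <asm/apic.h>

static void example_mask_irq(struct irq_data *d) { /* hardware specific */ }
static void example_unmask_irq(struct irq_data *d) { /* hardware specific */ }

static struct irq_chip example_edge_chip = {
	.name			= "EXAMPLE-EDGE",
	.irq_mask		= example_mask_irq,
	.irq_unmask		= example_unmask_irq,
	.irq_ack		= apic_ack_irq,	/* instead of open coding ack_APIC_irq() */
	.irq_set_affinity	= irq_chip_set_affinity_parent,
};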
> Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup")
> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
> Cc: stable(a)vger.kernel.org
Tested-by: Song Liu <songliubraving(a)fb.com>
> ---
> arch/x86/include/asm/apic.h | 2 ++
> arch/x86/kernel/apic/vector.c | 10 ++++++++--
> 2 files changed, 10 insertions(+), 2 deletions(-)
>
> --- a/arch/x86/include/asm/apic.h
> +++ b/arch/x86/include/asm/apic.h
> @@ -436,6 +436,8 @@ static inline void apic_set_eoi_write(vo
>
> #endif /* CONFIG_X86_LOCAL_APIC */
>
> +extern void apic_ack_irq(struct irq_data *data);
> +
> static inline void ack_APIC_irq(void)
> {
> /*
> --- a/arch/x86/kernel/apic/vector.c
> +++ b/arch/x86/kernel/apic/vector.c
> @@ -809,11 +809,17 @@ static int apic_retrigger_irq(struct irq
> return 1;
> }
>
> +void apic_ack_irq(struct irq_data *irqd)
> +{
> + if (unlikely(irqd_is_setaffinity_pending(irqd)))
> + irq_move_irq(irqd);
> + ack_APIC_irq();
> +}
> +
> void apic_ack_edge(struct irq_data *irqd)
> {
> irq_complete_move(irqd_cfg(irqd));
> - irq_move_irq(irqd);
> - ack_APIC_irq();
> + apic_ack_irq(irqd);
> }
>
> static struct irq_chip lapic_controller = {
>
>
On Mon, Jun 4, 2018 at 8:33 AM, Thomas Gleixner <tglx(a)linutronix.de> wrote:
> The generic pending interrupt mechanism moves interrupts from the interrupt
> handler on the original target CPU to the new destination CPU. This is
> required for x86 and ia64 due to the way the interrupt delivery and
> acknowledge works if the interrupts are not remapped.
>
> However, that update can fail for various reasons. Some of them are valid
> reasons to discard the pending update, but the case where the previous move
> has not been fully cleaned up is not a legitimate reason to fail.
>
> Check the return value of irq_do_set_affinity() for -EBUSY, which indicates
> a pending cleanup, and rearm the pending move in the irq descriptor so it's
> tried again when the next interrupt arrives.
>
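To make the retry semantics concrete, a small self-contained model (plain C, all names made up, not kernel code) of the decision described above: the pending mask is only cleared once the affinity update actually succeeded; on -EBUSY the move stays armed and is retried on the next interrupt:

/* Standalone model of the rearm-on-EBUSY behaviour, for illustration only. */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

struct fake_desc {
	bool move_pending;		/* models irqd_is_setaffinity_pending() */
	unsigned int pending_mask;	/* models desc->pending_mask */
};

/* Models irq_do_set_affinity(): fails with -EBUSY while a cleanup runs. */
static int fake_set_affinity(unsigned int mask, bool cleanup_done)
{
	(void)mask;	/* the mask itself is irrelevant to this model */
	return cleanup_done ? 0 : -EBUSY;
}

/* Called from the interrupt path, like irq_move_masked_irq(). */
static void fake_move_masked_irq(struct fake_desc *desc, bool cleanup_done)
{
	if (!desc->move_pending)
		return;
	desc->move_pending = false;

	if (fake_set_affinity(desc->pending_mask, cleanup_done) == -EBUSY) {
		/* Rearm: keep pending_mask and retry on the next interrupt. */
		desc->move_pending = true;
		return;
	}
	desc->pending_mask = 0;
}

int main(void)
{
	struct fake_desc desc = { .move_pending = true, .pending_mask = 0x4 };

	fake_move_masked_irq(&desc, false);	/* 1st irq: cleanup still pending */
	printf("after 1st irq: pending=%d\n", desc.move_pending);
	fake_move_masked_irq(&desc, true);	/* 2nd irq: cleanup finished */
	printf("after 2nd irq: pending=%d\n", desc.move_pending);
	return 0;
}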
> Fixes: 996c591227d9 ("x86/irq: Plug vector cleanup race")
> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
> Cc: stable(a)vger.kernel.org
Tested-by: Song Liu <songliubraving(a)fb.com>
> ---
> kernel/irq/migration.c | 24 ++++++++++++++++++------
> 1 file changed, 18 insertions(+), 6 deletions(-)
>
> --- a/kernel/irq/migration.c
> +++ b/kernel/irq/migration.c
> @@ -38,17 +38,18 @@ bool irq_fixup_move_pending(struct irq_d
> void irq_move_masked_irq(struct irq_data *idata)
> {
> struct irq_desc *desc = irq_data_to_desc(idata);
> - struct irq_chip *chip = desc->irq_data.chip;
> + struct irq_data *data = &desc->irq_data;
> + struct irq_chip *chip = data->chip;
>
> - if (likely(!irqd_is_setaffinity_pending(&desc->irq_data)))
> + if (likely(!irqd_is_setaffinity_pending(data)))
> return;
>
> - irqd_clr_move_pending(&desc->irq_data);
> + irqd_clr_move_pending(data);
>
> /*
> * Paranoia: cpu-local interrupts shouldn't be calling in here anyway.
> */
> - if (irqd_is_per_cpu(&desc->irq_data)) {
> + if (irqd_is_per_cpu(data)) {
> WARN_ON(1);
> return;
> }
> @@ -73,9 +74,20 @@ void irq_move_masked_irq(struct irq_data
> * For correct operation this depends on the caller
> * masking the irqs.
> */
> - if (cpumask_any_and(desc->pending_mask, cpu_online_mask) < nr_cpu_ids)
> - irq_do_set_affinity(&desc->irq_data, desc->pending_mask, false);
> + if (cpumask_any_and(desc->pending_mask, cpu_online_mask) < nr_cpu_ids) {
> + int ret;
>
> + ret = irq_do_set_affinity(data, desc->pending_mask, false);
> + /*
> + * If there is a cleanup pending in the underlying
> + * vector management, reschedule the move for the next
> + * interrupt. Leave desc->pending_mask intact.
> + */
> + if (ret == -EBUSY) {
> + irqd_set_move_pending(data);
> + return;
> + }
> + }
> cpumask_clear(desc->pending_mask);
> }
>
>
>
On Mon, Jun 4, 2018 at 8:33 AM, Thomas Gleixner <tglx(a)linutronix.de> wrote:
> Several people observed the WARN_ON() in irq_matrix_free() which triggers
> when the caller tries to free a vector which is not in the allocation
> range. Song provided the trace information which made it possible to decode the root
> cause.
>
> The rework of the vector allocation mechanism failed to preserve a sanity
> check, which prevents setting a new target vector/CPU when the previous
> affinity change has not fully completed.
>
> As a result a half-finished affinity change can be overwritten, which can
> cause the leak of an irq descriptor pointer on the previous target CPU and
> double enqueue of the hlist head into the cleanup lists of two or more
> CPUs. After one CPU cleaned up its vector the next CPU will invoke the
> cleanup handler with vector 0, which triggers the out of range warning in
> the matrix allocator.
>
> Prevent this by checking whether the apic_data of the interrupt has the
> move_in_progress flag cleared and the hlist node unhashed. Return -EBUSY
> otherwise.
>
> This prevents the damage and restores the behaviour before the vector
> allocation rework, but due to other changes in that area it also increases
> the chance that user space can observe -EBUSY. In theory this should be fine,
> but actually not all user space tools handle -EBUSY correctly. Addressing
> that is not part of this fix, but will be addressed in follow up patches.
>
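Since the changelog notes that not all user space tools handle -EBUSY correctly, a rough sketch (path, mask and retry policy are made up) of how a tolerant caller could retry the affinity write until the pending cleanup has finished:

/*
 * Illustrative user-space sketch only: bounded retry for the -EBUSY case
 * described above.  The IRQ number, mask and retry parameters are made up.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int set_irq_affinity(const char *path, const char *mask)
{
	for (int attempt = 0; attempt < 5; attempt++) {
		int fd = open(path, O_WRONLY);

		if (fd < 0)
			return -1;
		if (write(fd, mask, strlen(mask)) >= 0) {
			close(fd);
			return 0;
		}
		close(fd);
		if (errno != EBUSY)
			return -1;
		usleep(10000);	/* give the previous move time to be cleaned up */
	}
	return -1;
}

int main(void)
{
	if (set_irq_affinity("/proc/irq/42/smp_affinity", "2\n"))
		perror("set_irq_affinity");
	return 0;
}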
> Fixes: 69cde0004a4b ("x86/vector: Use matrix allocator for vector assignment")
> Reported-by: Dmitry Safonov <0x7f454c46(a)gmail.com>
> Reported-by: Tariq Toukan <tariqt(a)mellanox.com>
> Reported-by: Song Liu <liu.song.a23(a)gmail.com>
> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
> Cc: stable(a)vger.kernel.org
Thanks Thomas!
This patch alone fixes my test: ethtool -L in a loop.
I also ran the same test against the full set, and it works well.
Tested-by: Song Liu <songliubraving(a)fb.com>
> ---
> arch/x86/kernel/apic/vector.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> --- a/arch/x86/kernel/apic/vector.c
> +++ b/arch/x86/kernel/apic/vector.c
> @@ -235,6 +235,15 @@ static int allocate_vector(struct irq_da
> if (vector && cpu_online(cpu) && cpumask_test_cpu(cpu, dest))
> return 0;
>
> + /*
> + * Careful here. @apicd might either have move_in_progress set or
> + * be enqueued for cleanup. Assigning a new vector would either
> + * leave a stale vector on some CPU around or in case of a pending
> + * cleanup corrupt the hlist.
> + */
> + if (apicd->move_in_progress || !hlist_unhashed(&apicd->clist))
> + return -EBUSY;
> +
> vector = irq_matrix_alloc(vector_matrix, dest, resvd, &cpu);
> if (vector > 0)
> apic_update_vector(irqd, vector, cpu);
>
>