From: Thomas Gleixner <tglx(a)linutronix.de>
commit f0c7baca180046824e07fc5f1326e83a8fd150c7 upstream.
John reported that on a RK3288 system the perf per-CPU interrupts are all
affine to CPU0 and provided the analysis:
"It looks like what happens is that because the interrupts are not per-CPU
in the hardware, armpmu_request_irq() calls irq_force_affinity() while
the interrupt is deactivated and then request_irq() with IRQF_PERCPU |
IRQF_NOBALANCING.
Now when irq_startup() runs with IRQ_STARTUP_NORMAL, it calls
irq_setup_affinity() which returns early because IRQF_PERCPU and
IRQF_NOBALANCING are set, leaving the interrupt on its original CPU."
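A minimal sketch of the sequence described above (hypothetical driver code
condensed from John's analysis, not the actual arm_pmu source; error
handling omitted):

  #include <linux/interrupt.h>

  static int pmu_request_percpu_irq(int irq, int cpu, irq_handler_t handler)
  {
          /*
           * The interrupt is still deactivated here, so the forced
           * affinity is only recorded in the irq descriptor and not
           * written to the hardware.
           */
          irq_force_affinity(irq, cpumask_of(cpu));

          /*
           * IRQF_PERCPU | IRQF_NOBALANCING make irq_setup_affinity()
           * return early during irq_startup(), so the recorded
           * affinity is never applied and the interrupt stays on its
           * original CPU.
           */
          return request_irq(irq, handler,
                             IRQF_PERCPU | IRQF_NOBALANCING,
                             "arm-pmu", NULL);
  }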
This was broken by the recent commit which blocked interrupt affinity
setting in hardware before activation of the interrupt. While this works in
general, it does not work for this particular case: contrary to the
initial analysis, not all interrupt chip drivers implement an activate
callback, so the safe cure is to make the deferred interrupt affinity
setting at activation time opt-in.
Implement the necessary core logic and make the two irqchip implementations
for which this is required opt in. In hindsight this would have been the
right thing to do, but ...
Fixes: baedb87d1b53 ("genirq/affinity: Handle affinity setting on inactive interrupts correctly")
Reported-by: John Keeping <john(a)metanate.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Marc Zyngier <maz(a)kernel.org>
Acked-by: Marc Zyngier <maz(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/87blk4tzgm.fsf@nanos.tec.linutronix.de
---
arch/x86/kernel/apic/vector.c | 4 ++++
drivers/irqchip/irq-gic-v3-its.c | 5 ++++-
include/linux/irq.h | 13 +++++++++++++
kernel/irq/manage.c | 6 +++++-
4 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 18c0dca08163..557b153c9ee6 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -554,6 +554,10 @@ static int x86_vector_alloc_irqs(struct irq_domain *domain, unsigned int virq,
irqd->chip_data = apicd;
irqd->hwirq = virq + i;
irqd_set_single_target(irqd);
+
+ /* Don't invoke affinity setter on deactivated interrupts */
+ irqd_set_affinity_on_activate(irqd);
+
/*
* Legacy vectors are already assigned when the IOAPIC
* takes them over. They stay on the same vector. This is
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 263cf9240b16..7966b19ceba7 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -2581,6 +2581,7 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
msi_alloc_info_t *info = args;
struct its_device *its_dev = info->scratchpad[0].ptr;
struct its_node *its = its_dev->its;
+ struct irq_data *irqd;
irq_hw_number_t hwirq;
int err;
int i;
@@ -2600,7 +2601,9 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
irq_domain_set_hwirq_and_chip(domain, virq + i,
hwirq + i, &its_irq_chip, its_dev);
- irqd_set_single_target(irq_desc_get_irq_data(irq_to_desc(virq + i)));
+ irqd = irq_get_irq_data(virq + i);
+ irqd_set_single_target(irqd);
+ irqd_set_affinity_on_activate(irqd);
pr_debug("ID:%d pID:%d vID:%d\n",
(int)(hwirq + i - its_dev->event_map.lpi_base),
(int)(hwirq + i), virq + i);
diff --git a/include/linux/irq.h b/include/linux/irq.h
index ab9d956e21b3..be00e0ae5e31 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -211,6 +211,8 @@ struct irq_data {
* IRQD_CAN_RESERVE - Can use reservation mode
* IRQD_MSI_NOMASK_QUIRK - Non-maskable MSI quirk for affinity change
* required
+ * IRQD_AFFINITY_ON_ACTIVATE - Affinity is set on activation. Don't call
+ * irq_chip::irq_set_affinity() when deactivated.
*/
enum {
IRQD_TRIGGER_MASK = 0xf,
@@ -234,6 +236,7 @@ enum {
IRQD_DEFAULT_TRIGGER_SET = (1 << 25),
IRQD_CAN_RESERVE = (1 << 26),
IRQD_MSI_NOMASK_QUIRK = (1 << 27),
+ IRQD_AFFINITY_ON_ACTIVATE = (1 << 29),
};
#define __irqd_to_state(d) ACCESS_PRIVATE((d)->common, state_use_accessors)
@@ -408,6 +411,16 @@ static inline bool irqd_msi_nomask_quirk(struct irq_data *d)
return __irqd_to_state(d) & IRQD_MSI_NOMASK_QUIRK;
}
+static inline void irqd_set_affinity_on_activate(struct irq_data *d)
+{
+ __irqd_to_state(d) |= IRQD_AFFINITY_ON_ACTIVATE;
+}
+
+static inline bool irqd_affinity_on_activate(struct irq_data *d)
+{
+ return __irqd_to_state(d) & IRQD_AFFINITY_ON_ACTIVATE;
+}
+
#undef __irqd_to_state
static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index df73685de114..3b1d0a4725a4 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -281,12 +281,16 @@ static bool irq_set_affinity_deactivated(struct irq_data *data,
struct irq_desc *desc = irq_data_to_desc(data);
/*
+ * Handle irq chips which can handle affinity only in activated
+ * state correctly
+ *
* If the interrupt is not yet activated, just store the affinity
* mask and do not call the chip driver at all. On activation the
* driver has to make sure anyway that the interrupt is in a
* useable state so startup works.
*/
- if (!IS_ENABLED(CONFIG_IRQ_DOMAIN_HIERARCHY) || irqd_is_activated(data))
+ if (!IS_ENABLED(CONFIG_IRQ_DOMAIN_HIERARCHY) ||
+ irqd_is_activated(data) || !irqd_affinity_on_activate(data))
return false;
cpumask_copy(desc->irq_common_data.affinity, mask);
--
2.17.2
We recently found a regression when running the will_it_scale/page_fault3
test on ARM64: over 70% down for the multi-process cases and over 20% down
for the multi-threaded cases. It turns out the regression is caused by
commit 89b15332af7c0312a41e50846819ca6613b58b4c ("mm: drop mmap_sem before
calling balance_dirty_pages() in write fault").
The test mmaps a file the size of memory and then writes to the mapping.
This makes all memory dirty and triggers the dirty-page throttle, at which
point that commit releases mmap_sem and retries the page fault. The retried
page fault sees the correct PTEs already installed and falls through to the
spurious TLB flush. It is this excessive spurious TLB flushing that causes
the regression. x86 is unaffected since its spurious TLB flush is a no-op.
We can simply skip the spurious TLB flush to mitigate the regression.
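For reference, the cost difference comes from flush_tlb_fix_spurious_fault():
the generic fallback does a real per-page flush, while x86 overrides it with
an empty macro. Paraphrasing the mainline definitions of that era (shown for
context only, not part of this patch):

  /* include/linux/pgtable.h: generic fallback, a real TLB flush */
  #ifndef flush_tlb_fix_spurious_fault
  #define flush_tlb_fix_spurious_fault(vma, address) flush_tlb_page(vma, address)
  #endif

  /* arch/x86/include/asm/pgtable.h: x86 override, a no-op */
  #define flush_tlb_fix_spurious_fault(vma, address) do { } while (0)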
Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Reported-by: Xu Yu <xuyu(a)linux.alibaba.com>
Debugged-by: Xu Yu <xuyu(a)linux.alibaba.com>
Tested-by: Xu Yu <xuyu(a)linux.alibaba.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Yang Shi <shy828301(a)gmail.com>
---
v3: Incorporated Linus's suggestion
v2: Incorporated Will Deacon's suggestion
mm/memory.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/mm/memory.c b/mm/memory.c
index 3a7779d9891d..602f4283122f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4247,6 +4247,9 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
vmf->flags & FAULT_FLAG_WRITE)) {
update_mmu_cache(vmf->vma, vmf->address, vmf->pte);
} else {
+ /* Skip spurious TLB flush for retried page fault */
+ if (vmf->flags & FAULT_FLAG_TRIED)
+ goto unlock;
/*
* This is needed only for protection faults but the arch code
* is not yet telling us if this is a protection fault or not.
--
2.26.2
Fix linkage error when CONFIG_BINFMT_ELF is selected but CONFIG_COREDUMP
is not:
ia64-linux-ld: arch/ia64/kernel/elfcore.o: in function `elf_core_write_extra_phdrs':
elfcore.c:(.text+0x172): undefined reference to `dump_emit'
ia64-linux-ld: arch/ia64/kernel/elfcore.o: in function `elf_core_write_extra_data':
elfcore.c:(.text+0x2b2): undefined reference to `dump_emit'
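The reason the one-line change below works: dump_emit() is defined in
fs/coredump.c, which is compiled only when CONFIG_COREDUMP is set, and
CONFIG_ELF_CORE already depends on COREDUMP. Keying elfcore.o off
CONFIG_ELF_CORE therefore guarantees dump_emit() is available whenever
elfcore.o is built. Roughly, from mainline (abridged, for context only):

  # fs/Makefile
  obj-$(CONFIG_COREDUMP) += coredump.o

  # fs/Kconfig.binfmt
  config ELF_CORE
          depends on COREDUMP
          default y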
Cc: <stable(a)vger.kernel.org>
Fixes: 1fcccbac89f5 ("elf coredump: replace ELF_CORE_EXTRA_* macros by functions")
Reported-by: kernel test robot <lkp(a)intel.com>
Signed-off-by: Krzysztof Kozlowski <krzk(a)kernel.org>
---
This is a similar fix to commit 42d91f612c87 ("um: Fix build error and
kconfig for i386"), although I used a different Fixes tag - the commit
which introduced this part of the code.
---
arch/ia64/kernel/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/ia64/kernel/Makefile b/arch/ia64/kernel/Makefile
index 1a8df6669eee..18d6008b151f 100644
--- a/arch/ia64/kernel/Makefile
+++ b/arch/ia64/kernel/Makefile
@@ -41,7 +41,7 @@ obj-y += esi_stub.o # must be in kernel proper
endif
obj-$(CONFIG_INTEL_IOMMU) += pci-dma.o
-obj-$(CONFIG_BINFMT_ELF) += elfcore.o
+obj-$(CONFIG_ELF_CORE) += elfcore.o
# fp_emulate() expects f2-f5,f16-f31 to contain the user-level state.
CFLAGS_traps.o += -mfixed-range=f2-f5,f16-f31
--
2.17.1
From: Phillip Lougher <phillip(a)squashfs.org.uk>
Subject: squashfs: avoid bio_alloc() failure with 1Mbyte blocks
This is a regression introduced by the patch "migrate from ll_rw_block
usage to BIO".
bio_alloc() is limited to 256 pages (1 Mbyte), which can cause a failure
when reading filesystems with a 1 Mbyte block size. The problem is that a
datablock can be fully (or almost fully) uncompressed, requiring 256 pages,
but because blocks are not aligned to page boundaries, reading it may
require 257 pages.
bio_kmalloc() can handle up to 1024 pages, so use it for this edge
condition.
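To illustrate the off-by-one (assuming 4 KiB pages; the start offset is an
example only):

  1 Mbyte block               = 1048576 bytes = 256 pages
  block start within its page = e.g. 0x200 (blocks are not page-aligned)
  pages spanned by the read   = DIV_ROUND_UP(0x200 + 1048576, 4096) = 257
  BIO_MAX_PAGES               = 256, so bio_alloc() cannot map the block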
Link: http://lkml.kernel.org/r/20200815035637.15319-1-phillip@squashfs.org.uk
Fixes: 93e72b3c612a ("squashfs: migrate from ll_rw_block usage to BIO")
Signed-off-by: Phillip Lougher <phillip(a)squashfs.org.uk>
Reported-by: Nicolas Prochazka <nicolas.prochazka(a)gmail.com>
Reported-by: Tomoatsu Shimada <shimada(a)walbrix.com>
Reviewed-by: Guenter Roeck <groeck(a)chromium.org>
Cc: Philippe Liard <pliard(a)google.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Adrien Schildknecht <adrien+dev(a)schischi.me>
Cc: Daniel Rosenberg <drosen(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/squashfs/block.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/fs/squashfs/block.c~squashfs-avoid-bio_alloc-failure-with-1mbyte-blocks
+++ a/fs/squashfs/block.c
@@ -87,7 +87,11 @@ static int squashfs_bio_read(struct supe
int error, i;
struct bio *bio;
- bio = bio_alloc(GFP_NOIO, page_count);
+ if (page_count <= BIO_MAX_PAGES)
+ bio = bio_alloc(GFP_NOIO, page_count);
+ else
+ bio = bio_kmalloc(GFP_NOIO, page_count);
+
if (!bio)
return -ENOMEM;