commit 68d54ceeec0e5fee4fb8048e6a04c193f32525ca upstream.
The ptrace(PTRACE_PEEKMTETAGS) implementation checks whether the user
page has valid tags (mapped with PROT_MTE) by testing the PG_mte_tagged
page flag. If this bit is cleared, ptrace(PTRACE_PEEKMTETAGS) returns
-EIO.
A newly created (PROT_MTE) mapping points to the zero page which had its
tags zeroed during cpu_enable_mte(). If there were no prior writes to
this mapping, ptrace(PTRACE_PEEKMTETAGS) fails with -EIO since the zero
page does not have the PG_mte_tagged flag set.
Set PG_mte_tagged on the zero page when its tags are cleared during
boot. In addition, to avoid ptrace(PTRACE_PEEKMTETAGS) succeeding on
!PROT_MTE mappings pointing to the zero page, change the
__access_remote_tags() check to (vm_flags & VM_MTE) instead of
PG_mte_tagged.
Signed-off-by: Catalin Marinas <catalin.marinas(a)arm.com>
Fixes: 34bfeea4a9e9 ("arm64: mte: Clear the tags when a page is mapped in user-space with PROT_MTE")
Cc: <stable(a)vger.kernel.org> # 5.10.x
Cc: Will Deacon <will(a)kernel.org>
Reported-by: Luis Machado <luis.machado(a)linaro.org>
Tested-by: Luis Machado <luis.machado(a)linaro.org>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Link: https://lore.kernel.org/r/20210210180316.23654-1-catalin.marinas@arm.com
---
arch/arm64/kernel/cpufeature.c | 6 +-----
arch/arm64/kernel/mte.c | 3 ++-
2 files changed, 3 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 0a52e076153b..65a522fbd874 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1696,16 +1696,12 @@ static void bti_enable(const struct arm64_cpu_capabilities *__unused)
#ifdef CONFIG_ARM64_MTE
static void cpu_enable_mte(struct arm64_cpu_capabilities const *cap)
{
- static bool cleared_zero_page = false;
-
/*
* Clear the tags in the zero page. This needs to be done via the
* linear map which has the Tagged attribute.
*/
- if (!cleared_zero_page) {
- cleared_zero_page = true;
+ if (!test_and_set_bit(PG_mte_tagged, &ZERO_PAGE(0)->flags))
mte_clear_page_tags(lm_alias(empty_zero_page));
- }
}
#endif /* CONFIG_ARM64_MTE */
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index ef15c8a2a49d..7a66a7d9c1ff 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -239,11 +239,12 @@ static int __access_remote_tags(struct mm_struct *mm, unsigned long addr,
* would cause the existing tags to be cleared if the page
* was never mapped with PROT_MTE.
*/
- if (!test_bit(PG_mte_tagged, &page->flags)) {
+ if (!(vma->vm_flags & VM_MTE)) {
ret = -EOPNOTSUPP;
put_page(page);
break;
}
+ WARN_ON_ONCE(!test_bit(PG_mte_tagged, &page->flags));
/* limit access to the end of the page */
offset = offset_in_page(addr);
Depending on the number of online CPUs in the original kernel, it is
likely for CPU #0 to be offline in a kdump kernel. The associated IRQs
in the affinity mappings provided by irq_create_affinity_masks() are
thus not started by irq_startup(), as per-design with managed IRQs.
This can be a problem with multi-queue block devices driven by blk-mq :
such a non-started IRQ is very likely paired with the single queue
enforced by blk-mq during kdump (see blk_mq_alloc_tag_set()). This
causes the device to remain silent and likely hangs the guest at
some point.
This is a regression caused by commit 9ea69a55b3b9 ("powerpc/pseries:
Pass MSI affinity to irq_create_mapping()"). Note that this only happens
with the XIVE interrupt controller because XICS has a workaround to bypass
affinity, which is activated during kdump with the "noirqdistrib" kernel
parameter.
The issue comes from a combination of factors:
- discrepancy between the number of queues detected by the multi-queue
block driver, that was used to create the MSI vectors, and the single
queue mode enforced later on by blk-mq because of kdump (i.e. keeping
all queues fixes the issue)
- CPU#0 offline (i.e. kdump always succeed with CPU#0)
Given that I couldn't reproduce on x86, which seems to always have CPU#0
online even during kdump, I'm not sure where this should be fixed. Hence
going for another approach : fine-grained affinity is for performance
and we don't really care about that during kdump. Simply revert to the
previous working behavior of ignoring affinity masks in this case only.
Fixes: 9ea69a55b3b9 ("powerpc/pseries: Pass MSI affinity to irq_create_mapping()")
Cc: lvivier(a)redhat.com
Cc: stable(a)vger.kernel.org
Reviewed-by: Laurent Vivier <lvivier(a)redhat.com>
Reviewed-by: Cédric Le Goater <clg(a)kaod.org>
Signed-off-by: Greg Kurz <groug(a)kaod.org>
---
v2: - added missing #include <linux/crash_dump.h>
arch/powerpc/platforms/pseries/msi.c | 25 +++++++++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index b3ac2455faad..637300330507 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -4,6 +4,7 @@
* Copyright 2006-2007 Michael Ellerman, IBM Corp.
*/
+#include <linux/crash_dump.h>
#include <linux/device.h>
#include <linux/irq.h>
#include <linux/msi.h>
@@ -458,8 +459,28 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
return hwirq;
}
- virq = irq_create_mapping_affinity(NULL, hwirq,
- entry->affinity);
+ /*
+ * Depending on the number of online CPUs in the original
+ * kernel, it is likely for CPU #0 to be offline in a kdump
+ * kernel. The associated IRQs in the affinity mappings
+ * provided by irq_create_affinity_masks() are thus not
+ * started by irq_startup(), as per-design for managed IRQs.
+ * This can be a problem with multi-queue block devices driven
+ * by blk-mq : such a non-started IRQ is very likely paired
+ * with the single queue enforced by blk-mq during kdump (see
+ * blk_mq_alloc_tag_set()). This causes the device to remain
+ * silent and likely hangs the guest at some point.
+ *
+ * We don't really care for fine-grained affinity when doing
+ * kdump actually : simply ignore the pre-computed affinity
+ * masks in this case and let the default mask with all CPUs
+ * be used when creating the IRQ mappings.
+ */
+ if (is_kdump_kernel())
+ virq = irq_create_mapping(NULL, hwirq);
+ else
+ virq = irq_create_mapping_affinity(NULL, hwirq,
+ entry->affinity);
if (!virq) {
pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
--
2.26.2