The quilt patch titled
Subject: kexec: fix the unexpected kexec_dprintk() macro
has been removed from the -mm tree. Its filename was
kexec-fix-the-unexpected-kexec_dprintk-macro.patch
This patch was dropped because it was merged into the mm-nonmm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Baoquan He <bhe(a)redhat.com>
Subject: kexec: fix the unexpected kexec_dprintk() macro
Date: Tue, 9 Apr 2024 12:22:38 +0800
Jiri reported that the current kexec_dprintk() always prints out debugging
messages whenever kexec/kdump loading is triggered.  That is not wanted: the
debugging messages are only supposed to be printed when 'kexec -s -d' is
specified for kexec/kdump loading.
After investigating, the reason is that the current kexec_dprintk() uses
printk(KERN_INFO) or printk(KERN_DEBUG) depending on whether '-d' is
specified.  However, distros usually have a default log level like below:
[~]# cat /proc/sys/kernel/printk
7 4 1 7
So even when '-d' is not specified, the printk(KERN_DEBUG) messages still
always show up in the kernel log.  printk(KERN_DEBUG) is not equivalent to
pr_debug(), contrary to what one might assume.
Fix it by switching to pr_info(), emitted only when kexec_file_dbg_print is
set, which works as expected.
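A minimal illustration of the difference (a sketch for this changelog only,
not code from the patch; kexec_dbg_demo() is a made-up name):

#include <linux/printk.h>

static void kexec_dbg_demo(void)
{
	/*
	 * Always recorded in the kernel log buffer; with a console loglevel
	 * of 7 it is merely kept off the console, not suppressed.
	 */
	printk(KERN_DEBUG "kexec: always visible in dmesg\n");

	/* Emits nothing unless DEBUG is defined or dynamic debug enables it. */
	pr_debug("kexec: normally silent\n");
}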
Link: https://lkml.kernel.org/r/20240409042238.1240462-1-bhe@redhat.com
Fixes: cbc2fe9d9cb2 ("kexec_file: add kexec_file flag to control debug printing")
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Reported-by: Jiri Slaby <jirislaby(a)kernel.org>
Closes: https://lore.kernel.org/all/4c775fca-5def-4a2d-8437-7130b02722a2@kernel.org
Reviewed-by: Dave Young <dyoung(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/kexec.h | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
--- a/include/linux/kexec.h~kexec-fix-the-unexpected-kexec_dprintk-macro
+++ a/include/linux/kexec.h
@@ -461,10 +461,8 @@ static inline void arch_kexec_pre_free_p
extern bool kexec_file_dbg_print;
-#define kexec_dprintk(fmt, ...) \
- printk("%s" fmt, \
- kexec_file_dbg_print ? KERN_INFO : KERN_DEBUG, \
- ##__VA_ARGS__)
+#define kexec_dprintk(fmt, arg...) \
+ do { if (kexec_file_dbg_print) pr_info(fmt, ##arg); } while (0)
#else /* !CONFIG_KEXEC_CORE */
struct pt_regs;
_
Patches currently in -mm which might be from bhe(a)redhat.com are
The quilt patch titled
Subject: mm: fix race between __split_huge_pmd_locked() and GUP-fast
has been removed from the -mm tree. Its filename was
mm-fix-race-between-__split_huge_pmd_locked-and-gup-fast.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryan Roberts <ryan.roberts(a)arm.com>
Subject: mm: fix race between __split_huge_pmd_locked() and GUP-fast
Date: Wed, 1 May 2024 15:33:10 +0100
__split_huge_pmd_locked() can be called for a present THP, devmap or
(non-present) migration entry. It calls pmdp_invalidate() unconditionally
on the pmdp and only determines if it is present or not based on the
returned old pmd. This is a problem for the migration entry case because
pmd_mkinvalid(), called by pmdp_invalidate(), must only be called for a
present pmd.
On arm64 at least, pmd_mkinvalid() will mark the pmd such that any future
call to pmd_present() will return true. And therefore any lockless
pgtable walker could see the migration entry pmd in this state and start
interpreting the fields as if it were present, leading to BadThings (TM).
GUP-fast appears to be one such lockless pgtable walker.
x86 does not suffer the above problem, but instead pmd_mkinvalid() will
corrupt the offset field of the swap entry within the swap pte. See link
below for discussion of that problem.
Fix all of this by only calling pmdp_invalidate() for a present pmd. And
for good measure let's add a warning to all implementations of
pmdp_invalidate[_ad](). I've manually reviewed all other
pmdp_invalidate[_ad]() call sites and believe all others to be conformant.
This is a theoretical bug found during code review. I don't have any test
case to trigger it in practice.
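In other words, the fix reduces to the following shape (a simplified sketch
of the pattern, not the actual __split_huge_pmd_locked() body; the wrapper
name is invented):

#include <linux/pgtable.h>
#include <linux/swapops.h>

static pmd_t read_and_maybe_invalidate(struct vm_area_struct *vma,
				       unsigned long haddr, pmd_t *pmd)
{
	/* Non-present (migration) entry: read it as-is, never invalidate it. */
	if (is_pmd_migration_entry(*pmd))
		return *pmd;

	/* Present pmd: pmdp_invalidate() (and thus pmd_mkinvalid()) is legal. */
	return pmdp_invalidate(vma, haddr, pmd);
}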
Link: https://lkml.kernel.org/r/20240501143310.1381675-1-ryan.roberts@arm.com
Link: https://lore.kernel.org/all/0dd7827a-6334-439a-8fd0-43c98e6af22b@arm.com/
Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Reviewed-by: Zi Yan <ziy(a)nvidia.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual(a)arm.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Andreas Larsson <andreas(a)gaisler.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar(a)kernel.org>
Cc: Borislav Petkov (AMD) <bp(a)alien8.de>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Christian Borntraeger <borntraeger(a)linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Naveen N. Rao <naveen.n.rao(a)linux.ibm.com>
Cc: Nicholas Piggin <npiggin(a)gmail.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Sven Schnelle <svens(a)linux.ibm.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Will Deacon <will(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Documentation/mm/arch_pgtable_helpers.rst | 6 +-
arch/powerpc/mm/book3s64/pgtable.c | 1
arch/s390/include/asm/pgtable.h | 4 +
arch/sparc/mm/tlb.c | 1
arch/x86/mm/pgtable.c | 2
mm/huge_memory.c | 49 ++++++++++----------
mm/pgtable-generic.c | 2
7 files changed, 39 insertions(+), 26 deletions(-)
--- a/arch/powerpc/mm/book3s64/pgtable.c~mm-fix-race-between-__split_huge_pmd_locked-and-gup-fast
+++ a/arch/powerpc/mm/book3s64/pgtable.c
@@ -170,6 +170,7 @@ pmd_t pmdp_invalidate(struct vm_area_str
{
unsigned long old_pmd;
+ VM_WARN_ON_ONCE(!pmd_present(*pmdp));
old_pmd = pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT, _PAGE_INVALID);
flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
return __pmd(old_pmd);
--- a/arch/s390/include/asm/pgtable.h~mm-fix-race-between-__split_huge_pmd_locked-and-gup-fast
+++ a/arch/s390/include/asm/pgtable.h
@@ -1769,8 +1769,10 @@ static inline pmd_t pmdp_huge_clear_flus
static inline pmd_t pmdp_invalidate(struct vm_area_struct *vma,
unsigned long addr, pmd_t *pmdp)
{
- pmd_t pmd = __pmd(pmd_val(*pmdp) | _SEGMENT_ENTRY_INVALID);
+ pmd_t pmd;
+ VM_WARN_ON_ONCE(!pmd_present(*pmdp));
+ pmd = __pmd(pmd_val(*pmdp) | _SEGMENT_ENTRY_INVALID);
return pmdp_xchg_direct(vma->vm_mm, addr, pmdp, pmd);
}
--- a/arch/sparc/mm/tlb.c~mm-fix-race-between-__split_huge_pmd_locked-and-gup-fast
+++ a/arch/sparc/mm/tlb.c
@@ -249,6 +249,7 @@ pmd_t pmdp_invalidate(struct vm_area_str
{
pmd_t old, entry;
+ VM_WARN_ON_ONCE(!pmd_present(*pmdp));
entry = __pmd(pmd_val(*pmdp) & ~_PAGE_VALID);
old = pmdp_establish(vma, address, pmdp, entry);
flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
--- a/arch/x86/mm/pgtable.c~mm-fix-race-between-__split_huge_pmd_locked-and-gup-fast
+++ a/arch/x86/mm/pgtable.c
@@ -631,6 +631,8 @@ int pmdp_clear_flush_young(struct vm_are
pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmdp)
{
+ VM_WARN_ON_ONCE(!pmd_present(*pmdp));
+
/*
* No flush is necessary. Once an invalid PTE is established, the PTE's
* access and dirty bits cannot be updated.
--- a/Documentation/mm/arch_pgtable_helpers.rst~mm-fix-race-between-__split_huge_pmd_locked-and-gup-fast
+++ a/Documentation/mm/arch_pgtable_helpers.rst
@@ -140,7 +140,8 @@ PMD Page Table Helpers
+---------------------------+--------------------------------------------------+
| pmd_swp_clear_soft_dirty | Clears a soft dirty swapped PMD |
+---------------------------+--------------------------------------------------+
-| pmd_mkinvalid | Invalidates a mapped PMD [1] |
+| pmd_mkinvalid | Invalidates a present PMD; do not call for |
+| | non-present PMD [1] |
+---------------------------+--------------------------------------------------+
| pmd_set_huge | Creates a PMD huge mapping |
+---------------------------+--------------------------------------------------+
@@ -196,7 +197,8 @@ PUD Page Table Helpers
+---------------------------+--------------------------------------------------+
| pud_mkdevmap | Creates a ZONE_DEVICE mapped PUD |
+---------------------------+--------------------------------------------------+
-| pud_mkinvalid | Invalidates a mapped PUD [1] |
+| pud_mkinvalid | Invalidates a present PUD; do not call for |
+| | non-present PUD [1] |
+---------------------------+--------------------------------------------------+
| pud_set_huge | Creates a PUD huge mapping |
+---------------------------+--------------------------------------------------+
--- a/mm/huge_memory.c~mm-fix-race-between-__split_huge_pmd_locked-and-gup-fast
+++ a/mm/huge_memory.c
@@ -2430,32 +2430,11 @@ static void __split_huge_pmd_locked(stru
return __split_huge_zero_page_pmd(vma, haddr, pmd);
}
- /*
- * Up to this point the pmd is present and huge and userland has the
- * whole access to the hugepage during the split (which happens in
- * place). If we overwrite the pmd with the not-huge version pointing
- * to the pte here (which of course we could if all CPUs were bug
- * free), userland could trigger a small page size TLB miss on the
- * small sized TLB while the hugepage TLB entry is still established in
- * the huge TLB. Some CPU doesn't like that.
- * See http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf, Erratum
- * 383 on page 105. Intel should be safe but is also warns that it's
- * only safe if the permission and cache attributes of the two entries
- * loaded in the two TLB is identical (which should be the case here).
- * But it is generally safer to never allow small and huge TLB entries
- * for the same virtual address to be loaded simultaneously. So instead
- * of doing "pmd_populate(); flush_pmd_tlb_range();" we first mark the
- * current pmd notpresent (atomically because here the pmd_trans_huge
- * must remain set at all times on the pmd until the split is complete
- * for this pmd), then we flush the SMP TLB and finally we write the
- * non-huge version of the pmd entry with pmd_populate.
- */
- old_pmd = pmdp_invalidate(vma, haddr, pmd);
-
- pmd_migration = is_pmd_migration_entry(old_pmd);
+ pmd_migration = is_pmd_migration_entry(*pmd);
if (unlikely(pmd_migration)) {
swp_entry_t entry;
+ old_pmd = *pmd;
entry = pmd_to_swp_entry(old_pmd);
page = pfn_swap_entry_to_page(entry);
write = is_writable_migration_entry(entry);
@@ -2466,6 +2445,30 @@ static void __split_huge_pmd_locked(stru
soft_dirty = pmd_swp_soft_dirty(old_pmd);
uffd_wp = pmd_swp_uffd_wp(old_pmd);
} else {
+ /*
+ * Up to this point the pmd is present and huge and userland has
+ * the whole access to the hugepage during the split (which
+ * happens in place). If we overwrite the pmd with the not-huge
+ * version pointing to the pte here (which of course we could if
+ * all CPUs were bug free), userland could trigger a small page
+ * size TLB miss on the small sized TLB while the hugepage TLB
+ * entry is still established in the huge TLB. Some CPU doesn't
+ * like that. See
+ * http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf, Erratum
+ * 383 on page 105. Intel should be safe but is also warns that
+ * it's only safe if the permission and cache attributes of the
+ * two entries loaded in the two TLB is identical (which should
+ * be the case here). But it is generally safer to never allow
+ * small and huge TLB entries for the same virtual address to be
+ * loaded simultaneously. So instead of doing "pmd_populate();
+ * flush_pmd_tlb_range();" we first mark the current pmd
+ * notpresent (atomically because here the pmd_trans_huge must
+ * remain set at all times on the pmd until the split is
+ * complete for this pmd), then we flush the SMP TLB and finally
+ * we write the non-huge version of the pmd entry with
+ * pmd_populate.
+ */
+ old_pmd = pmdp_invalidate(vma, haddr, pmd);
page = pmd_page(old_pmd);
folio = page_folio(page);
if (pmd_dirty(old_pmd)) {
--- a/mm/pgtable-generic.c~mm-fix-race-between-__split_huge_pmd_locked-and-gup-fast
+++ a/mm/pgtable-generic.c
@@ -198,6 +198,7 @@ pgtable_t pgtable_trans_huge_withdraw(st
pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmdp)
{
+ VM_WARN_ON_ONCE(!pmd_present(*pmdp));
pmd_t old = pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));
flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
return old;
@@ -208,6 +209,7 @@ pmd_t pmdp_invalidate(struct vm_area_str
pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmdp)
{
+ VM_WARN_ON_ONCE(!pmd_present(*pmdp));
return pmdp_invalidate(vma, address, pmdp);
}
#endif
_
Patches currently in -mm which might be from ryan.roberts(a)arm.com are
The quilt patch titled
Subject: fs/proc/task_mmu: fix uffd-wp confusion in pagemap_scan_pmd_entry()
has been removed from the -mm tree. Its filename was
fs-proc-task_mmu-fix-uffd-wp-confusion-in-pagemap_scan_pmd_entry.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryan Roberts <ryan.roberts(a)arm.com>
Subject: fs/proc/task_mmu: fix uffd-wp confusion in pagemap_scan_pmd_entry()
Date: Mon, 29 Apr 2024 12:41:04 +0100
pagemap_scan_pmd_entry() checks whether uffd-wp is already set on each pte
so that it can skip the unnecessary update when it is.  However, it was
previously checking with
`pte_uffd_wp(ptep_get(pte))` without first confirming that the pte was
present. It is only valid to call pte_uffd_wp() for present ptes. For
swap ptes, pte_swp_uffd_wp() must be called because the uffd-wp bit may be
kept in a different position, depending on the arch.
This was leading to test failures in the pagemap_ioctl mm selftest, when
bringing up uffd-wp support on arm64, due to incorrectly interpreting the
uffd-wp status of migration entries.
Let's fix this by using the correct check based on pte_present(). While
we are at it, let's pass the pte to make_uffd_wp_pte() to avoid the
pointless extra ptep_get() which can't be optimized out due to READ_ONCE()
on many arches.
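The corrected check boils down to this helper-style sketch (illustrative
only; pte_swp_uffd_wp_any() is the same helper the patch itself uses):

static inline bool sketch_pte_uffd_wp_set(pte_t ptent)
{
	/* pte_uffd_wp() is only meaningful for present ptes ... */
	if (pte_present(ptent))
		return pte_uffd_wp(ptent);

	/* ... non-present (swap/migration) ptes keep the bit elsewhere. */
	return pte_swp_uffd_wp_any(ptent);
}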
Link: https://lkml.kernel.org/r/20240429114104.182890-1-ryan.roberts@arm.com
Fixes: 12f6b01a0bcb ("fs/proc/task_mmu: add fast paths to get/clear PAGE_IS_WRITTEN flag")
Closes: https://lore.kernel.org/linux-arm-kernel/ZiuyGXt0XWwRgFh9@x1n/
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Tested-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/task_mmu.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)
--- a/fs/proc/task_mmu.c~fs-proc-task_mmu-fix-uffd-wp-confusion-in-pagemap_scan_pmd_entry
+++ a/fs/proc/task_mmu.c
@@ -1817,10 +1817,8 @@ static unsigned long pagemap_page_catego
}
static void make_uffd_wp_pte(struct vm_area_struct *vma,
- unsigned long addr, pte_t *pte)
+ unsigned long addr, pte_t *pte, pte_t ptent)
{
- pte_t ptent = ptep_get(pte);
-
if (pte_present(ptent)) {
pte_t old_pte;
@@ -2175,9 +2173,12 @@ static int pagemap_scan_pmd_entry(pmd_t
if ((p->arg.flags & PM_SCAN_WP_MATCHING) && !p->vec_out) {
/* Fast path for performing exclusive WP */
for (addr = start; addr != end; pte++, addr += PAGE_SIZE) {
- if (pte_uffd_wp(ptep_get(pte)))
+ pte_t ptent = ptep_get(pte);
+
+ if ((pte_present(ptent) && pte_uffd_wp(ptent)) ||
+ pte_swp_uffd_wp_any(ptent))
continue;
- make_uffd_wp_pte(vma, addr, pte);
+ make_uffd_wp_pte(vma, addr, pte, ptent);
if (!flush_end)
start = addr;
flush_end = addr + PAGE_SIZE;
@@ -2190,8 +2191,10 @@ static int pagemap_scan_pmd_entry(pmd_t
p->arg.return_mask == PAGE_IS_WRITTEN) {
for (addr = start; addr < end; pte++, addr += PAGE_SIZE) {
unsigned long next = addr + PAGE_SIZE;
+ pte_t ptent = ptep_get(pte);
- if (pte_uffd_wp(ptep_get(pte)))
+ if ((pte_present(ptent) && pte_uffd_wp(ptent)) ||
+ pte_swp_uffd_wp_any(ptent))
continue;
ret = pagemap_scan_output(p->cur_vma_category | PAGE_IS_WRITTEN,
p, addr, &next);
@@ -2199,7 +2202,7 @@ static int pagemap_scan_pmd_entry(pmd_t
break;
if (~p->arg.flags & PM_SCAN_WP_MATCHING)
continue;
- make_uffd_wp_pte(vma, addr, pte);
+ make_uffd_wp_pte(vma, addr, pte, ptent);
if (!flush_end)
start = addr;
flush_end = next;
@@ -2208,8 +2211,9 @@ static int pagemap_scan_pmd_entry(pmd_t
}
for (addr = start; addr != end; pte++, addr += PAGE_SIZE) {
+ pte_t ptent = ptep_get(pte);
unsigned long categories = p->cur_vma_category |
- pagemap_page_category(p, vma, addr, ptep_get(pte));
+ pagemap_page_category(p, vma, addr, ptent);
unsigned long next = addr + PAGE_SIZE;
if (!pagemap_scan_is_interesting_page(categories, p))
@@ -2224,7 +2228,7 @@ static int pagemap_scan_pmd_entry(pmd_t
if (~categories & PAGE_IS_WRITTEN)
continue;
- make_uffd_wp_pte(vma, addr, pte);
+ make_uffd_wp_pte(vma, addr, pte, ptent);
if (!flush_end)
start = addr;
flush_end = next;
_
Patches currently in -mm which might be from ryan.roberts(a)arm.com are
selftests-mm-soft-dirty-should-fail-if-a-testcase-fails.patch
mm-debug_vm_pgtable-test-pmd_leaf-behavior-with-pmd_mkinvalid.patch
mm-fix-race-between-__split_huge_pmd_locked-and-gup-fast.patch
The quilt patch titled
Subject: fs/proc/task_mmu: fix loss of young/dirty bits during pagemap scan
has been removed from the -mm tree. Its filename was
fs-proc-task_mmu-fix-loss-of-young-dirty-bits-during-pagemap-scan.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryan Roberts <ryan.roberts(a)arm.com>
Subject: fs/proc/task_mmu: fix loss of young/dirty bits during pagemap scan
Date: Mon, 29 Apr 2024 12:40:17 +0100
make_uffd_wp_pte() was previously doing:
    pte = ptep_get(ptep);
    ptep_modify_prot_start(ptep);
    pte = pte_mkuffd_wp(pte);
    ptep_modify_prot_commit(ptep, pte);
But if another thread accessed or dirtied the pte between the first two
calls, this could lead to loss of that information. Since
ptep_modify_prot_start() gets and clears atomically, the following is the
correct pattern and prevents any possible race. Any access after the
first call would see an invalid pte and cause a fault:
    pte = ptep_modify_prot_start(ptep);
    pte = pte_mkuffd_wp(pte);
    ptep_modify_prot_commit(ptep, pte);
Link: https://lkml.kernel.org/r/20240429114017.182570-1-ryan.roberts@arm.com
Fixes: 52526ca7fdb9 ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs")
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/task_mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/proc/task_mmu.c~fs-proc-task_mmu-fix-loss-of-young-dirty-bits-during-pagemap-scan
+++ a/fs/proc/task_mmu.c
@@ -1825,7 +1825,7 @@ static void make_uffd_wp_pte(struct vm_a
pte_t old_pte;
old_pte = ptep_modify_prot_start(vma, addr, pte);
- ptent = pte_mkuffd_wp(ptent);
+ ptent = pte_mkuffd_wp(old_pte);
ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent);
} else if (is_swap_pte(ptent)) {
ptent = pte_swp_mkuffd_wp(ptent);
_
Patches currently in -mm which might be from ryan.roberts(a)arm.com are
selftests-mm-soft-dirty-should-fail-if-a-testcase-fails.patch
mm-debug_vm_pgtable-test-pmd_leaf-behavior-with-pmd_mkinvalid.patch
mm-fix-race-between-__split_huge_pmd_locked-and-gup-fast.patch
The quilt patch titled
Subject: mm: use memalloc_nofs_save() in page_cache_ra_order()
has been removed from the -mm tree. Its filename was
mm-use-memalloc_nofs_save-in-page_cache_ra_order.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Subject: mm: use memalloc_nofs_save() in page_cache_ra_order()
Date: Fri, 26 Apr 2024 19:29:38 +0800
As in commit f2c817bed58d ("mm: use memalloc_nofs_save in readahead path"),
ensure that page_cache_ra_order() does not attempt to reclaim file-backed
pages either, or it can lead to a deadlock.  The issue was found when
testing ext4 large folios.
INFO: task DataXceiver for:7494 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:DataXceiver for state:D stack:0 pid:7494 ppid:1 flags:0x00000200
Call trace:
__switch_to+0x14c/0x240
__schedule+0x82c/0xdd0
schedule+0x58/0xf0
io_schedule+0x24/0xa0
__folio_lock+0x130/0x300
migrate_pages_batch+0x378/0x918
migrate_pages+0x350/0x700
compact_zone+0x63c/0xb38
compact_zone_order+0xc0/0x118
try_to_compact_pages+0xb0/0x280
__alloc_pages_direct_compact+0x98/0x248
__alloc_pages+0x510/0x1110
alloc_pages+0x9c/0x130
folio_alloc+0x20/0x78
filemap_alloc_folio+0x8c/0x1b0
page_cache_ra_order+0x174/0x308
ondemand_readahead+0x1c8/0x2b8
page_cache_async_ra+0x68/0xb8
filemap_readahead.isra.0+0x64/0xa8
filemap_get_pages+0x3fc/0x5b0
filemap_splice_read+0xf4/0x280
ext4_file_splice_read+0x2c/0x48 [ext4]
vfs_splice_read.part.0+0xa8/0x118
splice_direct_to_actor+0xbc/0x288
do_splice_direct+0x9c/0x108
do_sendfile+0x328/0x468
__arm64_sys_sendfile64+0x8c/0x148
invoke_syscall+0x4c/0x118
el0_svc_common.constprop.0+0xc8/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x4c/0x1f8
el0t_64_sync_handler+0xc0/0xc8
el0t_64_sync+0x188/0x190
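The fix wraps the readahead allocation region in a scoped-NOFS section.  A
minimal sketch of that pattern (only the standard memalloc_nofs_save()/
memalloc_nofs_restore() API is assumed; the function name is invented):

#include <linux/sched/mm.h>

static void readahead_nofs_scope_sketch(void)
{
	unsigned int nofs = memalloc_nofs_save();

	/*
	 * Any allocation made in here implicitly behaves as GFP_NOFS, so it
	 * cannot re-enter filesystem reclaim and deadlock on locks already
	 * held further up the call chain.
	 */

	memalloc_nofs_restore(nofs);
}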
Link: https://lkml.kernel.org/r/20240426112938.124740-1-wangkefeng.wang@huawei.com
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Zhang Yi <yi.zhang(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/readahead.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/mm/readahead.c~mm-use-memalloc_nofs_save-in-page_cache_ra_order
+++ a/mm/readahead.c
@@ -490,6 +490,7 @@ void page_cache_ra_order(struct readahea
pgoff_t index = readahead_index(ractl);
pgoff_t limit = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
pgoff_t mark = index + ra->size - ra->async_size;
+ unsigned int nofs;
int err = 0;
gfp_t gfp = readahead_gfp_mask(mapping);
@@ -504,6 +505,8 @@ void page_cache_ra_order(struct readahea
new_order = min_t(unsigned int, new_order, ilog2(ra->size));
}
+ /* See comment in page_cache_ra_unbounded() */
+ nofs = memalloc_nofs_save();
filemap_invalidate_lock_shared(mapping);
while (index <= limit) {
unsigned int order = new_order;
@@ -527,6 +530,7 @@ void page_cache_ra_order(struct readahea
read_pages(ractl);
filemap_invalidate_unlock_shared(mapping);
+ memalloc_nofs_restore(nofs);
/*
* If there were already pages in the page cache, then we may have
_
Patches currently in -mm which might be from wangkefeng.wang(a)huawei.com are
arm64-mm-drop-vm_fault_badmap-vm_fault_badaccess.patch
arm-mm-drop-vm_fault_badmap-vm_fault_badaccess.patch
mm-move-mm-counter-updating-out-of-set_pte_range.patch
mm-filemap-batch-mm-counter-updating-in-filemap_map_pages.patch
mm-swapfile-check-usable-swap-device-in-__folio_throttle_swaprate.patch
mm-memory-check-userfaultfd_wp-in-vmf_orig_pte_uffd_wp.patch
The quilt patch titled
Subject: kmsan: compiler_types: declare __no_sanitize_or_inline
has been removed from the -mm tree. Its filename was
kmsan-compiler_types-declare-__no_sanitize_or_inline.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Alexander Potapenko <glider(a)google.com>
Subject: kmsan: compiler_types: declare __no_sanitize_or_inline
Date: Fri, 26 Apr 2024 11:16:22 +0200
It turned out that KMSAN instruments READ_ONCE_NOCHECK(), resulting in
false positive reports, because __no_sanitize_or_inline enforced inlining.
Properly declare __no_sanitize_or_inline under __SANITIZE_MEMORY__, so
that it does not __always_inline the annotated function.
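For context, an accessor relying on the attribute looks roughly like this (a
sketch, not the actual READ_ONCE_NOCHECK() implementation):

/*
 * Sketch only: if __no_sanitize_or_inline silently falls back to
 * __always_inline (as it did under KMSAN before this patch), the body is
 * inlined into instrumented callers and KMSAN checks the read anyway,
 * defeating the "nocheck" intent and producing false positives.
 */
static __no_sanitize_or_inline unsigned long
read_word_nocheck(const void *addr)
{
	return *(const volatile unsigned long *)addr;
}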
Link: https://lkml.kernel.org/r/20240426091622.3846771-1-glider@google.com
Fixes: 5de0ce85f5a4 ("kmsan: mark noinstr as __no_sanitize_memory")
Signed-off-by: Alexander Potapenko <glider(a)google.com>
Reported-by: syzbot+355c5bb8c1445c871ee8(a)syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/000000000000826ac1061675b0e3@google.com
Cc: <stable(a)vger.kernel.org>
Reviewed-by: Marco Elver <elver(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Miguel Ojeda <ojeda(a)kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/compiler_types.h | 11 +++++++++++
1 file changed, 11 insertions(+)
--- a/include/linux/compiler_types.h~kmsan-compiler_types-declare-__no_sanitize_or_inline
+++ a/include/linux/compiler_types.h
@@ -278,6 +278,17 @@ struct ftrace_likely_data {
# define __no_kcsan
#endif
+#ifdef __SANITIZE_MEMORY__
+/*
+ * Similarly to KASAN and KCSAN, KMSAN loses function attributes of inlined
+ * functions, therefore disabling KMSAN checks also requires disabling inlining.
+ *
+ * __no_sanitize_or_inline effectively prevents KMSAN from reporting errors
+ * within the function and marks all its outputs as initialized.
+ */
+# define __no_sanitize_or_inline __no_kmsan_checks notrace __maybe_unused
+#endif
+
#ifndef __no_sanitize_or_inline
#define __no_sanitize_or_inline __always_inline
#endif
_
Patches currently in -mm which might be from glider(a)google.com are