May 2024 - Linux-stable-mirror

[merged mm-nonmm-stable] kexec-fix-the-unexpected-kexec_dprintk-macro.patch removed from -mm tree

by Andrew Morton

The quilt patch titled
     Subject: kexec: fix the unexpected kexec_dprintk() macro
has been removed from the -mm tree.  Its filename was
     kexec-fix-the-unexpected-kexec_dprintk-macro.patch

This patch was dropped because it was merged into the mm-nonmm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

------------------------------------------------------
From: Baoquan He <bhe(a)redhat.com>
Subject: kexec: fix the unexpected kexec_dprintk() macro
Date: Tue, 9 Apr 2024 12:22:38 +0800

Jiri reported that the current kexec_dprintk() always prints out debugging
messages whenever kexec/kdump loading is triggered.  That is not wanted.
The debugging messages are supposed to be printed only when 'kexec -s -d'
is specified for kexec/kdump loading.

After investigating, the reason is that the current kexec_dprintk() uses
printk(KERN_INFO) or printk(KERN_DEBUG) depending on whether '-d' is
specified.  However, distros usually have a default log level like below:

 [~]# cat /proc/sys/kernel/printk
 7       4      1       7

So, even though '-d' is not specified, printk(KERN_DEBUG) still always
prints out.  I thought printk(KERN_DEBUG) was equal to pr_debug(), but it
is not.

Fix it by switching to pr_info(), which works as expected.

Link: https://lkml.kernel.org/r/20240409042238.1240462-1-bhe@redhat.com
Fixes: cbc2fe9d9cb2 ("kexec_file: add kexec_file flag to control debug printing")
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Reported-by: Jiri Slaby <jirislaby(a)kernel.org>
Closes: https://lore.kernel.org/all/4c775fca-5def-4a2d-8437-7130b02722a2@kernel.org
Reviewed-by: Dave Young <dyoung(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 include/linux/kexec.h |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

--- a/include/linux/kexec.h~kexec-fix-the-unexpected-kexec_dprintk-macro
+++ a/include/linux/kexec.h
@@ -461,10 +461,8 @@ static inline void arch_kexec_pre_free_p
 
 extern bool kexec_file_dbg_print;
 
-#define kexec_dprintk(fmt, ...)					\
-	printk("%s" fmt,					\
-	       kexec_file_dbg_print ? KERN_INFO : KERN_DEBUG,	\
-	       ##__VA_ARGS__)
+#define kexec_dprintk(fmt, arg...) \
+        do { if (kexec_file_dbg_print) pr_info(fmt, ##arg); } while (0)
 
 #else /* !CONFIG_KEXEC_CORE */
 struct pt_regs;
_

Patches currently in -mm which might be from bhe(a)redhat.com are
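
For readers who want to see the gating idea outside the kernel, here is a minimal userspace sketch. It is not the kernel macro: dbg_print, demo_info() and demo_dprintk() are hypothetical stand-ins for kexec_file_dbg_print, pr_info() and kexec_dprintk(), and the sketch uses plain C99 variadic macros rather than the GNU "arg..." form. The point is only that the flag decides whether anything is emitted at all, instead of selecting a log level.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's kexec_file_dbg_print flag. */
static bool dbg_print;

/* Counts emitted messages so the gating can be observed. */
static int emitted;

/* Hypothetical stand-in for pr_info(): always prints, to stderr. */
static void demo_info(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	emitted++;
}

/* Same shape as the fixed macro: nothing is emitted unless the flag is set. */
#define demo_dprintk(...) \
	do { if (dbg_print) demo_info(__VA_ARGS__); } while (0)

/* Returns how many messages a single call emits for a given flag value. */
static int demo_emit_count(bool flag)
{
	emitted = 0;
	dbg_print = flag;
	demo_dprintk("segment %d loaded\n", 1);
	return emitted;
}
```

With the flag clear, demo_emit_count() reports zero messages regardless of any console log level, which is the behaviour the patch description asks for.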


[merged mm-stable] mm-fix-race-between-__split_huge_pmd_locked-and-gup-fast.patch removed from -mm tree

by Andrew Morton

The quilt patch titled
     Subject: mm: fix race between __split_huge_pmd_locked() and GUP-fast
has been removed from the -mm tree.  Its filename was
     mm-fix-race-between-__split_huge_pmd_locked-and-gup-fast.patch

This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

------------------------------------------------------
From: Ryan Roberts <ryan.roberts(a)arm.com>
Subject: mm: fix race between __split_huge_pmd_locked() and GUP-fast
Date: Wed, 1 May 2024 15:33:10 +0100

__split_huge_pmd_locked() can be called for a present THP, devmap or
(non-present) migration entry.  It calls pmdp_invalidate() unconditionally
on the pmdp and only determines if it is present or not based on the
returned old pmd.  This is a problem for the migration entry case because
pmd_mkinvalid(), called by pmdp_invalidate(), must only be called for a
present pmd.

On arm64 at least, pmd_mkinvalid() will mark the pmd such that any future
call to pmd_present() will return true.  And therefore any lockless
pgtable walker could see the migration entry pmd in this state and start
interpreting the fields as if it were present, leading to BadThings (TM).
GUP-fast appears to be one such lockless pgtable walker.

x86 does not suffer the above problem, but instead pmd_mkinvalid() will
corrupt the offset field of the swap entry within the swap pte.  See link
below for discussion of that problem.

Fix all of this by only calling pmdp_invalidate() for a present pmd.  And
for good measure let's add a warning to all implementations of
pmdp_invalidate[_ad]().  I've manually reviewed all other
pmdp_invalidate[_ad]() call sites and believe all others to be conformant.

This is a theoretical bug found during code review.  I don't have any test
case to trigger it in practice.

Link: https://lkml.kernel.org/r/20240501143310.1381675-1-ryan.roberts@arm.com
Link: https://lore.kernel.org/all/0dd7827a-6334-439a-8fd0-43c98e6af22b@arm.com/
Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Reviewed-by: Zi Yan <ziy(a)nvidia.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual(a)arm.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Andreas Larsson <andreas(a)gaisler.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar(a)kernel.org>
Cc: Borislav Petkov (AMD) <bp(a)alien8.de>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Christian Borntraeger <borntraeger(a)linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Naveen N. Rao <naveen.n.rao(a)linux.ibm.com>
Cc: Nicholas Piggin <npiggin(a)gmail.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Sven Schnelle <svens(a)linux.ibm.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Will Deacon <will(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 Documentation/mm/arch_pgtable_helpers.rst |    6 +-
 arch/powerpc/mm/book3s64/pgtable.c        |    1 
 arch/s390/include/asm/pgtable.h           |    4 +
 arch/sparc/mm/tlb.c                       |    1 
 arch/x86/mm/pgtable.c                     |    2 
 mm/huge_memory.c                          |   49 ++++++++++----------
 mm/pgtable-generic.c                      |    2 
 7 files changed, 39 insertions(+), 26 deletions(-)

--- a/arch/powerpc/mm/book3s64/pgtable.c~mm-fix-race-between-__split_huge_pmd_locked-and-gup-fast
+++ a/arch/powerpc/mm/book3s64/pgtable.c
@@ -170,6 +170,7 @@ pmd_t pmdp_invalidate(struct vm_area_str
 {
 	unsigned long old_pmd;
 
+	VM_WARN_ON_ONCE(!pmd_present(*pmdp));
 	old_pmd = pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT, _PAGE_INVALID);
 	flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
 	return __pmd(old_pmd);
--- a/arch/s390/include/asm/pgtable.h~mm-fix-race-between-__split_huge_pmd_locked-and-gup-fast
+++ a/arch/s390/include/asm/pgtable.h
@@ -1769,8 +1769,10 @@ static inline pmd_t pmdp_huge_clear_flus
 static inline pmd_t pmdp_invalidate(struct vm_area_struct *vma,
 				   unsigned long addr, pmd_t *pmdp)
 {
-	pmd_t pmd = __pmd(pmd_val(*pmdp) | _SEGMENT_ENTRY_INVALID);
+	pmd_t pmd;
 
+	VM_WARN_ON_ONCE(!pmd_present(*pmdp));
+	pmd = __pmd(pmd_val(*pmdp) | _SEGMENT_ENTRY_INVALID);
 	return pmdp_xchg_direct(vma->vm_mm, addr, pmdp, pmd);
 }
 
--- a/arch/sparc/mm/tlb.c~mm-fix-race-between-__split_huge_pmd_locked-and-gup-fast
+++ a/arch/sparc/mm/tlb.c
@@ -249,6 +249,7 @@ pmd_t pmdp_invalidate(struct vm_area_str
 {
 	pmd_t old, entry;
 
+	VM_WARN_ON_ONCE(!pmd_present(*pmdp));
 	entry = __pmd(pmd_val(*pmdp) & ~_PAGE_VALID);
 	old = pmdp_establish(vma, address, pmdp, entry);
 	flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
--- a/arch/x86/mm/pgtable.c~mm-fix-race-between-__split_huge_pmd_locked-and-gup-fast
+++ a/arch/x86/mm/pgtable.c
@@ -631,6 +631,8 @@ int pmdp_clear_flush_young(struct vm_are
 pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
 			 pmd_t *pmdp)
 {
+	VM_WARN_ON_ONCE(!pmd_present(*pmdp));
+
 	/*
 	 * No flush is necessary. Once an invalid PTE is established, the PTE's
 	 * access and dirty bits cannot be updated.
--- a/Documentation/mm/arch_pgtable_helpers.rst~mm-fix-race-between-__split_huge_pmd_locked-and-gup-fast
+++ a/Documentation/mm/arch_pgtable_helpers.rst
@@ -140,7 +140,8 @@ PMD Page Table Helpers
 +---------------------------+--------------------------------------------------+
 | pmd_swp_clear_soft_dirty  | Clears a soft dirty swapped PMD                  |
 +---------------------------+--------------------------------------------------+
-| pmd_mkinvalid             | Invalidates a mapped PMD [1]                     |
+| pmd_mkinvalid             | Invalidates a present PMD; do not call for       |
+|                           | non-present PMD [1]                              |
 +---------------------------+--------------------------------------------------+
 | pmd_set_huge              | Creates a PMD huge mapping                       |
 +---------------------------+--------------------------------------------------+
@@ -196,7 +197,8 @@ PUD Page Table Helpers
 +---------------------------+--------------------------------------------------+
 | pud_mkdevmap              | Creates a ZONE_DEVICE mapped PUD                 |
 +---------------------------+--------------------------------------------------+
-| pud_mkinvalid             | Invalidates a mapped PUD [1]                     |
+| pud_mkinvalid             | Invalidates a present PUD; do not call for       |
+|                           | non-present PUD [1]                              |
 +---------------------------+--------------------------------------------------+
 | pud_set_huge              | Creates a PUD huge mapping                       |
 +---------------------------+--------------------------------------------------+
--- a/mm/huge_memory.c~mm-fix-race-between-__split_huge_pmd_locked-and-gup-fast
+++ a/mm/huge_memory.c
@@ -2430,32 +2430,11 @@ static void __split_huge_pmd_locked(stru
 		return __split_huge_zero_page_pmd(vma, haddr, pmd);
 	}
 
-	/*
-	 * Up to this point the pmd is present and huge and userland has the
-	 * whole access to the hugepage during the split (which happens in
-	 * place). If we overwrite the pmd with the not-huge version pointing
-	 * to the pte here (which of course we could if all CPUs were bug
-	 * free), userland could trigger a small page size TLB miss on the
-	 * small sized TLB while the hugepage TLB entry is still established in
-	 * the huge TLB. Some CPU doesn't like that.
-	 * See http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf, Erratum
-	 * 383 on page 105. Intel should be safe but is also warns that it's
-	 * only safe if the permission and cache attributes of the two entries
-	 * loaded in the two TLB is identical (which should be the case here).
-	 * But it is generally safer to never allow small and huge TLB entries
-	 * for the same virtual address to be loaded simultaneously. So instead
-	 * of doing "pmd_populate(); flush_pmd_tlb_range();" we first mark the
-	 * current pmd notpresent (atomically because here the pmd_trans_huge
-	 * must remain set at all times on the pmd until the split is complete
-	 * for this pmd), then we flush the SMP TLB and finally we write the
-	 * non-huge version of the pmd entry with pmd_populate.
-	 */
-	old_pmd = pmdp_invalidate(vma, haddr, pmd);
-
-	pmd_migration = is_pmd_migration_entry(old_pmd);
+	pmd_migration = is_pmd_migration_entry(*pmd);
 	if (unlikely(pmd_migration)) {
 		swp_entry_t entry;
 
+		old_pmd = *pmd;
 		entry = pmd_to_swp_entry(old_pmd);
 		page = pfn_swap_entry_to_page(entry);
 		write = is_writable_migration_entry(entry);
@@ -2466,6 +2445,30 @@ static void __split_huge_pmd_locked(stru
 		soft_dirty = pmd_swp_soft_dirty(old_pmd);
 		uffd_wp = pmd_swp_uffd_wp(old_pmd);
 	} else {
+		/*
+		 * Up to this point the pmd is present and huge and userland has
+		 * the whole access to the hugepage during the split (which
+		 * happens in place). If we overwrite the pmd with the not-huge
+		 * version pointing to the pte here (which of course we could if
+		 * all CPUs were bug free), userland could trigger a small page
+		 * size TLB miss on the small sized TLB while the hugepage TLB
+		 * entry is still established in the huge TLB. Some CPU doesn't
+		 * like that. See
+		 * http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf, Erratum
+		 * 383 on page 105. Intel should be safe but is also warns that
+		 * it's only safe if the permission and cache attributes of the
+		 * two entries loaded in the two TLB is identical (which should
+		 * be the case here). But it is generally safer to never allow
+		 * small and huge TLB entries for the same virtual address to be
+		 * loaded simultaneously. So instead of doing "pmd_populate();
+		 * flush_pmd_tlb_range();" we first mark the current pmd
+		 * notpresent (atomically because here the pmd_trans_huge must
+		 * remain set at all times on the pmd until the split is
+		 * complete for this pmd), then we flush the SMP TLB and finally
+		 * we write the non-huge version of the pmd entry with
+		 * pmd_populate.
+		 */
+		old_pmd = pmdp_invalidate(vma, haddr, pmd);
 		page = pmd_page(old_pmd);
 		folio = page_folio(page);
 		if (pmd_dirty(old_pmd)) {
--- a/mm/pgtable-generic.c~mm-fix-race-between-__split_huge_pmd_locked-and-gup-fast
+++ a/mm/pgtable-generic.c
@@ -198,6 +198,7 @@ pgtable_t pgtable_trans_huge_withdraw(st
 pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 		     pmd_t *pmdp)
 {
+	VM_WARN_ON_ONCE(!pmd_present(*pmdp));
 	pmd_t old = pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));
 	flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
 	return old;
@@ -208,6 +209,7 @@ pmd_t pmdp_invalidate(struct vm_area_str
 pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
 			 pmd_t *pmdp)
 {
+	VM_WARN_ON_ONCE(!pmd_present(*pmdp));
 	return pmdp_invalidate(vma, address, pmdp);
 }
 #endif
_

Patches currently in -mm which might be from ryan.roberts(a)arm.com are
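
The hazard described in this patch can be illustrated with a toy model. Everything below is hypothetical: the bit layout, the toy_ names and the single-word entry type are invented for illustration and do not match any real architecture. The sketch only shows why "invalidate" must be restricted to present entries: in this model, as the description reports for arm64, invalidating a non-present entry makes it look present afterwards.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t toy_pmd_t;

/* Hypothetical bit layout; real architectures differ. */
#define TOY_PRESENT  (1ull << 0)
#define TOY_INVALID  (1ull << 1)  /* present, but temporarily invalidated */

/* An invalidated entry still counts as present in this model. */
static bool toy_pmd_present(toy_pmd_t pmd)
{
	return (pmd & (TOY_PRESENT | TOY_INVALID)) != 0;
}

/* Only meaningful for a present entry: clears PRESENT, sets INVALID. */
static toy_pmd_t toy_pmd_mkinvalid(toy_pmd_t pmd)
{
	return (pmd & ~TOY_PRESENT) | TOY_INVALID;
}

/* Non-present (migration-style) entry: payload above the two low bits. */
static toy_pmd_t toy_mk_migration(uint64_t offset)
{
	return offset << 2;
}

/* Returns the old entry; invalidates in place only when it is present. */
static toy_pmd_t toy_split_prepare(toy_pmd_t *pmdp)
{
	toy_pmd_t old = *pmdp;

	if (toy_pmd_present(old))
		*pmdp = toy_pmd_mkinvalid(old);
	return old;
}
```

In this model, calling toy_pmd_mkinvalid() directly on a migration entry yields a value that toy_pmd_present() accepts, whereas toy_split_prepare() leaves the migration entry untouched.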


[merged mm-hotfixes-stable] fs-proc-task_mmu-fix-uffd-wp-confusion-in-pagemap_scan_pmd_entry.patch removed from -mm tree

by Andrew Morton

The quilt patch titled
     Subject: fs/proc/task_mmu: fix uffd-wp confusion in pagemap_scan_pmd_entry()
has been removed from the -mm tree.  Its filename was
     fs-proc-task_mmu-fix-uffd-wp-confusion-in-pagemap_scan_pmd_entry.patch

This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

------------------------------------------------------
From: Ryan Roberts <ryan.roberts(a)arm.com>
Subject: fs/proc/task_mmu: fix uffd-wp confusion in pagemap_scan_pmd_entry()
Date: Mon, 29 Apr 2024 12:41:04 +0100

pagemap_scan_pmd_entry() checks if uffd-wp is set on each pte to avoid
unnecessary work if it is already set.  However it was previously checking
with `pte_uffd_wp(ptep_get(pte))` without first confirming that the pte was
present.  It is only valid to call pte_uffd_wp() for present ptes.  For swap
ptes, pte_swp_uffd_wp() must be called because the uffd-wp bit may be kept
in a different position, depending on the arch.

This was leading to test failures in the pagemap_ioctl mm selftest, when
bringing up uffd-wp support on arm64, due to incorrectly interpreting the
uffd-wp status of migration entries.

Let's fix this by using the correct check based on pte_present().  While
we are at it, let's pass the pte to make_uffd_wp_pte() to avoid the
pointless extra ptep_get() which can't be optimized out due to READ_ONCE()
on many arches.

Link: https://lkml.kernel.org/r/20240429114104.182890-1-ryan.roberts@arm.com
Fixes: 12f6b01a0bcb ("fs/proc/task_mmu: add fast paths to get/clear PAGE_IS_WRITTEN flag")
Closes: https://lore.kernel.org/linux-arm-kernel/ZiuyGXt0XWwRgFh9@x1n/
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Tested-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 fs/proc/task_mmu.c |   22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

--- a/fs/proc/task_mmu.c~fs-proc-task_mmu-fix-uffd-wp-confusion-in-pagemap_scan_pmd_entry
+++ a/fs/proc/task_mmu.c
@@ -1817,10 +1817,8 @@ static unsigned long pagemap_page_catego
 }
 
 static void make_uffd_wp_pte(struct vm_area_struct *vma,
-			     unsigned long addr, pte_t *pte)
+			     unsigned long addr, pte_t *pte, pte_t ptent)
 {
-	pte_t ptent = ptep_get(pte);
-
 	if (pte_present(ptent)) {
 		pte_t old_pte;
 
@@ -2175,9 +2173,12 @@ static int pagemap_scan_pmd_entry(pmd_t
 	if ((p->arg.flags & PM_SCAN_WP_MATCHING) && !p->vec_out) {
 		/* Fast path for performing exclusive WP */
 		for (addr = start; addr != end; pte++, addr += PAGE_SIZE) {
-			if (pte_uffd_wp(ptep_get(pte)))
+			pte_t ptent = ptep_get(pte);
+
+			if ((pte_present(ptent) && pte_uffd_wp(ptent)) ||
+			    pte_swp_uffd_wp_any(ptent))
 				continue;
-			make_uffd_wp_pte(vma, addr, pte);
+			make_uffd_wp_pte(vma, addr, pte, ptent);
 			if (!flush_end)
 				start = addr;
 			flush_end = addr + PAGE_SIZE;
@@ -2190,8 +2191,10 @@ static int pagemap_scan_pmd_entry(pmd_t
 	    p->arg.return_mask == PAGE_IS_WRITTEN) {
 		for (addr = start; addr < end; pte++, addr += PAGE_SIZE) {
 			unsigned long next = addr + PAGE_SIZE;
+			pte_t ptent = ptep_get(pte);
 
-			if (pte_uffd_wp(ptep_get(pte)))
+			if ((pte_present(ptent) && pte_uffd_wp(ptent)) ||
+			    pte_swp_uffd_wp_any(ptent))
 				continue;
 			ret = pagemap_scan_output(p->cur_vma_category | PAGE_IS_WRITTEN,
 						  p, addr, &next);
@@ -2199,7 +2202,7 @@ static int pagemap_scan_pmd_entry(pmd_t
 				break;
 			if (~p->arg.flags & PM_SCAN_WP_MATCHING)
 				continue;
-			make_uffd_wp_pte(vma, addr, pte);
+			make_uffd_wp_pte(vma, addr, pte, ptent);
 			if (!flush_end)
 				start = addr;
 			flush_end = next;
@@ -2208,8 +2211,9 @@ static int pagemap_scan_pmd_entry(pmd_t
 	}
 
 	for (addr = start; addr != end; pte++, addr += PAGE_SIZE) {
+		pte_t ptent = ptep_get(pte);
 		unsigned long categories = p->cur_vma_category |
-			pagemap_page_category(p, vma, addr, ptep_get(pte));
+			pagemap_page_category(p, vma, addr, ptent);
 		unsigned long next = addr + PAGE_SIZE;
 
 		if (!pagemap_scan_is_interesting_page(categories, p))
@@ -2224,7 +2228,7 @@ static int pagemap_scan_pmd_entry(pmd_t
 		if (~categories & PAGE_IS_WRITTEN)
 			continue;
 
-		make_uffd_wp_pte(vma, addr, pte);
+		make_uffd_wp_pte(vma, addr, pte, ptent);
 		if (!flush_end)
 			start = addr;
 		flush_end = next;
_

Patches currently in -mm which might be from ryan.roberts(a)arm.com are

selftests-mm-soft-dirty-should-fail-if-a-testcase-fails.patch
mm-debug_vm_pgtable-test-pmd_leaf-behavior-with-pmd_mkinvalid.patch
mm-fix-race-between-__split_huge_pmd_locked-and-gup-fast.patch
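
A toy model may help show why the presence check matters here. The sketch below is an assumption-laden illustration, not kernel code: the bit positions and all toy_ names are invented, and the real helpers (pte_uffd_wp(), pte_swp_uffd_wp_any()) have arch-specific definitions. It only demonstrates that a flag stored at different positions for present and swap entries has to be read through the accessor that matches the entry type.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t toy_pte_t;

/* Hypothetical layout; the bit positions are illustrative only. */
#define TOY_PTE_PRESENT      (1ull << 0)
#define TOY_PTE_UFFD_WP      (1ull << 10) /* meaningful only when present */
#define TOY_PTE_SWP_UFFD_WP  (1ull << 3)  /* meaningful only for swap ptes */

static bool toy_pte_present(toy_pte_t pte)
{
	return (pte & TOY_PTE_PRESENT) != 0;
}

/* In this model, any non-zero, non-present entry is a swap-style entry. */
static bool toy_is_swap_pte(toy_pte_t pte)
{
	return pte != 0 && !toy_pte_present(pte);
}

/* Reads the uffd-wp flag through the accessor matching the entry type. */
static bool toy_uffd_wp_set(toy_pte_t pte)
{
	if (toy_pte_present(pte))
		return (pte & TOY_PTE_UFFD_WP) != 0;
	if (toy_is_swap_pte(pte))
		return (pte & TOY_PTE_SWP_UFFD_WP) != 0;
	return false;
}
```

A swap-style entry whose payload happens to set bit 10 would be misread as write-protected by an unconditional present-style check, while toy_uffd_wp_set() reports it correctly.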


[merged mm-hotfixes-stable] fs-proc-task_mmu-fix-loss-of-young-dirty-bits-during-pagemap-scan.patch removed from -mm tree

by Andrew Morton

The quilt patch titled
     Subject: fs/proc/task_mmu: fix loss of young/dirty bits during pagemap scan
has been removed from the -mm tree.  Its filename was
     fs-proc-task_mmu-fix-loss-of-young-dirty-bits-during-pagemap-scan.patch

This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

------------------------------------------------------
From: Ryan Roberts <ryan.roberts(a)arm.com>
Subject: fs/proc/task_mmu: fix loss of young/dirty bits during pagemap scan
Date: Mon, 29 Apr 2024 12:40:17 +0100

make_uffd_wp_pte() was previously doing:

  pte = ptep_get(ptep);
  ptep_modify_prot_start(ptep);
  pte = pte_mkuffd_wp(pte);
  ptep_modify_prot_commit(ptep, pte);

But if another thread accessed or dirtied the pte between the first 2
calls, this could lead to loss of that information.  Since
ptep_modify_prot_start() gets and clears atomically, the following is the
correct pattern and prevents any possible race.  Any access after the
first call would see an invalid pte and cause a fault:

  pte = ptep_modify_prot_start(ptep);
  pte = pte_mkuffd_wp(pte);
  ptep_modify_prot_commit(ptep, pte);

Link: https://lkml.kernel.org/r/20240429114017.182570-1-ryan.roberts@arm.com
Fixes: 52526ca7fdb9 ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs")
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 fs/proc/task_mmu.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/proc/task_mmu.c~fs-proc-task_mmu-fix-loss-of-young-dirty-bits-during-pagemap-scan
+++ a/fs/proc/task_mmu.c
@@ -1825,7 +1825,7 @@ static void make_uffd_wp_pte(struct vm_a
 		pte_t old_pte;
 
 		old_pte = ptep_modify_prot_start(vma, addr, pte);
-		ptent = pte_mkuffd_wp(ptent);
+		ptent = pte_mkuffd_wp(old_pte);
 		ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent);
 	} else if (is_swap_pte(ptent)) {
 		ptent = pte_swp_mkuffd_wp(ptent);
_

Patches currently in -mm which might be from ryan.roberts(a)arm.com are

selftests-mm-soft-dirty-should-fail-if-a-testcase-fails.patch
mm-debug_vm_pgtable-test-pmd_leaf-behavior-with-pmd_mkinvalid.patch
mm-fix-race-between-__split_huge_pmd_locked-and-gup-fast.patch
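
The difference between the two patterns can be sketched in userspace with C11 atomics. This is a model under stated assumptions, not the kernel implementation: the entry is a single 32-bit word, the flag bits and toy_ helpers are hypothetical, and "hardware setting the dirty bit" is simulated by the caller with an atomic OR. The racy variant commits a value derived from a snapshot taken before the atomic get-and-clear; the fixed variant derives it from the value that get-and-clear returned.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical flag bits for a one-word toy page table entry. */
#define TOY_PRESENT  (1u << 0)
#define TOY_DIRTY    (1u << 1)
#define TOY_UFFD_WP  (1u << 2)

/* Atomically fetch the entry and clear it (the "start" step). */
static uint32_t toy_modify_start(_Atomic uint32_t *ptep)
{
	return atomic_exchange(ptep, 0);
}

/* Install the new value (the "commit" step). */
static void toy_modify_commit(_Atomic uint32_t *ptep, uint32_t val)
{
	atomic_store(ptep, val);
}

/* Racy: builds the new value from a snapshot read before the start step. */
static void toy_wp_racy(_Atomic uint32_t *ptep, uint32_t stale)
{
	(void)toy_modify_start(ptep);
	toy_modify_commit(ptep, stale | TOY_UFFD_WP);
}

/* Fixed: builds the new value from what the start step returned. */
static void toy_wp_fixed(_Atomic uint32_t *ptep)
{
	uint32_t old = toy_modify_start(ptep);

	toy_modify_commit(ptep, old | TOY_UFFD_WP);
}
```

If the dirty bit is set after the snapshot but before toy_wp_racy() runs, the committed entry has lost it; toy_wp_fixed() keeps it because nothing can modify the entry between the exchange and the commit in this model.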


[merged mm-hotfixes-stable] mm-use-memalloc_nofs_save-in-page_cache_ra_order.patch removed from -mm tree

by Andrew Morton

The quilt patch titled
     Subject: mm: use memalloc_nofs_save() in page_cache_ra_order()
has been removed from the -mm tree.  Its filename was
     mm-use-memalloc_nofs_save-in-page_cache_ra_order.patch

This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

------------------------------------------------------
From: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Subject: mm: use memalloc_nofs_save() in page_cache_ra_order()
Date: Fri, 26 Apr 2024 19:29:38 +0800

See commit f2c817bed58d ("mm: use memalloc_nofs_save in readahead path").
Ensure that page_cache_ra_order() does not attempt to reclaim file-backed
pages either, or it leads to a deadlock.  The issue was found when testing
ext4 large folios.

 INFO: task DataXceiver for:7494 blocked for more than 120 seconds.
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:DataXceiver for state:D stack:0     pid:7494  ppid:1      flags:0x00000200
 Call trace:
  __switch_to+0x14c/0x240
  __schedule+0x82c/0xdd0
  schedule+0x58/0xf0
  io_schedule+0x24/0xa0
  __folio_lock+0x130/0x300
  migrate_pages_batch+0x378/0x918
  migrate_pages+0x350/0x700
  compact_zone+0x63c/0xb38
  compact_zone_order+0xc0/0x118
  try_to_compact_pages+0xb0/0x280
  __alloc_pages_direct_compact+0x98/0x248
  __alloc_pages+0x510/0x1110
  alloc_pages+0x9c/0x130
  folio_alloc+0x20/0x78
  filemap_alloc_folio+0x8c/0x1b0
  page_cache_ra_order+0x174/0x308
  ondemand_readahead+0x1c8/0x2b8
  page_cache_async_ra+0x68/0xb8
  filemap_readahead.isra.0+0x64/0xa8
  filemap_get_pages+0x3fc/0x5b0
  filemap_splice_read+0xf4/0x280
  ext4_file_splice_read+0x2c/0x48 [ext4]
  vfs_splice_read.part.0+0xa8/0x118
  splice_direct_to_actor+0xbc/0x288
  do_splice_direct+0x9c/0x108
  do_sendfile+0x328/0x468
  __arm64_sys_sendfile64+0x8c/0x148
  invoke_syscall+0x4c/0x118
  el0_svc_common.constprop.0+0xc8/0xf0
  do_el0_svc+0x24/0x38
  el0_svc+0x4c/0x1f8
  el0t_64_sync_handler+0xc0/0xc8
  el0t_64_sync+0x188/0x190

Link: https://lkml.kernel.org/r/20240426112938.124740-1-wangkefeng.wang@huawei.com
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Zhang Yi <yi.zhang(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 mm/readahead.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/mm/readahead.c~mm-use-memalloc_nofs_save-in-page_cache_ra_order
+++ a/mm/readahead.c
@@ -490,6 +490,7 @@ void page_cache_ra_order(struct readahea
 	pgoff_t index = readahead_index(ractl);
 	pgoff_t limit = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
 	pgoff_t mark = index + ra->size - ra->async_size;
+	unsigned int nofs;
 	int err = 0;
 	gfp_t gfp = readahead_gfp_mask(mapping);
 
@@ -504,6 +505,8 @@ void page_cache_ra_order(struct readahea
 			new_order = min_t(unsigned int, new_order, ilog2(ra->size));
 	}
 
+	/* See comment in page_cache_ra_unbounded() */
+	nofs = memalloc_nofs_save();
 	filemap_invalidate_lock_shared(mapping);
 	while (index <= limit) {
 		unsigned int order = new_order;
@@ -527,6 +530,7 @@ void page_cache_ra_order(struct readahea
 
 	read_pages(ractl);
 	filemap_invalidate_unlock_shared(mapping);
+	memalloc_nofs_restore(nofs);
 
 	/*
 	 * If there were already pages in the page cache, then we may have
_

Patches currently in -mm which might be from wangkefeng.wang(a)huawei.com are

arm64-mm-drop-vm_fault_badmap-vm_fault_badaccess.patch
arm-mm-drop-vm_fault_badmap-vm_fault_badaccess.patch
mm-move-mm-counter-updating-out-of-set_pte_range.patch
mm-filemap-batch-mm-counter-updating-in-filemap_map_pages.patch
mm-swapfile-check-usable-swap-device-in-__folio_throttle_swaprate.patch
mm-memory-check-userfaultfd_wp-in-vmf_orig_pte_uffd_wp.patch
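
The save/restore bracket used in this patch follows a scoped-flag pattern that can be modelled in a few lines. The sketch below is a userspace approximation under assumptions: toy_task_flags stands in for the per-task flags word, TOY_PF_NOFS for the NOFS flag, and toy_readahead() for the bracketed section; none of these names come from the kernel. Returning the previous flag state from the save step is what makes nested scopes restore correctly.

```c
#include <stdbool.h>

/* Hypothetical flag bit and per-task flags word. */
#define TOY_PF_NOFS 0x1u
static unsigned int toy_task_flags;

/* Sets the flag and returns its previous state for a later restore. */
static unsigned int toy_nofs_save(void)
{
	unsigned int old = toy_task_flags & TOY_PF_NOFS;

	toy_task_flags |= TOY_PF_NOFS;
	return old;
}

/* Puts the flag back to the state captured by the matching save. */
static void toy_nofs_restore(unsigned int old)
{
	toy_task_flags = (toy_task_flags & ~TOY_PF_NOFS) | old;
}

/* Models an allocator asking whether it may recurse into the filesystem. */
static bool toy_alloc_may_enter_fs(void)
{
	return !(toy_task_flags & TOY_PF_NOFS);
}

/* Bracketed section: reports what an allocation inside it would see. */
static bool toy_readahead(void)
{
	unsigned int nofs = toy_nofs_save();
	bool may_enter_fs = toy_alloc_may_enter_fs();

	toy_nofs_restore(nofs);
	return may_enter_fs;
}
```

Inside the bracket, allocations see the flag and would avoid filesystem reclaim; after the restore, the outer state is back, including the case where an outer scope had already set the flag.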


[merged mm-hotfixes-stable] kmsan-compiler_types-declare-__no_sanitize_or_inline.patch removed from -mm tree

by Andrew Morton

The quilt patch titled
     Subject: kmsan: compiler_types: declare __no_sanitize_or_inline
has been removed from the -mm tree.  Its filename was
     kmsan-compiler_types-declare-__no_sanitize_or_inline.patch

This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

------------------------------------------------------
From: Alexander Potapenko <glider(a)google.com>
Subject: kmsan: compiler_types: declare __no_sanitize_or_inline
Date: Fri, 26 Apr 2024 11:16:22 +0200

It turned out that KMSAN instruments READ_ONCE_NOCHECK(), resulting in
false positive reports, because __no_sanitize_or_inline enforced inlining.

Properly declare __no_sanitize_or_inline under __SANITIZE_MEMORY__, so
that it does not __always_inline the annotated function.

Link: https://lkml.kernel.org/r/20240426091622.3846771-1-glider@google.com
Fixes: 5de0ce85f5a4 ("kmsan: mark noinstr as __no_sanitize_memory")
Signed-off-by: Alexander Potapenko <glider(a)google.com>
Reported-by: syzbot+355c5bb8c1445c871ee8(a)syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/000000000000826ac1061675b0e3@google.com
Cc: <stable(a)vger.kernel.org>
Reviewed-by: Marco Elver <elver(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Miguel Ojeda <ojeda(a)kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 include/linux/compiler_types.h |   11 +++++++++++
 1 file changed, 11 insertions(+)

--- a/include/linux/compiler_types.h~kmsan-compiler_types-declare-__no_sanitize_or_inline
+++ a/include/linux/compiler_types.h
@@ -278,6 +278,17 @@ struct ftrace_likely_data {
 # define __no_kcsan
 #endif
 
+#ifdef __SANITIZE_MEMORY__
+/*
+ * Similarly to KASAN and KCSAN, KMSAN loses function attributes of inlined
+ * functions, therefore disabling KMSAN checks also requires disabling inlining.
+ *
+ * __no_sanitize_or_inline effectively prevents KMSAN from reporting errors
+ * within the function and marks all its outputs as initialized.
+ */
+# define __no_sanitize_or_inline __no_kmsan_checks notrace __maybe_unused
+#endif
+
 #ifndef __no_sanitize_or_inline
 #define __no_sanitize_or_inline __always_inline
 #endif
_

Patches currently in -mm which might be from glider(a)google.com are


[merged mm-hotfixes-stable] maple_tree-fix-mas_empty_area_rev-null-pointer-dereference.patch removed from -mm tree

by Andrew Morton

The quilt patch titled
     Subject: maple_tree: fix mas_empty_area_rev() null pointer dereference
has been removed from the -mm tree.  Its filename was
     maple_tree-fix-mas_empty_area_rev-null-pointer-dereference.patch

This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Subject: maple_tree: fix mas_empty_area_rev() null pointer dereference
Date: Mon, 22 Apr 2024 16:33:49 -0400

Currently the code calls mas_start() followed by mas_data_end() if the
maple state is MA_START, but mas_start() may return with the maple state
node == NULL.  This will lead to a null pointer dereference when checking
information in the NULL node, which is done in mas_data_end().

Avoid setting the offset if there is no node by waiting until after the
maple state is checked for an empty or single entry state.

A user could trigger the events to cause a kernel oops by unmapping all
vmas to produce an empty maple tree, then mapping a vma that would cause
the scenario described above.

Link: https://lkml.kernel.org/r/20240422203349.2418465-1-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reported-by: Marius Fleischer <fleischermarius(a)gmail.com>
Closes: https://lore.kernel.org/lkml/CAJg=8jyuSxDL6XvqEXY_66M20psRK2J53oBTP+fjV5xpW…
Link: https://lore.kernel.org/lkml/CAJg=8jyuSxDL6XvqEXY_66M20psRK2J53oBTP+fjV5xpW…
Tested-by: Marius Fleischer <fleischermarius(a)gmail.com>
Tested-by: Sidhartha Kumar <sidhartha.kumar(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 lib/maple_tree.c |   16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

--- a/lib/maple_tree.c~maple_tree-fix-mas_empty_area_rev-null-pointer-dereference
+++ a/lib/maple_tree.c
@@ -5109,18 +5109,18 @@ int mas_empty_area_rev(struct ma_state *
 	if (size == 0 || max - min < size - 1)
 		return -EINVAL;
 
-	if (mas_is_start(mas)) {
+	if (mas_is_start(mas))
 		mas_start(mas);
-		mas->offset = mas_data_end(mas);
-	} else if (mas->offset >= 2) {
-		mas->offset -= 2;
-	} else if (!mas_rewind_node(mas)) {
+	else if ((mas->offset < 2) && (!mas_rewind_node(mas)))
 		return -EBUSY;
-	}
 
-	/* Empty set. */
-	if (mas_is_none(mas) || mas_is_ptr(mas))
+	if (unlikely(mas_is_none(mas) || mas_is_ptr(mas)))
 		return mas_sparse_area(mas, min, max, size, false);
+	else if (mas->offset >= 2)
+		mas->offset -= 2;
+	else
+		mas->offset = mas_data_end(mas);
+
 
 	/* The start of the window can only be within these values. */
 	mas->index = min;
_

Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
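
The reordering in this patch boils down to "check for the empty state before touching the node". A minimal sketch of that ordering follows; it is not the maple tree API. struct toy_state, toy_start(), toy_data_end() and toy_rev_search_offset() are hypothetical names, and the tree is reduced to a single optional root node.

```c
#include <stddef.h>

/* Hypothetical one-field node: index of the last used slot. */
struct toy_node {
	unsigned char end;
};

enum toy_status { TOY_START, TOY_ACTIVE, TOY_NONE };

struct toy_state {
	enum toy_status status;
	struct toy_node *node;	/* NULL when the tree is empty */
	unsigned char offset;
};

/* Stand-in for the start step: an empty tree leaves node NULL. */
static void toy_start(struct toy_state *s, struct toy_node *root)
{
	s->node = root;
	s->status = root ? TOY_ACTIVE : TOY_NONE;
}

/* Dereferences s->node, so it must never run for an empty tree. */
static unsigned char toy_data_end(const struct toy_state *s)
{
	return s->node->end;
}

/* Returns -1 for the empty case, otherwise the starting offset. */
static int toy_rev_search_offset(struct toy_state *s, struct toy_node *root)
{
	if (s->status == TOY_START)
		toy_start(s, root);

	/* Handle the empty state first; only then is the node safe to read. */
	if (s->status == TOY_NONE)
		return -1;

	s->offset = toy_data_end(s);
	return s->offset;
}
```

Calling toy_data_end() immediately after toy_start(), as the pre-fix ordering did in spirit, would dereference a NULL node for an empty tree.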


[merged mm-hotfixes-stable] mm-userfaultfd-reset-ptes-when-close-for-wr-protected-ones.patch removed from -mm tree

by Andrew Morton

The quilt patch titled
     Subject: mm/userfaultfd: reset ptes when close() for wr-protected ones
has been removed from the -mm tree.  Its filename was
     mm-userfaultfd-reset-ptes-when-close-for-wr-protected-ones.patch

This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/userfaultfd: reset ptes when close() for wr-protected ones
Date: Mon, 22 Apr 2024 09:33:11 -0400

Userfaultfd unregister includes a step to remove wr-protect bits from all
the relevant pgtable entries, but that only covered an explicit
UFFDIO_UNREGISTER ioctl, not a close() on the userfaultfd itself.  Cover
that too.  This fixes a WARN trace.

The only user visible side effect is that the user can observe leftover
wr-protect bits even after close()ing a userfaultfd and thereby releasing
the last reference to it.  However, hopefully that should be harmless, and
nothing bad should happen even if so.

This change is now more important after the recent page-table-check
patch we merged in mm-unstable (446dd9ad37d0 ("mm/page_table_check:
support userfault wr-protect entries")), as we'll do sanity checks on
uffd-wp bits without vma context.  So it's better if we can 100%
guarantee no uffd-wp bit leftovers, to make sure each report will be
valid.

Link: https://lore.kernel.org/all/000000000000ca4df20616a0fe16@google.com/
Fixes: f369b07c8614 ("mm/uffd: reset write protection when unregister with wp-mode")
Analyzed-by: David Hildenbrand <david(a)redhat.com>
Link: https://lkml.kernel.org/r/20240422133311.2987675-1-peterx@redhat.com
Reported-by: syzbot+d8426b591c36b21c750e(a)syzkaller.appspotmail.com
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 fs/userfaultfd.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/fs/userfaultfd.c~mm-userfaultfd-reset-ptes-when-close-for-wr-protected-ones
+++ a/fs/userfaultfd.c
@@ -895,6 +895,10 @@ static int userfaultfd_release(struct in
 			prev = vma;
 			continue;
 		}
+		/* Reset ptes for the whole vma range if wr-protected */
+		if (userfaultfd_wp(vma))
+			uffd_wp_range(vma, vma->vm_start,
+				      vma->vm_end - vma->vm_start, false);
 		new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
 		vma = vma_modify_flags_uffd(&vmi, prev, vma, vma->vm_start,
 					    vma->vm_end, new_flags,
_

Patches currently in -mm which might be from peterx(a)redhat.com are

mm-hugetlb-assert-hugetlb_lock-in-__hugetlb_cgroup_commit_charge.patch
mm-page_table_check-support-userfault-wr-protect-entries.patch
mm-gup-fix-hugepd-handling-in-hugetlb-rework.patch


+ docs-admin-guide-mm-damon-usage-fix-wrong-schemes-effective-quota-update-command.patch added to mm-unstable branch

by Andrew Morton

The patch titled
     Subject: Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
has been added to the -mm mm-unstable branch.  Its filename is
     docs-admin-guide-mm-damon-usage-fix-wrong-schemes-effective-quota-update-command.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…

This patch will later appear in the mm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
Date: Fri, 3 May 2024 11:03:15 -0700

To update effective size quota of DAMOS schemes on DAMON sysfs file
interface, user should write 'update_schemes_effective_quotas' to the
kdamond 'state' file.  But the document is mistakenly saying the input
string as 'update_schemes_effective_bytes'.  Fix it (s/bytes/quotas/).

Link: https://lkml.kernel.org/r/20240503180318.72798-8-sj@kernel.org
Fixes: a6068d6dfa2f ("Docs/admin-guide/mm/damon/usage: document effective_bytes file")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>	[6.9.x]
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Shuah Khan <shuah(a)kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 Documentation/admin-guide/mm/damon/usage.rst |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/Documentation/admin-guide/mm/damon/usage.rst~docs-admin-guide-mm-damon-usage-fix-wrong-schemes-effective-quota-update-command
+++ a/Documentation/admin-guide/mm/damon/usage.rst
@@ -153,7 +153,7 @@ Users can write below commands for the k
 - ``clear_schemes_tried_regions``: Clear the DAMON-based operating scheme
   action tried regions directory for each DAMON-based operation scheme of the
   kdamond.
-- ``update_schemes_effective_bytes``: Update the contents of
+- ``update_schemes_effective_quotas``: Update the contents of
   ``effective_bytes`` files for each DAMON-based operation scheme of the
   kdamond.  For more details, refer to :ref:`quotas directory <sysfs_quotas>`.
 
@@ -342,7 +342,7 @@ Based on the user-specified :ref:`goal <
 effective size quota is further adjusted.  Reading ``effective_bytes`` returns
 the current effective size quota.  The file is not updated in real time, so
 users should ask DAMON sysfs interface to update the content of the file for
-the stats by writing a special keyword, ``update_schemes_effective_bytes`` to
+the stats by writing a special keyword, ``update_schemes_effective_quotas`` to
 the relevant ``kdamonds/<N>/state`` file.
 
 Under ``weights`` directory, three files (``sz_permil``,
_

Patches currently in -mm which might be from sj(a)kernel.org are

mm-damon-paddr-implement-damon_folio_young.patch
mm-damon-paddr-implement-damon_folio_mkold.patch
mm-damon-add-damos-filter-type-young.patch
mm-damon-paddr-implement-damos-filter-type-young.patch
docs-mm-damon-design-document-young-page-type-damos-filter.patch
docs-admin-guide-mm-damon-usage-update-for-young-page-type-damos-filter.patch
docs-abi-damon-update-for-youg-page-type-damos-filter.patch
mm-damon-paddr-avoid-unnecessary-page-level-access-check-for-pageout-damos-action.patch
mm-damon-paddr-do-page-level-access-check-for-pageout-damos-action-on-its-own.patch
mm-vmscan-remove-ignore_references-argument-of-reclaim_pages.patch
mm-vmscan-remove-ignore_references-argument-of-reclaim_folio_list.patch
selftests-damon-_damon_sysfs-support-quota-goals.patch
selftests-damon-add-a-test-for-damos-quota-goal.patch
mm-damon-core-initialize-esz_bp-from-damos_quota_init_priv.patch
selftests-damon-_damon_sysfs-check-errors-from-nr_schemes-file-reads.patch
selftests-damon-_damon_sysfs-find-sysfs-mount-point-from-proc-mounts.patch
selftests-damon-_damon_sysfs-use-is-instead-of-==-for-none.patch
selftests-damon-classify-tests-for-functionalities-and-regressions.patch
docs-admin-guide-mm-damon-usage-fix-wrong-example-of-damos-filter-matching-sysfs-file.patch
docs-admin-guide-mm-damon-usage-fix-wrong-schemes-effective-quota-update-command.patch
docs-mm-damon-design-use-a-list-for-supported-filters.patch
docs-mm-damon-maintainer-profile-change-the-maintainers-timezone-from-pst-to-pt.patch
docs-mm-damon-maintainer-profile-allow-posting-patches-based-on-damon-next-tree.patch
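
For readers who script the DAMON sysfs interface, the corrected procedure in the patch above (write the keyword to the kdamond 'state' file, then read 'effective_bytes') might look roughly like the Python sketch below. The sysfs root /sys/kernel/mm/damon/admin, the directory layout under it, and the helper name read_effective_bytes() are assumptions for illustration and are not taken from the patch. On a real system this needs root privileges and a running kdamond.

```python
from pathlib import Path

# Assumed location of the DAMON sysfs interface; the patch itself only
# names the ``kdamonds/<N>/state`` file.
DAMON_SYSFS_ROOT = Path("/sys/kernel/mm/damon/admin")

# The keyword the patch documents as the correct one.
UPDATE_KEYWORD = "update_schemes_effective_quotas"


def read_effective_bytes(kdamond=0, context=0, scheme=0, root=DAMON_SYSFS_ROOT):
    """Ask DAMON to refresh effective_bytes, then return it as an int.

    Hypothetical helper; the index arguments select which kdamond,
    context and scheme directories to use.
    """
    kdamond_dir = Path(root) / "kdamonds" / str(kdamond)
    # effective_bytes is not updated in real time, so request an update first.
    (kdamond_dir / "state").write_text(UPDATE_KEYWORD)
    quota_file = (kdamond_dir / "contexts" / str(context) / "schemes"
                  / str(scheme) / "quotas" / "effective_bytes")
    return int(quota_file.read_text().strip())
```

Passing a different root makes the helper usable against a mock directory tree, which is convenient for testing without touching the real sysfs files.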


+ docs-admin-guide-mm-damon-usage-fix-wrong-example-of-damos-filter-matching-sysfs-file.patch added to mm-unstable branch

by Andrew Morton

The patch titled
     Subject: Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
has been added to the -mm mm-unstable branch.  Its filename is
     docs-admin-guide-mm-damon-usage-fix-wrong-example-of-damos-filter-matching-sysfs-file.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…

This patch will later appear in the mm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
Date: Fri, 3 May 2024 11:03:14 -0700

The example usage of DAMOS filter sysfs files, specifically the part of
'matching' file writing for memcg type filter, is wrong.  The intention is
to exclude pages of a memcg that already getting enough care from a given
scheme, but the example is setting the filter to apply the scheme to only
the pages of the memcg.  Fix it.

Link: https://lkml.kernel.org/r/20240503180318.72798-7-sj@kernel.org
Fixes: 9b7f9322a530 ("Docs/admin-guide/mm/damon/usage: document DAMOS filters of sysfs")
Closes: https://lore.kernel.org/r/20240317191358.97578-1-sj@kernel.org
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>	[6.3.x]
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Shuah Khan <shuah(a)kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 Documentation/admin-guide/mm/damon/usage.rst |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/Documentation/admin-guide/mm/damon/usage.rst~docs-admin-guide-mm-damon-usage-fix-wrong-example-of-damos-filter-matching-sysfs-file
+++ a/Documentation/admin-guide/mm/damon/usage.rst
@@ -434,7 +434,7 @@ pages of all memory cgroups except ``/ha
     # # further filter out all cgroups except one at '/having_care_already'
     echo memcg > 1/type
     echo /having_care_already > 1/memcg_path
-    echo N > 1/matching
+    echo Y > 1/matching
 
 Note that ``anon`` and ``memcg`` filters are currently supported only when
 ``paddr`` :ref:`implementation <sysfs_context>` is being used.

_

Patches currently in -mm which might be from sj(a)kernel.org are

mm-damon-paddr-implement-damon_folio_young.patch
mm-damon-paddr-implement-damon_folio_mkold.patch
mm-damon-add-damos-filter-type-young.patch
mm-damon-paddr-implement-damos-filter-type-young.patch
docs-mm-damon-design-document-young-page-type-damos-filter.patch
docs-admin-guide-mm-damon-usage-update-for-young-page-type-damos-filter.patch
docs-abi-damon-update-for-youg-page-type-damos-filter.patch
mm-damon-paddr-avoid-unnecessary-page-level-access-check-for-pageout-damos-action.patch
mm-damon-paddr-do-page-level-access-check-for-pageout-damos-action-on-its-own.patch
mm-vmscan-remove-ignore_references-argument-of-reclaim_pages.patch
mm-vmscan-remove-ignore_references-argument-of-reclaim_folio_list.patch
selftests-damon-_damon_sysfs-support-quota-goals.patch
selftests-damon-add-a-test-for-damos-quota-goal.patch
mm-damon-core-initialize-esz_bp-from-damos_quota_init_priv.patch
selftests-damon-_damon_sysfs-check-errors-from-nr_schemes-file-reads.patch
selftests-damon-_damon_sysfs-find-sysfs-mount-point-from-proc-mounts.patch
selftests-damon-_damon_sysfs-use-is-instead-of-==-for-none.patch
selftests-damon-classify-tests-for-functionalities-and-regressions.patch
docs-admin-guide-mm-damon-usage-fix-wrong-example-of-damos-filter-matching-sysfs-file.patch
docs-admin-guide-mm-damon-usage-fix-wrong-schemes-effective-quota-update-command.patch
docs-mm-damon-design-use-a-list-for-supported-filters.patch
docs-mm-damon-maintainer-profile-change-the-maintainers-timezone-from-pst-to-pt.patch
docs-mm-damon-maintainer-profile-allow-posting-patches-based-on-damon-next-tree.patch
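
The difference between writing N and Y to the 'matching' file in the patch above can be illustrated with a toy model. The sketch below is not kernel code: the Page and MemcgFilter classes and the scheme_applies() function are hypothetical names, and the rule that a page is filtered out when its match result equals the 'matching' setting is an assumption inferred from the changelog's description (N applies the scheme only to the memcg's pages, Y excludes them).

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Toy stand-in for a page; only its memcg path matters here."""
    memcg_path: str


@dataclass
class MemcgFilter:
    """Toy model of a memcg-type DAMOS filter (illustrative only)."""
    memcg_path: str
    matching: bool  # mirrors the Y/N value written to the 'matching' file

    def filters_out(self, page):
        belongs = page.memcg_path == self.memcg_path
        # Assumed semantics: a page is excluded from the scheme when its
        # match result equals the 'matching' setting.
        return belongs == self.matching


def scheme_applies(page, filters):
    """The scheme acts on a page only if no filter excludes it."""
    return not any(f.filters_out(page) for f in filters)


# 'echo N > 1/matching' (old example) versus 'echo Y > 1/matching' (fixed).
old_example = [MemcgFilter("/having_care_already", matching=False)]
fixed_example = [MemcgFilter("/having_care_already", matching=True)]

cared = Page("/having_care_already")
other = Page("/some_other_cgroup")  # hypothetical cgroup name

for name, filters in (("old", old_example), ("fixed", fixed_example)):
    print(name, scheme_applies(cared, filters), scheme_applies(other, filters))
```

Under these assumed semantics, the old example applies the scheme only to pages of '/having_care_already', while the fixed one skips exactly those pages, which is the intent stated in the changelog.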

