April 2024 - Linux-stable-mirror

撤回: [Request] backport a mainline patch to Linux kernel-5.10 stable tree

by Bo Ye (叶波)

Bo Ye (叶波) 将撤回邮件“[Request] backport a mainline patch to Linux kernel-5.10 stable tree”。

1 year, 2 months

2
1
0 0

撤回: [Request] backport a mainline patch to Linux kernel-5.10 stable tree

by Bo Ye (叶波)

Bo Ye (叶波) 将撤回邮件“[Request] backport a mainline patch to Linux kernel-5.10 stable tree”。

1 year, 2 months

1
0
0 0

撤回: [Request] backport a mainline patch to Linux kernel-5.10 stable tree

by Bo Ye (叶波)

Bo Ye (叶波) 将撤回邮件“[Request] backport a mainline patch to Linux kernel-5.10 stable tree”。

1 year, 2 months

1
0
0 0

[merged mm-hotfixes-stable] mm-madvise-make-madv_populate_readwrite-handle-vm_fault_retry-properly.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm/madvise: make MADV_POPULATE_(READ|WRITE) handle VM_FAULT_RETRY properly has been removed from the -mm tree. Its filename was mm-madvise-make-madv_populate_readwrite-handle-vm_fault_retry-properly.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: David Hildenbrand <david(a)redhat.com> Subject: mm/madvise: make MADV_POPULATE_(READ|WRITE) handle VM_FAULT_RETRY properly Date: Thu, 14 Mar 2024 17:12:59 +0100 Darrick reports that in some cases where pread() would fail with -EIO and mmap()+access would generate a SIGBUS signal, MADV_POPULATE_READ / MADV_POPULATE_WRITE will keep retrying forever and not fail with -EFAULT. While the madvise() call can be interrupted by a signal, this is not the desired behavior. MADV_POPULATE_READ / MADV_POPULATE_WRITE should behave like page faults in that case: fail and not retry forever. A reproducer can be found at [1]. The reason is that __get_user_pages(), as called by faultin_vma_page_range(), will not handle VM_FAULT_RETRY in a proper way: it will simply return 0 when VM_FAULT_RETRY happened, making madvise_populate()->faultin_vma_page_range() retry again and again, never setting FOLL_TRIED->FAULT_FLAG_TRIED for __get_user_pages(). __get_user_pages_locked() does what we want, but duplicating that logic in faultin_vma_page_range() feels wrong. So let's use __get_user_pages_locked() instead, that will detect VM_FAULT_RETRY and set FOLL_TRIED when retrying, making the fault handler return VM_FAULT_SIGBUS (VM_FAULT_ERROR) at some point, propagating -EFAULT from faultin_page() to __get_user_pages(), all the way to madvise_populate(). But, there is an issue: __get_user_pages_locked() will end up re-taking the MM lock and then __get_user_pages() will do another VMA lookup. In the meantime, the VMA layout could have changed and we'd fail with different error codes than we'd want to. As __get_user_pages() will currently do a new VMA lookup either way, let it do the VMA handling in a different way, controlled by a new FOLL_MADV_POPULATE flag, effectively moving these checks from madvise_populate() + faultin_page_range() in there. With this change, Darricks reproducer properly fails with -EFAULT, as documented for MADV_POPULATE_READ / MADV_POPULATE_WRITE. [1] https://lore.kernel.org/all/20240313171936.GN1927156@frogsfrogsfrogs/ Link: https://lkml.kernel.org/r/20240314161300.382526-1-david@redhat.com Link: https://lkml.kernel.org/r/20240314161300.382526-2-david@redhat.com Fixes: 4ca9b3859dac ("mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables") Signed-off-by: David Hildenbrand <david(a)redhat.com> Reported-by: Darrick J. Wong <djwong(a)kernel.org> Closes: https://lore.kernel.org/all/20240311223815.GW1927156@frogsfrogsfrogs/ Cc: Darrick J. Wong <djwong(a)kernel.org> Cc: Hugh Dickins <hughd(a)google.com> Cc: Jason Gunthorpe <jgg(a)nvidia.com> Cc: John Hubbard <jhubbard(a)nvidia.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/gup.c | 54 ++++++++++++++++++++++++++++-------------------- mm/internal.h | 10 +++++--- mm/madvise.c | 17 +-------------- 3 files changed, 40 insertions(+), 41 deletions(-) --- a/mm/gup.c~mm-madvise-make-madv_populate_readwrite-handle-vm_fault_retry-properly +++ a/mm/gup.c @@ -1206,6 +1206,22 @@ static long __get_user_pages(struct mm_s /* first iteration or cross vma bound */ if (!vma || start >= vma->vm_end) { + /* + * MADV_POPULATE_(READ|WRITE) wants to handle VMA + * lookups+error reporting differently. + */ + if (gup_flags & FOLL_MADV_POPULATE) { + vma = vma_lookup(mm, start); + if (!vma) { + ret = -ENOMEM; + goto out; + } + if (check_vma_flags(vma, gup_flags)) { + ret = -EINVAL; + goto out; + } + goto retry; + } vma = gup_vma_lookup(mm, start); if (!vma && in_gate_area(mm, start)) { ret = get_gate_page(mm, start & PAGE_MASK, @@ -1685,35 +1701,35 @@ long populate_vma_page_range(struct vm_a } /* - * faultin_vma_page_range() - populate (prefault) page tables inside the - * given VMA range readable/writable + * faultin_page_range() - populate (prefault) page tables inside the + * given range readable/writable * * This takes care of mlocking the pages, too, if VM_LOCKED is set. * - * @vma: target vma + * @mm: the mm to populate page tables in * @start: start address * @end: end address * @write: whether to prefault readable or writable * @locked: whether the mmap_lock is still held * - * Returns either number of processed pages in the vma, or a negative error - * code on error (see __get_user_pages()). + * Returns either number of processed pages in the MM, or a negative error + * code on error (see __get_user_pages()). Note that this function reports + * errors related to VMAs, such as incompatible mappings, as expected by + * MADV_POPULATE_(READ|WRITE). * - * vma->vm_mm->mmap_lock must be held. The range must be page-aligned and - * covered by the VMA. If it's released, *@locked will be set to 0. + * The range must be page-aligned. + * + * mm->mmap_lock must be held. If it's released, *@locked will be set to 0. */ -long faultin_vma_page_range(struct vm_area_struct *vma, unsigned long start, - unsigned long end, bool write, int *locked) +long faultin_page_range(struct mm_struct *mm, unsigned long start, + unsigned long end, bool write, int *locked) { - struct mm_struct *mm = vma->vm_mm; unsigned long nr_pages = (end - start) / PAGE_SIZE; int gup_flags; long ret; VM_BUG_ON(!PAGE_ALIGNED(start)); VM_BUG_ON(!PAGE_ALIGNED(end)); - VM_BUG_ON_VMA(start < vma->vm_start, vma); - VM_BUG_ON_VMA(end > vma->vm_end, vma); mmap_assert_locked(mm); /* @@ -1725,19 +1741,13 @@ long faultin_vma_page_range(struct vm_ar * a poisoned page. * !FOLL_FORCE: Require proper access permissions. */ - gup_flags = FOLL_TOUCH | FOLL_HWPOISON | FOLL_UNLOCKABLE; + gup_flags = FOLL_TOUCH | FOLL_HWPOISON | FOLL_UNLOCKABLE | + FOLL_MADV_POPULATE; if (write) gup_flags |= FOLL_WRITE; - /* - * We want to report -EINVAL instead of -EFAULT for any permission - * problems or incompatible mappings. - */ - if (check_vma_flags(vma, gup_flags)) - return -EINVAL; - - ret = __get_user_pages(mm, start, nr_pages, gup_flags, - NULL, locked); + ret = __get_user_pages_locked(mm, start, nr_pages, NULL, locked, + gup_flags); lru_add_drain(); return ret; } --- a/mm/internal.h~mm-madvise-make-madv_populate_readwrite-handle-vm_fault_retry-properly +++ a/mm/internal.h @@ -686,9 +686,8 @@ struct anon_vma *folio_anon_vma(struct f void unmap_mapping_folio(struct folio *folio); extern long populate_vma_page_range(struct vm_area_struct *vma, unsigned long start, unsigned long end, int *locked); -extern long faultin_vma_page_range(struct vm_area_struct *vma, - unsigned long start, unsigned long end, - bool write, int *locked); +extern long faultin_page_range(struct mm_struct *mm, unsigned long start, + unsigned long end, bool write, int *locked); extern bool mlock_future_ok(struct mm_struct *mm, unsigned long flags, unsigned long bytes); @@ -1127,10 +1126,13 @@ enum { FOLL_FAST_ONLY = 1 << 20, /* allow unlocking the mmap lock */ FOLL_UNLOCKABLE = 1 << 21, + /* VMA lookup+checks compatible with MADV_POPULATE_(READ|WRITE) */ + FOLL_MADV_POPULATE = 1 << 22, }; #define INTERNAL_GUP_FLAGS (FOLL_TOUCH | FOLL_TRIED | FOLL_REMOTE | FOLL_PIN | \ - FOLL_FAST_ONLY | FOLL_UNLOCKABLE) + FOLL_FAST_ONLY | FOLL_UNLOCKABLE | \ + FOLL_MADV_POPULATE) /* * Indicates for which pages that are write-protected in the page table, --- a/mm/madvise.c~mm-madvise-make-madv_populate_readwrite-handle-vm_fault_retry-properly +++ a/mm/madvise.c @@ -908,27 +908,14 @@ static long madvise_populate(struct vm_a { const bool write = behavior == MADV_POPULATE_WRITE; struct mm_struct *mm = vma->vm_mm; - unsigned long tmp_end; int locked = 1; long pages; *prev = vma; while (start < end) { - /* - * We might have temporarily dropped the lock. For example, - * our VMA might have been split. - */ - if (!vma || start >= vma->vm_end) { - vma = vma_lookup(mm, start); - if (!vma) - return -ENOMEM; - } - - tmp_end = min_t(unsigned long, end, vma->vm_end); /* Populate (prefault) page tables readable/writable. */ - pages = faultin_vma_page_range(vma, start, tmp_end, write, - &locked); + pages = faultin_page_range(mm, start, end, write, &locked); if (!locked) { mmap_read_lock(mm); locked = 1; @@ -949,7 +936,7 @@ static long madvise_populate(struct vm_a pr_warn_once("%s: unhandled return value: %ld\n", __func__, pages); fallthrough; - case -ENOMEM: + case -ENOMEM: /* No VMA or out of memory. */ return -ENOMEM; } } _ Patches currently in -mm which might be from david(a)redhat.com are mm-madvise-dont-perform-madvise-vma-walk-for-madv_populate_readwrite.patch mm-convert-folio_estimated_sharers-to-folio_likely_mapped_shared.patch mm-convert-folio_estimated_sharers-to-folio_likely_mapped_shared-fix.patch selftests-memfd_secret-add-vmsplice-test.patch mm-merge-folio_is_secretmem-and-folio_fast_pin_allowed-into-gup_fast_folio_allowed.patch mm-optimize-config_per_vma_lock-member-placement-in-vm_area_struct.patch mm-remove-prot-parameter-from-move_pte.patch mm-gup-consistently-name-gup-fast-functions.patch mm-treewide-rename-config_have_fast_gup-to-config_have_gup_fast.patch mm-use-gup-fast-instead-fast-gup-in-remaining-comments.patch drivers-virt-acrn-fix-pfnmap-pte-checks-in-acrn_vm_ram_map.patch mm-pass-vma-instead-of-mm-to-follow_pte.patch mm-follow_pte-improvements.patch mm-allow-for-detecting-underflows-with-page_mapcount-again.patch mm-rmap-always-inline-anon-file-rmap-duplication-of-a-single-pte.patch mm-rmap-add-fast-path-for-small-folios-when-adding-removing-duplicating.patch mm-track-mapcount-of-large-folios-in-single-value.patch mm-improve-folio_likely_mapped_shared-using-the-mapcount-of-large-folios.patch mm-make-folio_mapcount-return-0-for-small-typed-folios.patch mm-memory-use-folio_mapcount-in-zap_present_folio_ptes.patch mm-huge_memory-use-folio_mapcount-in-zap_huge_pmd-sanity-check.patch mm-memory-failure-use-folio_mapcount-in-hwpoison_user_mappings.patch mm-page_alloc-use-folio_mapped-in-__alloc_contig_migrate_range.patch mm-migrate-use-folio_likely_mapped_shared-in-add_page_for_migration.patch sh-mm-cache-use-folio_mapped-in-copy_from_user_page.patch mm-filemap-use-folio_mapcount-in-filemap_unaccount_folio.patch mm-migrate_device-use-folio_mapcount-in-migrate_vma_check_page.patch trace-events-page_ref-trace-the-raw-page-mapcount-value.patch xtensa-mm-convert-check_tlb_entry-to-sanity-check-folios.patch mm-debug-print-only-page-mapcount-excluding-folio-entire-mapcount-in-__dump_folio.patch documentation-admin-guide-cgroup-v1-memoryrst-dont-reference-page_mapcount.patch mm-ksm-rename-get_ksm_page_flags-to-ksm_get_folio_flags.patch mm-ksm-remove-page_mapcount-usage-in-stable_tree_search.patch loongarch-tlb-fix-error-parameter-ptep-set-but-not-used-due-to-__tlb_remove_tlb_entry.patch

1 year, 2 months

2
1
0 0

[merged mm-hotfixes-stable] nilfs2-fix-oob-in-nilfs_set_de_type.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: nilfs2: fix OOB in nilfs_set_de_type has been removed from the -mm tree. Its filename was nilfs2-fix-oob-in-nilfs_set_de_type.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Jeongjun Park <aha310510(a)gmail.com> Subject: nilfs2: fix OOB in nilfs_set_de_type Date: Tue, 16 Apr 2024 03:20:48 +0900 The size of the nilfs_type_by_mode array in the fs/nilfs2/dir.c file is defined as "S_IFMT >> S_SHIFT", but the nilfs_set_de_type() function, which uses this array, specifies the index to read from the array in the same way as "(mode & S_IFMT) >> S_SHIFT". static void nilfs_set_de_type(struct nilfs_dir_entry *de, struct inode *inode) { umode_t mode = inode->i_mode; de->file_type = nilfs_type_by_mode[(mode & S_IFMT)>>S_SHIFT]; // oob } However, when the index is determined this way, an out-of-bounds (OOB) error occurs by referring to an index that is 1 larger than the array size when the condition "mode & S_IFMT == S_IFMT" is satisfied. Therefore, a patch to resize the nilfs_type_by_mode array should be applied to prevent OOB errors. Link: https://lkml.kernel.org/r/20240415182048.7144-1-konishi.ryusuke@gmail.com Reported-by: syzbot+2e22057de05b9f3b30d8(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=2e22057de05b9f3b30d8 Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations") Signed-off-by: Jeongjun Park <aha310510(a)gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- fs/nilfs2/dir.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/fs/nilfs2/dir.c~nilfs2-fix-oob-in-nilfs_set_de_type +++ a/fs/nilfs2/dir.c @@ -240,7 +240,7 @@ nilfs_filetype_table[NILFS_FT_MAX] = { #define S_SHIFT 12 static unsigned char -nilfs_type_by_mode[S_IFMT >> S_SHIFT] = { +nilfs_type_by_mode[(S_IFMT >> S_SHIFT) + 1] = { [S_IFREG >> S_SHIFT] = NILFS_FT_REG_FILE, [S_IFDIR >> S_SHIFT] = NILFS_FT_DIR, [S_IFCHR >> S_SHIFT] = NILFS_FT_CHRDEV, _ Patches currently in -mm which might be from aha310510(a)gmail.com are

1 year, 2 months

1
0
0 0

[merged mm-hotfixes-stable] fork-defer-linking-file-vma-until-vma-is-fully-initialized.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: fork: defer linking file vma until vma is fully initialized has been removed from the -mm tree. Its filename was fork-defer-linking-file-vma-until-vma-is-fully-initialized.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Miaohe Lin <linmiaohe(a)huawei.com> Subject: fork: defer linking file vma until vma is fully initialized Date: Wed, 10 Apr 2024 17:14:41 +0800 Thorvald reported a WARNING [1]. And the root cause is below race: CPU 1 CPU 2 fork hugetlbfs_fallocate dup_mmap hugetlbfs_punch_hole i_mmap_lock_write(mapping); vma_interval_tree_insert_after -- Child vma is visible through i_mmap tree. i_mmap_unlock_write(mapping); hugetlb_dup_vma_private -- Clear vma_lock outside i_mmap_rwsem! i_mmap_lock_write(mapping); hugetlb_vmdelete_list vma_interval_tree_foreach hugetlb_vma_trylock_write -- Vma_lock is cleared. tmp->vm_ops->open -- Alloc new vma_lock outside i_mmap_rwsem! hugetlb_vma_unlock_write -- Vma_lock is assigned!!! i_mmap_unlock_write(mapping); hugetlb_dup_vma_private() and hugetlb_vm_op_open() are called outside i_mmap_rwsem lock while vma lock can be used in the same time. Fix this by deferring linking file vma until vma is fully initialized. Those vmas should be initialized first before they can be used. Link: https://lkml.kernel.org/r/20240410091441.3539905-1-linmiaohe@huawei.com Fixes: 8d9bfb260814 ("hugetlb: add vma based lock for pmd sharing") Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com> Reported-by: Thorvald Natvig <thorvald(a)google.com> Closes: https://lore.kernel.org/linux-mm/20240129161735.6gmjsswx62o4pbja@revolver/T/ [1] Reviewed-by: Jane Chu <jane.chu(a)oracle.com> Cc: Christian Brauner <brauner(a)kernel.org> Cc: Heiko Carstens <hca(a)linux.ibm.com> Cc: Kent Overstreet <kent.overstreet(a)linux.dev> Cc: Liam R. Howlett <Liam.Howlett(a)oracle.com> Cc: Mateusz Guzik <mjguzik(a)gmail.com> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Miaohe Lin <linmiaohe(a)huawei.com> Cc: Muchun Song <muchun.song(a)linux.dev> Cc: Oleg Nesterov <oleg(a)redhat.com> Cc: Peng Zhang <zhangpeng.00(a)bytedance.com> Cc: Tycho Andersen <tandersen(a)netflix.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- kernel/fork.c | 33 +++++++++++++++++---------------- 1 file changed, 17 insertions(+), 16 deletions(-) --- a/kernel/fork.c~fork-defer-linking-file-vma-until-vma-is-fully-initialized +++ a/kernel/fork.c @@ -714,6 +714,23 @@ static __latent_entropy int dup_mmap(str } else if (anon_vma_fork(tmp, mpnt)) goto fail_nomem_anon_vma_fork; vm_flags_clear(tmp, VM_LOCKED_MASK); + /* + * Copy/update hugetlb private vma information. + */ + if (is_vm_hugetlb_page(tmp)) + hugetlb_dup_vma_private(tmp); + + /* + * Link the vma into the MT. After using __mt_dup(), memory + * allocation is not necessary here, so it cannot fail. + */ + vma_iter_bulk_store(&vmi, tmp); + + mm->map_count++; + + if (tmp->vm_ops && tmp->vm_ops->open) + tmp->vm_ops->open(tmp); + file = tmp->vm_file; if (file) { struct address_space *mapping = file->f_mapping; @@ -730,25 +747,9 @@ static __latent_entropy int dup_mmap(str i_mmap_unlock_write(mapping); } - /* - * Copy/update hugetlb private vma information. - */ - if (is_vm_hugetlb_page(tmp)) - hugetlb_dup_vma_private(tmp); - - /* - * Link the vma into the MT. After using __mt_dup(), memory - * allocation is not necessary here, so it cannot fail. - */ - vma_iter_bulk_store(&vmi, tmp); - - mm->map_count++; if (!(tmp->vm_flags & VM_WIPEONFORK)) retval = copy_page_range(tmp, mpnt); - if (tmp->vm_ops && tmp->vm_ops->open) - tmp->vm_ops->open(tmp); - if (retval) { mpnt = vma_next(&vmi); goto loop_out; _ Patches currently in -mm which might be from linmiaohe(a)huawei.com are

1 year, 2 months

1
0
0 0

[merged mm-hotfixes-stable] mm-shmem-inline-shmem_is_huge-for-disabled-transparent-hugepages.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm/shmem: inline shmem_is_huge() for disabled transparent hugepages has been removed from the -mm tree. Its filename was mm-shmem-inline-shmem_is_huge-for-disabled-transparent-hugepages.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Sumanth Korikkar <sumanthk(a)linux.ibm.com> Subject: mm/shmem: inline shmem_is_huge() for disabled transparent hugepages Date: Tue, 9 Apr 2024 17:54:07 +0200 In order to minimize code size (CONFIG_CC_OPTIMIZE_FOR_SIZE=y), compiler might choose to make a regular function call (out-of-line) for shmem_is_huge() instead of inlining it. When transparent hugepages are disabled (CONFIG_TRANSPARENT_HUGEPAGE=n), it can cause compilation error. mm/shmem.c: In function `shmem_getattr': ./include/linux/huge_mm.h:383:27: note: in expansion of macro `BUILD_BUG' 383 | #define HPAGE_PMD_SIZE ({ BUILD_BUG(); 0; }) | ^~~~~~~~~ mm/shmem.c:1148:33: note: in expansion of macro `HPAGE_PMD_SIZE' 1148 | stat->blksize = HPAGE_PMD_SIZE; To prevent the possible error, always inline shmem_is_huge() when transparent hugepages are disabled. Link: https://lkml.kernel.org/r/20240409155407.2322714-1-sumanthk@linux.ibm.com Signed-off-by: Sumanth Korikkar <sumanthk(a)linux.ibm.com> Acked-by: David Hildenbrand <david(a)redhat.com> Cc: Alexander Gordeev <agordeev(a)linux.ibm.com> Cc: Heiko Carstens <hca(a)linux.ibm.com> Cc: Hugh Dickins <hughd(a)google.com> Cc: Ilya Leoshkevich <iii(a)linux.ibm.com> Cc: Vasily Gorbik <gor(a)linux.ibm.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/shmem_fs.h | 9 +++++++++ mm/shmem.c | 6 ------ 2 files changed, 9 insertions(+), 6 deletions(-) --- a/include/linux/shmem_fs.h~mm-shmem-inline-shmem_is_huge-for-disabled-transparent-hugepages +++ a/include/linux/shmem_fs.h @@ -110,8 +110,17 @@ extern struct page *shmem_read_mapping_p extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end); int shmem_unuse(unsigned int type); +#ifdef CONFIG_TRANSPARENT_HUGEPAGE extern bool shmem_is_huge(struct inode *inode, pgoff_t index, bool shmem_huge_force, struct mm_struct *mm, unsigned long vm_flags); +#else +static __always_inline bool shmem_is_huge(struct inode *inode, pgoff_t index, bool shmem_huge_force, + struct mm_struct *mm, unsigned long vm_flags) +{ + return false; +} +#endif + #ifdef CONFIG_SHMEM extern unsigned long shmem_swap_usage(struct vm_area_struct *vma); #else --- a/mm/shmem.c~mm-shmem-inline-shmem_is_huge-for-disabled-transparent-hugepages +++ a/mm/shmem.c @@ -748,12 +748,6 @@ static long shmem_unused_huge_count(stru #define shmem_huge SHMEM_HUGE_DENY -bool shmem_is_huge(struct inode *inode, pgoff_t index, bool shmem_huge_force, - struct mm_struct *mm, unsigned long vm_flags) -{ - return false; -} - static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo, struct shrink_control *sc, unsigned long nr_to_split) { _ Patches currently in -mm which might be from sumanthk(a)linux.ibm.com are

1 year, 2 months

1
0
0 0

[merged mm-hotfixes-stable] squashfs-check-the-inode-number-is-not-the-invalid-value-of-zero.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: Squashfs: check the inode number is not the invalid value of zero has been removed from the -mm tree. Its filename was squashfs-check-the-inode-number-is-not-the-invalid-value-of-zero.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Phillip Lougher <phillip(a)squashfs.org.uk> Subject: Squashfs: check the inode number is not the invalid value of zero Date: Mon, 8 Apr 2024 23:02:06 +0100 Syskiller has produced an out of bounds access in fill_meta_index(). That out of bounds access is ultimately caused because the inode has an inode number with the invalid value of zero, which was not checked. The reason this causes the out of bounds access is due to following sequence of events: 1. Fill_meta_index() is called to allocate (via empty_meta_index()) and fill a metadata index. It however suffers a data read error and aborts, invalidating the newly returned empty metadata index. It does this by setting the inode number of the index to zero, which means unused (zero is not a valid inode number). 2. When fill_meta_index() is subsequently called again on another read operation, locate_meta_index() returns the previous index because it matches the inode number of 0. Because this index has been returned it is expected to have been filled, and because it hasn't been, an out of bounds access is performed. This patch adds a sanity check which checks that the inode number is not zero when the inode is created and returns -EINVAL if it is. [phillip(a)squashfs.org.uk: whitespace fix] Link: https://lkml.kernel.org/r/20240409204723.446925-1-phillip@squashfs.org.uk Link: https://lkml.kernel.org/r/20240408220206.435788-1-phillip@squashfs.org.uk Signed-off-by: Phillip Lougher <phillip(a)squashfs.org.uk> Reported-by: "Ubisectech Sirius" <bugreport(a)ubisectech.com> Closes: https://lore.kernel.org/lkml/87f5c007-b8a5-41ae-8b57-431e924c5915.bugreport… Cc: Christian Brauner <brauner(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- fs/squashfs/inode.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) --- a/fs/squashfs/inode.c~squashfs-check-the-inode-number-is-not-the-invalid-value-of-zero +++ a/fs/squashfs/inode.c @@ -48,6 +48,10 @@ static int squashfs_new_inode(struct sup gid_t i_gid; int err; + inode->i_ino = le32_to_cpu(sqsh_ino->inode_number); + if (inode->i_ino == 0) + return -EINVAL; + err = squashfs_get_id(sb, le16_to_cpu(sqsh_ino->uid), &i_uid); if (err) return err; @@ -58,7 +62,6 @@ static int squashfs_new_inode(struct sup i_uid_write(inode, i_uid); i_gid_write(inode, i_gid); - inode->i_ino = le32_to_cpu(sqsh_ino->inode_number); inode_set_mtime(inode, le32_to_cpu(sqsh_ino->mtime), 0); inode_set_atime(inode, inode_get_mtime_sec(inode), 0); inode_set_ctime(inode, inode_get_mtime_sec(inode), 0); _ Patches currently in -mm which might be from phillip(a)squashfs.org.uk are squashfs-remove-deprecated-strncpy-by-not-copying-the-string.patch

1 year, 2 months

1
0
0 0

[merged mm-hotfixes-stable] mmswapops-update-check-in-is_pfn_swap_entry-for-hwpoison-entries.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm,swapops: update check in is_pfn_swap_entry for hwpoison entries has been removed from the -mm tree. Its filename was mmswapops-update-check-in-is_pfn_swap_entry-for-hwpoison-entries.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Oscar Salvador <osalvador(a)suse.de> Subject: mm,swapops: update check in is_pfn_swap_entry for hwpoison entries Date: Sun, 7 Apr 2024 15:05:37 +0200 Tony reported that the Machine check recovery was broken in v6.9-rc1, as he was hitting a VM_BUG_ON when injecting uncorrectable memory errors to DRAM. After some more digging and debugging on his side, he realized that this went back to v6.1, with the introduction of 'commit 0d206b5d2e0d ("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry")'. That commit, among other things, introduced swp_offset_pfn(), replacing hwpoison_entry_to_pfn() in its favour. The patch also introduced a VM_BUG_ON() check for is_pfn_swap_entry(), but is_pfn_swap_entry() never got updated to cover hwpoison entries, which means that we would hit the VM_BUG_ON whenever we would call swp_offset_pfn() for such entries on environments with CONFIG_DEBUG_VM set. Fix this by updating the check to cover hwpoison entries as well, and update the comment while we are it. Link: https://lkml.kernel.org/r/20240407130537.16977-1-osalvador@suse.de Fixes: 0d206b5d2e0d ("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry") Signed-off-by: Oscar Salvador <osalvador(a)suse.de> Reported-by: Tony Luck <tony.luck(a)intel.com> Closes: https://lore.kernel.org/all/Zg8kLSl2yAlA3o5D@agluck-desk3/ Tested-by: Tony Luck <tony.luck(a)intel.com> Reviewed-by: Peter Xu <peterx(a)redhat.com> Reviewed-by: David Hildenbrand <david(a)redhat.com> Acked-by: Miaohe Lin <linmiaohe(a)huawei.com> Cc: <stable(a)vger.kernel.org> [6.1.x] Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/swapops.h | 65 +++++++++++++++++++------------------- 1 file changed, 33 insertions(+), 32 deletions(-) --- a/include/linux/swapops.h~mmswapops-update-check-in-is_pfn_swap_entry-for-hwpoison-entries +++ a/include/linux/swapops.h @@ -390,6 +390,35 @@ static inline bool is_migration_entry_di } #endif /* CONFIG_MIGRATION */ +#ifdef CONFIG_MEMORY_FAILURE + +/* + * Support for hardware poisoned pages + */ +static inline swp_entry_t make_hwpoison_entry(struct page *page) +{ + BUG_ON(!PageLocked(page)); + return swp_entry(SWP_HWPOISON, page_to_pfn(page)); +} + +static inline int is_hwpoison_entry(swp_entry_t entry) +{ + return swp_type(entry) == SWP_HWPOISON; +} + +#else + +static inline swp_entry_t make_hwpoison_entry(struct page *page) +{ + return swp_entry(0, 0); +} + +static inline int is_hwpoison_entry(swp_entry_t swp) +{ + return 0; +} +#endif + typedef unsigned long pte_marker; #define PTE_MARKER_UFFD_WP BIT(0) @@ -483,8 +512,9 @@ static inline struct folio *pfn_swap_ent /* * A pfn swap entry is a special type of swap entry that always has a pfn stored - * in the swap offset. They are used to represent unaddressable device memory - * and to restrict access to a page undergoing migration. + * in the swap offset. They can either be used to represent unaddressable device + * memory, to restrict access to a page undergoing migration or to represent a + * pfn which has been hwpoisoned and unmapped. */ static inline bool is_pfn_swap_entry(swp_entry_t entry) { @@ -492,7 +522,7 @@ static inline bool is_pfn_swap_entry(swp BUILD_BUG_ON(SWP_TYPE_SHIFT < SWP_PFN_BITS); return is_migration_entry(entry) || is_device_private_entry(entry) || - is_device_exclusive_entry(entry); + is_device_exclusive_entry(entry) || is_hwpoison_entry(entry); } struct page_vma_mapped_walk; @@ -561,35 +591,6 @@ static inline int is_pmd_migration_entry } #endif /* CONFIG_ARCH_ENABLE_THP_MIGRATION */ -#ifdef CONFIG_MEMORY_FAILURE - -/* - * Support for hardware poisoned pages - */ -static inline swp_entry_t make_hwpoison_entry(struct page *page) -{ - BUG_ON(!PageLocked(page)); - return swp_entry(SWP_HWPOISON, page_to_pfn(page)); -} - -static inline int is_hwpoison_entry(swp_entry_t entry) -{ - return swp_type(entry) == SWP_HWPOISON; -} - -#else - -static inline swp_entry_t make_hwpoison_entry(struct page *page) -{ - return swp_entry(0, 0); -} - -static inline int is_hwpoison_entry(swp_entry_t swp) -{ - return 0; -} -#endif - static inline int non_swap_entry(swp_entry_t entry) { return swp_type(entry) >= MAX_SWAPFILES; _ Patches currently in -mm which might be from osalvador(a)suse.de are

1 year, 2 months

1
0
0 0

[merged mm-hotfixes-stable] mm-memory-failure-fix-deadlock-when-hugetlb_optimize_vmemmap-is-enabled.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm/memory-failure: fix deadlock when hugetlb_optimize_vmemmap is enabled has been removed from the -mm tree. Its filename was mm-memory-failure-fix-deadlock-when-hugetlb_optimize_vmemmap-is-enabled.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Miaohe Lin <linmiaohe(a)huawei.com> Subject: mm/memory-failure: fix deadlock when hugetlb_optimize_vmemmap is enabled Date: Sun, 7 Apr 2024 16:54:56 +0800 When I did hard offline test with hugetlb pages, below deadlock occurs: ====================================================== WARNING: possible circular locking dependency detected 6.8.0-11409-gf6cef5f8c37f #1 Not tainted ------------------------------------------------------ bash/46904 is trying to acquire lock: ffffffffabe68910 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_slow_dec+0x16/0x60 but task is already holding lock: ffffffffabf92ea8 (pcp_batch_high_lock){+.+.}-{3:3}, at: zone_pcp_disable+0x16/0x40 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (pcp_batch_high_lock){+.+.}-{3:3}: __mutex_lock+0x6c/0x770 page_alloc_cpu_online+0x3c/0x70 cpuhp_invoke_callback+0x397/0x5f0 __cpuhp_invoke_callback_range+0x71/0xe0 _cpu_up+0xeb/0x210 cpu_up+0x91/0xe0 cpuhp_bringup_mask+0x49/0xb0 bringup_nonboot_cpus+0xb7/0xe0 smp_init+0x25/0xa0 kernel_init_freeable+0x15f/0x3e0 kernel_init+0x15/0x1b0 ret_from_fork+0x2f/0x50 ret_from_fork_asm+0x1a/0x30 -> #0 (cpu_hotplug_lock){++++}-{0:0}: __lock_acquire+0x1298/0x1cd0 lock_acquire+0xc0/0x2b0 cpus_read_lock+0x2a/0xc0 static_key_slow_dec+0x16/0x60 __hugetlb_vmemmap_restore_folio+0x1b9/0x200 dissolve_free_huge_page+0x211/0x260 __page_handle_poison+0x45/0xc0 memory_failure+0x65e/0xc70 hard_offline_page_store+0x55/0xa0 kernfs_fop_write_iter+0x12c/0x1d0 vfs_write+0x387/0x550 ksys_write+0x64/0xe0 do_syscall_64+0xca/0x1e0 entry_SYSCALL_64_after_hwframe+0x6d/0x75 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(pcp_batch_high_lock); lock(cpu_hotplug_lock); lock(pcp_batch_high_lock); rlock(cpu_hotplug_lock); *** DEADLOCK *** 5 locks held by bash/46904: #0: ffff98f6c3bb23f0 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x64/0xe0 #1: ffff98f6c328e488 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xf8/0x1d0 #2: ffff98ef83b31890 (kn->active#113){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x100/0x1d0 #3: ffffffffabf9db48 (mf_mutex){+.+.}-{3:3}, at: memory_failure+0x44/0xc70 #4: ffffffffabf92ea8 (pcp_batch_high_lock){+.+.}-{3:3}, at: zone_pcp_disable+0x16/0x40 stack backtrace: CPU: 10 PID: 46904 Comm: bash Kdump: loaded Not tainted 6.8.0-11409-gf6cef5f8c37f #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x68/0xa0 check_noncircular+0x129/0x140 __lock_acquire+0x1298/0x1cd0 lock_acquire+0xc0/0x2b0 cpus_read_lock+0x2a/0xc0 static_key_slow_dec+0x16/0x60 __hugetlb_vmemmap_restore_folio+0x1b9/0x200 dissolve_free_huge_page+0x211/0x260 __page_handle_poison+0x45/0xc0 memory_failure+0x65e/0xc70 hard_offline_page_store+0x55/0xa0 kernfs_fop_write_iter+0x12c/0x1d0 vfs_write+0x387/0x550 ksys_write+0x64/0xe0 do_syscall_64+0xca/0x1e0 entry_SYSCALL_64_after_hwframe+0x6d/0x75 RIP: 0033:0x7fc862314887 Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 RSP: 002b:00007fff19311268 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007fc862314887 RDX: 000000000000000c RSI: 000056405645fe10 RDI: 0000000000000001 RBP: 000056405645fe10 R08: 00007fc8623d1460 R09: 000000007fffffff R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c R13: 00007fc86241b780 R14: 00007fc862417600 R15: 00007fc862416a00 In short, below scene breaks the lock dependency chain: memory_failure __page_handle_poison zone_pcp_disable -- lock(pcp_batch_high_lock) dissolve_free_huge_page __hugetlb_vmemmap_restore_folio static_key_slow_dec cpus_read_lock -- rlock(cpu_hotplug_lock) Fix this by calling drain_all_pages() instead. This issue won't occur until commit a6b40850c442 ("mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key"). As it introduced rlock(cpu_hotplug_lock) in dissolve_free_huge_page() code path while lock(pcp_batch_high_lock) is already in the __page_handle_poison(). [linmiaohe(a)huawei.com: extend comment per Oscar] [akpm(a)linux-foundation.org: reflow block comment] Link: https://lkml.kernel.org/r/20240407085456.2798193-1-linmiaohe@huawei.com Fixes: a6b40850c442 ("mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key") Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com> Acked-by: Oscar Salvador <osalvador(a)suse.de> Reviewed-by: Jane Chu <jane.chu(a)oracle.com> Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/memory-failure.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) --- a/mm/memory-failure.c~mm-memory-failure-fix-deadlock-when-hugetlb_optimize_vmemmap-is-enabled +++ a/mm/memory-failure.c @@ -154,11 +154,23 @@ static int __page_handle_poison(struct p { int ret; - zone_pcp_disable(page_zone(page)); + /* + * zone_pcp_disable() can't be used here. It will + * hold pcp_batch_high_lock and dissolve_free_huge_page() might hold + * cpu_hotplug_lock via static_key_slow_dec() when hugetlb vmemmap + * optimization is enabled. This will break current lock dependency + * chain and leads to deadlock. + * Disabling pcp before dissolving the page was a deterministic + * approach because we made sure that those pages cannot end up in any + * PCP list. Draining PCP lists expels those pages to the buddy system, + * but nothing guarantees that those pages do not get back to a PCP + * queue if we need to refill those. + */ ret = dissolve_free_huge_page(page); - if (!ret) + if (!ret) { + drain_all_pages(page_zone(page)); ret = take_page_off_buddy(page); - zone_pcp_enable(page_zone(page)); + } return ret; } _ Patches currently in -mm which might be from linmiaohe(a)huawei.com are

1 year, 2 months

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror April 2024