The quilt patch titled
Subject: mm: memory-failure: fix unexpected return value in soft_offline_page()
has been removed from the -mm tree. Its filename was
mm-memory-failure-fix-unexpected-return-value-in-soft_offline_page.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Miaohe Lin <linmiaohe(a)huawei.com>
Subject: mm: memory-failure: fix unexpected return value in soft_offline_page()
Date: Tue, 27 Jun 2023 19:28:08 +0800
When page_handle_poison() fails to handle the hugepage or free page in the
retry path, soft_offline_page() returns 0 even though -EBUSY is expected in
this case.
Consequently the user will think soft_offline_page() succeeded when it in
fact failed, and so will not try again later.
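For context, this return value reaches userspace through madvise(MADV_SOFT_OFFLINE).  A minimal, hypothetical retry loop (the function name and retry policy are illustrative, not part of the patch) that depends on seeing EBUSY rather than a false success:

#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_SOFT_OFFLINE
#define MADV_SOFT_OFFLINE 101	/* from asm-generic/mman-common.h */
#endif

/* Soft-offline one page, retrying while the kernel reports EBUSY. */
static int soft_offline_with_retry(void *addr, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		if (madvise(addr, getpagesize(), MADV_SOFT_OFFLINE) == 0)
			return 0;	/* page really offlined */
		if (errno != EBUSY)
			return -1;	/* permanent failure, give up */
		/* EBUSY is transient: worth another attempt */
	}
	return -1;
}

Before the fix, the failed retry path surfaced as success, so a loop like this would wrongly conclude the page had been offlined.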
Link: https://lkml.kernel.org/r/20230627112808.1275241-1-linmiaohe@huawei.com
Fixes: b94e02822deb ("mm,hwpoison: try to narrow window race for free pages")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory-failure.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
--- a/mm/memory-failure.c~mm-memory-failure-fix-unexpected-return-value-in-soft_offline_page
+++ a/mm/memory-failure.c
@@ -2741,10 +2741,13 @@ retry:
if (ret > 0) {
ret = soft_offline_in_use_page(page);
} else if (ret == 0) {
- if (!page_handle_poison(page, true, false) && try_again) {
- try_again = false;
- flags &= ~MF_COUNT_INCREASED;
- goto retry;
+ if (!page_handle_poison(page, true, false)) {
+ if (try_again) {
+ try_again = false;
+ flags &= ~MF_COUNT_INCREASED;
+ goto retry;
+ }
+ ret = -EBUSY;
}
}
_
Patches currently in -mm which might be from linmiaohe(a)huawei.com are
mm-memory-failure-fix-potential-page-refcnt-leak-in-memory_failure.patch
mm-memcg-fix-obsolete-function-name-in-mem_cgroup_protection.patch
mm-memory-failure-add-pageoffline-check.patch
mm-page_alloc-avoid-unneeded-alike_pages-calculation.patch
mm-memcg-update-obsolete-comment-above-parent_mem_cgroup.patch
mm-page_alloc-remove-unneeded-variable-base.patch
mm-memcg-fix-wrong-function-name-above-obj_cgroup_charge_zswap.patch
mm-memory-failure-use-helper-macro-llist_for_each_entry_safe.patch
mm-mm_init-use-helper-macro-bits_per_long-and-bits_per_byte.patch
The quilt patch titled
Subject: radix tree: remove unused variable
has been removed from the -mm tree. Its filename was
radix-tree-remove-unused-variable.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Arnd Bergmann <arnd(a)arndb.de>
Subject: radix tree: remove unused variable
Date: Fri, 11 Aug 2023 15:10:13 +0200
Recent versions of clang warn about an unused variable, though older
versions saw the 'slot++' as a use and did not warn:
radix-tree.c:1136:50: error: parameter 'slot' set but not used [-Werror,-Wunused-but-set-parameter]
It's clearly not needed any more, so just remove it.
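A reduced demo of the warning (hypothetical demo.c, not from the tree), mirroring the radix_tree_iter_resume() parameter:

/* clang -Wall -Wunused-but-set-parameter -c demo.c */
void **iter_resume(void **slot)
{
	slot++;		/* written, but the new value is never read:
			 * newer clang reports "parameter 'slot' set
			 * but not used" */
	return 0;
}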
Link: https://lkml.kernel.org/r/20230811131023.2226509-1-arnd@kernel.org
Fixes: 3a08cd52c37c7 ("radix tree: Remove multiorder support")
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: Nick Desaulniers <ndesaulniers(a)google.com>
Cc: Peng Zhang <zhangpeng.00(a)bytedance.com>
Cc: Rong Tao <rongtao(a)cestc.cn>
Cc: Tom Rix <trix(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/radix-tree.c | 1 -
1 file changed, 1 deletion(-)
--- a/lib/radix-tree.c~radix-tree-remove-unused-variable
+++ a/lib/radix-tree.c
@@ -1136,7 +1136,6 @@ static void set_iter_tags(struct radix_t
void __rcu **radix_tree_iter_resume(void __rcu **slot,
struct radix_tree_iter *iter)
{
- slot++;
iter->index = __radix_tree_iter_add(iter, 1);
iter->next_index = iter->index;
iter->tags = 0;
_
Patches currently in -mm which might be from arnd(a)arndb.de are
iomem-remove-__weak-ioremap_cache-helper.patch
The quilt patch titled
Subject: mm: add a call to flush_cache_vmap() in vmap_pfn()
has been removed from the -mm tree. Its filename was
mm-add-a-call-to-flush_cache_vmap-in-vmap_pfn.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Subject: mm: add a call to flush_cache_vmap() in vmap_pfn()
Date: Wed, 9 Aug 2023 18:46:33 +0200
flush_cache_vmap() must be called after new vmalloc mappings are installed
in the page table in order to allow architectures to make sure the new
mapping is visible.
Omitting it could lead to a panic, since on some architectures (like
powerpc) the page table walker could see the wrong pte value and trigger a
spurious page fault that cannot be resolved (see commit f1cb8f9beba8
("powerpc/64s/radix: avoid ptesync after set_pte and
ptep_set_access_flags")).
But actually the patch is aiming at riscv: the riscv specification allows
the caching of invalid entries in the TLB, and since we recently removed
the vmalloc page fault handling, we now need to emit a TLB shootdown
whenever a new vmalloc mapping is created
(https://lore.kernel.org/linux-riscv/20230725132246.817726-1-alexghiti@rivos…).
That's a temporary solution; there are ways to avoid it :)
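The required ordering, as an abridged and hypothetical sketch of vmap_pfn() (the actual one-hunk change follows below):

#include <linux/vmalloc.h>
#include <asm/cacheflush.h>

static void *vmap_pfn_sketch(unsigned long *pfns, unsigned int count,
			     pgprot_t prot)
{
	struct vm_struct *area;

	area = get_vm_area_caller(count * PAGE_SIZE, VM_IOREMAP,
				  __builtin_return_address(0));
	if (!area)
		return NULL;
	/* ... install the PTEs for pfns/prot (apply_to_page_range()
	 * in the real function) ... */

	/*
	 * Only after the PTEs are installed: let the architecture
	 * synchronize caches/TLB before anyone dereferences the
	 * new mapping.
	 */
	flush_cache_vmap((unsigned long)area->addr,
			 (unsigned long)area->addr + count * PAGE_SIZE);
	return area->addr;
}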
Link: https://lkml.kernel.org/r/20230809164633.1556126-1-alexghiti@rivosinc.com
Fixes: 3e9a9e256b1e ("mm: add a vmap_pfn function")
Reported-by: Dylan Jhong <dylan(a)andestech.com>
Closes: https://lore.kernel.org/linux-riscv/ZMytNY2J8iyjbPPy@atctrx.andestech.com/
Signed-off-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Palmer Dabbelt <palmer(a)rivosinc.com>
Acked-by: Palmer Dabbelt <palmer(a)rivosinc.com>
Reviewed-by: Dylan Jhong <dylan(a)andestech.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmalloc.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/mm/vmalloc.c~mm-add-a-call-to-flush_cache_vmap-in-vmap_pfn
+++ a/mm/vmalloc.c
@@ -2979,6 +2979,10 @@ void *vmap_pfn(unsigned long *pfns, unsi
free_vm_area(area);
return NULL;
}
+
+ flush_cache_vmap((unsigned long)area->addr,
+ (unsigned long)area->addr + count * PAGE_SIZE);
+
return area->addr;
}
EXPORT_SYMBOL_GPL(vmap_pfn);
_
Patches currently in -mm which might be from alexghiti(a)rivosinc.com are
The quilt patch titled
Subject: selftests/mm: FOLL_LONGTERM need to be updated to 0x100
has been removed from the -mm tree. Its filename was
selftests-mm-foll_longterm-need-to-be-updated-to-0x100.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ayush Jain <ayush.jain3(a)amd.com>
Subject: selftests/mm: FOLL_LONGTERM need to be updated to 0x100
Date: Tue, 8 Aug 2023 07:43:47 -0500
After commit 2c2241081f7d ("mm/gup: move private gup FOLL_ flags to
internal.h"), the FOLL_LONGTERM flag value was updated from 0x10000 to
0x100 in include/linux/mm_types.h.
As hmm.hmm_device_private.hmm_gup_test uses FOLL_LONGTERM, update its copy
of the value here as well.
Before this change the test got stuck in an infinite assert loop in
hmm.hmm_device_private.hmm_gup_test:
==========================================================
RUN hmm.hmm_device_private.hmm_gup_test ...
hmm-tests.c:1962:hmm_gup_test:Expected HMM_DMIRROR_PROT_WRITE..
..(2) == m[2] (34)
hmm-tests.c:157:hmm_gup_test:Expected ret (-1) == 0 (0)
hmm-tests.c:157:hmm_gup_test:Expected ret (-1) == 0 (0)
...
==========================================================
Call Trace:
<TASK>
? sched_clock+0xd/0x20
? __lock_acquire.constprop.0+0x120/0x6c0
? ktime_get+0x2c/0xd0
? sched_clock+0xd/0x20
? local_clock+0x12/0xd0
? lock_release+0x26e/0x3b0
pin_user_pages_fast+0x4c/0x70
gup_test_ioctl+0x4ff/0xbb0
? gup_test_ioctl+0x68c/0xbb0
__x64_sys_ioctl+0x99/0xd0
do_syscall_64+0x60/0x90
? syscall_exit_to_user_mode+0x2a/0x50
? do_syscall_64+0x6d/0x90
? syscall_exit_to_user_mode+0x2a/0x50
? do_syscall_64+0x6d/0x90
? irqentry_exit_to_user_mode+0xd/0x20
? irqentry_exit+0x3f/0x50
? exc_page_fault+0x96/0x200
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f6aaa31aaff
After this change the test passes successfully.
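Selftests that copy kernel-internal constants are prone to exactly this silent drift; a hypothetical hardening of the fallback pattern the fix adopts (the build-time check is illustrative and not part of the patch):

#include <assert.h>	/* static_assert, C11 */

#ifndef FOLL_LONGTERM
#define FOLL_LONGTERM 0x100	/* mapping lifetime is indefinite */
#endif

/* Fail the build loudly if a header ever defines a different value. */
static_assert(FOLL_LONGTERM == 0x100,
	      "local FOLL_LONGTERM copy is out of sync with the kernel");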
Link: https://lkml.kernel.org/r/20230808124347.79163-1-ayush.jain3@amd.com
Fixes: 2c2241081f7d ("mm/gup: move private gup FOLL_ flags to internal.h")
Signed-off-by: Ayush Jain <ayush.jain3(a)amd.com>
Reviewed-by: Raghavendra K T <raghavendra.kt(a)amd.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/hmm-tests.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--- a/tools/testing/selftests/mm/hmm-tests.c~selftests-mm-foll_longterm-need-to-be-updated-to-0x100
+++ a/tools/testing/selftests/mm/hmm-tests.c
@@ -57,9 +57,14 @@ enum {
#define ALIGN(x, a) (((x) + (a - 1)) & (~((a) - 1)))
/* Just the flags we need, copied from mm.h: */
+
+#ifndef FOLL_WRITE
#define FOLL_WRITE 0x01 /* check pte is writable */
-#define FOLL_LONGTERM 0x10000 /* mapping lifetime is indefinite */
+#endif
+#ifndef FOLL_LONGTERM
+#define FOLL_LONGTERM 0x100 /* mapping lifetime is indefinite */
+#endif
FIXTURE(hmm)
{
int fd;
_
Patches currently in -mm which might be from ayush.jain3(a)amd.com are
selftests-mm-add-ksm_merge_time-tests.patch
The quilt patch titled
Subject: nilfs2: fix general protection fault in nilfs_lookup_dirty_data_buffers()
has been removed from the -mm tree. Its filename was
nilfs2-fix-general-protection-fault-in-nilfs_lookup_dirty_data_buffers.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix general protection fault in nilfs_lookup_dirty_data_buffers()
Date: Sat, 5 Aug 2023 22:20:38 +0900
A syzbot stress test reported that create_empty_buffers() called from
nilfs_lookup_dirty_data_buffers() can cause a general protection fault.
Analysis using its reproducer revealed that the back reference "mapping"
from a page/folio has been changed to NULL after dirty page/folio gang
lookup in nilfs_lookup_dirty_data_buffers().
Fix this issue by excluding pages/folios from being collected if, after
acquiring a lock on each page/folio, its back reference "mapping" differs
from the pointer to the address space struct that held the page/folio.
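Distilled, the added check is the classic lock-then-recheck pattern for pages found via an unlocked gang lookup; a hypothetical helper form (the real change is open-coded in the hunk below):

#include <linux/pagemap.h>

/*
 * Only the folio lock stabilizes ->mapping: a folio found by an
 * unlocked lookup may have been truncated from the address space
 * in the meantime, leaving ->mapping NULL or pointing elsewhere.
 */
static bool folio_lock_if_attached(struct folio *folio,
				   struct address_space *mapping)
{
	folio_lock(folio);
	if (folio->mapping == mapping)
		return true;	/* caller proceeds, unlocks later */
	folio_unlock(folio);	/* raced with truncation: skip */
	return false;
}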
Link: https://lkml.kernel.org/r/20230805132038.6435-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+0ad741797f4565e7e2d2(a)syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/0000000000002930a705fc32b231@google.com
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/segment.c | 5 +++++
1 file changed, 5 insertions(+)
--- a/fs/nilfs2/segment.c~nilfs2-fix-general-protection-fault-in-nilfs_lookup_dirty_data_buffers
+++ a/fs/nilfs2/segment.c
@@ -725,6 +725,11 @@ static size_t nilfs_lookup_dirty_data_bu
struct folio *folio = fbatch.folios[i];
folio_lock(folio);
+ if (unlikely(folio->mapping != mapping)) {
+ /* Exclude folios removed from the address space */
+ folio_unlock(folio);
+ continue;
+ }
head = folio_buffers(folio);
if (!head) {
create_empty_buffers(&folio->page, i_blocksize(inode), 0);
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-fix-warning-in-mark_buffer_dirty-due-to-discarded-buffer-reuse.patch
The quilt patch titled
Subject: mm/gup: handle cont-PTE hugetlb pages correctly in gup_must_unshare() via GUP-fast
has been removed from the -mm tree. Its filename was
mm-gup-handle-cont-pte-hugetlb-pages-correctly-in-gup_must_unshare-via-gup-fast.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/gup: handle cont-PTE hugetlb pages correctly in gup_must_unshare() via GUP-fast
Date: Sat, 5 Aug 2023 12:12:56 +0200
In contrast to most other GUP code, GUP-fast common page table walking
code like gup_pte_range() also handles hugetlb pages. But in contrast to
other hugetlb page table walking code, it does not look at the hugetlb PTE
abstraction whereby we have only a single logical hugetlb PTE per hugetlb
page, even when using multiple cont-PTEs underneath -- which is for
example what huge_ptep_get() abstracts.
So when we have a hugetlb page that is mapped via cont-PTEs, GUP-fast
might stumble over a PTE that does not map the head page of the hugetlb
page -- that is, one that is not the first "head" PTE of such a cont
mapping.
Logically, the whole hugetlb page is mapped (entire_mapcount == 1), but we
might end up calling gup_must_unshare() with a tail page of a hugetlb
page.
We only maintain a single PageAnonExclusive flag per hugetlb page (as
hugetlb pages cannot get partially COW-shared), stored for the head page.
That flag is clear for all tail pages.
So when gup_must_unshare() ends up calling PageAnonExclusive() with a tail
page of a hugetlb page:
1) With CONFIG_DEBUG_VM_PGFLAGS
Stumbles over the:
VM_BUG_ON_PGFLAGS(PageHuge(page) && !PageHead(page), page);
For example, when executing the COW selftests with 64k hugetlb pages on
arm64:
[ 61.082187] page:00000000829819ff refcount:3 mapcount:1 mapping:0000000000000000 index:0x1 pfn:0x11ee11
[ 61.082842] head:0000000080f79bf7 order:4 entire_mapcount:1 nr_pages_mapped:0 pincount:2
[ 61.083384] anon flags: 0x17ffff80003000e(referenced|uptodate|dirty|head|mappedtodisk|node=0|zone=2|lastcpupid=0xfffff)
[ 61.084101] page_type: 0xffffffff()
[ 61.084332] raw: 017ffff800000000 fffffc00037b8401 0000000000000402 0000000200000000
[ 61.084840] raw: 0000000000000010 0000000000000000 00000000ffffffff 0000000000000000
[ 61.085359] head: 017ffff80003000e ffffd9e95b09b788 ffffd9e95b09b788 ffff0007ff63cf71
[ 61.085885] head: 0000000000000000 0000000000000002 00000003ffffffff 0000000000000000
[ 61.086415] page dumped because: VM_BUG_ON_PAGE(PageHuge(page) && !PageHead(page))
[ 61.086914] ------------[ cut here ]------------
[ 61.087220] kernel BUG at include/linux/page-flags.h:990!
[ 61.087591] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
[ 61.087999] Modules linked in: ...
[ 61.089404] CPU: 0 PID: 4612 Comm: cow Kdump: loaded Not tainted 6.5.0-rc4+ #3
[ 61.089917] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 61.090409] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 61.090897] pc : gup_must_unshare.part.0+0x64/0x98
[ 61.091242] lr : gup_must_unshare.part.0+0x64/0x98
[ 61.091592] sp : ffff8000825eb940
[ 61.091826] x29: ffff8000825eb940 x28: 0000000000000000 x27: fffffc00037b8440
[ 61.092329] x26: 0400000000000001 x25: 0000000000080101 x24: 0000000000080000
[ 61.092835] x23: 0000000000080100 x22: ffff0000cffb9588 x21: ffff0000c8ec6b58
[ 61.093341] x20: 0000ffffad6b1000 x19: fffffc00037b8440 x18: ffffffffffffffff
[ 61.093850] x17: 2864616548656761 x16: 5021202626202965 x15: 6761702865677548
[ 61.094358] x14: 6567615028454741 x13: 2929656761702864 x12: 6165486567615021
[ 61.094858] x11: 00000000ffff7fff x10: 00000000ffff7fff x9 : ffffd9e958b7a1c0
[ 61.095359] x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 : 00000000002bffa8
[ 61.095873] x5 : ffff0008bb19e708 x4 : 0000000000000000 x3 : 0000000000000000
[ 61.096380] x2 : 0000000000000000 x1 : ffff0000cf6636c0 x0 : 0000000000000046
[ 61.096894] Call trace:
[ 61.097080] gup_must_unshare.part.0+0x64/0x98
[ 61.097392] gup_pte_range+0x3a8/0x3f0
[ 61.097662] gup_pgd_range+0x1ec/0x280
[ 61.097942] lockless_pages_from_mm+0x64/0x1a0
[ 61.098258] internal_get_user_pages_fast+0xe4/0x1d0
[ 61.098612] pin_user_pages_fast+0x58/0x78
[ 61.098917] pin_longterm_test_start+0xf4/0x2b8
[ 61.099243] gup_test_ioctl+0x170/0x3b0
[ 61.099528] __arm64_sys_ioctl+0xa8/0xf0
[ 61.099822] invoke_syscall.constprop.0+0x7c/0xd0
[ 61.100160] el0_svc_common.constprop.0+0xe8/0x100
[ 61.100500] do_el0_svc+0x38/0xa0
[ 61.100736] el0_svc+0x3c/0x198
[ 61.100971] el0t_64_sync_handler+0x134/0x150
[ 61.101280] el0t_64_sync+0x17c/0x180
[ 61.101543] Code: aa1303e0 f00074c1 912b0021 97fffeb2 (d4210000)
2) Without CONFIG_DEBUG_VM_PGFLAGS
Always detects "not exclusive" for passed tail pages and refuses to pin
the tail pages R/O, as gup_must_unshare() == true. GUP-fast will fall back
to ordinary GUP. As ordinary GUP properly considers the logical hugetlb
PTE abstraction in hugetlb_follow_page_mask(), pinning the page will
succeed when looking at the PageAnonExclusive flag on the head page only.
So the only real effect of this is that with cont-PTE hugetlb pages, we'll
always fall back from GUP-fast to ordinary GUP when not working on the head
page, which ends up checking the head page and doing the right thing.
Consequently, the cow selftests pass with cont-PTE hugetlb pages as well
without CONFIG_DEBUG_VM_PGFLAGS.
Note that this only applies to anon hugetlb pages that are mapped using
cont-PTEs: for example 64k hugetlb pages on a 4k arm64 kernel.
... and only when R/O-pinning (FOLL_PIN) such pages that are mapped into
the page table R/O using GUP-fast.
On production kernels (and even most debug kernels, that don't set
CONFIG_DEBUG_VM_PGFLAGS) this patch should theoretically not be required
to be backported. But of course, it does not hurt.
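A hypothetical helper capturing the invariant the fix relies on (the real change open-codes this in gup_must_unshare(), see the hunk below):

#include <linux/page-flags.h>

/*
 * hugetlb keeps PageAnonExclusive on the head page only, so a check
 * arriving with a tail page -- as GUP-fast can produce for cont-PTE
 * mappings -- must be redirected to the head page first.
 */
static bool hugetlb_anon_exclusive(struct page *page)
{
	if (unlikely(!PageHead(page) && PageHuge(page)))
		page = compound_head(page);
	return PageAnonExclusive(page);
}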
Link: https://lkml.kernel.org/r/20230805101256.87306-1-david@redhat.com
Fixes: a7f226604170 ("mm/gup: trigger FAULT_FLAG_UNSHARE when R/O-pinning a possibly shared anonymous page")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Ryan Roberts <ryan.roberts(a)arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com>
Tested-by: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/internal.h | 10 ++++++++++
1 file changed, 10 insertions(+)
--- a/mm/internal.h~mm-gup-handle-cont-pte-hugetlb-pages-correctly-in-gup_must_unshare-via-gup-fast
+++ a/mm/internal.h
@@ -1005,6 +1005,16 @@ static inline bool gup_must_unshare(stru
smp_rmb();
/*
+ * During GUP-fast we might not get called on the head page for a
+ * hugetlb page that is mapped using cont-PTE, because GUP-fast does
+ * not work with the abstracted hugetlb PTEs that always point at the
+ * head page. For hugetlb, PageAnonExclusive only applies on the head
+ * page (as it cannot be partially COW-shared), so lookup the head page.
+ */
+ if (unlikely(!PageHead(page) && PageHuge(page)))
+ page = compound_head(page);
+
+ /*
* Note that PageKsm() pages cannot be exclusive, and consequently,
* cannot get pinned.
*/
_
Patches currently in -mm which might be from david(a)redhat.com are
kvm-explicitly-set-foll_honor_numa_fault-in-hva_to_pfn_slow.patch
mm-gup-dont-implicitly-set-foll_honor_numa_fault.patch
pgtable-improve-pte_protnone-comment.patch
selftest-mm-ksm_functional_tests-test-in-mmap_and_merge_range-if-anything-got-merged.patch
selftest-mm-ksm_functional_tests-add-prot_none-test.patch
selftest-mm-ksm_functional_tests-add-prot_none-test-fix.patch
mm-swap-stop-using-page-private-on-tail-pages-for-thp_swap.patch
mm-swap-inline-folio_set_swap_entry-and-folio_swap_entry.patch
mm-huge_memory-work-on-folio-swap-instead-of-page-private-when-splitting-folio.patch
The quilt patch titled
Subject: mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
has been removed from the -mm tree. Its filename was
mm-gup-reintroduce-foll_numa-as-foll_honor_numa_fault.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
Date: Thu, 3 Aug 2023 16:32:02 +0200
Unfortunately commit 474098edac26 ("mm/gup: replace FOLL_NUMA by
gup_can_follow_protnone()") missed that follow_page() and
follow_trans_huge_pmd() never implicitly set FOLL_NUMA because they really
don't want to fail on PROT_NONE-mapped pages -- either due to NUMA hinting
or due to inaccessible (PROT_NONE) VMAs.
As spelled out in commit 0b9d705297b2 ("mm: numa: Support NUMA hinting
page faults from gup/gup_fast"): "Other follow_page callers like KSM
should not use FOLL_NUMA, or they would fail to get the pages if they use
follow_page instead of get_user_pages."
liubo reported [1] that smaps_rollup results are imprecise, because they
miss accounting of pages that are mapped PROT_NONE. Further, it's easy to
reproduce that KSM no longer works on inaccessible VMAs on x86-64, because
pte_protnone()/pmd_protnone() also indicate "true" in inaccessible VMAs,
and follow_page() refuses to return such pages right now.
As KVM really depends on these NUMA hinting faults, removing the
pte_protnone()/pmd_protnone() handling in GUP code completely is not
really an option.
To fix the issues at hand, let's revive FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
to restore the original behavior for now and add better comments.
Set FOLL_HONOR_NUMA_FAULT independent of FOLL_FORCE in
is_valid_gup_args(), to add that flag for all external GUP users.
Note that there are three GUP-internal __get_user_pages() users that don't
end up calling is_valid_gup_args() and consequently won't get
FOLL_HONOR_NUMA_FAULT set.
1) get_dump_page(): we really don't want to handle NUMA hinting
faults. It specifies FOLL_FORCE and wouldn't have honored NUMA
hinting faults already.
2) populate_vma_page_range(): we really don't want to handle NUMA hinting
faults. It specifies FOLL_FORCE on accessible VMAs, so it wouldn't have
honored NUMA hinting faults already.
3) faultin_vma_page_range(): we similarly don't want to handle NUMA
hinting faults.
To make the combination of FOLL_FORCE and FOLL_HONOR_NUMA_FAULT work in
inaccessible VMAs properly, we have to perform VMA accessibility checks in
gup_can_follow_protnone().
As GUP-fast should reject such pages either way in
pte_access_permitted()/pmd_access_permitted() -- for example on x86-64 and
arm64 that both implement pte_protnone() -- let's just always fallback to
ordinary GUP when stumbling over pte_protnone()/pmd_protnone().
As Linus notes [2], honoring NUMA faults might only make sense for
selected GUP users.
So we should really see if we can instead let relevant GUP callers specify
it manually, and not trigger NUMA hinting faults from GUP as default.
Prepare for that by making FOLL_HONOR_NUMA_FAULT an external GUP flag and
adding appropriate documentation.
While at it, remove a stale comment from follow_trans_huge_pmd(): That
comment for pmd_protnone() was added in commit 2b4847e73004 ("mm: numa:
serialise parallel get_user_page against THP migration"), which noted:
THP does not unmap pages due to a lack of support for migration
entries at a PMD level. This allows races with get_user_pages
Nowadays, we do have PMD migration entries, so the comment no longer
applies. Let's drop it.
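A kernel-side sketch of a hypothetical external GUP caller affected by the new default (assuming the 6.5-era pin_user_pages() signature):

#include <linux/mm.h>

/*
 * External GUP entry points funnel through is_valid_gup_args(), which
 * now ORs in FOLL_HONOR_NUMA_FAULT: a pte_protnone() NUMA hint on the
 * way is resolved by a hinting fault, just as a memory reference from
 * task context would do.  follow_page(), by contrast, never sets it.
 */
static long pin_one_page(unsigned long uaddr, struct page **page)
{
	return pin_user_pages(uaddr, 1, FOLL_WRITE | FOLL_LONGTERM, page);
}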
[1] https://lore.kernel.org/r/20230726073409.631838-1-liubo254@huawei.com
[2] https://lore.kernel.org/r/CAHk-=wgRiP_9X0rRdZKT8nhemZGNateMtb366t37d8-x7VRs…
Link: https://lkml.kernel.org/r/20230803143208.383663-2-david@redhat.com
Fixes: 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: liubo <liubo254(a)huawei.com>
Closes: https://lore.kernel.org/r/20230726073409.631838-1-liubo254@huawei.com
Reported-by: Peter Xu <peterx(a)redhat.com>
Closes: https://lore.kernel.org/all/ZMKJjDaqZ7FW0jfe@x1n/
Acked-by: Mel Gorman <mgorman(a)techsingularity.net>
Acked-by: Peter Xu <peterx(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm.h | 21 +++++++++++++++------
include/linux/mm_types.h | 9 +++++++++
mm/gup.c | 30 ++++++++++++++++++++++++------
mm/huge_memory.c | 3 +--
4 files changed, 49 insertions(+), 14 deletions(-)
--- a/include/linux/mm.h~mm-gup-reintroduce-foll_numa-as-foll_honor_numa_fault
+++ a/include/linux/mm.h
@@ -3421,15 +3421,24 @@ static inline int vm_fault_to_errno(vm_f
* Indicates whether GUP can follow a PROT_NONE mapped page, or whether
* a (NUMA hinting) fault is required.
*/
-static inline bool gup_can_follow_protnone(unsigned int flags)
+static inline bool gup_can_follow_protnone(struct vm_area_struct *vma,
+ unsigned int flags)
{
/*
- * FOLL_FORCE has to be able to make progress even if the VMA is
- * inaccessible. Further, FOLL_FORCE access usually does not represent
- * application behaviour and we should avoid triggering NUMA hinting
- * faults.
+ * If callers don't want to honor NUMA hinting faults, no need to
+ * determine if we would actually have to trigger a NUMA hinting fault.
*/
- return flags & FOLL_FORCE;
+ if (!(flags & FOLL_HONOR_NUMA_FAULT))
+ return true;
+
+ /*
+ * NUMA hinting faults don't apply in inaccessible (PROT_NONE) VMAs.
+ *
+ * Requiring a fault here even for inaccessible VMAs would mean that
+ * FOLL_FORCE cannot make any progress, because handle_mm_fault()
+ * refuses to process NUMA hinting faults in inaccessible VMAs.
+ */
+ return !vma_is_accessible(vma);
}
typedef int (*pte_fn_t)(pte_t *pte, unsigned long addr, void *data);
--- a/include/linux/mm_types.h~mm-gup-reintroduce-foll_numa-as-foll_honor_numa_fault
+++ a/include/linux/mm_types.h
@@ -1286,6 +1286,15 @@ enum {
FOLL_PCI_P2PDMA = 1 << 10,
/* allow interrupts from generic signals */
FOLL_INTERRUPTIBLE = 1 << 11,
+ /*
+ * Always honor (trigger) NUMA hinting faults.
+ *
+ * FOLL_WRITE implicitly honors NUMA hinting faults because a
+ * PROT_NONE-mapped page is not writable (exceptions with FOLL_FORCE
+ * apply). get_user_pages_fast_only() always implicitly honors NUMA
+ * hinting faults.
+ */
+ FOLL_HONOR_NUMA_FAULT = 1 << 12,
/* See also internal only FOLL flags in mm/internal.h */
};
--- a/mm/gup.c~mm-gup-reintroduce-foll_numa-as-foll_honor_numa_fault
+++ a/mm/gup.c
@@ -597,7 +597,7 @@ static struct page *follow_page_pte(stru
pte = ptep_get(ptep);
if (!pte_present(pte))
goto no_page;
- if (pte_protnone(pte) && !gup_can_follow_protnone(flags))
+ if (pte_protnone(pte) && !gup_can_follow_protnone(vma, flags))
goto no_page;
page = vm_normal_page(vma, address, pte);
@@ -714,7 +714,7 @@ static struct page *follow_pmd_mask(stru
if (likely(!pmd_trans_huge(pmdval)))
return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
- if (pmd_protnone(pmdval) && !gup_can_follow_protnone(flags))
+ if (pmd_protnone(pmdval) && !gup_can_follow_protnone(vma, flags))
return no_page_table(vma, flags);
ptl = pmd_lock(mm, pmd);
@@ -851,6 +851,10 @@ struct page *follow_page(struct vm_area_
if (WARN_ON_ONCE(foll_flags & FOLL_PIN))
return NULL;
+ /*
+ * We never set FOLL_HONOR_NUMA_FAULT because callers don't expect
+ * to fail on PROT_NONE-mapped pages.
+ */
page = follow_page_mask(vma, address, foll_flags, &ctx);
if (ctx.pgmap)
put_dev_pagemap(ctx.pgmap);
@@ -2227,6 +2231,13 @@ static bool is_valid_gup_args(struct pag
gup_flags |= FOLL_UNLOCKABLE;
}
+ /*
+ * For now, always trigger NUMA hinting faults. Some GUP users like
+ * KVM require the hint to be as the calling context of GUP is
+ * functionally similar to a memory reference from task context.
+ */
+ gup_flags |= FOLL_HONOR_NUMA_FAULT;
+
/* FOLL_GET and FOLL_PIN are mutually exclusive. */
if (WARN_ON_ONCE((gup_flags & (FOLL_PIN | FOLL_GET)) ==
(FOLL_PIN | FOLL_GET)))
@@ -2551,7 +2562,14 @@ static int gup_pte_range(pmd_t pmd, pmd_
struct page *page;
struct folio *folio;
- if (pte_protnone(pte) && !gup_can_follow_protnone(flags))
+ /*
+ * Always fallback to ordinary GUP on PROT_NONE-mapped pages:
+ * pte_access_permitted() better should reject these pages
+ * either way: otherwise, GUP-fast might succeed in
+ * cases where ordinary GUP would fail due to VMA access
+ * permissions.
+ */
+ if (pte_protnone(pte))
goto pte_unmap;
if (!pte_access_permitted(pte, flags & FOLL_WRITE))
@@ -2970,8 +2988,8 @@ static int gup_pmd_range(pud_t *pudp, pu
if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd) ||
pmd_devmap(pmd))) {
- if (pmd_protnone(pmd) &&
- !gup_can_follow_protnone(flags))
+ /* See gup_pte_range() */
+ if (pmd_protnone(pmd))
return 0;
if (!gup_huge_pmd(pmd, pmdp, addr, next, flags,
@@ -3151,7 +3169,7 @@ static int internal_get_user_pages_fast(
if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
FOLL_FORCE | FOLL_PIN | FOLL_GET |
FOLL_FAST_ONLY | FOLL_NOFAULT |
- FOLL_PCI_P2PDMA)))
+ FOLL_PCI_P2PDMA | FOLL_HONOR_NUMA_FAULT)))
return -EINVAL;
if (gup_flags & FOLL_PIN)
--- a/mm/huge_memory.c~mm-gup-reintroduce-foll_numa-as-foll_honor_numa_fault
+++ a/mm/huge_memory.c
@@ -1467,8 +1467,7 @@ struct page *follow_trans_huge_pmd(struc
if ((flags & FOLL_DUMP) && is_huge_zero_pmd(*pmd))
return ERR_PTR(-EFAULT);
- /* Full NUMA hinting faults to serialise migration in fault paths */
- if (pmd_protnone(*pmd) && !gup_can_follow_protnone(flags))
+ if (pmd_protnone(*pmd) && !gup_can_follow_protnone(vma, flags))
return NULL;
if (!pmd_write(*pmd) && gup_must_unshare(vma, flags, page))
_
Patches currently in -mm which might be from david(a)redhat.com are
kvm-explicitly-set-foll_honor_numa_fault-in-hva_to_pfn_slow.patch
mm-gup-dont-implicitly-set-foll_honor_numa_fault.patch
pgtable-improve-pte_protnone-comment.patch
selftest-mm-ksm_functional_tests-test-in-mmap_and_merge_range-if-anything-got-merged.patch
selftest-mm-ksm_functional_tests-add-prot_none-test.patch
selftest-mm-ksm_functional_tests-add-prot_none-test-fix.patch
mm-swap-stop-using-page-private-on-tail-pages-for-thp_swap.patch
mm-swap-inline-folio_set_swap_entry-and-folio_swap_entry.patch
mm-huge_memory-work-on-folio-swap-instead-of-page-private-when-splitting-folio.patch