- Linux-stable-mirror - lists.linaro.org

[merged mm-hotfixes-stable] maple_tree-fix-get-wrong-data_end-in-mtree_lookup_walk.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: maple_tree: fix get wrong data_end in mtree_lookup_walk() has been removed from the -mm tree. Its filename was maple_tree-fix-get-wrong-data_end-in-mtree_lookup_walk.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Peng Zhang <zhangpeng.00(a)bytedance.com> Subject: maple_tree: fix get wrong data_end in mtree_lookup_walk() Date: Tue, 14 Mar 2023 20:42:01 +0800 if (likely(offset > end)) max = pivots[offset]; The above code should be changed to if (likely(offset < end)), which is correct. This affects the correctness of ma_data_end(). Now it seems that the final result will not be wrong, but it is best to change it. This patch does not change the code as above, because it simplifies the code by the way. Link: https://lkml.kernel.org/r/20230314124203.91572-1-zhangpeng.00@bytedance.com Link: https://lkml.kernel.org/r/20230314124203.91572-2-zhangpeng.00@bytedance.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Peng Zhang <zhangpeng.00(a)bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- lib/maple_tree.c | 15 +++++---------- 1 file changed, 5 insertions(+), 10 deletions(-) --- a/lib/maple_tree.c~maple_tree-fix-get-wrong-data_end-in-mtree_lookup_walk +++ a/lib/maple_tree.c @@ -3941,18 +3941,13 @@ static inline void *mtree_lookup_walk(st end = ma_data_end(node, type, pivots, max); if (unlikely(ma_dead_node(node))) goto dead_node; - - if (pivots[offset] >= mas->index) - goto next; - do { - offset++; - } while ((offset < end) && (pivots[offset] < mas->index)); - - if (likely(offset > end)) - max = pivots[offset]; + if (pivots[offset] >= mas->index) { + max = pivots[offset]; + break; + } + } while (++offset < end); -next: slots = ma_slots(node, type); next = mt_slot(mas->tree, slots, offset); if (unlikely(ma_dead_node(node))) _ Patches currently in -mm which might be from zhangpeng.00(a)bytedance.com are mm-kfence-improve-the-performance-of-__kfence_alloc-and-__kfence_free.patch maple_tree-simplify-mas_wr_node_walk.patch

2 years, 7 months

1
0
0 0

[merged mm-hotfixes-stable] mm-swap-fix-swap_info_struct-race-between-swapoff-and-get_swap_pages.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm/swap: fix swap_info_struct race between swapoff and get_swap_pages() has been removed from the -mm tree. Its filename was mm-swap-fix-swap_info_struct-race-between-swapoff-and-get_swap_pages.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Rongwei Wang <rongwei.wang(a)linux.alibaba.com> Subject: mm/swap: fix swap_info_struct race between swapoff and get_swap_pages() Date: Tue, 4 Apr 2023 23:47:16 +0800 The si->lock must be held when deleting the si from the available list. Otherwise, another thread can re-add the si to the available list, which can lead to memory corruption. The only place we have found where this happens is in the swapoff path. This case can be described as below: core 0 core 1 swapoff del_from_avail_list(si) waiting try lock si->lock acquire swap_avail_lock and re-add si into swap_avail_head acquire si->lock but missing si already being added again, and continuing to clear SWP_WRITEOK, etc. It can be easily found that a massive warning messages can be triggered inside get_swap_pages() by some special cases, for example, we call madvise(MADV_PAGEOUT) on blocks of touched memory concurrently, meanwhile, run much swapon-swapoff operations (e.g. stress-ng-swap). However, in the worst case, panic can be caused by the above scene. In swapoff(), the memory used by si could be kept in swap_info[] after turning off a swap. This means memory corruption will not be caused immediately until allocated and reset for a new swap in the swapon path. A panic message caused: (with CONFIG_PLIST_DEBUG enabled) ------------[ cut here ]------------ top: 00000000e58a3003, n: 0000000013e75cda, p: 000000008cd4451a prev: 0000000035b1e58a, n: 000000008cd4451a, p: 000000002150ee8d next: 000000008cd4451a, n: 000000008cd4451a, p: 000000008cd4451a WARNING: CPU: 21 PID: 1843 at lib/plist.c:60 plist_check_prev_next_node+0x50/0x70 Modules linked in: rfkill(E) crct10dif_ce(E)... CPU: 21 PID: 1843 Comm: stress-ng Kdump: ... 5.10.134+ Hardware name: Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015 pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--) pc : plist_check_prev_next_node+0x50/0x70 lr : plist_check_prev_next_node+0x50/0x70 sp : ffff0018009d3c30 x29: ffff0018009d3c40 x28: ffff800011b32a98 x27: 0000000000000000 x26: ffff001803908000 x25: ffff8000128ea088 x24: ffff800011b32a48 x23: 0000000000000028 x22: ffff001800875c00 x21: ffff800010f9e520 x20: ffff001800875c00 x19: ffff001800fdc6e0 x18: 0000000000000030 x17: 0000000000000000 x16: 0000000000000000 x15: 0736076307640766 x14: 0730073007380731 x13: 0736076307640766 x12: 0730073007380731 x11: 000000000004058d x10: 0000000085a85b76 x9 : ffff8000101436e4 x8 : ffff800011c8ce08 x7 : 0000000000000000 x6 : 0000000000000001 x5 : ffff0017df9ed338 x4 : 0000000000000001 x3 : ffff8017ce62a000 x2 : ffff0017df9ed340 x1 : 0000000000000000 x0 : 0000000000000000 Call trace: plist_check_prev_next_node+0x50/0x70 plist_check_head+0x80/0xf0 plist_add+0x28/0x140 add_to_avail_list+0x9c/0xf0 _enable_swap_info+0x78/0xb4 __do_sys_swapon+0x918/0xa10 __arm64_sys_swapon+0x20/0x30 el0_svc_common+0x8c/0x220 do_el0_svc+0x2c/0x90 el0_svc+0x1c/0x30 el0_sync_handler+0xa8/0xb0 el0_sync+0x148/0x180 irq event stamp: 2082270 Now, si->lock locked before calling 'del_from_avail_list()' to make sure other thread see the si had been deleted and SWP_WRITEOK cleared together, will not reinsert again. This problem exists in versions after stable 5.10.y. Link: https://lkml.kernel.org/r/20230404154716.23058-1-rongwei.wang@linux.alibaba… Fixes: a2468cc9bfdff ("swap: choose swap device according to numa node") Tested-by: Yongchen Yin <wb-yyc939293(a)alibaba-inc.com> Signed-off-by: Rongwei Wang <rongwei.wang(a)linux.alibaba.com> Cc: Bagas Sanjaya <bagasdotme(a)gmail.com> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Aaron Lu <aaron.lu(a)intel.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/swapfile.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) --- a/mm/swapfile.c~mm-swap-fix-swap_info_struct-race-between-swapoff-and-get_swap_pages +++ a/mm/swapfile.c @@ -679,6 +679,7 @@ static void __del_from_avail_list(struct { int nid; + assert_spin_locked(&p->lock); for_each_node(nid) plist_del(&p->avail_lists[nid], &swap_avail_heads[nid]); } @@ -2434,8 +2435,8 @@ SYSCALL_DEFINE1(swapoff, const char __us spin_unlock(&swap_lock); goto out_dput; } - del_from_avail_list(p); spin_lock(&p->lock); + del_from_avail_list(p); if (p->prio < 0) { struct swap_info_struct *si = p; int nid; _ Patches currently in -mm which might be from rongwei.wang(a)linux.alibaba.com are

2 years, 7 months

1
0
0 0

[merged mm-hotfixes-stable] nilfs2-fix-sysfs-interface-lifetime.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: nilfs2: fix sysfs interface lifetime has been removed from the -mm tree. Its filename was nilfs2-fix-sysfs-interface-lifetime.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Subject: nilfs2: fix sysfs interface lifetime Date: Fri, 31 Mar 2023 05:55:15 +0900 The current nilfs2 sysfs support has issues with the timing of creation and deletion of sysfs entries, potentially leading to null pointer dereferences, use-after-free, and lockdep warnings. Some of the sysfs attributes for nilfs2 per-filesystem instance refer to metadata file "cpfile", "sufile", or "dat", but nilfs_sysfs_create_device_group that creates those attributes is executed before the inodes for these metadata files are loaded, and nilfs_sysfs_delete_device_group which deletes these sysfs entries is called after releasing their metadata file inodes. Therefore, access to some of these sysfs attributes may occur outside of the lifetime of these metadata files, resulting in inode NULL pointer dereferences or use-after-free. In addition, the call to nilfs_sysfs_create_device_group() is made during the locking period of the semaphore "ns_sem" of nilfs object, so the shrinker call caused by the memory allocation for the sysfs entries, may derive lock dependencies "ns_sem" -> (shrinker) -> "locks acquired in nilfs_evict_inode()". Since nilfs2 may acquire "ns_sem" deep in the call stack holding other locks via its error handler __nilfs_error(), this causes lockdep to report circular locking. This is a false positive and no circular locking actually occurs as no inodes exist yet when nilfs_sysfs_create_device_group() is called. Fortunately, the lockdep warnings can be resolved by simply moving the call to nilfs_sysfs_create_device_group() out of "ns_sem". This fixes these sysfs issues by revising where the device's sysfs interface is created/deleted and keeping its lifetime within the lifetime of the metadata files above. Link: https://lkml.kernel.org/r/20230330205515.6167-1-konishi.ryusuke@gmail.com Fixes: dd70edbde262 ("nilfs2: integrate sysfs support into driver") Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Reported-by: syzbot+979fa7f9c0d086fdc282(a)syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/0000000000003414b505f7885f7e@google.com Reported-by: syzbot+5b7d542076d9bddc3c6a(a)syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/0000000000006ac86605f5f44eb9@google.com Cc: Viacheslav Dubeyko <slava(a)dubeyko.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- fs/nilfs2/super.c | 2 ++ fs/nilfs2/the_nilfs.c | 12 +++++++----- 2 files changed, 9 insertions(+), 5 deletions(-) --- a/fs/nilfs2/super.c~nilfs2-fix-sysfs-interface-lifetime +++ a/fs/nilfs2/super.c @@ -482,6 +482,7 @@ static void nilfs_put_super(struct super up_write(&nilfs->ns_sem); } + nilfs_sysfs_delete_device_group(nilfs); iput(nilfs->ns_sufile); iput(nilfs->ns_cpfile); iput(nilfs->ns_dat); @@ -1105,6 +1106,7 @@ nilfs_fill_super(struct super_block *sb, nilfs_put_root(fsroot); failed_unload: + nilfs_sysfs_delete_device_group(nilfs); iput(nilfs->ns_sufile); iput(nilfs->ns_cpfile); iput(nilfs->ns_dat); --- a/fs/nilfs2/the_nilfs.c~nilfs2-fix-sysfs-interface-lifetime +++ a/fs/nilfs2/the_nilfs.c @@ -87,7 +87,6 @@ void destroy_nilfs(struct the_nilfs *nil { might_sleep(); if (nilfs_init(nilfs)) { - nilfs_sysfs_delete_device_group(nilfs); brelse(nilfs->ns_sbh[0]); brelse(nilfs->ns_sbh[1]); } @@ -305,6 +304,10 @@ int load_nilfs(struct the_nilfs *nilfs, goto failed; } + err = nilfs_sysfs_create_device_group(sb); + if (unlikely(err)) + goto sysfs_error; + if (valid_fs) goto skip_recovery; @@ -366,6 +369,9 @@ int load_nilfs(struct the_nilfs *nilfs, goto failed; failed_unload: + nilfs_sysfs_delete_device_group(nilfs); + + sysfs_error: iput(nilfs->ns_cpfile); iput(nilfs->ns_sufile); iput(nilfs->ns_dat); @@ -697,10 +703,6 @@ int init_nilfs(struct the_nilfs *nilfs, if (err) goto failed_sbh; - err = nilfs_sysfs_create_device_group(sb); - if (err) - goto failed_sbh; - set_nilfs_init(nilfs); err = 0; out: _ Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are

2 years, 7 months

1
0
0 0

[merged mm-hotfixes-stable] mm-take-a-page-reference-when-removing-device-exclusive-entries.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm: take a page reference when removing device exclusive entries has been removed from the -mm tree. Its filename was mm-take-a-page-reference-when-removing-device-exclusive-entries.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Alistair Popple <apopple(a)nvidia.com> Subject: mm: take a page reference when removing device exclusive entries Date: Thu, 30 Mar 2023 12:25:19 +1100 Device exclusive page table entries are used to prevent CPU access to a page whilst it is being accessed from a device. Typically this is used to implement atomic operations when the underlying bus does not support atomic access. When a CPU thread encounters a device exclusive entry it locks the page and restores the original entry after calling mmu notifiers to signal drivers that exclusive access is no longer available. The device exclusive entry holds a reference to the page making it safe to access the struct page whilst the entry is present. However the fault handling code does not hold the PTL when taking the page lock. This means if there are multiple threads faulting concurrently on the device exclusive entry one will remove the entry whilst others will wait on the page lock without holding a reference. This can lead to threads locking or waiting on a folio with a zero refcount. Whilst mmap_lock prevents the pages getting freed via munmap() they may still be freed by a migration. This leads to warnings such as PAGE_FLAGS_CHECK_AT_FREE due to the page being locked when the refcount drops to zero. Fix this by trying to take a reference on the folio before locking it. The code already checks the PTE under the PTL and aborts if the entry is no longer there. It is also possible the folio has been unmapped, freed and re-allocated allowing a reference to be taken on an unrelated folio. This case is also detected by the PTE check and the folio is unlocked without further changes. Link: https://lkml.kernel.org/r/20230330012519.804116-1-apopple@nvidia.com Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Signed-off-by: Alistair Popple <apopple(a)nvidia.com> Reviewed-by: Ralph Campbell <rcampbell(a)nvidia.com> Reviewed-by: John Hubbard <jhubbard(a)nvidia.com> Acked-by: David Hildenbrand <david(a)redhat.com> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Christoph Hellwig <hch(a)infradead.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/memory.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) --- a/mm/memory.c~mm-take-a-page-reference-when-removing-device-exclusive-entries +++ a/mm/memory.c @@ -3563,8 +3563,21 @@ static vm_fault_t remove_device_exclusiv struct vm_area_struct *vma = vmf->vma; struct mmu_notifier_range range; - if (!folio_lock_or_retry(folio, vma->vm_mm, vmf->flags)) + /* + * We need a reference to lock the folio because we don't hold + * the PTL so a racing thread can remove the device-exclusive + * entry and unmap it. If the folio is free the entry must + * have been removed already. If it happens to have already + * been re-allocated after being freed all we do is lock and + * unlock it. + */ + if (!folio_try_get(folio)) + return 0; + + if (!folio_lock_or_retry(folio, vma->vm_mm, vmf->flags)) { + folio_put(folio); return VM_FAULT_RETRY; + } mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0, vma->vm_mm, vmf->address & PAGE_MASK, (vmf->address & PAGE_MASK) + PAGE_SIZE, NULL); @@ -3577,6 +3590,7 @@ static vm_fault_t remove_device_exclusiv pte_unmap_unlock(vmf->pte, vmf->ptl); folio_unlock(folio); + folio_put(folio); mmu_notifier_invalidate_range_end(&range); return 0; _ Patches currently in -mm which might be from apopple(a)nvidia.com are

2 years, 7 months

1
0
0 0

[merged mm-hotfixes-stable] mm-vmalloc-avoid-warn_alloc-noise-caused-by-fatal-signal.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm: vmalloc: avoid warn_alloc noise caused by fatal signal has been removed from the -mm tree. Its filename was mm-vmalloc-avoid-warn_alloc-noise-caused-by-fatal-signal.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Yafang Shao <laoar.shao(a)gmail.com> Subject: mm: vmalloc: avoid warn_alloc noise caused by fatal signal Date: Thu, 30 Mar 2023 16:26:25 +0000 There're some suspicious warn_alloc on my test serer, for example, [13366.518837] warn_alloc: 81 callbacks suppressed [13366.518841] test_verifier: vmalloc error: size 4096, page order 0, failed to allocate pages, mode:0x500dc2(GFP_HIGHUSER|__GFP_ZERO|__GFP_ACCOUNT), nodemask=(null),cpuset=/,mems_allowed=0-1 [13366.522240] CPU: 30 PID: 722463 Comm: test_verifier Kdump: loaded Tainted: G W O 6.2.0+ #638 [13366.524216] Call Trace: [13366.524702] <TASK> [13366.525148] dump_stack_lvl+0x6c/0x80 [13366.525712] dump_stack+0x10/0x20 [13366.526239] warn_alloc+0x119/0x190 [13366.526783] ? alloc_pages_bulk_array_mempolicy+0x9e/0x2a0 [13366.527470] __vmalloc_area_node+0x546/0x5b0 [13366.528066] __vmalloc_node_range+0xc2/0x210 [13366.528660] __vmalloc_node+0x42/0x50 [13366.529186] ? bpf_prog_realloc+0x53/0xc0 [13366.529743] __vmalloc+0x1e/0x30 [13366.530235] bpf_prog_realloc+0x53/0xc0 [13366.530771] bpf_patch_insn_single+0x80/0x1b0 [13366.531351] bpf_jit_blind_constants+0xe9/0x1c0 [13366.531932] ? __free_pages+0xee/0x100 [13366.532457] ? free_large_kmalloc+0x58/0xb0 [13366.533002] bpf_int_jit_compile+0x8c/0x5e0 [13366.533546] bpf_prog_select_runtime+0xb4/0x100 [13366.534108] bpf_prog_load+0x6b1/0xa50 [13366.534610] ? perf_event_task_tick+0x96/0xb0 [13366.535151] ? security_capable+0x3a/0x60 [13366.535663] __sys_bpf+0xb38/0x2190 [13366.536120] ? kvm_clock_get_cycles+0x9/0x10 [13366.536643] __x64_sys_bpf+0x1c/0x30 [13366.537094] do_syscall_64+0x38/0x90 [13366.537554] entry_SYSCALL_64_after_hwframe+0x72/0xdc [13366.538107] RIP: 0033:0x7f78310f8e29 [13366.538561] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 17 e0 2c 00 f7 d8 64 89 01 48 [13366.540286] RSP: 002b:00007ffe2a61fff8 EFLAGS: 00000206 ORIG_RAX: 0000000000000141 [13366.541031] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f78310f8e29 [13366.541749] RDX: 0000000000000080 RSI: 00007ffe2a6200b0 RDI: 0000000000000005 [13366.542470] RBP: 00007ffe2a620010 R08: 00007ffe2a6202a0 R09: 00007ffe2a6200b0 [13366.543183] R10: 00000000000f423e R11: 0000000000000206 R12: 0000000000407800 [13366.543900] R13: 00007ffe2a620540 R14: 0000000000000000 R15: 0000000000000000 [13366.544623] </TASK> [13366.545260] Mem-Info: [13366.546121] active_anon:81319 inactive_anon:20733 isolated_anon:0 active_file:69450 inactive_file:5624 isolated_file:0 unevictable:0 dirty:10 writeback:0 slab_reclaimable:69649 slab_unreclaimable:48930 mapped:27400 shmem:12868 pagetables:4929 sec_pagetables:0 bounce:0 kernel_misc_reclaimable:0 free:15870308 free_pcp:142935 free_cma:0 [13366.551886] Node 0 active_anon:224836kB inactive_anon:33528kB active_file:175692kB inactive_file:13752kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:59248kB dirty:32kB writeback:0kB shmem:18252kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:4616kB pagetables:10664kB sec_pagetables:0kB all_unreclaimable? no [13366.555184] Node 1 active_anon:100440kB inactive_anon:49404kB active_file:102108kB inactive_file:8744kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:50352kB dirty:8kB writeback:0kB shmem:33220kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:3896kB pagetables:9052kB sec_pagetables:0kB all_unreclaimable? no [13366.558262] Node 0 DMA free:15360kB boost:0kB min:304kB low:380kB high:456kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB [13366.560821] lowmem_reserve[]: 0 2735 31873 31873 31873 [13366.561981] Node 0 DMA32 free:2790904kB boost:0kB min:56028kB low:70032kB high:84036kB reserved_highatomic:0KB active_anon:1936kB inactive_anon:20kB active_file:396kB inactive_file:344kB unevictable:0kB writepending:0kB present:3129200kB managed:2801520kB mlocked:0kB bounce:0kB free_pcp:5188kB local_pcp:0kB free_cma:0kB [13366.565148] lowmem_reserve[]: 0 0 29137 29137 29137 [13366.566168] Node 0 Normal free:28533824kB boost:0kB min:596740kB low:745924kB high:895108kB reserved_highatomic:28672KB active_anon:222900kB inactive_anon:33508kB active_file:175296kB inactive_file:13408kB unevictable:0kB writepending:32kB present:30408704kB managed:29837172kB mlocked:0kB bounce:0kB free_pcp:295724kB local_pcp:0kB free_cma:0kB [13366.569485] lowmem_reserve[]: 0 0 0 0 0 [13366.570416] Node 1 Normal free:32141144kB boost:0kB min:660504kB low:825628kB high:990752kB reserved_highatomic:69632KB active_anon:100440kB inactive_anon:49404kB active_file:102108kB inactive_file:8744kB unevictable:0kB writepending:8kB present:33554432kB managed:33025372kB mlocked:0kB bounce:0kB free_pcp:270880kB local_pcp:46860kB free_cma:0kB [13366.573403] lowmem_reserve[]: 0 0 0 0 0 [13366.574015] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15360kB [13366.575474] Node 0 DMA32: 782*4kB (UME) 756*8kB (UME) 736*16kB (UME) 745*32kB (UME) 694*64kB (UME) 653*128kB (UME) 595*256kB (UME) 552*512kB (UME) 454*1024kB (UME) 347*2048kB (UME) 246*4096kB (UME) = 2790904kB [13366.577442] Node 0 Normal: 33856*4kB (UMEH) 51815*8kB (UMEH) 42418*16kB (UMEH) 36272*32kB (UMEH) 22195*64kB (UMEH) 10296*128kB (UMEH) 7238*256kB (UMEH) 5638*512kB (UEH) 5337*1024kB (UMEH) 3506*2048kB (UMEH) 1470*4096kB (UME) = 28533784kB [13366.580460] Node 1 Normal: 15776*4kB (UMEH) 37485*8kB (UMEH) 29509*16kB (UMEH) 21420*32kB (UMEH) 14818*64kB (UMEH) 13051*128kB (UMEH) 9918*256kB (UMEH) 7374*512kB (UMEH) 5397*1024kB (UMEH) 3887*2048kB (UMEH) 2002*4096kB (UME) = 32141240kB [13366.583027] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [13366.584380] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [13366.585702] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [13366.587042] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [13366.588372] 87386 total pagecache pages [13366.589266] 0 pages in swap cache [13366.590327] Free swap = 0kB [13366.591227] Total swap = 0kB [13366.592142] 16777082 pages RAM [13366.593057] 0 pages HighMem/MovableOnly [13366.594037] 357226 pages reserved [13366.594979] 0 pages hwpoisoned This failure really confuse me as there're still lots of available pages. Finally I figured out it was caused by a fatal signal. When a process is allocating memory via vm_area_alloc_pages(), it will break directly even if it hasn't allocated the requested pages when it receives a fatal signal. In that case, we shouldn't show this warn_alloc, as it is useless. We only need to show this warning when there're really no enough pages. Link: https://lkml.kernel.org/r/20230330162625.13604-1-laoar.shao@gmail.com Signed-off-by: Yafang Shao <laoar.shao(a)gmail.com> Reviewed-by: Lorenzo Stoakes <lstoakes(a)gmail.com> Cc: Christoph Hellwig <hch(a)infradead.org> Cc: Uladzislau Rezki (Sony) <urezki(a)gmail.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/vmalloc.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) --- a/mm/vmalloc.c~mm-vmalloc-avoid-warn_alloc-noise-caused-by-fatal-signal +++ a/mm/vmalloc.c @@ -3042,9 +3042,11 @@ static void *__vmalloc_area_node(struct * allocation request, free them via vfree() if any. */ if (area->nr_pages != nr_small_pages) { - warn_alloc(gfp_mask, NULL, - "vmalloc error: size %lu, page order %u, failed to allocate pages", - area->nr_pages * PAGE_SIZE, page_order); + /* vm_area_alloc_pages() can also fail due to a fatal signal */ + if (!fatal_signal_pending(current)) + warn_alloc(gfp_mask, NULL, + "vmalloc error: size %lu, page order %u, failed to allocate pages", + area->nr_pages * PAGE_SIZE, page_order); goto fail; } _ Patches currently in -mm which might be from laoar.shao(a)gmail.com are

2 years, 7 months

1
0
0 0

[merged mm-hotfixes-stable] nilfs2-fix-potential-uaf-of-struct-nilfs_sc_info-in-nilfs_segctor_thread.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread() has been removed from the -mm tree. Its filename was nilfs2-fix-potential-uaf-of-struct-nilfs_sc_info-in-nilfs_segctor_thread.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Subject: nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread() Date: Tue, 28 Mar 2023 02:53:18 +0900 The finalization of nilfs_segctor_thread() can race with nilfs_segctor_kill_thread() which terminates that thread, potentially causing a use-after-free BUG as KASAN detected. At the end of nilfs_segctor_thread(), it assigns NULL to "sc_task" member of "struct nilfs_sc_info" to indicate the thread has finished, and then notifies nilfs_segctor_kill_thread() of this using waitqueue "sc_wait_task" on the struct nilfs_sc_info. However, here, immediately after the NULL assignment to "sc_task", it is possible that nilfs_segctor_kill_thread() will detect it and return to continue the deallocation, freeing the nilfs_sc_info structure before the thread does the notification. This fixes the issue by protecting the NULL assignment to "sc_task" and its notification, with spinlock "sc_state_lock" of the struct nilfs_sc_info. Since nilfs_segctor_kill_thread() does a final check to see if "sc_task" is NULL with "sc_state_lock" locked, this can eliminate the race. Link: https://lkml.kernel.org/r/20230327175318.8060-1-konishi.ryusuke@gmail.com Reported-by: syzbot+b08ebcc22f8f3e6be43a(a)syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/00000000000000660d05f7dfa877@google.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- fs/nilfs2/segment.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) --- a/fs/nilfs2/segment.c~nilfs2-fix-potential-uaf-of-struct-nilfs_sc_info-in-nilfs_segctor_thread +++ a/fs/nilfs2/segment.c @@ -2609,11 +2609,10 @@ static int nilfs_segctor_thread(void *ar goto loop; end_thread: - spin_unlock(&sci->sc_state_lock); - /* end sync. */ sci->sc_task = NULL; wake_up(&sci->sc_wait_task); /* for nilfs_segctor_kill_thread() */ + spin_unlock(&sci->sc_state_lock); return 0; } _ Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are

2 years, 7 months

1
0
0 0

[merged mm-hotfixes-stable] zsmalloc-document-freeable-stats.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: zsmalloc: document freeable stats has been removed from the -mm tree. Its filename was zsmalloc-document-freeable-stats.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Sergey Senozhatsky <senozhatsky(a)chromium.org> Subject: zsmalloc: document freeable stats Date: Sat, 25 Mar 2023 11:46:31 +0900 When freeable class stat was added to classes file (back in 2016) we forgot to update zsmalloc documentation. Fix that. Link: https://lkml.kernel.org/r/20230325024631.2817153-3-senozhatsky@chromium.org Fixes: 1120ed548394 ("mm/zsmalloc: add `freeable' column to pool stat") Signed-off-by: Sergey Senozhatsky <senozhatsky(a)chromium.org> Cc: Minchan Kim <minchan(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- Documentation/mm/zsmalloc.rst | 2 ++ 1 file changed, 2 insertions(+) --- a/Documentation/mm/zsmalloc.rst~zsmalloc-document-freeable-stats +++ a/Documentation/mm/zsmalloc.rst @@ -83,6 +83,8 @@ pages_used the number of pages allocated for the class pages_per_zspage the number of 0-order pages to make a zspage +freeable + the approximate number of pages class compaction can free Each zspage maintains inuse counter which keeps track of the number of objects stored in the zspage. The inuse counter determines the zspage's _ Patches currently in -mm which might be from senozhatsky(a)chromium.org are

2 years, 7 months

1
0
0 0

[merged mm-hotfixes-stable] zsmalloc-document-new-fullness-grouping.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: zsmalloc: document new fullness grouping has been removed from the -mm tree. Its filename was zsmalloc-document-new-fullness-grouping.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Sergey Senozhatsky <senozhatsky(a)chromium.org> Subject: zsmalloc: document new fullness grouping Date: Sat, 25 Mar 2023 11:46:30 +0900 Patch series "zsmalloc: minor documentation updates". Two minor patches that bring zsmalloc documentation up to date. This patch (of 2): Update documentation and reflect new zspages fullness grouping (we don't use almost_empty and almost_full anymore). Link: https://lkml.kernel.org/r/20230325024631.2817153-1-senozhatsky@chromium.org Link: https://lkml.kernel.org/r/20230325024631.2817153-2-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky(a)chromium.org> Fixes: 67e157eb3639 ("zsmalloc: show per fullness group class stats") Cc: Minchan Kim <minchan(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- Documentation/mm/zsmalloc.rst | 133 +++++++++++++++++--------------- 1 file changed, 74 insertions(+), 59 deletions(-) --- a/Documentation/mm/zsmalloc.rst~zsmalloc-document-new-fullness-grouping +++ a/Documentation/mm/zsmalloc.rst @@ -39,13 +39,12 @@ With CONFIG_ZSMALLOC_STAT, we could see # cat /sys/kernel/debug/zsmalloc/zram0/classes - class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage + class size 10% 20% 30% 40% 50% 60% 70% 80% 90% 99% 100% obj_allocated obj_used pages_used pages_per_zspage freeable ... ... - 9 176 0 1 186 129 8 4 - 10 192 1 0 2880 2872 135 3 - 11 208 0 1 819 795 42 2 - 12 224 0 1 219 159 12 4 + 30 512 0 12 4 1 0 1 0 0 1 0 414 3464 3346 433 1 14 + 31 528 2 7 2 2 1 0 1 0 0 2 117 4154 3793 536 4 44 + 32 544 6 3 4 1 2 1 0 0 0 1 260 4170 3965 556 2 26 ... ... @@ -54,10 +53,28 @@ class index size object size zspage stores -almost_empty - the number of ZS_ALMOST_EMPTY zspages(see below) -almost_full - the number of ZS_ALMOST_FULL zspages(see below) +10% + the number of zspages with usage ratio less than 10% (see below) +20% + the number of zspages with usage ratio between 10% and 20% +30% + the number of zspages with usage ratio between 20% and 30% +40% + the number of zspages with usage ratio between 30% and 40% +50% + the number of zspages with usage ratio between 40% and 50% +60% + the number of zspages with usage ratio between 50% and 60% +70% + the number of zspages with usage ratio between 60% and 70% +80% + the number of zspages with usage ratio between 70% and 80% +90% + the number of zspages with usage ratio between 80% and 90% +99% + the number of zspages with usage ratio between 90% and 99% +100% + the number of zspages with usage ratio 100% obj_allocated the number of objects allocated obj_used @@ -67,18 +84,11 @@ pages_used pages_per_zspage the number of 0-order pages to make a zspage -We assign a zspage to ZS_ALMOST_EMPTY fullness group when n <= N / f, where - -* n = number of allocated objects -* N = total number of objects zspage can store -* f = fullness_threshold_frac(ie, 4 at the moment) - -Similarly, we assign zspage to: - -* ZS_ALMOST_FULL when n > N / f -* ZS_EMPTY when n == 0 -* ZS_FULL when n == N - +Each zspage maintains inuse counter which keeps track of the number of +objects stored in the zspage. The inuse counter determines the zspage's +"fullness group" which is calculated as the ratio of the "inuse" objects to +the total number of objects the zspage can hold (objs_per_zspage). The +closer the inuse counter is to objs_per_zspage, the better. Internals ========= @@ -94,10 +104,10 @@ of objects that each zspage can store. For instance, consider the following size classes::: - class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable + class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable ... - 94 1536 0 0 0 0 0 3 0 - 100 1632 0 0 0 0 0 2 0 + 94 1536 0 .... 0 0 0 0 3 0 + 100 1632 0 .... 0 0 0 0 2 0 ... @@ -134,10 +144,11 @@ reduces memory wastage. Let's take a closer look at the bottom of `/sys/kernel/debug/zsmalloc/zramX/classes`::: - class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable + class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable + ... - 202 3264 0 0 0 0 0 4 0 - 254 4096 0 0 0 0 0 1 0 + 202 3264 0 .. 0 0 0 0 4 0 + 254 4096 0 .. 0 0 0 0 1 0 ... Size class #202 stores objects of size 3264 bytes and has a maximum of 4 pages @@ -151,40 +162,42 @@ efficient storage of large objects. For zspage chain size of 8, huge class watermark becomes 3632 bytes::: - class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable + class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable + ... - 202 3264 0 0 0 0 0 4 0 - 211 3408 0 0 0 0 0 5 0 - 217 3504 0 0 0 0 0 6 0 - 222 3584 0 0 0 0 0 7 0 - 225 3632 0 0 0 0 0 8 0 - 254 4096 0 0 0 0 0 1 0 + 202 3264 0 .. 0 0 0 0 4 0 + 211 3408 0 .. 0 0 0 0 5 0 + 217 3504 0 .. 0 0 0 0 6 0 + 222 3584 0 .. 0 0 0 0 7 0 + 225 3632 0 .. 0 0 0 0 8 0 + 254 4096 0 .. 0 0 0 0 1 0 ... For zspage chain size of 16, huge class watermark becomes 3840 bytes::: - class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable + class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable + ... - 202 3264 0 0 0 0 0 4 0 - 206 3328 0 0 0 0 0 13 0 - 207 3344 0 0 0 0 0 9 0 - 208 3360 0 0 0 0 0 14 0 - 211 3408 0 0 0 0 0 5 0 - 212 3424 0 0 0 0 0 16 0 - 214 3456 0 0 0 0 0 11 0 - 217 3504 0 0 0 0 0 6 0 - 219 3536 0 0 0 0 0 13 0 - 222 3584 0 0 0 0 0 7 0 - 223 3600 0 0 0 0 0 15 0 - 225 3632 0 0 0 0 0 8 0 - 228 3680 0 0 0 0 0 9 0 - 230 3712 0 0 0 0 0 10 0 - 232 3744 0 0 0 0 0 11 0 - 234 3776 0 0 0 0 0 12 0 - 235 3792 0 0 0 0 0 13 0 - 236 3808 0 0 0 0 0 14 0 - 238 3840 0 0 0 0 0 15 0 - 254 4096 0 0 0 0 0 1 0 + 202 3264 0 .. 0 0 0 0 4 0 + 206 3328 0 .. 0 0 0 0 13 0 + 207 3344 0 .. 0 0 0 0 9 0 + 208 3360 0 .. 0 0 0 0 14 0 + 211 3408 0 .. 0 0 0 0 5 0 + 212 3424 0 .. 0 0 0 0 16 0 + 214 3456 0 .. 0 0 0 0 11 0 + 217 3504 0 .. 0 0 0 0 6 0 + 219 3536 0 .. 0 0 0 0 13 0 + 222 3584 0 .. 0 0 0 0 7 0 + 223 3600 0 .. 0 0 0 0 15 0 + 225 3632 0 .. 0 0 0 0 8 0 + 228 3680 0 .. 0 0 0 0 9 0 + 230 3712 0 .. 0 0 0 0 10 0 + 232 3744 0 .. 0 0 0 0 11 0 + 234 3776 0 .. 0 0 0 0 12 0 + 235 3792 0 .. 0 0 0 0 13 0 + 236 3808 0 .. 0 0 0 0 14 0 + 238 3840 0 .. 0 0 0 0 15 0 + 254 4096 0 .. 0 0 0 0 1 0 ... Overall the combined zspage chain size effect on zsmalloc pool configuration::: @@ -214,9 +227,10 @@ zram as a build artifacts storage (Linux zsmalloc classes stats::: - class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable + class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable + ... - Total 13 51 413836 412973 159955 3 + Total 13 .. 51 413836 412973 159955 3 zram mm_stat::: @@ -227,9 +241,10 @@ zram as a build artifacts storage (Linux zsmalloc classes stats::: - class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable + class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable + ... - Total 18 87 414852 412978 156666 0 + Total 18 .. 87 414852 412978 156666 0 zram mm_stat::: _ Patches currently in -mm which might be from senozhatsky(a)chromium.org are

2 years, 7 months

1
0
0 0

[merged mm-hotfixes-stable] fsdax-force-clear-dirty-mark-if-cow.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: fsdax: force clear dirty mark if CoW has been removed from the -mm tree. Its filename was fsdax-force-clear-dirty-mark-if-cow.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Shiyang Ruan <ruansy.fnst(a)fujitsu.com> Subject: fsdax: force clear dirty mark if CoW Date: Fri, 24 Mar 2023 10:28:00 +0000 XFS allows CoW on non-shared extents to combat fragmentation[1]. The old non-shared extent could be mwrited before, its dax entry is marked dirty. This results in a WARNing: [ 28.512349] ------------[ cut here ]------------ [ 28.512622] WARNING: CPU: 2 PID: 5255 at fs/dax.c:390 dax_insert_entry+0x342/0x390 [ 28.513050] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache netfs nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables [ 28.515462] CPU: 2 PID: 5255 Comm: fsstress Kdump: loaded Not tainted 6.3.0-rc1-00001-g85e1481e19c1-dirty #117 [ 28.515902] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.1-1-1 04/01/2014 [ 28.516307] RIP: 0010:dax_insert_entry+0x342/0x390 [ 28.516536] Code: 30 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 48 8b 45 20 48 83 c0 01 e9 e2 fe ff ff 48 8b 45 20 48 83 c0 01 e9 cd fe ff ff <0f> 0b e9 53 ff ff ff 48 8b 7c 24 08 31 f6 e8 1b 61 a1 00 eb 8c 48 [ 28.517417] RSP: 0000:ffffc9000845fb18 EFLAGS: 00010086 [ 28.517721] RAX: 0000000000000053 RBX: 0000000000000155 RCX: 000000000018824b [ 28.518113] RDX: 0000000000000000 RSI: ffffffff827525a6 RDI: 00000000ffffffff [ 28.518515] RBP: ffffea00062092c0 R08: 0000000000000000 R09: ffffc9000845f9c8 [ 28.518905] R10: 0000000000000003 R11: ffffffff82ddb7e8 R12: 0000000000000155 [ 28.519301] R13: 0000000000000000 R14: 000000000018824b R15: ffff88810cfa76b8 [ 28.519703] FS: 00007f14a0c94740(0000) GS:ffff88817bd00000(0000) knlGS:0000000000000000 [ 28.520148] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 28.520472] CR2: 00007f14a0c8d000 CR3: 000000010321c004 CR4: 0000000000770ee0 [ 28.520863] PKRU: 55555554 [ 28.521043] Call Trace: [ 28.521219] <TASK> [ 28.521368] dax_fault_iter+0x196/0x390 [ 28.521595] dax_iomap_pte_fault+0x19b/0x3d0 [ 28.521852] __xfs_filemap_fault+0x234/0x2b0 [ 28.522116] __do_fault+0x30/0x130 [ 28.522334] do_fault+0x193/0x340 [ 28.522586] __handle_mm_fault+0x2d3/0x690 [ 28.522975] handle_mm_fault+0xe6/0x2c0 [ 28.523259] do_user_addr_fault+0x1bc/0x6f0 [ 28.523521] exc_page_fault+0x60/0x140 [ 28.523763] asm_exc_page_fault+0x22/0x30 [ 28.524001] RIP: 0033:0x7f14a0b589ca [ 28.524225] Code: c5 fe 7f 07 c5 fe 7f 47 20 c5 fe 7f 47 40 c5 fe 7f 47 60 c5 f8 77 c3 66 0f 1f 84 00 00 00 00 00 40 0f b6 c6 48 89 d1 48 89 fa <f3> aa 48 89 d0 c5 f8 77 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 [ 28.525198] RSP: 002b:00007fff1dea1c98 EFLAGS: 00010202 [ 28.525505] RAX: 000000000000001e RBX: 000000000014a000 RCX: 0000000000006046 [ 28.525895] RDX: 00007f14a0c82000 RSI: 000000000000001e RDI: 00007f14a0c8d000 [ 28.526290] RBP: 000000000000006f R08: 0000000000000004 R09: 000000000014a000 [ 28.526681] R10: 0000000000000008 R11: 0000000000000246 R12: 028f5c28f5c28f5c [ 28.527067] R13: 8f5c28f5c28f5c29 R14: 0000000000011046 R15: 00007f14a0c946c0 [ 28.527449] </TASK> [ 28.527600] ---[ end trace 0000000000000000 ]--- To be able to delete this entry, clear its dirty mark before invalidate_inode_pages2_range(). [1] https://lore.kernel.org/linux-xfs/20230321151339.GA11376@frogsfrogsfrogs/ Link: https://lkml.kernel.org/r/1679653680-2-1-git-send-email-ruansy.fnst@fujitsu… Fixes: f80e1668888f3 ("fsdax: invalidate pages when CoW") Signed-off-by: Shiyang Ruan <ruansy.fnst(a)fujitsu.com> Cc: Dan Williams <dan.j.williams(a)intel.com> Cc: Darrick J. Wong <djwong(a)kernel.org> Cc: Jan Kara <jack(a)suse.cz> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- fs/dax.c | 37 +++++++++++++++++++++++++++++++++++++ 1 file changed, 37 insertions(+) --- a/fs/dax.c~fsdax-force-clear-dirty-mark-if-cow +++ a/fs/dax.c @@ -781,6 +781,33 @@ out: return ret; } +static int __dax_clear_dirty_range(struct address_space *mapping, + pgoff_t start, pgoff_t end) +{ + XA_STATE(xas, &mapping->i_pages, start); + unsigned int scanned = 0; + void *entry; + + xas_lock_irq(&xas); + xas_for_each(&xas, entry, end) { + entry = get_unlocked_entry(&xas, 0); + xas_clear_mark(&xas, PAGECACHE_TAG_DIRTY); + xas_clear_mark(&xas, PAGECACHE_TAG_TOWRITE); + put_unlocked_entry(&xas, entry, WAKE_NEXT); + + if (++scanned % XA_CHECK_SCHED) + continue; + + xas_pause(&xas); + xas_unlock_irq(&xas); + cond_resched(); + xas_lock_irq(&xas); + } + xas_unlock_irq(&xas); + + return 0; +} + /* * Delete DAX entry at @index from @mapping. Wait for it * to be unlocked before deleting it. @@ -1440,6 +1467,16 @@ static loff_t dax_iomap_iter(const struc * written by write(2) is visible in mmap. */ if (iomap->flags & IOMAP_F_NEW || cow) { + /* + * Filesystem allows CoW on non-shared extents. The src extents + * may have been mmapped with dirty mark before. To be able to + * invalidate its dax entries, we need to clear the dirty mark + * in advance. + */ + if (cow) + __dax_clear_dirty_range(iomi->inode->i_mapping, + pos >> PAGE_SHIFT, + (end - 1) >> PAGE_SHIFT); invalidate_inode_pages2_range(iomi->inode->i_mapping, pos >> PAGE_SHIFT, (end - 1) >> PAGE_SHIFT); _ Patches currently in -mm which might be from ruansy.fnst(a)fujitsu.com are

2 years, 7 months

1
0
0 0

[merged mm-hotfixes-stable] mm-hugetlb-fix-uffd-wr-protection-for-cow-optimization-path.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm/hugetlb: fix uffd wr-protection for CoW optimization path has been removed from the -mm tree. Its filename was mm-hugetlb-fix-uffd-wr-protection-for-cow-optimization-path.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Peter Xu <peterx(a)redhat.com> Subject: mm/hugetlb: fix uffd wr-protection for CoW optimization path Date: Tue, 21 Mar 2023 15:18:40 -0400 This patch fixes an issue that a hugetlb uffd-wr-protected mapping can be writable even with uffd-wp bit set. It only happens with hugetlb private mappings, when someone firstly wr-protects a missing pte (which will install a pte marker), then a write to the same page without any prior access to the page. Userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault() before reaching hugetlb_wp() to avoid taking more locks that userfault won't need. However there's one CoW optimization path that can trigger hugetlb_wp() inside hugetlb_no_page(), which will bypass the trap. This patch skips hugetlb_wp() for CoW and retries the fault if uffd-wp bit is detected. The new path will only trigger in the CoW optimization path because generic hugetlb_fault() (e.g. when a present pte was wr-protected) will resolve the uffd-wp bit already. Also make sure anonymous UNSHARE won't be affected and can still be resolved, IOW only skip CoW not CoR. This patch will be needed for v5.19+ hence copy stable. [peterx(a)redhat.com: v2] Link: https://lkml.kernel.org/r/ZBzOqwF2wrHgBVZb@x1n [peterx(a)redhat.com: v3] Link: https://lkml.kernel.org/r/20230324142620.2344140-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20230321191840.1897940-1-peterx@redhat.com Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection") Signed-off-by: Peter Xu <peterx(a)redhat.com> Reported-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com> Tested-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com> Acked-by: David Hildenbrand <david(a)redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com> Cc: Andrea Arcangeli <aarcange(a)redhat.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com> Cc: Nadav Amit <nadav.amit(a)gmail.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/hugetlb.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) --- a/mm/hugetlb.c~mm-hugetlb-fix-uffd-wr-protection-for-cow-optimization-path +++ a/mm/hugetlb.c @@ -5478,7 +5478,7 @@ static vm_fault_t hugetlb_wp(struct mm_s struct folio *pagecache_folio, spinlock_t *ptl) { const bool unshare = flags & FAULT_FLAG_UNSHARE; - pte_t pte; + pte_t pte = huge_ptep_get(ptep); struct hstate *h = hstate_vma(vma); struct page *old_page; struct folio *new_folio; @@ -5488,6 +5488,17 @@ static vm_fault_t hugetlb_wp(struct mm_s struct mmu_notifier_range range; /* + * Never handle CoW for uffd-wp protected pages. It should be only + * handled when the uffd-wp protection is removed. + * + * Note that only the CoW optimization path (in hugetlb_no_page()) + * can trigger this, because hugetlb_fault() will always resolve + * uffd-wp bit first. + */ + if (!unshare && huge_pte_uffd_wp(pte)) + return 0; + + /* * hugetlb does not support FOLL_FORCE-style write faults that keep the * PTE mapped R/O such as maybe_mkwrite() would do. */ @@ -5500,7 +5511,6 @@ static vm_fault_t hugetlb_wp(struct mm_s return 0; } - pte = huge_ptep_get(ptep); old_page = pte_page(pte); delayacct_wpcopy_start(); _ Patches currently in -mm which might be from peterx(a)redhat.com are mm-khugepaged-check-again-on-anon-uffd-wp-during-isolation.patch mm-uffd-uffd_feature_wp_unpopulated.patch mm-uffd-uffd_feature_wp_unpopulated-fix.patch selftests-mm-smoke-test-uffd_feature_wp_unpopulated.patch

2 years, 7 months

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror