August 2024 - Linux-stable-mirror

[PATCH 1/3] io_uring/net: ensure expanded bundle recv gets marked for cleanup

by Jens Axboe

If the iovec inside the kmsg isn't already allocated AND one gets expanded beyond the fixed size, then the request may not already have been marked for cleanup. Ensure that it is. Cc: stable(a)vger.kernel.org Fixes: 2f9c9515bdfd ("io_uring/net: support bundles for recv") Signed-off-by: Jens Axboe <axboe(a)kernel.dk> --- io_uring/net.c | 1 + 1 file changed, 1 insertion(+) diff --git a/io_uring/net.c b/io_uring/net.c index 594490a1389b..97a48408cec3 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -1094,6 +1094,7 @@ static int io_recv_buf_select(struct io_kiocb *req, struct io_async_msghdr *kmsg if (arg.iovs != &kmsg->fast_iov && arg.iovs != kmsg->free_iov) { kmsg->free_iov_nr = ret; kmsg->free_iov = arg.iovs; + req->flags |= REQ_F_NEED_CLEANUP; } } else { void __user *buf; -- 2.43.0

11 months, 2 weeks

1
0
0 0

[merged mm-hotfixes-stable] padata-fix-possible-divide-by-0-panic-in-padata_mt_helper.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: padata: fix possible divide-by-0 panic in padata_mt_helper() has been removed from the -mm tree. Its filename was padata-fix-possible-divide-by-0-panic-in-padata_mt_helper.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Waiman Long <longman(a)redhat.com> Subject: padata: Fix possible divide-by-0 panic in padata_mt_helper() Date: Tue, 6 Aug 2024 13:46:47 -0400 We are hit with a not easily reproducible divide-by-0 panic in padata.c at bootup time. [ 10.017908] Oops: divide error: 0000 1 PREEMPT SMP NOPTI [ 10.017908] CPU: 26 PID: 2627 Comm: kworker/u1666:1 Not tainted 6.10.0-15.el10.x86_64 #1 [ 10.017908] Hardware name: Lenovo ThinkSystem SR950 [7X12CTO1WW]/[7X12CTO1WW], BIOS [PSE140J-2.30] 07/20/2021 [ 10.017908] Workqueue: events_unbound padata_mt_helper [ 10.017908] RIP: 0010:padata_mt_helper+0x39/0xb0 : [ 10.017963] Call Trace: [ 10.017968] <TASK> [ 10.018004] ? padata_mt_helper+0x39/0xb0 [ 10.018084] process_one_work+0x174/0x330 [ 10.018093] worker_thread+0x266/0x3a0 [ 10.018111] kthread+0xcf/0x100 [ 10.018124] ret_from_fork+0x31/0x50 [ 10.018138] ret_from_fork_asm+0x1a/0x30 [ 10.018147] </TASK> Looking at the padata_mt_helper() function, the only way a divide-by-0 panic can happen is when ps->chunk_size is 0. The way that chunk_size is initialized in padata_do_multithreaded(), chunk_size can be 0 when the min_chunk in the passed-in padata_mt_job structure is 0. Fix this divide-by-0 panic by making sure that chunk_size will be at least 1 no matter what the input parameters are. Link: https://lkml.kernel.org/r/20240806174647.1050398-1-longman@redhat.com Fixes: 004ed42638f4 ("padata: add basic support for multithreaded jobs") Signed-off-by: Waiman Long <longman(a)redhat.com> Cc: Daniel Jordan <daniel.m.jordan(a)oracle.com> Cc: Steffen Klassert <steffen.klassert(a)secunet.com> Cc: Waiman Long <longman(a)redhat.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- kernel/padata.c | 7 +++++++ 1 file changed, 7 insertions(+) --- a/kernel/padata.c~padata-fix-possible-divide-by-0-panic-in-padata_mt_helper +++ a/kernel/padata.c @@ -517,6 +517,13 @@ void __init padata_do_multithreaded(stru ps.chunk_size = max(ps.chunk_size, job->min_chunk); ps.chunk_size = roundup(ps.chunk_size, job->align); + /* + * chunk_size can be 0 if the caller sets min_chunk to 0. So force it + * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().` + */ + if (!ps.chunk_size) + ps.chunk_size = 1U; + list_for_each_entry(pw, &works, pw_list) if (job->numa_aware) { int old_node = atomic_read(&last_used_nid); _ Patches currently in -mm which might be from longman(a)redhat.com are mm-memory-failure-use-raw_spinlock_t-in-struct-memory_failure_cpu.patch mm-memory-failure-use-raw_spinlock_t-in-struct-memory_failure_cpu-v3.patch lib-stackdepot-double-depot_pools_cap-if-kasan-is-enabled.patch watchdog-handle-the-enodev-failure-case-of-lockup_detector_delay_init-separately.patch

11 months, 2 weeks

1
0
0 0

[merged mm-hotfixes-stable] memcg-protect-concurrent-access-to-mem_cgroup_idr.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: memcg: protect concurrent access to mem_cgroup_idr has been removed from the -mm tree. Its filename was memcg-protect-concurrent-access-to-mem_cgroup_idr.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Shakeel Butt <shakeel.butt(a)linux.dev> Subject: memcg: protect concurrent access to mem_cgroup_idr Date: Fri, 2 Aug 2024 16:58:22 -0700 Commit 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after many small jobs") decoupled the memcg IDs from the CSS ID space to fix the cgroup creation failures. It introduced IDR to maintain the memcg ID space. The IDR depends on external synchronization mechanisms for modifications. For the mem_cgroup_idr, the idr_alloc() and idr_replace() happen within css callback and thus are protected through cgroup_mutex from concurrent modifications. However idr_remove() for mem_cgroup_idr was not protected against concurrency and can be run concurrently for different memcgs when they hit their refcnt to zero. Fix that. We have been seeing list_lru based kernel crashes at a low frequency in our fleet for a long time. These crashes were in different part of list_lru code including list_lru_add(), list_lru_del() and reparenting code. Upon further inspection, it looked like for a given object (dentry and inode), the super_block's list_lru didn't have list_lru_one for the memcg of that object. The initial suspicions were either the object is not allocated through kmem_cache_alloc_lru() or somehow memcg_list_lru_alloc() failed to allocate list_lru_one() for a memcg but returned success. No evidence were found for these cases. Looking more deeply, we started seeing situations where valid memcg's id is not present in mem_cgroup_idr and in some cases multiple valid memcgs have same id and mem_cgroup_idr is pointing to one of them. So, the most reasonable explanation is that these situations can happen due to race between multiple idr_remove() calls or race between idr_alloc()/idr_replace() and idr_remove(). These races are causing multiple memcgs to acquire the same ID and then offlining of one of them would cleanup list_lrus on the system for all of them. Later access from other memcgs to the list_lru cause crashes due to missing list_lru_one. Link: https://lkml.kernel.org/r/20240802235822.1830976-1-shakeel.butt@linux.dev Fixes: 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after many small jobs") Signed-off-by: Shakeel Butt <shakeel.butt(a)linux.dev> Acked-by: Muchun Song <muchun.song(a)linux.dev> Reviewed-by: Roman Gushchin <roman.gushchin(a)linux.dev> Acked-by: Johannes Weiner <hannes(a)cmpxchg.org> Cc: Michal Hocko <mhocko(a)suse.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/memcontrol.c | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-) --- a/mm/memcontrol.c~memcg-protect-concurrent-access-to-mem_cgroup_idr +++ a/mm/memcontrol.c @@ -3386,11 +3386,28 @@ static void memcg_wb_domain_size_changed #define MEM_CGROUP_ID_MAX ((1UL << MEM_CGROUP_ID_SHIFT) - 1) static DEFINE_IDR(mem_cgroup_idr); +static DEFINE_SPINLOCK(memcg_idr_lock); + +static int mem_cgroup_alloc_id(void) +{ + int ret; + + idr_preload(GFP_KERNEL); + spin_lock(&memcg_idr_lock); + ret = idr_alloc(&mem_cgroup_idr, NULL, 1, MEM_CGROUP_ID_MAX + 1, + GFP_NOWAIT); + spin_unlock(&memcg_idr_lock); + idr_preload_end(); + return ret; +} static void mem_cgroup_id_remove(struct mem_cgroup *memcg) { if (memcg->id.id > 0) { + spin_lock(&memcg_idr_lock); idr_remove(&mem_cgroup_idr, memcg->id.id); + spin_unlock(&memcg_idr_lock); + memcg->id.id = 0; } } @@ -3524,8 +3541,7 @@ static struct mem_cgroup *mem_cgroup_all if (!memcg) return ERR_PTR(error); - memcg->id.id = idr_alloc(&mem_cgroup_idr, NULL, - 1, MEM_CGROUP_ID_MAX + 1, GFP_KERNEL); + memcg->id.id = mem_cgroup_alloc_id(); if (memcg->id.id < 0) { error = memcg->id.id; goto fail; @@ -3667,7 +3683,9 @@ static int mem_cgroup_css_online(struct * publish it here at the end of onlining. This matches the * regular ID destruction during offlining. */ + spin_lock(&memcg_idr_lock); idr_replace(&mem_cgroup_idr, memcg, memcg->id.id); + spin_unlock(&memcg_idr_lock); return 0; offline_kmem: _ Patches currently in -mm which might be from shakeel.butt(a)linux.dev are memcg-increase-the-valid-index-range-for-memcg-stats.patch

11 months, 2 weeks

1
0
0 0

[merged mm-hotfixes-stable] mm-list_lru-fix-uaf-for-memory-cgroup.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm: list_lru: fix UAF for memory cgroup has been removed from the -mm tree. Its filename was mm-list_lru-fix-uaf-for-memory-cgroup.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Muchun Song <songmuchun(a)bytedance.com> Subject: mm: list_lru: fix UAF for memory cgroup Date: Thu, 18 Jul 2024 16:36:07 +0800 The mem_cgroup_from_slab_obj() is supposed to be called under rcu lock or cgroup_mutex or others which could prevent returned memcg from being freed. Fix it by adding missing rcu read lock. Found by code inspection. [songmuchun(a)bytedance.com: only grab rcu lock when necessary, per Vlastimil] Link: https://lkml.kernel.org/r/20240801024603.1865-1-songmuchun@bytedance.com Link: https://lkml.kernel.org/r/20240718083607.42068-1-songmuchun@bytedance.com Fixes: 0a97c01cd20b ("list_lru: allow explicit memcg and NUMA node selection") Signed-off-by: Muchun Song <songmuchun(a)bytedance.com> Acked-by: Shakeel Butt <shakeel.butt(a)linux.dev> Acked-by: Vlastimil Babka <vbabka(a)suse.cz> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: Nhat Pham <nphamcs(a)gmail.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/list_lru.c | 28 ++++++++++++++++++++++------ 1 file changed, 22 insertions(+), 6 deletions(-) --- a/mm/list_lru.c~mm-list_lru-fix-uaf-for-memory-cgroup +++ a/mm/list_lru.c @@ -85,6 +85,7 @@ list_lru_from_memcg_idx(struct list_lru } #endif /* CONFIG_MEMCG */ +/* The caller must ensure the memcg lifetime. */ bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid, struct mem_cgroup *memcg) { @@ -109,14 +110,22 @@ EXPORT_SYMBOL_GPL(list_lru_add); bool list_lru_add_obj(struct list_lru *lru, struct list_head *item) { + bool ret; int nid = page_to_nid(virt_to_page(item)); - struct mem_cgroup *memcg = list_lru_memcg_aware(lru) ? - mem_cgroup_from_slab_obj(item) : NULL; - return list_lru_add(lru, item, nid, memcg); + if (list_lru_memcg_aware(lru)) { + rcu_read_lock(); + ret = list_lru_add(lru, item, nid, mem_cgroup_from_slab_obj(item)); + rcu_read_unlock(); + } else { + ret = list_lru_add(lru, item, nid, NULL); + } + + return ret; } EXPORT_SYMBOL_GPL(list_lru_add_obj); +/* The caller must ensure the memcg lifetime. */ bool list_lru_del(struct list_lru *lru, struct list_head *item, int nid, struct mem_cgroup *memcg) { @@ -139,11 +148,18 @@ EXPORT_SYMBOL_GPL(list_lru_del); bool list_lru_del_obj(struct list_lru *lru, struct list_head *item) { + bool ret; int nid = page_to_nid(virt_to_page(item)); - struct mem_cgroup *memcg = list_lru_memcg_aware(lru) ? - mem_cgroup_from_slab_obj(item) : NULL; - return list_lru_del(lru, item, nid, memcg); + if (list_lru_memcg_aware(lru)) { + rcu_read_lock(); + ret = list_lru_del(lru, item, nid, mem_cgroup_from_slab_obj(item)); + rcu_read_unlock(); + } else { + ret = list_lru_del(lru, item, nid, NULL); + } + + return ret; } EXPORT_SYMBOL_GPL(list_lru_del_obj); _ Patches currently in -mm which might be from songmuchun(a)bytedance.com are mm-kmem-remove-mem_cgroup_from_obj.patch

11 months, 2 weeks

1
0
0 0

[merged mm-hotfixes-stable] kcov-properly-check-for-softirq-context.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: kcov: properly check for softirq context has been removed from the -mm tree. Its filename was kcov-properly-check-for-softirq-context.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Andrey Konovalov <andreyknvl(a)gmail.com> Subject: kcov: properly check for softirq context Date: Mon, 29 Jul 2024 04:21:58 +0200 When collecting coverage from softirqs, KCOV uses in_serving_softirq() to check whether the code is running in the softirq context. Unfortunately, in_serving_softirq() is > 0 even when the code is running in the hardirq or NMI context for hardirqs and NMIs that happened during a softirq. As a result, if a softirq handler contains a remote coverage collection section and a hardirq with another remote coverage collection section happens during handling the softirq, KCOV incorrectly detects a nested softirq coverate collection section and prints a WARNING, as reported by syzbot. This issue was exposed by commit a7f3813e589f ("usb: gadget: dummy_hcd: Switch to hrtimer transfer scheduler"), which switched dummy_hcd to using hrtimer and made the timer's callback be executed in the hardirq context. Change the related checks in KCOV to account for this behavior of in_serving_softirq() and make KCOV ignore remote coverage collection sections in the hardirq and NMI contexts. This prevents the WARNING printed by syzbot but does not fix the inability of KCOV to collect coverage from the __usb_hcd_giveback_urb when dummy_hcd is in use (caused by a7f3813e589f); a separate patch is required for that. Link: https://lkml.kernel.org/r/20240729022158.92059-1-andrey.konovalov@linux.dev Fixes: 5ff3b30ab57d ("kcov: collect coverage from interrupts") Signed-off-by: Andrey Konovalov <andreyknvl(a)gmail.com> Reported-by: syzbot+2388cdaeb6b10f0c13ac(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=2388cdaeb6b10f0c13ac Acked-by: Marco Elver <elver(a)google.com> Cc: Alan Stern <stern(a)rowland.harvard.edu> Cc: Aleksandr Nogikh <nogikh(a)google.com> Cc: Alexander Potapenko <glider(a)google.com> Cc: Dmitry Vyukov <dvyukov(a)google.com> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Cc: Marcello Sylvester Bauer <sylv(a)sylv.io> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- kernel/kcov.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) --- a/kernel/kcov.c~kcov-properly-check-for-softirq-context +++ a/kernel/kcov.c @@ -161,6 +161,15 @@ static void kcov_remote_area_put(struct kmsan_unpoison_memory(&area->list, sizeof(area->list)); } +/* + * Unlike in_serving_softirq(), this function returns false when called during + * a hardirq or an NMI that happened in the softirq context. + */ +static inline bool in_softirq_really(void) +{ + return in_serving_softirq() && !in_hardirq() && !in_nmi(); +} + static notrace bool check_kcov_mode(enum kcov_mode needed_mode, struct task_struct *t) { unsigned int mode; @@ -170,7 +179,7 @@ static notrace bool check_kcov_mode(enum * so we ignore code executed in interrupts, unless we are in a remote * coverage collection section in a softirq. */ - if (!in_task() && !(in_serving_softirq() && t->kcov_softirq)) + if (!in_task() && !(in_softirq_really() && t->kcov_softirq)) return false; mode = READ_ONCE(t->kcov_mode); /* @@ -849,7 +858,7 @@ void kcov_remote_start(u64 handle) if (WARN_ON(!kcov_check_handle(handle, true, true, true))) return; - if (!in_task() && !in_serving_softirq()) + if (!in_task() && !in_softirq_really()) return; local_lock_irqsave(&kcov_percpu_data.lock, flags); @@ -991,7 +1000,7 @@ void kcov_remote_stop(void) int sequence; unsigned long flags; - if (!in_task() && !in_serving_softirq()) + if (!in_task() && !in_softirq_really()) return; local_lock_irqsave(&kcov_percpu_data.lock, flags); _ Patches currently in -mm which might be from andreyknvl(a)gmail.com are kcov-dont-instrument-lib-find_bitc.patch

11 months, 2 weeks

1
0
0 0

[merged mm-hotfixes-stable] selftests-mm-add-s390-to-arch-check.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: selftests: mm: add s390 to ARCH check has been removed from the -mm tree. Its filename was selftests-mm-add-s390-to-arch-check.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Nico Pache <npache(a)redhat.com> Subject: selftests: mm: add s390 to ARCH check Date: Wed, 24 Jul 2024 15:35:17 -0600 commit 0518dbe97fe6 ("selftests/mm: fix cross compilation with LLVM") changed the env variable for the architecture from MACHINE to ARCH. This is preventing 3 required TEST_GEN_FILES from being included when cross compiling s390x and errors when trying to run the test suite. This is due to the ARCH variable already being set and the arch folder name being s390. Add "s390" to the filtered list to cover this case and have the 3 files included in the build. Link: https://lkml.kernel.org/r/20240724213517.23918-1-npache@redhat.com Fixes: 0518dbe97fe6 ("selftests/mm: fix cross compilation with LLVM") Signed-off-by: Nico Pache <npache(a)redhat.com> Cc: Mark Brown <broonie(a)kernel.org> Cc: Albert Ou <aou(a)eecs.berkeley.edu> Cc: Palmer Dabbelt <palmer(a)dabbelt.com> Cc: Paul Walmsley <paul.walmsley(a)sifive.com> Cc: Shuah Khan <shuah(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- tools/testing/selftests/mm/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/tools/testing/selftests/mm/Makefile~selftests-mm-add-s390-to-arch-check +++ a/tools/testing/selftests/mm/Makefile @@ -110,7 +110,7 @@ endif endif -ifneq (,$(filter $(ARCH),arm64 ia64 mips64 parisc64 powerpc riscv64 s390x sparc64 x86_64)) +ifneq (,$(filter $(ARCH),arm64 ia64 mips64 parisc64 powerpc riscv64 s390x sparc64 x86_64 s390)) TEST_GEN_FILES += va_high_addr_switch TEST_GEN_FILES += virtual_address_range TEST_GEN_FILES += write_to_hugetlbfs _ Patches currently in -mm which might be from npache(a)redhat.com are

11 months, 2 weeks

1
0
0 0

[folded-merged] mm-list_lru-fix-uaf-for-memory-cgroup-v2.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm-list_lru-fix-uaf-for-memory-cgroup-v2 has been removed from the -mm tree. Its filename was mm-list_lru-fix-uaf-for-memory-cgroup-v2.patch This patch was dropped because it was folded into mm-list_lru-fix-uaf-for-memory-cgroup.patch ------------------------------------------------------ From: Muchun Song <songmuchun(a)bytedance.com> Subject: mm-list_lru-fix-uaf-for-memory-cgroup-v2 Date: Thu, 1 Aug 2024 10:46:03 +0800 only grab rcu lock when necessary, per Vlastimil Link: https://lkml.kernel.org/r/20240801024603.1865-1-songmuchun@bytedance.com Fixes: 0a97c01cd20b ("list_lru: allow explicit memcg and NUMA node selection") Signed-off-by: Muchun Song <songmuchun(a)bytedance.com> Acked-by: Shakeel Butt <shakeel.butt(a)linux.dev> Cc: <stable(a)vger.kernel.org> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: Nhat Pham <nphamcs(a)gmail.com> Cc: Vlastimil Babka <vbabka(a)suse.cz> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/list_lru.c | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-) --- a/mm/list_lru.c~mm-list_lru-fix-uaf-for-memory-cgroup-v2 +++ a/mm/list_lru.c @@ -112,12 +112,14 @@ bool list_lru_add_obj(struct list_lru *l { bool ret; int nid = page_to_nid(virt_to_page(item)); - struct mem_cgroup *memcg; - rcu_read_lock(); - memcg = list_lru_memcg_aware(lru) ? mem_cgroup_from_slab_obj(item) : NULL; - ret = list_lru_add(lru, item, nid, memcg); - rcu_read_unlock(); + if (list_lru_memcg_aware(lru)) { + rcu_read_lock(); + ret = list_lru_add(lru, item, nid, mem_cgroup_from_slab_obj(item)); + rcu_read_unlock(); + } else { + ret = list_lru_add(lru, item, nid, NULL); + } return ret; } @@ -148,12 +150,14 @@ bool list_lru_del_obj(struct list_lru *l { bool ret; int nid = page_to_nid(virt_to_page(item)); - struct mem_cgroup *memcg; - rcu_read_lock(); - memcg = list_lru_memcg_aware(lru) ? mem_cgroup_from_slab_obj(item) : NULL; - ret = list_lru_del(lru, item, nid, memcg); - rcu_read_unlock(); + if (list_lru_memcg_aware(lru)) { + rcu_read_lock(); + ret = list_lru_del(lru, item, nid, mem_cgroup_from_slab_obj(item)); + rcu_read_unlock(); + } else { + ret = list_lru_del(lru, item, nid, NULL); + } return ret; } _ Patches currently in -mm which might be from songmuchun(a)bytedance.com are mm-list_lru-fix-uaf-for-memory-cgroup.patch mm-kmem-remove-mem_cgroup_from_obj.patch

11 months, 2 weeks

1
0
0 0

[PATCH] drm/i915: Fix NULL ptr deref in intel_async_flip_check_uapi()

by Ma Ke

intel_atomic_get_new_crtc_state can return NULL, unless crtc state wasn't obtained previously with intel_atomic_get_crtc_state. We should check it for NULLness here, just as in many other places, where we can't guarantee that intel_atomic_get_crtc_state was called. Cc: stable(a)vger.kernel.org Fixes: b0b2bed2a130 ("drm/i915: Check async flip capability early on") Signed-off-by: Ma Ke <make24(a)iscas.ac.cn> --- drivers/gpu/drm/i915/display/intel_display.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index c2c388212e2e..9dd7b5985d57 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -6115,7 +6115,7 @@ static int intel_async_flip_check_uapi(struct intel_atomic_state *state, return -EINVAL; } - if (intel_crtc_needs_modeset(new_crtc_state)) { + if (new_crtc_state && intel_crtc_needs_modeset(new_crtc_state)) { drm_dbg_kms(&i915->drm, "[CRTC:%d:%s] modeset required\n", crtc->base.base.id, crtc->base.name); -- 2.25.1

11 months, 2 weeks

2
1
0 0

+ mm-hugetlb-fix-hugetlb-vs-core-mm-pt-locking.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: mm/hugetlb: fix hugetlb vs. core-mm PT locking has been added to the -mm mm-hotfixes-unstable branch. Its filename is mm-hugetlb-fix-hugetlb-vs-core-mm-pt-locking.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: David Hildenbrand <david(a)redhat.com> Subject: mm/hugetlb: fix hugetlb vs. core-mm PT locking Date: Thu, 1 Aug 2024 22:47:48 +0200 We recently made GUP's common page table walking code to also walk hugetlb VMAs without most hugetlb special-casing, preparing for the future of having less hugetlb-specific page table walking code in the codebase. Turns out that we missed one page table locking detail: page table locking for hugetlb folios that are not mapped using a single PMD/PUD. Assume we have hugetlb folio that spans multiple PTEs (e.g., 64 KiB hugetlb folios on arm64 with 4 KiB base page size). GUP, as it walks the page tables, will perform a pte_offset_map_lock() to grab the PTE table lock. However, hugetlb that concurrently modifies these page tables would actually grab the mm->page_table_lock: with USE_SPLIT_PTE_PTLOCKS, the locks would differ. Something similar can happen right now with hugetlb folios that span multiple PMDs when USE_SPLIT_PMD_PTLOCKS. This issue can be reproduced [1], for example triggering: [ 3105.936100] ------------[ cut here ]------------ [ 3105.939323] WARNING: CPU: 31 PID: 2732 at mm/gup.c:142 try_grab_folio+0x11c/0x188 [ 3105.944634] Modules linked in: [...] [ 3105.974841] CPU: 31 PID: 2732 Comm: reproducer Not tainted 6.10.0-64.eln141.aarch64 #1 [ 3105.980406] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20240524-4.fc40 05/24/2024 [ 3105.986185] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 3105.991108] pc : try_grab_folio+0x11c/0x188 [ 3105.994013] lr : follow_page_pte+0xd8/0x430 [ 3105.996986] sp : ffff80008eafb8f0 [ 3105.999346] x29: ffff80008eafb900 x28: ffffffe8d481f380 x27: 00f80001207cff43 [ 3106.004414] x26: 0000000000000001 x25: 0000000000000000 x24: ffff80008eafba48 [ 3106.009520] x23: 0000ffff9372f000 x22: ffff7a54459e2000 x21: ffff7a546c1aa978 [ 3106.014529] x20: ffffffe8d481f3c0 x19: 0000000000610041 x18: 0000000000000001 [ 3106.019506] x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000000 [ 3106.024494] x14: ffffb85477fdfe08 x13: 0000ffff9372ffff x12: 0000000000000000 [ 3106.029469] x11: 1fffef4a88a96be1 x10: ffff7a54454b5f0c x9 : ffffb854771b12f0 [ 3106.034324] x8 : 0008000000000000 x7 : ffff7a546c1aa980 x6 : 0008000000000080 [ 3106.038902] x5 : 00000000001207cf x4 : 0000ffff9372f000 x3 : ffffffe8d481f000 [ 3106.043420] x2 : 0000000000610041 x1 : 0000000000000001 x0 : 0000000000000000 [ 3106.047957] Call trace: [ 3106.049522] try_grab_folio+0x11c/0x188 [ 3106.051996] follow_pmd_mask.constprop.0.isra.0+0x150/0x2e0 [ 3106.055527] follow_page_mask+0x1a0/0x2b8 [ 3106.058118] __get_user_pages+0xf0/0x348 [ 3106.060647] faultin_page_range+0xb0/0x360 [ 3106.063651] do_madvise+0x340/0x598 Let's make huge_pte_lockptr() effectively use the same PT locks as any core-mm page table walker would. Add ptep_lockptr() to obtain the PTE page table lock using a pte pointer -- unfortunately we cannot convert pte_lockptr() because virt_to_page() doesn't work with kmap'ed page tables we can have with CONFIG_HIGHPTE. Handle CONFIG_PGTABLE_LEVELS correctly by checking in reverse order, such that when e.g., CONFIG_PGTABLE_LEVELS==2 with PGDIR_SIZE==P4D_SIZE==PUD_SIZE==PMD_SIZE will work as expected. Document why that works. There is one ugly case: powerpc 8xx, whereby we have an 8 MiB hugetlb folio being mapped using two PTE page tables. While hugetlb wants to take the PMD table lock, core-mm would grab the PTE table lock of one of both PTE page tables. In such corner cases, we have to make sure that both locks match, which is (fortunately!) currently guaranteed for 8xx as it does not support SMP and consequently doesn't use split PT locks. [1] https://lore.kernel.org/all/1bbfcc7f-f222-45a5-ac44-c5a1381c596d@redhat.com/ Link: https://lkml.kernel.org/r/20240801204748.99107-1-david@redhat.com Fixes: 9cb28da54643 ("mm/gup: handle hugetlb in the generic follow_page_mask code") Signed-off-by: David Hildenbrand <david(a)redhat.com> Acked-by: Peter Xu <peterx(a)redhat.com> Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com> Tested-by: Baolin Wang <baolin.wang(a)linux.alibaba.com> Cc: Peter Xu <peterx(a)redhat.com> Cc: Oscar Salvador <osalvador(a)suse.de> Cc: Muchun Song <muchun.song(a)linux.dev> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/hugetlb.h | 33 ++++++++++++++++++++++++++++++--- include/linux/mm.h | 11 +++++++++++ 2 files changed, 41 insertions(+), 3 deletions(-) --- a/include/linux/hugetlb.h~mm-hugetlb-fix-hugetlb-vs-core-mm-pt-locking +++ a/include/linux/hugetlb.h @@ -944,10 +944,37 @@ static inline bool htlb_allow_alloc_fall static inline spinlock_t *huge_pte_lockptr(struct hstate *h, struct mm_struct *mm, pte_t *pte) { - if (huge_page_size(h) == PMD_SIZE) + const unsigned long size = huge_page_size(h); + + VM_WARN_ON(size == PAGE_SIZE); + + /* + * hugetlb must use the exact same PT locks as core-mm page table + * walkers would. When modifying a PTE table, hugetlb must take the + * PTE PT lock, when modifying a PMD table, hugetlb must take the PMD + * PT lock etc. + * + * The expectation is that any hugetlb folio smaller than a PMD is + * always mapped into a single PTE table and that any hugetlb folio + * smaller than a PUD (but at least as big as a PMD) is always mapped + * into a single PMD table. + * + * If that does not hold for an architecture, then that architecture + * must disable split PT locks such that all *_lockptr() functions + * will give us the same result: the per-MM PT lock. + * + * Note that with e.g., CONFIG_PGTABLE_LEVELS=2 where + * PGDIR_SIZE==P4D_SIZE==PUD_SIZE==PMD_SIZE, we'd use pud_lockptr() + * and core-mm would use pmd_lockptr(). However, in such configurations + * split PMD locks are disabled -- they don't make sense on a single + * PGDIR page table -- and the end result is the same. + */ + if (size >= PUD_SIZE) + return pud_lockptr(mm, (pud_t *) pte); + else if (size >= PMD_SIZE || IS_ENABLED(CONFIG_HIGHPTE)) return pmd_lockptr(mm, (pmd_t *) pte); - VM_BUG_ON(huge_page_size(h) == PAGE_SIZE); - return &mm->page_table_lock; + /* pte_alloc_huge() only applies with !CONFIG_HIGHPTE */ + return ptep_lockptr(mm, pte); } #ifndef hugepages_supported --- a/include/linux/mm.h~mm-hugetlb-fix-hugetlb-vs-core-mm-pt-locking +++ a/include/linux/mm.h @@ -2920,6 +2920,13 @@ static inline spinlock_t *pte_lockptr(st return ptlock_ptr(page_ptdesc(pmd_page(*pmd))); } +static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte) +{ + BUILD_BUG_ON(IS_ENABLED(CONFIG_HIGHPTE)); + BUILD_BUG_ON(MAX_PTRS_PER_PTE * sizeof(pte_t) > PAGE_SIZE); + return ptlock_ptr(virt_to_ptdesc(pte)); +} + static inline bool ptlock_init(struct ptdesc *ptdesc) { /* @@ -2944,6 +2951,10 @@ static inline spinlock_t *pte_lockptr(st { return &mm->page_table_lock; } +static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte) +{ + return &mm->page_table_lock; +} static inline void ptlock_cache_init(void) {} static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; } static inline void ptlock_free(struct ptdesc *ptdesc) {} _ Patches currently in -mm which might be from david(a)redhat.com are mm-hugetlb-fix-hugetlb-vs-core-mm-pt-locking.patch mm-turn-use_split_pte_ptlocks-use_split_pte_ptlocks-into-kconfig-options.patch mm-hugetlb-enforce-that-pmd-pt-sharing-has-split-pmd-pt-locks.patch powerpc-8xx-document-and-enforce-that-split-pt-locks-are-not-used.patch mm-simplify-arch_make_folio_accessible.patch mm-gup-convert-to-arch_make_folio_accessible.patch s390-uv-drop-arch_make_page_accessible.patch mm-hugetlb-remove-hugetlb_follow_page_mask-leftover.patch mm-rmap-cleanup-partially-mapped-handling-in-__folio_remove_rmap.patch mm-clarify-folio_likely_mapped_shared-documentation-for-ksm-folios.patch mm-provide-vm_normal_pagefolio_pmd-with-config_pgtable_has_huge_leaves.patch mm-pagewalk-introduce-folio_walk_start-folio_walk_end.patch mm-migrate-convert-do_pages_stat_array-from-follow_page-to-folio_walk.patch mm-migrate-convert-add_page_for_migration-from-follow_page-to-folio_walk.patch mm-ksm-convert-get_mergeable_page-from-follow_page-to-folio_walk.patch mm-ksm-convert-scan_get_next_rmap_item-from-follow_page-to-folio_walk.patch mm-huge_memory-convert-split_huge_pages_pid-from-follow_page-to-folio_walk.patch mm-huge_memory-convert-split_huge_pages_pid-from-follow_page-to-folio_walk-fix.patch s390-uv-convert-gmap_destroy_page-from-follow_page-to-folio_walk.patch s390-mm-fault-convert-do_secure_storage_access-from-follow_page-to-folio_walk.patch mm-remove-follow_page.patch mm-ksm-convert-break_ksm-from-walk_page_range_vma-to-folio_walk.patch mm-rmap-minimize-folio-_nr_pages_mapped-updates-when-batching-pte-unmapping.patch

11 months, 2 weeks

1
0
0 0

+ mm-numa-no-task_numa_fault-call-if-page-table-is-changed.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: mm/numa: no task_numa_fault() call if page table is changed has been added to the -mm mm-hotfixes-unstable branch. Its filename is mm-numa-no-task_numa_fault-call-if-page-table-is-changed.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Zi Yan <ziy(a)nvidia.com> Subject: mm/numa: no task_numa_fault() call if page table is changed Date: Wed, 7 Aug 2024 14:47:29 -0400 When handling a numa page fault, task_numa_fault() should be called by a process that restores the page table of the faulted folio to avoid duplicated stats counting. Commit b99a342d4f11 ("NUMA balancing: reduce TLB flush via delaying mapping on hint page fault") restructured do_numa_page() and do_huge_pmd_numa_page() and did not avoid task_numa_fault() call in the second page table check after a numa migration failure. Fix it by making all !pte_same()/!pmd_same() return immediately. This issue can cause task_numa_fault() being called more than necessary and lead to unexpected numa balancing results (It is hard to tell whether the issue will cause positive or negative performance impact due to duplicated numa fault counting). Link: https://lkml.kernel.org/r/20240807184730.1266736-1-ziy@nvidia.com Fixes: b99a342d4f11 ("NUMA balancing: reduce TLB flush via delaying mapping on hint page fault") Signed-off-by: Zi Yan <ziy(a)nvidia.com> Reported-by: "Huang, Ying" <ying.huang(a)intel.com> Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@yhuang6-desk2.ccr.corp.inte… Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com> Cc: David Hildenbrand <david(a)redhat.com> Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/huge_memory.c | 5 +++-- mm/memory.c | 5 +++-- 2 files changed, 6 insertions(+), 4 deletions(-) --- a/mm/huge_memory.c~mm-numa-no-task_numa_fault-call-if-page-table-is-changed +++ a/mm/huge_memory.c @@ -1738,10 +1738,11 @@ vm_fault_t do_huge_pmd_numa_page(struct goto out_map; } -out: +count_fault: if (nid != NUMA_NO_NODE) task_numa_fault(last_cpupid, nid, HPAGE_PMD_NR, flags); +out: return 0; out_map: @@ -1753,7 +1754,7 @@ out_map: set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd); update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); spin_unlock(vmf->ptl); - goto out; + goto count_fault; } /* --- a/mm/memory.c~mm-numa-no-task_numa_fault-call-if-page-table-is-changed +++ a/mm/memory.c @@ -5371,9 +5371,10 @@ static vm_fault_t do_numa_page(struct vm goto out_map; } -out: +count_fault: if (nid != NUMA_NO_NODE) task_numa_fault(last_cpupid, nid, nr_pages, flags); +out: return 0; out_map: /* @@ -5387,7 +5388,7 @@ out_map: numa_rebuild_single_mapping(vmf, vma, vmf->address, vmf->pte, writable); pte_unmap_unlock(vmf->pte, vmf->ptl); - goto out; + goto count_fault; } static inline vm_fault_t create_huge_pmd(struct vm_fault *vmf) _ Patches currently in -mm which might be from ziy(a)nvidia.com are mm-numa-no-task_numa_fault-call-if-page-table-is-changed.patch memory-tiering-read-last_cpupid-correctly-in-do_huge_pmd_numa_page.patch memory-tiering-introduce-folio_use_access_time-check.patch memory-tiering-count-pgpromote_success-when-mem-tiering-is-enabled.patch mm-migrate-move-common-code-to-numa_migrate_check-was-numa_migrate_prep.patch

11 months, 2 weeks

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror August 2024