The patch titled
Subject: mm: fix apply_to_existing_page_range()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-fix-apply_to_existing_page_range.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Subject: mm: fix apply_to_existing_page_range()
Date: Wed, 9 Apr 2025 12:40:43 +0300
In the case of apply_to_existing_page_range(), apply_to_pte_range() is
reached with 'create' set to false. When !create, the loop over the PTE
page table is broken.
apply_to_pte_range() will only move to the next PTE entry if 'create' is
true or if the current entry is not pte_none().
This means that the user of apply_to_existing_page_range() will not have
'fn' called for any entries after the first pte_none() in the PTE page
table.
Fix the loop logic in apply_to_pte_range().
There are no known runtime issues from this, but the fix is trivial enough
for stable@ even without a known buggy user.
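The effect of the old loop shape can be reproduced in a small userspace sketch. Everything below is hypothetical scaffolding (fake_pte_t, visit_log, walk_old(), walk_new()); it only mirrors the cursor handling of the loop and is not kernel code. A zero entry stands in for pte_none().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_ENTRIES 4

/* Hypothetical stand-in for a PTE: 0 plays the role of pte_none(). */
typedef unsigned long fake_pte_t;

/* Records the table index of every entry the callback is invoked on. */
struct visit_log {
	const fake_pte_t *base;
	size_t visited[NR_ENTRIES];
	size_t count;
};

static int record_visit(const fake_pte_t *pte, struct visit_log *log)
{
	assert(log->count < NR_ENTRIES);
	log->visited[log->count++] = (size_t)(pte - log->base);
	return 0;
}

/* Old loop shape: the cursor only advances when the callback is called. */
static void walk_old(const fake_pte_t *pte, size_t n, bool create,
		     struct visit_log *log)
{
	size_t i = 0;

	do {
		if (create || *pte != 0) {
			if (record_visit(pte++, log))
				break;
		}
	} while (++i != n);
}

/* Fixed loop shape: the cursor advances on every iteration. */
static void walk_new(const fake_pte_t *pte, size_t n, bool create,
		     struct visit_log *log)
{
	size_t i = 0;

	do {
		if (create || *pte != 0) {
			if (record_visit(pte, log))
				break;
		}
	} while (pte++, ++i != n);
}
```

With a table of { 1, 0, 3, 4 } and create set to false, walk_old() gets stuck on the empty second entry and reports only index 0, while walk_new() reports indices 0, 2 and 3.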
Link: https://lkml.kernel.org/r/20250409094043.1629234-1-kirill.shutemov@linux.in…
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Fixes: be1db4753ee6 ("mm/memory.c: add apply_to_existing_page_range() helper")
Cc: Daniel Axtens <dja(a)axtens.net>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/memory.c~mm-fix-apply_to_existing_page_range
+++ a/mm/memory.c
@@ -2943,11 +2943,11 @@ static int apply_to_pte_range(struct mm_
if (fn) {
do {
if (create || !pte_none(ptep_get(pte))) {
- err = fn(pte++, addr, data);
+ err = fn(pte, addr, data);
if (err)
break;
}
- } while (addr += PAGE_SIZE, addr != end);
+ } while (pte++, addr += PAGE_SIZE, addr != end);
}
arch_leave_lazy_mmu_mode();
_
Patches currently in -mm which might be from kirill.shutemov(a)linux.intel.com are
mm-page_alloc-fix-deadlock-on-cpu_hotplug_lock-in-__accept_page.patch
mm-fix-apply_to_existing_page_range.patch
The patch titled
Subject: alloc_tag: handle incomplete bulk allocations in vm_module_tags_populate
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
alloc_tag-handle-incomplete-bulk-allocations-in-vm_module_tags_populate.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "T.J. Mercier" <tjmercier(a)google.com>
Subject: alloc_tag: handle incomplete bulk allocations in vm_module_tags_populate
Date: Wed, 9 Apr 2025 22:51:11 +0000
alloc_pages_bulk_node() may partially succeed and allocate fewer than the
requested nr_pages. There are several conditions under which this can
occur, but we have encountered the case where CONFIG_PAGE_OWNER is enabled,
causing all bulk allocations to always fall back to single page allocations
due to commit 187ad460b841 ("mm/page_alloc: avoid page allocator recursion
with pagesets.lock held").
Currently vm_module_tags_populate() immediately fails when
alloc_pages_bulk_node() returns fewer than the requested number of pages.
When this happens, memory allocation profiling gets disabled, for example:
[ 14.297583] [9: modprobe: 465] Failed to allocate memory for allocation tags in the module scsc_wlan. Memory allocation profiling is disabled!
[ 14.299339] [9: modprobe: 465] modprobe: Failed to insmod '/vendor/lib/modules/scsc_wlan.ko' with args '': Out of memory
This patch causes vm_module_tags_populate() to retry bulk allocations for
the remaining memory instead of failing immediately, which avoids
disabling memory allocation profiling.
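The retry pattern can be tried out in userspace with a stand-in allocator. In the sketch below, struct fake_allocator, fake_alloc_bulk() and populate_with_retry() are hypothetical names; fake_alloc_bulk() is not alloc_pages_bulk_node(), it only imitates an allocator that may hand out fewer items than requested.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical allocator stand-in: hands out at most `chunk` items per
 * call and stops once `remaining` reaches zero.
 */
struct fake_allocator {
	size_t chunk;		/* upper bound on items per call */
	size_t remaining;	/* items left before exhaustion */
	int next_id;		/* id assigned to the next item */
};

/* Fills up to `want` slots of `pages` and returns how many were filled. */
static size_t fake_alloc_bulk(struct fake_allocator *a, size_t want, int *pages)
{
	size_t n = want;

	if (n > a->chunk)
		n = a->chunk;
	if (n > a->remaining)
		n = a->remaining;
	for (size_t i = 0; i < n; i++)
		pages[i] = a->next_id++;
	a->remaining -= n;
	return n;
}

/*
 * Keep asking for the remainder until everything is allocated or the
 * allocator makes no progress. Returns the number of items obtained.
 */
static size_t populate_with_retry(struct fake_allocator *a, size_t more_pages,
				  int *pages)
{
	size_t nr = 0;

	while (nr < more_pages) {
		size_t allocated = fake_alloc_bulk(a, more_pages - nr, pages + nr);

		if (!allocated)
			break;
		nr += allocated;
	}
	return nr;
}
```

An allocator limited to one item per call still satisfies a request for five items after five iterations. A caller sees a short count only when the allocator truly runs dry, and the zero-progress check keeps the loop from spinning in that case.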
Link: https://lkml.kernel.org/r/20250409225111.3770347-1-tjmercier@google.com
Fixes: 0f9b685626da ("alloc_tag: populate memory for module tags as needed")
Signed-off-by: T.J. Mercier <tjmercier(a)google.com>
Reported-by: Janghyuck Kim <janghyuck.kim(a)samsung.com>
Acked-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/alloc_tag.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
--- a/lib/alloc_tag.c~alloc_tag-handle-incomplete-bulk-allocations-in-vm_module_tags_populate
+++ a/lib/alloc_tag.c
@@ -422,11 +422,20 @@ static int vm_module_tags_populate(void)
unsigned long old_shadow_end = ALIGN(phys_end, MODULE_ALIGN);
unsigned long new_shadow_end = ALIGN(new_end, MODULE_ALIGN);
unsigned long more_pages;
- unsigned long nr;
+ unsigned long nr = 0;
more_pages = ALIGN(new_end - phys_end, PAGE_SIZE) >> PAGE_SHIFT;
- nr = alloc_pages_bulk_node(GFP_KERNEL | __GFP_NOWARN,
- NUMA_NO_NODE, more_pages, next_page);
+ while (nr < more_pages) {
+ unsigned long allocated;
+
+ allocated = alloc_pages_bulk_node(GFP_KERNEL | __GFP_NOWARN,
+ NUMA_NO_NODE, more_pages - nr, next_page + nr);
+
+ if (!allocated)
+ break;
+ nr += allocated;
+ }
+
if (nr < more_pages ||
vmap_pages_range(phys_end, phys_end + (nr << PAGE_SHIFT), PAGE_KERNEL,
next_page, PAGE_SHIFT) < 0) {
_
Patches currently in -mm which might be from tjmercier(a)google.com are
alloc_tag-handle-incomplete-bulk-allocations-in-vm_module_tags_populate.patch
The patch titled
Subject: alloc_tag: handle incomplete bulk allocations in vm_module_tags_populate
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
alloc_tag-handle-incomplete-bulk-allocations-in-vm_module_tags_populate.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "T.J. Mercier" <tjmercier(a)google.com>
Subject: alloc_tag: handle incomplete bulk allocations in vm_module_tags_populate
Date: Wed, 9 Apr 2025 19:54:47 +0000
alloc_pages_bulk_node() may partially succeed and allocate fewer than the
requested nr_pages. There are several conditions under which this can
occur, but we have encountered the case where CONFIG_PAGE_OWNER is enabled,
causing all bulk allocations to always fall back to single page allocations
due to commit 187ad460b841 ("mm/page_alloc: avoid page allocator recursion
with pagesets.lock held").
Currently vm_module_tags_populate() immediately fails when
alloc_pages_bulk_node() returns fewer than the requested number of pages.
This patch causes vm_module_tags_populate() to retry bulk allocations for
the remaining memory instead.
Link: https://lkml.kernel.org/r/20250409195448.3697351-1-tjmercier@google.com
Fixes: 187ad460b841 ("mm/page_alloc: avoid page allocator recursion with pagesets.lock held")
Signed-off-by: T.J. Mercier <tjmercier(a)google.com>
Reported-by: Janghyuck Kim <janghyuck.kim(a)samsung.com>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/alloc_tag.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
--- a/lib/alloc_tag.c~alloc_tag-handle-incomplete-bulk-allocations-in-vm_module_tags_populate
+++ a/lib/alloc_tag.c
@@ -422,11 +422,20 @@ static int vm_module_tags_populate(void)
unsigned long old_shadow_end = ALIGN(phys_end, MODULE_ALIGN);
unsigned long new_shadow_end = ALIGN(new_end, MODULE_ALIGN);
unsigned long more_pages;
- unsigned long nr;
+ unsigned long nr = 0;
more_pages = ALIGN(new_end - phys_end, PAGE_SIZE) >> PAGE_SHIFT;
- nr = alloc_pages_bulk_node(GFP_KERNEL | __GFP_NOWARN,
- NUMA_NO_NODE, more_pages, next_page);
+ while (nr < more_pages) {
+ unsigned long allocated;
+
+ allocated = alloc_pages_bulk_node(GFP_KERNEL | __GFP_NOWARN,
+ NUMA_NO_NODE, more_pages - nr, next_page + nr);
+
+ if (!allocated)
+ break;
+ nr += allocated;
+ }
+
if (nr < more_pages ||
vmap_pages_range(phys_end, phys_end + (nr << PAGE_SHIFT), PAGE_KERNEL,
next_page, PAGE_SHIFT) < 0) {
_
Patches currently in -mm which might be from tjmercier(a)google.com are
alloc_tag-handle-incomplete-bulk-allocations-in-vm_module_tags_populate.patch
struct rdma_cm_id has a member "struct work_struct net_work" that is
reused for enqueuing cma_netevent_work_handler() onto cma_wq.
The crash below [1] can occur if cma_netevent_callback() is called more
than once in quick succession, each call enqueuing
cma_netevent_work_handler() for the same rdma_cm_id. There is no
guarantee that the queued work item runs between two successive calls
to cma_netevent_callback(), so the second INIT_WORK() can overwrite the
first work item (for the same rdma_cm_id) while it is still queued,
despite id_table_lock being held during enqueue.
The drgn analysis [2] also indicates that the work item was likely
overwritten.
Fix this by moving the INIT_WORK() to __rdma_create_id(), so that it
does not race with any existing queue_work() or its worker thread.
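The race described above can be illustrated with a deliberately simplified userspace model. All names below (struct fake_work, FAKE_WORK_QUEUED, fake_init_work(), fake_queue_work()) are hypothetical and only imitate the relevant behaviour: initialization resets all state unconditionally, while queuing refuses an item that is already queued. This is not the kernel workqueue implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag: set while the item is queued with its owner recorded. */
#define FAKE_WORK_QUEUED 0x1UL

struct fake_work {
	unsigned long data;
};

/* Models an init helper: unconditionally resets all state. */
static void fake_init_work(struct fake_work *w)
{
	w->data = 0;
}

/* Models a queue helper: refuses an item that is already queued. */
static bool fake_queue_work(struct fake_work *w)
{
	if (w->data & FAKE_WORK_QUEUED)
		return false;
	w->data |= FAKE_WORK_QUEUED;
	return true;
}

/* The state a worker relies on when it later picks the item up. */
static bool fake_work_state_intact(const struct fake_work *w)
{
	return (w->data & FAKE_WORK_QUEUED) != 0;
}

/* Old pattern: initialize on every event, then queue. */
static bool two_events_init_per_event(void)
{
	struct fake_work w;

	fake_init_work(&w);
	fake_queue_work(&w);	/* first event: item is now queued */
	fake_init_work(&w);	/* second event: wipes the queued state */
	/* A worker picking the item up in this window sees reset state. */
	return fake_work_state_intact(&w);
}

/* Fixed pattern: initialize once at creation, only queue on events. */
static bool two_events_init_once(void)
{
	struct fake_work w;
	bool requeued;

	fake_init_work(&w);		/* once, at creation time */
	fake_queue_work(&w);		/* first event */
	requeued = fake_queue_work(&w);	/* second event: already queued */
	return fake_work_state_intact(&w) && !requeued;
}
```

In this model two_events_init_per_event() returns false, because the second initialization discards the state recorded by the first enqueue, whereas two_events_init_once() returns true.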
[1] Trimmed crash stack:
=============================================
BUG: kernel NULL pointer dereference, address: 0000000000000008
kworker/u256:6 ... 6.12.0-0...
Workqueue: cma_netevent_work_handler [rdma_cm] (rdma_cm)
RIP: 0010:process_one_work+0xba/0x31a
Call Trace:
worker_thread+0x266/0x3a0
kthread+0xcf/0x100
ret_from_fork+0x31/0x50
ret_from_fork_asm+0x1a/0x30
=============================================
[2] drgn crash analysis:
>>> trace = prog.crashed_thread().stack_trace()
>>> trace
(0) crash_setup_regs (./arch/x86/include/asm/kexec.h:111:15)
(1) __crash_kexec (kernel/crash_core.c:122:4)
(2) panic (kernel/panic.c:399:3)
(3) oops_end (arch/x86/kernel/dumpstack.c:382:3)
...
(8) process_one_work (kernel/workqueue.c:3168:2)
(9) process_scheduled_works (kernel/workqueue.c:3310:3)
(10) worker_thread (kernel/workqueue.c:3391:4)
(11) kthread (kernel/kthread.c:389:9)
Line workqueue.c:3168 for this kernel version is in process_one_work():
3168 strscpy(worker->desc, pwq->wq->name, WORKER_DESC_LEN);
>>> trace[8]["work"]
*(struct work_struct *)0xffff92577d0a21d8 = {
.data = (atomic_long_t){
.counter = (s64)536870912, <=== Note
},
.entry = (struct list_head){
.next = (struct list_head *)0xffff924d075924c0,
.prev = (struct list_head *)0xffff924d075924c0,
},
.func = (work_func_t)cma_netevent_work_handler+0x0 = 0xffffffffc2cec280,
}
Suspicion is that pwq is NULL:
>>> trace[8]["pwq"]
(struct pool_workqueue *)<absent>
In process_one_work(), pwq is assigned from:
struct pool_workqueue *pwq = get_work_pwq(work);
and get_work_pwq() is:
static struct pool_workqueue *get_work_pwq(struct work_struct *work)
{
unsigned long data = atomic_long_read(&work->data);
if (data & WORK_STRUCT_PWQ)
return work_struct_pwq(data);
else
return NULL;
}
WORK_STRUCT_PWQ is 0x4:
>>> print(repr(prog['WORK_STRUCT_PWQ']))
Object(prog, 'enum work_flags', value=4)
But work->data is 536870912 which is 0x20000000.
So, get_work_pwq() returns NULL and we crash in process_one_work():
3168 strscpy(worker->desc, pwq->wq->name, WORKER_DESC_LEN);
=============================================
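The arithmetic in the analysis above is easy to double-check. The sketch below hard-codes the two values quoted in the dump (the observed .data.counter and the WORK_STRUCT_PWQ value printed by drgn); the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Value of work->data.counter quoted in the drgn dump. */
#define OBSERVED_WORK_DATA 536870912UL

/* Value of WORK_STRUCT_PWQ as printed by drgn for this kernel. */
#define PWQ_FLAG 0x4UL

/* True if the pwq flag is set in the given data word. */
static bool has_pwq_flag(unsigned long data)
{
	return (data & PWQ_FLAG) != 0;
}
```

536870912 is 1 << 29, i.e. 0x20000000, so bit 2 (0x4) is clear and has_pwq_flag(OBSERVED_WORK_DATA) is false, which matches the NULL return from get_work_pwq().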
Fixes: 925d046e7e52 ("RDMA/core: Add a netevent notifier to cma")
Cc: stable(a)vger.kernel.org
Co-developed-by: Håkon Bugge <haakon.bugge(a)oracle.com>
Signed-off-by: Håkon Bugge <haakon.bugge(a)oracle.com>
Signed-off-by: Sharath Srinivasan <sharath.srinivasan(a)oracle.com>
---
v1->v2 cc:stable@vger.kernel.org
---
drivers/infiniband/core/cma.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 91db10515d74..176d0b3e4488 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -72,6 +72,8 @@ static const char * const cma_events[] = {
static void cma_iboe_set_mgid(struct sockaddr *addr, union ib_gid *mgid,
enum ib_gid_type gid_type);
+static void cma_netevent_work_handler(struct work_struct *_work);
+
const char *__attribute_const__ rdma_event_msg(enum rdma_cm_event_type event)
{
size_t index = event;
@@ -1033,6 +1035,7 @@ __rdma_create_id(struct net *net, rdma_cm_event_handler event_handler,
get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num);
id_priv->id.route.addr.dev_addr.net = get_net(net);
id_priv->seq_num &= 0x00ffffff;
+ INIT_WORK(&id_priv->id.net_work, cma_netevent_work_handler);
rdma_restrack_new(&id_priv->res, RDMA_RESTRACK_CM_ID);
if (parent)
@@ -5227,7 +5230,6 @@ static int cma_netevent_callback(struct notifier_block *self,
if (!memcmp(current_id->id.route.addr.dev_addr.dst_dev_addr,
neigh->ha, ETH_ALEN))
continue;
- INIT_WORK(&current_id->id.net_work, cma_netevent_work_handler);
cma_id_get(current_id);
queue_work(cma_wq, &current_id->id.net_work);
}
--
2.39.5 (Apple Git-154)