The patch titled
Subject: mm/damon/sysfs: change next_update_jiffies to a global variable
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-damon-sysfs-change-next_update_jiffies-to-a-global-variable.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Quanmin Yan <yanquanmin1(a)huawei.com>
Subject: mm/damon/sysfs: change next_update_jiffies to a global variable
Date: Thu, 30 Oct 2025 10:07:46 +0800
In DAMON's damon_sysfs_repeat_call_fn(), time_before() is used to compare
the current jiffies with next_update_jiffies to determine whether to
update the sysfs files at this moment.
On 32-bit systems, the kernel initializes jiffies to "-5 minutes" to make
jiffies wrap bugs appear earlier. However, this causes time_before() in
damon_sysfs_repeat_call_fn() to unexpectedly return true during the first
5 minutes after boot (see [1], which fixed an earlier jiffies-related
issue, for more explanation). As a result, DAMON does not update sysfs
files during that period.
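For reference, here is a minimal sketch of why the comparison misfires,
using simplified forms of the relevant jiffies.h definitions (typecheck
omitted):
#define INITIAL_JIFFIES ((unsigned long)(unsigned int)(-300 * HZ))
#define time_before(a, b) ((long)((a) - (b)) < 0)

/*
 * Right after boot on 32-bit, jiffies == INITIAL_JIFFIES, so with the
 * zero-initialized static next_update_jiffies:
 *     time_before(jiffies, 0) == ((long)jiffies < 0) == true
 * until jiffies wraps past zero roughly 5 minutes later.
 */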
There is also an issue unrelated to the system's word size [2]: if the
user stops DAMON just after next_update_jiffies is updated and restarts
it after 'refresh_ms' or a longer delay, next_update_jiffies will retain
a stale value, causing time_before() to return false and the update to
happen earlier than expected.
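A worked timeline of that scenario, with hypothetical numbers (assuming
HZ=1000 and refresh_ms=1000):
/*
 * t=100000  update runs, next_update_jiffies = 101000
 * t=100500  user stops DAMON
 * t=200000  user restarts DAMON; time_before(200000, 101000) is
 *           false, so the sysfs files are refreshed immediately
 *           instead of refresh_ms after the restart
 */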
Fix these issues by making next_update_jiffies a global variable and
initializing it each time DAMON is started.
Link: https://lkml.kernel.org/r/20251030020746.967174-3-yanquanmin1@huawei.com
Link: https://lkml.kernel.org/r/20250822025057.1740854-1-ekffu200098@gmail.com [1]
Link: https://lore.kernel.org/all/20251029013038.66625-1-sj@kernel.org/ [2]
Fixes: d809a7c64ba8 ("mm/damon/sysfs: implement refresh_ms file internal work")
Suggested-by: SeongJae Park <sj(a)kernel.org>
Reviewed-by: SeongJae Park <sj(a)kernel.org>
Signed-off-by: Quanmin Yan <yanquanmin1(a)huawei.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: ze zuo <zuoze1(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/sysfs.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
--- a/mm/damon/sysfs.c~mm-damon-sysfs-change-next_update_jiffies-to-a-global-variable
+++ a/mm/damon/sysfs.c
@@ -1552,16 +1552,17 @@ static struct damon_ctx *damon_sysfs_bui
return ctx;
}
+static unsigned long damon_sysfs_next_update_jiffies;
+
static int damon_sysfs_repeat_call_fn(void *data)
{
struct damon_sysfs_kdamond *sysfs_kdamond = data;
- static unsigned long next_update_jiffies;
if (!sysfs_kdamond->refresh_ms)
return 0;
- if (time_before(jiffies, next_update_jiffies))
+ if (time_before(jiffies, damon_sysfs_next_update_jiffies))
return 0;
- next_update_jiffies = jiffies +
+ damon_sysfs_next_update_jiffies = jiffies +
msecs_to_jiffies(sysfs_kdamond->refresh_ms);
if (!mutex_trylock(&damon_sysfs_lock))
@@ -1607,6 +1608,9 @@ static int damon_sysfs_turn_damon_on(str
}
kdamond->damon_ctx = ctx;
+ damon_sysfs_next_update_jiffies =
+ jiffies + msecs_to_jiffies(kdamond->refresh_ms);
+
repeat_call_control->fn = damon_sysfs_repeat_call_fn;
repeat_call_control->data = kdamond;
repeat_call_control->repeat = true;
_
Patches currently in -mm which might be from yanquanmin1(a)huawei.com are
mm-damon-stat-change-last_refresh_jiffies-to-a-global-variable.patch
mm-damon-sysfs-change-next_update_jiffies-to-a-global-variable.patch
mm-damon-add-a-min_sz_region-parameter-to-damon_set_region_biggest_system_ram_default.patch
mm-damon-reclaim-use-min_sz_region-for-core-address-alignment-when-setting-regions.patch
The patch titled
Subject: mm/damon/stat: change last_refresh_jiffies to a global variable
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-damon-stat-change-last_refresh_jiffies-to-a-global-variable.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Quanmin Yan <yanquanmin1(a)huawei.com>
Subject: mm/damon/stat: change last_refresh_jiffies to a global variable
Date: Thu, 30 Oct 2025 10:07:45 +0800
Patch series "mm/damon: fixes for the jiffies-related issues", v2.
On 32-bit systems, the kernel initializes jiffies to "-5 minutes" to make
jiffies wrap bugs appear earlier. However, this may cause the
time_before() family of functions to return unexpected values, resulting
in DAMON not functioning as intended. Similar issues also arise in
specific user operation scenarios, such as stopping and then restarting
DAMON shortly after a refresh.
This patchset addresses these issues. The first patch is about the
DAMON_STAT module, and the second patch is about the core layer's sysfs.
This patch (of 2):
In DAMON_STAT's damon_stat_damon_call_fn(), time_before_eq() is used to
avoid unnecessarily frequent stat updates.
On 32-bit systems, the kernel initializes jiffies to "-5 minutes" to make
jiffies wrap bugs appear earlier. However, this causes time_before_eq()
in DAMON_STAT to unexpectedly return true during the first 5 minutes
after boot (see [1], which fixed an earlier jiffies-related issue, for
more explanation). As a result, DAMON_STAT does not update any
monitoring results during that period, which is all the more confusing
when DAMON_STAT_ENABLED_DEFAULT is enabled.
There is also an issue unrelated to the system's word size [2]: if the
user stops DAMON_STAT just after last_refresh_jiffies is updated and
restarts it after 5 seconds or a longer delay, last_refresh_jiffies will
retain a stale value, causing time_before_eq() to return false and the
update to happen earlier than expected.
Fix these issues by making last_refresh_jiffies a global variable and
initializing it each time DAMON_STAT is started.
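Condensed from the hunks below, the fix pattern is (a sketch with
elisions):
/* file scope instead of a function-local static */
static unsigned long damon_stat_last_refresh_jiffies;

static int damon_stat_start(void)
{
	/* ... damon_start() ... */

	/*
	 * Reset on every start: covers both the 32-bit initial
	 * jiffies case and a stale value from a previous run.
	 */
	damon_stat_last_refresh_jiffies = jiffies;
	/* ... */
}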
Link: https://lkml.kernel.org/r/20251030020746.967174-2-yanquanmin1@huawei.com
Link: https://lkml.kernel.org/r/20250822025057.1740854-1-ekffu200098@gmail.com [1]
Link: https://lore.kernel.org/all/20251028143250.50144-1-sj@kernel.org/ [2]
Fixes: fabdd1e911da ("mm/damon/stat: calculate and expose estimated memory bandwidth")
Signed-off-by: Quanmin Yan <yanquanmin1(a)huawei.com>
Suggested-by: SeongJae Park <sj(a)kernel.org>
Reviewed-by: SeongJae Park <sj(a)kernel.org>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: ze zuo <zuoze1(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/stat.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
--- a/mm/damon/stat.c~mm-damon-stat-change-last_refresh_jiffies-to-a-global-variable
+++ a/mm/damon/stat.c
@@ -46,6 +46,8 @@ MODULE_PARM_DESC(aggr_interval_us,
static struct damon_ctx *damon_stat_context;
+static unsigned long damon_stat_last_refresh_jiffies;
+
static void damon_stat_set_estimated_memory_bandwidth(struct damon_ctx *c)
{
struct damon_target *t;
@@ -130,13 +132,12 @@ static void damon_stat_set_idletime_perc
static int damon_stat_damon_call_fn(void *data)
{
struct damon_ctx *c = data;
- static unsigned long last_refresh_jiffies;
/* avoid unnecessarily frequent stat update */
- if (time_before_eq(jiffies, last_refresh_jiffies +
+ if (time_before_eq(jiffies, damon_stat_last_refresh_jiffies +
msecs_to_jiffies(5 * MSEC_PER_SEC)))
return 0;
- last_refresh_jiffies = jiffies;
+ damon_stat_last_refresh_jiffies = jiffies;
aggr_interval_us = c->attrs.aggr_interval;
damon_stat_set_estimated_memory_bandwidth(c);
@@ -210,6 +211,8 @@ static int damon_stat_start(void)
err = damon_start(&damon_stat_context, 1, true);
if (err)
return err;
+
+ damon_stat_last_refresh_jiffies = jiffies;
call_control.data = damon_stat_context;
return damon_call(damon_stat_context, &call_control);
}
_
Patches currently in -mm which might be from yanquanmin1(a)huawei.com are
mm-damon-stat-change-last_refresh_jiffies-to-a-global-variable.patch
mm-damon-sysfs-change-next_update_jiffies-to-a-global-variable.patch
mm-damon-add-a-min_sz_region-parameter-to-damon_set_region_biggest_system_ram_default.patch
mm-damon-reclaim-use-min_sz_region-for-core-address-alignment-when-setting-regions.patch
The quilt patch titled
Subject: s390: fix HugeTLB vmemmap optimization crash
has been removed from the -mm tree. Its filename was
s390-fix-hugetlb-vmemmap-optimization-crash.patch
This patch was dropped because an updated version will be issued
------------------------------------------------------
From: Luiz Capitulino <luizcap(a)redhat.com>
Subject: s390: fix HugeTLB vmemmap optimization crash
Date: Tue, 28 Oct 2025 17:15:33 -0400
A reproducible crash occurs when enabling HugeTLB vmemmap optimization
(HVO) on s390. The crash was reproduced and the proposed fix developed
on an s390 KVM guest running under an older hypervisor, as I don't have
access to an LPAR. However, the same issue should occur on bare metal.
Reproducer (it may take a few runs to trigger):
# sysctl vm.hugetlb_optimize_vmemmap=1
# echo 1 > /proc/sys/vm/nr_hugepages
# echo 0 > /proc/sys/vm/nr_hugepages
Crash log:
[ 52.340369] list_del corruption. prev->next should be 000000d382110008, but was 000000d7116d3880. (prev=000000d7116d3910)
[ 52.340420] ------------[ cut here ]------------
[ 52.340424] kernel BUG at lib/list_debug.c:62!
[ 52.340566] monitor event: 0040 ilc:2 [#1]SMP
[ 52.340573] Modules linked in: ctcm fsm qeth ccwgroup zfcp scsi_transport_fc qdio dasd_fba_mod dasd_eckd_mod dasd_mod xfs ghash_s390 prng des_s390 libdes sha3_512_s390 sha3_256_s390 virtio_net virtio_blk net_failover sha_common failover dm_mirror dm_region_hash dm_log dm_mod paes_s390 crypto_engine pkey_cca pkey_ep11 zcrypt pkey_pckmo pkey aes_s390
[ 52.340606] CPU: 1 UID: 0 PID: 1672 Comm: root-rep2 Kdump: loaded Not tainted 6.18.0-rc3 #1 NONE
[ 52.340610] Hardware name: IBM 3931 LA1 400 (KVM/Linux)
[ 52.340611] Krnl PSW : 0704c00180000000 0000015710cda7fe (__list_del_entry_valid_or_report+0xfe/0x128)
[ 52.340619] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[ 52.340622] Krnl GPRS: c0000000ffffefff 0000000100000027 000000000000006d 0000000000000000
[ 52.340623] 000000d7116d35d8 000000d7116d35d0 0000000000000002 000000d7116d39b0
[ 52.340625] 000000d7116d3880 000000d7116d3910 000000d7116d3910 000000d382110008
[ 52.340626] 000003ffac1ccd08 000000d7116d39b0 0000015710cda7fa 000000d7116d37d0
[ 52.340632] Krnl Code: 0000015710cda7ee: c020003e496f larl %r2,00000157114a3acc
0000015710cda7f4: c0e5ffd5280e brasl %r14,000001571077f810
#0000015710cda7fa: af000000 mc 0,0
>0000015710cda7fe: b9040029 lgr %r2,%r9
0000015710cda802: c0e5ffe5e193 brasl %r14,0000015710996b28
0000015710cda808: e34090080004 lg %r4,8(%r9)
0000015710cda80e: b9040059 lgr %r5,%r9
0000015710cda812: b9040038 lgr %r3,%r8
[ 52.340643] Call Trace:
[ 52.340645] [<0000015710cda7fe>] __list_del_entry_valid_or_report+0xfe/0x128
[ 52.340649] ([<0000015710cda7fa>] __list_del_entry_valid_or_report+0xfa/0x128)
[ 52.340652] [<0000015710a30b2e>] hugetlb_vmemmap_restore_folios+0x96/0x138
[ 52.340655] [<0000015710a268ac>] update_and_free_pages_bulk+0x64/0x150
[ 52.340659] [<0000015710a26f8a>] set_max_huge_pages+0x4ca/0x6f0
[ 52.340662] [<0000015710a273ba>] hugetlb_sysctl_handler_common+0xea/0x120
[ 52.340665] [<0000015710a27484>] hugetlb_sysctl_handler+0x44/0x50
[ 52.340667] [<0000015710b53ffa>] proc_sys_call_handler+0x17a/0x280
[ 52.340672] [<0000015710a90968>] vfs_write+0x2c8/0x3a0
[ 52.340676] [<0000015710a90bd2>] ksys_write+0x72/0x100
[ 52.340679] [<00000157111483a8>] __do_syscall+0x150/0x318
[ 52.340682] [<0000015711153a5e>] system_call+0x6e/0x90
[ 52.340684] Last Breaking-Event-Address:
[ 52.340684] [<000001571077f85c>] _printk+0x4c/0x58
[ 52.340690] Kernel panic - not syncing: Fatal exception: panic_on_oops
This issue was introduced by commit f13b83fdd996 ("hugetlb: batch TLB
flushes when freeing vmemmap"). Before that change, the HVO
implementation called flush_tlb_kernel_range() each time a vmemmap PMD
split and remapping was performed. The mentioned commit changed this to
issue a few flush_tlb_all() calls after performing all remappings.
However, on s390, flush_tlb_kernel_range() expands to
__tlb_flush_kernel(), while flush_tlb_all() is defined as a no-op. As a
result, we went from flushing the TLB for every remapping to no flushing
at all.
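In simplified terms, the batching change amounted to the following (a
sketch with hypothetical helpers, not the actual hugetlb_vmemmap code
paths):
/* before f13b83fdd996: flush once per remapped range */
for_each_vmemmap_range(addr, end) {		/* hypothetical helper */
	remap_range(addr, end);			/* hypothetical helper */
	flush_tlb_kernel_range(addr, end);	/* s390: __tlb_flush_kernel() */
}

/* after: remap everything, then flush once */
for_each_vmemmap_range(addr, end)
	remap_range(addr, end);
flush_tlb_all();				/* s390: a no-op until this fix */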
Fix this by implementing flush_tlb_all() on s390 as a wrapper around
__tlb_flush_global(). This flushes all TLB entries on all CPUs, as the
flush_tlb_all() semantics expect.
Link: https://lkml.kernel.org/r/20251028211533.47694-1-luizcap@redhat.com
Fixes: f13b83fdd996 ("hugetlb: batch TLB flushes when freeing vmemmap")
Signed-off-by: Luiz Capitulino <luizcap(a)redhat.com>
Acked-by: Alexander Gordeev <agordeev(a)linux.ibm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)kernel.org>
Cc: Christian Borntraeger <borntraeger(a)linux.ibm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Joao Martins <joao.m.martins(a)oracle.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Vasily Gorbik <gor(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/s390/include/asm/tlbflush.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/arch/s390/include/asm/tlbflush.h~s390-fix-hugetlb-vmemmap-optimization-crash
+++ a/arch/s390/include/asm/tlbflush.h
@@ -103,9 +103,13 @@ static inline void __tlb_flush_mm_lazy(s
* flush_tlb_range functions need to do the flush.
*/
#define flush_tlb() do { } while (0)
-#define flush_tlb_all() do { } while (0)
#define flush_tlb_page(vma, addr) do { } while (0)
+static inline void flush_tlb_all(void)
+{
+ __tlb_flush_global();
+}
+
static inline void flush_tlb_mm(struct mm_struct *mm)
{
__tlb_flush_mm_lazy(mm);
_
Patches currently in -mm which might be from luizcap(a)redhat.com are
The patch titled
Subject: maple_tree: fix tracepoint string pointers
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
maple_tree-fix-tracepoint-string-pointers.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Martin Kaiser <martin(a)kaiser.cx>
Subject: maple_tree: fix tracepoint string pointers
Date: Thu, 30 Oct 2025 16:55:05 +0100
maple_tree tracepoints contain pointers to function names. Such a pointer
is saved when a tracepoint logs an event, but there is no guarantee that
it is still valid when the event is parsed later and the pointer is
dereferenced. The kernel warns about these unsafe pointers:
event 'ma_read' has unsafe pointer field 'fn'
WARNING: kernel/trace/trace.c:3779 at ignore_event+0x1da/0x1e4
Mark the function names as tracepoint_string() to fix the events.
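tracepoint_string() keeps the literal in a dedicated section
(__tracepoint_str) so the recorded pointer remains resolvable whenever
the ring buffer is read later. The pattern the patch applies throughout,
in sketch form:
#define TP_FCT tracepoint_string(__func__)

/* before: a raw __func__ pointer is stored in the ring buffer */
trace_ma_op(__func__, mas);

/* after: a persistent string pointer that event parsing can resolve */
trace_ma_op(TP_FCT, mas);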
Link: https://lkml.kernel.org/r/20251030155537.87972-1-martin@kaiser.cx
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Martin Kaiser <martin(a)kaiser.cx>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 30 ++++++++++++++++--------------
1 file changed, 16 insertions(+), 14 deletions(-)
--- a/lib/maple_tree.c~maple_tree-fix-tracepoint-string-pointers
+++ a/lib/maple_tree.c
@@ -64,6 +64,8 @@
#define CREATE_TRACE_POINTS
#include <trace/events/maple_tree.h>
+#define TP_FCT tracepoint_string(__func__)
+
/*
* Kernel pointer hashing renders much of the maple tree dump useless as tagged
* pointers get hashed to arbitrary values.
@@ -2756,7 +2758,7 @@ static inline void mas_rebalance(struct
MA_STATE(l_mas, mas->tree, mas->index, mas->last);
MA_STATE(r_mas, mas->tree, mas->index, mas->last);
- trace_ma_op(__func__, mas);
+ trace_ma_op(TP_FCT, mas);
/*
* Rebalancing occurs if a node is insufficient. Data is rebalanced
@@ -2997,7 +2999,7 @@ static void mas_split(struct ma_state *m
MA_STATE(prev_l_mas, mas->tree, mas->index, mas->last);
MA_STATE(prev_r_mas, mas->tree, mas->index, mas->last);
- trace_ma_op(__func__, mas);
+ trace_ma_op(TP_FCT, mas);
mast.l = &l_mas;
mast.r = &r_mas;
@@ -3172,7 +3174,7 @@ static bool mas_is_span_wr(struct ma_wr_
return false;
}
- trace_ma_write(__func__, wr_mas->mas, wr_mas->r_max, entry);
+ trace_ma_write(TP_FCT, wr_mas->mas, wr_mas->r_max, entry);
return true;
}
@@ -3416,7 +3418,7 @@ static noinline void mas_wr_spanning_sto
* of data may happen.
*/
mas = wr_mas->mas;
- trace_ma_op(__func__, mas);
+ trace_ma_op(TP_FCT, mas);
if (unlikely(!mas->index && mas->last == ULONG_MAX))
return mas_new_root(mas, wr_mas->entry);
@@ -3552,7 +3554,7 @@ done:
} else {
memcpy(wr_mas->node, newnode, sizeof(struct maple_node));
}
- trace_ma_write(__func__, mas, 0, wr_mas->entry);
+ trace_ma_write(TP_FCT, mas, 0, wr_mas->entry);
mas_update_gap(mas);
mas->end = new_end;
return;
@@ -3596,7 +3598,7 @@ static inline void mas_wr_slot_store(str
mas->offset++; /* Keep mas accurate. */
}
- trace_ma_write(__func__, mas, 0, wr_mas->entry);
+ trace_ma_write(TP_FCT, mas, 0, wr_mas->entry);
/*
* Only update gap when the new entry is empty or there is an empty
* entry in the original two ranges.
@@ -3717,7 +3719,7 @@ static inline void mas_wr_append(struct
mas_update_gap(mas);
mas->end = new_end;
- trace_ma_write(__func__, mas, new_end, wr_mas->entry);
+ trace_ma_write(TP_FCT, mas, new_end, wr_mas->entry);
return;
}
@@ -3731,7 +3733,7 @@ static void mas_wr_bnode(struct ma_wr_st
{
struct maple_big_node b_node;
- trace_ma_write(__func__, wr_mas->mas, 0, wr_mas->entry);
+ trace_ma_write(TP_FCT, wr_mas->mas, 0, wr_mas->entry);
memset(&b_node, 0, sizeof(struct maple_big_node));
mas_store_b_node(wr_mas, &b_node, wr_mas->offset_end);
mas_commit_b_node(wr_mas, &b_node);
@@ -5062,7 +5064,7 @@ void *mas_store(struct ma_state *mas, vo
{
MA_WR_STATE(wr_mas, mas, entry);
- trace_ma_write(__func__, mas, 0, entry);
+ trace_ma_write(TP_FCT, mas, 0, entry);
#ifdef CONFIG_DEBUG_MAPLE_TREE
if (MAS_WARN_ON(mas, mas->index > mas->last))
pr_err("Error %lX > %lX " PTR_FMT "\n", mas->index, mas->last,
@@ -5163,7 +5165,7 @@ void mas_store_prealloc(struct ma_state
}
store:
- trace_ma_write(__func__, mas, 0, entry);
+ trace_ma_write(TP_FCT, mas, 0, entry);
mas_wr_store_entry(&wr_mas);
MAS_WR_BUG_ON(&wr_mas, mas_is_err(mas));
mas_destroy(mas);
@@ -5882,7 +5884,7 @@ void *mtree_load(struct maple_tree *mt,
MA_STATE(mas, mt, index, index);
void *entry;
- trace_ma_read(__func__, &mas);
+ trace_ma_read(TP_FCT, &mas);
rcu_read_lock();
retry:
entry = mas_start(&mas);
@@ -5925,7 +5927,7 @@ int mtree_store_range(struct maple_tree
MA_STATE(mas, mt, index, last);
int ret = 0;
- trace_ma_write(__func__, &mas, 0, entry);
+ trace_ma_write(TP_FCT, &mas, 0, entry);
if (WARN_ON_ONCE(xa_is_advanced(entry)))
return -EINVAL;
@@ -6148,7 +6150,7 @@ void *mtree_erase(struct maple_tree *mt,
void *entry = NULL;
MA_STATE(mas, mt, index, index);
- trace_ma_op(__func__, &mas);
+ trace_ma_op(TP_FCT, &mas);
mtree_lock(mt);
entry = mas_erase(&mas);
@@ -6485,7 +6487,7 @@ void *mt_find(struct maple_tree *mt, uns
unsigned long copy = *index;
#endif
- trace_ma_read(__func__, &mas);
+ trace_ma_read(TP_FCT, &mas);
if ((*index) > max)
return NULL;
_
Patches currently in -mm which might be from martin(a)kaiser.cx are
maple_tree-fix-tracepoint-string-pointers.patch
A folio split clears PG_has_hwpoisoned, but the flag should be preserved
in after-split folios that contain pages with the PG_hwpoisoned flag when
the folio is split to >0 order folios. Scan all pages in a to-be-split
folio to determine which after-split folios need the flag.
An alternative is to change PG_has_hwpoisoned to PG_maybe_hwpoisoned to
avoid the scan and set it on all after-split folios, but the resulting
false positives would have an undesirable negative impact. To remove the
false positives, callers of folio_test_has_hwpoisoned() and
folio_contain_hwpoisoned_page() would need to do the scan themselves.
That would be a hassle for current and future callers and more costly
than doing the scan in the split code. More details are discussed in [1].
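For illustration, under the rejected PG_maybe_hwpoisoned approach every
caller would need something like the following hypothetical re-scan
(reusing page_range_has_hwpoisoned() from the diff below):
/* each caller would have to re-scan to filter out false positives */
if (folio_test_has_hwpoisoned(folio) &&
    page_range_has_hwpoisoned(folio_page(folio, 0),
			      folio_nr_pages(folio))) {
	/* the folio really contains a hwpoisoned page */
}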
This issue can be exposed via:
1. splitting a has_hwpoisoned folio to >0 order from the debugfs
   interface;
2. truncating part of a has_hwpoisoned folio in
   truncate_inode_partial_folio().
Later accesses to a hwpoisoned page then become possible due to the
missing has_hwpoisoned folio flag, which will lead to MCE errors.
Link: https://lore.kernel.org/all/CAHbLzkoOZm0PXxE9qwtF4gKR=cpRXrSrJ9V9Pm2DJexs98… [1]
Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
Cc: stable(a)vger.kernel.org
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
---
From V3[1]:
1. Separated from the original series;
2. Added Fixes tag and cc'd stable;
3. Simplified page_range_has_hwpoisoned();
4. Renamed check_poisoned_pages to handle_hwpoison, made it const, and
shorten the statement;
5. Removed poisoned_new_folio variable and checked the condition
directly.
[1] https://lore.kernel.org/all/20251022033531.389351-2-ziy@nvidia.com/
mm/huge_memory.c | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index fc65ec3393d2..5215bb6aecfc 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3455,6 +3455,14 @@ bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins)
caller_pins;
}
+static bool page_range_has_hwpoisoned(struct page *page, long nr_pages)
+{
+ for (; nr_pages; page++, nr_pages--)
+ if (PageHWPoison(page))
+ return true;
+ return false;
+}
+
/*
* It splits @folio into @new_order folios and copies the @folio metadata to
* all the resulting folios.
@@ -3462,17 +3470,24 @@ bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins)
static void __split_folio_to_order(struct folio *folio, int old_order,
int new_order)
{
+ /* Scan poisoned pages when split a poisoned folio to large folios */
+ const bool handle_hwpoison = folio_test_has_hwpoisoned(folio) && new_order;
long new_nr_pages = 1 << new_order;
long nr_pages = 1 << old_order;
long i;
+ folio_clear_has_hwpoisoned(folio);
+
+ /* Check first new_nr_pages since the loop below skips them */
+ if (handle_hwpoison &&
+ page_range_has_hwpoisoned(folio_page(folio, 0), new_nr_pages))
+ folio_set_has_hwpoisoned(folio);
/*
* Skip the first new_nr_pages, since the new folio from them have all
* the flags from the original folio.
*/
for (i = new_nr_pages; i < nr_pages; i += new_nr_pages) {
struct page *new_head = &folio->page + i;
-
/*
* Careful: new_folio is not a "real" folio before we cleared PageTail.
* Don't pass it around before clear_compound_head().
@@ -3514,6 +3529,10 @@ static void __split_folio_to_order(struct folio *folio, int old_order,
(1L << PG_dirty) |
LRU_GEN_MASK | LRU_REFS_MASK));
+ if (handle_hwpoison &&
+ page_range_has_hwpoisoned(new_head, new_nr_pages))
+ folio_set_has_hwpoisoned(new_folio);
+
new_folio->mapping = folio->mapping;
new_folio->index = folio->index + i;
@@ -3600,8 +3619,6 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
int start_order = uniform_split ? new_order : old_order - 1;
int split_order;
- folio_clear_has_hwpoisoned(folio);
-
/*
* split to new_order one order at a time. For uniform split,
* folio is split to new_order directly.
--
2.51.0
A kernel memory leak was identified by the 'ioctl_sg01' test from the
Linux Test Project (LTP). The following bytes were mainly observed:
0x53425355.
When USB storage devices incorrectly skip the data phase with status
data, the code extracts and validates the CSW from the sg buffer, but
fails to clear it afterwards. This leaves status protocol data in srb's
transfer buffer, such as the US_BULK_CS_SIGN 'USBS' signature observed
here. Thus, USB protocol data leaks to user space through the SCSI
generic (/dev/sg*) interfaces, as seen here when the LTP test requested
512 KiB.
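For reference, 0x53425355 is the little-endian 'USBS' signature of the
Bulk-only Command Status Wrapper; its layout, from
include/linux/usb/storage.h:
struct bulk_cs_wrap {
	__le32	Signature;	/* 'USBS' = US_BULK_CS_SIGN (0x53425355) */
	__u32	Tag;		/* same as the original command's tag */
	__le32	Residue;	/* amount of data not transferred */
	__u8	Status;		/* command status */
};				/* US_BULK_CS_WRAP_LEN == 13 bytes */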
Fix the leak by zeroing the CSW data in srb's transfer buffer
immediately after validation on devices that skip the data phase.
Note: unlike CVE-2018-1000204, which addressed a big leak by zeroing
pages at allocation time, this leak occurs after allocation, when USB
protocol data is written to already-allocated sg pages.
Fixes: a45b599ad808 ("scsi: sg: allocate with __GFP_ZERO in sg_build_indirect()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Desnes Nunes <desnesn(a)redhat.com>
---
drivers/usb/storage/transport.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/usb/storage/transport.c b/drivers/usb/storage/transport.c
index 1aa1bd26c81f..8e9f6459e197 100644
--- a/drivers/usb/storage/transport.c
+++ b/drivers/usb/storage/transport.c
@@ -1200,7 +1200,17 @@ int usb_stor_Bulk_transport(struct scsi_cmnd *srb, struct us_data *us)
US_BULK_CS_WRAP_LEN &&
bcs->Signature ==
cpu_to_le32(US_BULK_CS_SIGN)) {
+ unsigned char buf[US_BULK_CS_WRAP_LEN];
+
+ sg = NULL;
+ offset = 0;
+ memset(buf, 0, US_BULK_CS_WRAP_LEN);
usb_stor_dbg(us, "Device skipped data phase\n");
+
+ if (usb_stor_access_xfer_buf(buf, US_BULK_CS_WRAP_LEN, srb,
+ &sg, &offset, TO_XFER_BUF) != US_BULK_CS_WRAP_LEN)
+ usb_stor_dbg(us, "Failed to clear CSW data\n");
+
scsi_set_resid(srb, transfer_length);
goto skipped_data_phase;
}
--
2.51.0
From: Al Viro <viro(a)zeniv.linux.org.uk>
commit c28f922c9dcee0e4876a2c095939d77fe7e15116 upstream.
What we want is to verify that clone won't expose something hidden by a
mount we wouldn't be able to undo. "Wouldn't be able to undo" may be a
result of MNT_LOCKED on a child, but it may also come from lacking admin
rights in the userns of the namespace the mount belongs to.
clone_private_mnt() checks the former, but not the latter.
There are a number of rather confusing CAP_SYS_ADMIN checks in various
userns during the mount, especially with the new mount API; they serve
different purposes, and in the case of clone_private_mnt() they usually,
but not always, end up covering the missing check mentioned above.
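"Able to undo" therefore has two components, and only the first was
checked; in summary (mirroring the hunk below):
/* 1) no children pinned by MNT_LOCKED (already checked) */
if (has_locked_children(old_mnt, path->dentry))
	goto invalid;

/* 2) admin rights in the userns owning the source mount namespace
 *    (the missing check this patch adds) */
if (!ns_capable(old_mnt->mnt_ns->user_ns, CAP_SYS_ADMIN))
	return ERR_PTR(-EPERM);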
Reviewed-by: Christian Brauner <brauner(a)kernel.org>
Reported-by: "Orlando, Noah" <Noah.Orlando(a)deshaw.com>
Fixes: 427215d85e8d ("ovl: prevent private clone if bind mount is not allowed")
Signed-off-by: Al Viro <viro(a)zeniv.linux.org.uk>
[ merge conflict resolution: clone_private_mount() was reworked in
db04662e2f4f ("fs: allow detached mounts in clone_private_mount()").
Tweak the relevant ns_capable check so that it works on older kernels ]
Signed-off-by: Noah Orlando <Noah.Orlando(a)deshaw.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
(cherry picked from commit 36fecd740de2d542d2091d65d36554ee2bcf9c65)
[ kovalev: bp to fix CVE-2025-38499 ]
Signed-off-by: Vasiliy Kovalev <kovalev(a)altlinux.org>
---
fs/namespace.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/fs/namespace.c b/fs/namespace.c
index e9b8d516f191..b6d9250e1b38 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1965,6 +1965,11 @@ struct vfsmount *clone_private_mount(const struct path *path)
if (!check_mnt(old_mnt))
goto invalid;
+ if (!ns_capable(old_mnt->mnt_ns->user_ns, CAP_SYS_ADMIN)) {
+ up_read(&namespace_sem);
+ return ERR_PTR(-EPERM);
+ }
+
if (has_locked_children(old_mnt, path->dentry))
goto invalid;
--
2.50.1
From: Moon Hee Lee <moonhee.lee.ca(a)gmail.com>
commit 16ecdab5446f15a61ec88eb0d23d25d009821db0 upstream.
syzbot triggered a WARN in ieee80211_tdls_oper() by sending
NL80211_TDLS_ENABLE_LINK immediately after NL80211_CMD_CONNECT,
before association completed and without prior TDLS setup.
This left internal state like sdata->u.mgd.tdls_peer uninitialized,
leading to a WARN_ON() in code paths that assumed it was valid.
Reject the operation early if not in station mode or not associated.
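Roughly, the triggering sequence looks like this (paraphrasing the
syzbot reproducer):
/*
 * NL80211_CMD_CONNECT        association started, not yet complete
 * NL80211_TDLS_ENABLE_LINK   reaches ieee80211_tdls_oper() with
 *                            sdata->u.mgd.tdls_peer never set up,
 *                            hitting the WARN_ON()
 */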
Reported-by: syzbot+f73f203f8c9b19037380(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f73f203f8c9b19037380
Fixes: 81dd2b882241 ("mac80211: move TDLS data to mgd private part")
Tested-by: syzbot+f73f203f8c9b19037380(a)syzkaller.appspotmail.com
Signed-off-by: Moon Hee Lee <moonhee.lee.ca(a)gmail.com>
Link: https://patch.msgid.link/20250715230904.661092-2-moonhee.lee.ca@gmail.com
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
[ kovalev: bp to fix CVE-2025-38644; adapted check from !sdata->vif.cfg.assoc
to !sdata->vif.bss_conf.assoc due to the older kernel not having the mac80211
configuration refactoring (see upstream commit f276e20b182d) ]
Signed-off-by: Vasiliy Kovalev <kovalev(a)altlinux.org>
---
net/mac80211/tdls.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/mac80211/tdls.c b/net/mac80211/tdls.c
index 137be9ec94af..69a67fcf1112 100644
--- a/net/mac80211/tdls.c
+++ b/net/mac80211/tdls.c
@@ -1349,7 +1349,7 @@ int ieee80211_tdls_oper(struct wiphy *wiphy, struct net_device *dev,
if (!(wiphy->flags & WIPHY_FLAG_SUPPORTS_TDLS))
return -ENOTSUPP;
- if (sdata->vif.type != NL80211_IFTYPE_STATION)
+ if (sdata->vif.type != NL80211_IFTYPE_STATION || !sdata->vif.bss_conf.assoc)
return -EINVAL;
switch (oper) {
--
2.50.1