damon_lru_sort_apply_parameters() allocates a new DAMON context, stages
user-specified DAMON parameters on it, and commits them to the running
DAMON context at once, using damon_commit_ctx(). The code, however,
directly updates the monitoring attributes of the running context, and
those attributes are then overwritten by the later damon_commit_ctx()
call. This means the monitoring attribute parameters do not really take
effect.
Fix the wrong use of the parameter context.
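For reference, the parameter-staging flow is meant to look roughly like
the following (a simplified sketch, not the actual lru_sort code; the
damon_commit_ctx() argument order and the error handling here are
assumptions):
/*
 * Sketch: stage everything on a dedicated parameter context, then
 * commit it to the running context in one go.
 */
static int apply_parameters_sketch(struct damon_ctx *running_ctx)
{
	struct damon_ctx *param_ctx = damon_new_ctx();
	int err;
	if (!param_ctx)
		return -ENOMEM;
	/* Stage the monitoring attributes on the parameter context... */
	err = damon_set_attrs(param_ctx, &damon_lru_sort_mon_attrs);
	if (err)
		goto out;
	/* ...and commit them to the running context at once. */
	err = damon_commit_ctx(running_ctx, param_ctx);
out:
	damon_destroy_ctx(param_ctx);
	return err;
}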
Fixes: a30969436428 ("mm/damon/lru_sort: use damon_commit_ctx()")
Cc: <stable(a)vger.kernel.org> # 6.11.x
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Reviewed-by: Joshua Hahn <joshua.hahnjy(a)gmail.com>
---
This was part of the misc fixes and improvements for 6.18 [1], but Joshua
thankfully found that it fixes a real user-visible bug, so I am sending it
separately as a hotfix.
[1] https://lore.kernel.org/20250915015807.101505-4-sj@kernel.org
mm/damon/lru_sort.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/damon/lru_sort.c b/mm/damon/lru_sort.c
index 14d31009c09e..ab6173a646bd 100644
--- a/mm/damon/lru_sort.c
+++ b/mm/damon/lru_sort.c
@@ -219,7 +219,7 @@ static int damon_lru_sort_apply_parameters(void)
goto out;
}
- err = damon_set_attrs(ctx, &damon_lru_sort_mon_attrs);
+ err = damon_set_attrs(param_ctx, &damon_lru_sort_mon_attrs);
if (err)
goto out;
base-commit: ea93a9235c1c6e61cfa6e5612b7b6b3fc41e79e1
--
2.39.5
From: Shawn Guo <shawnguo(a)kernel.org>
A regression is seen with the 6.6 -> 6.12 kernel upgrade on platforms where
the cpufreq-dt driver sets cpuinfo.transition_latency to CPUFREQ_ETERNAL (-1)
because the platform's DT does not provide the optional property
'clock-latency-ns'. The dbs sampling_rate was 10000 us on 6.6 and
suddenly becomes 6442450 us (4294967295 / 1000 * 1.5) on 6.12 for these
platforms, because the 10 ms cap on transition_delay_us was
accidentally dropped by the commits below.
commit 37c6dccd6837 ("cpufreq: Remove LATENCY_MULTIPLIER")
commit a755d0e2d41b ("cpufreq: Honour transition_latency over transition_delay_us")
commit e13aa799c2a6 ("cpufreq: Change default transition delay to 2ms")
This dramatically slows down the dbs governor's reaction to CPU load
changes. Also, as transition_delay_us is used by the schedutil governor
as rate_limit_us, it has a negative impact on device idle power
consumption, because the device gets slightly less time in the lowest OPP.
Fix the regressions by adding the 10 ms cap on transition delay back.
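The numbers above can be reproduced with a quick userspace check (a
back-of-the-envelope sketch, not kernel code; only the CPUFREQ_ETERNAL
value and the 50% / 10 ms rules are taken from the driver):
#include <stdio.h>
int main(void)
{
	unsigned int latency_ns = (unsigned int)-1;	/* CPUFREQ_ETERNAL */
	unsigned int latency_us = latency_ns / 1000;	/* 4294967 us */
	unsigned int delay = latency_us + (latency_us >> 1); /* +50% */
	printf("uncapped delay: %u us\n", delay);	/* 6442450 us */
	if (delay > 10 * 1000)
		delay = 10 * 1000;			/* restored 10 ms cap */
	printf("capped delay:   %u us\n", delay);	/* 10000 us */
	return 0;
}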
Cc: stable(a)vger.kernel.org
Fixes: 37c6dccd6837 ("cpufreq: Remove LATENCY_MULTIPLIER")
Signed-off-by: Shawn Guo <shawnguo(a)kernel.org>
---
drivers/cpufreq/cpufreq.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index fc7eace8b65b..36e0c85cb4e0 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -551,8 +551,13 @@ unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy)
latency = policy->cpuinfo.transition_latency / NSEC_PER_USEC;
if (latency)
- /* Give a 50% breathing room between updates */
- return latency + (latency >> 1);
+ /*
+ * Give a 50% breathing room between updates.
+ * And cap the transition delay to 10 ms for platforms
+ * where the latency is too high to be reasonable for
+ * reevaluating frequency.
+ */
+ return min(latency + (latency >> 1), 10 * MSEC_PER_SEC);
return USEC_PER_MSEC;
}
--
2.43.0
This series should fix the recent instabilities seen by MPTCP and NIPA
CIs where the 'mptcp_connect.sh' tests fail regularly when running the
'disconnect' subtests with "plain" TCP sockets, e.g.
# INFO: disconnect
# 63 ns1 MPTCP -> ns1 (10.0.1.1:20001 ) MPTCP (duration 996ms) [ OK ]
# 64 ns1 MPTCP -> ns1 (10.0.1.1:20002 ) TCP (duration 851ms) [ OK ]
# 65 ns1 TCP -> ns1 (10.0.1.1:20003 ) MPTCP Unexpected revents: POLLERR/POLLNVAL(19)
# (duration 896ms) [FAIL] file received by server does not match (in, out):
# -rw-r--r-- 1 root root 11112852 Aug 19 09:16 /tmp/tmp.hlJe5DoMoq.disconnect
# Trailing bytes are:
# /{ga 6@=#.8:-rw------- 1 root root 10085368 Aug 19 09:16 /tmp/tmp.blClunilxx
# Trailing bytes are:
# /{ga 6@=#.8:66 ns1 MPTCP -> ns1 (dead:beef:1::1:20004) MPTCP (duration 987ms) [ OK ]
# 67 ns1 MPTCP -> ns1 (dead:beef:1::1:20005) TCP (duration 911ms) [ OK ]
# 68 ns1 TCP -> ns1 (dead:beef:1::1:20006) MPTCP (duration 980ms) [ OK ]
# [FAIL] Tests of the full disconnection have failed
These issues started to be visible after some behavioural changes in
TCP, where overly quick re-connections after a shutdown() can now be
rejected more easily. Patch 3 modifies the selftests to wait, but this
change revealed an issue in MPTCP which is fixed by patch 1 (a fix for
the v5.9 kernel).
Patches 2 and 4 improve some errors reported by the selftests, and patch
5 helps with the debugging of such issues.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Note: The last two patches are not strictly fixes, but they are useful
in case similar issues happen again. That's why they have been added
here in this series for -net. If that's an issue, please drop them, and
I can re-send them later on.
---
Matthieu Baerts (NGI0) (5):
mptcp: propagate shutdown to subflows when possible
selftests: mptcp: connect: catch IO errors on listen side
selftests: mptcp: avoid spurious errors on TCP disconnect
selftests: mptcp: print trailing bytes with od
selftests: mptcp: connect: print pcap prefix
net/mptcp/protocol.c | 16 ++++++++++++++++
tools/testing/selftests/net/mptcp/mptcp_connect.c | 11 ++++++-----
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 6 +++++-
tools/testing/selftests/net/mptcp/mptcp_lib.sh | 2 +-
4 files changed, 28 insertions(+), 7 deletions(-)
---
base-commit: 2690cb089502b80b905f2abdafd1bf2d54e1abef
change-id: 20250912-net-mptcp-fix-sft-connect-f095ad7a6e36
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
The 'sess->rpc_handle_list' XArray manages RPC handles within a ksmbd
session. Access to this list is intended to be protected by
'sess->rpc_lock' (an rw_semaphore). However, the locking implementation was
flawed, leading to potential race conditions.
In ksmbd_session_rpc_open(), the code incorrectly acquired only a read lock
before calling xa_store() and xa_erase(). Since these operations modify
the XArray structure, a write lock is required to ensure exclusive access
and prevent data corruption from concurrent modifications.
Furthermore, ksmbd_session_rpc_method() accessed the list using xa_load()
without holding any lock at all. This could lead to reading inconsistent
data or a potential use-after-free if an entry is concurrently removed and
the pointer is dereferenced.
Fix these issues by:
1. Using down_write() and up_write() in ksmbd_session_rpc_open()
to ensure exclusive access during XArray modification, and ensuring
the lock is correctly released on error paths.
2. Adding down_read() and up_read() in ksmbd_session_rpc_method()
to safely protect the lookup.
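In simplified form, the resulting locking pattern is (an illustrative
sketch, not the actual ksmbd code):
/* Modification path: exclusive lock, since xa_store()/xa_erase()
 * change the XArray structure. */
static int rpc_store_sketch(struct ksmbd_session *sess, int id,
			    struct ksmbd_session_rpc *entry)
{
	void *old;
	down_write(&sess->rpc_lock);
	old = xa_store(&sess->rpc_handle_list, id, entry, KSMBD_DEFAULT_GFP);
	up_write(&sess->rpc_lock);
	return xa_is_err(old) ? xa_err(old) : 0;
}
/* Lookup path: a shared lock is enough, but it must be held so the
 * entry cannot be freed while we dereference it. */
static int rpc_method_sketch(struct ksmbd_session *sess, int id)
{
	struct ksmbd_session_rpc *entry;
	int method;
	down_read(&sess->rpc_lock);
	entry = xa_load(&sess->rpc_handle_list, id);
	method = entry ? entry->method : 0;
	up_read(&sess->rpc_lock);
	return method;
}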
Fixes: a1f46c99d9ea ("ksmbd: fix use-after-free in ksmbd_session_rpc_open")
Fixes: b685757c7b08 ("ksmbd: Implements sess->rpc_handle_list as xarray")
Cc: stable(a)vger.kernel.org
Signed-off-by: Yunseong Kim <ysk(a)kzalloc.com>
---
fs/smb/server/mgmt/user_session.c | 26 +++++++++++++++++---------
1 file changed, 17 insertions(+), 9 deletions(-)
diff --git a/fs/smb/server/mgmt/user_session.c b/fs/smb/server/mgmt/user_session.c
index 9dec4c2940bc..b36d0676dbe5 100644
--- a/fs/smb/server/mgmt/user_session.c
+++ b/fs/smb/server/mgmt/user_session.c
@@ -104,29 +104,32 @@ int ksmbd_session_rpc_open(struct ksmbd_session *sess, char *rpc_name)
if (!entry)
return -ENOMEM;
- down_read(&sess->rpc_lock);
entry->method = method;
entry->id = id = ksmbd_ipc_id_alloc();
if (id < 0)
goto free_entry;
+
+ down_write(&sess->rpc_lock);
old = xa_store(&sess->rpc_handle_list, id, entry, KSMBD_DEFAULT_GFP);
- if (xa_is_err(old))
+ if (xa_is_err(old)) {
+ up_write(&sess->rpc_lock);
goto free_id;
+ }
resp = ksmbd_rpc_open(sess, id);
- if (!resp)
- goto erase_xa;
+ if (!resp) {
+ xa_erase(&sess->rpc_handle_list, entry->id);
+ up_write(&sess->rpc_lock);
+ goto free_id;
+ }
- up_read(&sess->rpc_lock);
+ up_write(&sess->rpc_lock);
kvfree(resp);
return id;
-erase_xa:
- xa_erase(&sess->rpc_handle_list, entry->id);
free_id:
ksmbd_rpc_id_free(entry->id);
free_entry:
kfree(entry);
- up_read(&sess->rpc_lock);
return -EINVAL;
}
@@ -144,9 +147,14 @@ void ksmbd_session_rpc_close(struct ksmbd_session *sess, int id)
int ksmbd_session_rpc_method(struct ksmbd_session *sess, int id)
{
struct ksmbd_session_rpc *entry;
+ int method;
+ down_read(&sess->rpc_lock);
entry = xa_load(&sess->rpc_handle_list, id);
- return entry ? entry->method : 0;
+ method = entry ? entry->method : 0;
+ up_read(&sess->rpc_lock);
+
+ return method;
}
void ksmbd_session_destroy(struct ksmbd_session *sess)
--
2.51.0
We need to increment i_fastreg_wrs before we bail out from
rds_ib_post_reg_frmr().
We have a fixed budget for how many FRWR operations can be outstanding
on the dedicated QP used for memory registrations and de-registrations.
This budget is enforced by the atomic_t i_fastreg_wrs. If we bail out
early in rds_ib_post_reg_frmr(), we "leak" the possibility of posting
an FRWR operation, and if such leaks accumulate, no FRWR operation can
be carried out anymore.
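The budget works like a simple credit scheme: a credit is consumed
before posting and must be handed back on every failure path or on
completion (an illustrative sketch; post_frwr() is a hypothetical
helper, only the i_fastreg_wrs counter is from the actual code):
static int post_frwr_sketch(struct rds_ib_connection *ic)
{
	int ret;
	atomic_dec(&ic->i_fastreg_wrs);		/* consume one credit */
	ret = post_frwr(ic);			/* hypothetical helper */
	if (ret) {
		/*
		 * Every early-exit path must return the credit,
		 * otherwise the budget leaks and FRWR posting
		 * eventually stalls completely.
		 */
		atomic_inc(&ic->i_fastreg_wrs);
		return ret;
	}
	/* On success, the completion handler returns the credit. */
	return 0;
}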
Fixes: 1659185fb4d0 ("RDS: IB: Support Fastreg MR (FRMR) memory registration mode")
Fixes: 3a2886cca703 ("net/rds: Keep track of and wait for FRWR segments in use upon shutdown")
Cc: stable(a)vger.kernel.org
Signed-off-by: Håkon Bugge <haakon.bugge(a)oracle.com>
Reviewed-by: Allison Henderson <allison.henderson(a)oracle.com>
---
v3 -> v4:
* Removed unused "out:" label
* Added Allison's r-b
v2 -> v3:
* Amended commit message
* Removed indentation of this section
* Fixing error path from ib_post_send()
v1 -> v2: Added Cc: stable(a)vger.kernel.org
---
net/rds/ib_frmr.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/net/rds/ib_frmr.c b/net/rds/ib_frmr.c
index 28c1b00221780..bd861191157b5 100644
--- a/net/rds/ib_frmr.c
+++ b/net/rds/ib_frmr.c
@@ -133,12 +133,15 @@ static int rds_ib_post_reg_frmr(struct rds_ib_mr *ibmr)
ret = ib_map_mr_sg_zbva(frmr->mr, ibmr->sg, ibmr->sg_dma_len,
&off, PAGE_SIZE);
- if (unlikely(ret != ibmr->sg_dma_len))
- return ret < 0 ? ret : -EINVAL;
+ if (unlikely(ret != ibmr->sg_dma_len)) {
+ ret = ret < 0 ? ret : -EINVAL;
+ goto out_inc;
+ }
- if (cmpxchg(&frmr->fr_state,
- FRMR_IS_FREE, FRMR_IS_INUSE) != FRMR_IS_FREE)
- return -EBUSY;
+ if (cmpxchg(&frmr->fr_state, FRMR_IS_FREE, FRMR_IS_INUSE) != FRMR_IS_FREE) {
+ ret = -EBUSY;
+ goto out_inc;
+ }
atomic_inc(&ibmr->ic->i_fastreg_inuse_count);
@@ -166,11 +169,10 @@ static int rds_ib_post_reg_frmr(struct rds_ib_mr *ibmr)
/* Failure here can be because of -ENOMEM as well */
rds_transition_frwr_state(ibmr, FRMR_IS_INUSE, FRMR_IS_STALE);
- atomic_inc(&ibmr->ic->i_fastreg_wrs);
if (printk_ratelimit())
pr_warn("RDS/IB: %s returned error(%d)\n",
__func__, ret);
- goto out;
+ goto out_inc;
}
/* Wait for the registration to complete in order to prevent an invalid
@@ -179,8 +181,10 @@ static int rds_ib_post_reg_frmr(struct rds_ib_mr *ibmr)
*/
wait_event(frmr->fr_reg_done, !frmr->fr_reg);
-out:
+ return ret;
+out_inc:
+ atomic_inc(&ibmr->ic->i_fastreg_wrs);
return ret;
}
--
2.43.5
The patch titled
Subject: mm/ksm: fix incorrect KSM counter handling in mm_struct during fork
has been added to the -mm mm-new branch. Its filename is
mm-ksm-fix-incorrect-ksm-counter-handling-in-mm_struct-during-fork.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others to take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Donet Tom <donettom(a)linux.ibm.com>
Subject: mm/ksm: fix incorrect KSM counter handling in mm_struct during fork
Date: Mon, 15 Sep 2025 20:33:04 +0530
Patch series "mm/ksm: Fix incorrect accounting of KSM counters during
fork.", v2.
The first patch in this series fixes the incorrect accounting of KSM
counters such as ksm_merging_pages, ksm_rmap_items, and the global
ksm_zero_pages during fork.
The following two patches add selftests to verify that the
ksm_merging_pages counter and the global ksm_zero_pages counter are
updated correctly during fork.
Test Results
============
Without the first patch
-----------------------
# [RUN] test_fork_ksm_merging_page_count
not ok 10 ksm_merging_page in child: 32
# [RUN] test_fork_global_ksm_zero_pages_count
not ok 11 Incorrect global ksm zero page counter after fork
With the first patch
--------------------
# [RUN] test_fork_ksm_merging_page_count
ok 10 ksm_merging_pages is not inherited after fork
# [RUN] test_fork_global_ksm_zero_pages_count
ok 11 Global ksm zero page count is correct after fork
This patch (of 3):
Currently, the KSM-related counters in `mm_struct`, such as
`ksm_merging_pages`, `ksm_rmap_items`, and `ksm_zero_pages`, are inherited
by the child process during fork. This results in inconsistent
accounting.
When a process uses KSM, identical pages are merged and an rmap item is
created for each merged page. The `ksm_merging_pages` and
`ksm_rmap_items` counters are updated accordingly. However, after a fork,
these counters are copied to the child while the corresponding rmap items
are not. As a result, when the child later triggers an unmerge, there are
no rmap items present in the child, so the counters remain stale, leading
to incorrect accounting.
A similar issue exists with `ksm_zero_pages`, which maintains both a
global counter and a per-process counter. During fork, the per-process
counter is inherited by the child, but the global counter is not
incremented. Since the child also references zero pages, the global
counter should be updated as well. Otherwise, during zero-page unmerge,
both the global and per-process counters are decremented, causing the
global counter to become inconsistent.
To fix this, ksm_merging_pages and ksm_rmap_items are reset to 0 during
fork, and the global ksm_zero_pages counter is updated with the
per-process ksm_zero_pages value inherited by the child. This ensures
that KSM statistics remain accurate and reflect the activity of each
process correctly.
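The expected post-fix behaviour can be checked from userspace along
these lines (a rough sketch in the spirit of the selftests added later
in this series; the /proc file name is an assumption based on the
per-process KSM counter interface):
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>
/* Assumes /proc/self/ksm_merging_pages exposes mm->ksm_merging_pages. */
static long read_ksm_merging_pages(void)
{
	long val = -1;
	FILE *f = fopen("/proc/self/ksm_merging_pages", "r");
	if (f) {
		if (fscanf(f, "%ld", &val) != 1)
			val = -1;
		fclose(f);
	}
	return val;
}
int main(void)
{
	pid_t pid;
	/* ...assume the parent has merged some pages via KSM here... */
	pid = fork();
	if (pid == 0) {
		/* With the fix, the child's counter starts from 0. */
		printf("child:  %ld\n", read_ksm_merging_pages());
		exit(0);
	}
	waitpid(pid, NULL, 0);
	printf("parent: %ld\n", read_ksm_merging_pages());
	return 0;
}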
Link: https://lkml.kernel.org/r/cover.1757946863.git.donettom@linux.ibm.com
Link: https://lkml.kernel.org/r/4044e7623953d9f4c240d0308cf0b2fe769ee553.17579468…
Fixes: 7609385337a4 ("ksm: count ksm merging pages for each process")
Fixes: cb4df4cae4f2 ("ksm: count allocated ksm rmap_items for each process")
Fixes: e2942062e01d ("ksm: count all zero pages placed by KSM")
Signed-off-by: Donet Tom <donettom(a)linux.ibm.com>
Cc: Aboorva Devarajan <aboorvad(a)linux.ibm.com>
Cc: Chengming Zhou <chengming.zhou(a)linux.dev>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list(a)gmail.com>
Cc: Wei Yang <richard.weiyang(a)gmail.com>
Cc: xu xin <xu.xin16(a)zte.com.cn>
Cc: <stable(a)vger.kernel.org> [6.6]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/ksm.h | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
--- a/include/linux/ksm.h~mm-ksm-fix-incorrect-ksm-counter-handling-in-mm_struct-during-fork
+++ a/include/linux/ksm.h
@@ -56,8 +56,14 @@ static inline long mm_ksm_zero_pages(str
static inline void ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
{
/* Adding mm to ksm is best effort on fork. */
- if (mm_flags_test(MMF_VM_MERGEABLE, oldmm))
+ if (mm_flags_test(MMF_VM_MERGEABLE, oldmm)) {
+ long nr_ksm_zero_pages = atomic_long_read(&mm->ksm_zero_pages);
+
+ mm->ksm_merging_pages = 0;
+ mm->ksm_rmap_items = 0;
+ atomic_long_add(nr_ksm_zero_pages, &ksm_zero_pages);
__ksm_enter(mm);
+ }
}
static inline int ksm_execve(struct mm_struct *mm)
_
Patches currently in -mm which might be from donettom(a)linux.ibm.com are
mm-ksm-fix-incorrect-ksm-counter-handling-in-mm_struct-during-fork.patch
selftests-mm-added-fork-inheritance-test-for-ksm_merging_pages-counter.patch
selftests-mm-added-fork-test-to-verify-global-ksm_zero_pages-counter-behavior.patch
The patch titled
Subject: mm: fix off-by-one error in VMA count limit checks
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-fix-off-by-one-error-in-vma-count-limit-checks.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Kalesh Singh <kaleshsingh(a)google.com>
Subject: mm: fix off-by-one error in VMA count limit checks
Date: Mon, 15 Sep 2025 09:36:32 -0700
The VMA count limit check in do_mmap() and do_brk_flags() uses a strict
inequality (>), which allows a process's VMA count to exceed the
configured sysctl_max_map_count limit by one.
A process with mm->map_count == sysctl_max_map_count will incorrectly pass
this check and then exceed the limit upon allocation of a new VMA when its
map_count is incremented.
Other VMA allocation paths, such as split_vma(), already use the correct,
inclusive (>=) comparison.
Fix this bug by changing the comparison to be inclusive in do_mmap() and
do_brk_flags(), bringing them in line with the correct behavior of other
allocation paths.
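The off-by-one is easy to see with small numbers (a schematic check,
not kernel code):
/* With max = 3 and map_count already at 3:
 *   old check: (3 > 3)  is false -> mapping allowed, count grows to 4
 *   new check: (3 >= 3) is true  -> mapping rejected, limit enforced
 */
static int over_limit_old(int map_count, int max) { return map_count > max; }
static int over_limit_new(int map_count, int max) { return map_count >= max; }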
Link: https://lkml.kernel.org/r/20250915163838.631445-2-kaleshsingh@google.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kalesh Singh <kaleshsingh(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Pedro Falcato <pfalcato(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mmap.c | 2 +-
mm/vma.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
--- a/mm/mmap.c~mm-fix-off-by-one-error-in-vma-count-limit-checks
+++ a/mm/mmap.c
@@ -374,7 +374,7 @@ unsigned long do_mmap(struct file *file,
return -EOVERFLOW;
/* Too many mappings? */
- if (mm->map_count > sysctl_max_map_count)
+ if (mm->map_count >= sysctl_max_map_count)
return -ENOMEM;
/*
--- a/mm/vma.c~mm-fix-off-by-one-error-in-vma-count-limit-checks
+++ a/mm/vma.c
@@ -2772,7 +2772,7 @@ int do_brk_flags(struct vma_iterator *vm
if (!may_expand_vm(mm, vm_flags, len >> PAGE_SHIFT))
return -ENOMEM;
- if (mm->map_count > sysctl_max_map_count)
+ if (mm->map_count >= sysctl_max_map_count)
return -ENOMEM;
if (security_vm_enough_memory_mm(mm, len >> PAGE_SHIFT))
_
Patches currently in -mm which might be from kaleshsingh(a)google.com are
mm-fix-off-by-one-error-in-vma-count-limit-checks.patch
alloc_slab_obj_exts() should mark failed obj_exts vector allocations
independently of whether the vector is being allocated for a new or an
existing slab. The current implementation skips doing this for existing
slabs. Fix this by marking failed allocations unconditionally.
Fixes: 09c46563ff6d ("codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations")
Reported-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Closes: https://lore.kernel.org/all/avhakjldsgczmq356gkwmvfilyvf7o6temvcmtt5lqd4fhp…
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: stable(a)vger.kernel.org # v6.10+
---
mm/slub.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index af343ca570b5..cab4e7822393 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2029,8 +2029,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
slab_nid(slab));
if (!vec) {
/* Mark vectors which failed to allocate */
- if (new_slab)
- mark_failed_objexts_alloc(slab);
+ mark_failed_objexts_alloc(slab);
return -ENOMEM;
}
--
2.51.0.384.g4c02a37b29-goog
When object extension vector allocation fails, we set slab->obj_exts to
OBJEXTS_ALLOC_FAIL to indicate the failure. Later, once the vector is
successfully allocated, we will use this flag to mark codetag references
stored in that vector as empty to avoid codetag warnings.
slab_obj_exts(), used to retrieve the slab->obj_exts vector pointer,
checks slab->obj_exts for being either NULL or a pointer with the
MEMCG_DATA_OBJEXTS bit set. However, it does not handle the case when
slab->obj_exts equals OBJEXTS_ALLOC_FAIL. Add the missing condition to
avoid a spurious warning.
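The three legitimate states, and the condition the VM_BUG_ON() should
accept, look roughly like this (an illustrative sketch of the check,
not the actual mm/slab.h code):
/*
 * slab->obj_exts may legitimately be:
 *   0                         - no extension vector allocated
 *   ptr | MEMCG_DATA_OBJEXTS  - valid vector pointer with the tag bit
 *   OBJEXTS_ALLOC_FAIL        - vector allocation failed
 */
static bool obj_exts_state_valid(unsigned long obj_exts)
{
	return !obj_exts ||
	       (obj_exts & MEMCG_DATA_OBJEXTS) ||
	       obj_exts == OBJEXTS_ALLOC_FAIL;
}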
Fixes: 09c46563ff6d ("codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations")
Reported-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Closes: https://lore.kernel.org/all/jftidhymri2af5u3xtcqry3cfu6aqzte3uzlznhlaylgrdz…
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: stable(a)vger.kernel.org # v6.10+
---
mm/slab.h | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/mm/slab.h b/mm/slab.h
index c41a512dd07c..b930193fd94e 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -526,8 +526,12 @@ static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
unsigned long obj_exts = READ_ONCE(slab->obj_exts);
#ifdef CONFIG_MEMCG
- VM_BUG_ON_PAGE(obj_exts && !(obj_exts & MEMCG_DATA_OBJEXTS),
- slab_page(slab));
+ /*
+ * obj_exts should be either NULL, a valid pointer with
+ * MEMCG_DATA_OBJEXTS bit set or be equal to OBJEXTS_ALLOC_FAIL.
+ */
+ VM_BUG_ON_PAGE(obj_exts && !(obj_exts & MEMCG_DATA_OBJEXTS) &&
+ obj_exts != OBJEXTS_ALLOC_FAIL, slab_page(slab));
VM_BUG_ON_PAGE(obj_exts & MEMCG_DATA_KMEM, slab_page(slab));
#endif
return (struct slabobj_ext *)(obj_exts & ~OBJEXTS_FLAGS_MASK);
--
2.51.0.384.g4c02a37b29-goog
Commit 88e6c42e40de ("io_uring/io-wq: add check free worker before
create new worker") reused the variable `do_create` for something
else, abusing it for the free worker check.
This caused the value to effectively always be `true` at the time
`nr_workers < max_workers` was checked, but it should really be
`false`. This means the `max_workers` setting was ignored, and worse:
if the limit had already been reached, incrementing `nr_workers` was
skipped even though another worker would be created.
When lots of workers later exit, the `nr_workers` field could easily
underflow, making the problem worse because more and more workers
would be created without `nr_workers` being incremented.
The simple solution is to use a different variable for the free worker
check instead of using one variable for two different things.
Cc: stable(a)vger.kernel.org
Fixes: 88e6c42e40de ("io_uring/io-wq: add check free worker before create new worker")
Signed-off-by: Max Kellermann <max.kellermann(a)ionos.com>
---
io_uring/io-wq.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index 17dfaa0395c4..1d03b2fc4b25 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -352,16 +352,16 @@ static void create_worker_cb(struct callback_head *cb)
struct io_wq *wq;
struct io_wq_acct *acct;
- bool do_create = false;
+ bool activated_free_worker, do_create = false;
worker = container_of(cb, struct io_worker, create_work);
wq = worker->wq;
acct = worker->acct;
rcu_read_lock();
- do_create = !io_acct_activate_free_worker(acct);
+ activated_free_worker = io_acct_activate_free_worker(acct);
rcu_read_unlock();
- if (!do_create)
+ if (activated_free_worker)
goto no_need_create;
raw_spin_lock(&acct->workers_lock);
--
2.47.3