Hi,
commit 872b8f14d772 ("drm/amd/display: Validate backlight caps are
sane") was added to stable trees to fix a brightness problem on one
laptop with buggy firmware, but it was aggressive enough to cause a
problem on another.
Fortunately, the problem on that other laptop was already fixed in 6.12 by
commit 87d749a6aab7 ("drm/amd/display: Allow backlight to go below
`AMDGPU_DM_DEFAULT_MIN_BACKLIGHT`").
Can that commit please be brought to every tree that 872b8f14d772 went to?
Thanks!
The patch below does not apply to the 6.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y
git checkout FETCH_HEAD
git cherry-pick -x 7a2369b74abf76cd3e54c45b30f6addb497f831b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024100707-delta-trance-5682@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^..
Possible dependencies:
7a2369b74abf ("mm: z3fold: deprecate CONFIG_Z3FOLD")
04cb7502a5d7 ("zsmalloc: use all available 24 bits of page_type")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7a2369b74abf76cd3e54c45b30f6addb497f831b Mon Sep 17 00:00:00 2001
From: Yosry Ahmed <yosryahmed(a)google.com>
Date: Wed, 4 Sep 2024 23:33:43 +0000
Subject: [PATCH] mm: z3fold: deprecate CONFIG_Z3FOLD
The z3fold compressed pages allocator is rarely used, most users use
zsmalloc. The only disadvantage of zsmalloc in comparison is the
dependency on MMU, and zbud is a more common option for !MMU as it was the
default zswap allocator for a long time.
Historically, zsmalloc had worse latency than zbud and z3fold but offered
better memory savings. This is no longer the case as shown by a simple
recent analysis [1]. That analysis showed that z3fold does not have any
advantage over zsmalloc or zbud considering both performance and memory
usage. In a kernel build test on tmpfs in a limited cgroup, z3fold took
3% more time and used 1.8% more memory. The latency of zswap_load() was
7% higher, and that of zswap_store() was 10% higher. Zsmalloc is better
in all metrics.
Moreover, z3fold apparently has latent bugs, which were made noticeable by
a recent soft lockup bug report with z3fold [2]. Switching to zsmalloc
not only fixed the problem, but also reduced the swap usage from 6~8G to
1~2G. Other users have also reported being bitten by mistakenly enabling
z3fold.
Other than hurting users, z3fold is repeatedly causing wasted engineering
effort. Apart from investigating the above bug, it came up in multiple
development discussions (e.g. [3]) as something we need to handle, when
there aren't any legit users (at least not intentionally).
The natural course of action is to deprecate z3fold, and remove it in a few
cycles if no objections are raised from active users. Next on the list
should be zbud, as it offers marginal latency gains at the cost of huge
memory waste when compared to zsmalloc. That one will need to wait until
zsmalloc does not depend on MMU.
Rename the user-visible config option from CONFIG_Z3FOLD to
CONFIG_Z3FOLD_DEPRECATED so that users with CONFIG_Z3FOLD=y get a new
prompt with explanation during make oldconfig. Also, remove
CONFIG_Z3FOLD=y from defconfigs.
[1]https://lore.kernel.org/lkml/CAJD7tkbRF6od-2x_L8-A1QL3=2Ww13sCj4S3i4bNndq…
[2]https://lore.kernel.org/lkml/EF0ABD3E-A239-4111-A8AB-5C442E759CF3@gmail.c…
[3]https://lore.kernel.org/lkml/CAJD7tkbnmeVugfunffSovJf9FAgy9rhBVt_tx=nxUve…
[arnd(a)arndb.de: deprecate ZSWAP_ZPOOL_DEFAULT_Z3FOLD as well]
Link: https://lkml.kernel.org/r/20240909202625.1054880-1-arnd@kernel.org
Link: https://lkml.kernel.org/r/20240904233343.933462-1-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed(a)google.com>
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Acked-by: Chris Down <chris(a)chrisdown.name>
Acked-by: Nhat Pham <nphamcs(a)gmail.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Acked-by: Vitaly Wool <vitaly.wool(a)konsulko.com>
Acked-by: Christoph Hellwig <hch(a)lst.de>
Cc: Aneesh Kumar K.V <aneesh.kumar(a)kernel.org>
Cc: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Cc: Huacai Chen <chenhuacai(a)kernel.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Naveen N. Rao <naveen.n.rao(a)linux.ibm.com>
Cc: Nicholas Piggin <npiggin(a)gmail.com>
Cc: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Cc: WANG Xuerui <kernel(a)xen0n.name>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/arch/loongarch/configs/loongson3_defconfig b/arch/loongarch/configs/loongson3_defconfig
index b4252c357c8e..75b366407a60 100644
--- a/arch/loongarch/configs/loongson3_defconfig
+++ b/arch/loongarch/configs/loongson3_defconfig
@@ -96,7 +96,6 @@ CONFIG_ZPOOL=y
CONFIG_ZSWAP=y
CONFIG_ZSWAP_COMPRESSOR_DEFAULT_ZSTD=y
CONFIG_ZBUD=y
-CONFIG_Z3FOLD=y
CONFIG_ZSMALLOC=m
# CONFIG_COMPAT_BRK is not set
CONFIG_MEMORY_HOTPLUG=y
diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig
index 544a65fda77b..d39284489aa2 100644
--- a/arch/powerpc/configs/ppc64_defconfig
+++ b/arch/powerpc/configs/ppc64_defconfig
@@ -81,7 +81,6 @@ CONFIG_MODULE_SIG_SHA512=y
CONFIG_PARTITION_ADVANCED=y
CONFIG_BINFMT_MISC=m
CONFIG_ZSWAP=y
-CONFIG_Z3FOLD=y
CONFIG_ZSMALLOC=y
# CONFIG_SLAB_MERGE_DEFAULT is not set
CONFIG_SLAB_FREELIST_RANDOM=y
diff --git a/mm/Kconfig b/mm/Kconfig
index 1aa282e35dc7..09aebca1cae3 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -146,12 +146,15 @@ config ZSWAP_ZPOOL_DEFAULT_ZBUD
help
Use the zbud allocator as the default allocator.
-config ZSWAP_ZPOOL_DEFAULT_Z3FOLD
- bool "z3fold"
- select Z3FOLD
+config ZSWAP_ZPOOL_DEFAULT_Z3FOLD_DEPRECATED
+ bool "z3foldi (DEPRECATED)"
+ select Z3FOLD_DEPRECATED
help
Use the z3fold allocator as the default allocator.
+ Deprecated and scheduled for removal in a few cycles,
+ see CONFIG_Z3FOLD_DEPRECATED.
+
config ZSWAP_ZPOOL_DEFAULT_ZSMALLOC
bool "zsmalloc"
select ZSMALLOC
@@ -163,7 +166,7 @@ config ZSWAP_ZPOOL_DEFAULT
string
depends on ZSWAP
default "zbud" if ZSWAP_ZPOOL_DEFAULT_ZBUD
- default "z3fold" if ZSWAP_ZPOOL_DEFAULT_Z3FOLD
+ default "z3fold" if ZSWAP_ZPOOL_DEFAULT_Z3FOLD_DEPRECATED
default "zsmalloc" if ZSWAP_ZPOOL_DEFAULT_ZSMALLOC
default ""
@@ -177,15 +180,25 @@ config ZBUD
deterministic reclaim properties that make it preferable to a higher
density approach when reclaim will be used.
-config Z3FOLD
- tristate "3:1 compression allocator (z3fold)"
+config Z3FOLD_DEPRECATED
+ tristate "3:1 compression allocator (z3fold) (DEPRECATED)"
depends on ZSWAP
help
+ Deprecated and scheduled for removal in a few cycles. If you have
+ a good reason for using Z3FOLD over ZSMALLOC, please contact
+ linux-mm(a)kvack.org and the zswap maintainers.
+
A special purpose allocator for storing compressed pages.
It is designed to store up to three compressed pages per physical
page. It is a ZBUD derivative so the simplicity and determinism are
still there.
+config Z3FOLD
+ tristate
+ default y if Z3FOLD_DEPRECATED=y
+ default m if Z3FOLD_DEPRECATED=m
+ depends on Z3FOLD_DEPRECATED
+
config ZSMALLOC
tristate
prompt "N:1 compression allocator (zsmalloc)" if (ZSWAP || ZRAM)
The patch below does not apply to the 6.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y
git checkout FETCH_HEAD
git cherry-pick -x 34820304cc2cd1804ee1f8f3504ec77813d29c8e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024100757-gambling-blurry-b71e@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^..
Possible dependencies:
34820304cc2c ("uprobes: fix kernel info leak via "[uprobes]" vma")
2abbcc099ec6 ("uprobes: turn xol_area->pages[2] into xol_area->page")
6d27a31ef195 ("uprobes: introduce the global struct vm_special_mapping xol_mapping")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 34820304cc2cd1804ee1f8f3504ec77813d29c8e Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg(a)redhat.com>
Date: Sun, 29 Sep 2024 18:20:47 +0200
Subject: [PATCH] uprobes: fix kernel info leak via "[uprobes]" vma
xol_add_vma() maps the uninitialized page allocated by __create_xol_area()
into userspace.  On some architectures (x86), this memory is readable even
without VM_READ: VM_EXEC results in the same pgprot_t as VM_EXEC|VM_READ.
Not that this really matters; a debugger can read this memory anyway.
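As a minimal sketch of the fix (illustrative only; the actual change is
the one-line hunk below), the XOL page must be allocated zeroed so that
no stale kernel data can reach userspace:

	/* Zero the page at allocation time; without __GFP_ZERO it
	 * keeps whatever kernel data it last held. */
	area->page = alloc_page(GFP_HIGHUSER | __GFP_ZERO);
	if (!area->page)
		goto free_bitmap;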
Link: https://lore.kernel.org/all/20240929162047.GA12611@redhat.com/
Reported-by: Will Deacon <will(a)kernel.org>
Fixes: d4b3b6384f98 ("uprobes/core: Allocate XOL slots for uprobes use")
Cc: stable(a)vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 2ec796e2f055..4b52cb2ae6d6 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1545,7 +1545,7 @@ static struct xol_area *__create_xol_area(unsigned long vaddr)
if (!area->bitmap)
goto free_area;
- area->page = alloc_page(GFP_HIGHUSER);
+ area->page = alloc_page(GFP_HIGHUSER | __GFP_ZERO);
if (!area->page)
goto free_bitmap;
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x c314094cb4cfa6fc5a17f4881ead2dfebfa717a7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024100732-pessimist-ambiguous-58e3@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
c314094cb4cf ("io_uring/net: harden multishot termination case for recv")
4a3223f7bfda ("io_uring/net: switch io_recv() to using io_async_msghdr")
fb6328bc2ab5 ("io_uring/net: simplify msghd->msg_inq checking")
186daf238529 ("io_uring/kbuf: rename REQ_F_PARTIAL_IO to REQ_F_BL_NO_RECYCLE")
eb18c29dd2a3 ("io_uring/net: move recv/recvmsg flags out of retry loop")
c3f9109dbc9e ("io_uring/kbuf: flag request if buffer pool is empty after buffer pick")
95041b93e90a ("io_uring: add io_file_can_poll() helper")
521223d7c229 ("io_uring/cancel: don't default to setting req->work.cancel_seq")
4bcb982cce74 ("io_uring: expand main struct io_kiocb flags to 64-bits")
72bd80252fee ("io_uring/net: fix sr->len for IORING_OP_RECV with MSG_WAITALL and buffers")
76b367a2d831 ("io_uring/net: limit inline multishot retries")
91e5d765a82f ("io_uring/net: un-indent mshot retry path in io_recv_finish()")
595e52284d24 ("io_uring/poll: don't enable lazy wake for POLLEXCLUSIVE")
89d528ba2f82 ("io_uring: indicate if io_kbuf_recycle did recycle anything")
4de520f1fcef ("Merge tag 'io_uring-futex-2023-10-30' of git://git.kernel.dk/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c314094cb4cfa6fc5a17f4881ead2dfebfa717a7 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Thu, 26 Sep 2024 07:08:10 -0600
Subject: [PATCH] io_uring/net: harden multishot termination case for recv
If the recv returns zero, or an error, then it doesn't matter if more
data has already been received for this buffer. A condition like that
should terminate the multishot receive. Rather than pass in the
collected return value, pass in whether to terminate or keep the recv
going separately.
Note that this isn't a bug right now, as the only way to get there is
via setting MSG_WAITALL with multishot receive. And if an application
does that, then -EINVAL is returned anyway. But it seems like an easy
bug to introduce, so let's make it a bit more explicit.
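As a sketch of the intended ordering (simplified from the hunks below),
the termination decision is latched from the raw recv result before any
previously received data is folded in:

	/* Decide multishot termination from the raw result... */
	mshot_finished = ret <= 0;
	if (ret > 0)
		ret += sr->done_io;	/* ...then account prior progress */
	else if (sr->done_io)
		ret = sr->done_io;
	else
		io_kbuf_recycle(req, issue_flags);

	if (!io_recv_finish(req, &ret, kmsg, mshot_finished, issue_flags))
		goto retry_multishot;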
Link: https://github.com/axboe/liburing/issues/1246
Cc: stable(a)vger.kernel.org
Fixes: b3fdea6ecb55 ("io_uring: multishot recv")
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/net.c b/io_uring/net.c
index f10f5a22d66a..18507658a921 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -1133,6 +1133,7 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
int ret, min_ret = 0;
bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
size_t len = sr->len;
+ bool mshot_finished;
if (!(req->flags & REQ_F_POLLED) &&
(sr->flags & IORING_RECVSEND_POLL_FIRST))
@@ -1187,6 +1188,7 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
req_set_fail(req);
}
+ mshot_finished = ret <= 0;
if (ret > 0)
ret += sr->done_io;
else if (sr->done_io)
@@ -1194,7 +1196,7 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
else
io_kbuf_recycle(req, issue_flags);
- if (!io_recv_finish(req, &ret, kmsg, ret <= 0, issue_flags))
+ if (!io_recv_finish(req, &ret, kmsg, mshot_finished, issue_flags))
goto retry_multishot;
return ret;
From: Patrick Donnelly <pdonnell(a)redhat.com>
Log recovered from a user's cluster:
<7>[ 5413.970692] ceph: get_cap_refs 00000000958c114b ret 1 got Fr
<7>[ 5413.970695] ceph: start_read 00000000958c114b, no cache cap
...
<7>[ 5473.934609] ceph: my wanted = Fr, used = Fr, dirty -
<7>[ 5473.934616] ceph: revocation: pAsLsXsFr -> pAsLsXs (revoking Fr)
<7>[ 5473.934632] ceph: __ceph_caps_issued 00000000958c114b cap 00000000f7784259 issued pAsLsXs
<7>[ 5473.934638] ceph: check_caps 10000000e68.fffffffffffffffe file_want - used Fr dirty - flushing - issued pAsLsXs revoking Fr retain pAsLsXsFsr AUTHONLY NOINVAL FLUSH_FORCE
The MDS subsequently complains that the kernel client is late releasing caps.
Broadly, a series of changes to this code by the three commits cited
below caused subtle resource cleanup to be missed.  The main culprit is
the error-handling change in 2de160417315 ("netfs: Change
->init_request() to return an error code"), after which a failure in
init_request no longer caused cleanup to be called.  That prevented the
ceph_put_cap_refs() call that would have cleaned up the leaked cap ref.
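In outline, the missing cleanup looks like this (a sketch of the error
path; the full context is in the diff below): cap references taken while
setting up the request must be put back when init_request fails:

	if (ret < 0) {
		if (got)	/* cap refs acquired earlier in this function */
			ceph_put_cap_refs(ceph_inode(inode), got);
		kfree(priv);
	}
	return ret;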
Closes: https://tracker.ceph.com/issues/67008
Fixes: 49870056005c ("ceph: convert ceph_readpages to ceph_readahead")
Fixes: 2de160417315 ("netfs: Change ->init_request() to return an error code")
Fixes: a5c9dc445139 ("ceph: Make ceph_init_request() check caps on readahead")
Signed-off-by: Patrick Donnelly <pdonnell(a)redhat.com>
Cc: stable(a)vger.kernel.org
---
fs/ceph/addr.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 53fef258c2bc..702c6a730b70 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -489,8 +489,11 @@ static int ceph_init_request(struct netfs_io_request *rreq, struct file *file)
rreq->io_streams[0].sreq_max_len = fsc->mount_options->rsize;
out:
- if (ret < 0)
+ if (ret < 0) {
+ if (got)
+ ceph_put_cap_refs(ceph_inode(inode), got);
kfree(priv);
+ }
return ret;
}
base-commit: e32cde8d2bd7d251a8f9b434143977ddf13dcec6
--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
From: George Ryurikov <g.ryurikov(a)securitycode.ru>
From: Yu Kuai <yukuai3(a)huawei.com>
commit 1e3cc2125d7cc7d492b2e6e52d09c1e17ba573c3
'bfqq->bfqd' is ensured to be set in bfq_init_queue(), and it will never
change afterwards.
Signed-off-by: Yu Kuai <yukuai3(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20220816015631.1323948-3-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: George Ryurikov <g.ryurikov(a)securitycode.ru>
---
block/bfq-iosched.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 6687b805bab3..0031c5751d89 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -4864,9 +4864,7 @@ void bfq_put_queue(struct bfq_queue *bfqq)
struct hlist_node *n;
struct bfq_group *bfqg = bfqq_group(bfqq);
- if (bfqq->bfqd)
- bfq_log_bfqq(bfqq->bfqd, bfqq, "put_queue: %p %d",
- bfqq, bfqq->ref);
+ bfq_log_bfqq(bfqq->bfqd, bfqq, "put_queue: %p %d", bfqq, bfqq->ref);
bfqq->ref--;
if (bfqq->ref)
@@ -4931,7 +4929,7 @@ void bfq_put_queue(struct bfq_queue *bfqq)
hlist_del_init(&item->woken_list_node);
}
- if (bfqq->bfqd && bfqq->bfqd->last_completed_rq_bfqq == bfqq)
+ if (bfqq->bfqd->last_completed_rq_bfqq == bfqq)
bfqq->bfqd->last_completed_rq_bfqq = NULL;
kmem_cache_free(bfq_pool, bfqq);
--
2.34.1
Looking at the source code links for mm/memory.c in the sample reports
in the syzbot report links [1], the line numbers appear to be off by
one.  This may look like a problem with syzkaller or with addr2line,
the program that assigns the line numbers, but neither is at fault.
Commit d61ea1cb0095 ("userfaultfd: UFFD_FEATURE_WP_ASYNC") added a stray
blank line at the very top of mm/memory.c while modifying it.  However,
git.kernel.org displays the source with that blank line removed, so even
though addr2line assigns the correct line number, the displayed line
numbers appear to have increased by 1.
This may seem trivial, but it is appropriate to remove the stray newline
from both the upstream and stable trees: it is not only incorrect in
terms of code style, it also hinders bug analysis.
[1]
https://syzkaller.appspot.com/bug?extid=4145b11cdf925264bff4
https://syzkaller.appspot.com/bug?extid=fa43f1b63e3aa6f66329
https://syzkaller.appspot.com/bug?extid=890a1df7294175947697
Fixes: d61ea1cb0095 ("userfaultfd: UFFD_FEATURE_WP_ASYNC")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jeongjun Park <aha310510(a)gmail.com>
---
mm/memory.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/mm/memory.c b/mm/memory.c
index 2366578015ad..7dffe8749014 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1,4 +1,3 @@
-
// SPDX-License-Identifier: GPL-2.0-only
/*
* linux/mm/memory.c
--
The patch titled
Subject: lib: alloc_tag_module_unload must wait for pending kfree_rcu calls
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Florian Westphal <fw(a)strlen.de>
Subject: lib: alloc_tag_module_unload must wait for pending kfree_rcu calls
Date: Mon, 7 Oct 2024 22:52:24 +0200
Ben Greear reports following splat:
------------[ cut here ]------------
net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
...
Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
codetag_unload_module+0x19b/0x2a0
? codetag_load_module+0x80/0x80
The nf_nat module exit calls kfree_rcu on those addresses, but the free
operation is likely still pending by the time alloc_tag checks for
leaks.  Waiting for outstanding kfree_rcu operations to complete before
checking resolves this warning.
Reproducer:
unshare -n iptables-nft -t nat -A PREROUTING -p tcp
grep nf_nat /proc/allocinfo # will list 4 allocations
rmmod nft_chain_nat
rmmod nf_nat # will WARN.
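In outline, the fix (shown in full in the diff below) flushes pending
deferred frees before the leak check runs:

	/* kfree_rcu() only queues the free; until the grace period
	 * completes, the allocation still counts against its tag.
	 * kvfree_rcu_barrier() waits for all pending callbacks. */
	kvfree_rcu_barrier();

	mutex_lock(&codetag_lock);
	/* ... the per-module leak check walks codetag_types here ... */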
Link: https://lkml.kernel.org/r/20241007205236.11847-1-fw@strlen.de
Fixes: a473573964e5 ("lib: code tagging module support")
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Reported-by: Ben Greear <greearb(a)candelatech.com>
Closes: https://lore.kernel.org/netdev/bdaaef9d-4364-4171-b82b-bcfc12e207eb@candela…
Cc: Uladzislau Rezki <urezki(a)gmail.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/codetag.c | 2 ++
1 file changed, 2 insertions(+)
--- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls
+++ a/lib/codetag.c
@@ -228,6 +228,8 @@ bool codetag_unload_module(struct module
if (!mod)
return true;
+ kvfree_rcu_barrier();
+
mutex_lock(&codetag_lock);
list_for_each_entry(cttype, &codetag_types, link) {
struct codetag_module *found = NULL;
_
Patches currently in -mm which might be from fw(a)strlen.de are
lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls.patch
The patch titled
Subject: mm/mremap: fix move_normal_pmd/retract_page_tables race
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-mremap-fix-move_normal_pmd-retract_page_tables-race.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: mm/mremap: fix move_normal_pmd/retract_page_tables race
Date: Mon, 07 Oct 2024 23:42:04 +0200
In mremap(), move_page_tables() looks at the type of the PMD entry and the
specified address range to figure out by which method the next chunk of
page table entries should be moved.
At that point, the mmap_lock is held in write mode, but no rmap locks are
held yet. For PMD entries that point to page tables and are fully covered
by the source address range, move_pgt_entry(NORMAL_PMD, ...) is called,
which first takes rmap locks, then does move_normal_pmd().
move_normal_pmd() takes the necessary page table locks at source and
destination, then moves an entire page table from the source to the
destination.
The problem is: The rmap locks, which protect against concurrent page
table removal by retract_page_tables() in the THP code, are only taken
after the PMD entry has been read and it has been decided how to move it.
So we can race as follows (with two processes that have mappings of the
same tmpfs file that is stored on a tmpfs mount with huge=advise); note
that process A accesses page tables through the MM while process B does it
through the file rmap:
process A                  process B
=========                  =========
mremap
  mremap_to
    move_vma
      move_page_tables
        get_old_pmd
        alloc_new_pmd
                  *** PREEMPT ***
                           madvise(MADV_COLLAPSE)
                             do_madvise
                               madvise_walk_vmas
                                 madvise_vma_behavior
                                   madvise_collapse
                                     hpage_collapse_scan_file
                                       collapse_file
                                         retract_page_tables
                                           i_mmap_lock_read(mapping)
                                           pmdp_collapse_flush
                                           i_mmap_unlock_read(mapping)
        move_pgt_entry(NORMAL_PMD, ...)
          take_rmap_locks
            move_normal_pmd
          drop_rmap_locks
When this happens, move_normal_pmd() can end up creating bogus PMD entries
in the line `pmd_populate(mm, new_pmd, pmd_pgtable(pmd))`. The effect
depends on arch-specific and machine-specific details; on x86, you can end
up with physical page 0 mapped as a page table, which is likely
exploitable for user->kernel privilege escalation.
Fix the race by letting process B recheck that the PMD still points to a
page table after the rmap locks have been taken. Otherwise, we bail and
let the caller fall back to the PTE-level copying path, which will then
bail immediately at the pmd_none() check.
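In outline, the recheck (shown in full in the diff below) re-reads the
PMD after the page-table locks are taken and bails if it no longer
points to a page table:

	pmd = *old_pmd;		/* re-read under old_ptl/new_ptl */

	/* Racing with collapse? Bail; the caller falls back to the
	 * PTE-level copy path, which stops at its pmd_none() check. */
	if (unlikely(!pmd_present(pmd) || pmd_leaf(pmd)))
		goto out_unlock;

	pmd_clear(old_pmd);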
Bug reachability: Reaching this bug requires that you can create
shmem/file THP mappings - anonymous THP uses different code that doesn't
zap stuff under rmap locks. File THP is gated on an experimental config
flag (CONFIG_READ_ONLY_THP_FOR_FS), so on normal distro kernels you need
shmem THP to hit this bug. As far as I know, getting shmem THP normally
requires that you can mount your own tmpfs with the right mount flags,
which would require creating your own user+mount namespace; though I don't
know if some distros maybe enable shmem THP by default or something like
that.
Bug impact: This issue can likely be used for user->kernel privilege
escalation when it is reachable.
Link: https://lkml.kernel.org/r/20241007-move_normal_pmd-vs-collapse-fix-2-v1-1-5…
Fixes: 1d65b771bc08 ("mm/khugepaged: retract_page_tables() without mmap or vma lock")
Closes: https://project-zero.issues.chromium.org/371047675
Co-developed-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: Jann Horn <jannh(a)google.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Joel Fernandes <joel(a)joelfernandes.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mremap.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
--- a/mm/mremap.c~mm-mremap-fix-move_normal_pmd-retract_page_tables-race
+++ a/mm/mremap.c
@@ -238,6 +238,7 @@ static bool move_normal_pmd(struct vm_ar
{
spinlock_t *old_ptl, *new_ptl;
struct mm_struct *mm = vma->vm_mm;
+ bool res = false;
pmd_t pmd;
if (!arch_supports_page_table_move())
@@ -277,19 +278,25 @@ static bool move_normal_pmd(struct vm_ar
if (new_ptl != old_ptl)
spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
- /* Clear the pmd */
pmd = *old_pmd;
+
+ /* Racing with collapse? */
+ if (unlikely(!pmd_present(pmd) || pmd_leaf(pmd)))
+ goto out_unlock;
+ /* Clear the pmd */
pmd_clear(old_pmd);
+ res = true;
VM_BUG_ON(!pmd_none(*new_pmd));
pmd_populate(mm, new_pmd, pmd_pgtable(pmd));
flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
+out_unlock:
if (new_ptl != old_ptl)
spin_unlock(new_ptl);
spin_unlock(old_ptl);
- return true;
+ return res;
}
#else
static inline bool move_normal_pmd(struct vm_area_struct *vma,
_
Patches currently in -mm which might be from jannh(a)google.com are
mm-enforce-a-minimal-stack-gap-even-against-inaccessible-vmas.patch
mm-mremap-fix-move_normal_pmd-retract_page_tables-race.patch