On 5/12/25 05:34, gregkh@linuxfoundation.org wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> memblock: Accept allocated memory before use in memblock_double_array()
>
> to the 6.6-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> memblock-accept-allocated-memory-before-use-in-memblock_double_array.patch
> and it can be found in the queue-6.6 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable@vger.kernel.org> know about it.
The 6.6 version of the patch needs a fixup. As mentioned in the patch
description, any release before v6.12 needs the accept_memory() call
changed from:

  accept_memory(addr, new_alloc_size);

to:

  accept_memory(addr, addr + new_alloc_size);
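
For reference, the adjusted 6.6 hunk would then read roughly as follows
(an illustrative sketch only, not a tested backport):

	if (addr) {
		/* The memory may not have been accepted, yet. */
		accept_memory(addr, addr + new_alloc_size);

		new_array = __va(addr);
	} else {
		new_array = NULL;
	}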
Do you need me to send a v6.6-specific patch?
Thanks,
Tom
>
>
> From da8bf5daa5e55a6af2b285ecda460d6454712ff4 Mon Sep 17 00:00:00 2001
> From: Tom Lendacky <thomas.lendacky@amd.com>
> Date: Thu, 8 May 2025 12:24:10 -0500
> Subject: memblock: Accept allocated memory before use in memblock_double_array()
>
> From: Tom Lendacky <thomas.lendacky@amd.com>
>
> commit da8bf5daa5e55a6af2b285ecda460d6454712ff4 upstream.
>
> When increasing the array size in memblock_double_array() and the slab
> is not yet available, a call to memblock_find_in_range() is used to
> reserve/allocate memory. However, the range returned may not have been
> accepted, which can result in a crash when booting an SNP guest:
>
> RIP: 0010:memcpy_orig+0x68/0x130
> Code: ...
> RSP: 0000:ffffffff9cc03ce8 EFLAGS: 00010006
> RAX: ff11001ff83e5000 RBX: 0000000000000000 RCX: fffffffffffff000
> RDX: 0000000000000bc0 RSI: ffffffff9dba8860 RDI: ff11001ff83e5c00
> RBP: 0000000000002000 R08: 0000000000000000 R09: 0000000000002000
> R10: 000000207fffe000 R11: 0000040000000000 R12: ffffffff9d06ef78
> R13: ff11001ff83e5000 R14: ffffffff9dba7c60 R15: 0000000000000c00
> memblock_double_array+0xff/0x310
> memblock_add_range+0x1fb/0x2f0
> memblock_reserve+0x4f/0xa0
> memblock_alloc_range_nid+0xac/0x130
> memblock_alloc_internal+0x53/0xc0
> memblock_alloc_try_nid+0x3d/0xa0
> swiotlb_init_remap+0x149/0x2f0
> mem_init+0xb/0xb0
> mm_core_init+0x8f/0x350
> start_kernel+0x17e/0x5d0
> x86_64_start_reservations+0x14/0x30
> x86_64_start_kernel+0x92/0xa0
> secondary_startup_64_no_verify+0x194/0x19b
>
> Mitigate this by calling accept_memory() on the memory range returned
> before the slab is available.
>
> Prior to v6.12, the accept_memory() interface took 'start' and 'end'
> parameters instead of 'start' and 'size'; therefore, the accept_memory()
> call must be adjusted to pass 'start + size' as 'end' when applying the
> patch to kernels prior to v6.12.
>
> Cc: stable@vger.kernel.org # see patch description, needs adjustments for <= 6.11
> Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory")
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> Link: https://lore.kernel.org/r/da1ac73bf4ded761e21b4e4bb5178382a580cd73.17467250…
> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
> mm/memblock.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -460,7 +460,14 @@ static int __init_memblock memblock_doub
> min(new_area_start, memblock.current_limit),
> new_alloc_size, PAGE_SIZE);
>
> - new_array = addr ? __va(addr) : NULL;
> + if (addr) {
> + /* The memory may not have been accepted, yet. */
> + accept_memory(addr, new_alloc_size);
> +
> + new_array = __va(addr);
> + } else {
> + new_array = NULL;
> + }
> }
> if (!addr) {
> pr_err("memblock: Failed to double %s array from %ld to %ld entries !\n",
>
>
> Patches currently in stable-queue which might be from thomas.lendacky@amd.com are
>
> queue-6.6/memblock-accept-allocated-memory-before-use-in-memblock_double_array.patch
From: Andrii Nakryiko <andrii@kernel.org>
commit 1a80dbcb2dbaf6e4c216e62e30fa7d3daa8001ce upstream.
BPF link for some program types is passed as a "context" which can be
used by those BPF programs to look up additional information. E.g., for
multi-kprobes and multi-uprobes, the link is used to fetch BPF cookie
values.

Because of this runtime dependency, when the bpf_link refcnt drops to
zero there could still be active BPF programs accessing link data.

This patch adds generic support to defer the bpf_link dealloc callback
until after an RCU grace period (GP), if requested. This is done by
exposing two different deallocation callbacks, one synchronous and one
deferred. If the deferred one is provided, bpf_link_free() will schedule
the dealloc_deferred() callback to happen after an RCU GP.
BPF uses two flavors of RCU: the "classic" non-sleepable one and the
RCU tasks trace one. The latter is used for sleepable BPF programs.
bpf_link_free() accommodates that by checking the underlying BPF
program's sleepable flag and going either through a normal RCU GP only
(non-sleepable) or through an RCU tasks trace GP *and* then a normal
RCU GP (taking into account the rcu_trace_implies_rcu_gp()
optimization) if the BPF program is sleepable.
We use this for multi-kprobe and multi-uprobe links, which dereference
the link during program run. We also preventively switch the raw_tp
link to use the deferred dealloc callback, as upcoming changes in the
bpf-next tree expose raw_tp link data (specifically, the cookie value)
to BPF programs at runtime as well.
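
To illustrate the resulting API, here is a minimal sketch of how a link
type opts into deferred deallocation (the my_link type and its release
callback are hypothetical, purely for illustration):

	/* Hypothetical link type; providing .dealloc_deferred instead of
	 * .dealloc is all that is needed to opt into RCU-deferred freeing.
	 */
	static void my_link_dealloc(struct bpf_link *link)
	{
		struct my_link *ml = container_of(link, struct my_link, link);

		/* Runs only after the required RCU grace period(s). */
		kfree(ml);
	}

	static const struct bpf_link_ops my_link_lops = {
		.release          = my_link_release,
		.dealloc_deferred = my_link_dealloc,
	};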
Fixes: 0dcac2725406 ("bpf: Add multi kprobe link")
Fixes: 89ae89f53d20 ("bpf: Add multi uprobe link")
Reported-by: syzbot+981935d9485a560bfbcb@syzkaller.appspotmail.com
Reported-by: syzbot+2cb5a6c573e98db598cc@syzkaller.appspotmail.com
Reported-by: syzbot+62d8b26793e8a2bd0516@syzkaller.appspotmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20240328052426.3042617-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
[fixed conflicts due to missing commit 89ae89f53d20
("bpf: Add multi uprobe link")]
Signed-off-by: Jianqi Ren <jianqi.ren.cn@windriver.com>
Signed-off-by: He Zhe <zhe.he@windriver.com>
---
Verified with a build test.
---
include/linux/bpf.h | 16 +++++++++++++++-
kernel/bpf/syscall.c | 35 ++++++++++++++++++++++++++++++++---
kernel/trace/bpf_trace.c | 2 +-
3 files changed, 48 insertions(+), 5 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e9c1338851e3..1cf8c7037289 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1293,12 +1293,26 @@ struct bpf_link {
enum bpf_link_type type;
const struct bpf_link_ops *ops;
struct bpf_prog *prog;
- struct work_struct work;
+ /* rcu is used before freeing, work can be used to schedule that
+ * RCU-based freeing before that, so they never overlap
+ */
+ union {
+ struct rcu_head rcu;
+ struct work_struct work;
+ };
};
struct bpf_link_ops {
void (*release)(struct bpf_link *link);
+ /* deallocate link resources callback, called without RCU grace period
+ * waiting
+ */
void (*dealloc)(struct bpf_link *link);
+ /* deallocate link resources callback, called after RCU grace period;
+ * if underlying BPF program is sleepable we go through tasks trace
+ * RCU GP and then "classic" RCU GP
+ */
+ void (*dealloc_deferred)(struct bpf_link *link);
int (*detach)(struct bpf_link *link);
int (*update_prog)(struct bpf_link *link, struct bpf_prog *new_prog,
struct bpf_prog *old_prog);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 27fdf1b2fc46..1cc9b28b065a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2750,17 +2750,46 @@ void bpf_link_inc(struct bpf_link *link)
atomic64_inc(&link->refcnt);
}
+static void bpf_link_defer_dealloc_rcu_gp(struct rcu_head *rcu)
+{
+ struct bpf_link *link = container_of(rcu, struct bpf_link, rcu);
+
+ /* free bpf_link and its containing memory */
+ link->ops->dealloc_deferred(link);
+}
+
+static void bpf_link_defer_dealloc_mult_rcu_gp(struct rcu_head *rcu)
+{
+ if (rcu_trace_implies_rcu_gp())
+ bpf_link_defer_dealloc_rcu_gp(rcu);
+ else
+ call_rcu(rcu, bpf_link_defer_dealloc_rcu_gp);
+}
+
/* bpf_link_free is guaranteed to be called from process context */
static void bpf_link_free(struct bpf_link *link)
{
+ bool sleepable = false;
+
bpf_link_free_id(link->id);
if (link->prog) {
+ sleepable = link->prog->aux->sleepable;
/* detach BPF program, clean up used resources */
link->ops->release(link);
bpf_prog_put(link->prog);
}
- /* free bpf_link and its containing memory */
- link->ops->dealloc(link);
+ if (link->ops->dealloc_deferred) {
+ /* schedule BPF link deallocation; if underlying BPF program
+ * is sleepable, we need to first wait for RCU tasks trace
+ * sync, then go through "classic" RCU grace period
+ */
+ if (sleepable)
+ call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_mult_rcu_gp);
+ else
+ call_rcu(&link->rcu, bpf_link_defer_dealloc_rcu_gp);
+ }
+ if (link->ops->dealloc)
+ link->ops->dealloc(link);
}
static void bpf_link_put_deferred(struct work_struct *work)
@@ -3246,7 +3275,7 @@ static int bpf_raw_tp_link_fill_link_info(const struct bpf_link *link,
static const struct bpf_link_ops bpf_raw_tp_link_lops = {
.release = bpf_raw_tp_link_release,
- .dealloc = bpf_raw_tp_link_dealloc,
+ .dealloc_deferred = bpf_raw_tp_link_dealloc,
.show_fdinfo = bpf_raw_tp_link_show_fdinfo,
.fill_link_info = bpf_raw_tp_link_fill_link_info,
};
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 7254c808b27c..989b6843069e 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2564,7 +2564,7 @@ static void bpf_kprobe_multi_link_dealloc(struct bpf_link *link)
static const struct bpf_link_ops bpf_kprobe_multi_link_lops = {
.release = bpf_kprobe_multi_link_release,
- .dealloc = bpf_kprobe_multi_link_dealloc,
+ .dealloc_deferred = bpf_kprobe_multi_link_dealloc,
};
static void bpf_kprobe_multi_cookie_swap(void *a, void *b, int size, const void *priv)
--
2.34.1
The patch below does not apply to the 6.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.14.y
git checkout FETCH_HEAD
git cherry-pick -x a39f3087092716f2bd531d6fdc20403c3dc2a879
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2025051210-devourer-antarctic-1ed3@gregkh' --subject-prefix 'PATCH 6.14.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a39f3087092716f2bd531d6fdc20403c3dc2a879 Mon Sep 17 00:00:00 2001
From: Miguel Ojeda <ojeda@kernel.org>
Date: Fri, 2 May 2025 16:02:34 +0200
Subject: [PATCH] rust: allow Rust 1.87.0's `clippy::ptr_eq` lint
Starting with Rust 1.87.0 (expected 2025-05-15) [1], Clippy may expand
the `ptr_eq` lint, e.g.:
error: use `core::ptr::eq` when comparing raw pointers
--> rust/kernel/list.rs:438:12
|
438 | if self.first == item {
| ^^^^^^^^^^^^^^^^^^ help: try: `core::ptr::eq(self.first, item)`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#ptr_eq
= note: `-D clippy::ptr-eq` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(clippy::ptr_eq)]`
It is expected that a PR to relax the lint will be backported [2] by
the time Rust 1.87.0 releases, since the lint was considered too eager
(at least by default) [3].
Thus allow the lint temporarily just in case.
Cc: stable@vger.kernel.org # Needed in 6.12.y and later (Rust is pinned in older LTSs).
Link: https://github.com/rust-lang/rust-clippy/pull/14339 [1]
Link: https://github.com/rust-lang/rust-clippy/pull/14526 [2]
Link: https://github.com/rust-lang/rust-clippy/issues/14525 [3]
Link: https://lore.kernel.org/r/20250502140237.1659624-3-ojeda@kernel.org
[ Converted to `allow`s since backport was confirmed. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
diff --git a/rust/kernel/alloc/kvec.rs b/rust/kernel/alloc/kvec.rs
index ae9d072741ce..87a71fd40c3c 100644
--- a/rust/kernel/alloc/kvec.rs
+++ b/rust/kernel/alloc/kvec.rs
@@ -2,6 +2,9 @@
//! Implementation of [`Vec`].
+// May not be needed in Rust 1.87.0 (pending beta backport).
+#![allow(clippy::ptr_eq)]
+
use super::{
allocator::{KVmalloc, Kmalloc, Vmalloc},
layout::ArrayLayout,
diff --git a/rust/kernel/list.rs b/rust/kernel/list.rs
index a335c3b1ff5e..2054682c5724 100644
--- a/rust/kernel/list.rs
+++ b/rust/kernel/list.rs
@@ -4,6 +4,9 @@
//! A linked list implementation.
+// May not be needed in Rust 1.87.0 (pending beta backport).
+#![allow(clippy::ptr_eq)]
+
use crate::sync::ArcBorrow;
use crate::types::Opaque;
use core::iter::{DoubleEndedIterator, FusedIterator};
From: Jakub Kicinski <kuba@kernel.org>
[ Upstream commit 52f671db18823089a02f07efc04efdb2272ddc17 ]
The test Davide added in commit ca22da2fbd69 ("act_mirred: use the backlog
for nested calls to mirred ingress") hangs our testing VMs every 10 or so
runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by
lockdep.

The problem, as previously described by Davide (see Link), is that
if we reverse the flow of traffic with the redirect (egress -> ingress)
we may reach the same socket which generated the packet. And we may
still be holding its socket lock. The common solution to such deadlocks
is to put the packet in the Rx backlog, rather than run the Rx path
inline. Do that for all egress -> ingress reversals, not just once
we have started to nest mirred calls.

In the past there was a concern that the backlog indirection would
lead to loss of error reporting / less accurate stats. But the current
workaround does not seem to address that issue.
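
A commented sketch of the resulting forwarding decision, mirroring the
tcf_mirred_forward() hunk below (the comments are editorial, not from
the patch):

	if (!want_ingress)
		/* Plain egress: hand off to the normal xmit path. */
		err = tcf_dev_queue_xmit(skb, dev_queue_xmit);
	else if (!at_ingress)
		/* Egress -> ingress reversal: defer via the Rx backlog. */
		err = netif_rx(skb);
	else
		/* Ingress -> ingress: safe to run the Rx path inline. */
		err = netif_receive_skb(skb);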
Fixes: 53592b364001 ("net/sched: act_mirred: Implement ingress actions")
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Suggested-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.166…
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[Minor conflict resolved due to code context change.]
Signed-off-by: Jianqi Ren <jianqi.ren.cn@windriver.com>
Signed-off-by: He Zhe <zhe.he@windriver.com>
---
Verified with a build test.
---
net/sched/act_mirred.c | 14 +++++---------
.../testing/selftests/net/forwarding/tc_actions.sh | 3 ---
2 files changed, 5 insertions(+), 12 deletions(-)
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index bbc34987bd09..896bffd50aa8 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -205,18 +205,14 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
return err;
}
-static bool is_mirred_nested(void)
-{
- return unlikely(__this_cpu_read(mirred_nest_level) > 1);
-}
-
-static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb)
+static int
+tcf_mirred_forward(bool at_ingress, bool want_ingress, struct sk_buff *skb)
{
int err;
if (!want_ingress)
err = tcf_dev_queue_xmit(skb, dev_queue_xmit);
- else if (is_mirred_nested())
+ else if (!at_ingress)
err = netif_rx(skb);
else
err = netif_receive_skb(skb);
@@ -312,7 +308,7 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
/* let's the caller reinsert the packet, if possible */
if (use_reinsert) {
- err = tcf_mirred_forward(want_ingress, skb);
+ err = tcf_mirred_forward(at_ingress, want_ingress, skb);
if (err)
tcf_action_inc_overlimit_qstats(&m->common);
__this_cpu_dec(mirred_nest_level);
@@ -320,7 +316,7 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
}
}
- err = tcf_mirred_forward(want_ingress, skb2);
+ err = tcf_mirred_forward(at_ingress, want_ingress, skb2);
if (err)
tcf_action_inc_overlimit_qstats(&m->common);
__this_cpu_dec(mirred_nest_level);
diff --git a/tools/testing/selftests/net/forwarding/tc_actions.sh b/tools/testing/selftests/net/forwarding/tc_actions.sh
index b0f5e55d2d0b..589629636502 100755
--- a/tools/testing/selftests/net/forwarding/tc_actions.sh
+++ b/tools/testing/selftests/net/forwarding/tc_actions.sh
@@ -235,9 +235,6 @@ mirred_egress_to_ingress_tcp_test()
check_err $? "didn't mirred redirect ICMP"
tc_check_packets "dev $h1 ingress" 102 10
check_err $? "didn't drop mirred ICMP"
- local overlimits=$(tc_rule_stats_get ${h1} 101 egress .overlimits)
- test ${overlimits} = 10
- check_err $? "wrong overlimits, expected 10 got ${overlimits}"
tc filter del dev $h1 egress protocol ip pref 100 handle 100 flower
tc filter del dev $h1 egress protocol ip pref 101 handle 101 flower
--
2.34.1