1. fix recursive lock when ebpf prog return SK_PASS. 2. add selftest to reproduce recursive lock.
Note that the test code can reproduce the 'dead-lock' and if just the selftest merged without first patch, the test case will definitely fail, because the issue of deadlock is inevitable.
--- v2->v4: fix line length reported by patchwork and remove unused code. (max_line_length is set to 80 in patchwork but default is 100 in kernel tree) v1->v2: 1.inspired by martin.lau to add selftest to reproduce the issue. 2. follow the community rules for patch. v1: https://lore.kernel.org/bpf/55fc6114-7e64-4b65-86d2-92cfd1e9e92f@linux.dev/T... ---
Jiayuan Chen (2): bpf: fix recursive lock when verdict program return SK_PASS selftests/bpf: Add some tests with sockmap SK_PASS
net/core/skmsg.c | 4 +- .../selftests/bpf/prog_tests/sockmap_basic.c | 54 +++++++++++++++++++ 2 files changed, 56 insertions(+), 2 deletions(-)
When the stream_verdict program returns SK_PASS, it places the received skb into its own receive queue, but a recursive lock eventually occurs, leading to an operating system deadlock. This issue has been present since v6.9.
''' sk_psock_strp_data_ready write_lock_bh(&sk->sk_callback_lock) strp_data_ready strp_read_sock read_sock -> tcp_read_sock strp_recv cb.rcv_msg -> sk_psock_strp_read # now stream_verdict return SK_PASS without peer sock assign __SK_PASS = sk_psock_map_verd(SK_PASS, NULL) sk_psock_verdict_apply sk_psock_skb_ingress_self sk_psock_skb_ingress_enqueue sk_psock_data_ready read_lock_bh(&sk->sk_callback_lock) <= dead lock
'''
This topic has been discussed before, but it has not been fixed. Previous discussion: https://lore.kernel.org/all/6684a5864ec86_403d20898@john.notmuch
Fixes: 6648e613226e ("bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue") Reported-by: Vincent Whitchurch vincent.whitchurch@datadoghq.com Signed-off-by: Jiayuan Chen mrpre@163.com Signed-off-by: John Fastabend john.fastabend@gmail.com --- net/core/skmsg.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/core/skmsg.c b/net/core/skmsg.c index b1dcbd3be89e..e90fbab703b2 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -1117,9 +1117,9 @@ static void sk_psock_strp_data_ready(struct sock *sk) if (tls_sw_has_ctx_rx(sk)) { psock->saved_data_ready(sk); } else { - write_lock_bh(&sk->sk_callback_lock); + read_lock_bh(&sk->sk_callback_lock); strp_data_ready(&psock->strp); - write_unlock_bh(&sk->sk_callback_lock); + read_unlock_bh(&sk->sk_callback_lock); } } rcu_read_unlock();
Add a new tests in sockmap_basic.c to test SK_PASS for sockmap
Signed-off-by: Jiayuan Chen mrpre@163.com --- .../selftests/bpf/prog_tests/sockmap_basic.c | 54 +++++++++++++++++++ 1 file changed, 54 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c index 82bfb266741c..a2041f8e32eb 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c @@ -501,6 +501,58 @@ static void test_sockmap_skb_verdict_shutdown(void) test_sockmap_pass_prog__destroy(skel); }
+static void test_sockmap_stream_pass(void) +{ + int zero = 0, sent, recvd; + int verdict, parser; + int err, map; + int c = -1, p = -1; + struct test_sockmap_pass_prog *pass = NULL; + char snd[256] = "0123456789"; + char rcv[256] = "0"; + + pass = test_sockmap_pass_prog__open_and_load(); + verdict = bpf_program__fd(pass->progs.prog_skb_verdict); + parser = bpf_program__fd(pass->progs.prog_skb_parser); + map = bpf_map__fd(pass->maps.sock_map_rx); + + err = bpf_prog_attach(parser, map, BPF_SK_SKB_STREAM_PARSER, 0); + if (!ASSERT_OK(err, "bpf_prog_attach stream parser")) + goto out; + + err = bpf_prog_attach(verdict, map, BPF_SK_SKB_STREAM_VERDICT, 0); + if (!ASSERT_OK(err, "bpf_prog_attach stream verdict")) + goto out; + + err = create_pair(AF_INET, SOCK_STREAM, &c, &p); + if (err) + goto out; + + /* sk_data_ready of 'p' will be replaced by strparser handler */ + err = bpf_map_update_elem(map, &zero, &p, BPF_NOEXIST); + if (!ASSERT_OK(err, "bpf_map_update_elem(p)")) + goto out_close; + + /* + * as 'prog_skb_parser' return the original skb len and + * 'prog_skb_verdict' return SK_PASS, the kernel will just + * pass it through to original socket 'p' + */ + sent = xsend(c, snd, sizeof(snd), 0); + ASSERT_EQ(sent, sizeof(snd), "xsend(c)"); + + recvd = recv_timeout(p, rcv, sizeof(rcv), SOCK_NONBLOCK, + IO_TIMEOUT_SEC); + ASSERT_EQ(recvd, sizeof(rcv), "recv_timeout(p)"); + +out_close: + close(c); + close(p); + +out: + test_sockmap_pass_prog__destroy(pass); +} + static void test_sockmap_skb_verdict_fionread(bool pass_prog) { int err, map, verdict, c0 = -1, c1 = -1, p0 = -1, p1 = -1; @@ -923,6 +975,8 @@ void test_sockmap_basic(void) test_sockmap_progs_query(BPF_SK_SKB_VERDICT); if (test__start_subtest("sockmap skb_verdict shutdown")) test_sockmap_skb_verdict_shutdown(); + if (test__start_subtest("sockmap stream parser and verdict pass")) + test_sockmap_stream_pass(); if (test__start_subtest("sockmap skb_verdict fionread")) test_sockmap_skb_verdict_fionread(true); if (test__start_subtest("sockmap skb_verdict fionread on drop"))
On 11/17/24 7:09 PM, Jiayuan Chen wrote:
- fix recursive lock when ebpf prog return SK_PASS.
- add selftest to reproduce recursive lock.
Note that the test code can reproduce the 'dead-lock' and if just the selftest merged without first patch, the test case will definitely fail, because the issue of deadlock is inevitable.
Acked-by: Martin KaFai Lau martin.lau@kernel.org
Jakub, please help to land it to the net tree. Thanks!
Hello:
This series was applied to netdev/net.git (main) by Jakub Kicinski kuba@kernel.org:
On Mon, 18 Nov 2024 11:09:08 +0800 you wrote:
- fix recursive lock when ebpf prog return SK_PASS.
- add selftest to reproduce recursive lock.
Note that the test code can reproduce the 'dead-lock' and if just the selftest merged without first patch, the test case will definitely fail, because the issue of deadlock is inevitable.
[...]
Here is the summary with links: - [bpf,v4,1/2] bpf: fix recursive lock when verdict program return SK_PASS https://git.kernel.org/netdev/net/c/8ca2a1eeadf0 - [bpf,v4,2/2] selftests/bpf: Add some tests with sockmap SK_PASS https://git.kernel.org/netdev/net/c/0c4d5cb9a1c3
You are awesome, thank you!
linux-kselftest-mirror@lists.linaro.org