1. Issue Syzkaller reported this issue [1].
2. Reproduce
We can reproduce this issue by using the test_sockmap_with_close_on_write() test I provided in selftest, also you need to apply the following patch to ensure 100% reproducibility (sleep after checking sock):
''' static void sk_psock_verdict_data_ready(struct sock *sk) { ....... if (unlikely(!sock)) return; + if (!strcmp("test_progs", current->comm)) { + printk("sleep 2s to wait socket freed\n"); + mdelay(2000); + printk("sleep end\n"); + } ops = READ_ONCE(sock->ops); if (!ops || !ops->read_skb) return; } '''
Then running './test_progs -v sockmap_basic', and if the kernel has KASAN enabled [2], you will see the following warning:
''' BUG: KASAN: slab-use-after-free in sk_psock_verdict_data_ready+0x29b/0x2d0 Read of size 8 at addr ffff88813a777020 by task test_progs/47055
Tainted: [O]=OOT_MODULE Call Trace: <TASK> dump_stack_lvl+0x53/0x70 print_address_description.constprop.0+0x30/0x420 ? sk_psock_verdict_data_ready+0x29b/0x2d0 print_report+0xb7/0x270 ? sk_psock_verdict_data_ready+0x29b/0x2d0 ? kasan_addr_to_slab+0xd/0xa0 ? sk_psock_verdict_data_ready+0x29b/0x2d0 kasan_report+0xca/0x100 ? sk_psock_verdict_data_ready+0x29b/0x2d0 sk_psock_verdict_data_ready+0x29b/0x2d0 unix_stream_sendmsg+0x4a6/0xa40 ? __pfx_unix_stream_sendmsg+0x10/0x10 ? fdget+0x2c1/0x3a0 __sys_sendto+0x39c/0x410 '''
3. Reason ''' CPU0 CPU1 unix_stream_sendmsg(sk): other = unix_peer(sk) other->sk_data_ready(other): socket *sock = sk->sk_socket if (unlikely(!sock)) return; close(other): ... other->close() free(socket) READ_ONCE(sock->ops) ^ use 'sock' after free '''
For TCP, UDP, or other protocols, we have already performed rcu_read_lock() when the network stack receives packets in ip_input.c: ''' ip_local_deliver_finish(): rcu_read_lock() ip_protocol_deliver_rcu() xxx_rcv rcu_read_unlock() '''
However, for Unix sockets, sk_data_ready is called directly from the process context without rcu_read_lock() protection.
4. Solution Based on the fact that the 'struct socket' is released using call_rcu(), We add rcu_read_{un}lock() at the entrance and exit of our sk_data_ready. It will not increase performance overhead, at least for TCP and UDP, they are already in a relatively large critical section.
Of course, we can also add a custom callback for Unix sockets and call rcu_read_lock() before calling _verdict_data_ready like this: ''' if (sk_is_unix(sk)) sk->sk_data_ready = sk_psock_verdict_data_ready_rcu; else sk->sk_data_ready = sk_psock_verdict_data_ready;
sk_psock_verdict_data_ready_rcu(): rcu_read_lock() sk_psock_verdict_data_ready() rcu_read_unlock() ''' However, this will cause too many branches, and it's not suitable to distinguish network protocols in skmsg.c.
[1] https://syzkaller.appspot.com/bug?extid=dd90a702f518e0eac072 [2] https://syzkaller.appspot.com/text?tag=KernelConfig&x=1362a5aee630ff34
Jiayuan Chen (3): bpf, sockmap: avoid using sk_socket after free selftests/bpf: Add socketpair to create_pair to support unix socket selftests/bpf: Add edge case tests for sockmap
net/core/skmsg.c | 18 ++++-- .../selftests/bpf/prog_tests/socket_helpers.h | 13 ++++- .../selftests/bpf/prog_tests/sockmap_basic.c | 57 +++++++++++++++++++ 3 files changed, 82 insertions(+), 6 deletions(-)
Use RCU lock to protect sk_socket, preventing concurrent close and release by another thread.
Because TCP/UDP are already within a relatively large critical section: ''' ip_local_deliver_finish rcu_read_lock ip_protocol_deliver_rcu tcp_rcv/udp_rcv rcu_read_unlock '''
Adding rcu_read_{un}lock() at the entrance and exit of sk_data_ready will not increase performance overhead.
Reported-by: syzbot+dd90a702f518e0eac072@syzkaller.appspotmail.com Closes: https://lore.kernel.org/bpf/6734c033.050a0220.2a2fcc.0015.GAE@google.com/ Signed-off-by: Jiayuan Chen jiayuan.chen@linux.dev --- net/core/skmsg.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 0ddc4c718833..1b71ae1d1bf5 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -1222,27 +1222,35 @@ static int sk_psock_verdict_recv(struct sock *sk, struct sk_buff *skb)
static void sk_psock_verdict_data_ready(struct sock *sk) { - struct socket *sock = sk->sk_socket; + struct socket *sock; const struct proto_ops *ops; int copied;
trace_sk_data_ready(sk);
+ /* We need RCU to prevent the sk_socket from being released. + * Especially for Unix sockets, we are currently in the process + * context and do not have RCU protection. + */ + rcu_read_lock(); + sock = sk->sk_socket; if (unlikely(!sock)) - return; + goto unlock; + ops = READ_ONCE(sock->ops); if (!ops || !ops->read_skb) - return; + goto unlock; + copied = ops->read_skb(sk, sk_psock_verdict_recv); if (copied >= 0) { struct sk_psock *psock;
- rcu_read_lock(); psock = sk_psock(sk); if (psock) sk_psock_data_ready(sk, psock); - rcu_read_unlock(); } +unlock: + rcu_read_unlock(); }
void sk_psock_start_verdict(struct sock *sk, struct sk_psock *psock)
On Wed, Feb 26, 2025 at 09:22:40PM +0800, Jiayuan Chen wrote:
Use RCU lock to protect sk_socket, preventing concurrent close and release by another thread.
Because TCP/UDP are already within a relatively large critical section: ''' ip_local_deliver_finish rcu_read_lock ip_protocol_deliver_rcu tcp_rcv/udp_rcv rcu_read_unlock '''
Adding rcu_read_{un}lock() at the entrance and exit of sk_data_ready will not increase performance overhead.
Reported-by: syzbot+dd90a702f518e0eac072@syzkaller.appspotmail.com Closes: https://lore.kernel.org/bpf/6734c033.050a0220.2a2fcc.0015.GAE@google.com/ Signed-off-by: Jiayuan Chen jiayuan.chen@linux.dev
sock_def_readable() already acquires RCU read lock anyway.
Reviewed-by: Cong Wang xiyou.wangcong@gmail.com
Thanks!
On 2025-02-27 11:45:53, Cong Wang wrote:
On Wed, Feb 26, 2025 at 09:22:40PM +0800, Jiayuan Chen wrote:
Use RCU lock to protect sk_socket, preventing concurrent close and release by another thread.
Because TCP/UDP are already within a relatively large critical section: ''' ip_local_deliver_finish rcu_read_lock ip_protocol_deliver_rcu tcp_rcv/udp_rcv rcu_read_unlock '''
Adding rcu_read_{un}lock() at the entrance and exit of sk_data_ready will not increase performance overhead.
Reported-by: syzbot+dd90a702f518e0eac072@syzkaller.appspotmail.com Closes: https://lore.kernel.org/bpf/6734c033.050a0220.2a2fcc.0015.GAE@google.com/ Signed-off-by: Jiayuan Chen jiayuan.chen@linux.dev
sock_def_readable() already acquires RCU read lock anyway.
Reviewed-by: Cong Wang xiyou.wangcong@gmail.com
Thanks.
Reviewed-by: John Fastabend john.fastabend@gmail.com
On 2/26/25 5:22 AM, Jiayuan Chen wrote:
Use RCU lock to protect sk_socket, preventing concurrent close and release by another thread.
Because TCP/UDP are already within a relatively large critical section: ''' ip_local_deliver_finish rcu_read_lock ip_protocol_deliver_rcu tcp_rcv/udp_rcv rcu_read_unlock '''
Adding rcu_read_{un}lock() at the entrance and exit of sk_data_ready will not increase performance overhead.
Can it use a Fixes tag?
On Thu, Feb 27, 2025 at 03:04:26PM -0800, Martin KaFai Lau wrote:
On 2/26/25 5:22 AM, Jiayuan Chen wrote:
Use RCU lock to protect sk_socket, preventing concurrent close and release by another thread.
Because TCP/UDP are already within a relatively large critical section: ''' ip_local_deliver_finish rcu_read_lock ip_protocol_deliver_rcu tcp_rcv/udp_rcv rcu_read_unlock '''
Adding rcu_read_{un}lock() at the entrance and exit of sk_data_ready will not increase performance overhead.
Can it use a Fixes tag?
Thanks, Martin. It seems that this issue has existed since sockmap supported unix. I'll find the corresponding commit as the Fixes tag.
Current wrapper function create_pair() is used to create a pair of connected links and returns two fds, but it does not support unix sockets.
Here we introduce socketpair() into create_pair(), which supports creating a pair of unix sockets, since the semantics of the two are the same.
Signed-off-by: Jiayuan Chen jiayuan.chen@linux.dev --- .../selftests/bpf/prog_tests/socket_helpers.h | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/socket_helpers.h b/tools/testing/selftests/bpf/prog_tests/socket_helpers.h index 1bdfb79ef009..a805143dd84f 100644 --- a/tools/testing/selftests/bpf/prog_tests/socket_helpers.h +++ b/tools/testing/selftests/bpf/prog_tests/socket_helpers.h @@ -313,11 +313,22 @@ static inline int recv_timeout(int fd, void *buf, size_t len, int flags,
static inline int create_pair(int family, int sotype, int *p0, int *p1) { - __close_fd int s, c = -1, p = -1; + __close_fd int s = -1, c = -1, p = -1; struct sockaddr_storage addr; socklen_t len = sizeof(addr); int err;
+ if (family == AF_UNIX) { + int fds[2]; + + err = socketpair(family, sotype, 0, fds); + if (!err) { + *p0 = fds[0]; + *p1 = fds[1]; + } + return err; + } + s = socket_loopback(family, sotype); if (s < 0) return s;
On Wed, Feb 26, 2025 at 09:22:41PM +0800, Jiayuan Chen wrote:
Current wrapper function create_pair() is used to create a pair of connected links and returns two fds, but it does not support unix sockets.
Here we introduce socketpair() into create_pair(), which supports creating a pair of unix sockets, since the semantics of the two are the same.
Since it is only for UDS and only has effectively 1 line of code, how about just calling socketpair(AF_UNIX) in your patch 3/3?
Thanks!
On 2025-02-27 11:52:04, Cong Wang wrote:
On Wed, Feb 26, 2025 at 09:22:41PM +0800, Jiayuan Chen wrote:
Current wrapper function create_pair() is used to create a pair of connected links and returns two fds, but it does not support unix sockets.
Here we introduce socketpair() into create_pair(), which supports creating a pair of unix sockets, since the semantics of the two are the same.
Since it is only for UDS and only has effectively 1 line of code, how about just calling socketpair(AF_UNIX) in your patch 3/3?
If we run that test with more than AF_UNIX it might be best as is. I think there might be some value testing that flow on TCP/UDP even if its not related to the bug.
Thanks, John
On Thu, Feb 27, 2025 at 02:21:41PM -0800, John Fastabend wrote:
On 2025-02-27 11:52:04, Cong Wang wrote:
On Wed, Feb 26, 2025 at 09:22:41PM +0800, Jiayuan Chen wrote:
Current wrapper function create_pair() is used to create a pair of connected links and returns two fds, but it does not support unix sockets.
Here we introduce socketpair() into create_pair(), which supports creating a pair of unix sockets, since the semantics of the two are the same.
Since it is only for UDS and only has effectively 1 line of code, how about just calling socketpair(AF_UNIX) in your patch 3/3?
If we run that test with more than AF_UNIX it might be best as is. I think there might be some value testing that flow on TCP/UDP even if its not related to the bug.
Thanks, John
Thanks, John! I'll make selftest cover more types of sockets.
On Thu, Feb 27, 2025 at 11:52:04AM -0800, Cong Wang wrote:
On Wed, Feb 26, 2025 at 09:22:41PM +0800, Jiayuan Chen wrote:
Current wrapper function create_pair() is used to create a pair of connected links and returns two fds, but it does not support unix sockets.
Here we introduce socketpair() into create_pair(), which supports creating a pair of unix sockets, since the semantics of the two are the same.
Since it is only for UDS and only has effectively 1 line of code, how about just calling socketpair(AF_UNIX) in your patch 3/3?
Thanks!
Thanks, Cong, I'll expand selftest to cover TCP and UDP, looks like I missed doing those.
Add edge case tests for sockmap.
Signed-off-by: Jiayuan Chen jiayuan.chen@linux.dev --- .../selftests/bpf/prog_tests/sockmap_basic.c | 57 +++++++++++++++++++ 1 file changed, 57 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c index 1e3e4392dcca..e36b5c78fc98 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c @@ -1042,6 +1042,61 @@ static void test_sockmap_vsock_unconnected(void) xclose(map); }
+void *close_thread(void *arg) +{ + int fd = *(int *)arg; + + sleep(1); + close(fd); + return NULL; +} + +void test_sockmap_with_close_on_write(int family, int sotype) +{ + struct test_sockmap_pass_prog *skel; + int err, map, verdict, sent; + pthread_t tid; + int zero = 0; + int c = -1, p = -1; + + skel = test_sockmap_pass_prog__open_and_load(); + if (!ASSERT_OK_PTR(skel, "open_and_load")) + return; + + verdict = bpf_program__fd(skel->progs.prog_skb_verdict); + map = bpf_map__fd(skel->maps.sock_map_rx); + + err = bpf_prog_attach(verdict, map, BPF_SK_SKB_STREAM_VERDICT, 0); + if (!ASSERT_OK(err, "bpf_prog_attach")) + goto out; + + err = create_pair(family, sotype, &c, &p); + if (!ASSERT_OK(err, "create_pair")) + goto out; + + err = bpf_map_update_elem(map, &zero, &p, BPF_ANY); + if (!ASSERT_OK(err, "bpf_map_update_elem")) + goto out; + + err = pthread_create(&tid, 0, close_thread, &p); + if (!ASSERT_OK(err, "pthread_create")) + goto out; + + sent = xsend(c, "a", 1, 0); + if (!ASSERT_EQ(sent, 1, "xsend")) + goto out; + + pthread_join(tid, NULL); + /* p was closed in thread */ + p = -1; +out: + if (c > 0) + close(c); + if (p > 0) + close(p); + test_sockmap_pass_prog__destroy(skel); +} + void test_sockmap_basic(void) { if (test__start_subtest("sockmap create_update_free")) @@ -1108,4 +1163,6 @@ void test_sockmap_basic(void) test_sockmap_skb_verdict_vsock_poll(); if (test__start_subtest("sockmap vsock unconnected")) test_sockmap_vsock_unconnected(); + if (test__start_subtest("sockmap with unix and close")) + test_sockmap_with_close_on_write(AF_UNIX, SOCK_STREAM); }
On Wed, Feb 26, 2025 at 09:22:42PM +0800, Jiayuan Chen wrote:
Add edge case tests for sockmap.
Signed-off-by: Jiayuan Chen jiayuan.chen@linux.dev
Acked-by: Cong Wang xiyou.wangcong@gmail.com
I always love to see fixes with test cases.
Thanks!
On 2025-02-26 21:22:42, Jiayuan Chen wrote:
Add edge case tests for sockmap.
Signed-off-by: Jiayuan Chen jiayuan.chen@linux.dev
Reviewed-by: John Fastabend john.fastabend@gmail.com
.../selftests/bpf/prog_tests/sockmap_basic.c | 57> + void test_sockmap_basic(void) { if (test__start_subtest("sockmap create_update_free")) @@ -1108,4 +1163,6 @@ void test_sockmap_basic(void) test_sockmap_skb_verdict_vsock_poll(); if (test__start_subtest("sockmap vsock unconnected")) test_sockmap_vsock_unconnected();
- if (test__start_subtest("sockmap with unix and close"))
test_sockmap_with_close_on_write(AF_UNIX, SOCK_STREAM);
Might also be worth running these with other types? No bug was fixed there but still testing close while writing is useful for testing RCU paths in TCP/UDP.
Fine as a follow up patch IMO.
linux-kselftest-mirror@lists.linaro.org