From: Ignat Korchagin ignat@cloudflare.com Date: Wed, 12 Jun 2024 14:22:36 -0400
And curious if bpf_get_socket_cookie() can be called any socket family to trigger the splat. e.g. ieee802154_create() seems to have the same bug.
Just judging from the code - yes, indeed.
If so, how about clearing sock->sk in sk_common_release() ?
This was my first thought, but I was a bit put off by the fact that sk_common_release() is called from many places and the sk object itself is reference counted. So not every call to sk_common_release() seems to actually free the sk object.
sk_common_release() is called
1. when we fail to create a socket (socket() or accept() syscall) 2. when we release the last refcount of the socket's file descriptor (basically close() syscall)
The issue only happens at 1. because we clear sock->sk at 2. in __sock_release() after calling sock->ops->release().
So, we need not take care of these callers of sk_common_release().
- inet_release - ->close() - udp_lib_close - ping_close - raw_close - rawv6_close - l2tp_ip_close - l2tp_ip6_close - sctp_close - ieee802154_sock_release - ->close() - raw_close - dgram_close - mctp_release - ->close() - mctp_sk_close - pn_socket_release - ->close() - pn_sock_close - pep_sock_close
Then, the rest of the callers are:
- __sock_create - pf->create() - inet_create - inet6_create - ieee802154_create - smc_create - __smc_create
- setsockopt(TCP_ULP) - smc_ulp_init - __smc_create
- sctp_accept - sctp_v4_create_accept_sk - sctp_v6_create_accept_sk
we need not care about sctp_v[46]_create_accept_sk() because they don't set sock->sk for the socket; we don't pass sock to sock_init_data(NULL, newsk) before calling sk_common_release().
__sock_create() path and SMC's ULP path have the same issue, and sk_common_release() releases the last refcount of struct sock there.
So, I think we can set NULL to sock->sk in sk_common_release().
Secondly, I was put off by this comment (which I don't fully understand TBH) [1]
On the other hand - in inet/inet6_create() we definitely know that the object would be freed, because we just created that.
But if someone more familiar with the code confirms it is better/possible to do in sk_common_release(), I'm happy to adjust and it would be cleaner indeed.
---8<--- diff --git a/net/core/sock.c b/net/core/sock.c index 8629f9aecf91..bbc94954d9bf 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -3754,6 +3754,9 @@ void sk_common_release(struct sock *sk) * until the last reference will be released. */
if (sk->sk_socket)
sk->sk_socket->sk = NULL;
sock_orphan(sk); xfrm_sk_free_policy(sk);
---8<---