2024-11-26, 02:32:38 +0200, Sergey Ryazanov wrote:
On 15.11.2024 17:02, Antonio Quartulli wrote:
On 11/11/2024 02:54, Sergey Ryazanov wrote: [...]
+        skb_reset_transport_header(skb);
+        skb_probe_transport_header(skb);
+        skb_reset_inner_headers(skb);
+ memset(skb->cb, 0, sizeof(skb->cb));
Why do we need to zero the control buffer here?
To prevent the next layer from assuming the cb is clean while it is not. Other drivers do the same as well.
AFAIR, there is no convention to clean the control buffer before handing it over. The common practice is rather the opposite: the programmer shall not assume that the control buffer has been zeroed.
It is not a big deal to clean it here, we could just save some CPU cycles by not doing it.
I think this was recommended by Sabrina as well.
Curious. Either macsec does not zero it, or I have not understood how it is done there.
I only remember discussing a case [1] where one function within ovpn was expecting a cleared skb->cb to behave correctly but the caller did not clear it. In general, as you said, clearing cb "to be nice to other layers" is not expected. Sorry if some comments I made were confusing.
[1] https://lore.kernel.org/netdev/ZtXOw-NcL9lvwWa8@hog
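For illustration, the more common pattern is for each layer to define its own view of skb->cb and initialize exactly the fields it relies on, instead of expecting the previous layer to have zeroed the area. A minimal sketch of that pattern (struct name and fields here are made up for the example, not taken from the ovpn patches):

#include <linux/skbuff.h>
#include <linux/build_bug.h>

/* Hypothetical per-driver control block overlaid on skb->cb. */
struct ovpn_skb_cb {
        u32 seqno;
        unsigned int payload_offset;
};

static inline struct ovpn_skb_cb *ovpn_skb_cb(struct sk_buff *skb)
{
        /* Make sure the private struct fits into the 48-byte cb area. */
        BUILD_BUG_ON(sizeof(struct ovpn_skb_cb) > sizeof(skb->cb));
        return (struct ovpn_skb_cb *)skb->cb;
}

/* Each user sets the fields it needs before reading them back, so a
 * memset() by the previous layer is not required.
 */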
+struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk)
+{
+        struct ovpn_socket *ovpn_sock;
+        if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) != UDP_ENCAP_OVPNINUDP))
+                return NULL;
+        ovpn_sock = rcu_dereference_sk_user_data(sk);
+        if (unlikely(!ovpn_sock))
+                return NULL;
+        /* make sure that sk matches our stored transport socket */
+        if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk))
+                return NULL;
+        return ovpn_sock->ovpn;
Now, returning this pointer is safe. But the following TCP transport support calls the socket release via a scheduled work, which extends the socket lifetime and makes it possible to receive a UDP packet well after the interface private data has been released. Is that assumption correct?
Sorry, you lost me when saying "the following *TCP* transport support calls". This function is invoked only in the UDP context. Was that a typo?
Yeah, you are right. The question sounds like a riddle. I really should stop composing emails at midnight. Let me paraphrase it.
The potential issue is tricky, since it is introduced patch-by-patch.
Up to this patch the socket releasing procedure looks solid and reliable, e.g. destroying the P2P netdev:
ovpn_netdev_notifier_call(NETDEV_UNREGISTER)
  ovpn_peer_release_p2p
    ovpn_peer_del_p2p
      ovpn_peer_put
        ovpn_peer_release_kref
          ovpn_peer_release
            ovpn_socket_put
              ovpn_socket_release_kref
                ovpn_socket_detach
                  ovpn_udp_socket_detach
                    setup_udp_tunnel_sock
netdev_run_todo
  rcu_barrier  <- no running ovpn_udp_encap_recv after this point
Isn't it rather the synchronize_net() in unregister_netdevice_many_notify()? rcu_barrier() waits for pending kfree_rcu()/call_rcu() callbacks, while synchronize_rcu() waits for rcu_read_lock() sections (see the comments for rcu_barrier() and synchronize_rcu() in kernel/rcu/tree.c).
  free_netdev
After the setup_udp_tunnel_sock() call, no new ovpn_udp_encap_recv() will be spawned. And after the rcu_barrier(), all running ovpn_udp_encap_recv() calls will have completed. All good.
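For reference, the detach side presumably boils down to resetting the tunnel config on the socket, roughly like the following sketch (the function name matches the call chain above, the body is an assumption):

#include <net/udp_tunnel.h>
#include <net/sock.h>

static void ovpn_udp_socket_detach(struct socket *sock)
{
        struct udp_tunnel_sock_cfg cfg = { };

        /* A zeroed config clears sk_user_data, encap_type and the
         * encap_rcv callback, so the UDP layer stops handing packets
         * to ovpn_udp_encap_recv() for this socket.
         */
        setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg);
}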
Then the following patch, 'ovpn: implement TCP transport', decouples ovpn_socket_release_kref() from ovpn_socket_detach() by scheduling the socket detach call:
ovpn_socket_release_kref
  ovpn_socket_schedule_release
    schedule_work(&sock->work)
And a long time later the socket will actually be detached:
ovpn_socket_release_work
  ovpn_socket_detach
    ovpn_udp_socket_detach
      setup_udp_tunnel_sock
And until that detach takes place, the UDP handler can call ovpn_udp_encap_recv() any number of times.
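In code, the deferred shape is roughly this (a sketch reconstructed from the call chains above; the field names ->refcount and ->work are assumptions):

#include <linux/kref.h>
#include <linux/workqueue.h>
#include <linux/slab.h>
#include <linux/container_of.h>
#include <linux/net.h>

struct ovpn_socket {
        struct socket *sock;
        struct kref refcount;           /* field names are assumptions */
        struct work_struct work;
        /* ... */
};

void ovpn_socket_detach(struct socket *sock);   /* existing helper */

static void ovpn_socket_release_work(struct work_struct *work)
{
        struct ovpn_socket *sock = container_of(work, struct ovpn_socket, work);

        /* The real detach happens only here, in process context. */
        ovpn_socket_detach(sock->sock);
        kfree(sock);
}

static void ovpn_socket_release_kref(struct kref *kref)
{
        struct ovpn_socket *sock = container_of(kref, struct ovpn_socket,
                                                refcount);

        /* No detach here anymore: until the work runs, the UDP encap
         * callback stays installed and may still be invoked.
         */
        schedule_work(&sock->work);
}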
So, we can end up with this scenario:
ovpn_netdev_notifier_call(NETDEV_UNREGISTER)
  ovpn_peer_release_p2p
    ovpn_peer_del_p2p
      ovpn_peer_put
        ovpn_peer_release_kref
          ovpn_peer_release
            ovpn_socket_put
              ovpn_socket_release_kref
                ovpn_socket_schedule_release
                  schedule_work(&sock->work)
netdev_run_todo
  rcu_barrier
  free_netdev

ovpn_udp_encap_recv   <- called for an incoming UDP packet
  ovpn_from_udp_sock  <- returns pointer to freed memory
  // Any access to the ovpn pointer is now a use-after-free

ovpn_socket_release_work  <- kernel finally invokes the work
  ovpn_socket_detach
    ovpn_udp_socket_detach
      setup_udp_tunnel_sock
To address the issue, I see two possible solutions:
- flush the workqueue somewhere before the netdev release
- set ovpn_sock->ovpn = NULL before scheduling the socket detach
Going with #2, we could fully split the detach into a synchronous part and an asynchronous part (with the async part not needed for UDP). detach_sync would clear the pointers (CBs, strp_stop(), ovpn_sock->ovpn, setup_udp_tunnel_sock) so that no more packets are passed through the ovpn driver.
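Roughly along these lines (only a sketch of the idea; the detach_sync name, the ->ovpn field handling and the split itself are assumptions, and the TCP-specific teardown is omitted):

#include <net/udp_tunnel.h>
#include <net/sock.h>

/* Synchronous part: runs before the work is scheduled, while the
 * netdev is still alive, and stops any further traffic through ovpn.
 */
static void ovpn_socket_detach_sync(struct ovpn_socket *ovpn_sock)
{
        struct udp_tunnel_sock_cfg cfg = { };

        /* UDP: drop the encap callbacks so ovpn_udp_encap_recv() can
         * no longer be entered for this socket. For TCP this is where
         * the saved socket callbacks would be restored and strp_stop()
         * called.
         */
        setup_udp_tunnel_sock(sock_net(ovpn_sock->sock->sk),
                              ovpn_sock->sock, &cfg);

        /* Make sure nothing dereferences the netdev private data after
         * this point, even if a packet still races in.
         */
        WRITE_ONCE(ovpn_sock->ovpn, NULL);
}

/* Asynchronous part: whatever is left (e.g. TCP strparser and queue
 * teardown) that cannot run in the release context is done later from
 * the scheduled work.
 */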
Related to that topic, I'm not sure what's keeping a reference on the peer to guarantee it doesn't get freed before we're done with peer->tcp.tx_work at the end of ovpn_tcp_socket_detach. Maybe all this tcp stuff should move from the peer to ovpn_socket?
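For comparison, the usual way a queued work keeps its containing object alive is to take a reference when queueing and drop it at the end of the handler; whether ovpn actually does this for peer->tcp.tx_work is exactly the open question. A sketch of that pattern, reusing ovpn_peer_put() from the call chains above (the ovpn_peer_hold() helper, the ->refcount field and ovpn_tcp_queue_tx() are assumptions):

static bool ovpn_peer_hold(struct ovpn_peer *peer)
{
        /* Assumed refcount field; fails if the peer is already dying. */
        return kref_get_unless_zero(&peer->refcount);
}

static void ovpn_tcp_tx_work(struct work_struct *work)
{
        struct ovpn_peer *peer = container_of(work, struct ovpn_peer,
                                              tcp.tx_work);

        /* ... flush the pending TX queue ... */

        ovpn_peer_put(peer);    /* drop the ref taken when the work was queued */
}

static void ovpn_tcp_queue_tx(struct ovpn_peer *peer)
{
        if (!ovpn_peer_hold(peer))
                return;

        if (!schedule_work(&peer->tcp.tx_work))
                ovpn_peer_put(peer);    /* work was already queued */
}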