2025-01-17, 13:59:35 +0100, Antonio Quartulli wrote:
On 17/01/2025 12:48, Sabrina Dubroca wrote:
2025-01-13, 10:31:39 +0100, Antonio Quartulli wrote:
int ovpn_nl_peer_new_doit(struct sk_buff *skb, struct genl_info *info) {
- return -EOPNOTSUPP;
- struct nlattr *attrs[OVPN_A_PEER_MAX + 1];
- struct ovpn_priv *ovpn = info->user_ptr[0];
- struct ovpn_socket *ovpn_sock;
- struct socket *sock = NULL;
- struct ovpn_peer *peer;
- u32 sockfd, peer_id;
- int ret;
- /* peers can only be added when the interface is up and running */
- if (!netif_running(ovpn->dev))
return -ENETDOWN;
Since we're not under rtnl_lock here, the device could go down while we're creating this peer, and we may end up with a down device that has a peer anyway.
hmm, indeed. This means we must hold the rtnl_lock to prevent ending up in an inconsistent state.
I'm not sure what this (and the peer flushing on NETDEV_DOWN) is trying to accomplish. Is it a problem to keep peers when the netdevice is down?
This is the result of my discussion with Sergey that started in v23 5/23:
https://lore.kernel.org/r/netdev/20241029-b4-ovpn-v11-5-de4698c73a25@openvpn...
The idea was to match operational state with actual connectivity to peer(s).
Originally I wanted to simply kee the carrier always on, but after further discussion (including the meaning of the openvpn option --persist-tun) we agreed on following the logic where an UP device has a peer connected (logic is slightly different between MP and P2P).
I am not extremely happy with the resulting complexity, but it seemed to be blocker for Sergey.
[after re-reading that discussion with Sergey]
I don't understand why "admin does 'ip link set tun0 down'" means "we should get rid of all peers. For me the carrier situation goes the other way: no peer, no carrier (as if I unplugged the cable from my ethernet card), and it's independent of what the user does (ip link set XXX up/down). You have that with netif_carrier_{on,off}, but flushing peers when the admin does "ip link set tun0 down" is separate IMO.
[...]
int ovpn_nl_peer_del_doit(struct sk_buff *skb, struct genl_info *info) {
- return -EOPNOTSUPP;
- struct nlattr *attrs[OVPN_A_PEER_MAX + 1];
- struct ovpn_priv *ovpn = info->user_ptr[0];
- struct ovpn_peer *peer;
- u32 peer_id;
- int ret;
- if (GENL_REQ_ATTR_CHECK(info, OVPN_A_PEER))
return -EINVAL;
- ret = nla_parse_nested(attrs, OVPN_A_PEER_MAX, info->attrs[OVPN_A_PEER],
ovpn_peer_nl_policy, info->extack);
- if (ret)
return ret;
- if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs,
OVPN_A_PEER_ID))
return -EINVAL;
- peer_id = nla_get_u32(attrs[OVPN_A_PEER_ID]);
- peer = ovpn_peer_get_by_id(ovpn, peer_id);
- if (!peer) {
NL_SET_ERR_MSG_FMT_MOD(info->extack,
"cannot find peer with id %u", peer_id);
return -ENOENT;
- }
- netdev_dbg(ovpn->dev, "del peer %u\n", peer->id);
- ret = ovpn_peer_del(peer, OVPN_DEL_PEER_REASON_USERSPACE);
With the delayed socket release (which is similar to what was in v11, but now with refcounting on the netdevice which should make rtnl_link_unregister in ovpn_cleanup wait [*]), we may return to userspace as if the peer was gone, but the socket hasn't been detached yet.
A userspace application that tries to remove the peer and immediately re-create it with the same socket could get EBUSY if the workqueue hasn't done its job yet. That would be quite confusing to the application.
This may happen only for TCP, because in the UDP case we would increase the refcounter and keep the socket attached.
Not if we're re-attaching to a different ovpn instance/netdevice.
However, re-attaching the same TCP socket is hardly going to happen (in TCP we have one socket per peer, therefore if the peer is going away, we're most likely killing the socket too).
This said, the complexity added by the completion seems quite tiny, therefore I'll add the code you are suggesting below.
Ok.