Notable changes from v10:
* extended commit message of 23/23 with brief description of the output
* Link to v10: https://lore.kernel.org/r/20241025-b4-ovpn-v10-0-b87530777be7@openvpn.net
Please note that some patches were already reviewed by Andrew Lunn, Donald Hunter and Shuah Khan. They have retained their Reviewed-by tags since no major code modification has happened since the review.
The latest code can also be found at:
https://github.com/OpenVPN/linux-kernel-ovpn
Thanks a lot!
Best Regards,
Antonio Quartulli
OpenVPN Inc.
---
Antonio Quartulli (23):
      netlink: add NLA_POLICY_MAX_LEN macro
      net: introduce OpenVPN Data Channel Offload (ovpn)
      ovpn: add basic netlink support
      ovpn: add basic interface creation/destruction/management routines
      ovpn: keep carrier always on
      ovpn: introduce the ovpn_peer object
      ovpn: introduce the ovpn_socket object
      ovpn: implement basic TX path (UDP)
      ovpn: implement basic RX path (UDP)
      ovpn: implement packet processing
      ovpn: store tunnel and transport statistics
      ovpn: implement TCP transport
      ovpn: implement multi-peer support
      ovpn: implement peer lookup logic
      ovpn: implement keepalive mechanism
      ovpn: add support for updating local UDP endpoint
      ovpn: add support for peer floating
      ovpn: implement peer add/get/dump/delete via netlink
      ovpn: implement key add/get/del/swap via netlink
      ovpn: kill key and notify userspace in case of IV exhaustion
      ovpn: notify userspace when a peer is deleted
      ovpn: add basic ethtool support
      testing/selftests: add test tool and scripts for ovpn module
 Documentation/netlink/specs/ovpn.yaml              |  362 +++
 MAINTAINERS                                        |   11 +
 drivers/net/Kconfig                                |   14 +
 drivers/net/Makefile                               |    1 +
 drivers/net/ovpn/Makefile                          |   22 +
 drivers/net/ovpn/bind.c                            |   54 +
 drivers/net/ovpn/bind.h                            |  117 +
 drivers/net/ovpn/crypto.c                          |  214 ++
 drivers/net/ovpn/crypto.h                          |  145 ++
 drivers/net/ovpn/crypto_aead.c                     |  386 ++++
 drivers/net/ovpn/crypto_aead.h                     |   33 +
 drivers/net/ovpn/io.c                              |  462 ++++
 drivers/net/ovpn/io.h                              |   25 +
 drivers/net/ovpn/main.c                            |  337 +++
 drivers/net/ovpn/main.h                            |   24 +
 drivers/net/ovpn/netlink-gen.c                     |  212 ++
 drivers/net/ovpn/netlink-gen.h                     |   41 +
 drivers/net/ovpn/netlink.c                         | 1135 ++++++++++
 drivers/net/ovpn/netlink.h                         |   18 +
 drivers/net/ovpn/ovpnstruct.h                      |   61 +
 drivers/net/ovpn/packet.h                          |   40 +
 drivers/net/ovpn/peer.c                            | 1201 ++++++++++
 drivers/net/ovpn/peer.h                            |  165 ++
 drivers/net/ovpn/pktid.c                           |  130 ++
 drivers/net/ovpn/pktid.h                           |   87 +
 drivers/net/ovpn/proto.h                           |  104 +
 drivers/net/ovpn/skb.h                             |   56 +
 drivers/net/ovpn/socket.c                          |  178 ++
 drivers/net/ovpn/socket.h                          |   55 +
 drivers/net/ovpn/stats.c                           |   21 +
 drivers/net/ovpn/stats.h                           |   47 +
 drivers/net/ovpn/tcp.c                             |  506 +++++
 drivers/net/ovpn/tcp.h                             |   44 +
 drivers/net/ovpn/udp.c                             |  406 ++++
 drivers/net/ovpn/udp.h                             |   26 +
 include/net/netlink.h                              |    1 +
 include/uapi/linux/if_link.h                       |   15 +
 include/uapi/linux/ovpn.h                          |  109 +
 include/uapi/linux/udp.h                           |    1 +
 tools/net/ynl/ynl-gen-c.py                         |    4 +-
 tools/testing/selftests/Makefile                   |    1 +
 tools/testing/selftests/net/ovpn/.gitignore        |    2 +
 tools/testing/selftests/net/ovpn/Makefile          |   17 +
 tools/testing/selftests/net/ovpn/config            |   10 +
 tools/testing/selftests/net/ovpn/data64.key        |    5 +
 tools/testing/selftests/net/ovpn/ovpn-cli.c        | 2370 ++++++++++++++++++++
 tools/testing/selftests/net/ovpn/tcp_peers.txt     |    5 +
 .../testing/selftests/net/ovpn/test-chachapoly.sh  |    9 +
 tools/testing/selftests/net/ovpn/test-float.sh     |    9 +
 tools/testing/selftests/net/ovpn/test-tcp.sh       |    9 +
 tools/testing/selftests/net/ovpn/test.sh           |  183 ++
 tools/testing/selftests/net/ovpn/udp_peers.txt     |    5 +
 52 files changed, 9494 insertions(+), 1 deletion(-)
---
base-commit: ab101c553bc1f76a839163d1dc0d1e715ad6bb4e
change-id: 20241002-b4-ovpn-eeee35c694a2
Best regards,
Similarly to NLA_POLICY_MIN_LEN, NLA_POLICY_MAX_LEN defines a policy with a maximum length value.
The netlink generator for YAML specs has been extended accordingly.
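For illustration only (the FOO_* names below are made up and not part of this patch), a driver can then express an upper bound on a binary attribute directly in its policy, which is equivalent to NLA_POLICY_MAX(NLA_BINARY, 64):

/* hypothetical policy entry: binary blob of at most 64 bytes */
static const struct nla_policy foo_policy[FOO_A_MAX + 1] = {
	[FOO_A_BLOB] = NLA_POLICY_MAX_LEN(64),
};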
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
---
 include/net/netlink.h      | 1 +
 tools/net/ynl/ynl-gen-c.py | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/include/net/netlink.h b/include/net/netlink.h
index db6af207287c839408c58cb28b82408e0548eaca..2dc671c977ff3297975269d236264907009703d3 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -469,6 +469,7 @@ struct nla_policy {
 		.max = _len					\
 	}
 #define NLA_POLICY_MIN_LEN(_len)	NLA_POLICY_MIN(NLA_BINARY, _len)
+#define NLA_POLICY_MAX_LEN(_len)	NLA_POLICY_MAX(NLA_BINARY, _len)
 
 /**
  * struct nl_info - netlink source information
diff --git a/tools/net/ynl/ynl-gen-c.py b/tools/net/ynl/ynl-gen-c.py
index 1a825b4081b222cf97eb73f01a2a5c1ffe47cd5c..aa22eb0924754f38ea0b9e68a1ff5a55d94d6717 100755
--- a/tools/net/ynl/ynl-gen-c.py
+++ b/tools/net/ynl/ynl-gen-c.py
@@ -481,7 +481,7 @@ class TypeBinary(Type):
             pass
         elif len(self.checks) == 1:
             check_name = list(self.checks)[0]
-            if check_name not in {'exact-len', 'min-len'}:
+            if check_name not in {'exact-len', 'min-len', 'max-len'}:
                 raise Exception('Unsupported check for binary type: ' + check_name)
         else:
             raise Exception('More than one check for binary type not implemented, yet')
@@ -492,6 +492,8 @@ class TypeBinary(Type):
             mem = 'NLA_POLICY_EXACT_LEN(' + self.get_limit_str('exact-len') + ')'
         elif 'min-len' in self.checks:
             mem = '{ .len = ' + self.get_limit_str('min-len') + ', }'
+        elif 'max-len' in self.checks:
+            mem = 'NLA_POLICY_MAX_LEN(' + self.get_limit_str('max-len') + ')'
 
         return mem
OpenVPN is userspace software, existing since around 2005, that allows users to create secure tunnels.
So far OpenVPN has implemented all operations in userspace, which implies several round trips between kernel and userspace in order to process packets (encapsulation/decapsulation, encryption/decryption, rerouting, ...).
With `ovpn` we intend to move the fast path (data channel) entirely into kernel space and thus improve the user-measured throughput over the tunnel.
`ovpn` is implemented as a simple virtual network device driver that can be manipulated by means of the standard RTNL APIs. A device of kind `ovpn` allows only IPv4/6 traffic and can be of type:

* P2P (peer-to-peer): any packet sent over the interface will be encapsulated and transmitted to the other side (typical OpenVPN client or peer-to-peer behaviour);
* P2MP (point-to-multipoint): packets sent over the interface are transmitted to peers based on existing routes (typical OpenVPN server behaviour).
After the interface has been created, OpenVPN in userspace can configure it using a new Netlink API. Specifically, it is possible to manage peers and their keys.
The OpenVPN control channel is multiplexed over the same transport socket by means of OP codes. Anything that is not DATA_V2 (OpenVPN OP code for data traffic) is sent to userspace and handled there. This way the `ovpn` codebase is kept as compact as possible while focusing on handling data traffic only (fast path).
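To give an idea of what this boils down to (a sketch only, not code taken verbatim from this series): in the OpenVPN wire format the OP code sits in the upper 5 bits of the first payload byte and DATA_V2 is OP code 9, so the demux decision is essentially:

#include <linux/types.h>

/* sketch: data traffic is kept in kernel, everything else goes to
 * userspace; constants follow the OpenVPN wire format and may not match
 * the names used by this series
 */
#define OVPN_OPCODE_SHIFT	3
#define OVPN_DATA_V2		9

static inline bool ovpn_is_data_v2(const u8 *buf)
{
	return (buf[0] >> OVPN_OPCODE_SHIFT) == OVPN_DATA_V2;
}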
Any OpenVPN control feature (like cipher negotiation, TLS handshake, rekeying, etc.) is still fully handled by the userspace process.
When userspace establishes a new connection with a peer, it first performs the handshake and then passes the socket to the `ovpn` kernel module, which takes ownership. From this moment on `ovpn` will handle data traffic for the new peer. When control packets are received on the link, they are forwarded to userspace through the same transport socket they were received on, as userspace is still listening to them.
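As a rough userspace sketch of that handover (based on libnl-genl and on the OVPN_* uAPI identifiers introduced later in this series; error handling and cleanup are omitted and this is not code from the actual patches):

#include <stdint.h>
#include <netlink/netlink.h>
#include <netlink/genl/genl.h>
#include <netlink/genl/ctrl.h>
#include <linux/ovpn.h>

/* hand a connected, fully handshaked socket over to the kernel module */
static int ovpn_peer_new(int ifindex, int sock_fd, uint32_t peer_id)
{
	struct nl_sock *sk = nl_socket_alloc();
	struct nl_msg *msg = nlmsg_alloc();
	struct nlattr *peer;
	int family;

	genl_connect(sk);
	family = genl_ctrl_resolve(sk, OVPN_FAMILY_NAME);

	genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, family, 0, 0,
		    OVPN_CMD_PEER_NEW, OVPN_FAMILY_VERSION);
	nla_put_u32(msg, OVPN_A_IFINDEX, ifindex);

	peer = nla_nest_start(msg, OVPN_A_PEER);
	nla_put_u32(msg, OVPN_A_PEER_ID, peer_id);
	/* ovpn takes ownership of this socket from now on */
	nla_put_u32(msg, OVPN_A_PEER_SOCKET, sock_fd);
	nla_nest_end(msg, peer);

	nl_send_auto(sk, msg);
	return nl_wait_for_ack(sk);
}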
Some events (like peer deletion) are sent to a Netlink multicast group.
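A management daemon can receive those events by joining that group; with libnl this is roughly (again just a sketch, handle_ntf() being a hypothetical callback):

/* hypothetical callback parsing OVPN_CMD_PEER_DEL_NTF messages */
static int handle_ntf(struct nl_msg *msg, void *arg);

static void ovpn_listen_events(struct nl_sock *sk)
{
	int grp = genl_ctrl_resolve_grp(sk, OVPN_FAMILY_NAME,
					OVPN_MCGRP_PEERS);

	nl_socket_add_membership(sk, grp);
	nl_socket_modify_cb(sk, NL_CB_VALID, NL_CB_CUSTOM, handle_ntf, NULL);
	for (;;)
		nl_recvmsgs_default(sk);
}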
Although it wasn't easy to convince the community, `ovpn` implements only a limited number of the data-channel features supported by the userspace program.
Each feature that made it into `ovpn` was carefully vetted to avoid carrying too much legacy along with us (and to make a clean cut with old and probably-not-so-useful features).
Notably, only encryption using AEAD ciphers (specifically ChaCha20Poly1305 and AES-GCM) was implemented. Supporting any other cipher out there was not deemed useful.
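For context, both algorithms map to standard kernel crypto API AEAD transforms; a minimal allocation sketch could look like the following (the algorithm strings are the generic crypto API names and may not match what crypto_aead.c does verbatim):

#include <linux/errno.h>
#include <linux/err.h>
#include <crypto/aead.h>
#include <uapi/linux/ovpn.h>

/* sketch: map the uAPI cipher enum to a kernel AEAD transform */
static struct crypto_aead *ovpn_aead_alloc(enum ovpn_cipher_alg alg)
{
	switch (alg) {
	case OVPN_CIPHER_ALG_AES_GCM:
		return crypto_alloc_aead("gcm(aes)", 0, 0);
	case OVPN_CIPHER_ALG_CHACHA20_POLY1305:
		return crypto_alloc_aead("rfc7539(chacha20,poly1305)", 0, 0);
	default:
		return ERR_PTR(-EOPNOTSUPP);
	}
}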
Both UDP and TCP sockets are supported.
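On the UDP side this builds on the kernel's UDP tunnel infrastructure; a sketch of how the encap hook could be attached is below (ovpn_udp_encap_recv() is a hypothetical receive callback, the real code lives in udp.c):

#include <net/udp_tunnel.h>

/* sketch: mark an existing UDP socket as carrying OpenVPN-in-UDP */
static void ovpn_udp_socket_attach(struct socket *sock,
				   struct ovpn_struct *ovpn)
{
	struct udp_tunnel_sock_cfg cfg = {
		.sk_user_data	= ovpn,
		.encap_type	= UDP_ENCAP_OVPNINUDP,
		.encap_rcv	= ovpn_udp_encap_recv,	/* hypothetical */
	};

	setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg);
}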
As explained above, in P2MP mode OpenVPN will use the main system routing table to decide which packet goes to which peer. This implies that no routing table was re-implemented in the `ovpn` kernel module.
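In other words, by the time a packet reaches ovpn_net_xmit() the kernel FIB has already selected the ovpn interface, and the module only has to match the destination against the VPN IPs assigned to its peers. Conceptually (the helper names below are made up for illustration and are not necessarily the ones implemented by the series):

/* sketch: TX-side peer selection, no routing table inside ovpn */
static struct ovpn_peer *ovpn_tx_peer_lookup(struct ovpn_struct *ovpn,
					     struct sk_buff *skb)
{
	if (ovpn->mode == OVPN_MODE_P2P)
		return ovpn_peer_get_p2p(ovpn);		/* hypothetical */

	/* MP mode: match skb's destination against the peers' VPN IPs */
	return ovpn_peer_get_by_vpn_addr(ovpn, skb);	/* hypothetical */
}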
This kernel module can be enabled by selecting the CONFIG_OVPN entry in the networking drivers section.
NOTE: this first patch introduces the very basic framework only. Features are then added patch by patch; however, while each patch will compile and should not break at runtime, the ovpn module is expected to be fully functional only after the whole set has been applied.
Cc: steffen.klassert@secunet.com
Cc: antony.antony@secunet.com
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
 MAINTAINERS               |   8 ++++
 drivers/net/Kconfig       |  13 ++++++
 drivers/net/Makefile      |   1 +
 drivers/net/ovpn/Makefile |  11 +++++
 drivers/net/ovpn/io.c     |  22 +++++++++
 drivers/net/ovpn/io.h     |  15 ++++++
 drivers/net/ovpn/main.c   | 116 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/net/ovpn/main.h   |  15 ++++++
 include/uapi/linux/udp.h  |   1 +
 9 files changed, 202 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS index f39ab140710f16b1245924bfe381cd64d499ff8a..09e193bbc218d74846cbae26f80ada3e04c3692a 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -17286,6 +17286,14 @@ F: arch/openrisc/ F: drivers/irqchip/irq-ompic.c F: drivers/irqchip/irq-or1k-*
+OPENVPN DATA CHANNEL OFFLOAD +M: Antonio Quartulli antonio@openvpn.net +L: openvpn-devel@lists.sourceforge.net (moderated for non-subscribers) +L: netdev@vger.kernel.org +S: Supported +T: git https://github.com/OpenVPN/linux-kernel-ovpn.git +F: drivers/net/ovpn/ + OPENVSWITCH M: Pravin B Shelar pshelar@ovn.org L: netdev@vger.kernel.org diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index 1fd5acdc73c6af0e1a861867039c3624fc618e25..269b73fcfd348a48174fb96b8f8d4f8788636fa8 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -115,6 +115,19 @@ config WIREGUARD_DEBUG
Say N here unless you know what you're doing.
+config OVPN + tristate "OpenVPN data channel offload" + depends on NET && INET + select NET_UDP_TUNNEL + select DST_CACHE + select CRYPTO + select CRYPTO_AES + select CRYPTO_GCM + select CRYPTO_CHACHA20POLY1305 + help + This module enhances the performance of the OpenVPN userspace software + by offloading the data channel processing to kernelspace. + config EQUALIZER tristate "EQL (serial line load balancing) support" help diff --git a/drivers/net/Makefile b/drivers/net/Makefile index 13743d0e83b5fde479e9b30ad736be402d880dee..5152b3330e28da7eaec821018a26c973bb33ce0c 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -11,6 +11,7 @@ obj-$(CONFIG_IPVLAN) += ipvlan/ obj-$(CONFIG_IPVTAP) += ipvlan/ obj-$(CONFIG_DUMMY) += dummy.o obj-$(CONFIG_WIREGUARD) += wireguard/ +obj-$(CONFIG_OVPN) += ovpn/ obj-$(CONFIG_EQUALIZER) += eql.o obj-$(CONFIG_IFB) += ifb.o obj-$(CONFIG_MACSEC) += macsec.o diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile new file mode 100644 index 0000000000000000000000000000000000000000..53fb197027d787d6683e9056d3d341abf6ed38e4 --- /dev/null +++ b/drivers/net/ovpn/Makefile @@ -0,0 +1,11 @@ +# SPDX-License-Identifier: GPL-2.0 +# +# ovpn -- OpenVPN data channel offload in kernel space +# +# Copyright (C) 2020-2024 OpenVPN, Inc. +# +# Author: Antonio Quartulli antonio@openvpn.net + +obj-$(CONFIG_OVPN) := ovpn.o +ovpn-y += main.o +ovpn-y += io.o diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c new file mode 100644 index 0000000000000000000000000000000000000000..ad3813419c33cbdfe7e8ad6f5c8b444a3540a69f --- /dev/null +++ b/drivers/net/ovpn/io.c @@ -0,0 +1,22 @@ +// SPDX-License-Identifier: GPL-2.0 +/* OpenVPN data channel offload + * + * Copyright (C) 2019-2024 OpenVPN, Inc. + * + * Author: James Yonan james@openvpn.net + * Antonio Quartulli antonio@openvpn.net + */ + +#include <linux/netdevice.h> +#include <linux/skbuff.h> + +#include "io.h" + +/* Send user data to the network + */ +netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev) +{ + skb_tx_error(skb); + kfree_skb(skb); + return NET_XMIT_DROP; +} diff --git a/drivers/net/ovpn/io.h b/drivers/net/ovpn/io.h new file mode 100644 index 0000000000000000000000000000000000000000..aa259be66441f7b0262f39da12d6c3dce0a9b24c --- /dev/null +++ b/drivers/net/ovpn/io.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* OpenVPN data channel offload + * + * Copyright (C) 2019-2024 OpenVPN, Inc. + * + * Author: James Yonan james@openvpn.net + * Antonio Quartulli antonio@openvpn.net + */ + +#ifndef _NET_OVPN_OVPN_H_ +#define _NET_OVPN_OVPN_H_ + +netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev); + +#endif /* _NET_OVPN_OVPN_H_ */ diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c new file mode 100644 index 0000000000000000000000000000000000000000..369a5a2b2fc1a497c8444e59f9b058eb40e49524 --- /dev/null +++ b/drivers/net/ovpn/main.c @@ -0,0 +1,116 @@ +// SPDX-License-Identifier: GPL-2.0 +/* OpenVPN data channel offload + * + * Copyright (C) 2020-2024 OpenVPN, Inc. + * + * Author: Antonio Quartulli antonio@openvpn.net + * James Yonan james@openvpn.net + */ + +#include <linux/module.h> +#include <linux/netdevice.h> +#include <net/rtnetlink.h> + +#include "main.h" +#include "io.h" + +/* Driver info */ +#define DRV_DESCRIPTION "OpenVPN data channel offload (ovpn)" +#define DRV_COPYRIGHT "(C) 2020-2024 OpenVPN, Inc." 
+ +/** + * ovpn_dev_is_valid - check if the netdevice is of type 'ovpn' + * @dev: the interface to check + * + * Return: whether the netdevice is of type 'ovpn' + */ +bool ovpn_dev_is_valid(const struct net_device *dev) +{ + return dev->netdev_ops->ndo_start_xmit == ovpn_net_xmit; +} + +static int ovpn_newlink(struct net *src_net, struct net_device *dev, + struct nlattr *tb[], struct nlattr *data[], + struct netlink_ext_ack *extack) +{ + return -EOPNOTSUPP; +} + +static struct rtnl_link_ops ovpn_link_ops = { + .kind = "ovpn", + .netns_refund = false, + .newlink = ovpn_newlink, + .dellink = unregister_netdevice_queue, +}; + +static int ovpn_netdev_notifier_call(struct notifier_block *nb, + unsigned long state, void *ptr) +{ + struct net_device *dev = netdev_notifier_info_to_dev(ptr); + + if (!ovpn_dev_is_valid(dev)) + return NOTIFY_DONE; + + switch (state) { + case NETDEV_REGISTER: + /* add device to internal list for later destruction upon + * unregistration + */ + break; + case NETDEV_UNREGISTER: + /* can be delivered multiple times, so check registered flag, + * then destroy the interface + */ + break; + case NETDEV_POST_INIT: + case NETDEV_GOING_DOWN: + case NETDEV_DOWN: + case NETDEV_UP: + case NETDEV_PRE_UP: + default: + return NOTIFY_DONE; + } + + return NOTIFY_OK; +} + +static struct notifier_block ovpn_netdev_notifier = { + .notifier_call = ovpn_netdev_notifier_call, +}; + +static int __init ovpn_init(void) +{ + int err = register_netdevice_notifier(&ovpn_netdev_notifier); + + if (err) { + pr_err("ovpn: can't register netdevice notifier: %d\n", err); + return err; + } + + err = rtnl_link_register(&ovpn_link_ops); + if (err) { + pr_err("ovpn: can't register rtnl link ops: %d\n", err); + goto unreg_netdev; + } + + return 0; + +unreg_netdev: + unregister_netdevice_notifier(&ovpn_netdev_notifier); + return err; +} + +static __exit void ovpn_cleanup(void) +{ + rtnl_link_unregister(&ovpn_link_ops); + unregister_netdevice_notifier(&ovpn_netdev_notifier); + + rcu_barrier(); +} + +module_init(ovpn_init); +module_exit(ovpn_cleanup); + +MODULE_DESCRIPTION(DRV_DESCRIPTION); +MODULE_AUTHOR(DRV_COPYRIGHT); +MODULE_LICENSE("GPL"); diff --git a/drivers/net/ovpn/main.h b/drivers/net/ovpn/main.h new file mode 100644 index 0000000000000000000000000000000000000000..a3215316c49bfcdf2496590bac878f145b8b27fd --- /dev/null +++ b/drivers/net/ovpn/main.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* OpenVPN data channel offload + * + * Copyright (C) 2019-2024 OpenVPN, Inc. + * + * Author: James Yonan james@openvpn.net + * Antonio Quartulli antonio@openvpn.net + */ + +#ifndef _NET_OVPN_MAIN_H_ +#define _NET_OVPN_MAIN_H_ + +bool ovpn_dev_is_valid(const struct net_device *dev); + +#endif /* _NET_OVPN_MAIN_H_ */ diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h index d85d671deed3c78f6969189281b9083dcac000c6..edca3e430305a6bffc34e617421f1f3071582e69 100644 --- a/include/uapi/linux/udp.h +++ b/include/uapi/linux/udp.h @@ -43,5 +43,6 @@ struct udphdr { #define UDP_ENCAP_GTP1U 5 /* 3GPP TS 29.060 */ #define UDP_ENCAP_RXRPC 6 #define TCP_ENCAP_ESPINTCP 7 /* Yikes, this is really xfrm encap types. */ +#define UDP_ENCAP_OVPNINUDP 8 /* OpenVPN traffic */
#endif /* _UAPI_LINUX_UDP_H */
This commit introduces basic netlink support with family registration/unregistration functionalities and stub pre/post-doit.
More importantly it introduces the YAML uAPI description along with its auto-generated files:
- include/uapi/linux/ovpn.h
- drivers/net/ovpn/netlink-gen.c
- drivers/net/ovpn/netlink-gen.h
Cc: donald.hunter@gmail.com
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
 Documentation/netlink/specs/ovpn.yaml | 362 ++++++++++++++++++++++++++++++++++
 MAINTAINERS                           |   2 +
 drivers/net/ovpn/Makefile             |   2 +
 drivers/net/ovpn/main.c               |  15 +-
 drivers/net/ovpn/netlink-gen.c        | 212 ++++++++++++++++++++
 drivers/net/ovpn/netlink-gen.h        |  41 ++++
 drivers/net/ovpn/netlink.c            | 157 +++++++++++++++
 drivers/net/ovpn/netlink.h            |  15 ++
 drivers/net/ovpn/ovpnstruct.h         |  25 +++
 include/uapi/linux/ovpn.h             | 109 ++++++++++
 10 files changed, 939 insertions(+), 1 deletion(-)
diff --git a/Documentation/netlink/specs/ovpn.yaml b/Documentation/netlink/specs/ovpn.yaml new file mode 100644 index 0000000000000000000000000000000000000000..79339c25d607f1b5d15a0a973f6fc23637e158a2 --- /dev/null +++ b/Documentation/netlink/specs/ovpn.yaml @@ -0,0 +1,362 @@ +# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) +# +# Author: Antonio Quartulli antonio@openvpn.net +# +# Copyright (c) 2024, OpenVPN Inc. +# + +name: ovpn + +protocol: genetlink + +doc: Netlink protocol to control OpenVPN network devices + +definitions: + - + type: const + name: nonce-tail-size + value: 8 + - + type: enum + name: cipher-alg + entries: [ none, aes-gcm, chacha20-poly1305 ] + - + type: enum + name: del-peer-reason + entries: [ teardown, userspace, expired, transport-error, transport-disconnect ] + - + type: enum + name: key-slot + entries: [ primary, secondary ] + +attribute-sets: + - + name: peer + attributes: + - + name: id + type: u32 + doc: | + The unique ID of the peer. To be used to identify peers during + operations + checks: + max: 0xFFFFFF + - + name: remote-ipv4 + type: u32 + doc: The remote IPv4 address of the peer + byte-order: big-endian + display-hint: ipv4 + - + name: remote-ipv6 + type: binary + doc: The remote IPv6 address of the peer + display-hint: ipv6 + checks: + exact-len: 16 + - + name: remote-ipv6-scope-id + type: u32 + doc: The scope id of the remote IPv6 address of the peer (RFC2553) + - + name: remote-port + type: u16 + doc: The remote port of the peer + byte-order: big-endian + checks: + min: 1 + - + name: socket + type: u32 + doc: The socket to be used to communicate with the peer + - + name: vpn-ipv4 + type: u32 + doc: The IPv4 address assigned to the peer by the server + byte-order: big-endian + display-hint: ipv4 + - + name: vpn-ipv6 + type: binary + doc: The IPv6 address assigned to the peer by the server + display-hint: ipv6 + checks: + exact-len: 16 + - + name: local-ipv4 + type: u32 + doc: The local IPv4 to be used to send packets to the peer (UDP only) + byte-order: big-endian + display-hint: ipv4 + - + name: local-ipv6 + type: binary + doc: The local IPv6 to be used to send packets to the peer (UDP only) + display-hint: ipv6 + checks: + exact-len: 16 + - + name: local-port + type: u16 + doc: The local port to be used to send packets to the peer (UDP only) + byte-order: big-endian + checks: + min: 1 + - + name: keepalive-interval + type: u32 + doc: | + The number of seconds after which a keep alive message is sent to the + peer + - + name: keepalive-timeout + type: u32 + doc: | + The number of seconds from the last activity after which the peer is + assumed dead + - + name: del-reason + type: u32 + doc: The reason why a peer was deleted + enum: del-peer-reason + - + name: vpn-rx-bytes + type: uint + doc: Number of bytes received over the tunnel + - + name: vpn-tx-bytes + type: uint + doc: Number of bytes transmitted over the tunnel + - + name: vpn-rx-packets + type: uint + doc: Number of packets received over the tunnel + - + name: vpn-tx-packets + type: uint + doc: Number of packets transmitted over the tunnel + - + name: link-rx-bytes + type: uint + doc: Number of bytes received at the transport level + - + name: link-tx-bytes + type: uint + doc: Number of bytes transmitted at the transport level + - + name: link-rx-packets + type: u32 + doc: Number of packets received at the transport level + - + name: link-tx-packets + type: u32 + doc: Number of packets transmitted at the transport level + - + name: keyconf + attributes: + - + 
name: peer-id + type: u32 + doc: | + The unique ID of the peer. To be used to identify peers during + key operations + checks: + max: 0xFFFFFF + - + name: slot + type: u32 + doc: The slot where the key should be stored + enum: key-slot + - + name: key-id + doc: | + The unique ID of the key. Used to fetch the correct key upon + decryption + type: u32 + checks: + max: 7 + - + name: cipher-alg + type: u32 + doc: The cipher to be used when communicating with the peer + enum: cipher-alg + - + name: encrypt-dir + type: nest + doc: Key material for encrypt direction + nested-attributes: keydir + - + name: decrypt-dir + type: nest + doc: Key material for decrypt direction + nested-attributes: keydir + - + name: keydir + attributes: + - + name: cipher-key + type: binary + doc: The actual key to be used by the cipher + checks: + max-len: 256 + - + name: nonce-tail + type: binary + doc: | + Random nonce to be concatenated to the packet ID, in order to + obtain the actual cipher IV + checks: + exact-len: nonce-tail-size + - + name: ovpn + attributes: + - + name: ifindex + type: u32 + doc: Index of the ovpn interface to operate on + - + name: ifname + type: string + doc: Name of the ovpn interface + - + name: peer + type: nest + doc: | + The peer object containing the attributed of interest for the specific + operation + nested-attributes: peer + - + name: keyconf + type: nest + doc: Peer specific cipher configuration + nested-attributes: keyconf + +operations: + list: + - + name: peer-new + attribute-set: ovpn + flags: [ admin-perm ] + doc: Add a remote peer + do: + pre: ovpn-nl-pre-doit + post: ovpn-nl-post-doit + request: + attributes: + - ifindex + - peer + - + name: peer-set + attribute-set: ovpn + flags: [ admin-perm ] + doc: modify a remote peer + do: + pre: ovpn-nl-pre-doit + post: ovpn-nl-post-doit + request: + attributes: + - ifindex + - peer + - + name: peer-get + attribute-set: ovpn + flags: [ admin-perm ] + doc: Retrieve data about existing remote peers (or a specific one) + do: + pre: ovpn-nl-pre-doit + post: ovpn-nl-post-doit + request: + attributes: + - ifindex + - peer + reply: + attributes: + - peer + dump: + request: + attributes: + - ifindex + reply: + attributes: + - peer + - + name: peer-del + attribute-set: ovpn + flags: [ admin-perm ] + doc: Delete existing remote peer + do: + pre: ovpn-nl-pre-doit + post: ovpn-nl-post-doit + request: + attributes: + - ifindex + - peer + - + name: peer-del-ntf + doc: Notification about a peer being deleted + notify: peer-get + mcgrp: peers + + - + name: key-new + attribute-set: ovpn + flags: [ admin-perm ] + doc: Add a cipher key for a specific peer + do: + pre: ovpn-nl-pre-doit + post: ovpn-nl-post-doit + request: + attributes: + - ifindex + - keyconf + - + name: key-get + attribute-set: ovpn + flags: [ admin-perm ] + doc: Retrieve non-sensitive data about peer key and cipher + do: + pre: ovpn-nl-pre-doit + post: ovpn-nl-post-doit + request: + attributes: + - ifindex + - keyconf + reply: + attributes: + - keyconf + - + name: key-swap + attribute-set: ovpn + flags: [ admin-perm ] + doc: Swap primary and secondary session keys for a specific peer + do: + pre: ovpn-nl-pre-doit + post: ovpn-nl-post-doit + request: + attributes: + - ifindex + - keyconf + - + name: key-swap-ntf + notify: key-get + doc: | + Notification about key having exhausted its IV space and requiring + renegotiation + mcgrp: peers + - + name: key-del + attribute-set: ovpn + flags: [ admin-perm ] + doc: Delete cipher key for a specific peer + do: + pre: ovpn-nl-pre-doit + post: 
ovpn-nl-post-doit + request: + attributes: + - ifindex + - keyconf + +mcast-groups: + list: + - + name: peers diff --git a/MAINTAINERS b/MAINTAINERS index 09e193bbc218d74846cbae26f80ada3e04c3692a..cf3d55c3e98aaea8f8817faed99dd7499cd59a71 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -17292,7 +17292,9 @@ L: openvpn-devel@lists.sourceforge.net (moderated for non-subscribers) L: netdev@vger.kernel.org S: Supported T: git https://github.com/OpenVPN/linux-kernel-ovpn.git +F: Documentation/netlink/specs/ovpn.yaml F: drivers/net/ovpn/ +F: include/uapi/linux/ovpn.h
OPENVSWITCH M: Pravin B Shelar pshelar@ovn.org diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile index 53fb197027d787d6683e9056d3d341abf6ed38e4..201dc001419f1d99ae95c0ee0f96e68f8a4eac16 100644 --- a/drivers/net/ovpn/Makefile +++ b/drivers/net/ovpn/Makefile @@ -9,3 +9,5 @@ obj-$(CONFIG_OVPN) := ovpn.o ovpn-y += main.o ovpn-y += io.o +ovpn-y += netlink.o +ovpn-y += netlink-gen.o diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index 369a5a2b2fc1a497c8444e59f9b058eb40e49524..d5bdb0055f4dd3a6e32dc6e792bed1e7fd59e101 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -7,11 +7,15 @@ * James Yonan james@openvpn.net */
+#include <linux/genetlink.h> #include <linux/module.h> #include <linux/netdevice.h> #include <net/rtnetlink.h> +#include <uapi/linux/ovpn.h>
+#include "ovpnstruct.h" #include "main.h" +#include "netlink.h" #include "io.h"
/* Driver info */ @@ -37,7 +41,7 @@ static int ovpn_newlink(struct net *src_net, struct net_device *dev, }
static struct rtnl_link_ops ovpn_link_ops = { - .kind = "ovpn", + .kind = OVPN_FAMILY_NAME, .netns_refund = false, .newlink = ovpn_newlink, .dellink = unregister_netdevice_queue, @@ -93,8 +97,16 @@ static int __init ovpn_init(void) goto unreg_netdev; }
+ err = ovpn_nl_register(); + if (err) { + pr_err("ovpn: can't register netlink family: %d\n", err); + goto unreg_rtnl; + } + return 0;
+unreg_rtnl: + rtnl_link_unregister(&ovpn_link_ops); unreg_netdev: unregister_netdevice_notifier(&ovpn_netdev_notifier); return err; @@ -102,6 +114,7 @@ static int __init ovpn_init(void)
static __exit void ovpn_cleanup(void) { + ovpn_nl_unregister(); rtnl_link_unregister(&ovpn_link_ops); unregister_netdevice_notifier(&ovpn_netdev_notifier);
diff --git a/drivers/net/ovpn/netlink-gen.c b/drivers/net/ovpn/netlink-gen.c new file mode 100644 index 0000000000000000000000000000000000000000..6a43eab9a136cf0d739b9674080d1254a43cf5d0 --- /dev/null +++ b/drivers/net/ovpn/netlink-gen.c @@ -0,0 +1,212 @@ +// SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) +/* Do not edit directly, auto-generated from: */ +/* Documentation/netlink/specs/ovpn.yaml */ +/* YNL-GEN kernel source */ + +#include <net/netlink.h> +#include <net/genetlink.h> + +#include "netlink-gen.h" + +#include <uapi/linux/ovpn.h> + +/* Integer value ranges */ +static const struct netlink_range_validation ovpn_a_peer_id_range = { + .max = 16777215ULL, +}; + +static const struct netlink_range_validation ovpn_a_keyconf_peer_id_range = { + .max = 16777215ULL, +}; + +/* Common nested types */ +const struct nla_policy ovpn_keyconf_nl_policy[OVPN_A_KEYCONF_DECRYPT_DIR + 1] = { + [OVPN_A_KEYCONF_PEER_ID] = NLA_POLICY_FULL_RANGE(NLA_U32, &ovpn_a_keyconf_peer_id_range), + [OVPN_A_KEYCONF_SLOT] = NLA_POLICY_MAX(NLA_U32, 1), + [OVPN_A_KEYCONF_KEY_ID] = NLA_POLICY_MAX(NLA_U32, 7), + [OVPN_A_KEYCONF_CIPHER_ALG] = NLA_POLICY_MAX(NLA_U32, 2), + [OVPN_A_KEYCONF_ENCRYPT_DIR] = NLA_POLICY_NESTED(ovpn_keydir_nl_policy), + [OVPN_A_KEYCONF_DECRYPT_DIR] = NLA_POLICY_NESTED(ovpn_keydir_nl_policy), +}; + +const struct nla_policy ovpn_keydir_nl_policy[OVPN_A_KEYDIR_NONCE_TAIL + 1] = { + [OVPN_A_KEYDIR_CIPHER_KEY] = NLA_POLICY_MAX_LEN(256), + [OVPN_A_KEYDIR_NONCE_TAIL] = NLA_POLICY_EXACT_LEN(OVPN_NONCE_TAIL_SIZE), +}; + +const struct nla_policy ovpn_peer_nl_policy[OVPN_A_PEER_LINK_TX_PACKETS + 1] = { + [OVPN_A_PEER_ID] = NLA_POLICY_FULL_RANGE(NLA_U32, &ovpn_a_peer_id_range), + [OVPN_A_PEER_REMOTE_IPV4] = { .type = NLA_U32, }, + [OVPN_A_PEER_REMOTE_IPV6] = NLA_POLICY_EXACT_LEN(16), + [OVPN_A_PEER_REMOTE_IPV6_SCOPE_ID] = { .type = NLA_U32, }, + [OVPN_A_PEER_REMOTE_PORT] = NLA_POLICY_MIN(NLA_U16, 1), + [OVPN_A_PEER_SOCKET] = { .type = NLA_U32, }, + [OVPN_A_PEER_VPN_IPV4] = { .type = NLA_U32, }, + [OVPN_A_PEER_VPN_IPV6] = NLA_POLICY_EXACT_LEN(16), + [OVPN_A_PEER_LOCAL_IPV4] = { .type = NLA_U32, }, + [OVPN_A_PEER_LOCAL_IPV6] = NLA_POLICY_EXACT_LEN(16), + [OVPN_A_PEER_LOCAL_PORT] = NLA_POLICY_MIN(NLA_U16, 1), + [OVPN_A_PEER_KEEPALIVE_INTERVAL] = { .type = NLA_U32, }, + [OVPN_A_PEER_KEEPALIVE_TIMEOUT] = { .type = NLA_U32, }, + [OVPN_A_PEER_DEL_REASON] = NLA_POLICY_MAX(NLA_U32, 4), + [OVPN_A_PEER_VPN_RX_BYTES] = { .type = NLA_UINT, }, + [OVPN_A_PEER_VPN_TX_BYTES] = { .type = NLA_UINT, }, + [OVPN_A_PEER_VPN_RX_PACKETS] = { .type = NLA_UINT, }, + [OVPN_A_PEER_VPN_TX_PACKETS] = { .type = NLA_UINT, }, + [OVPN_A_PEER_LINK_RX_BYTES] = { .type = NLA_UINT, }, + [OVPN_A_PEER_LINK_TX_BYTES] = { .type = NLA_UINT, }, + [OVPN_A_PEER_LINK_RX_PACKETS] = { .type = NLA_U32, }, + [OVPN_A_PEER_LINK_TX_PACKETS] = { .type = NLA_U32, }, +}; + +/* OVPN_CMD_PEER_NEW - do */ +static const struct nla_policy ovpn_peer_new_nl_policy[OVPN_A_PEER + 1] = { + [OVPN_A_IFINDEX] = { .type = NLA_U32, }, + [OVPN_A_PEER] = NLA_POLICY_NESTED(ovpn_peer_nl_policy), +}; + +/* OVPN_CMD_PEER_SET - do */ +static const struct nla_policy ovpn_peer_set_nl_policy[OVPN_A_PEER + 1] = { + [OVPN_A_IFINDEX] = { .type = NLA_U32, }, + [OVPN_A_PEER] = NLA_POLICY_NESTED(ovpn_peer_nl_policy), +}; + +/* OVPN_CMD_PEER_GET - do */ +static const struct nla_policy ovpn_peer_get_do_nl_policy[OVPN_A_PEER + 1] = { + [OVPN_A_IFINDEX] = { .type = NLA_U32, }, + [OVPN_A_PEER] = NLA_POLICY_NESTED(ovpn_peer_nl_policy), +}; + +/* OVPN_CMD_PEER_GET - dump 
*/ +static const struct nla_policy ovpn_peer_get_dump_nl_policy[OVPN_A_IFINDEX + 1] = { + [OVPN_A_IFINDEX] = { .type = NLA_U32, }, +}; + +/* OVPN_CMD_PEER_DEL - do */ +static const struct nla_policy ovpn_peer_del_nl_policy[OVPN_A_PEER + 1] = { + [OVPN_A_IFINDEX] = { .type = NLA_U32, }, + [OVPN_A_PEER] = NLA_POLICY_NESTED(ovpn_peer_nl_policy), +}; + +/* OVPN_CMD_KEY_NEW - do */ +static const struct nla_policy ovpn_key_new_nl_policy[OVPN_A_KEYCONF + 1] = { + [OVPN_A_IFINDEX] = { .type = NLA_U32, }, + [OVPN_A_KEYCONF] = NLA_POLICY_NESTED(ovpn_keyconf_nl_policy), +}; + +/* OVPN_CMD_KEY_GET - do */ +static const struct nla_policy ovpn_key_get_nl_policy[OVPN_A_KEYCONF + 1] = { + [OVPN_A_IFINDEX] = { .type = NLA_U32, }, + [OVPN_A_KEYCONF] = NLA_POLICY_NESTED(ovpn_keyconf_nl_policy), +}; + +/* OVPN_CMD_KEY_SWAP - do */ +static const struct nla_policy ovpn_key_swap_nl_policy[OVPN_A_KEYCONF + 1] = { + [OVPN_A_IFINDEX] = { .type = NLA_U32, }, + [OVPN_A_KEYCONF] = NLA_POLICY_NESTED(ovpn_keyconf_nl_policy), +}; + +/* OVPN_CMD_KEY_DEL - do */ +static const struct nla_policy ovpn_key_del_nl_policy[OVPN_A_KEYCONF + 1] = { + [OVPN_A_IFINDEX] = { .type = NLA_U32, }, + [OVPN_A_KEYCONF] = NLA_POLICY_NESTED(ovpn_keyconf_nl_policy), +}; + +/* Ops table for ovpn */ +static const struct genl_split_ops ovpn_nl_ops[] = { + { + .cmd = OVPN_CMD_PEER_NEW, + .pre_doit = ovpn_nl_pre_doit, + .doit = ovpn_nl_peer_new_doit, + .post_doit = ovpn_nl_post_doit, + .policy = ovpn_peer_new_nl_policy, + .maxattr = OVPN_A_PEER, + .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO, + }, + { + .cmd = OVPN_CMD_PEER_SET, + .pre_doit = ovpn_nl_pre_doit, + .doit = ovpn_nl_peer_set_doit, + .post_doit = ovpn_nl_post_doit, + .policy = ovpn_peer_set_nl_policy, + .maxattr = OVPN_A_PEER, + .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO, + }, + { + .cmd = OVPN_CMD_PEER_GET, + .pre_doit = ovpn_nl_pre_doit, + .doit = ovpn_nl_peer_get_doit, + .post_doit = ovpn_nl_post_doit, + .policy = ovpn_peer_get_do_nl_policy, + .maxattr = OVPN_A_PEER, + .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO, + }, + { + .cmd = OVPN_CMD_PEER_GET, + .dumpit = ovpn_nl_peer_get_dumpit, + .policy = ovpn_peer_get_dump_nl_policy, + .maxattr = OVPN_A_IFINDEX, + .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DUMP, + }, + { + .cmd = OVPN_CMD_PEER_DEL, + .pre_doit = ovpn_nl_pre_doit, + .doit = ovpn_nl_peer_del_doit, + .post_doit = ovpn_nl_post_doit, + .policy = ovpn_peer_del_nl_policy, + .maxattr = OVPN_A_PEER, + .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO, + }, + { + .cmd = OVPN_CMD_KEY_NEW, + .pre_doit = ovpn_nl_pre_doit, + .doit = ovpn_nl_key_new_doit, + .post_doit = ovpn_nl_post_doit, + .policy = ovpn_key_new_nl_policy, + .maxattr = OVPN_A_KEYCONF, + .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO, + }, + { + .cmd = OVPN_CMD_KEY_GET, + .pre_doit = ovpn_nl_pre_doit, + .doit = ovpn_nl_key_get_doit, + .post_doit = ovpn_nl_post_doit, + .policy = ovpn_key_get_nl_policy, + .maxattr = OVPN_A_KEYCONF, + .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO, + }, + { + .cmd = OVPN_CMD_KEY_SWAP, + .pre_doit = ovpn_nl_pre_doit, + .doit = ovpn_nl_key_swap_doit, + .post_doit = ovpn_nl_post_doit, + .policy = ovpn_key_swap_nl_policy, + .maxattr = OVPN_A_KEYCONF, + .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO, + }, + { + .cmd = OVPN_CMD_KEY_DEL, + .pre_doit = ovpn_nl_pre_doit, + .doit = ovpn_nl_key_del_doit, + .post_doit = ovpn_nl_post_doit, + .policy = ovpn_key_del_nl_policy, + .maxattr = OVPN_A_KEYCONF, + .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO, + }, +}; + +static const struct genl_multicast_group ovpn_nl_mcgrps[] = { + 
[OVPN_NLGRP_PEERS] = { "peers", }, +}; + +struct genl_family ovpn_nl_family __ro_after_init = { + .name = OVPN_FAMILY_NAME, + .version = OVPN_FAMILY_VERSION, + .netnsok = true, + .parallel_ops = true, + .module = THIS_MODULE, + .split_ops = ovpn_nl_ops, + .n_split_ops = ARRAY_SIZE(ovpn_nl_ops), + .mcgrps = ovpn_nl_mcgrps, + .n_mcgrps = ARRAY_SIZE(ovpn_nl_mcgrps), +}; diff --git a/drivers/net/ovpn/netlink-gen.h b/drivers/net/ovpn/netlink-gen.h new file mode 100644 index 0000000000000000000000000000000000000000..66a4e4a0a055b4477b67801ded825e9ec068b0e6 --- /dev/null +++ b/drivers/net/ovpn/netlink-gen.h @@ -0,0 +1,41 @@ +/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */ +/* Do not edit directly, auto-generated from: */ +/* Documentation/netlink/specs/ovpn.yaml */ +/* YNL-GEN kernel header */ + +#ifndef _LINUX_OVPN_GEN_H +#define _LINUX_OVPN_GEN_H + +#include <net/netlink.h> +#include <net/genetlink.h> + +#include <uapi/linux/ovpn.h> + +/* Common nested types */ +extern const struct nla_policy ovpn_keyconf_nl_policy[OVPN_A_KEYCONF_DECRYPT_DIR + 1]; +extern const struct nla_policy ovpn_keydir_nl_policy[OVPN_A_KEYDIR_NONCE_TAIL + 1]; +extern const struct nla_policy ovpn_peer_nl_policy[OVPN_A_PEER_LINK_TX_PACKETS + 1]; + +int ovpn_nl_pre_doit(const struct genl_split_ops *ops, struct sk_buff *skb, + struct genl_info *info); +void +ovpn_nl_post_doit(const struct genl_split_ops *ops, struct sk_buff *skb, + struct genl_info *info); + +int ovpn_nl_peer_new_doit(struct sk_buff *skb, struct genl_info *info); +int ovpn_nl_peer_set_doit(struct sk_buff *skb, struct genl_info *info); +int ovpn_nl_peer_get_doit(struct sk_buff *skb, struct genl_info *info); +int ovpn_nl_peer_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb); +int ovpn_nl_peer_del_doit(struct sk_buff *skb, struct genl_info *info); +int ovpn_nl_key_new_doit(struct sk_buff *skb, struct genl_info *info); +int ovpn_nl_key_get_doit(struct sk_buff *skb, struct genl_info *info); +int ovpn_nl_key_swap_doit(struct sk_buff *skb, struct genl_info *info); +int ovpn_nl_key_del_doit(struct sk_buff *skb, struct genl_info *info); + +enum { + OVPN_NLGRP_PEERS, +}; + +extern struct genl_family ovpn_nl_family; + +#endif /* _LINUX_OVPN_GEN_H */ diff --git a/drivers/net/ovpn/netlink.c b/drivers/net/ovpn/netlink.c new file mode 100644 index 0000000000000000000000000000000000000000..2cc34eb1d1d870c6705714cb971c3c5dfb04afda --- /dev/null +++ b/drivers/net/ovpn/netlink.c @@ -0,0 +1,157 @@ +// SPDX-License-Identifier: GPL-2.0 +/* OpenVPN data channel offload + * + * Copyright (C) 2020-2024 OpenVPN, Inc. 
+ * + * Author: Antonio Quartulli antonio@openvpn.net + */ + +#include <linux/netdevice.h> +#include <net/genetlink.h> + +#include <uapi/linux/ovpn.h> + +#include "ovpnstruct.h" +#include "main.h" +#include "io.h" +#include "netlink.h" +#include "netlink-gen.h" + +MODULE_ALIAS_GENL_FAMILY(OVPN_FAMILY_NAME); + +/** + * ovpn_get_dev_from_attrs - retrieve the ovpn private data from the netdevice + * a netlink message is targeting + * @net: network namespace where to look for the interface + * @info: generic netlink info from the user request + * + * Return: the ovpn private data, if found, or an error otherwise + */ +static struct ovpn_struct * +ovpn_get_dev_from_attrs(struct net *net, const struct genl_info *info) +{ + struct ovpn_struct *ovpn; + struct net_device *dev; + int ifindex; + + if (GENL_REQ_ATTR_CHECK(info, OVPN_A_IFINDEX)) + return ERR_PTR(-EINVAL); + + ifindex = nla_get_u32(info->attrs[OVPN_A_IFINDEX]); + + rcu_read_lock(); + dev = dev_get_by_index_rcu(net, ifindex); + if (!dev) { + rcu_read_unlock(); + NL_SET_ERR_MSG_MOD(info->extack, + "ifindex does not match any interface"); + return ERR_PTR(-ENODEV); + } + + if (!ovpn_dev_is_valid(dev)) { + rcu_read_unlock(); + NL_SET_ERR_MSG_MOD(info->extack, + "specified interface is not ovpn"); + NL_SET_BAD_ATTR(info->extack, info->attrs[OVPN_A_IFINDEX]); + return ERR_PTR(-EINVAL); + } + + ovpn = netdev_priv(dev); + netdev_hold(dev, &ovpn->dev_tracker, GFP_KERNEL); + rcu_read_unlock(); + + return ovpn; +} + +int ovpn_nl_pre_doit(const struct genl_split_ops *ops, struct sk_buff *skb, + struct genl_info *info) +{ + struct ovpn_struct *ovpn = ovpn_get_dev_from_attrs(genl_info_net(info), + info); + + if (IS_ERR(ovpn)) + return PTR_ERR(ovpn); + + info->user_ptr[0] = ovpn; + + return 0; +} + +void ovpn_nl_post_doit(const struct genl_split_ops *ops, struct sk_buff *skb, + struct genl_info *info) +{ + struct ovpn_struct *ovpn = info->user_ptr[0]; + + if (ovpn) + netdev_put(ovpn->dev, &ovpn->dev_tracker); +} + +int ovpn_nl_peer_new_doit(struct sk_buff *skb, struct genl_info *info) +{ + return -EOPNOTSUPP; +} + +int ovpn_nl_peer_set_doit(struct sk_buff *skb, struct genl_info *info) +{ + return -EOPNOTSUPP; +} + +int ovpn_nl_peer_get_doit(struct sk_buff *skb, struct genl_info *info) +{ + return -EOPNOTSUPP; +} + +int ovpn_nl_peer_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb) +{ + return -EOPNOTSUPP; +} + +int ovpn_nl_peer_del_doit(struct sk_buff *skb, struct genl_info *info) +{ + return -EOPNOTSUPP; +} + +int ovpn_nl_key_new_doit(struct sk_buff *skb, struct genl_info *info) +{ + return -EOPNOTSUPP; +} + +int ovpn_nl_key_get_doit(struct sk_buff *skb, struct genl_info *info) +{ + return -EOPNOTSUPP; +} + +int ovpn_nl_key_swap_doit(struct sk_buff *skb, struct genl_info *info) +{ + return -EOPNOTSUPP; +} + +int ovpn_nl_key_del_doit(struct sk_buff *skb, struct genl_info *info) +{ + return -EOPNOTSUPP; +} + +/** + * ovpn_nl_register - perform any needed registration in the NL subsustem + * + * Return: 0 on success, a negative error code otherwise + */ +int __init ovpn_nl_register(void) +{ + int ret = genl_register_family(&ovpn_nl_family); + + if (ret) { + pr_err("ovpn: genl_register_family failed: %d\n", ret); + return ret; + } + + return 0; +} + +/** + * ovpn_nl_unregister - undo any module wide netlink registration + */ +void ovpn_nl_unregister(void) +{ + genl_unregister_family(&ovpn_nl_family); +} diff --git a/drivers/net/ovpn/netlink.h b/drivers/net/ovpn/netlink.h new file mode 100644 index 
0000000000000000000000000000000000000000..9e87cf11d1e9813b7a75ddf3705ab7d5fabe899f --- /dev/null +++ b/drivers/net/ovpn/netlink.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* OpenVPN data channel offload + * + * Copyright (C) 2020-2024 OpenVPN, Inc. + * + * Author: Antonio Quartulli antonio@openvpn.net + */ + +#ifndef _NET_OVPN_NETLINK_H_ +#define _NET_OVPN_NETLINK_H_ + +int ovpn_nl_register(void); +void ovpn_nl_unregister(void); + +#endif /* _NET_OVPN_NETLINK_H_ */ diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h new file mode 100644 index 0000000000000000000000000000000000000000..e3e4df6418b081436378fc51d98db5bd7b5d1fbe --- /dev/null +++ b/drivers/net/ovpn/ovpnstruct.h @@ -0,0 +1,25 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* OpenVPN data channel offload + * + * Copyright (C) 2019-2024 OpenVPN, Inc. + * + * Author: James Yonan james@openvpn.net + * Antonio Quartulli antonio@openvpn.net + */ + +#ifndef _NET_OVPN_OVPNSTRUCT_H_ +#define _NET_OVPN_OVPNSTRUCT_H_ + +#include <net/net_trackers.h> + +/** + * struct ovpn_struct - per ovpn interface state + * @dev: the actual netdev representing the tunnel + * @dev_tracker: reference tracker for associated dev + */ +struct ovpn_struct { + struct net_device *dev; + netdevice_tracker dev_tracker; +}; + +#endif /* _NET_OVPN_OVPNSTRUCT_H_ */ diff --git a/include/uapi/linux/ovpn.h b/include/uapi/linux/ovpn.h new file mode 100644 index 0000000000000000000000000000000000000000..7bac0803cd9fd0dde13f4db74acce8d9df5316d8 --- /dev/null +++ b/include/uapi/linux/ovpn.h @@ -0,0 +1,109 @@ +/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */ +/* Do not edit directly, auto-generated from: */ +/* Documentation/netlink/specs/ovpn.yaml */ +/* YNL-GEN uapi header */ + +#ifndef _UAPI_LINUX_OVPN_H +#define _UAPI_LINUX_OVPN_H + +#define OVPN_FAMILY_NAME "ovpn" +#define OVPN_FAMILY_VERSION 1 + +#define OVPN_NONCE_TAIL_SIZE 8 + +enum ovpn_cipher_alg { + OVPN_CIPHER_ALG_NONE, + OVPN_CIPHER_ALG_AES_GCM, + OVPN_CIPHER_ALG_CHACHA20_POLY1305, +}; + +enum ovpn_del_peer_reason { + OVPN_DEL_PEER_REASON_TEARDOWN, + OVPN_DEL_PEER_REASON_USERSPACE, + OVPN_DEL_PEER_REASON_EXPIRED, + OVPN_DEL_PEER_REASON_TRANSPORT_ERROR, + OVPN_DEL_PEER_REASON_TRANSPORT_DISCONNECT, +}; + +enum ovpn_key_slot { + OVPN_KEY_SLOT_PRIMARY, + OVPN_KEY_SLOT_SECONDARY, +}; + +enum { + OVPN_A_PEER_ID = 1, + OVPN_A_PEER_REMOTE_IPV4, + OVPN_A_PEER_REMOTE_IPV6, + OVPN_A_PEER_REMOTE_IPV6_SCOPE_ID, + OVPN_A_PEER_REMOTE_PORT, + OVPN_A_PEER_SOCKET, + OVPN_A_PEER_VPN_IPV4, + OVPN_A_PEER_VPN_IPV6, + OVPN_A_PEER_LOCAL_IPV4, + OVPN_A_PEER_LOCAL_IPV6, + OVPN_A_PEER_LOCAL_PORT, + OVPN_A_PEER_KEEPALIVE_INTERVAL, + OVPN_A_PEER_KEEPALIVE_TIMEOUT, + OVPN_A_PEER_DEL_REASON, + OVPN_A_PEER_VPN_RX_BYTES, + OVPN_A_PEER_VPN_TX_BYTES, + OVPN_A_PEER_VPN_RX_PACKETS, + OVPN_A_PEER_VPN_TX_PACKETS, + OVPN_A_PEER_LINK_RX_BYTES, + OVPN_A_PEER_LINK_TX_BYTES, + OVPN_A_PEER_LINK_RX_PACKETS, + OVPN_A_PEER_LINK_TX_PACKETS, + + __OVPN_A_PEER_MAX, + OVPN_A_PEER_MAX = (__OVPN_A_PEER_MAX - 1) +}; + +enum { + OVPN_A_KEYCONF_PEER_ID = 1, + OVPN_A_KEYCONF_SLOT, + OVPN_A_KEYCONF_KEY_ID, + OVPN_A_KEYCONF_CIPHER_ALG, + OVPN_A_KEYCONF_ENCRYPT_DIR, + OVPN_A_KEYCONF_DECRYPT_DIR, + + __OVPN_A_KEYCONF_MAX, + OVPN_A_KEYCONF_MAX = (__OVPN_A_KEYCONF_MAX - 1) +}; + +enum { + OVPN_A_KEYDIR_CIPHER_KEY = 1, + OVPN_A_KEYDIR_NONCE_TAIL, + + __OVPN_A_KEYDIR_MAX, + OVPN_A_KEYDIR_MAX = (__OVPN_A_KEYDIR_MAX - 1) +}; + +enum { + OVPN_A_IFINDEX = 1, + OVPN_A_IFNAME, + OVPN_A_PEER, + 
OVPN_A_KEYCONF, + + __OVPN_A_MAX, + OVPN_A_MAX = (__OVPN_A_MAX - 1) +}; + +enum { + OVPN_CMD_PEER_NEW = 1, + OVPN_CMD_PEER_SET, + OVPN_CMD_PEER_GET, + OVPN_CMD_PEER_DEL, + OVPN_CMD_PEER_DEL_NTF, + OVPN_CMD_KEY_NEW, + OVPN_CMD_KEY_GET, + OVPN_CMD_KEY_SWAP, + OVPN_CMD_KEY_SWAP_NTF, + OVPN_CMD_KEY_DEL, + + __OVPN_CMD_MAX, + OVPN_CMD_MAX = (__OVPN_CMD_MAX - 1) +}; + +#define OVPN_MCGRP_PEERS "peers" + +#endif /* _UAPI_LINUX_OVPN_H */
On 29.10.2024 12:47, Antonio Quartulli wrote:
This commit introduces basic netlink support with family registration/unregistration functionalities and stub pre/post-doit.
More importantly it introduces the YAML uAPI description along with its auto-generated files:
- include/uapi/linux/ovpn.h
- drivers/net/ovpn/netlink-gen.c
- drivers/net/ovpn/netlink-gen.h
Cc: donald.hunter@gmail.com Signed-off-by: Antonio Quartulli antonio@openvpn.net
[skipped]
diff --git a/Documentation/netlink/specs/ovpn.yaml b/Documentation/netlink/specs/ovpn.yaml
[skipped]
+attribute-sets:
+  -
+    name: peer
+    attributes:
+      -
+        name: id
+        type: u32
+        doc: |
+          The unique ID of the peer. To be used to identify peers during
+          operations
nit: could you specify the scope of uniqueness? I believe it is not globally unique, it is just unique per interface, right?
+        checks:
+          max: 0xFFFFFF
[skipped]
diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index 369a5a2b2fc1a497c8444e59f9b058eb40e49524..d5bdb0055f4dd3a6e32dc6e792bed1e7fd59e101 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -7,11 +7,15 @@
James Yonan <james@openvpn.net>
*/ +#include <linux/genetlink.h> #include <linux/module.h> #include <linux/netdevice.h> #include <net/rtnetlink.h> +#include <uapi/linux/ovpn.h> +#include "ovpnstruct.h" #include "main.h" +#include "netlink.h" #include "io.h" /* Driver info */ @@ -37,7 +41,7 @@ static int ovpn_newlink(struct net *src_net, struct net_device *dev, } static struct rtnl_link_ops ovpn_link_ops = {
-	.kind = "ovpn",
+	.kind = OVPN_FAMILY_NAME,
nit: are you sure that the link kind is the same as the GENL family? I mean, they are both derived from the protocol name that is common to both entities, but does that make the RTNL kind a derivative of the GENL family?
.netns_refund = false, .newlink = ovpn_newlink, .dellink = unregister_netdevice_queue, @@ -93,8 +97,16 @@ static int __init ovpn_init(void) goto unreg_netdev; }
-- Sergey
On 09/11/2024 00:15, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
This commit introduces basic netlink support with family registration/unregistration functionalities and stub pre/post-doit.
More importantly it introduces the YAML uAPI description along with its auto-generated files:
- include/uapi/linux/ovpn.h
- drivers/net/ovpn/netlink-gen.c
- drivers/net/ovpn/netlink-gen.h
Cc: donald.hunter@gmail.com Signed-off-by: Antonio Quartulli antonio@openvpn.net
[skipped]
diff --git a/Documentation/netlink/specs/ovpn.yaml b/Documentation/netlink/specs/ovpn.yaml
[skipped]
+attribute-sets:
+  -
+    name: peer
+    attributes:
+      -
+        name: id
+        type: u32
+        doc: |
+          The unique ID of the peer. To be used to identify peers during
+          operations
nit: could you specify the scope of uniqueness? I believe it is not globally unique, it is just unique per interface, right?
Yeah it's per interface/instance. Will make it more clear, also for other IDs.
+        checks:
+          max: 0xFFFFFF
[skipped]
diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index 369a5a2b2fc1a497c8444e59f9b058eb40e49524..d5bdb0055f4dd3a6e32dc6e792bed1e7fd59e101 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -7,11 +7,15 @@ * James Yonan james@openvpn.net */ +#include <linux/genetlink.h> #include <linux/module.h> #include <linux/netdevice.h> #include <net/rtnetlink.h> +#include <uapi/linux/ovpn.h> +#include "ovpnstruct.h" #include "main.h" +#include "netlink.h" #include "io.h" /* Driver info */ @@ -37,7 +41,7 @@ static int ovpn_newlink(struct net *src_net, struct net_device *dev, } static struct rtnl_link_ops ovpn_link_ops = { - .kind = "ovpn", + .kind = OVPN_FAMILY_NAME,
nit: are you sure that the link kind is the same as the GENL family? I mean, they are both derived from the protocol name that is common to both entities, but does that make the RTNL kind a derivative of the GENL family?
I just want to use the same name everywhere and I thought it doesn't make sense to create a separate define (they can be decoupled later should we see any need for that). But I can add:
#define OVPN_RTNL_LINK_KIND OVPN_FAMILY_NAME
to make this relationship explicit?
Regards,
On 15.11.2024 12:05, Antonio Quartulli wrote:
On 09/11/2024 00:15, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
@@ -37,7 +41,7 @@ static int ovpn_newlink(struct net *src_net, struct net_device *dev,
 }
 
 static struct rtnl_link_ops ovpn_link_ops = {
-	.kind = "ovpn",
+	.kind = OVPN_FAMILY_NAME,
nit: are you sure that the link kind is the same as the GENL family? I mean, they are both derived from the protocol name that is common to both entities, but does that make the RTNL kind a derivative of the GENL family?
I just want to use the same name everywhere and I thought it doesn't make sense to create a separate define (they can be decoupled later should we see any need for that). But I can add:
#define OVPN_RTNL_LINK_KIND OVPN_FAMILY_NAME
to make this relationship explicit?
Can we just leave it as a literal? This string is going to be part of the ABI and there will be no chance to change it in the future. So, what is the purpose of defining it using a macro if it's self-descriptive?
People also like to define a macro with a generic name like DRV_NAME and use it everywhere. That also looks reasonable.
-- Sergey
On 19/11/2024 03:05, Sergey Ryazanov wrote:
On 15.11.2024 12:05, Antonio Quartulli wrote:
On 09/11/2024 00:15, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
@@ -37,7 +41,7 @@ static int ovpn_newlink(struct net *src_net, struct net_device *dev,
 }
 
 static struct rtnl_link_ops ovpn_link_ops = {
-	.kind = "ovpn",
+	.kind = OVPN_FAMILY_NAME,
nit: are you sure that the link kind is the same as the GENL family? I mean, they are both derived from the protocol name that is common to both entities, but does that make the RTNL kind a derivative of the GENL family?
I just want to use the same name everywhere and I thought it doesn't make sense to create a separate define (they can be decoupled later should we see any need for that). But I can add:
#define OVPN_RTNL_LINK_KIND OVPN_FAMILY_NAME
to make this relationship explicit?
Can we just leave it as a literal? This string is going to be part of the ABI and there will be no chance to change it in the future. So, what is the purpose of defining it using a macro if it's self-descriptive?
I don't truly have a strong opinion, but the netlink family name is also expected to not change anytime soon.
Anyway, I see that using the literal is pretty common across all other drivers, therefore I'll go for it as well.
People also like to define a macro with a generic name like DRV_NAME and use it everywhere. That also looks reasonable.
Yeah, that's exactly how I am using OVPN_FAMILY_NAME. Anyway, I am switching to literal.
Regards,
-- Sergey
On 29.10.2024 12:47, Antonio Quartulli wrote:
This commit introduces basic netlink support with family registration/unregistration functionalities and stub pre/post-doit.
More importantly it introduces the YAML uAPI description along with its auto-generated files:
- include/uapi/linux/ovpn.h
- drivers/net/ovpn/netlink-gen.c
- drivers/net/ovpn/netlink-gen.h
Cc: donald.hunter@gmail.com Signed-off-by: Antonio Quartulli antonio@openvpn.net
[skipped]
diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h
--- /dev/null
+++ b/drivers/net/ovpn/ovpnstruct.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* OpenVPN data channel offload
+ *
+ * Copyright (C) 2019-2024 OpenVPN, Inc.
+ *
+ * Author:	James Yonan <james@openvpn.net>
+ *		Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#ifndef _NET_OVPN_OVPNSTRUCT_H_
+#define _NET_OVPN_OVPNSTRUCT_H_
+
+#include <net/net_trackers.h>
+
+/**
+ * struct ovpn_struct - per ovpn interface state
+ * @dev: the actual netdev representing the tunnel
+ * @dev_tracker: reference tracker for associated dev
+ */
+struct ovpn_struct {
There is no standard convention on how to name such structures, so this question is basically out of curiosity. For me, having a structure named 'struct' is like having no name at all. Did you consider using names such as ovpn_dev or ovpn_iface? Meaning, a name that gives a clue about the scope of the content.
For interface functions, when the pointer is assigned like `ovpn = netdev_priv(dev)`, it is clear what is inside. But for functions like ovpn_peer_get_by_id() it is a bit tricky to quickly realize what this is for.
+	struct net_device *dev;
+	netdevice_tracker dev_tracker;
+};
+
+#endif /* _NET_OVPN_OVPNSTRUCT_H_ */
-- Sergey
On 09/11/2024 00:31, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
This commit introduces basic netlink support with family registration/unregistration functionalities and stub pre/post-doit.
More importantly it introduces the YAML uAPI description along with its auto-generated files:
- include/uapi/linux/ovpn.h
- drivers/net/ovpn/netlink-gen.c
- drivers/net/ovpn/netlink-gen.h
Cc: donald.hunter@gmail.com Signed-off-by: Antonio Quartulli antonio@openvpn.net
[skipped]
diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h
--- /dev/null
+++ b/drivers/net/ovpn/ovpnstruct.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* OpenVPN data channel offload
+ *
+ * Copyright (C) 2019-2024 OpenVPN, Inc.
+ *
+ * Author:	James Yonan <james@openvpn.net>
+ *		Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#ifndef _NET_OVPN_OVPNSTRUCT_H_
+#define _NET_OVPN_OVPNSTRUCT_H_
+
+#include <net/net_trackers.h>
+
+/**
+ * struct ovpn_struct - per ovpn interface state
+ * @dev: the actual netdev representing the tunnel
+ * @dev_tracker: reference tracker for associated dev
+ */
+struct ovpn_struct {
There is no standard convention on how to name such structures, so this question is basically out of curiosity. For me, having a structure named 'struct' is like having no name at all. Did you consider using names such as ovpn_dev or ovpn_iface? Meaning, a name that gives a clue about the scope of the content.
Yes, I wanted to switch to ovpn_priv, but did not care much for the time being :)
I can still do it now in v12.
Thanks! Regards,
On 15.11.2024 12:19, Antonio Quartulli wrote:
On 09/11/2024 00:31, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
+/**
+ * struct ovpn_struct - per ovpn interface state
+ * @dev: the actual netdev representing the tunnel
+ * @dev_tracker: reference tracker for associated dev
+ */
+struct ovpn_struct {
There is no standard convention on how to name such structures, so this question is basically out of curiosity. For me, having a structure named 'struct' is like having no name at all. Did you consider using names such as ovpn_dev or ovpn_iface? Meaning, a name that gives a clue about the scope of the content.
Yes, I wanted to switch to ovpn_priv, but did not care much for the time being :)
I can still do it now in v12.
This topic caused me the biggest doubts. I don't want to ask to rename everything on the final lap; I just want to share an outside perspective on the structure name and let you decide whether it is worth it or not.
And if you ask me, ovpn_priv does not give a clue either. The module is too complex for a vague structure name, even after your great work on cleaning up its design.
-- Sergey
On 19/11/2024 03:23, Sergey Ryazanov wrote:
On 15.11.2024 12:19, Antonio Quartulli wrote:
On 09/11/2024 00:31, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
+/**
+ * struct ovpn_struct - per ovpn interface state
+ * @dev: the actual netdev representing the tunnel
+ * @dev_tracker: reference tracker for associated dev
+ */
+struct ovpn_struct {
There is no standard convention on how to name such structures, so this question is basically out of curiosity. For me, having a structure named 'struct' is like having no name at all. Did you consider using names such as ovpn_dev or ovpn_iface? Meaning, a name that gives a clue about the scope of the content.
Yes, I wanted to switch to ovpn_priv, but did not care much for the time being :)
I can still do it now in v12.
This topic caused me the biggest doubts. I don't want to ask to rename everything on the final lap; I just want to share an outside perspective on the structure name and let you decide whether it is worth it or not.
And if you ask me, ovpn_priv does not give a clue either. The module is too complex for a vague structure name, even after your great work on cleaning up its design.
Well, the word "priv" to me resembles the "netdev_priv()" call, so it's kinda easier to grasp what this is about. In batman-adv we used the same suffix and it was well received. Also, if you grep for "_priv " in drivers/net you will see that this is a common pattern.
Since I already had in mind to change this struct name, I moved on and renamed it to ovpn_priv throughout the patchset (git rebase --exec is my friend ;)).
Thanks
Regards,
-- Sergey
Add basic infrastructure for handling ovpn interfaces.
Signed-off-by: Antonio Quartulli antonio@openvpn.net --- drivers/net/ovpn/main.c | 115 ++++++++++++++++++++++++++++++++++++++++-- drivers/net/ovpn/main.h | 7 +++ drivers/net/ovpn/ovpnstruct.h | 8 +++ drivers/net/ovpn/packet.h | 40 +++++++++++++++ include/uapi/linux/if_link.h | 15 ++++++ 5 files changed, 180 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index d5bdb0055f4dd3a6e32dc6e792bed1e7fd59e101..eead7677b8239eb3c48bb26ca95492d88512b8d4 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -10,18 +10,52 @@ #include <linux/genetlink.h> #include <linux/module.h> #include <linux/netdevice.h> +#include <linux/inetdevice.h> +#include <net/ip.h> #include <net/rtnetlink.h> -#include <uapi/linux/ovpn.h> +#include <uapi/linux/if_arp.h>
#include "ovpnstruct.h" #include "main.h" #include "netlink.h" #include "io.h" +#include "packet.h"
/* Driver info */ #define DRV_DESCRIPTION "OpenVPN data channel offload (ovpn)" #define DRV_COPYRIGHT "(C) 2020-2024 OpenVPN, Inc."
+static void ovpn_struct_free(struct net_device *net) +{ +} + +static int ovpn_net_open(struct net_device *dev) +{ + netif_tx_start_all_queues(dev); + return 0; +} + +static int ovpn_net_stop(struct net_device *dev) +{ + netif_tx_stop_all_queues(dev); + return 0; +} + +static const struct net_device_ops ovpn_netdev_ops = { + .ndo_open = ovpn_net_open, + .ndo_stop = ovpn_net_stop, + .ndo_start_xmit = ovpn_net_xmit, +}; + +static const struct device_type ovpn_type = { + .name = OVPN_FAMILY_NAME, +}; + +static const struct nla_policy ovpn_policy[IFLA_OVPN_MAX + 1] = { + [IFLA_OVPN_MODE] = NLA_POLICY_RANGE(NLA_U8, OVPN_MODE_P2P, + OVPN_MODE_MP), +}; + /** * ovpn_dev_is_valid - check if the netdevice is of type 'ovpn' * @dev: the interface to check @@ -33,16 +67,76 @@ bool ovpn_dev_is_valid(const struct net_device *dev) return dev->netdev_ops->ndo_start_xmit == ovpn_net_xmit; }
+static void ovpn_setup(struct net_device *dev) +{ + /* compute the overhead considering AEAD encryption */ + const int overhead = sizeof(u32) + NONCE_WIRE_SIZE + 16 + + sizeof(struct udphdr) + + max(sizeof(struct ipv6hdr), sizeof(struct iphdr)); + + netdev_features_t feat = NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_RXCSUM | + NETIF_F_GSO | NETIF_F_GSO_SOFTWARE | + NETIF_F_HIGHDMA; + + dev->needs_free_netdev = true; + + dev->pcpu_stat_type = NETDEV_PCPU_STAT_TSTATS; + + dev->netdev_ops = &ovpn_netdev_ops; + + dev->priv_destructor = ovpn_struct_free; + + dev->hard_header_len = 0; + dev->addr_len = 0; + dev->mtu = ETH_DATA_LEN - overhead; + dev->min_mtu = IPV4_MIN_MTU; + dev->max_mtu = IP_MAX_MTU - overhead; + + dev->type = ARPHRD_NONE; + dev->flags = IFF_POINTOPOINT | IFF_NOARP; + dev->priv_flags |= IFF_NO_QUEUE; + + dev->lltx = true; + dev->features |= feat; + dev->hw_features |= feat; + dev->hw_enc_features |= feat; + + dev->needed_headroom = OVPN_HEAD_ROOM; + dev->needed_tailroom = OVPN_MAX_PADDING; + + SET_NETDEV_DEVTYPE(dev, &ovpn_type); +} + static int ovpn_newlink(struct net *src_net, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { - return -EOPNOTSUPP; + struct ovpn_struct *ovpn = netdev_priv(dev); + enum ovpn_mode mode = OVPN_MODE_P2P; + + if (data && data[IFLA_OVPN_MODE]) { + mode = nla_get_u8(data[IFLA_OVPN_MODE]); + netdev_dbg(dev, "setting device mode: %u\n", mode); + } + + ovpn->dev = dev; + ovpn->mode = mode; + + /* turn carrier explicitly off after registration, this way state is + * clearly defined + */ + netif_carrier_off(dev); + + return register_netdevice(dev); }
static struct rtnl_link_ops ovpn_link_ops = { .kind = OVPN_FAMILY_NAME, .netns_refund = false, + .priv_size = sizeof(struct ovpn_struct), + .setup = ovpn_setup, + .policy = ovpn_policy, + .maxtype = IFLA_OVPN_MAX, .newlink = ovpn_newlink, .dellink = unregister_netdevice_queue, }; @@ -51,26 +145,37 @@ static int ovpn_netdev_notifier_call(struct notifier_block *nb, unsigned long state, void *ptr) { struct net_device *dev = netdev_notifier_info_to_dev(ptr); + struct ovpn_struct *ovpn;
if (!ovpn_dev_is_valid(dev)) return NOTIFY_DONE;
+ ovpn = netdev_priv(dev); + switch (state) { case NETDEV_REGISTER: - /* add device to internal list for later destruction upon - * unregistration - */ + ovpn->registered = true; break; case NETDEV_UNREGISTER: + /* twiddle thumbs on netns device moves */ + if (dev->reg_state != NETREG_UNREGISTERING) + break; + /* can be delivered multiple times, so check registered flag, * then destroy the interface */ + if (!ovpn->registered) + return NOTIFY_DONE; + + netif_carrier_off(dev); + ovpn->registered = false; break; case NETDEV_POST_INIT: case NETDEV_GOING_DOWN: case NETDEV_DOWN: case NETDEV_UP: case NETDEV_PRE_UP: + break; default: return NOTIFY_DONE; } diff --git a/drivers/net/ovpn/main.h b/drivers/net/ovpn/main.h index a3215316c49bfcdf2496590bac878f145b8b27fd..0740a05070a817e0daea7b63a1f4fcebd274eb37 100644 --- a/drivers/net/ovpn/main.h +++ b/drivers/net/ovpn/main.h @@ -12,4 +12,11 @@
bool ovpn_dev_is_valid(const struct net_device *dev);
+#define SKB_HEADER_LEN \ + (max(sizeof(struct iphdr), sizeof(struct ipv6hdr)) + \ + sizeof(struct udphdr) + NET_SKB_PAD) + +#define OVPN_HEAD_ROOM ALIGN(16 + SKB_HEADER_LEN, 4) +#define OVPN_MAX_PADDING 16 + #endif /* _NET_OVPN_MAIN_H_ */ diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h index e3e4df6418b081436378fc51d98db5bd7b5d1fbe..211df871538d34fdff90d182f21a0b0fb11b28ad 100644 --- a/drivers/net/ovpn/ovpnstruct.h +++ b/drivers/net/ovpn/ovpnstruct.h @@ -11,15 +11,23 @@ #define _NET_OVPN_OVPNSTRUCT_H_
#include <net/net_trackers.h> +#include <uapi/linux/if_link.h> +#include <uapi/linux/ovpn.h>
/** * struct ovpn_struct - per ovpn interface state * @dev: the actual netdev representing the tunnel * @dev_tracker: reference tracker for associated dev + * @registered: whether dev is still registered with netdev or not + * @mode: device operation mode (i.e. p2p, mp, ..) + * @dev_list: entry for the module wide device list */ struct ovpn_struct { struct net_device *dev; netdevice_tracker dev_tracker; + bool registered; + enum ovpn_mode mode; + struct list_head dev_list; };
#endif /* _NET_OVPN_OVPNSTRUCT_H_ */ diff --git a/drivers/net/ovpn/packet.h b/drivers/net/ovpn/packet.h new file mode 100644 index 0000000000000000000000000000000000000000..7ed146f5932a25f448af6da58738a7eae81007fe --- /dev/null +++ b/drivers/net/ovpn/packet.h @@ -0,0 +1,40 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* OpenVPN data channel offload + * + * Copyright (C) 2020-2024 OpenVPN, Inc. + * + * Author: Antonio Quartulli antonio@openvpn.net + * James Yonan james@openvpn.net + */ + +#ifndef _NET_OVPN_PACKET_H_ +#define _NET_OVPN_PACKET_H_ + +/* When the OpenVPN protocol is ran in AEAD mode, use + * the OpenVPN packet ID as the AEAD nonce: + * + * 00000005 521c3b01 4308c041 + * [seq # ] [ nonce_tail ] + * [ 12-byte full IV ] -> NONCE_SIZE + * [4-bytes -> NONCE_WIRE_SIZE + * on wire] + */ + +/* OpenVPN nonce size */ +#define NONCE_SIZE 12 + +/* OpenVPN nonce size reduced by 8-byte nonce tail -- this is the + * size of the AEAD Associated Data (AD) sent over the wire + * and is normally the head of the IV + */ +#define NONCE_WIRE_SIZE (NONCE_SIZE - sizeof(struct ovpn_nonce_tail)) + +/* Last 8 bytes of AEAD nonce + * Provided by userspace and usually derived from + * key material generated during TLS handshake + */ +struct ovpn_nonce_tail { + u8 u8[OVPN_NONCE_TAIL_SIZE]; +}; + +#endif /* _NET_OVPN_PACKET_H_ */ diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 8516c1ccd57a7c7634a538fe3ac16c858f647420..84d294aab20b79b8e9cb9b736a074105c99338f3 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -1975,4 +1975,19 @@ enum {
#define IFLA_DSA_MAX (__IFLA_DSA_MAX - 1)
+/* OVPN section */ + +enum ovpn_mode { + OVPN_MODE_P2P, + OVPN_MODE_MP, +}; + +enum { + IFLA_OVPN_UNSPEC, + IFLA_OVPN_MODE, + __IFLA_OVPN_MAX, +}; + +#define IFLA_OVPN_MAX (__IFLA_OVPN_MAX - 1) + #endif /* _UAPI_LINUX_IF_LINK_H */
On 29.10.2024 12:47, Antonio Quartulli wrote:
Add basic infrastructure for handling ovpn interfaces.
Signed-off-by: Antonio Quartulli antonio@openvpn.net
drivers/net/ovpn/main.c | 115 ++++++++++++++++++++++++++++++++++++++++-- drivers/net/ovpn/main.h | 7 +++ drivers/net/ovpn/ovpnstruct.h | 8 +++ drivers/net/ovpn/packet.h | 40 +++++++++++++++ include/uapi/linux/if_link.h | 15 ++++++ 5 files changed, 180 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index d5bdb0055f4dd3a6e32dc6e792bed1e7fd59e101..eead7677b8239eb3c48bb26ca95492d88512b8d4 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -10,18 +10,52 @@ #include <linux/genetlink.h> #include <linux/module.h> #include <linux/netdevice.h> +#include <linux/inetdevice.h> +#include <net/ip.h> #include <net/rtnetlink.h> -#include <uapi/linux/ovpn.h> +#include <uapi/linux/if_arp.h> #include "ovpnstruct.h" #include "main.h" #include "netlink.h" #include "io.h" +#include "packet.h" /* Driver info */ #define DRV_DESCRIPTION "OpenVPN data channel offload (ovpn)" #define DRV_COPYRIGHT "(C) 2020-2024 OpenVPN, Inc." +static void ovpn_struct_free(struct net_device *net) +{ +}
nit: since this handler is not mandatory, its introduction can be moved to the later patch, which actually fills it with meaningful operations.
+static int ovpn_net_open(struct net_device *dev) +{
- netif_tx_start_all_queues(dev);
- return 0;
+}
+static int ovpn_net_stop(struct net_device *dev) +{
- netif_tx_stop_all_queues(dev);
- return 0;
+}
+static const struct net_device_ops ovpn_netdev_ops = {
- .ndo_open = ovpn_net_open,
- .ndo_stop = ovpn_net_stop,
- .ndo_start_xmit = ovpn_net_xmit,
+};
+static const struct device_type ovpn_type = {
- .name = OVPN_FAMILY_NAME,
nit: same question here regarding name derivation. Are you sure that the device type name is the same as the GENL family name?
+};
+static const struct nla_policy ovpn_policy[IFLA_OVPN_MAX + 1] = {
- [IFLA_OVPN_MODE] = NLA_POLICY_RANGE(NLA_U8, OVPN_MODE_P2P,
OVPN_MODE_MP),
+};
- /**
- ovpn_dev_is_valid - check if the netdevice is of type 'ovpn'
- @dev: the interface to check
@@ -33,16 +67,76 @@ bool ovpn_dev_is_valid(const struct net_device *dev) return dev->netdev_ops->ndo_start_xmit == ovpn_net_xmit; } +static void ovpn_setup(struct net_device *dev) +{
- /* compute the overhead considering AEAD encryption */
- const int overhead = sizeof(u32) + NONCE_WIRE_SIZE + 16 +
Where do these magic sizeof(u32) and '16' values come from?
sizeof(struct udphdr) +
max(sizeof(struct ipv6hdr), sizeof(struct iphdr));
- netdev_features_t feat = NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_RXCSUM |
NETIF_F_GSO | NETIF_F_GSO_SOFTWARE |
NETIF_F_HIGHDMA;
- dev->needs_free_netdev = true;
- dev->pcpu_stat_type = NETDEV_PCPU_STAT_TSTATS;
- dev->netdev_ops = &ovpn_netdev_ops;
- dev->priv_destructor = ovpn_struct_free;
- dev->hard_header_len = 0;
- dev->addr_len = 0;
- dev->mtu = ETH_DATA_LEN - overhead;
- dev->min_mtu = IPV4_MIN_MTU;
- dev->max_mtu = IP_MAX_MTU - overhead;
- dev->type = ARPHRD_NONE;
- dev->flags = IFF_POINTOPOINT | IFF_NOARP;
- dev->priv_flags |= IFF_NO_QUEUE;
- dev->lltx = true;
- dev->features |= feat;
- dev->hw_features |= feat;
- dev->hw_enc_features |= feat;
- dev->needed_headroom = OVPN_HEAD_ROOM;
- dev->needed_tailroom = OVPN_MAX_PADDING;
- SET_NETDEV_DEVTYPE(dev, &ovpn_type);
+}
- static int ovpn_newlink(struct net *src_net, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) {
- return -EOPNOTSUPP;
- struct ovpn_struct *ovpn = netdev_priv(dev);
- enum ovpn_mode mode = OVPN_MODE_P2P;
- if (data && data[IFLA_OVPN_MODE]) {
mode = nla_get_u8(data[IFLA_OVPN_MODE]);
netdev_dbg(dev, "setting device mode: %u\n", mode);
- }
- ovpn->dev = dev;
- ovpn->mode = mode;
- /* turn carrier explicitly off after registration, this way state is
* clearly defined
*/
- netif_carrier_off(dev);
- return register_netdevice(dev); }
static struct rtnl_link_ops ovpn_link_ops = { .kind = OVPN_FAMILY_NAME, .netns_refund = false,
- .priv_size = sizeof(struct ovpn_struct),
- .setup = ovpn_setup,
- .policy = ovpn_policy,
- .maxtype = IFLA_OVPN_MAX, .newlink = ovpn_newlink, .dellink = unregister_netdevice_queue, };
@@ -51,26 +145,37 @@ static int ovpn_netdev_notifier_call(struct notifier_block *nb, unsigned long state, void *ptr) { struct net_device *dev = netdev_notifier_info_to_dev(ptr);
- struct ovpn_struct *ovpn;
if (!ovpn_dev_is_valid(dev)) return NOTIFY_DONE;
- ovpn = netdev_priv(dev);
nit: netdev_priv() only returns a pointer, so it is safe to fetch the pointer in advance as long as it is not dereferenced until we are sure the event references the desired interface type. Taking this into consideration, the assignment of the private data pointer can be moved up to the variable declaration, just to make the code a couple of lines shorter.
 	switch (state) {
 	case NETDEV_REGISTER:
-		/* add device to internal list for later destruction upon
-		 * unregistration
-		 */
+		ovpn->registered = true;
 		break;
 	case NETDEV_UNREGISTER:
+		/* twiddle thumbs on netns device moves */
+		if (dev->reg_state != NETREG_UNREGISTERING)
+			break;
+
 		/* can be delivered multiple times, so check registered flag,
 		 * then destroy the interface
 		 */
+		if (!ovpn->registered)
+			return NOTIFY_DONE;
+
+		netif_carrier_off(dev);
+		ovpn->registered = false;
 		break;
 	case NETDEV_POST_INIT:
 	case NETDEV_GOING_DOWN:
 	case NETDEV_DOWN:
 	case NETDEV_UP:
 	case NETDEV_PRE_UP:
+		break;
 	default:
 		return NOTIFY_DONE;
 	}
diff --git a/drivers/net/ovpn/main.h b/drivers/net/ovpn/main.h index a3215316c49bfcdf2496590bac878f145b8b27fd..0740a05070a817e0daea7b63a1f4fcebd274eb37 100644 --- a/drivers/net/ovpn/main.h +++ b/drivers/net/ovpn/main.h @@ -12,4 +12,11 @@ bool ovpn_dev_is_valid(const struct net_device *dev); +#define SKB_HEADER_LEN \
- (max(sizeof(struct iphdr), sizeof(struct ipv6hdr)) + \
sizeof(struct udphdr) + NET_SKB_PAD)
+#define OVPN_HEAD_ROOM ALIGN(16 + SKB_HEADER_LEN, 4)
Where does this magic '16' come from?
+#define OVPN_MAX_PADDING 16
- #endif /* _NET_OVPN_MAIN_H_ */
diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h index e3e4df6418b081436378fc51d98db5bd7b5d1fbe..211df871538d34fdff90d182f21a0b0fb11b28ad 100644 --- a/drivers/net/ovpn/ovpnstruct.h +++ b/drivers/net/ovpn/ovpnstruct.h @@ -11,15 +11,23 @@ #define _NET_OVPN_OVPNSTRUCT_H_ #include <net/net_trackers.h> +#include <uapi/linux/if_link.h> +#include <uapi/linux/ovpn.h> /**
- struct ovpn_struct - per ovpn interface state
- @dev: the actual netdev representing the tunnel
- @dev_tracker: reference tracker for associated dev
- @registered: whether dev is still registered with netdev or not
- @mode: device operation mode (i.e. p2p, mp, ..)
*/ struct ovpn_struct { struct net_device *dev; netdevice_tracker dev_tracker;
- @dev_list: entry for the module wide device list
- bool registered;
- enum ovpn_mode mode;
- struct list_head dev_list;
dev_list is no longer used and should be deleted.
}; #endif /* _NET_OVPN_OVPNSTRUCT_H_ */ diff --git a/drivers/net/ovpn/packet.h b/drivers/net/ovpn/packet.h new file mode 100644 index 0000000000000000000000000000000000000000..7ed146f5932a25f448af6da58738a7eae81007fe --- /dev/null +++ b/drivers/net/ovpn/packet.h @@ -0,0 +1,40 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* OpenVPN data channel offload
- Copyright (C) 2020-2024 OpenVPN, Inc.
- Author: Antonio Quartulli antonio@openvpn.net
James Yonan <james@openvpn.net>
- */
+#ifndef _NET_OVPN_PACKET_H_ +#define _NET_OVPN_PACKET_H_
+/* When the OpenVPN protocol is ran in AEAD mode, use
- the OpenVPN packet ID as the AEAD nonce:
- 00000005 521c3b01 4308c041
- [seq # ] [ nonce_tail ]
- [ 12-byte full IV ] -> NONCE_SIZE
- [4-bytes -> NONCE_WIRE_SIZE
- on wire]
- */
Nice diagram! Can we go further and define the OpenVPN packet header as a structure? Referencing the structure instead of using magic sizes and offsets can greatly improve code readability, especially when it comes to header construction/parsing in the encryption/decryption code.
E.g. define a structures like this:
struct ovpn_pkt_hdr { __be32 op; __be32 pktid; u8 auth[]; } __attribute__((packed));
struct ovpn_aead_iv { __be32 pktid; u8 nonce[OVPN_NONCE_TAIL_SIZE]; } __attribute__((packed));
+/* OpenVPN nonce size */ +#define NONCE_SIZE 12
nit: would using the common 'OVPN_' prefix here and for the other constants be a good idea? E.g. OVPN_NONCE_SIZE. It gives a code reader a hint about where the constant comes from.
And one more question. Could you clarify in the comment where this constant comes from? AFAIU, these 12 bytes are the nonce size expected by the AEAD crypto protocol, right?
+/* OpenVPN nonce size reduced by 8-byte nonce tail -- this is the
- size of the AEAD Associated Data (AD) sent over the wire
- and is normally the head of the IV
- */
+#define NONCE_WIRE_SIZE (NONCE_SIZE - sizeof(struct ovpn_nonce_tail))
If the headers and IV are defined as structures, we no longer need this constant, since the header construction will be done by the compiler according to the structure layout.
+/* Last 8 bytes of AEAD nonce
- Provided by userspace and usually derived from
- key material generated during TLS handshake
- */
+struct ovpn_nonce_tail {
- u8 u8[OVPN_NONCE_TAIL_SIZE];
+};
Why do you need a dedicated structure for this array? Can we declare the corresponding fields like this:
u8 nonce_tail_xmit[OVPN_NONCE_TAIL_SIZE]; u8 nonce_tail_recv[OVPN_NONCE_TAIL_SIZE];
+#endif /* _NET_OVPN_PACKET_H_ */ diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 8516c1ccd57a7c7634a538fe3ac16c858f647420..84d294aab20b79b8e9cb9b736a074105c99338f3 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -1975,4 +1975,19 @@ enum { #define IFLA_DSA_MAX (__IFLA_DSA_MAX - 1) +/* OVPN section */
+enum ovpn_mode {
- OVPN_MODE_P2P,
- OVPN_MODE_MP,
+};
Mode min/max values can be defined here and the netlink policy can reference these values:
enum ovpn_mode { OVPN_MODE_P2P, OVPN_MODE_MP, __OVPN_MODE_MAX };
#define OVPN_MODE_MIN OVPN_MODE_P2P #define OVPN_MODE_MAX (__OVPN_MODE_MAX - 1)
... = NLA_POLICY_RANGE(NLA_U8, OVPN_MODE_MIN, OVPN_MODE_MAX)
+enum {
- IFLA_OVPN_UNSPEC,
- IFLA_OVPN_MODE,
- __IFLA_OVPN_MAX,
+};
+#define IFLA_OVPN_MAX (__IFLA_OVPN_MAX - 1)
- #endif /* _UAPI_LINUX_IF_LINK_H */
-- Sergey
2024-11-09, 03:01:21 +0200, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
+/* When the OpenVPN protocol is ran in AEAD mode, use
- the OpenVPN packet ID as the AEAD nonce:
- 00000005 521c3b01 4308c041
- [seq # ] [ nonce_tail ]
- [ 12-byte full IV ] -> NONCE_SIZE
- [4-bytes -> NONCE_WIRE_SIZE
- on wire]
- */
Nice diagram! Can we go further and define the OpenVPN packet header as a structure? Referencing the structure instead of using magic sizes and offsets can greatly improve code readability, especially when it comes to header construction/parsing in the encryption/decryption code.
E.g. define a structures like this:
struct ovpn_pkt_hdr { __be32 op; __be32 pktid; u8 auth[]; } __attribute__((packed));
struct ovpn_aead_iv { __be32 pktid; u8 nonce[OVPN_NONCE_TAIL_SIZE]; } __attribute__((packed));
__attribute__((packed)) should not be needed here as the fields in both structs look properly aligned, and IIRC using packed can cause the compiler to generate worse code.
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 8516c1ccd57a7c7634a538fe3ac16c858f647420..84d294aab20b79b8e9cb9b736a074105c99338f3 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -1975,4 +1975,19 @@ enum { #define IFLA_DSA_MAX (__IFLA_DSA_MAX - 1) +/* OVPN section */
+enum ovpn_mode {
- OVPN_MODE_P2P,
- OVPN_MODE_MP,
+};
Mode min/max values can be defined here and the netlink policy can reference these values:
enum ovpn_mode { OVPN_MODE_P2P, OVPN_MODE_MP, __OVPN_MODE_MAX };
#define OVPN_MODE_MIN OVPN_MODE_P2P #define OVPN_MODE_MAX (__OVPN_MODE_MAX - 1)
... = NLA_POLICY_RANGE(NLA_U8, OVPN_MODE_MIN, OVPN_MODE_MAX)
I don't think there's much benefit to that, other than making the diff smaller on a (very unlikely) patch that would add a new mode in the future. It even looks more inconvenient to me when reading the code ("ok what are _MIN and _MAX? the code is using _P2P and _MP, do they match?").
On 12.11.2024 18:47, Sabrina Dubroca wrote:
2024-11-09, 03:01:21 +0200, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
+/* When the OpenVPN protocol is ran in AEAD mode, use
- the OpenVPN packet ID as the AEAD nonce:
- 00000005 521c3b01 4308c041
- [seq # ] [ nonce_tail ]
- [ 12-byte full IV ] -> NONCE_SIZE
- [4-bytes -> NONCE_WIRE_SIZE
- on wire]
- */
Nice diagram! Can we go further and define the OpenVPN packet header as a structure? Referencing the structure instead of using magic sizes and offsets can greatly improve code readability, especially when it comes to header construction/parsing in the encryption/decryption code.
E.g. define a structures like this:
struct ovpn_pkt_hdr { __be32 op; __be32 pktid; u8 auth[]; } __attribute__((packed));
struct ovpn_aead_iv { __be32 pktid; u8 nonce[OVPN_NONCE_TAIL_SIZE]; } __attribute__((packed));
__attribute__((packed)) should not be needed here as the fields in both structs look properly aligned, and IIRC using packed can cause the compiler to generate worse code.
True, the fields are pretty well aligned, and from a code generation perspective the packed attribute is unneeded. I suggested marking the structs as packed mostly as documentation, to clearly state that these structures represent a specific memory layout.
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 8516c1ccd57a7c7634a538fe3ac16c858f647420..84d294aab20b79b8e9cb9b736a074105c99338f3 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -1975,4 +1975,19 @@ enum { #define IFLA_DSA_MAX (__IFLA_DSA_MAX - 1) +/* OVPN section */
+enum ovpn_mode {
- OVPN_MODE_P2P,
- OVPN_MODE_MP,
+};
Mode min/max values can be defined here and the netlink policy can reference these values:
enum ovpn_mode { OVPN_MODE_P2P, OVPN_MODE_MP, __OVPN_MODE_MAX };
#define OVPN_MODE_MIN OVPN_MODE_P2P #define OVPN_MODE_MAX (__OVPN_MODE_MAX - 1)
... = NLA_POLICY_RANGE(NLA_U8, OVPN_MODE_MIN, OVPN_MODE_MAX)
I don't think there's much benefit to that, other than making the diff smaller on a (very unlikely) patch that would add a new mode in the future. It even looks more inconvenient to me when reading the code ("ok what are _MIN and _MAX? the code is using _P2P and _MP, do they match?").
I would answer yes. I just prefer to trust these kinds of statements until something crashes badly. Honestly, I never thought that referring to a max value might raise such a question. Can you give an example of why it would be meaningful to know the exact min/max values of an unordered set?
I indeed suggested defining the boundaries for documentation purposes. Diff reduction is also desirable, but as you already mentioned, that is not the case here. Using specific values in a range declaration assigns them extra semantics: MODE_P2P is also the minimum possible value, while MODE_MP carries the extra meaning of the maximum possible value. And we can only learn this from the policy, which is specified far away from the mode declarations. I also see the policy declaration as referring to already defined information rather than creating new meanings. On the other hand, the NL policy is the only user, so maybe we should leave it as-is for the sake of simplicity.
-- Sergey
On 12/11/2024 17:47, Sabrina Dubroca wrote:
2024-11-09, 03:01:21 +0200, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
+/* When the OpenVPN protocol is ran in AEAD mode, use
- the OpenVPN packet ID as the AEAD nonce:
- 00000005 521c3b01 4308c041
- [seq # ] [ nonce_tail ]
- [ 12-byte full IV ] -> NONCE_SIZE
- [4-bytes -> NONCE_WIRE_SIZE
- on wire]
- */
Nice diagram! Can we go further and define the OpenVPN packet header as a structure? Referencing the structure instead of using magic sizes and offsets can greatly improve code readability, especially when it comes to header construction/parsing in the encryption/decryption code.
E.g. define a structures like this:
struct ovpn_pkt_hdr { __be32 op; __be32 pktid; u8 auth[]; } __attribute__((packed));
struct ovpn_aead_iv { __be32 pktid; u8 nonce[OVPN_NONCE_TAIL_SIZE]; } __attribute__((packed));
__attribute__((packed)) should not be needed here as the fields in both structs look properly aligned, and IIRC using packed can cause the compiler to generate worse code.
Agreed. Using packed will make certain architectures read every field byte by byte (I remember David M. biting us on this in batman-adv :))
This said, I like the idea of using a struct, but I don't feel confident enough to change the code now that we are hitting v12. This kind of change will be better implemented later and tested carefully. (and patches are always welcome! :))
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 8516c1ccd57a7c7634a538fe3ac16c858f647420..84d294aab20b79b8e9cb9b736a074105c99338f3 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -1975,4 +1975,19 @@ enum { #define IFLA_DSA_MAX (__IFLA_DSA_MAX - 1) +/* OVPN section */
+enum ovpn_mode {
- OVPN_MODE_P2P,
- OVPN_MODE_MP,
+};
Mode min/max values can be defined here and the netlink policy can reference these values:
enum ovpn_mode { OVPN_MODE_P2P, OVPN_MODE_MP, __OVPN_MODE_MAX };
#define OVPN_MODE_MIN OVPN_MODE_P2P #define OVPN_MODE_MAX (__OVPN_MODE_MAX - 1)
... = NLA_POLICY_RANGE(NLA_U8, OVPN_MODE_MIN, OVPN_MODE_MAX)
I don't think there's much benefit to that, other than making the diff smaller on a (very unlikely) patch that would add a new mode in the future. It even looks more inconvenient to me when reading the code ("ok what are _MIN and _MAX? the code is using _P2P and _MP, do they match?").
I agree with Sabrina here. I also initially thought about having MIN/MAX, but it wouldn't make things simpler for the ovpn_mode.
Regards,
On 14.11.2024 10:07, Antonio Quartulli wrote:
On 12/11/2024 17:47, Sabrina Dubroca wrote:
2024-11-09, 03:01:21 +0200, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
+/* When the OpenVPN protocol is ran in AEAD mode, use
- the OpenVPN packet ID as the AEAD nonce:
- * 00000005 521c3b01 4308c041
- * [seq # ] [ nonce_tail ]
- * [ 12-byte full IV ] -> NONCE_SIZE
- * [4-bytes -> NONCE_WIRE_SIZE
- * on wire]
- */
Nice diagram! Can we go further and define the OpenVPN packet header as a structure? Referencing the structure instead of using magic sizes and offsets can greatly improve code readability, especially when it comes to header construction/parsing in the encryption/decryption code.
E.g. define a structures like this:
struct ovpn_pkt_hdr { __be32 op; __be32 pktid; u8 auth[]; } __attribute__((packed));
struct ovpn_aead_iv { __be32 pktid; u8 nonce[OVPN_NONCE_TAIL_SIZE]; } __attribute__((packed));
__attribute__((packed)) should not be needed here as the fields in both structs look properly aligned, and IIRC using packed can cause the compiler to generate worse code.
Agreed. Using packed will make certain architectures read every field byte by byte (I remember David M. biting us on this in batman-adv :))
Still curious to see an example of that strange architecture/compiler combination. Anyway, as Sabrina mentioned, the header is already pretty aligned. So it's up to you how to document the structure.
This said, I like the idea of using a struct, but I don't feel confident enough to change the code now that we are hitting v12. This kind of change will be better implemented later and tested carefully. (and patches are always welcome! :))
The main reason behind introducing the structures is to improve code readability and to reduce the shadowy corners where bugs can hide. I wonder how many people have invested their time digging through the encryption preparation function?
As for the risk of breaking something, I should say that it can be addressed by testing the kernel implementation against the pure userspace implementation, which can be considered the reference. And I believe it is worth the benefit of merging easy-to-understand code.
-- Sergey
On 14/11/2024 23:57, Sergey Ryazanov wrote:
On 14.11.2024 10:07, Antonio Quartulli wrote:
On 12/11/2024 17:47, Sabrina Dubroca wrote:
2024-11-09, 03:01:21 +0200, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
+/* When the OpenVPN protocol is ran in AEAD mode, use
- the OpenVPN packet ID as the AEAD nonce:
- * 00000005 521c3b01 4308c041
- * [seq # ] [ nonce_tail ]
- * [ 12-byte full IV ] -> NONCE_SIZE
- * [4-bytes -> NONCE_WIRE_SIZE
- * on wire]
- */
Nice diagram! Can we go further and define the OpenVPN packet header as a structure? Referencing the structure instead of using magic sizes and offsets can greatly improve code readability, especially when it comes to header construction/parsing in the encryption/decryption code.
E.g. define a structures like this:
struct ovpn_pkt_hdr { __be32 op; __be32 pktid; u8 auth[]; } __attribute__((packed));
struct ovpn_aead_iv { __be32 pktid; u8 nonce[OVPN_NONCE_TAIL_SIZE]; } __attribute__((packed));
__attribute__((packed)) should not be needed here as the fields in both structs look properly aligned, and IIRC using packed can cause the compiler to generate worse code.
Agreed. Using packed will make certain architectures read every field byte by byte (I remember David M. biting us on this in batman-adv :))
Still curious to see an example of that strange architecture/compiler combination. Anyway, as Sabrina mentioned, the header is already pretty aligned. So it's up to you how to document the structure.
IIRC MIPS was one of those, but don't take my word for granted.
This said, I like the idea of using a struct, but I don't feel confident enough to change the code now that we are hitting v12. This kind of change will be better implemented later and tested carefully. (and patches are always welcome! :))
The main reason behind introducing the structures is to improve code readability and to reduce the shadowy corners where bugs can hide. I wonder how many people have invested their time digging through the encryption preparation function?
As for the risk of breaking something, I should say that it can be addressed by testing the kernel implementation against the pure userspace implementation, which can be considered the reference. And I believe it is worth the benefit of merging easy-to-understand code.
I understand your point, but this is something I need to spend time on because the openvpn packet format is not "very stable", as in "it can vary depending on negotiated features".
When implementing ovpn I decided what was the supported set of features so to create a stable packet header, but this may change moving forward (there is already some work going on in userspace regarding new features that ovpn will have to support). Therefore I want to take some time thinking about what's best.
Regards,
On 09/11/2024 02:01, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
Add basic infrastructure for handling ovpn interfaces.
Signed-off-by: Antonio Quartulli antonio@openvpn.net
drivers/net/ovpn/main.c | 115 ++++++++++++++++++++++++++++++++ ++++++++-- drivers/net/ovpn/main.h | 7 +++ drivers/net/ovpn/ovpnstruct.h | 8 +++ drivers/net/ovpn/packet.h | 40 +++++++++++++++ include/uapi/linux/if_link.h | 15 ++++++ 5 files changed, 180 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index d5bdb0055f4dd3a6e32dc6e792bed1e7fd59e101..eead7677b8239eb3c48bb26ca95492d88512b8d4 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -10,18 +10,52 @@ #include <linux/genetlink.h> #include <linux/module.h> #include <linux/netdevice.h> +#include <linux/inetdevice.h> +#include <net/ip.h> #include <net/rtnetlink.h> -#include <uapi/linux/ovpn.h> +#include <uapi/linux/if_arp.h> #include "ovpnstruct.h" #include "main.h" #include "netlink.h" #include "io.h" +#include "packet.h" /* Driver info */ #define DRV_DESCRIPTION "OpenVPN data channel offload (ovpn)" #define DRV_COPYRIGHT "(C) 2020-2024 OpenVPN, Inc." +static void ovpn_struct_free(struct net_device *net) +{ +}
nit: since this handler is not mandatory, its introduction can be moved to the later patch, which actually fills it with meaningful operations.
ehmm sure I will move it
+static int ovpn_net_open(struct net_device *dev) +{ + netif_tx_start_all_queues(dev); + return 0; +}
+static int ovpn_net_stop(struct net_device *dev) +{ + netif_tx_stop_all_queues(dev); + return 0; +}
+static const struct net_device_ops ovpn_netdev_ops = { + .ndo_open = ovpn_net_open, + .ndo_stop = ovpn_net_stop, + .ndo_start_xmit = ovpn_net_xmit, +};
+static const struct device_type ovpn_type = { + .name = OVPN_FAMILY_NAME,
nit: same question here regarding name derivation. Are you sure that the device type name is the same as the GENL family name?
As I said in the previous patch, I want all representative strings to be "ovpn", which is already the netlink family name. But I can create another constant to document this explicitly.
+};
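A minimal sketch of the dedicated constant mentioned in the reply above; the constant name is made up here, only the intent (documenting that the device type string deliberately matches the GENL family name) is taken from the thread:

/* hypothetical name: document that the device type string is intentionally
 * the same "ovpn" string used as the generic netlink family name
 */
#define OVPN_DEVTYPE_NAME	OVPN_FAMILY_NAME

static const struct device_type ovpn_type = {
	.name = OVPN_DEVTYPE_NAME,
};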
+static const struct nla_policy ovpn_policy[IFLA_OVPN_MAX + 1] = { + [IFLA_OVPN_MODE] = NLA_POLICY_RANGE(NLA_U8, OVPN_MODE_P2P, + OVPN_MODE_MP), +};
/** * ovpn_dev_is_valid - check if the netdevice is of type 'ovpn' * @dev: the interface to check @@ -33,16 +67,76 @@ bool ovpn_dev_is_valid(const struct net_device *dev) return dev->netdev_ops->ndo_start_xmit == ovpn_net_xmit; } +static void ovpn_setup(struct net_device *dev) +{ + /* compute the overhead considering AEAD encryption */ + const int overhead = sizeof(u32) + NONCE_WIRE_SIZE + 16 +
Where do these magic sizeof(u32) and '16' values come from?
It's in the "nice diagram" you commented later in this patch :-) But I can extend the comment.
[...]
@@ -51,26 +145,37 @@ static int ovpn_netdev_notifier_call(struct notifier_block *nb, unsigned long state, void *ptr) { struct net_device *dev = netdev_notifier_info_to_dev(ptr); + struct ovpn_struct *ovpn; if (!ovpn_dev_is_valid(dev)) return NOTIFY_DONE; + ovpn = netdev_priv(dev);
nit: netdev_priv() only returns a pointer, so it is safe to fetch the pointer in advance as long as it is not dereferenced until we are sure the event references the desired interface type. Taking this into consideration, the assignment of the private data pointer can be moved up to the variable declaration, just to make the code a couple of lines shorter.
I do it here because it seems more "logically correct" to retrieve the priv pointer after having confirmed that this is a ovpn interface with ovpn_dev_is_valid().
Moving it above kinda says "I already know there is a ovpn object here", but this is not the case until after the valid() check. So I prefer to keep it here.
[...]
--- a/drivers/net/ovpn/main.h +++ b/drivers/net/ovpn/main.h @@ -12,4 +12,11 @@ bool ovpn_dev_is_valid(const struct net_device *dev); +#define SKB_HEADER_LEN \ + (max(sizeof(struct iphdr), sizeof(struct ipv6hdr)) + \ + sizeof(struct udphdr) + NET_SKB_PAD)
+#define OVPN_HEAD_ROOM ALIGN(16 + SKB_HEADER_LEN, 4)
Where does this magic '16' come from?
It should be the same 16 as in the overhead above (it's the auth tag length). Will make this more explicit with a comment.
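A rough sketch of how those values could be spelled out with comments. Per the replies above, the '16' is the AEAD auth tag length; the reading of sizeof(u32) as the op-code/peer-id word and the OVPN_* names below are assumptions for illustration, not the wording of the final patch:

#define OVPN_AUTH_TAG_SIZE	16		/* AEAD auth tag appended to every packet */
#define OVPN_OPCODE_SIZE	sizeof(u32)	/* assumed: op-code/key-id/peer-id word */

/* worst-case per-packet overhead of an AEAD-encrypted packet over UDP */
#define OVPN_OVERHEAD	(OVPN_OPCODE_SIZE + NONCE_WIRE_SIZE +		\
			 OVPN_AUTH_TAG_SIZE + sizeof(struct udphdr) +	\
			 max(sizeof(struct ipv6hdr), sizeof(struct iphdr)))

/* headroom reserved for the ovpn and outer headers; the 16 is the auth tag
 * length, as clarified in the reply above
 */
#define OVPN_HEAD_ROOM	ALIGN(OVPN_AUTH_TAG_SIZE + SKB_HEADER_LEN, 4)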
+#define OVPN_MAX_PADDING 16
#endif /* _NET_OVPN_MAIN_H_ */ diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ ovpnstruct.h index e3e4df6418b081436378fc51d98db5bd7b5d1fbe..211df871538d34fdff90d182f21a0b0fb11b28ad 100644 --- a/drivers/net/ovpn/ovpnstruct.h +++ b/drivers/net/ovpn/ovpnstruct.h @@ -11,15 +11,23 @@ #define _NET_OVPN_OVPNSTRUCT_H_ #include <net/net_trackers.h> +#include <uapi/linux/if_link.h> +#include <uapi/linux/ovpn.h> /** * struct ovpn_struct - per ovpn interface state * @dev: the actual netdev representing the tunnel * @dev_tracker: reference tracker for associated dev
- @registered: whether dev is still registered with netdev or not
- @mode: device operation mode (i.e. p2p, mp, ..)
- @dev_list: entry for the module wide device list
*/ struct ovpn_struct { struct net_device *dev; netdevice_tracker dev_tracker; + bool registered; + enum ovpn_mode mode; + struct list_head dev_list;
dev_list is no longer used and should be deleted.
ACK
[...]
+/* OpenVPN nonce size */ +#define NONCE_SIZE 12
nit: would using the common 'OVPN_' prefix here and for the other constants be a good idea? E.g. OVPN_NONCE_SIZE. It gives a code reader a hint about where the constant comes from.
ACK
And one more question. Could you clarify in the comment where this constant comes from? AFAIU, these 12 bytes are the nonce size expected by the AEAD crypto protocol, right?
Correct: 12 bytes / 96 bits. Will extend the comment.
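The extended comment could look roughly like this (a sketch, also using the OVPN_ prefix suggested above):

/* size of the AEAD nonce expected by the data-channel ciphers
 * (e.g. AES-GCM, ChaCha20-Poly1305): 96 bits / 12 bytes
 */
#define OVPN_NONCE_SIZE		12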
+/* OpenVPN nonce size reduced by 8-byte nonce tail -- this is the
- size of the AEAD Associated Data (AD) sent over the wire
- and is normally the head of the IV
- */
+#define NONCE_WIRE_SIZE (NONCE_SIZE - sizeof(struct ovpn_nonce_tail))
If the headers and IV are defined as structures, we no longer need this constant, since the header construction will be done by the compiler according to the structure layout.
yap yap. Will do this later as explained in the other email.
+/* Last 8 bytes of AEAD nonce
- Provided by userspace and usually derived from
- key material generated during TLS handshake
- */
+struct ovpn_nonce_tail { + u8 u8[OVPN_NONCE_TAIL_SIZE]; +};
Why do you need a dedicated structure for this array? Can we declare the corresponding fields like this:
u8 nonce_tail_xmit[OVPN_NONCE_TAIL_SIZE]; u8 nonce_tail_recv[OVPN_NONCE_TAIL_SIZE];
I think the original reason was to have something to pass to sizeof() without making it harder for the reader.
At some point I also wanted to get rid of the struct, but something stopped me. Not sure what it was, though. Will give it a try.
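A sketch of what dropping the wrapper could look like, following Sergey's suggestion; the containing struct name and field names here are illustrative only, and sizeof_field() preserves the "something to pass to sizeof()" property mentioned above:

/* illustrative only: keep the per-direction nonce tails as plain arrays
 * inside a (hypothetical) key material structure instead of wrapping them
 * in struct ovpn_nonce_tail
 */
struct ovpn_key_material {
	u8 nonce_tail_xmit[OVPN_NONCE_TAIL_SIZE];
	u8 nonce_tail_recv[OVPN_NONCE_TAIL_SIZE];
};

#define OVPN_NONCE_WIRE_SIZE \
	(OVPN_NONCE_SIZE - sizeof_field(struct ovpn_key_material, nonce_tail_xmit))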
Thanks a lot. Regards,
Missed the most essential note regarding this patch :)
On 29.10.2024 12:47, Antonio Quartulli wrote:
+static int ovpn_net_open(struct net_device *dev) +{
- netif_tx_start_all_queues(dev);
- return 0;
+}
+static int ovpn_net_stop(struct net_device *dev) +{
- netif_tx_stop_all_queues(dev);
Here we stop user-generated traffic in the downlink direction. Shall we also take care of other kinds of traffic: keepalive, uplink?
I believe we should remove all the peers here, or at least stop the keepalive generation. But removing the peers is better, since administratively down is administratively down, meaning the user expects traffic to stop completely in any direction. And even if we only stop the keepalive generation, the peer(s) will destroy the tunnel on their side anyway.
This way we would not even need to care about removing peers on device unregistration. What do you think?
- return 0;
+}
On 10/11/2024 21:42, Sergey Ryazanov wrote:
Missed the most essential note regarding this patch :)
On 29.10.2024 12:47, Antonio Quartulli wrote:
+static int ovpn_net_open(struct net_device *dev) +{ + netif_tx_start_all_queues(dev); + return 0; +}
+static int ovpn_net_stop(struct net_device *dev) +{ + netif_tx_stop_all_queues(dev);
Here we stop user-generated traffic in the downlink direction. Shall we also take care of other kinds of traffic: keepalive, uplink?
Keepalive is "metadata" and should continue to flow, regardless of whether the user interface is brought down.
Uplink traffic directed to *this* device should just be dropped at delivery time.
Incoming traffic directed to other peers will continue to work.
I believe we should remove all the peers here, or at least stop the keepalive generation. But removing the peers is better, since administratively down is administratively down, meaning the user expects traffic to stop completely in any direction. And even if we only stop the keepalive generation, the peer(s) will destroy the tunnel on their side anyway.
Uhm, I don't think the user expects all "protocol" traffic (and client to client) to stop by simply bringing down the interface.
This way we would not even need to care about removing peers on device unregistration. What do you think?
I think you are now mixing data plane and control plane.
The fact that the user is stopping payload traffic does not imply we want to stop the VPN. The user may just be doing something with the interface (and on an MP node client-to-client traffic will still continue to flow).
This would also be a non-negligible (and user-facing) change in behaviour compared to the current openvpn implementation.
Thanks for your input though, I can imagine coming from different angles things may look not the same.
Regards,
+ return 0; +}
On 15.11.2024 16:03, Antonio Quartulli wrote:
On 10/11/2024 21:42, Sergey Ryazanov wrote:
Missed the most essential note regarding this patch :)
On 29.10.2024 12:47, Antonio Quartulli wrote:
+static int ovpn_net_open(struct net_device *dev) +{ + netif_tx_start_all_queues(dev); + return 0; +}
+static int ovpn_net_stop(struct net_device *dev) +{ + netif_tx_stop_all_queues(dev);
Here we stop user-generated traffic in the downlink direction. Shall we also take care of other kinds of traffic: keepalive, uplink?
Keepalive is "metadata" and should continue to flow, regardless of whether the user interface is brought down.
Uplink traffic directed to *this* device should just be dropped at delivery time.
Incoming traffic directed to other peers will continue to work.
How is that possible? AFAIU, the module uses the kernel IP routing subsystem. Putting the interface down will effectively block a client-to-client packet from reentering the interface.
I believe we should remove all the peers here, or at least stop the keepalive generation. But removing the peers is better, since administratively down is administratively down, meaning the user expects traffic to stop completely in any direction. And even if we only stop the keepalive generation, the peer(s) will destroy the tunnel on their side anyway.
Uhm, I don't think the user expects all "protocol" traffic (and client to client) to stop by simply bringing down the interface.
This way we would not even need to care about removing peers on device unregistration. What do you think?
I think you are now mixing data plane and control plane.
The fact that the user is stopping payload traffic does not imply we want to stop the VPN. The user may just be doing something with the interface (and on an MP node client-to-client traffic will still continue to flow).
This would also be a non-negligible (and user-facing) change in behaviour compared to the current openvpn implementation.
It's not about the previous implementation, it's about the interface management procedures. I just cannot imagine how the proposed approach can be aligned with RFC 2863 section 3.1.13, ifAdminStatus and ifOperStatus.
And if we are talking about a user experience, I cannot imagine my WLAN interface maintaining a connection to the access point after shutting it down. Or even better, a WLAN interface in the AP mode still forwarding traffic between wireless clients. Or a bridge interface switching traffic between ports and sending STP frames.
Thanks for your input though, I can imagine coming from different angles things may look not the same.
I believe nobody will mind if a userspace service does a failover to continue serving connected clients. But from the kernel perspective, when the user says 'ip link set down' the party is over.
-- Sergey
On 19/11/2024 04:08, Sergey Ryazanov wrote:
On 15.11.2024 16:03, Antonio Quartulli wrote:
On 10/11/2024 21:42, Sergey Ryazanov wrote:
Missed the most essential note regarding this patch :)
On 29.10.2024 12:47, Antonio Quartulli wrote:
+static int ovpn_net_open(struct net_device *dev) +{ + netif_tx_start_all_queues(dev); + return 0; +}
+static int ovpn_net_stop(struct net_device *dev) +{ + netif_tx_stop_all_queues(dev);
Here we stop user-generated traffic in the downlink direction. Shall we also take care of other kinds of traffic: keepalive, uplink?
Keepalive is "metadata" and should continue to flow, regardless of whether the user interface is brought down.
Uplink traffic directed to *this* device should just be dropped at delivery time.
Incoming traffic directed to other peers will continue to work.
How is that possible? AFAIU, the module uses the kernel IP routing subsystem. Putting the interface down will effectively block a client-to-client packet from reentering the interface.
True. At least part of the traffic is stopped (traffic directed to the VPN IP of a peer will still flow as it does not require a routing table lookup).
I circled this discussion through the other devs to see what perspective they would bring and we also agree that if something is stopping, better stop the entire infra.
Also, if a user is fumbling with the link state, they are probably trying to bring the VPN down.
I will go that way and basically perform the same cleanup as if the interface is being deleted.
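A minimal sketch of that direction (ovpn_priv follows the rename discussed earlier in the thread, and ovpn_peers_free() is a hypothetical helper standing in for whatever the real cleanup routine ends up being):

static int ovpn_net_stop(struct net_device *dev)
{
	struct ovpn_priv *ovpn = netdev_priv(dev);

	netif_carrier_off(dev);
	netif_tx_stop_all_queues(dev);

	/* hypothetical helper: release all peers (and stop their keepalive
	 * timers), mirroring the cleanup performed on interface deletion
	 */
	ovpn_peers_free(ovpn);

	return 0;
}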
"the party is over"[cit.] :)
Regards,
An ovpn interface will keep carrier always on and let the user decide when an interface should be considered disconnected.
This way, even if an ovpn interface is not connected to any peer, it can still retain all IPs and routes and thus prevent any data leak.
Signed-off-by: Antonio Quartulli antonio@openvpn.net Reviewed-by: Andrew Lunn andrew@lunn.ch --- drivers/net/ovpn/main.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index eead7677b8239eb3c48bb26ca95492d88512b8d4..eaa83a8662e4ac2c758201008268f9633643c0b6 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -31,6 +31,13 @@ static void ovpn_struct_free(struct net_device *net)
static int ovpn_net_open(struct net_device *dev) { + /* ovpn keeps the carrier always on to avoid losing IP or route + * configuration upon disconnection. This way it can prevent leaks + * of traffic outside of the VPN tunnel. + * The user may override this behaviour by tearing down the interface + * manually. + */ + netif_carrier_on(dev); netif_tx_start_all_queues(dev); return 0; }
On 29.10.2024 12:47, Antonio Quartulli wrote:
An ovpn interface will keep carrier always on and let the user decide when an interface should be considered disconnected.
This way, even if an ovpn interface is not connected to any peer, it can still retain all IPs and routes and thus prevent any data leak.
Signed-off-by: Antonio Quartulli antonio@openvpn.net Reviewed-by: Andrew Lunn andrew@lunn.ch
drivers/net/ovpn/main.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index eead7677b8239eb3c48bb26ca95492d88512b8d4..eaa83a8662e4ac2c758201008268f9633643c0b6 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -31,6 +31,13 @@ static void ovpn_struct_free(struct net_device *net) static int ovpn_net_open(struct net_device *dev) {
- /* ovpn keeps the carrier always on to avoid losing IP or route
* configuration upon disconnection. This way it can prevent leaks
* of traffic outside of the VPN tunnel.
* The user may override this behaviour by tearing down the interface
* manually.
*/
- netif_carrier_on(dev);
If a user cares about traffic leaking, they can create a blackhole route with a huge metric:
# ip route add blackhole default metric 10000
Why should the network interface implicitly provide this functionality? And on the other hand, how can a routing daemon learn about a topology change without an indication from the interface?
netif_tx_start_all_queues(dev); return 0; }
On 09/11/2024 02:11, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
An ovpn interface will keep carrier always on and let the user decide when an interface should be considered disconnected.
This way, even if an ovpn interface is not connected to any peer, it can still retain all IPs and routes and thus prevent any data leak.
Signed-off-by: Antonio Quartulli antonio@openvpn.net Reviewed-by: Andrew Lunn andrew@lunn.ch
drivers/net/ovpn/main.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index eead7677b8239eb3c48bb26ca95492d88512b8d4..eaa83a8662e4ac2c758201008268f9633643c0b6 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -31,6 +31,13 @@ static void ovpn_struct_free(struct net_device *net) static int ovpn_net_open(struct net_device *dev) { + /* ovpn keeps the carrier always on to avoid losing IP or route + * configuration upon disconnection. This way it can prevent leaks + * of traffic outside of the VPN tunnel. + * The user may override this behaviour by tearing down the interface + * manually. + */ + netif_carrier_on(dev);
If a user cares about traffic leaking, they can create a blackhole route with a huge metric:
# ip route add blackhole default metric 10000
Why should the network interface implicitly provide this functionality? And on the other hand, how can a routing daemon learn about a topology change without an indication from the interface?
This was discussed loooong ago with Andrew. Here is my last response:
https://lore.kernel.org/all/d896bbd8-2709-4834-a637-f982fc51fc57@openvpn.net...
Regards,
netif_tx_start_all_queues(dev); return 0; }
On 15.11.2024 16:13, Antonio Quartulli wrote:
On 09/11/2024 02:11, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
An ovpn interface will keep carrier always on and let the user decide when an interface should be considered disconnected.
This way, even if an ovpn interface is not connected to any peer, it can still retain all IPs and routes and thus prevent any data leak.
Signed-off-by: Antonio Quartulli antonio@openvpn.net Reviewed-by: Andrew Lunn andrew@lunn.ch
drivers/net/ovpn/main.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index eead7677b8239eb3c48bb26ca95492d88512b8d4..eaa83a8662e4ac2c758201008268f9633643c0b6 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -31,6 +31,13 @@ static void ovpn_struct_free(struct net_device *net) static int ovpn_net_open(struct net_device *dev) { + /* ovpn keeps the carrier always on to avoid losing IP or route + * configuration upon disconnection. This way it can prevent leaks + * of traffic outside of the VPN tunnel. + * The user may override this behaviour by tearing down the interface + * manually. + */ + netif_carrier_on(dev);
If a user cares about traffic leaking, they can create a blackhole route with a huge metric:
# ip route add blackhole default metric 10000
Why should the network interface implicitly provide this functionality? And on the other hand, how can a routing daemon learn about a topology change without an indication from the interface?
This was discussed loooong ago with Andrew. Here is my last response:
https://lore.kernel.org/all/d896bbd8-2709-4834-a637-f982fc51fc57@openvpn.net/
Thank you for sharing the link to the beginning of the conversation. So far we have three topics regarding the operational state indication: 1. the possible absence of a concept of running state, 2. the influence on routing protocol implementations, 3. traffic leaking.
As for the concept of the running state, it should exist for tunneling protocols with state tracking. In this specific case, we can consider the interface running when it has a configured peer with keys. The protocol even has a nice feature for connection monitoring: keepalive.
On one hand, routing protocols could benefit from the operational state indication. On the other hand, the hello/hold timer values mentioned in the documentation are comparable with default routing protocol timers, so the actual improvement is debatable.
Regarding the traffic leaking, as I mentioned before, a blackhole route or a firewall rule works better than implicit blackholing with a non-running interface.
Long story short, I agree that we might not need a real operational state indication now. Still, protecting from traffic leaking is not a good enough justification.
Andrew, what do you think? Is traffic leak prevention a good enough justification, or does it need to be updated?
-- Sergey
On 20/11/2024 23:56, Sergey Ryazanov wrote:
On 15.11.2024 16:13, Antonio Quartulli wrote:
On 09/11/2024 02:11, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
An ovpn interface will keep carrier always on and let the user decide when an interface should be considered disconnected.
This way, even if an ovpn interface is not connected to any peer, it can still retain all IPs and routes and thus prevent any data leak.
Signed-off-by: Antonio Quartulli antonio@openvpn.net Reviewed-by: Andrew Lunn andrew@lunn.ch
drivers/net/ovpn/main.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index eead7677b8239eb3c48bb26ca95492d88512b8d4..eaa83a8662e4ac2c758201008268f9633643c0b6 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -31,6 +31,13 @@ static void ovpn_struct_free(struct net_device *net) static int ovpn_net_open(struct net_device *dev) { + /* ovpn keeps the carrier always on to avoid losing IP or route + * configuration upon disconnection. This way it can prevent leaks + * of traffic outside of the VPN tunnel. + * The user may override this behaviour by tearing down the interface + * manually. + */ + netif_carrier_on(dev);
If a user cares about traffic leaking, they can create a blackhole route with a huge metric:
# ip route add blackhole default metric 10000
Why should the network interface implicitly provide this functionality? And on the other hand, how can a routing daemon learn about a topology change without an indication from the interface?
This was discussed loooong ago with Andrew. Here is my last response:
https://lore.kernel.org/all/d896bbd8-2709-4834-a637-f982fc51fc57@openvpn.net/
Thank you for sharing the link to the beginning of the conversation. So far we have three topics regarding the operational state indication:
- the possible absence of a concept of running state,
- the influence on routing protocol implementations,
- traffic leaking.
As for the concept of the running state, it should exist for tunneling protocols with state tracking. In this specific case, we can consider the interface running when it has a configured peer with keys. The protocol even has a nice feature for connection monitoring: keepalive.
What about a device in MP mode? It doesn't make sense to turn the carrier off when the MP node has no peers connected. At the same time I don't like having P2P and MP devices behaving differently in this regard. Therefore keeping the carrier on seemed the most logical way forward (at least for now - we can still come back to this once we have something smarter to implement).
On one hand, routing protocols could benefit from the operational state indication. On the other hand, the hello/hold timer values mentioned in the documentation are comparable with default routing protocol timers, so the actual improvement is debatable.
Regarding the traffic leaking, as I mentioned before, a blackhole route or a firewall rule works better than implicit blackholing with a non-running interface.
Long story short, I agree that we might not need a real operational state indication now. Still, protecting from traffic leaking is not a good enough justification.
Well, it's the so-called "persistent interface" concept in VPNs: leave everything as is, even if the connection is lost. I know it can be implemented in many other ways... but I don't see a real problem with keeping it this way.
A blackhole/firewall can still be added if the user prefers (and not use the persistent interface).
Regards,
Andrew, what do you think? Is traffic leak prevention a good enough justification, or does it need to be updated?
-- Sergey
On 21.11.2024 23:17, Antonio Quartulli wrote:
On 20/11/2024 23:56, Sergey Ryazanov wrote:
On 15.11.2024 16:13, Antonio Quartulli wrote:
On 09/11/2024 02:11, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
An ovpn interface will keep carrier always on and let the user decide when an interface should be considered disconnected.
This way, even if an ovpn interface is not connected to any peer, it can still retain all IPs and routes and thus prevent any data leak.
Signed-off-by: Antonio Quartulli antonio@openvpn.net Reviewed-by: Andrew Lunn andrew@lunn.ch
drivers/net/ovpn/main.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index eead7677b8239eb3c48bb26ca95492d88512b8d4..eaa83a8662e4ac2c758201008268f9633643c0b6 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -31,6 +31,13 @@ static void ovpn_struct_free(struct net_device *net) static int ovpn_net_open(struct net_device *dev) { + /* ovpn keeps the carrier always on to avoid losing IP or route + * configuration upon disconnection. This way it can prevent leaks + * of traffic outside of the VPN tunnel. + * The user may override this behaviour by tearing down the interface + * manually. + */ + netif_carrier_on(dev);
If a user cares about the traffic leaking, then he can create a blackhole route with huge metric:
# ip route add blackhole default metric 10000
Why should the network interface implicitly provide this functionality? And on the other hand, how can a routing daemon learn about a topology change without an indication from the interface?
This was discussed loooong ago with Andrew. Here is my last response:
https://lore.kernel.org/all/d896bbd8-2709-4834-a637-f982fc51fc57@openvpn.net/
Thank you for sharing the link to the beginning of the conversation. So far we have three topics regarding operational state indication:
- possible absence of a conception of running state,
- influence on routing protocol implementations,
- traffic leaking.
As for the concept of a running state, it should exist for tunneling protocols that track state. In this specific case, we can consider the interface running when it has a configured peer with keys. The protocol even has a nice feature for connection monitoring - keepalive.
What about a device in MP mode? It doesn't make sense to turn the carrier off when the MP node has no peers connected. At the same time I don't like having P2P and MP devices behaving differently in this regard.
MP with a single network interface is an endless headache. Indeed. On the other hand, penalizing P2P users just because the protocol supports MP doesn't look like a solution either.
Therefore keeping the carrier on seemed the most logical way forward (at least for now - we can still come back to this once we have something smarter to implement).
It was shown above how to distinguish between running and non-running cases.
If the author doesn't want to implement operational state indication now, then I'm OK with it. Not a big deal now. I just don't like the idea of promoting abuse of the running state indicator. Please see below.
Routing protocols, on the one hand, could benefit from operational state indication. On the other hand, the hello/hold timer values mentioned in the documentation are comparable to default routing protocol timers, so the actual improvement is debatable.
Regarding traffic leaking, as I mentioned before, a blackhole route or a firewall rule works better than implicit blackholing with a non-running interface.
Long story short, I agree that we might not need a real operational state indication now. Still, protecting against traffic leaking is not a good enough justification.
Well, it's the so-called "persistent interface" concept in VPNs: leave everything as is, even if the connection is lost.
It's called routing framework abuse. The IP router will choose the route and the egress interface not because this route is a good option to deliver a packet, but because someone tricked it.
In some circumstances, e.g. an Android app, it could be the only way to prevent traffic leaking. But these special circumstances do not make the solution generic and eligible for inclusion in mainline code.
I know it can be implemented in many other ways, but I don't see a real problem with keeping it this way.
At least routing protocols and network monitoring software will not be happy to see a dead interface pretending that it's still running. Generally speaking, saying that an interface is running when the module knows for sure that a packet cannot be delivered is misleading the user.
A blackhole route/firewall rule can still be added if the user prefers that over the persistent interface.
The false-indication solution is not as reliable as it might look. An interface shutdown, a user-space application failing to start, crashing, or restarting - each of these will void the trick. Ergo, a blackhole route/firewall rule must be employed by a security-conscious user anyway, which makes the proposed feature odd.
To summarize, I'm OK if this change is merged with a comment like "For future study" or "To be done" or "To be implemented". But a comment like "to prevent traffic leaking" or any other comment implying a "breakthrough security feature" will get a big NACK from my side.
-- Sergey
On 23/11/2024 23:25, Sergey Ryazanov wrote:
On 21.11.2024 23:17, Antonio Quartulli wrote:
On 20/11/2024 23:56, Sergey Ryazanov wrote:
On 15.11.2024 16:13, Antonio Quartulli wrote:
On 09/11/2024 02:11, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
An ovpn interface will keep carrier always on and let the user decide when an interface should be considered disconnected.
This way, even if an ovpn interface is not connected to any peer, it can still retain all IPs and routes and thus prevent any data leak.
Signed-off-by: Antonio Quartulli antonio@openvpn.net
Reviewed-by: Andrew Lunn andrew@lunn.ch
---
 drivers/net/ovpn/main.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
index eead7677b8239eb3c48bb26ca95492d88512b8d4..eaa83a8662e4ac2c758201008268f9633643c0b6 100644
--- a/drivers/net/ovpn/main.c
+++ b/drivers/net/ovpn/main.c
@@ -31,6 +31,13 @@ static void ovpn_struct_free(struct net_device *net)
 static int ovpn_net_open(struct net_device *dev)
 {
+	/* ovpn keeps the carrier always on to avoid losing IP or route
+	 * configuration upon disconnection. This way it can prevent leaks
+	 * of traffic outside of the VPN tunnel.
+	 * The user may override this behaviour by tearing down the interface
+	 * manually.
+	 */
+	netif_carrier_on(dev);
If a user cares about the traffic leaking, then he can create a blackhole route with huge metric:
# ip route add blackhole default metric 10000
Why should the network interface implicitly provide this functionality? And on the other hand, how can a routing daemon learn about a topology change without an indication from the interface?
This was discussed loooong ago with Andrew. Here is my last response:
https://lore.kernel.org/all/d896bbd8-2709-4834-a637-f982fc51fc57@openvpn.net/
Thank you for sharing the link to the beginning of the conversation. So far we have three topics regarding operational state indication:
- possible absence of a conception of running state,
- influence on routing protocol implementations,
- traffic leaking.
As for the concept of a running state, it should exist for tunneling protocols that track state. In this specific case, we can consider the interface running when it has a configured peer with keys. The protocol even has a nice feature for connection monitoring - keepalive.
What about a device in MP mode? It doesn't make sense to turn the carrier off when the MP node has no peers connected. At the same time I don't like having P2P and MP devices behaving differently in this regard.
MP with a single network interface is an endless headache. Indeed. On the other hand, penalizing P2P users just because the protocol supports MP doesn't look like a solution either.
On the plus side, with "iroutes" implemented via the system routing table, routing protocols will be able to detect new routes only when the related client has connected (and the same goes for disconnection).
But this is a bit orthogonal compared to the oper state.
Therefore keeping the carrier on seemed the most logical way forward (at least for now - we can still come back to this once we have something smarter to implement).
It was shown above how to distinguish between running and non-running cases.
If the author doesn't want to implement operational state indication now, then I'm OK with it. Not a big deal now. I just don't like the idea of promoting abuse of the running state indicator. Please see below.
Routing protocols, on the one hand, could benefit from operational state indication. On the other hand, the hello/hold timer values mentioned in the documentation are comparable to default routing protocol timers, so the actual improvement is debatable.
Regarding traffic leaking, as I mentioned before, a blackhole route or a firewall rule works better than implicit blackholing with a non-running interface.
Long story short, I agree that we might not need a real operational state indication now. Still, protecting against traffic leaking is not a good enough justification.
Well, it's the so-called "persistent interface" concept in VPNs: leave everything as is, even if the connection is lost.
It's called routing framework abuse. The IP router will choose the route and the egress interface not because this route is a good option to deliver a packet, but because someone tricked it.
This is what the user wants. OpenVPN (userspace) will tear down the P2P interface upon disconnection, assuming the --persist-tun option was not specified by the user.
So the interface is gone in any case.
By keeping the carrier on we are just ensuring that, if the user wanted persist-tun, the interface is not actually making decisions on its own.
With a tun interface this can be done; now you basically want to drop a feature that has existed for a long time and break existing setups.
In some circumstances, e.g. an Android app, it could be the only way to prevent traffic leaking. But these special circumstances do not make the solution generic and eligible for inclusion in mainline code.
Why not? We are not changing the general rule, but just defining a specific behaviour for a specific driver.
For example, I don't think a tun interface goes down when there is no socket attached to it; packets are just going to be blackhole'd in that case. No?
I know it can be implemented in many other different ways..but I don't see a real problem with keeping this way.
At least routing protocols and network monitoring software will not be happy to see a dead interface pretending that it's still running.
They won't know that the interface is disconnected, they will possibly just see traffic being dropped.
Generally speaking, saying that an interface is running when the module knows for sure that a packet cannot be delivered is misleading the user.
Or a feature, wanted by the user.
A blackhole route/firewall rule can still be added if the user prefers that over the persistent interface.
The false-indication solution is not as reliable as it might look. An interface shutdown, a user-space application failing to start, crashing, or restarting - each of these will void the trick. Ergo, a blackhole route/firewall rule must be employed by a security-conscious user anyway, which makes the proposed feature odd.
Yeah, this is what other VPN clients call a "kill switch". Persist-tun is just one piece of the puzzle, yet an important one.
To summarize, I'm OK if this change is merged with a comment like "For future study" or "To be done" or "To be implemented". But a comment like "to prevent traffic leaking" or any other comment implying a "breakthrough security feature" will get a big NACK from my side.
What if the comment redirects the user to the --persist-tun option in order to clarify the context and the wanted behaviour?
Would that help?
-- Sergey
On 24.11.2024 00:52, Antonio Quartulli wrote:
On 23/11/2024 23:25, Sergey Ryazanov wrote:
On 21.11.2024 23:17, Antonio Quartulli wrote:
On 20/11/2024 23:56, Sergey Ryazanov wrote:
On 15.11.2024 16:13, Antonio Quartulli wrote:
On 09/11/2024 02:11, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
> An ovpn interface will keep carrier always on and let the user
> decide when an interface should be considered disconnected.
>
> This way, even if an ovpn interface is not connected to any peer,
> it can still retain all IPs and routes and thus prevent any data
> leak.
>
> Signed-off-by: Antonio Quartulli antonio@openvpn.net
> Reviewed-by: Andrew Lunn andrew@lunn.ch
> ---
> drivers/net/ovpn/main.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
> index eead7677b8239eb3c48bb26ca95492d88512b8d4..eaa83a8662e4ac2c758201008268f9633643c0b6 100644
> --- a/drivers/net/ovpn/main.c
> +++ b/drivers/net/ovpn/main.c
> @@ -31,6 +31,13 @@ static void ovpn_struct_free(struct net_device *net)
>  static int ovpn_net_open(struct net_device *dev)
>  {
> +	/* ovpn keeps the carrier always on to avoid losing IP or route
> +	 * configuration upon disconnection. This way it can prevent leaks
> +	 * of traffic outside of the VPN tunnel.
> +	 * The user may override this behaviour by tearing down the interface
> +	 * manually.
> +	 */
> +	netif_carrier_on(dev);
If a user cares about the traffic leaking, then he can create a blackhole route with huge metric:
# ip route add blackhole default metric 10000
Why should the network interface implicitly provide this functionality? And on the other hand, how can a routing daemon learn about a topology change without an indication from the interface?
This was discussed loooong ago with Andrew. Here is my last response:
https://lore.kernel.org/all/d896bbd8-2709-4834-a637-f982fc51fc57@openvpn.net/
Thank you for sharing the link to the beginning of the conversation. So far we have three topics regarding operational state indication:
- possible absence of a conception of running state,
- influence on routing protocol implementations,
- traffic leaking.
As for the concept of a running state, it should exist for tunneling protocols that track state. In this specific case, we can consider the interface running when it has a configured peer with keys. The protocol even has a nice feature for connection monitoring - keepalive.
What about a device in MP mode? It doesn't make sense to turn the carrier off when the MP node has no peers connected. At the same time I don't like having P2P and MP devices behaving differently in this regard.
MP with a single network interface is an endless headache. Indeed. On the other hand, penalizing P2P users just because the protocol supports MP doesn't look like a solution either.
On the upper side, with "iroutes" implemented using the system routing table, routing protocols will be able to detect new routes only when the related client has connected. (The same for the disconnection)
But this is a bit orthogonal compared to the oper state.
The patch has nothing to do with route configuration. The main concern is the forcing of the running state indication, and more specifically the justification given for it.
Therefore keeping the carrier on seemed the most logical way forward (at least for now - we can still come back to this once we have something smarter to implement).
It was shown above how to distinguish between running and non-running cases.
If the author doesn't want to implement operational state indication now, then I'm OK with it. Not a big deal now. I just don't like the idea of promoting abuse of the running state indicator. Please see below.
Routing protocols, on the one hand, could benefit from operational state indication. On the other hand, the hello/hold timer values mentioned in the documentation are comparable to default routing protocol timers, so the actual improvement is debatable.
Regarding traffic leaking, as I mentioned before, a blackhole route or a firewall rule works better than implicit blackholing with a non-running interface.
Long story short, I agree that we might not need a real operational state indication now. Still, protecting against traffic leaking is not a good enough justification.
Well, it's the so-called "persistent interface" concept in VPNs: leave everything as is, even if the connection is lost.
It's called routing framework abuse. The IP router will choose the route and the egress interface not because this route is a good option to deliver a packet, but because someone tricked it.
This is what the user wants.
I will be happy to see a study on users' preferences.
OpenVPN (userspace) will tear down the P2P interface upon disconnection, assuming the --persist-tun option was not specified by the user.
So the interface is gone in any case.
By keeping the carrier on we are just ensuring that, if the user wanted persist-tun, the interface is not actually making decisions on its own.
Regarding a decision on its own: an Ethernet interface goes into the not-running state upon loss of carrier from a switch. That can hardly be considered a decision of the interface; it's an indication of fact.
Similarly, a beeping UPS is not deciding to make the user's life miserable, it's indicating a power line failure. I hope we at least both agree that a UPS should indicate the line failure.
Back to the 'persist-tun' option. I checked the openvpn(8) man page. It gives reasonable hints to use this option to avoid negative outcomes on an internal openvpn process restart, e.g. in case of privilege dropping. It serves the same purpose as 'persist-key'. And there is not a word about traffic leaking.
If somebody has decided that this option gives a funny side effect and allows cutting corners, then I cannot say anything but sorry.
With a tun interface this can be done; now you basically want to drop a feature that has existed for a long time and break existing setups.
Amicus Plato, sed magis amica veritas
Yes, I don't want to see this interface misbehaviour advertised as a security feature. I hope the previous email gives a detailed explanation why.
If it's going to break existing setups, then end-users can be supported with a changelog notice explaining how to properly address the risk of traffic leaking.
In some circumstances, e.g. an Android app, it could be the only way to prevent traffic leaking. But these special circumstances do not make the solution generic and eligible for inclusion in mainline code.
Why not? We are not changing the general rule, but just defining a specific behaviour for a specific driver.
Yeah. This patch is not changing the general rule. The patch breaks it, and the comment in the code is proud of it. It looks like the old joke about a documented bug becoming a feature.
From a system administrator's or firmware developer's perspective, the proposed behaviour will look like an inconsistency compared to other interface types. And this inconsistency will need to be addressed with special configuration or, in the worst case, a dedicated script. I cannot see a justified reason to make their life harder.
For example, I don't think a tun interface goes down when there is no socket attached to it, still packets are just going to be blackhole'd in that case. No?
Nope. A tun interface will indeed go into the non-running state on the detach event. Moreover, the tun module supports changing the running/non-running indication upon a command from userspace. But not every userspace application feels a desire to implement it.
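For reference, a minimal userspace sketch of how a daemon can drive that indication via the TUNSETCARRIER ioctl (illustrative only; requires a reasonably recent kernel, and error handling is reduced to the bare minimum):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/if.h>
#include <linux/if_tun.h>

static int tun_set_carrier(int tun_fd, int on)
{
	/* TUNSETCARRIER toggles the lower-layer carrier of the tun device,
	 * which is what the RUNNING indication boils down to
	 */
	return ioctl(tun_fd, TUNSETCARRIER, &on);
}

int main(void)
{
	struct ifreq ifr;
	int fd = open("/dev/net/tun", O_RDWR);

	if (fd < 0)
		return 1;

	memset(&ifr, 0, sizeof(ifr));
	ifr.ifr_flags = IFF_TUN | IFF_NO_PI;
	if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
		perror("TUNSETIFF");
		return 1;
	}

	tun_set_carrier(fd, 0);	/* session not established yet -> NO-CARRIER */
	/* ... once the VPN session is up ... */
	tun_set_carrier(fd, 1);	/* peer reachable -> carrier/RUNNING on again */

	close(fd);
	return 0;
}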
I know it can be implemented in many other different ways..but I don't see a real problem with keeping this way.
At least routing protocols and network monitoring software will not be happy to see a dead interface pretending that it's still running.
They won't know that the interface is disconnected, they will possibly just see traffic being dropped.
Packet loss detection is quite a complex operation. So yes, they do monitor the interface operational state to warn the operator as soon as possible and, in the case of routing protocols, take automatic actions. Some sophisticated monitoring systems are even capable of generating events like 'link unstable' with higher severity if they see the interface operational state flapping within a short period of time.
So yeah, for these kinds of systems, proper operational state indication is essential.
Generally speaking, saying that an interface is running when the module knows for sure that a packet cannot be delivered is misleading the user.
Or a feature, wanted by the user.
A blackhole/firewall can still be added if the user prefers (and not use the persistent interface).
The false-indication solution is not as reliable as it might look. An interface shutdown, a user-space application failing to start, crashing, or restarting - each of these will void the trick. Ergo, a blackhole route/firewall rule must be employed by a security-conscious user anyway, which makes the proposed feature odd.
Yeah, this is what other VPN clients call a "kill switch". Persist-tun is just one piece of the puzzle, yet an important one.
To summarize, I'm OK if this change is merged with a comment like "For future study" or "To be done" or "To be implemented". But a comment like "to prevent traffic leaking" or any other comment implying a "breakthrough security feature" will get a big NACK from my side.
What if the comment redirects the user to the --persist-tun option in order to clarify the context and the wanted behaviour?
Would that help?
Nope. As was mentioned above, there is no indication that 'persist-tun' is a 'security' feature even in the current openvpn documentation.
If the openvpn developers want to keep the implementation bug-to-bug compatible, then feel free to configure the blackhole route on behalf of the end-user by means of the userspace daemon. Nobody will mind.
-- Sergey
On 25/11/2024 03:26, Sergey Ryazanov wrote:
OpenVPN (userspace) will tear down the P2P interface upon disconnection, assuming the --persist-tun option was not specified by the user.
So the interface is gone in any case.
By keeping the netcarrier on we are just ensuring that, if the user wanted persist-tun, the iface is not actually making decisions on its own.
Regarding a decision on its own: an Ethernet interface goes into the not-running state upon loss of carrier from a switch. That can hardly be considered a decision of the interface; it's an indication of fact.
Similarly, a beeping UPS is not deciding to make the user's life miserable, it's indicating a power line failure. I hope we at least both agree that a UPS should indicate the line failure.
The answer is always "it depends".
Back to the 'persist-tun' option. I checked the openvpn(8) man page. It gives reasonable hints to use this option to avoid negative outcomes on an internal openvpn process restart, e.g. in case of privilege dropping. It serves the same purpose as 'persist-key'. And there is not a word about traffic leaking.
FTR, here is the text in the manpage:
--persist-tun Don't close and reopen TUN/TAP device or run up/down scripts across SIGUSR1 or --ping-restart restarts.
SIGUSR1 is a restart signal similar to SIGHUP, but which offers finer-grained control over reset options.
SIGUSR1 is a session reconnection, not a process restart. The manpage just indicates what happens at the low level when this option is provided.
The next question is: what is this useful for? Many things, among them the fact that the interface will retain its configuration (i.e. IPs, routes, etc.).
If somebody has decided that this option gives a funny side effect and allows cutting corners, then I cannot say anything but sorry.
Well, OpenVPN is more than 20 years old. If a given API allows a specific user behaviour and has done so for that many years, changing it is still a user breakage. Not much we can do.
With a tun interface this can be done, now you want to basically drop this feature that existed for long time and break existing setups.
Amicus Plato, sed magis amica veritas
Yes, I don't want to see this interface misbehaviour advertised as a security feature. I hope the previous email gives a detailed explanation why.
Let's forget about the traffic leak mention and the "security feature". That comment was probably written in the middle of the night and I agree it gives a false sense of what is happening.
If it's going to break existing setups, then end-users can be supported with a changelog notice explaining how to properly address the risk of traffic leaking.
Nope, we can't just break existing user setups.
In some circumstances, e.g. an Android app, it could be the only way to prevent traffic leaking. But these special circumstances do not make the solution generic and eligible for inclusion in mainline code.
Why not? We are not changing the general rule, but just defining a specific behaviour for a specific driver.
Yeah. This patch is not changing the general rule. The patch breaks it, and the comment in the code is proud of it. It looks like the old joke about a documented bug becoming a feature.
Like I said above, let's make the comment meaningful for the expected goal: implement persist-tun while leaving userspace the chance to decide what to do.
From a system administrator's or firmware developer's perspective, the proposed behaviour will look like an inconsistency compared to other interface types. And this inconsistency will need to be addressed with special configuration or, in the worst case, a dedicated script. I cannot see a justified reason to make their life harder.
You can configure openvpn to bring the interface down when the connection is lost. Why do you say it requires extra scripting and such?
For example, I don't think a tun interface goes down when there is no socket attached to it, still packets are just going to be blackhole'd in that case. No?
Nope. A tun interface will indeed go into the non-running state on the detach event. Moreover, the tun module supports changing the running/non-running indication upon a command from userspace. But not every userspace application feels a desire to implement it.
With 'ovpn' we basically want a similar effect: let userspace decide what to do depending on the configuration.
I know it can be implemented in many other different ways..but I don't see a real problem with keeping this way.
At least routing protocols and network monitoring software will not be happy to see a dead interface pretending that it's still running.
They won't know that the interface is disconnected, they will possibly just see traffic being dropped.
Packet loss detection is quite a complex operation. So yes, they do monitor the interface operational state to warn the operator as soon as possible and, in the case of routing protocols, take automatic actions. Some sophisticated monitoring systems are even capable of generating events like 'link unstable' with higher severity if they see the interface operational state flapping within a short period of time.
So yeah, for these kinds of systems, proper operational state indication is essential.
Again, if the user has not explicitly allowed the persistent behaviour, the interface will be brought down when a disconnection happens. But if the user/administrator *wants* to avoid that, he needs a chance to do so.
Otherwise people that need this behaviour will just have to stick to using tun and the full userspace implementation.
Generally speaking, saying that an interface is running when the module knows for sure that a packet cannot be delivered is misleading the user.
Or a feature, wanted by the user.
A blackhole/firewall can still be added if the user prefers (and not use the persistent interface).
The false-indication solution is not as reliable as it might look. An interface shutdown, a user-space application failing to start, crashing, or restarting - each of these will void the trick. Ergo, a blackhole route/firewall rule must be employed by a security-conscious user anyway, which makes the proposed feature odd.
Yeah, this is what other VPN clients call a "kill switch". Persist-tun is just one piece of the puzzle, yet an important one.
To summarize, I'm OK if this change is merged with a comment like "For future study" or "To be done" or "To be implemented". But a comment like "to prevent traffic leaking" or any other comment implying a "breakthrough security feature" will get a big NACK from my side.
What if the comment redirects the user to the --persist-tun option in order to clarify the context and the wanted behaviour?
Would that help?
Nope. As was mentioned above, there is no indication that 'persist-tun' is a 'security' feature even in the current openvpn documentation.
Like I mentioned above, I agree we should get rid of that sentence. The security feature must be implemented by means of extra tools; the interface just staying up is not enough.
If the openvpn developers want to keep the implementation bug-to-bug compatible, then feel free to configure the blackhole route on behalf of the end-user by means of the userspace daemon. Nobody will mind.
bug-to-bug compatible? What do you mean? Having userspace configure a blackhole route is something that can be considered by whoever decides to implement the "kill switch" feature.
OpenVPN does not. It just implements --persist-tun.
So all in all, the conclusion is that in this case it's userspace that decides when the interface should go up and down, depending on the configuration. I'd like to keep it as it is, to avoid the ovpn interface making decisions on its own.
I can spell this out in the comment (I think it definitely makes sense), to clarify that the carrier is expected to be driven by userspace (where the control plane is) rather than having the device make decisions without having the full picture.
What do you think?
Regards,
On 25.11.2024 15:07, Antonio Quartulli wrote:
On 25/11/2024 03:26, Sergey Ryazanov wrote:
OpenVPN (userspace) will tear down the P2P interface upon disconnection, assuming the --persist-tun option was not specified by the user.
So the interface is gone in any case.
By keeping the carrier on we are just ensuring that, if the user wanted persist-tun, the interface is not actually making decisions on its own.
Regarding a decision on its own: an Ethernet interface goes into the not-running state upon loss of carrier from a switch. That can hardly be considered a decision of the interface; it's an indication of fact.
Similarly, a beeping UPS is not deciding to make the user's life miserable, it's indicating a power line failure. I hope we at least both agree that a UPS should indicate the line failure.
The answer is always "it depends".
Back to the 'persist-tun' option. I checked the openvpn(8) man page. It gives reasonable hints to use this option to avoid negative outcomes on an internal openvpn process restart, e.g. in case of privilege dropping. It serves the same purpose as 'persist-key'. And there is not a word about traffic leaking.
FTR, here is the text in the manpage:
--persist-tun Don't close and reopen TUN/TAP device or run up/down scripts across SIGUSR1 or --ping-restart restarts.
SIGUSR1 is a restart signal similar to SIGHUP, but which offers finer-grained control over reset options.
SIGUSR1 is a session reconnection, not a process restart. The manpage just indicates what happens at the low level when this option is provided.
Still no mention of traffic leaking prevention, is there?
The next question is: what is this useful for? Many things, among those there is the fact the interface will retain its configuration (i.e. IPs, routes, etc).
This is unrelated to correct operational state indication. Addresses and routes are not reset when the interface goes into the non-running state.
If somebody has decided that this option gives a funny side effect and allows cutting corners, then I cannot say anything but sorry.
Well, OpenVPN is more than 20 years old.
More than 20 years of misguiding users has been duly noted :)
Should I mention that RFC 1066, containing the ifOperStatus definition, was issued 12 years before OpenVPN? And that it was later updated with multiple clarifications.
If a given API allows a specific user behaviour and has done so for that many years, changing it is still a user breakage. Not much we can do.
With a tun interface this can be done, now you want to basically drop this feature that existed for long time and break existing setups.
Amicus Plato, sed magis amica veritas
Yes, I don't want to see this interface misbehaviour advertised as a security feature. I hope the previous email gives a detailed explanation why.
Let's forget about the traffic leak mention and the "security feature". That comment was probably written in the middle of the night and I agree it gives a false sense of what is happening.
If it's going to break existing setups, then end-users can be supported with a changelog notice explaining how to properly address the risk of traffic leaking.
Nope, we can't just break existing user setups.
In some circumstances, e.g. an Android app, it could be the only way to prevent traffic leaking. But these special circumstances do not make the solution generic and eligible for inclusion in mainline code.
Why not? We are not changing the general rule, but just defining a specific behaviour for a specific driver.
Yeah. This patch is not changing the general rule. The patch breaks it, and the comment in the code is proud of it. It looks like the old joke about a documented bug becoming a feature.
Like I said above, let's make the comment meaningful for the expected goal: implement persist-tun while leaving userspace the chance to decide what to do.
From a system administrator's or firmware developer's perspective, the proposed behaviour will look like an inconsistency compared to other interface types. And this inconsistency will need to be addressed with special configuration or, in the worst case, a dedicated script. I cannot see a justified reason to make their life harder.
You can configure openvpn to bring the interface down when the connection is lost. Why do you say it requires extra scripting and such?
Being administratively down and being operationally down are different states.
For example, I don't think a tun interface goes down when there is no socket attached to it, still packets are just going to be blackhole'd in that case. No?
Nope. A tun interface will indeed go into the non-running state on the detach event. Moreover, the tun module supports changing the running/non-running indication upon a command from userspace. But not every userspace application feels a desire to implement it.
With 'ovpn' we basically want a similar effect: let userspace decide what to do depending on the configuration.
I know it can be implemented in many other different ways..but I don't see a real problem with keeping this way.
At least routing protocols and network monitoring software will not be happy to see a dead interface pretending that it's still running.
They won't know that the interface is disconnected, they will possibly just see traffic being dropped.
Packet loss detection is quite a complex operation. So yes, they do monitor the interface operational state to warn the operator as soon as possible and, in the case of routing protocols, take automatic actions. Some sophisticated monitoring systems are even capable of generating events like 'link unstable' with higher severity if they see the interface operational state flapping within a short period of time.
So yeah, for these kinds of systems, proper operational state indication is essential.
Again, if the user has not explicitly allowed the persistent behaviour, the interface will be brought down when a disconnection happens. But if the user/administrator *wants* to avoid that, he needs a chance to do so.
Otherwise people that need this behaviour will just have to stick to using tun and the full userspace implementation.
Generally speaking, saying that an interface is running when the module knows for sure that a packet cannot be delivered is misleading the user.
Or a feature, wanted by the user.
A blackhole/firewall can still be added if the user prefers (and not use the persistent interface).
The false-indication solution is not as reliable as it might look. An interface shutdown, a user-space application failing to start, crashing, or restarting - each of these will void the trick. Ergo, a blackhole route/firewall rule must be employed by a security-conscious user anyway, which makes the proposed feature odd.
Yeah, this is what other VPN clients call a "kill switch". Persist-tun is just one piece of the puzzle, yet an important one.
To summarize, I'm OK if this change is merged with a comment like "For future study" or "To be done" or "To be implemented". But a comment like "to prevent traffic leaking" or any other comment implying a "breakthrough security feature" will get a big NACK from my side.
What if the comment redirects the user to the --persist-tun option in order to clarify the context and the wanted behaviour?
Would that help?
Nope. As was mentioned above, there is no indication that 'persist-tun' is a 'security' feature even in the current openvpn documentation.
Like I mentioned above, I agree we should get rid of that sentence. The security feature must be implemented by means of extra tools; the interface just staying up is not enough.
If the openvpn developers want to keep the implementation bug-to-bug compatible, then feel free to configure the blackhole route on behalf of the end-user by means of the userspace daemon. Nobody will mind.
bug-to-bug compatible? What do you mean?
http://www.jargon.net/jargonfile/b/bug-compatible.html
With that difference, the local operational state indication does not break compatibility between hosts.
Having userspace configure a blackhole route is something that can be considered by whoever decides to implement the "kill switch" feature.
OpenVPN does not. It just implements --persist-tun.
So all in all, the conclusion is that in this case it's userspace that decides when the interface should go up and down, depending on the configuration. I'd like to keep it as it is, to avoid the ovpn interface making decisions on its own.
I can spell this out in the comment (I think it definitely makes sense), to clarify that the carrier is expected to be driven by userspace (where the control plane is) rather than having the device make decisions without having the full picture.
What do you think?
It wasn't suggested to destroy the interface when it becomes non-operational. I apologize if something I wrote earlier sounded like that. The interface's existence stays unquestioned; it is going to be solidly persistent.
Back to the proposed rephrasing. If the 'full picture' means forcing the running state indication even when the netdev is not capable of delivering packets, then it looks like an attempt to hide the control knob of the misleading feature somewhere else.
And since the concept of on-purpose false indication is still here, many words regarding the control plane and a full picture do not sound good either.
-- Sergey
On 25/11/2024 22:32, Sergey Ryazanov wrote: [...]
FTR, here is the text in the manpage:
--persist-tun Don't close and reopen TUN/TAP device or run up/down scripts across SIGUSR1 or --ping-restart restarts.
SIGUSR1 is a restart signal similar to SIGHUP, but which offers finer-grained control over reset options.
SIGUSR1 is a session reconnection, not a process restart. The manpage just indicates what happens at the low level when this option is provided.
Still no mention of traffic leaking prevention, is there?
Like I said, the manpage only mentions the low level bits. I have already proposed a patch to further extend this text.
[...]
Having userspace configure a blackhole route is something that can be considered by whoeever decides to implement the "kill switch" feature.
OpenVPN does not. It just implements --persist-tun.
So all in all, the conclusion is that in this case it's usersapce to decide when the interface should go up and down, depending on the configuration. I'd like to keep it as it is to avoid the ovpn interface to make decisions on its own.
I can spell this out in the comment (I think it definitely makes sense), to clarify that the netcarrier is expected to be driven by userspace (where the control plane is) rather than having the device make decisions without having the full picture.
What do you think?
It wasn't suggested to destroy the interface when it becomes non-operational. I apologize if something I wrote earlier sounded like that. The interface's existence stays unquestioned; it is going to be solidly persistent.
Back to the proposed rephrasing. If the 'full picture' means forcing the running state indication even when the netdev is not capable of delivering packets, then it looks like an attempt to hide the control knob of the misleading feature somewhere else.
And since the concept of on-purpose false indication is still here, many words regarding the control plane and a full picture do not sound good either.
Can you please point out the code where other virtual drivers are doing what you are suggesting so I can have a look?
Wireguard is the closest module in terms of concept and I couldn't see anything like that there, nor in IPsec. But I may have overlooked something.
Please let me know.
Regards,
On 26/11/2024 09:17, Antonio Quartulli wrote: [...]
It wasn't suggested to destroy the interface when it becomes non-operational. I apologize if something I wrote earlier sounded like that. The interface's existence stays unquestioned; it is going to be solidly persistent.
Back to the proposed rephrasing. If the 'full picture' means forcing the running state indication even when the netdev is not capable of delivering packets, then it looks like an attempt to hide the control knob of the misleading feature somewhere else.
And since the concept of on-purpose false indication is still here, many words regarding the control plane and a full picture do not sound good either.
Sergey,
I have played a bit with this and, if I understood your idea correctly, the following should be an acceptable design for a P2P interface:
* iface created -> netif_carrier_off
* peer added -> netif_carrier_on
* peer deleted -> netif_carrier_off
* iface goes down -> peer deleted -> netif_carrier_off
* iface goes up -> carrier stays down until peer is added
P2MP interface behaviour is not changed: when the interface is brought up the carrier goes on, and it is never turned off.
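In code, a rough sketch of what I have in mind (reusing identifiers from this series where they already exist; ovpn_p2p_set_carrier() is only a hypothetical helper name):

static void ovpn_p2p_set_carrier(struct ovpn_struct *ovpn, bool have_peer)
{
	/* hypothetical helper: only P2P devices toggle the carrier,
	 * MP devices keep it always on as described above
	 */
	if (ovpn->mode != OVPN_MODE_P2P)
		return;

	if (have_peer)
		netif_carrier_on(ovpn->dev);
	else
		netif_carrier_off(ovpn->dev);
}

static int ovpn_net_open(struct net_device *dev)
{
	struct ovpn_struct *ovpn = netdev_priv(dev);

	/* P2P: carrier stays off until a peer is added;
	 * MP: carrier goes on right away and is never turned off
	 */
	if (ovpn->mode == OVPN_MODE_P2P && !rcu_access_pointer(ovpn->peer))
		netif_carrier_off(dev);
	else
		netif_carrier_on(dev);

	return 0;
}

The helper would then be invoked from ovpn_peer_add_p2p() and ovpn_peer_del_p2p().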
How does it sound?
My main concern was about bringing the interface down, but this is actually not happening. Correct me if I am wrong.
Thanks.
Regards,
An ovpn_peer object holds the whole status of a remote peer (regardless of whether it is a server or a client).
This includes status for crypto, tx/rx buffers, napi, etc.
Only support for one peer is introduced (P2P mode). Multi-peer support is introduced with a later patch.
Along with ovpn_peer, the ovpn_bind object is also introduced, as the two are strictly related. An ovpn_bind object wraps a sockaddr representing the local coordinates being used to talk to a specific peer.
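For clarity, here is a condensed usage sketch of the lifecycle introduced below (the caller is hypothetical and error handling is trimmed; reference handling follows the kdoc in peer.c):

/* hypothetical caller, e.g. a netlink handler creating a P2P peer */
static int ovpn_example_add_peer(struct ovpn_struct *ovpn, u32 peer_id)
{
	struct ovpn_peer *peer;
	int ret;

	peer = ovpn_peer_new(ovpn, peer_id);	/* refcount starts at 1 */
	if (IS_ERR(peer))
		return PTR_ERR(peer);

	ret = ovpn_peer_add(ovpn, peer);
	if (ret < 0)
		ovpn_peer_put(peer);	/* last put invokes ovpn_peer_release() */

	/* on success the ovpn instance keeps the reference taken above */
	return ret;
}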
Signed-off-by: Antonio Quartulli antonio@openvpn.net
---
 drivers/net/ovpn/Makefile     |   2 +
 drivers/net/ovpn/bind.c       |  58 +++++++
 drivers/net/ovpn/bind.h       | 117 ++++++++++++
 drivers/net/ovpn/main.c       |  11 ++
 drivers/net/ovpn/main.h       |   2 +
 drivers/net/ovpn/ovpnstruct.h |   4 +
 drivers/net/ovpn/peer.c       | 354 ++++++++++++++++++++++++++++++++++++++++++
 drivers/net/ovpn/peer.h       |  79 ++++++++++
 8 files changed, 627 insertions(+)
diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile index 201dc001419f1d99ae95c0ee0f96e68f8a4eac16..ce13499b3e1775a7f2a9ce16c6cb0aa088f93685 100644 --- a/drivers/net/ovpn/Makefile +++ b/drivers/net/ovpn/Makefile @@ -7,7 +7,9 @@ # Author: Antonio Quartulli antonio@openvpn.net
obj-$(CONFIG_OVPN) := ovpn.o +ovpn-y += bind.o ovpn-y += main.o ovpn-y += io.o ovpn-y += netlink.o ovpn-y += netlink-gen.o +ovpn-y += peer.o diff --git a/drivers/net/ovpn/bind.c b/drivers/net/ovpn/bind.c new file mode 100644 index 0000000000000000000000000000000000000000..b4d2ccec2ceddf43bc445b489cc62a578ef0ad0a --- /dev/null +++ b/drivers/net/ovpn/bind.c @@ -0,0 +1,58 @@ +// SPDX-License-Identifier: GPL-2.0 +/* OpenVPN data channel offload + * + * Copyright (C) 2012-2024 OpenVPN, Inc. + * + * Author: James Yonan james@openvpn.net + * Antonio Quartulli antonio@openvpn.net + */ + +#include <linux/netdevice.h> +#include <linux/socket.h> + +#include "ovpnstruct.h" +#include "bind.h" +#include "peer.h" + +/** + * ovpn_bind_from_sockaddr - retrieve binding matching sockaddr + * @ss: the sockaddr to match + * + * Return: the bind matching the passed sockaddr if found, NULL otherwise + */ +struct ovpn_bind *ovpn_bind_from_sockaddr(const struct sockaddr_storage *ss) +{ + struct ovpn_bind *bind; + size_t sa_len; + + if (ss->ss_family == AF_INET) + sa_len = sizeof(struct sockaddr_in); + else if (ss->ss_family == AF_INET6) + sa_len = sizeof(struct sockaddr_in6); + else + return ERR_PTR(-EAFNOSUPPORT); + + bind = kzalloc(sizeof(*bind), GFP_ATOMIC); + if (unlikely(!bind)) + return ERR_PTR(-ENOMEM); + + memcpy(&bind->remote, ss, sa_len); + + return bind; +} + +/** + * ovpn_bind_reset - assign new binding to peer + * @peer: the peer whose binding has to be replaced + * @new: the new bind to assign + */ +void ovpn_bind_reset(struct ovpn_peer *peer, struct ovpn_bind *new) +{ + struct ovpn_bind *old; + + spin_lock_bh(&peer->lock); + old = rcu_replace_pointer(peer->bind, new, true); + spin_unlock_bh(&peer->lock); + + kfree_rcu(old, rcu); +} diff --git a/drivers/net/ovpn/bind.h b/drivers/net/ovpn/bind.h new file mode 100644 index 0000000000000000000000000000000000000000..859213d5040deb36c416eafcf5c6ab31c4d52c7a --- /dev/null +++ b/drivers/net/ovpn/bind.h @@ -0,0 +1,117 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* OpenVPN data channel offload + * + * Copyright (C) 2012-2024 OpenVPN, Inc. 
+ * + * Author: James Yonan james@openvpn.net + * Antonio Quartulli antonio@openvpn.net + */ + +#ifndef _NET_OVPN_OVPNBIND_H_ +#define _NET_OVPN_OVPNBIND_H_ + +#include <net/ip.h> +#include <linux/in.h> +#include <linux/in6.h> +#include <linux/rcupdate.h> +#include <linux/skbuff.h> +#include <linux/spinlock.h> + +struct ovpn_peer; + +/** + * union ovpn_sockaddr - basic transport layer address + * @in4: IPv4 address + * @in6: IPv6 address + */ +union ovpn_sockaddr { + struct sockaddr_in in4; + struct sockaddr_in6 in6; +}; + +/** + * struct ovpn_bind - remote peer binding + * @remote: the remote peer sockaddress + * @local: local endpoint used to talk to the peer + * @local.ipv4: local IPv4 used to talk to the peer + * @local.ipv6: local IPv6 used to talk to the peer + * @rcu: used to schedule RCU cleanup job + */ +struct ovpn_bind { + union ovpn_sockaddr remote; /* remote sockaddr */ + + union { + struct in_addr ipv4; + struct in6_addr ipv6; + } local; + + struct rcu_head rcu; +}; + +/** + * skb_protocol_to_family - translate skb->protocol to AF_INET or AF_INET6 + * @skb: the packet sk_buff to inspect + * + * Return: AF_INET, AF_INET6 or 0 in case of unknown protocol + */ +static inline unsigned short skb_protocol_to_family(const struct sk_buff *skb) +{ + switch (skb->protocol) { + case htons(ETH_P_IP): + return AF_INET; + case htons(ETH_P_IPV6): + return AF_INET6; + default: + return 0; + } +} + +/** + * ovpn_bind_skb_src_match - match packet source with binding + * @bind: the binding to match + * @skb: the packet to match + * + * Return: true if the packet source matches the remote peer sockaddr + * in the binding + */ +static inline bool ovpn_bind_skb_src_match(const struct ovpn_bind *bind, + const struct sk_buff *skb) +{ + const unsigned short family = skb_protocol_to_family(skb); + const union ovpn_sockaddr *remote; + + if (unlikely(!bind)) + return false; + + remote = &bind->remote; + + if (unlikely(remote->in4.sin_family != family)) + return false; + + switch (family) { + case AF_INET: + if (unlikely(remote->in4.sin_addr.s_addr != ip_hdr(skb)->saddr)) + return false; + + if (unlikely(remote->in4.sin_port != udp_hdr(skb)->source)) + return false; + break; + case AF_INET6: + if (unlikely(!ipv6_addr_equal(&remote->in6.sin6_addr, + &ipv6_hdr(skb)->saddr))) + return false; + + if (unlikely(remote->in6.sin6_port != udp_hdr(skb)->source)) + return false; + break; + default: + return false; + } + + return true; +} + +struct ovpn_bind *ovpn_bind_from_sockaddr(const struct sockaddr_storage *sa); +void ovpn_bind_reset(struct ovpn_peer *peer, struct ovpn_bind *bind); + +#endif /* _NET_OVPN_OVPNBIND_H_ */ diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index eaa83a8662e4ac2c758201008268f9633643c0b6..5492ce07751d135c1484fe1ed8227c646df94969 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -20,6 +20,7 @@ #include "netlink.h" #include "io.h" #include "packet.h" +#include "peer.h"
/* Driver info */ #define DRV_DESCRIPTION "OpenVPN data channel offload (ovpn)" @@ -29,6 +30,11 @@ static void ovpn_struct_free(struct net_device *net) { }
+static int ovpn_net_init(struct net_device *dev) +{ + return 0; +} + static int ovpn_net_open(struct net_device *dev) { /* ovpn keeps the carrier always on to avoid losing IP or route @@ -49,6 +55,7 @@ static int ovpn_net_stop(struct net_device *dev) }
static const struct net_device_ops ovpn_netdev_ops = { + .ndo_init = ovpn_net_init, .ndo_open = ovpn_net_open, .ndo_stop = ovpn_net_stop, .ndo_start_xmit = ovpn_net_xmit, @@ -128,6 +135,7 @@ static int ovpn_newlink(struct net *src_net, struct net_device *dev,
ovpn->dev = dev; ovpn->mode = mode; + spin_lock_init(&ovpn->lock);
/* turn carrier explicitly off after registration, this way state is * clearly defined @@ -176,6 +184,9 @@ static int ovpn_netdev_notifier_call(struct notifier_block *nb,
netif_carrier_off(dev); ovpn->registered = false; + + if (ovpn->mode == OVPN_MODE_P2P) + ovpn_peer_release_p2p(ovpn); break; case NETDEV_POST_INIT: case NETDEV_GOING_DOWN: diff --git a/drivers/net/ovpn/main.h b/drivers/net/ovpn/main.h index 0740a05070a817e0daea7b63a1f4fcebd274eb37..28e5c44816e110974333a7a6a9cf18bd15ae84e6 100644 --- a/drivers/net/ovpn/main.h +++ b/drivers/net/ovpn/main.h @@ -19,4 +19,6 @@ bool ovpn_dev_is_valid(const struct net_device *dev); #define OVPN_HEAD_ROOM ALIGN(16 + SKB_HEADER_LEN, 4) #define OVPN_MAX_PADDING 16
+#define OVPN_QUEUE_LEN 1024 + #endif /* _NET_OVPN_MAIN_H_ */ diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h index 211df871538d34fdff90d182f21a0b0fb11b28ad..a22c5083381c131db01a28c0f51e661d690d4998 100644 --- a/drivers/net/ovpn/ovpnstruct.h +++ b/drivers/net/ovpn/ovpnstruct.h @@ -20,6 +20,8 @@ * @dev_tracker: reference tracker for associated dev * @registered: whether dev is still registered with netdev or not * @mode: device operation mode (i.e. p2p, mp, ..) + * @lock: protect this object + * @peer: in P2P mode, this is the only remote peer * @dev_list: entry for the module wide device list */ struct ovpn_struct { @@ -27,6 +29,8 @@ struct ovpn_struct { netdevice_tracker dev_tracker; bool registered; enum ovpn_mode mode; + spinlock_t lock; /* protect writing to the ovpn_struct object */ + struct ovpn_peer __rcu *peer; struct list_head dev_list; };
diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c new file mode 100644 index 0000000000000000000000000000000000000000..d9788a0cc99b5839c466c35d1b2266cc6b95fb72 --- /dev/null +++ b/drivers/net/ovpn/peer.c @@ -0,0 +1,354 @@ +// SPDX-License-Identifier: GPL-2.0 +/* OpenVPN data channel offload + * + * Copyright (C) 2020-2024 OpenVPN, Inc. + * + * Author: James Yonan james@openvpn.net + * Antonio Quartulli antonio@openvpn.net + */ + +#include <linux/skbuff.h> +#include <linux/list.h> + +#include "ovpnstruct.h" +#include "bind.h" +#include "io.h" +#include "main.h" +#include "netlink.h" +#include "peer.h" + +/** + * ovpn_peer_new - allocate and initialize a new peer object + * @ovpn: the openvpn instance inside which the peer should be created + * @id: the ID assigned to this peer + * + * Return: a pointer to the new peer on success or an error code otherwise + */ +struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id) +{ + struct ovpn_peer *peer; + int ret; + + /* alloc and init peer object */ + peer = kzalloc(sizeof(*peer), GFP_KERNEL); + if (!peer) + return ERR_PTR(-ENOMEM); + + peer->id = id; + peer->halt = false; + peer->ovpn = ovpn; + + peer->vpn_addrs.ipv4.s_addr = htonl(INADDR_ANY); + peer->vpn_addrs.ipv6 = in6addr_any; + + RCU_INIT_POINTER(peer->bind, NULL); + spin_lock_init(&peer->lock); + kref_init(&peer->refcount); + + ret = dst_cache_init(&peer->dst_cache, GFP_KERNEL); + if (ret < 0) { + netdev_err(ovpn->dev, "%s: cannot initialize dst cache\n", + __func__); + kfree(peer); + return ERR_PTR(ret); + } + + netdev_hold(ovpn->dev, &ovpn->dev_tracker, GFP_KERNEL); + + return peer; +} + +/** + * ovpn_peer_release - release peer private members + * @peer: the peer to release + */ +static void ovpn_peer_release(struct ovpn_peer *peer) +{ + ovpn_bind_reset(peer, NULL); + + dst_cache_destroy(&peer->dst_cache); + netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker); + kfree_rcu(peer, rcu); +} + +/** + * ovpn_peer_release_kref - callback for kref_put + * @kref: the kref object belonging to the peer + */ +void ovpn_peer_release_kref(struct kref *kref) +{ + struct ovpn_peer *peer = container_of(kref, struct ovpn_peer, refcount); + + ovpn_peer_release(peer); +} + +/** + * ovpn_peer_skb_to_sockaddr - fill sockaddr with skb source address + * @skb: the packet to extract data from + * @ss: the sockaddr to fill + * + * Return: true on success or false otherwise + */ +static bool ovpn_peer_skb_to_sockaddr(struct sk_buff *skb, + struct sockaddr_storage *ss) +{ + struct sockaddr_in6 *sa6; + struct sockaddr_in *sa4; + + ss->ss_family = skb_protocol_to_family(skb); + switch (ss->ss_family) { + case AF_INET: + sa4 = (struct sockaddr_in *)ss; + sa4->sin_family = AF_INET; + sa4->sin_addr.s_addr = ip_hdr(skb)->saddr; + sa4->sin_port = udp_hdr(skb)->source; + break; + case AF_INET6: + sa6 = (struct sockaddr_in6 *)ss; + sa6->sin6_family = AF_INET6; + sa6->sin6_addr = ipv6_hdr(skb)->saddr; + sa6->sin6_port = udp_hdr(skb)->source; + break; + default: + return false; + } + + return true; +} + +/** + * ovpn_peer_transp_match - check if sockaddr and peer binding match + * @peer: the peer to get the binding from + * @ss: the sockaddr to match + * + * Return: true if sockaddr and binding match or false otherwise + */ +static bool ovpn_peer_transp_match(const struct ovpn_peer *peer, + const struct sockaddr_storage *ss) +{ + struct ovpn_bind *bind = rcu_dereference(peer->bind); + struct sockaddr_in6 *sa6; + struct sockaddr_in *sa4; + + if (unlikely(!bind)) + return false; + + if (ss->ss_family != 
bind->remote.in4.sin_family) + return false; + + switch (ss->ss_family) { + case AF_INET: + sa4 = (struct sockaddr_in *)ss; + if (sa4->sin_addr.s_addr != bind->remote.in4.sin_addr.s_addr) + return false; + if (sa4->sin_port != bind->remote.in4.sin_port) + return false; + break; + case AF_INET6: + sa6 = (struct sockaddr_in6 *)ss; + if (!ipv6_addr_equal(&sa6->sin6_addr, + &bind->remote.in6.sin6_addr)) + return false; + if (sa6->sin6_port != bind->remote.in6.sin6_port) + return false; + break; + default: + return false; + } + + return true; +} + +/** + * ovpn_peer_get_by_transp_addr_p2p - get peer by transport address in a P2P + * instance + * @ovpn: the openvpn instance to search + * @ss: the transport socket address + * + * Return: the peer if found or NULL otherwise + */ +static struct ovpn_peer * +ovpn_peer_get_by_transp_addr_p2p(struct ovpn_struct *ovpn, + struct sockaddr_storage *ss) +{ + struct ovpn_peer *tmp, *peer = NULL; + + rcu_read_lock(); + tmp = rcu_dereference(ovpn->peer); + if (likely(tmp && ovpn_peer_transp_match(tmp, ss) && + ovpn_peer_hold(tmp))) + peer = tmp; + rcu_read_unlock(); + + return peer; +} + +/** + * ovpn_peer_get_by_transp_addr - retrieve peer by transport address + * @ovpn: the openvpn instance to search + * @skb: the skb to retrieve the source transport address from + * + * Return: a pointer to the peer if found or NULL otherwise + */ +struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn, + struct sk_buff *skb) +{ + struct ovpn_peer *peer = NULL; + struct sockaddr_storage ss = { 0 }; + + if (unlikely(!ovpn_peer_skb_to_sockaddr(skb, &ss))) + return NULL; + + if (ovpn->mode == OVPN_MODE_P2P) + peer = ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss); + + return peer; +} + +/** + * ovpn_peer_get_by_id_p2p - get peer by ID in a P2P instance + * @ovpn: the openvpn instance to search + * @peer_id: the ID of the peer to find + * + * Return: the peer if found or NULL otherwise + */ +static struct ovpn_peer *ovpn_peer_get_by_id_p2p(struct ovpn_struct *ovpn, + u32 peer_id) +{ + struct ovpn_peer *tmp, *peer = NULL; + + rcu_read_lock(); + tmp = rcu_dereference(ovpn->peer); + if (likely(tmp && tmp->id == peer_id && ovpn_peer_hold(tmp))) + peer = tmp; + rcu_read_unlock(); + + return peer; +} + +/** + * ovpn_peer_get_by_id - retrieve peer by ID + * @ovpn: the openvpn instance to search + * @peer_id: the unique peer identifier to match + * + * Return: a pointer to the peer if found or NULL otherwise + */ +struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id) +{ + struct ovpn_peer *peer = NULL; + + if (ovpn->mode == OVPN_MODE_P2P) + peer = ovpn_peer_get_by_id_p2p(ovpn, peer_id); + + return peer; +} + +/** + * ovpn_peer_add_p2p - add peer to related tables in a P2P instance + * @ovpn: the instance to add the peer to + * @peer: the peer to add + * + * Return: 0 on success or a negative error code otherwise + */ +static int ovpn_peer_add_p2p(struct ovpn_struct *ovpn, struct ovpn_peer *peer) +{ + struct ovpn_peer *tmp; + + spin_lock_bh(&ovpn->lock); + /* in p2p mode it is possible to have a single peer only, therefore the + * old one is released and substituted by the new one + */ + tmp = rcu_dereference_protected(ovpn->peer, + lockdep_is_held(&ovpn->lock)); + if (tmp) { + tmp->delete_reason = OVPN_DEL_PEER_REASON_TEARDOWN; + ovpn_peer_put(tmp); + } + + rcu_assign_pointer(ovpn->peer, peer); + spin_unlock_bh(&ovpn->lock); + + return 0; +} + +/** + * ovpn_peer_add - add peer to the related tables + * @ovpn: the openvpn instance the peer belongs to 
+ * @peer: the peer object to add + * + * Assume refcounter was increased by caller + * + * Return: 0 on success or a negative error code otherwise + */ +int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer) +{ + switch (ovpn->mode) { + case OVPN_MODE_P2P: + return ovpn_peer_add_p2p(ovpn, peer); + default: + return -EOPNOTSUPP; + } +} + +/** + * ovpn_peer_del_p2p - delete peer from related tables in a P2P instance + * @peer: the peer to delete + * @reason: reason why the peer was deleted (sent to userspace) + * + * Return: 0 on success or a negative error code otherwise + */ +static int ovpn_peer_del_p2p(struct ovpn_peer *peer, + enum ovpn_del_peer_reason reason) + __must_hold(&peer->ovpn->lock) +{ + struct ovpn_peer *tmp; + + tmp = rcu_dereference_protected(peer->ovpn->peer, + lockdep_is_held(&peer->ovpn->lock)); + if (tmp != peer) { + DEBUG_NET_WARN_ON_ONCE(1); + if (tmp) + ovpn_peer_put(tmp); + + return -ENOENT; + } + + tmp->delete_reason = reason; + RCU_INIT_POINTER(peer->ovpn->peer, NULL); + ovpn_peer_put(tmp); + + return 0; +} + +/** + * ovpn_peer_release_p2p - release peer upon P2P device teardown + * @ovpn: the instance being torn down + */ +void ovpn_peer_release_p2p(struct ovpn_struct *ovpn) +{ + struct ovpn_peer *tmp; + + spin_lock_bh(&ovpn->lock); + tmp = rcu_dereference_protected(ovpn->peer, + lockdep_is_held(&ovpn->lock)); + if (tmp) + ovpn_peer_del_p2p(tmp, OVPN_DEL_PEER_REASON_TEARDOWN); + spin_unlock_bh(&ovpn->lock); +} + +/** + * ovpn_peer_del - delete peer from related tables + * @peer: the peer object to delete + * @reason: reason for deleting peer (will be sent to userspace) + * + * Return: 0 on success or a negative error code otherwise + */ +int ovpn_peer_del(struct ovpn_peer *peer, enum ovpn_del_peer_reason reason) +{ + switch (peer->ovpn->mode) { + case OVPN_MODE_P2P: + return ovpn_peer_del_p2p(peer, reason); + default: + return -EOPNOTSUPP; + } +} diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h new file mode 100644 index 0000000000000000000000000000000000000000..6e0c6b14559de886d0677117f5a7ae029214e1f8 --- /dev/null +++ b/drivers/net/ovpn/peer.h @@ -0,0 +1,79 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* OpenVPN data channel offload + * + * Copyright (C) 2020-2024 OpenVPN, Inc. + * + * Author: James Yonan james@openvpn.net + * Antonio Quartulli antonio@openvpn.net + */ + +#ifndef _NET_OVPN_OVPNPEER_H_ +#define _NET_OVPN_OVPNPEER_H_ + +#include <net/dst_cache.h> + +/** + * struct ovpn_peer - the main remote peer object + * @ovpn: main openvpn instance this peer belongs to + * @id: unique identifier + * @vpn_addrs: IP addresses assigned over the tunnel + * @vpn_addrs.ipv4: IPv4 assigned to peer on the tunnel + * @vpn_addrs.ipv6: IPv6 assigned to peer on the tunnel + * @dst_cache: cache for dst_entry used to send to peer + * @bind: remote peer binding + * @halt: true if ovpn_peer_mark_delete was called + * @delete_reason: why peer was deleted (i.e. timeout, transport error, ..) 
+ * @lock: protects binding to peer (bind) + * @refcount: reference counter + * @rcu: used to free peer in an RCU safe way + * @delete_work: deferred cleanup work, used to notify userspace + */ +struct ovpn_peer { + struct ovpn_struct *ovpn; + u32 id; + struct { + struct in_addr ipv4; + struct in6_addr ipv6; + } vpn_addrs; + struct dst_cache dst_cache; + struct ovpn_bind __rcu *bind; + bool halt; + enum ovpn_del_peer_reason delete_reason; + spinlock_t lock; /* protects bind */ + struct kref refcount; + struct rcu_head rcu; + struct work_struct delete_work; +}; + +/** + * ovpn_peer_hold - increase reference counter + * @peer: the peer whose counter should be increased + * + * Return: true if the counter was increased or false if it was zero already + */ +static inline bool ovpn_peer_hold(struct ovpn_peer *peer) +{ + return kref_get_unless_zero(&peer->refcount); +} + +void ovpn_peer_release_kref(struct kref *kref); + +/** + * ovpn_peer_put - decrease reference counter + * @peer: the peer whose counter should be decreased + */ +static inline void ovpn_peer_put(struct ovpn_peer *peer) +{ + kref_put(&peer->refcount, ovpn_peer_release_kref); +} + +struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id); +int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer); +int ovpn_peer_del(struct ovpn_peer *peer, enum ovpn_del_peer_reason reason); +void ovpn_peer_release_p2p(struct ovpn_struct *ovpn); + +struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn, + struct sk_buff *skb); +struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id); + +#endif /* _NET_OVPN_OVPNPEER_H_ */
2024-10-29, 11:47:19 +0100, Antonio Quartulli wrote:
+static void ovpn_peer_release(struct ovpn_peer *peer) +{
- ovpn_bind_reset(peer, NULL);
- dst_cache_destroy(&peer->dst_cache);
Is it safe to destroy the cache at this time? In the same function, we use rcu to free the peer, but AFAICT the dst_cache will be freed immediately:
void dst_cache_destroy(struct dst_cache *dst_cache) { [...] free_percpu(dst_cache->cache); }
(probably no real issue because ovpn_udp_send_skb gets called while we hold a reference to the peer?)
- netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker);
- kfree_rcu(peer, rcu);
+}
[...]
+static int ovpn_peer_del_p2p(struct ovpn_peer *peer,
enum ovpn_del_peer_reason reason)
- __must_hold(&peer->ovpn->lock)
+{
- struct ovpn_peer *tmp;
- tmp = rcu_dereference_protected(peer->ovpn->peer,
lockdep_is_held(&peer->ovpn->lock));
- if (tmp != peer) {
DEBUG_NET_WARN_ON_ONCE(1);
if (tmp)
ovpn_peer_put(tmp);
Does peer->ovpn->peer need to be set to NULL here as well? Or is it going to survive this _put?
return -ENOENT;
- }
- tmp->delete_reason = reason;
- RCU_INIT_POINTER(peer->ovpn->peer, NULL);
- ovpn_peer_put(tmp);
- return 0;
+}
On 30/10/2024 17:37, Sabrina Dubroca wrote:
2024-10-29, 11:47:19 +0100, Antonio Quartulli wrote:
+static void ovpn_peer_release(struct ovpn_peer *peer) +{
- ovpn_bind_reset(peer, NULL);
- dst_cache_destroy(&peer->dst_cache);
Is it safe to destroy the cache at this time? In the same function, we use rcu to free the peer, but AFAICT the dst_cache will be freed immediately:
void dst_cache_destroy(struct dst_cache *dst_cache) { [...] free_percpu(dst_cache->cache); }
(probably no real issue because ovpn_udp_send_skb gets called while we hold a reference to the peer?)
Right. That was my assumption: release happens on refcnt = 0 only, therefore no field should be in use anymore. Anything that may still be in use will have its own refcounter.
- netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker);
- kfree_rcu(peer, rcu);
+}
[...]
+static int ovpn_peer_del_p2p(struct ovpn_peer *peer,
enum ovpn_del_peer_reason reason)
- __must_hold(&peer->ovpn->lock)
+{
- struct ovpn_peer *tmp;
- tmp = rcu_dereference_protected(peer->ovpn->peer,
lockdep_is_held(&peer->ovpn->lock));
- if (tmp != peer) {
DEBUG_NET_WARN_ON_ONCE(1);
if (tmp)
ovpn_peer_put(tmp);
Does peer->ovpn->peer need to be set to NULL here as well? Or is it going to survive this _put?
First of all consider that this is truly something that we don't expect to happen (hence the WARN_ON). If this is happening it's because we are trying to delete a peer that is not the one we are connected to (unexplainable scenario in p2p mode).
Still, should we hit this case (I truly can't see how), I'd say "leave everything as is - maybe this call was just a mistake".
Cheers,
return -ENOENT;
- }
- tmp->delete_reason = reason;
- RCU_INIT_POINTER(peer->ovpn->peer, NULL);
- ovpn_peer_put(tmp);
- return 0;
+}
2024-10-30, 21:47:58 +0100, Antonio Quartulli wrote:
On 30/10/2024 17:37, Sabrina Dubroca wrote:
2024-10-29, 11:47:19 +0100, Antonio Quartulli wrote:
+static void ovpn_peer_release(struct ovpn_peer *peer) +{
- ovpn_bind_reset(peer, NULL);
- dst_cache_destroy(&peer->dst_cache);
Is it safe to destroy the cache at this time? In the same function, we use rcu to free the peer, but AFAICT the dst_cache will be freed immediately:
void dst_cache_destroy(struct dst_cache *dst_cache) { [...] free_percpu(dst_cache->cache); }
(probably no real issue because ovpn_udp_send_skb gets called while we hold a reference to the peer?)
Right. That was my assumption: release happens on refcnt = 0 only, therefore no field should be in use anymore. Anything that may still be in use will have its own refcounter.
My worry is that code changes over time, assumptions are forgotten, and we end up with code that was a bit odd but safe not being safe anymore.
- netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker);
- kfree_rcu(peer, rcu);
+}
[...]
+static int ovpn_peer_del_p2p(struct ovpn_peer *peer,
enum ovpn_del_peer_reason reason)
- __must_hold(&peer->ovpn->lock)
+{
- struct ovpn_peer *tmp;
- tmp = rcu_dereference_protected(peer->ovpn->peer,
lockdep_is_held(&peer->ovpn->lock));
- if (tmp != peer) {
DEBUG_NET_WARN_ON_ONCE(1);
if (tmp)
ovpn_peer_put(tmp);
Does peer->ovpn->peer need to be set to NULL here as well? Or is it going to survive this _put?
First of all consider that this is truly something that we don't expect to happen (hence the WARN_ON). If this is happening it's because we are trying to delete a peer that is not the one we are connected to (unexplainable scenario in p2p mode).
Still, should we hit this case (I truly can't see how), I'd say "leave everything as is - maybe this call was just a mistake".
Yeah, true, let's leave it. Thanks.
On 05/11/2024 14:12, Sabrina Dubroca wrote:
2024-10-30, 21:47:58 +0100, Antonio Quartulli wrote:
On 30/10/2024 17:37, Sabrina Dubroca wrote:
2024-10-29, 11:47:19 +0100, Antonio Quartulli wrote:
+static void ovpn_peer_release(struct ovpn_peer *peer) +{
- ovpn_bind_reset(peer, NULL);
- dst_cache_destroy(&peer->dst_cache);
Is it safe to destroy the cache at this time? In the same function, we use rcu to free the peer, but AFAICT the dst_cache will be freed immediately:
void dst_cache_destroy(struct dst_cache *dst_cache) { [...] free_percpu(dst_cache->cache); }
(probably no real issue because ovpn_udp_send_skb gets called while we hold a reference to the peer?)
Right. That was my assumption: release happens on refcnt = 0 only, therefore no field should be in use anymore. Anything that may still be in use will have its own refcounter.
My worry is that code changes over time, assumptions are forgotten, and we end up with code that was a bit odd but safe not being safe anymore.
Yeah, makes sense. I'll move the dst_cache_destroy() and kfree(peer) calls into an RCU callback.
Thanks!
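For reference, a minimal sketch of what moving the teardown into an RCU callback could look like (the split and the ovpn_peer_release_rcu name are assumptions here, not the final patch):

static void ovpn_peer_release_rcu(struct rcu_head *head)
{
	struct ovpn_peer *peer = container_of(head, struct ovpn_peer, rcu);

	/* only touched after a grace period, so no RCU reader can still
	 * observe a destroyed dst_cache
	 */
	dst_cache_destroy(&peer->dst_cache);
	kfree(peer);
}

static void ovpn_peer_release(struct ovpn_peer *peer)
{
	ovpn_bind_reset(peer, NULL);
	netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker);
	call_rcu(&peer->rcu, ovpn_peer_release_rcu);
}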
- netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker);
- kfree_rcu(peer, rcu);
+}
[...]
+static int ovpn_peer_del_p2p(struct ovpn_peer *peer,
enum ovpn_del_peer_reason reason)
- __must_hold(&peer->ovpn->lock)
+{
- struct ovpn_peer *tmp;
- tmp = rcu_dereference_protected(peer->ovpn->peer,
lockdep_is_held(&peer->ovpn->lock));
- if (tmp != peer) {
DEBUG_NET_WARN_ON_ONCE(1);
if (tmp)
ovpn_peer_put(tmp);
Does peer->ovpn->peer need to be set to NULL here as well? Or is it going to survive this _put?
First of all consider that this is truly something that we don't expect to happen (hence the WARN_ON). If this is happening it's because we are trying to delete a peer that is not the one we are connected to (unexplainable scenario in p2p mode).
Still, should we hit this case (I truly can't see how), I'd say "leave everything as is - maybe this call was just a mistake".
Yeah, true, let's leave it. Thanks.
On 29.10.2024 12:47, Antonio Quartulli wrote:
An ovpn_peer object holds the whole status of a remote peer (regardless whether it is a server or a client).
This includes status for crypto, tx/rx buffers, napi, etc.
Only support for one peer is introduced (P2P mode). Multi peer support is introduced with a later patch.
Reviewing the peer creation/destroying code I came to a generic question. Did you consider keeping a single P2P peer in the peers table as well?
Looks like such an approach can greatly simplify the code by dropping all these 'switch (ovpn->mode)' checks and implementing unified peer management. The 'peer' field in the main private data structure can be kept to accelerate lookups, while still using the peers table for management tasks like removing all the peers on interface teardown.
Along with ovpn_peer, the ovpn_bind object is also introduced, as the two are strictly related. An ovpn_bind object wraps a sockaddr representing the local coordinates being used to talk to a specific peer.
Signed-off-by: Antonio Quartulli antonio@openvpn.net
drivers/net/ovpn/Makefile | 2 + drivers/net/ovpn/bind.c | 58 +++++++ drivers/net/ovpn/bind.h | 117 ++++++++++++++
Why do we need these bind.c/bind.h files? They contain a minimal amount of code and reference the peer object anyway. Can we merge these definitions and code into peer.c/peer.h?
drivers/net/ovpn/main.c | 11 ++ drivers/net/ovpn/main.h | 2 + drivers/net/ovpn/ovpnstruct.h | 4 + drivers/net/ovpn/peer.c | 354 ++++++++++++++++++++++++++++++++++++++++++ drivers/net/ovpn/peer.h | 79 ++++++++++ 8 files changed, 627 insertions(+)
[...]
diff --git a/drivers/net/ovpn/bind.c b/drivers/net/ovpn/bind.c new file mode 100644 index 0000000000000000000000000000000000000000..b4d2ccec2ceddf43bc445b489cc62a578ef0ad0a --- /dev/null +++ b/drivers/net/ovpn/bind.c @@ -0,0 +1,58 @@ +// SPDX-License-Identifier: GPL-2.0 +/* OpenVPN data channel offload
- Copyright (C) 2012-2024 OpenVPN, Inc.
- Author: James Yonan james@openvpn.net
Antonio Quartulli <antonio@openvpn.net>
- */
+#include <linux/netdevice.h> +#include <linux/socket.h>
+#include "ovpnstruct.h" +#include "bind.h" +#include "peer.h"
+/**
- ovpn_bind_from_sockaddr - retrieve binding matching sockaddr
- @ss: the sockaddr to match
- Return: the bind matching the passed sockaddr if found, NULL otherwise
The function returns ERR_PTR() in case of error, the comment should be updated.
- */
+struct ovpn_bind *ovpn_bind_from_sockaddr(const struct sockaddr_storage *ss) +{
- struct ovpn_bind *bind;
- size_t sa_len;
- if (ss->ss_family == AF_INET)
sa_len = sizeof(struct sockaddr_in);
- else if (ss->ss_family == AF_INET6)
sa_len = sizeof(struct sockaddr_in6);
- else
return ERR_PTR(-EAFNOSUPPORT);
- bind = kzalloc(sizeof(*bind), GFP_ATOMIC);
- if (unlikely(!bind))
return ERR_PTR(-ENOMEM);
- memcpy(&bind->remote, ss, sa_len);
- return bind;
+}
+/**
- ovpn_bind_reset - assign new binding to peer
- @peer: the peer whose binding has to be replaced
- @new: the new bind to assign
- */
+void ovpn_bind_reset(struct ovpn_peer *peer, struct ovpn_bind *new) +{
- struct ovpn_bind *old;
- spin_lock_bh(&peer->lock);
- old = rcu_replace_pointer(peer->bind, new, true);
- spin_unlock_bh(&peer->lock);
Locking will be removed from this function in the subsequent patch. Should we move the peer->lock usage to ovpn_peer_release() now?
- kfree_rcu(old, rcu);
+} diff --git a/drivers/net/ovpn/bind.h b/drivers/net/ovpn/bind.h new file mode 100644 index 0000000000000000000000000000000000000000..859213d5040deb36c416eafcf5c6ab31c4d52c7a --- /dev/null +++ b/drivers/net/ovpn/bind.h @@ -0,0 +1,117 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* OpenVPN data channel offload
- Copyright (C) 2012-2024 OpenVPN, Inc.
- Author: James Yonan james@openvpn.net
Antonio Quartulli <antonio@openvpn.net>
- */
+#ifndef _NET_OVPN_OVPNBIND_H_ +#define _NET_OVPN_OVPNBIND_H_
+#include <net/ip.h> +#include <linux/in.h> +#include <linux/in6.h> +#include <linux/rcupdate.h> +#include <linux/skbuff.h> +#include <linux/spinlock.h>
+struct ovpn_peer;
+/**
- union ovpn_sockaddr - basic transport layer address
Why do we need this dedicated named union? Can we merge this union into the ovpn_bind struct as already done for the local address?
- @in4: IPv4 address
- @in6: IPv6 address
- */
+union ovpn_sockaddr {
The family type can be put here as a dedicated member to make the address type check simple:
unsigned short int sa_family;
- struct sockaddr_in in4;
- struct sockaddr_in6 in6;
+};
+/**
- struct ovpn_bind - remote peer binding
- @remote: the remote peer sockaddress
- @local: local endpoint used to talk to the peer
- @local.ipv4: local IPv4 used to talk to the peer
- @local.ipv6: local IPv6 used to talk to the peer
- @rcu: used to schedule RCU cleanup job
- */
+struct ovpn_bind {
- union ovpn_sockaddr remote; /* remote sockaddr */
- union {
struct in_addr ipv4;
struct in6_addr ipv6;
- } local;
- struct rcu_head rcu;
+};
+/**
- skb_protocol_to_family - translate skb->protocol to AF_INET or AF_INET6
- @skb: the packet sk_buff to inspect
- Return: AF_INET, AF_INET6 or 0 in case of unknown protocol
- */
+static inline unsigned short skb_protocol_to_family(const struct sk_buff *skb)
The function is called outside of peer.c only in ovpn_decrypt_post(), and that call is debatable. Considering this, I believe skb_protocol_to_family() should be moved into peer.c as a static non-inlined function.
+{
- switch (skb->protocol) {
- case htons(ETH_P_IP):
return AF_INET;
- case htons(ETH_P_IPV6):
return AF_INET6;
- default:
return 0;
- }
+}
+/**
- ovpn_bind_skb_src_match - match packet source with binding
- @bind: the binding to match
- @skb: the packet to match
- Return: true if the packet source matches the remote peer sockaddr
- in the binding
- */
+static inline bool ovpn_bind_skb_src_match(const struct ovpn_bind *bind,
const struct sk_buff *skb)
The function is called only from ovpn_peer_float() and probably should be moved into peer.c and un-inlined.
+{
- const unsigned short family = skb_protocol_to_family(skb);
- const union ovpn_sockaddr *remote;
- if (unlikely(!bind))
return false;
The caller, ovpn_peer_float(), has already verified the bind object pointer, so why should we redo the same check here?
- remote = &bind->remote;
- if (unlikely(remote->in4.sin_family != family))
return false;
- switch (family) {
- case AF_INET:
if (unlikely(remote->in4.sin_addr.s_addr != ip_hdr(skb)->saddr))
return false;
if (unlikely(remote->in4.sin_port != udp_hdr(skb)->source))
return false;
break;
- case AF_INET6:
if (unlikely(!ipv6_addr_equal(&remote->in6.sin6_addr,
&ipv6_hdr(skb)->saddr)))
return false;
if (unlikely(remote->in6.sin6_port != udp_hdr(skb)->source))
return false;
break;
- default:
return false;
- }
- return true;
+}
+struct ovpn_bind *ovpn_bind_from_sockaddr(const struct sockaddr_storage *sa); +void ovpn_bind_reset(struct ovpn_peer *peer, struct ovpn_bind *bind);
+#endif /* _NET_OVPN_OVPNBIND_H_ */ diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index eaa83a8662e4ac2c758201008268f9633643c0b6..5492ce07751d135c1484fe1ed8227c646df94969 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -20,6 +20,7 @@ #include "netlink.h" #include "io.h" #include "packet.h" +#include "peer.h" /* Driver info */ #define DRV_DESCRIPTION "OpenVPN data channel offload (ovpn)" @@ -29,6 +30,11 @@ static void ovpn_struct_free(struct net_device *net) { } +static int ovpn_net_init(struct net_device *dev) +{
- return 0;
+}
The function is not required here. Can we move its introduction to the 'implement basic RX path (UDP)' patch, where its content will be added and where the counterpart ovpn_net_uninit() function will be introduced?
- static int ovpn_net_open(struct net_device *dev) { /* ovpn keeps the carrier always on to avoid losing IP or route
@@ -49,6 +55,7 @@ static int ovpn_net_stop(struct net_device *dev) } static const struct net_device_ops ovpn_netdev_ops = {
- .ndo_init = ovpn_net_init, .ndo_open = ovpn_net_open, .ndo_stop = ovpn_net_stop, .ndo_start_xmit = ovpn_net_xmit,
@@ -128,6 +135,7 @@ static int ovpn_newlink(struct net *src_net, struct net_device *dev, ovpn->dev = dev; ovpn->mode = mode;
- spin_lock_init(&ovpn->lock);
/* turn carrier explicitly off after registration, this way state is * clearly defined @@ -176,6 +184,9 @@ static int ovpn_netdev_notifier_call(struct notifier_block *nb, netif_carrier_off(dev); ovpn->registered = false;
if (ovpn->mode == OVPN_MODE_P2P)
ovpn_peer_release_p2p(ovpn);
break;
case NETDEV_POST_INIT:
case NETDEV_GOING_DOWN:
diff --git a/drivers/net/ovpn/main.h b/drivers/net/ovpn/main.h index 0740a05070a817e0daea7b63a1f4fcebd274eb37..28e5c44816e110974333a7a6a9cf18bd15ae84e6 100644 --- a/drivers/net/ovpn/main.h +++ b/drivers/net/ovpn/main.h @@ -19,4 +19,6 @@ bool ovpn_dev_is_valid(const struct net_device *dev); #define OVPN_HEAD_ROOM ALIGN(16 + SKB_HEADER_LEN, 4) #define OVPN_MAX_PADDING 16 +#define OVPN_QUEUE_LEN 1024
This macro is unused, should we drop it?
- #endif /* _NET_OVPN_MAIN_H_ */
diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h index 211df871538d34fdff90d182f21a0b0fb11b28ad..a22c5083381c131db01a28c0f51e661d690d4998 100644 --- a/drivers/net/ovpn/ovpnstruct.h +++ b/drivers/net/ovpn/ovpnstruct.h @@ -20,6 +20,8 @@
- @dev_tracker: reference tracker for associated dev
- @registered: whether dev is still registered with netdev or not
- @mode: device operation mode (i.e. p2p, mp, ..)
- @lock: protect this object
*/ struct ovpn_struct {
- @peer: in P2P mode, this is the only remote peer
- @dev_list: entry for the module wide device list
@@ -27,6 +29,8 @@ struct ovpn_struct { netdevice_tracker dev_tracker; bool registered; enum ovpn_mode mode;
- spinlock_t lock; /* protect writing to the ovpn_struct object */
- struct ovpn_peer __rcu *peer; struct list_head dev_list; };
diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c new file mode 100644 index 0000000000000000000000000000000000000000..d9788a0cc99b5839c466c35d1b2266cc6b95fb72 --- /dev/null +++ b/drivers/net/ovpn/peer.c @@ -0,0 +1,354 @@ +// SPDX-License-Identifier: GPL-2.0 +/* OpenVPN data channel offload
- Copyright (C) 2020-2024 OpenVPN, Inc.
- Author: James Yonan james@openvpn.net
Antonio Quartulli <antonio@openvpn.net>
- */
+#include <linux/skbuff.h> +#include <linux/list.h>
+#include "ovpnstruct.h" +#include "bind.h" +#include "io.h" +#include "main.h" +#include "netlink.h" +#include "peer.h"
+/**
- ovpn_peer_new - allocate and initialize a new peer object
- @ovpn: the openvpn instance inside which the peer should be created
- @id: the ID assigned to this peer
- Return: a pointer to the new peer on success or an error code otherwise
- */
+struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id) +{
- struct ovpn_peer *peer;
- int ret;
- /* alloc and init peer object */
- peer = kzalloc(sizeof(*peer), GFP_KERNEL);
- if (!peer)
return ERR_PTR(-ENOMEM);
- peer->id = id;
- peer->halt = false;
- peer->ovpn = ovpn;
- peer->vpn_addrs.ipv4.s_addr = htonl(INADDR_ANY);
- peer->vpn_addrs.ipv6 = in6addr_any;
- RCU_INIT_POINTER(peer->bind, NULL);
- spin_lock_init(&peer->lock);
- kref_init(&peer->refcount);
- ret = dst_cache_init(&peer->dst_cache, GFP_KERNEL);
- if (ret < 0) {
netdev_err(ovpn->dev, "%s: cannot initialize dst cache\n",
__func__);
kfree(peer);
return ERR_PTR(ret);
- }
- netdev_hold(ovpn->dev, &ovpn->dev_tracker, GFP_KERNEL);
- return peer;
+}
+/**
- ovpn_peer_release - release peer private members
- @peer: the peer to release
- */
+static void ovpn_peer_release(struct ovpn_peer *peer) +{
- ovpn_bind_reset(peer, NULL);
- dst_cache_destroy(&peer->dst_cache);
- netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker);
- kfree_rcu(peer, rcu);
+}
+/**
- ovpn_peer_release_kref - callback for kref_put
- @kref: the kref object belonging to the peer
- */
+void ovpn_peer_release_kref(struct kref *kref) +{
- struct ovpn_peer *peer = container_of(kref, struct ovpn_peer, refcount);
- ovpn_peer_release(peer);
+}
+/**
- ovpn_peer_skb_to_sockaddr - fill sockaddr with skb source address
- @skb: the packet to extract data from
- @ss: the sockaddr to fill
- Return: true on success or false otherwise
- */
+static bool ovpn_peer_skb_to_sockaddr(struct sk_buff *skb,
struct sockaddr_storage *ss)
+{
- struct sockaddr_in6 *sa6;
- struct sockaddr_in *sa4;
- ss->ss_family = skb_protocol_to_family(skb);
Why do we need the skb_protocol_to_family() call? Can we use skb->protocol and ETH_P_IP/ETH_P_IPV6 directly in the switch?
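A sketch of what that could look like (same logic as the current function, just keyed on skb->protocol):

static bool ovpn_peer_skb_to_sockaddr(struct sk_buff *skb,
				      struct sockaddr_storage *ss)
{
	struct sockaddr_in6 *sa6;
	struct sockaddr_in *sa4;

	switch (skb->protocol) {
	case htons(ETH_P_IP):
		sa4 = (struct sockaddr_in *)ss;
		sa4->sin_family = AF_INET;
		sa4->sin_addr.s_addr = ip_hdr(skb)->saddr;
		sa4->sin_port = udp_hdr(skb)->source;
		break;
	case htons(ETH_P_IPV6):
		sa6 = (struct sockaddr_in6 *)ss;
		sa6->sin6_family = AF_INET6;
		sa6->sin6_addr = ipv6_hdr(skb)->saddr;
		sa6->sin6_port = udp_hdr(skb)->source;
		break;
	default:
		return false;
	}

	return true;
}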
- switch (ss->ss_family) {
- case AF_INET:
sa4 = (struct sockaddr_in *)ss;
sa4->sin_family = AF_INET;
sa4->sin_addr.s_addr = ip_hdr(skb)->saddr;
sa4->sin_port = udp_hdr(skb)->source;
break;
- case AF_INET6:
sa6 = (struct sockaddr_in6 *)ss;
sa6->sin6_family = AF_INET6;
sa6->sin6_addr = ipv6_hdr(skb)->saddr;
sa6->sin6_port = udp_hdr(skb)->source;
break;
- default:
return false;
- }
- return true;
+}
+/**
- ovpn_peer_transp_match - check if sockaddr and peer binding match
- @peer: the peer to get the binding from
- @ss: the sockaddr to match
- Return: true if sockaddr and binding match or false otherwise
- */
+static bool ovpn_peer_transp_match(const struct ovpn_peer *peer,
const struct sockaddr_storage *ss)
+{
- struct ovpn_bind *bind = rcu_dereference(peer->bind);
- struct sockaddr_in6 *sa6;
- struct sockaddr_in *sa4;
- if (unlikely(!bind))
return false;
- if (ss->ss_family != bind->remote.in4.sin_family)
nit: if the dedicated 'sa_family' element is added into the union, the check can be more straightforward (without the 'in4' access):
if (ss->ss_family != bind->remote.sa_family)
return false;
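For illustration, the union would then look roughly like this; the trick relies on sa_family_t being the first field of both sockaddr_in and sockaddr_in6, so bind->remote.sa_family aliases whichever family was stored:

union ovpn_sockaddr {
	unsigned short int sa_family;
	struct sockaddr_in in4;
	struct sockaddr_in6 in6;
};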
- switch (ss->ss_family) {
- case AF_INET:
sa4 = (struct sockaddr_in *)ss;
if (sa4->sin_addr.s_addr != bind->remote.in4.sin_addr.s_addr)
return false;
if (sa4->sin_port != bind->remote.in4.sin_port)
return false;
break;
- case AF_INET6:
sa6 = (struct sockaddr_in6 *)ss;
if (!ipv6_addr_equal(&sa6->sin6_addr,
&bind->remote.in6.sin6_addr))
return false;
if (sa6->sin6_port != bind->remote.in6.sin6_port)
return false;
break;
- default:
return false;
- }
- return true;
+}
+/**
- ovpn_peer_get_by_transp_addr_p2p - get peer by transport address in a P2P
instance
- @ovpn: the openvpn instance to search
- @ss: the transport socket address
- Return: the peer if found or NULL otherwise
- */
+static struct ovpn_peer * +ovpn_peer_get_by_transp_addr_p2p(struct ovpn_struct *ovpn,
struct sockaddr_storage *ss)
+{
- struct ovpn_peer *tmp, *peer = NULL;
- rcu_read_lock();
- tmp = rcu_dereference(ovpn->peer);
- if (likely(tmp && ovpn_peer_transp_match(tmp, ss) &&
ovpn_peer_hold(tmp)))
peer = tmp;
- rcu_read_unlock();
- return peer;
+}
+/**
- ovpn_peer_get_by_transp_addr - retrieve peer by transport address
- @ovpn: the openvpn instance to search
- @skb: the skb to retrieve the source transport address from
- Return: a pointer to the peer if found or NULL otherwise
- */
+struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn,
struct sk_buff *skb)
+{
- struct ovpn_peer *peer = NULL;
- struct sockaddr_storage ss = { 0 };
nit: reverse x-mas tree order, please.
- if (unlikely(!ovpn_peer_skb_to_sockaddr(skb, &ss)))
return NULL;
- if (ovpn->mode == OVPN_MODE_P2P)
peer = ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss);
- return peer;
+}
+/**
- ovpn_peer_get_by_id_p2p - get peer by ID in a P2P instance
- @ovpn: the openvpn instance to search
- @peer_id: the ID of the peer to find
- Return: the peer if found or NULL otherwise
- */
+static struct ovpn_peer *ovpn_peer_get_by_id_p2p(struct ovpn_struct *ovpn,
u32 peer_id)
+{
- struct ovpn_peer *tmp, *peer = NULL;
- rcu_read_lock();
- tmp = rcu_dereference(ovpn->peer);
- if (likely(tmp && tmp->id == peer_id && ovpn_peer_hold(tmp)))
peer = tmp;
- rcu_read_unlock();
- return peer;
+}
+/**
- ovpn_peer_get_by_id - retrieve peer by ID
- @ovpn: the openvpn instance to search
- @peer_id: the unique peer identifier to match
- Return: a pointer to the peer if found or NULL otherwise
- */
+struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id) +{
- struct ovpn_peer *peer = NULL;
- if (ovpn->mode == OVPN_MODE_P2P)
peer = ovpn_peer_get_by_id_p2p(ovpn, peer_id);
- return peer;
+}
+/**
- ovpn_peer_add_p2p - add peer to related tables in a P2P instance
- @ovpn: the instance to add the peer to
- @peer: the peer to add
- Return: 0 on success or a negative error code otherwise
- */
+static int ovpn_peer_add_p2p(struct ovpn_struct *ovpn, struct ovpn_peer *peer) +{
- struct ovpn_peer *tmp;
- spin_lock_bh(&ovpn->lock);
- /* in p2p mode it is possible to have a single peer only, therefore the
* old one is released and substituted by the new one
*/
- tmp = rcu_dereference_protected(ovpn->peer,
lockdep_is_held(&ovpn->lock));
- if (tmp) {
tmp->delete_reason = OVPN_DEL_PEER_REASON_TEARDOWN;
ovpn_peer_put(tmp);
- }
- rcu_assign_pointer(ovpn->peer, peer);
nit: the rcu_dereference_protected() + rcu_assign_pointer() pair can be replaced with a single rcu_replace_pointer() call.
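Roughly, as a sketch of the suggested simplification:

static int ovpn_peer_add_p2p(struct ovpn_struct *ovpn, struct ovpn_peer *peer)
{
	struct ovpn_peer *tmp;

	spin_lock_bh(&ovpn->lock);
	/* in p2p mode there can only be one peer: release the old one,
	 * if any, while installing the new one in a single step
	 */
	tmp = rcu_replace_pointer(ovpn->peer, peer,
				  lockdep_is_held(&ovpn->lock));
	if (tmp) {
		tmp->delete_reason = OVPN_DEL_PEER_REASON_TEARDOWN;
		ovpn_peer_put(tmp);
	}
	spin_unlock_bh(&ovpn->lock);

	return 0;
}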
- spin_unlock_bh(&ovpn->lock);
- return 0;
+}
+/**
- ovpn_peer_add - add peer to the related tables
- @ovpn: the openvpn instance the peer belongs to
- @peer: the peer object to add
- Assume refcounter was increased by caller
- Return: 0 on success or a negative error code otherwise
- */
+int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer) +{
- switch (ovpn->mode) {
- case OVPN_MODE_P2P:
return ovpn_peer_add_p2p(ovpn, peer);
- default:
return -EOPNOTSUPP;
- }
+}
+/**
- ovpn_peer_del_p2p - delete peer from related tables in a P2P instance
- @peer: the peer to delete
- @reason: reason why the peer was deleted (sent to userspace)
- Return: 0 on success or a negative error code otherwise
- */
+static int ovpn_peer_del_p2p(struct ovpn_peer *peer,
enum ovpn_del_peer_reason reason)
- __must_hold(&peer->ovpn->lock)
+{
- struct ovpn_peer *tmp;
- tmp = rcu_dereference_protected(peer->ovpn->peer,
lockdep_is_held(&peer->ovpn->lock));
- if (tmp != peer) {
DEBUG_NET_WARN_ON_ONCE(1);
nit: the above two lines can be simplified to:
if (DEBUG_NET_WARN_ON_ONCE(tmp != peer)) {
if (tmp)
ovpn_peer_put(tmp);
return -ENOENT;
- }
- tmp->delete_reason = reason;
- RCU_INIT_POINTER(peer->ovpn->peer, NULL);
- ovpn_peer_put(tmp);
- return 0;
+}
+/**
- ovpn_peer_release_p2p - release peer upon P2P device teardown
- @ovpn: the instance being torn down
- */
+void ovpn_peer_release_p2p(struct ovpn_struct *ovpn) +{
- struct ovpn_peer *tmp;
- spin_lock_bh(&ovpn->lock);
- tmp = rcu_dereference_protected(ovpn->peer,
lockdep_is_held(&ovpn->lock));
- if (tmp)
ovpn_peer_del_p2p(tmp, OVPN_DEL_PEER_REASON_TEARDOWN);
- spin_unlock_bh(&ovpn->lock);
+}
+/**
- ovpn_peer_del - delete peer from related tables
- @peer: the peer object to delete
- @reason: reason for deleting peer (will be sent to userspace)
- Return: 0 on success or a negative error code otherwise
- */
+int ovpn_peer_del(struct ovpn_peer *peer, enum ovpn_del_peer_reason reason) +{
- switch (peer->ovpn->mode) {
- case OVPN_MODE_P2P:
return ovpn_peer_del_p2p(peer, reason);
- default:
return -EOPNOTSUPP;
- }
+} diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h new file mode 100644 index 0000000000000000000000000000000000000000..6e0c6b14559de886d0677117f5a7ae029214e1f8 --- /dev/null +++ b/drivers/net/ovpn/peer.h @@ -0,0 +1,79 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* OpenVPN data channel offload
- Copyright (C) 2020-2024 OpenVPN, Inc.
- Author: James Yonan james@openvpn.net
Antonio Quartulli <antonio@openvpn.net>
- */
+#ifndef _NET_OVPN_OVPNPEER_H_ +#define _NET_OVPN_OVPNPEER_H_
+#include <net/dst_cache.h>
+/**
- struct ovpn_peer - the main remote peer object
- @ovpn: main openvpn instance this peer belongs to
- @id: unique identifier
- @vpn_addrs: IP addresses assigned over the tunnel
- @vpn_addrs.ipv4: IPv4 assigned to peer on the tunnel
- @vpn_addrs.ipv6: IPv6 assigned to peer on the tunnel
- @dst_cache: cache for dst_entry used to send to peer
- @bind: remote peer binding
- @halt: true if ovpn_peer_mark_delete was called
- @delete_reason: why peer was deleted (i.e. timeout, transport error, ..)
- @lock: protects binding to peer (bind)
- @refcount: reference counter
- @rcu: used to free peer in an RCU safe way
- @delete_work: deferred cleanup work, used to notify userspace
- */
+struct ovpn_peer {
- struct ovpn_struct *ovpn;
- u32 id;
- struct {
struct in_addr ipv4;
struct in6_addr ipv6;
- } vpn_addrs;
- struct dst_cache dst_cache;
- struct ovpn_bind __rcu *bind;
- bool halt;
- enum ovpn_del_peer_reason delete_reason;
- spinlock_t lock; /* protects bind */
- struct kref refcount;
- struct rcu_head rcu;
- struct work_struct delete_work;
+};
+/**
- ovpn_peer_hold - increase reference counter
- @peer: the peer whose counter should be increased
- Return: true if the counter was increased or false if it was zero already
- */
+static inline bool ovpn_peer_hold(struct ovpn_peer *peer) +{
- return kref_get_unless_zero(&peer->refcount);
+}
+void ovpn_peer_release_kref(struct kref *kref);
+/**
- ovpn_peer_put - decrease reference counter
- @peer: the peer whose counter should be decreased
- */
+static inline void ovpn_peer_put(struct ovpn_peer *peer) +{
- kref_put(&peer->refcount, ovpn_peer_release_kref);
+}
+struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id); +int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer); +int ovpn_peer_del(struct ovpn_peer *peer, enum ovpn_del_peer_reason reason); +void ovpn_peer_release_p2p(struct ovpn_struct *ovpn);
+struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn,
struct sk_buff *skb);
+struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id);
+#endif /* _NET_OVPN_OVPNPEER_H_ */
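To make the hold/put contract above explicit, this is the lookup pattern the patch relies on (illustrative excerpt mirroring ovpn_peer_get_by_id_p2p(); the surrounding ovpn variable is assumed context):

	struct ovpn_peer *peer = NULL, *tmp;

	rcu_read_lock();
	tmp = rcu_dereference(ovpn->peer);
	/* pin the peer before leaving the RCU read side */
	if (tmp && ovpn_peer_hold(tmp))
		peer = tmp;
	rcu_read_unlock();

	if (peer) {
		/* ... use peer ... */
		ovpn_peer_put(peer);
	}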
2024-11-10, 15:38:27 +0200, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
An ovpn_peer object holds the whole status of a remote peer (regardless whether it is a server or a client).
This includes status for crypto, tx/rx buffers, napi, etc.
Only support for one peer is introduced (P2P mode). Multi peer support is introduced with a later patch.
Reviewing the peer creation/destroying code I came to a generic question. Did you consider keeping a single P2P peer in the peers table as well?
Looks like such approach can greatly simply the code by dropping all these 'switch (ovpn->mode)' checks and implementing a unified peer management. The 'peer' field in the main private data structure can be kept to accelerate lookups, still using peers table for management tasks like removing all the peers on the interface teardown.
It would save a few 'switch(mode)', but force every client to allocate the hashtable for no reason at all. That tradeoff doesn't look very beneficial to me, the P2P-specific code is really simple. And if you keep ovpn->peer to make lookups faster, you're not removing that many 'switch(mode)'.
On 12.11.2024 19:31, Sabrina Dubroca wrote:
2024-11-10, 15:38:27 +0200, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
An ovpn_peer object holds the whole status of a remote peer (regardless whether it is a server or a client).
This includes status for crypto, tx/rx buffers, napi, etc.
Only support for one peer is introduced (P2P mode). Multi peer support is introduced with a later patch.
Reviewing the peer creation/destroying code I came to a generic question. Did you consider keeping a single P2P peer in the peers table as well?
Looks like such approach can greatly simply the code by dropping all these 'switch (ovpn->mode)' checks and implementing a unified peer management. The 'peer' field in the main private data structure can be kept to accelerate lookups, still using peers table for management tasks like removing all the peers on the interface teardown.
It would save a few 'switch(mode)', but force every client to allocate the hashtable for no reason at all. That tradeoff doesn't look very beneficial to me, the P2P-specific code is really simple. And if you keep ovpn->peer to make lookups faster, you're not removing that many 'switch(mode)'.
Looking back at the review I have done, I can retrospectively conclude that I personally do not like short 'switch' statements and special handlers :)
Seriously, this module has the highest density of switches per KLOC that I have seen so far, and a major part of it is dedicated to handling the special case of a P2P connection. Taken together that looks unusual enough to feel like a flaw in the design. I racked my brains to come up with a better solution and failed. So I took a different approach, inviting people to discuss individual pieces of the code to find a solution collectively, or to realize that there is no better solution for now.
The problem is that all these hash tables become inefficient with a single entry (the P2P case). I was thinking about allocating a table with a single bin, but that still requires running the hash function to access the indexed entry.
And back to the hashtable(s) size for MP mode. An 8k-bin table looks like a good choice for a normal server with a 1-2Gb uplink serving up to 1k connections. But it is still unclear how this choice affects installations with a bigger number of connections. And is this module applicable to embedded setups? E.g. running a couple of VPN servers on a home router with a few actual connections looks like a waste of RAM. I was about to suggest using rhashtable due to its dynamic sizing feature, but the module needs three tables. Any better idea?
-- Sergey
2024-11-13, 03:37:13 +0200, Sergey Ryazanov wrote:
On 12.11.2024 19:31, Sabrina Dubroca wrote:
2024-11-10, 15:38:27 +0200, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
An ovpn_peer object holds the whole status of a remote peer (regardless whether it is a server or a client).
This includes status for crypto, tx/rx buffers, napi, etc.
Only support for one peer is introduced (P2P mode). Multi peer support is introduced with a later patch.
Reviewing the peer creation/destroying code I came to a generic question. Did you consider keeping a single P2P peer in the peers table as well?
Looks like such approach can greatly simply the code by dropping all these 'switch (ovpn->mode)' checks and implementing a unified peer management. The 'peer' field in the main private data structure can be kept to accelerate lookups, still using peers table for management tasks like removing all the peers on the interface teardown.
It would save a few 'switch(mode)', but force every client to allocate the hashtable for no reason at all. That tradeoff doesn't look very beneficial to me, the P2P-specific code is really simple. And if you keep ovpn->peer to make lookups faster, you're not removing that many 'switch(mode)'.
Looking at the done review, I can retrospectively conclude that I personally do not like short 'switch' statements and special handlers :)
Seriously, this module has a highest density of switches per KLOC from what I have seen before and a major part of it dedicated to handle the special case of P2P connection.
I think it's fine. Either way there will be two implementations of whatever mode-dependent operation needs to be done. switch doesn't make it more complex than an ops structure.
If you're reading the current version and find ovpn_peer_add, you see directly that it'll do either ovpn_peer_add_mp or ovpn_peer_add_p2p. With an ops structure, you'd have a call to ovpn->ops->peer_add, and you'd have to look up all possible ops structures to know that it can be either ovpn_peer_add_mp or ovpn_peer_add_p2p. If there's an undefined number of implementations living in different modules (like net_device_ops, or L4 protocols), you don't have a choice.
xfrm went the opposite way to what you're proposing a few years ago (see commit 0c620e97b349 ("xfrm: remove output indirection from xfrm_mode") and others), and it made the code simpler.
What together look too unusual, so it feels like a flaw in the design.
I don't think it's a flaw in the design, maybe just different needs from other code you've seen (but similar in some ways to xfrm).
I racked my brains to come up with a better solution and failed. So I took a different approach, inviting people to discuss item pieces of the code to find a solution collectively or to realize that there is no better solution for now.
Sure. And I think there is no better solution, so I'm answering this thread to say that.
The problem is that all these hash tables become inefficient with the single entry (P2P case). I was thinking about allocating a table with a single bin, but it still requires hash function run to access the indexed entry.
And the current implementation relies on fixed-size hashtables (hash_for_each_safe -> HASH_SIZE -> ARRAY_SIZE -> sizeof).
And back to the hashtable(s) size for the MP mode. 8k-bins table looks a good choice for a normal server with 1-2Gb uplink serving up to 1k connections. But it sill unclear, how this choice can affect installations with a bigger number of connections? Or is this module applicable for embedded solutions? E.g. running a couple of VPN servers on a home router with a few actual connections looks like a waste of RAM. I was about to suggest to use rhashtable due to its dynamic sizing feature, but the module needs three tables. Any better idea?
For this initial implementation I think it's fine. Sure, converting to rhashtable (or some other type of dynamically-sized hashtable, if rhashtable doesn't fit) in the future would make sense. But I don't think it's necessary to get the patches into net-next.
On 13.11.2024 12:03, Sabrina Dubroca wrote:
2024-11-13, 03:37:13 +0200, Sergey Ryazanov wrote:
On 12.11.2024 19:31, Sabrina Dubroca wrote:
2024-11-10, 15:38:27 +0200, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
An ovpn_peer object holds the whole status of a remote peer (regardless whether it is a server or a client).
This includes status for crypto, tx/rx buffers, napi, etc.
Only support for one peer is introduced (P2P mode). Multi peer support is introduced with a later patch.
Reviewing the peer creation/destroying code I came to a generic question. Did you consider keeping a single P2P peer in the peers table as well?
Looks like such approach can greatly simply the code by dropping all these 'switch (ovpn->mode)' checks and implementing a unified peer management. The 'peer' field in the main private data structure can be kept to accelerate lookups, still using peers table for management tasks like removing all the peers on the interface teardown.
It would save a few 'switch(mode)', but force every client to allocate the hashtable for no reason at all. That tradeoff doesn't look very beneficial to me, the P2P-specific code is really simple. And if you keep ovpn->peer to make lookups faster, you're not removing that many 'switch(mode)'.
Looking at the done review, I can retrospectively conclude that I personally do not like short 'switch' statements and special handlers :)
Seriously, this module has a highest density of switches per KLOC from what I have seen before and a major part of it dedicated to handle the special case of P2P connection.
I think it's fine. Either way there will be two implementations of whatever mode-dependent operation needs to be done. switch doesn't make it more complex than an ops structure.
If you're reading the current version and find ovpn_peer_add, you see directly that it'll do either ovpn_peer_add_mp or ovpn_peer_add_p2p. With an ops structure, you'd have a call to ovpn->ops->peer_add, and you'd have to look up all possible ops structures to know that it can be either ovpn_peer_add_mp or ovpn_peer_add_p2p. If there's an undefined number of implementations living in different modules (like net_device_ops, or L4 protocols), you don't have a choice.
xfrm went the opposite way to what you're proposing a few years ago (see commit 0c620e97b349 ("xfrm: remove output indirection from xfrm_mode") and others), and it made the code simpler.
I checked this. Florian did a nice rework, and the implementation approach looks reasonable since there are more than two encapsulation modes and the handling is more complex than just selecting a function to call.
What I don't like about switches is that they require extra lines of code and push an author to introduce a default case with error handling. It was mentioned that the module is unlikely to ever support more than two modes. In this context, shall we consider using the ternary operator? E.g.:
next_run = ovpn->mode == OVPN_MODE_P2P ? ovpn_peer_keepalive_work_p2p(...) : ovpn_peer_keepalive_work_mp(...);
And back to the hashtable(s) size for the MP mode. 8k-bins table looks a good choice for a normal server with 1-2Gb uplink serving up to 1k connections. But it sill unclear, how this choice can affect installations with a bigger number of connections? Or is this module applicable for embedded solutions? E.g. running a couple of VPN servers on a home router with a few actual connections looks like a waste of RAM. I was about to suggest to use rhashtable due to its dynamic sizing feature, but the module needs three tables. Any better idea?
For this initial implementation I think it's fine. Sure, converting to rhashtable (or some other type of dynamically-sized hashtable, if rhashtable doesn't fit) in the future would make sense. But I don't think it's necessary to get the patches into net-next.
Makes sense. Thanks for sharing these thoughts.
-- Sergey
On 21/11/2024 00:22, Sergey Ryazanov wrote:
On 13.11.2024 12:03, Sabrina Dubroca wrote:
2024-11-13, 03:37:13 +0200, Sergey Ryazanov wrote:
On 12.11.2024 19:31, Sabrina Dubroca wrote:
2024-11-10, 15:38:27 +0200, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
An ovpn_peer object holds the whole status of a remote peer (regardless whether it is a server or a client).
This includes status for crypto, tx/rx buffers, napi, etc.
Only support for one peer is introduced (P2P mode). Multi peer support is introduced with a later patch.
Reviewing the peer creation/destroying code I came to a generic question. Did you consider keeping a single P2P peer in the peers table as well?
Looks like such approach can greatly simply the code by dropping all these 'switch (ovpn->mode)' checks and implementing a unified peer management. The 'peer' field in the main private data structure can be kept to accelerate lookups, still using peers table for management tasks like removing all the peers on the interface teardown.
It would save a few 'switch(mode)', but force every client to allocate the hashtable for no reason at all. That tradeoff doesn't look very beneficial to me, the P2P-specific code is really simple. And if you keep ovpn->peer to make lookups faster, you're not removing that many 'switch(mode)'.
Looking at the done review, I can retrospectively conclude that I personally do not like short 'switch' statements and special handlers :)
Seriously, this module has a highest density of switches per KLOC from what I have seen before and a major part of it dedicated to handle the special case of P2P connection.
I think it's fine. Either way there will be two implementations of whatever mode-dependent operation needs to be done. switch doesn't make it more complex than an ops structure.
If you're reading the current version and find ovpn_peer_add, you see directly that it'll do either ovpn_peer_add_mp or ovpn_peer_add_p2p. With an ops structure, you'd have a call to ovpn->ops->peer_add, and you'd have to look up all possible ops structures to know that it can be either ovpn_peer_add_mp or ovpn_peer_add_p2p. If there's an undefined number of implementations living in different modules (like net_device_ops, or L4 protocols), you don't have a choice.
xfrm went the opposite way to what you're proposing a few years ago (see commit 0c620e97b349 ("xfrm: remove output indirection from xfrm_mode") and others), and it made the code simpler.
I checked this. Florian did a nice rework. And the way of implementation looks reasonable since there are more than two encapsulation modes and handling is more complex than just selecting a function to call.
What I don't like about switches, that it requires extra lines of code and pushes an author to introduce a default case with error handling. It was mentioned that the module unlikely going to support more than two modes. In this context shall we consider ternary operator usage. E.g.:
the default case can actually be dropped. That way we can have the compiler warn when one of the enum values is not handled in the switch (should there be a new one at some point). However, the default is just a sanity check against future code changes which may introduce a bug.
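For illustration, a sketch of such a switch without a default case (ovpn_peer_add_mp is the MP variant introduced later in the series; with -Wswitch the compiler flags any enum ovpn_mode value not handled):

int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer)
{
	switch (ovpn->mode) {
	case OVPN_MODE_P2P:
		return ovpn_peer_add_p2p(ovpn, peer);
	case OVPN_MODE_MP:
		return ovpn_peer_add_mp(ovpn, peer);
	}

	/* only reached if mode holds a value outside the enum */
	return -EOPNOTSUPP;
}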
next_run = ovpn->mode == OVPN_MODE_P2P ? ovpn_peer_keepalive_work_p2p(...) : ovpn_peer_keepalive_work_mp(...);
I find this ugly to read :-) The switch is much more elegant and straightforward.
Do you agree this is getting more into a bike shed coloring discussion? :-D
Since there is not much gain in changing the approach, I think it is better if the maintainer picks the style that he finds more suitable (or simply likes more), no?
And back to the hashtable(s) size for the MP mode. 8k-bins table looks a good choice for a normal server with 1-2Gb uplink serving up to 1k connections. But it sill unclear, how this choice can affect installations with a bigger number of connections? Or is this module applicable for embedded solutions? E.g. running a couple of VPN servers on a home router with a few actual connections looks like a waste of RAM. I was about to suggest to use rhashtable due to its dynamic sizing feature, but the module needs three tables. Any better idea?
For this initial implementation I think it's fine. Sure, converting to rhashtable (or some other type of dynamically-sized hashtable, if rhashtable doesn't fit) in the future would make sense. But I don't think it's necessary to get the patches into net-next.
Agreed. It's in the pipeline (along with other features that I have already implemented), but it will come later.
Regards,
Make sense. Thanks for sharing these thoughts.
-- Sergey
On 21.11.2024 23:23, Antonio Quartulli wrote:
On 21/11/2024 00:22, Sergey Ryazanov wrote:
On 13.11.2024 12:03, Sabrina Dubroca wrote:
2024-11-13, 03:37:13 +0200, Sergey Ryazanov wrote:
On 12.11.2024 19:31, Sabrina Dubroca wrote:
2024-11-10, 15:38:27 +0200, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote: > An ovpn_peer object holds the whole status of a remote peer > (regardless whether it is a server or a client). > > This includes status for crypto, tx/rx buffers, napi, etc. > > Only support for one peer is introduced (P2P mode). > Multi peer support is introduced with a later patch.
Reviewing the peer creation/destroying code I came to a generic question. Did you consider keeping a single P2P peer in the peers table as well?
Looks like such approach can greatly simply the code by dropping all these 'switch (ovpn->mode)' checks and implementing a unified peer management. The 'peer' field in the main private data structure can be kept to accelerate lookups, still using peers table for management tasks like removing all the peers on the interface teardown.
It would save a few 'switch(mode)', but force every client to allocate the hashtable for no reason at all. That tradeoff doesn't look very beneficial to me, the P2P-specific code is really simple. And if you keep ovpn->peer to make lookups faster, you're not removing that many 'switch(mode)'.
Looking at the done review, I can retrospectively conclude that I personally do not like short 'switch' statements and special handlers :)
Seriously, this module has a highest density of switches per KLOC from what I have seen before and a major part of it dedicated to handle the special case of P2P connection.
I think it's fine. Either way there will be two implementations of whatever mode-dependent operation needs to be done. switch doesn't make it more complex than an ops structure.
If you're reading the current version and find ovpn_peer_add, you see directly that it'll do either ovpn_peer_add_mp or ovpn_peer_add_p2p. With an ops structure, you'd have a call to ovpn->ops->peer_add, and you'd have to look up all possible ops structures to know that it can be either ovpn_peer_add_mp or ovpn_peer_add_p2p. If there's an undefined number of implementations living in different modules (like net_device_ops, or L4 protocols), you don't have a choice.
xfrm went the opposite way to what you're proposing a few years ago (see commit 0c620e97b349 ("xfrm: remove output indirection from xfrm_mode") and others), and it made the code simpler.
I checked this. Florian did a nice rework. And the way of implementation looks reasonable since there are more than two encapsulation modes and handling is more complex than just selecting a function to call.
What I don't like about switches, that it requires extra lines of code and pushes an author to introduce a default case with error handling. It was mentioned that the module unlikely going to support more than two modes. In this context shall we consider ternary operator usage. E.g.:
the default case can actually be dropped. That way we can have the compiler warn when one of the enum values is not handled in the switch (should there be a new one at some point). However, the default is just a sanity check against future code changes which may introduce a bug.
next_run = ovpn->mode == OVPN_MODE_P2P ? ovpn_peer_keepalive_work_p2p(...) : ovpn_peer_keepalive_work_mp(...);
I find this ugly to read :-)
Yeah. Doesn't look pretty as well.
Just to conclude the discussion: considering what we discussed here and Sabrina's point regarding the trampoline penalty of indirect invocation, we do not have a better solution for now other than using switches everywhere.
-- Sergey
On 29.10.2024 12:47, Antonio Quartulli wrote:
[...]
+static void ovpn_peer_release(struct ovpn_peer *peer) +{
- ovpn_bind_reset(peer, NULL);
nit: this empty line after ovpn_bind_reset() is removed in the 'implement basic TX path (UDP)' patch. This tricks git into producing a senseless diff, with the 'ovpn_bind_reset(...)' line being removed and then introduced again. If you do not like this empty line, then remove it here, please :)
- dst_cache_destroy(&peer->dst_cache);
- netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker);
- kfree_rcu(peer, rcu);
+}
-- Sergey
On 10/11/2024 20:52, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
[...]
+static void ovpn_peer_release(struct ovpn_peer *peer) +{ + ovpn_bind_reset(peer, NULL);
nit: this empty line after ovpn_bind_reset() is removed in the 'implement basic TX path (UDP)' patch. What tricks git and it produces a sensless diff with 'ovpn_bind_reset(...)' line beeing removed and then introduced again. If you do not like this empty line then remove it here, please :)
Thanks! I will make sure it won't be introduced at all.
Regards,
+ dst_cache_destroy(&peer->dst_cache); + netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker); + kfree_rcu(peer, rcu); +}
-- Sergey
2024-10-29, 11:47:19 +0100, Antonio Quartulli wrote:
+/**
- struct ovpn_peer - the main remote peer object
- @ovpn: main openvpn instance this peer belongs to
- @id: unique identifier
- @vpn_addrs: IP addresses assigned over the tunnel
- @vpn_addrs.ipv4: IPv4 assigned to peer on the tunnel
- @vpn_addrs.ipv6: IPv6 assigned to peer on the tunnel
- @dst_cache: cache for dst_entry used to send to peer
- @bind: remote peer binding
- @halt: true if ovpn_peer_mark_delete was called
nit: It's initialized to false in ovpn_peer_new, but then never set to true nor read. Drop it?
- @delete_reason: why peer was deleted (i.e. timeout, transport error, ..)
- @lock: protects binding to peer (bind)
nit: as well as the keepalive values that are introduced later? (I guess the comment should be fixed up in patch 15 when the keepalive mechanism is added)
On 20/11/2024 12:56, Sabrina Dubroca wrote:
2024-10-29, 11:47:19 +0100, Antonio Quartulli wrote:
+/**
- struct ovpn_peer - the main remote peer object
- @ovpn: main openvpn instance this peer belongs to
- @id: unique identifier
- @vpn_addrs: IP addresses assigned over the tunnel
- @vpn_addrs.ipv4: IPv4 assigned to peer on the tunnel
- @vpn_addrs.ipv6: IPv6 assigned to peer on the tunnel
- @dst_cache: cache for dst_entry used to send to peer
- @bind: remote peer binding
- @halt: true if ovpn_peer_mark_delete was called
nit: It's initialized to false in ovpn_peer_new, but then never set to true nor read. Drop it?
argh. leftover from some older version. Thanks
- @delete_reason: why peer was deleted (i.e. timeout, transport error, ..)
- @lock: protects binding to peer (bind)
nit: as well as the keepalive values that are introduced later? (I guess the comment should be fixed up in patch 15 when the keepalive mechanism is added)
ACK
This specific structure is used in the ovpn kernel module to wrap and carry around a standard kernel socket.
ovpn takes ownership of passed sockets and therefore an ovpn-specific object is attached to them for status tracking purposes.
Initially only UDP support is introduced. TCP will come in a later patch.
Signed-off-by: Antonio Quartulli antonio@openvpn.net --- drivers/net/ovpn/Makefile | 2 + drivers/net/ovpn/socket.c | 120 ++++++++++++++++++++++++++++++++++++++++++++++ drivers/net/ovpn/socket.h | 48 +++++++++++++++++++ drivers/net/ovpn/udp.c | 72 ++++++++++++++++++++++++++++ drivers/net/ovpn/udp.h | 17 +++++++ 5 files changed, 259 insertions(+)
diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile index ce13499b3e1775a7f2a9ce16c6cb0aa088f93685..56bddc9bef83e0befde6af3c3565bb91731d7b22 100644 --- a/drivers/net/ovpn/Makefile +++ b/drivers/net/ovpn/Makefile @@ -13,3 +13,5 @@ ovpn-y += io.o ovpn-y += netlink.o ovpn-y += netlink-gen.o ovpn-y += peer.o +ovpn-y += socket.o +ovpn-y += udp.o diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c new file mode 100644 index 0000000000000000000000000000000000000000..090a3232ab0ec19702110f1a90f45c7f10889f6f --- /dev/null +++ b/drivers/net/ovpn/socket.c @@ -0,0 +1,120 @@ +// SPDX-License-Identifier: GPL-2.0 +/* OpenVPN data channel offload + * + * Copyright (C) 2020-2024 OpenVPN, Inc. + * + * Author: James Yonan james@openvpn.net + * Antonio Quartulli antonio@openvpn.net + */ + +#include <linux/net.h> +#include <linux/netdevice.h> + +#include "ovpnstruct.h" +#include "main.h" +#include "io.h" +#include "peer.h" +#include "socket.h" +#include "udp.h" + +static void ovpn_socket_detach(struct socket *sock) +{ + if (!sock) + return; + + sockfd_put(sock); +} + +/** + * ovpn_socket_release_kref - kref_put callback + * @kref: the kref object + */ +void ovpn_socket_release_kref(struct kref *kref) +{ + struct ovpn_socket *sock = container_of(kref, struct ovpn_socket, + refcount); + + ovpn_socket_detach(sock->sock); + kfree_rcu(sock, rcu); +} + +static bool ovpn_socket_hold(struct ovpn_socket *sock) +{ + return kref_get_unless_zero(&sock->refcount); +} + +static struct ovpn_socket *ovpn_socket_get(struct socket *sock) +{ + struct ovpn_socket *ovpn_sock; + + rcu_read_lock(); + ovpn_sock = rcu_dereference_sk_user_data(sock->sk); + if (!ovpn_socket_hold(ovpn_sock)) { + pr_warn("%s: found ovpn_socket with ref = 0\n", __func__); + ovpn_sock = NULL; + } + rcu_read_unlock(); + + return ovpn_sock; +} + +static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer) +{ + int ret = -EOPNOTSUPP; + + if (!sock || !peer) + return -EINVAL; + + if (sock->sk->sk_protocol == IPPROTO_UDP) + ret = ovpn_udp_socket_attach(sock, peer->ovpn); + + return ret; +} + +/** + * ovpn_socket_new - create a new socket and initialize it + * @sock: the kernel socket to embed + * @peer: the peer reachable via this socket + * + * Return: an openvpn socket on success or a negative error code otherwise + */ +struct ovpn_socket *ovpn_socket_new(struct socket *sock, struct ovpn_peer *peer) +{ + struct ovpn_socket *ovpn_sock; + int ret; + + ret = ovpn_socket_attach(sock, peer); + if (ret < 0 && ret != -EALREADY) + return ERR_PTR(ret); + + /* if this socket is already owned by this interface, just increase the + * refcounter and use it as expected. + * + * Since UDP sockets can be used to talk to multiple remote endpoints, + * openvpn normally instantiates only one socket and shares it among all + * its peers. For this reason, when we find out that a socket is already + * used for some other peer in *this* instance, we can happily increase + * its refcounter and use it normally. + */ + if (ret == -EALREADY) { + /* caller is expected to increase the sock refcounter before + * passing it to this function. For this reason we drop it if + * not needed, like when this socket is already owned. 
+ */ + ovpn_sock = ovpn_socket_get(sock); + sockfd_put(sock); + return ovpn_sock; + } + + ovpn_sock = kzalloc(sizeof(*ovpn_sock), GFP_KERNEL); + if (!ovpn_sock) + return ERR_PTR(-ENOMEM); + + ovpn_sock->ovpn = peer->ovpn; + ovpn_sock->sock = sock; + kref_init(&ovpn_sock->refcount); + + rcu_assign_sk_user_data(sock->sk, ovpn_sock); + + return ovpn_sock; +} diff --git a/drivers/net/ovpn/socket.h b/drivers/net/ovpn/socket.h new file mode 100644 index 0000000000000000000000000000000000000000..5ad9c5073b085482da95ee8ebf40acf20bf2e4b3 --- /dev/null +++ b/drivers/net/ovpn/socket.h @@ -0,0 +1,48 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* OpenVPN data channel offload + * + * Copyright (C) 2020-2024 OpenVPN, Inc. + * + * Author: James Yonan james@openvpn.net + * Antonio Quartulli antonio@openvpn.net + */ + +#ifndef _NET_OVPN_SOCK_H_ +#define _NET_OVPN_SOCK_H_ + +#include <linux/net.h> +#include <linux/kref.h> +#include <net/sock.h> + +struct ovpn_struct; +struct ovpn_peer; + +/** + * struct ovpn_socket - a kernel socket referenced in the ovpn code + * @ovpn: ovpn instance owning this socket (UDP only) + * @sock: the low level sock object + * @refcount: amount of contexts currently referencing this object + * @rcu: member used to schedule RCU destructor callback + */ +struct ovpn_socket { + struct ovpn_struct *ovpn; + struct socket *sock; + struct kref refcount; + struct rcu_head rcu; +}; + +void ovpn_socket_release_kref(struct kref *kref); + +/** + * ovpn_socket_put - decrease reference counter + * @sock: the socket whose reference counter should be decreased + */ +static inline void ovpn_socket_put(struct ovpn_socket *sock) +{ + kref_put(&sock->refcount, ovpn_socket_release_kref); +} + +struct ovpn_socket *ovpn_socket_new(struct socket *sock, + struct ovpn_peer *peer); + +#endif /* _NET_OVPN_SOCK_H_ */ diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c new file mode 100644 index 0000000000000000000000000000000000000000..c10474d252e19a0626d17a6f5dd328a5e5811551 --- /dev/null +++ b/drivers/net/ovpn/udp.c @@ -0,0 +1,72 @@ +// SPDX-License-Identifier: GPL-2.0 +/* OpenVPN data channel offload + * + * Copyright (C) 2019-2024 OpenVPN, Inc. + * + * Author: Antonio Quartulli antonio@openvpn.net + */ + +#include <linux/netdevice.h> +#include <linux/socket.h> +#include <net/udp.h> + +#include "ovpnstruct.h" +#include "main.h" +#include "socket.h" +#include "udp.h" + +/** + * ovpn_udp_socket_attach - set udp-tunnel CBs on socket and link it to ovpn + * @sock: socket to configure + * @ovpn: the openvp instance to link + * + * After invoking this function, the sock will be controlled by ovpn so that + * any incoming packet may be processed by ovpn first. + * + * Return: 0 on success or a negative error code otherwise + */ +int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn) +{ + struct ovpn_socket *old_data; + int ret = 0; + + /* sanity check */ + if (sock->sk->sk_protocol != IPPROTO_UDP) { + DEBUG_NET_WARN_ON_ONCE(1); + return -EINVAL; + } + + /* make sure no pre-existing encapsulation handler exists */ + rcu_read_lock(); + old_data = rcu_dereference_sk_user_data(sock->sk); + if (!old_data) { + /* socket is currently unused - we can take it */ + rcu_read_unlock(); + return 0; + } + + /* socket is in use. We need to understand if it's owned by this ovpn + * instance or by something else. + * In the former case, we can increase the refcounter and happily + * use it, because the same UDP socket is expected to be shared among + * different peers. 
+ * + * Unlikely TCP, a single UDP socket can be used to talk to many remote + * hosts and therefore openvpn instantiates one only for all its peers + */ + if ((READ_ONCE(udp_sk(sock->sk)->encap_type) == UDP_ENCAP_OVPNINUDP) && + old_data->ovpn == ovpn) { + netdev_dbg(ovpn->dev, + "%s: provided socket already owned by this interface\n", + __func__); + ret = -EALREADY; + } else { + netdev_err(ovpn->dev, + "%s: provided socket already taken by other user\n", + __func__); + ret = -EBUSY; + } + rcu_read_unlock(); + + return ret; +} diff --git a/drivers/net/ovpn/udp.h b/drivers/net/ovpn/udp.h new file mode 100644 index 0000000000000000000000000000000000000000..f2507f8f2c71ea9d5e5ac5446801e2d56f86700f --- /dev/null +++ b/drivers/net/ovpn/udp.h @@ -0,0 +1,17 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* OpenVPN data channel offload + * + * Copyright (C) 2019-2024 OpenVPN, Inc. + * + * Author: Antonio Quartulli antonio@openvpn.net + */ + +#ifndef _NET_OVPN_UDP_H_ +#define _NET_OVPN_UDP_H_ + +struct ovpn_struct; +struct socket; + +int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn); + +#endif /* _NET_OVPN_UDP_H_ */
On 29.10.2024 12:47, Antonio Quartulli wrote:
This specific structure is used in the ovpn kernel module to wrap and carry around a standard kernel socket.
ovpn takes ownership of passed sockets and therefore an ovpn-specific object is attached to them for status tracking purposes.
Initially only UDP support is introduced. TCP will come in a later patch.
Signed-off-by: Antonio Quartulli antonio@openvpn.net
[...]
diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c new file mode 100644 index 0000000000000000000000000000000000000000..090a3232ab0ec19702110f1a90f45c7f10889f6f --- /dev/null +++ b/drivers/net/ovpn/socket.c @@ -0,0 +1,120 @@ +// SPDX-License-Identifier: GPL-2.0 +/* OpenVPN data channel offload
- Copyright (C) 2020-2024 OpenVPN, Inc.
- Author: James Yonan james@openvpn.net
Antonio Quartulli <antonio@openvpn.net>
- */
+#include <linux/net.h> +#include <linux/netdevice.h>
+#include "ovpnstruct.h" +#include "main.h" +#include "io.h" +#include "peer.h" +#include "socket.h" +#include "udp.h"
+static void ovpn_socket_detach(struct socket *sock) +{
- if (!sock)
return;
- sockfd_put(sock);
+}
+/**
- ovpn_socket_release_kref - kref_put callback
- @kref: the kref object
- */
+void ovpn_socket_release_kref(struct kref *kref) +{
- struct ovpn_socket *sock = container_of(kref, struct ovpn_socket,
refcount);
- ovpn_socket_detach(sock->sock);
- kfree_rcu(sock, rcu);
+}
+static bool ovpn_socket_hold(struct ovpn_socket *sock) +{
- return kref_get_unless_zero(&sock->refcount);
Why do we need to wrap this kref-acquiring call in a function? Why can't we simply call kref_get_unless_zero() from ovpn_socket_get()?
+}
+static struct ovpn_socket *ovpn_socket_get(struct socket *sock) +{
- struct ovpn_socket *ovpn_sock;
- rcu_read_lock();
- ovpn_sock = rcu_dereference_sk_user_data(sock->sk);
- if (!ovpn_socket_hold(ovpn_sock)) {
pr_warn("%s: found ovpn_socket with ref = 0\n", __func__);
Should we be more specific here and print the warning with netdev_warn(ovpn_sock->ovpn->dev, ...)?
And, BTW, how can we pick up a half-destroyed socket?
ovpn_sock = NULL;
- }
- rcu_read_unlock();
- return ovpn_sock;
+}
+static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer) +{
- int ret = -EOPNOTSUPP;
- if (!sock || !peer)
return -EINVAL;
- if (sock->sk->sk_protocol == IPPROTO_UDP)
ret = ovpn_udp_socket_attach(sock, peer->ovpn);
- return ret;
+}
+/**
- ovpn_socket_new - create a new socket and initialize it
- @sock: the kernel socket to embed
- @peer: the peer reachable via this socket
- Return: an openvpn socket on success or a negative error code otherwise
- */
+struct ovpn_socket *ovpn_socket_new(struct socket *sock, struct ovpn_peer *peer) +{
- struct ovpn_socket *ovpn_sock;
- int ret;
- ret = ovpn_socket_attach(sock, peer);
- if (ret < 0 && ret != -EALREADY)
return ERR_PTR(ret);
- /* if this socket is already owned by this interface, just increase the
* refcounter and use it as expected.
*
* Since UDP sockets can be used to talk to multiple remote endpoints,
* openvpn normally instantiates only one socket and shares it among all
* its peers. For this reason, when we find out that a socket is already
* used for some other peer in *this* instance, we can happily increase
* its refcounter and use it normally.
*/
- if (ret == -EALREADY) {
/* caller is expected to increase the sock refcounter before
* passing it to this function. For this reason we drop it if
* not needed, like when this socket is already owned.
*/
ovpn_sock = ovpn_socket_get(sock);
sockfd_put(sock);
return ovpn_sock;
- }
- ovpn_sock = kzalloc(sizeof(*ovpn_sock), GFP_KERNEL);
- if (!ovpn_sock)
return ERR_PTR(-ENOMEM);
- ovpn_sock->ovpn = peer->ovpn;
- ovpn_sock->sock = sock;
- kref_init(&ovpn_sock->refcount);
- rcu_assign_sk_user_data(sock->sk, ovpn_sock);
- return ovpn_sock;
+} diff --git a/drivers/net/ovpn/socket.h b/drivers/net/ovpn/socket.h new file mode 100644 index 0000000000000000000000000000000000000000..5ad9c5073b085482da95ee8ebf40acf20bf2e4b3 --- /dev/null +++ b/drivers/net/ovpn/socket.h @@ -0,0 +1,48 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* OpenVPN data channel offload
- Copyright (C) 2020-2024 OpenVPN, Inc.
- Author: James Yonan james@openvpn.net
Antonio Quartulli <antonio@openvpn.net>
- */
+#ifndef _NET_OVPN_SOCK_H_ +#define _NET_OVPN_SOCK_H_
+#include <linux/net.h> +#include <linux/kref.h> +#include <net/sock.h>
+struct ovpn_struct; +struct ovpn_peer;
+/**
- struct ovpn_socket - a kernel socket referenced in the ovpn code
- @ovpn: ovpn instance owning this socket (UDP only)
- @sock: the low level sock object
- @refcount: amount of contexts currently referencing this object
- @rcu: member used to schedule RCU destructor callback
- */
+struct ovpn_socket {
- struct ovpn_struct *ovpn;
- struct socket *sock;
- struct kref refcount;
- struct rcu_head rcu;
+};
+void ovpn_socket_release_kref(struct kref *kref);
+/**
- ovpn_socket_put - decrease reference counter
- @sock: the socket whose reference counter should be decreased
- */
+static inline void ovpn_socket_put(struct ovpn_socket *sock) +{
- kref_put(&sock->refcount, ovpn_socket_release_kref);
+}
+struct ovpn_socket *ovpn_socket_new(struct socket *sock,
struct ovpn_peer *peer);
+#endif /* _NET_OVPN_SOCK_H_ */ diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c new file mode 100644 index 0000000000000000000000000000000000000000..c10474d252e19a0626d17a6f5dd328a5e5811551 --- /dev/null +++ b/drivers/net/ovpn/udp.c @@ -0,0 +1,72 @@ +// SPDX-License-Identifier: GPL-2.0 +/* OpenVPN data channel offload
- Copyright (C) 2019-2024 OpenVPN, Inc.
- Author: Antonio Quartulli antonio@openvpn.net
- */
+#include <linux/netdevice.h> +#include <linux/socket.h> +#include <net/udp.h>
+#include "ovpnstruct.h" +#include "main.h" +#include "socket.h" +#include "udp.h"
+/**
- ovpn_udp_socket_attach - set udp-tunnel CBs on socket and link it to ovpn
- @sock: socket to configure
- @ovpn: the openvpn instance to link
- After invoking this function, the sock will be controlled by ovpn so that
- any incoming packet may be processed by ovpn first.
- Return: 0 on success or a negative error code otherwise
- */
+int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn) +{
- struct ovpn_socket *old_data;
- int ret = 0;
- /* sanity check */
- if (sock->sk->sk_protocol != IPPROTO_UDP) {
The function will be called only for a UDP socket. The caller makes sure this is true. So, why do we need this check?
DEBUG_NET_WARN_ON_ONCE(1);
return -EINVAL;
- }
- /* make sure no pre-existing encapsulation handler exists */
- rcu_read_lock();
- old_data = rcu_dereference_sk_user_data(sock->sk);
- if (!old_data) {
/* socket is currently unused - we can take it */
rcu_read_unlock();
return 0;
- }
- /* socket is in use. We need to understand if it's owned by this ovpn
* instance or by something else.
* In the former case, we can increase the refcounter and happily
* use it, because the same UDP socket is expected to be shared among
* different peers.
*
* Unlike TCP, a single UDP socket can be used to talk to many remote
hosts and therefore openvpn instantiates only one for all its peers
*/
- if ((READ_ONCE(udp_sk(sock->sk)->encap_type) == UDP_ENCAP_OVPNINUDP) &&
old_data->ovpn == ovpn) {
netdev_dbg(ovpn->dev,
"%s: provided socket already owned by this interface\n",
__func__);
Why do we need the function name printed here?
ret = -EALREADY;
- } else {
netdev_err(ovpn->dev,
"%s: provided socket already taken by other user\n",
__func__);
The same comment regarding the function name printing.
And why 'error' level? There are a few ways to fall into this case and each of them implies a user-space screw-up. But why should we consider these user-space screw-ups our (kernel) problem? I suggest reducing the level at least to 'warning' or maybe even 'notice'. See the level definitions in include/linux/kern_levels.h.
ret = -EBUSY;
- }
- rcu_read_unlock();
- return ret;
+}
-- Sergey
On 10/11/2024 19:26, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
This specific structure is used in the ovpn kernel module to wrap and carry around a standard kernel socket.
ovpn takes ownership of passed sockets and therefore an ovpn-specific object is attached to them for status tracking purposes.
Initially only UDP support is introduced. TCP will come in a later patch.
Signed-off-by: Antonio Quartulli antonio@openvpn.net
[...]
diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c new file mode 100644 index 0000000000000000000000000000000000000000..090a3232ab0ec19702110f1a90f45c7f10889f6f --- /dev/null +++ b/drivers/net/ovpn/socket.c @@ -0,0 +1,120 @@ +// SPDX-License-Identifier: GPL-2.0 +/* OpenVPN data channel offload
- * Copyright (C) 2020-2024 OpenVPN, Inc.
- * Author: James Yonan james@openvpn.net
- * Antonio Quartulli antonio@openvpn.net
- */
+#include <linux/net.h> +#include <linux/netdevice.h>
+#include "ovpnstruct.h" +#include "main.h" +#include "io.h" +#include "peer.h" +#include "socket.h" +#include "udp.h"
+static void ovpn_socket_detach(struct socket *sock) +{ + if (!sock) + return;
+ sockfd_put(sock); +}
+/**
- ovpn_socket_release_kref - kref_put callback
- @kref: the kref object
- */
+void ovpn_socket_release_kref(struct kref *kref) +{ + struct ovpn_socket *sock = container_of(kref, struct ovpn_socket, + refcount);
+ ovpn_socket_detach(sock->sock); + kfree_rcu(sock, rcu); +}
+static bool ovpn_socket_hold(struct ovpn_socket *sock) +{ + return kref_get_unless_zero(&sock->refcount);
Why do we need to wrap this kref-acquiring call in a function? Why can't we simply call kref_get_unless_zero() from ovpn_socket_get()?
Generally I prefer to keep the API among objects consistent. In this specific case, it means having hold() and put() helpers in order to avoid calling kref_* functions directly in the code.
This is a pretty simple case because hold() is called only once, but I still like to be consistent.
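For reference, the open-coded alternative that was suggested would look roughly like this in ovpn_socket_get() (just a sketch of the alternative being discussed, not a change I am proposing):

	rcu_read_lock();
	ovpn_sock = rcu_dereference_sk_user_data(sock->sk);
	/* take the reference without the ovpn_socket_hold() helper */
	if (!kref_get_unless_zero(&ovpn_sock->refcount))
		ovpn_sock = NULL;
	rcu_read_unlock();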
+}
+static struct ovpn_socket *ovpn_socket_get(struct socket *sock) +{ + struct ovpn_socket *ovpn_sock;
+ rcu_read_lock(); + ovpn_sock = rcu_dereference_sk_user_data(sock->sk); + if (!ovpn_socket_hold(ovpn_sock)) { + pr_warn("%s: found ovpn_socket with ref = 0\n", __func__);
Should we be more specific here and print the warning with netdev_warn(ovpn_sock->ovpn->dev, ...)?
ACK, must be an unnoticed leftover.
And, BTW, how can we pick up a half-destroyed socket?
I don't think this can happen under normal conditions. But I am pretty sure that, in case of bugs, this *could* happen quite easily.
[...]
+/**
- ovpn_udp_socket_attach - set udp-tunnel CBs on socket and link it
to ovpn
- @sock: socket to configure
- @ovpn: the openvpn instance to link
- After invoking this function, the sock will be controlled by ovpn
so that
- any incoming packet may be processed by ovpn first.
- Return: 0 on success or a negative error code otherwise
- */
+int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn) +{ + struct ovpn_socket *old_data; + int ret = 0;
+ /* sanity check */ + if (sock->sk->sk_protocol != IPPROTO_UDP) {
The function will be called only for a UDP socket. The caller makes sure this is true. So, why do we need this check?
To avoid this function being copied/called somewhere else in the future while forgetting about this critical assumption.
Indeed, it's just a sanity check.
+ DEBUG_NET_WARN_ON_ONCE(1); + return -EINVAL; + }
+ /* make sure no pre-existing encapsulation handler exists */ + rcu_read_lock(); + old_data = rcu_dereference_sk_user_data(sock->sk); + if (!old_data) { + /* socket is currently unused - we can take it */ + rcu_read_unlock(); + return 0; + }
+ /* socket is in use. We need to understand if it's owned by this ovpn + * instance or by something else. + * In the former case, we can increase the refcounter and happily + * use it, because the same UDP socket is expected to be shared among + * different peers. + * + * Unlikely TCP, a single UDP socket can be used to talk to many remote + * hosts and therefore openvpn instantiates one only for all its peers + */ + if ((READ_ONCE(udp_sk(sock->sk)->encap_type) == UDP_ENCAP_OVPNINUDP) && + old_data->ovpn == ovpn) { + netdev_dbg(ovpn->dev, + "%s: provided socket already owned by this interface\n", + __func__);
Why do we need the function name printed here?
leftover, will fix, thanks!
+ ret = -EALREADY; + } else { + netdev_err(ovpn->dev, + "%s: provided socket already taken by other user\n", + __func__);
The same comment regarding the function name printing.
ACK
And why 'error' level? There are a few ways to fall into this case and each of them implies a user-space screw-up. But why should we consider these user-space screw-ups our (kernel) problem? I suggest reducing the level at least to 'warning' or maybe even 'notice'. See the level definitions in include/linux/kern_levels.h.
Yeah, this can be reduced. The error will be reported to the user via netlink in any case.
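e.g. something like this (sketch, assuming we settle on the 'notice' level and drop the __func__ prefix as discussed above):

		netdev_notice(ovpn->dev,
			      "provided socket already taken by other user\n");
		ret = -EBUSY;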
Thanks!
Regards,
On 15/11/2024 15:28, Antonio Quartulli wrote: [...]
+}
+static struct ovpn_socket *ovpn_socket_get(struct socket *sock) +{ + struct ovpn_socket *ovpn_sock;
+ rcu_read_lock(); + ovpn_sock = rcu_dereference_sk_user_data(sock->sk); + if (!ovpn_socket_hold(ovpn_sock)) { + pr_warn("%s: found ovpn_socket with ref = 0\n", __func__);
Should we be more specific here and print the warning with netdev_warn(ovpn_sock->ovpn->dev, ...)?
ACK, must be an unnoticed leftover.
I take this back. If the refcounter is zero, I'd avoid accessing any field of the ovpn_sock object, hence the pr_warn() without any reference to the device.
Regards,
On 19.11.2024 15:44, Antonio Quartulli wrote:
On 15/11/2024 15:28, Antonio Quartulli wrote: [...]
+}
+static struct ovpn_socket *ovpn_socket_get(struct socket *sock) +{ + struct ovpn_socket *ovpn_sock;
+ rcu_read_lock(); + ovpn_sock = rcu_dereference_sk_user_data(sock->sk); + if (!ovpn_socket_hold(ovpn_sock)) { + pr_warn("%s: found ovpn_socket with ref = 0\n", __func__);
Should we be more specific here and print the warning with netdev_warn(ovpn_sock->ovpn->dev, ...)?
ACK, must be an unnoticed leftover.
I take this back. If the refcounter is zero, I'd avoid accessing any field of the ovpn_sock object, hence the pr_warn() without any reference to the device.
If it's such an unlikely scenario, then should it be:
if (WARN_ON(!ovpn_socket_hold(ovpn_sock)))
	ovpn_sock = NULL;
?
-- Sergey
On 21/11/2024 00:34, Sergey Ryazanov wrote:
On 19.11.2024 15:44, Antonio Quartulli wrote:
On 15/11/2024 15:28, Antonio Quartulli wrote: [...]
+}
+static struct ovpn_socket *ovpn_socket_get(struct socket *sock) +{ + struct ovpn_socket *ovpn_sock;
+ rcu_read_lock(); + ovpn_sock = rcu_dereference_sk_user_data(sock->sk); + if (!ovpn_socket_hold(ovpn_sock)) { + pr_warn("%s: found ovpn_socket with ref = 0\n", __func__);
Should we be more specific here and print the warning with netdev_warn(ovpn_sock->ovpn->dev, ...)?
ACK, must be an unnoticed leftover.
I take this back. If the refcounter is zero, I'd avoid accessing any field of the ovpn_sock object, hence the pr_warn() without any reference to the device.
If it's such an unlikely scenario, then should it be:
if (WARN_ON(!ovpn_socket_hold(ovpn_sock)))
	ovpn_sock = NULL;
?
Yeah, makes sense.
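The lookup would then look roughly like this (sketch of the agreed change, not necessarily the final code):

static struct ovpn_socket *ovpn_socket_get(struct socket *sock)
{
	struct ovpn_socket *ovpn_sock;

	rcu_read_lock();
	ovpn_sock = rcu_dereference_sk_user_data(sock->sk);
	/* hitting a zero refcount here would be a bug, hence the WARN_ON */
	if (WARN_ON(!ovpn_socket_hold(ovpn_sock)))
		ovpn_sock = NULL;
	rcu_read_unlock();

	return ovpn_sock;
}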
Thanks!
-- Sergey
On 15.11.2024 16:28, Antonio Quartulli wrote:
On 10/11/2024 19:26, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
[...]
+static bool ovpn_socket_hold(struct ovpn_socket *sock) +{ + return kref_get_unless_zero(&sock->refcount);
Why do we need to wrap this kref-acquiring call in a function? Why can't we simply call kref_get_unless_zero() from ovpn_socket_get()?
Generally I prefer to keep the API among objects consistent. In this specific case, it means having hold() and put() helpers in order to avoid calling kref_* functions directly in the code.
This is a pretty simple case because hold() is called only once, but I still like to be consistent.
Makes sense. The counterpart of ovpn_socket_hold() is declared in the header file. Probably that's why I missed it. Shall we move the holding routine there as well?
[...]
+int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn) +{ + struct ovpn_socket *old_data; + int ret = 0;
+ /* sanity check */ + if (sock->sk->sk_protocol != IPPROTO_UDP) {
The function will be called only for a UDP socket. The caller makes sure this is true. So, why do we need this check?
To avoid this function being copied/called somewhere else in the future while forgetting about this critical assumption.
Shall we do the same for all other functions in this file? E.g. ovpn_udp_socket_detach()/ovpn_udp_send_skb()? And who guarantees that the code will be copied together with the check?
Indeed, it's just a sanity check.
Shall we check the pointers' validity before dereferencing them?
if (!ovpn || !sock || !sock->sk || sock->sk->sk_protocol != IPPROTO_UDP) {
With the above questions I would like to show that there is an endless number of possible mistakes. And no matter how much we check, a creative engineer will find a way to ruin the kernel.
So, is it worth spending code lines on checking whether the socket is UDP inside a function that has '_udp_' in its name and is called only inside the module?
+ DEBUG_NET_WARN_ON_ONCE(1); + return -EINVAL; + }
-- Sergey
On 21/11/2024 00:58, Sergey Ryazanov wrote:
On 15.11.2024 16:28, Antonio Quartulli wrote:
On 10/11/2024 19:26, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
[...]
+static bool ovpn_socket_hold(struct ovpn_socket *sock) +{ + return kref_get_unless_zero(&sock->refcount);
Why do we need to wrap this kref-acquiring call in a function? Why can't we simply call kref_get_unless_zero() from ovpn_socket_get()?
Generally I prefer to keep the API among objects consistent. In this specific case, it means having hold() and put() helpers in order to avoid calling kref_* functions directly in the code.
This is a pretty simple case because hold() is called only once, but I still like to be consistent.
Makes sense. The counterpart of ovpn_socket_hold() is declared in the header file. Probably that's why I missed it. Shall we move the holding routine there as well?
I prefer not to, because that function is used only in socket.c. Moving/declaring it in socket.h would export a symbol that is not used anywhere else.
The _put() variant is instead used in peer.c, thus it is exported.
[...]
+int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn) +{ + struct ovpn_socket *old_data; + int ret = 0;
+ /* sanity check */ + if (sock->sk->sk_protocol != IPPROTO_UDP) {
The function will be called only for a UDP socket. The caller makes sure this is true. So, why do we need this check?
To avoid this function being copied/called somewhere else in the future while forgetting about this critical assumption.
Shall we do the same for all other functions in this file? E.g. ovpn_udp_socket_detach()/ovpn_udp_send_skb()?
Those functions work on a socket that is already owned, thus it already passed this precheck, while _attach() is the one seeing the new socket for the first time.
If this check is triggered it would only be due to a bug. Hence the DEBUG_NET_WARN_ON_ONCE().
And who guarantees that the code will be copied together with the check?
No guarantee is given :)
Indeed, it's just a sanity check.
Shall we check the pointers' validity before dereferencing them?
if (!ovpn || !sock || !sock->sk || sock->sk->sk_protocol != IPPROTO_UDP) {
With the above questions I would like to show that there is an endless number of possible mistakes. And no matter how much we check, a creative engineer will find a way to ruin the kernel.
So, is it worth spending code lines on checking whether the socket is UDP inside a function that has '_udp_' in its name and is called only inside the module?
Are you suggesting we should drop any kind of check for functions called only within the module? I am not sure I follow...
Anyway, I am dropping the check at the beginning of the function.
Regards,
+ DEBUG_NET_WARN_ON_ONCE(1); + return -EINVAL; + }
-- Sergey
On November 21, 2024 11:36:19 PM, Antonio Quartulli antonio@openvpn.net wrote:
On 21/11/2024 00:58, Sergey Ryazanov wrote:
On 15.11.2024 16:28, Antonio Quartulli wrote:
On 10/11/2024 19:26, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
[...]
+static bool ovpn_socket_hold(struct ovpn_socket *sock) +{ + return kref_get_unless_zero(&sock->refcount);
Why do we need to wrap this kref-acquiring call in a function? Why can't we simply call kref_get_unless_zero() from ovpn_socket_get()?
Generally I prefer to keep the API among objects consistent. In this specific case, it means having hold() and put() helpers in order to avoid calling kref_* functions directly in the code.
This is a pretty simple case because hold() is called only once, but I still like to be consistent.
Makes sense. The counterpart of ovpn_socket_hold() is declared in the header file. Probably that's why I missed it. Shall we move the holding routine there as well?
I prefer not to, because that function is used only in socket.c. Moving/declaring it in socket.h would export a symbol that is not used anywhere else.
The _put() variant is instead used in peer.c, thus it is exported.
Technically, an inline function is not exported. On the other hand, it makes sense to keep the header file clean. Agreed.
[...]
+int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn) +{ + struct ovpn_socket *old_data; + int ret = 0;
+ /* sanity check */ + if (sock->sk->sk_protocol != IPPROTO_UDP) {
The function will be called only for a UDP socket. The caller makes sure this is true. So, why do we need this check?
To avoid this function being copied/called somewhere else in the future while forgetting about this critical assumption.
Shall we do the same for all other functions in this file? E.g. ovpn_udp_socket_detach()/ovpn_udp_send_skb()?
Those functions work on a socket that is already owned, thus it already passed this precheck, while _attach() is the one seeing the new socket for the first time.
If this check is triggered it would only be due to a bug. Hence the DEBUG_NET_WARN_ON_ONCE().
And who guarantees that the code will be copied together with the check?
No guarantee is given :)
Indeed, it's just a sanity check.
Shall we check the pointers' validity before dereferencing them?
if (!ovpn || !sock || !sock->sk || sock->sk->sk_protocol != IPPROTO_UDP) {
With the above questions I would like to show that there is an endless number of possible mistakes. And no matter how much we check, a creative engineer will find a way to ruin the kernel.
So, is it worth spending code lines on checking whether the socket is UDP inside a function that has '_udp_' in its name and is called only inside the module?
Are you suggesting we should drop any kind of check for functions called only within the module? I am not sure I follow...
Sanity checks in internal functions, yes. I'm afraid they give a false sense of safety. Short and clear code is preferable to me, especially when I know in advance who is going to call a function and how.
Anyway, I am dropping the check at the beginning of the function.
Packets sent over the ovpn interface are processed and transmitted to the connected peer, if any.
Implementation is UDP only. TCP will be added by a later patch.
Note: no crypto/encapsulation exists yet. Packets are just captured and sent.
Signed-off-by: Antonio Quartulli antonio@openvpn.net --- drivers/net/ovpn/io.c | 138 +++++++++++++++++++++++++++- drivers/net/ovpn/peer.c | 37 +++++++- drivers/net/ovpn/peer.h | 4 + drivers/net/ovpn/skb.h | 51 +++++++++++ drivers/net/ovpn/udp.c | 232 ++++++++++++++++++++++++++++++++++++++++++++++++ drivers/net/ovpn/udp.h | 8 ++ 6 files changed, 468 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c index ad3813419c33cbdfe7e8ad6f5c8b444a3540a69f..77ba4d33ae0bd2f52e8bd1c06a182d24285297b4 100644 --- a/drivers/net/ovpn/io.c +++ b/drivers/net/ovpn/io.c @@ -9,14 +9,150 @@
#include <linux/netdevice.h> #include <linux/skbuff.h> +#include <net/gso.h>
#include "io.h" +#include "ovpnstruct.h" +#include "peer.h" +#include "udp.h" +#include "skb.h" +#include "socket.h" + +static void ovpn_encrypt_post(struct sk_buff *skb, int ret) +{ + struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer; + + if (unlikely(ret < 0)) + goto err; + + skb_mark_not_on_list(skb); + + switch (peer->sock->sock->sk->sk_protocol) { + case IPPROTO_UDP: + ovpn_udp_send_skb(peer->ovpn, peer, skb); + break; + default: + /* no transport configured yet */ + goto err; + } + /* skb passed down the stack - don't free it */ + skb = NULL; +err: + if (unlikely(skb)) + dev_core_stats_tx_dropped_inc(peer->ovpn->dev); + ovpn_peer_put(peer); + kfree_skb(skb); +} + +static bool ovpn_encrypt_one(struct ovpn_peer *peer, struct sk_buff *skb) +{ + ovpn_skb_cb(skb)->peer = peer; + + /* take a reference to the peer because the crypto code may run async. + * ovpn_encrypt_post() will release it upon completion + */ + if (unlikely(!ovpn_peer_hold(peer))) { + DEBUG_NET_WARN_ON_ONCE(1); + return false; + } + + ovpn_encrypt_post(skb, 0); + return true; +} + +/* send skb to connected peer, if any */ +static void ovpn_send(struct ovpn_struct *ovpn, struct sk_buff *skb, + struct ovpn_peer *peer) +{ + struct sk_buff *curr, *next; + + if (likely(!peer)) + /* retrieve peer serving the destination IP of this packet */ + peer = ovpn_peer_get_by_dst(ovpn, skb); + if (unlikely(!peer)) { + net_dbg_ratelimited("%s: no peer to send data to\n", + ovpn->dev->name); + dev_core_stats_tx_dropped_inc(ovpn->dev); + goto drop; + } + + /* this might be a GSO-segmented skb list: process each skb + * independently + */ + skb_list_walk_safe(skb, curr, next) + if (unlikely(!ovpn_encrypt_one(peer, curr))) { + dev_core_stats_tx_dropped_inc(ovpn->dev); + kfree_skb(curr); + } + + /* skb passed over, no need to free */ + skb = NULL; +drop: + if (likely(peer)) + ovpn_peer_put(peer); + kfree_skb_list(skb); +}
/* Send user data to the network */ netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev) { + struct ovpn_struct *ovpn = netdev_priv(dev); + struct sk_buff *segments, *curr, *next; + struct sk_buff_head skb_list; + __be16 proto; + int ret; + + /* reset netfilter state */ + nf_reset_ct(skb); + + /* verify IP header size in network packet */ + proto = ovpn_ip_check_protocol(skb); + if (unlikely(!proto || skb->protocol != proto)) { + net_err_ratelimited("%s: dropping malformed payload packet\n", + dev->name); + dev_core_stats_tx_dropped_inc(ovpn->dev); + goto drop; + } + + if (skb_is_gso(skb)) { + segments = skb_gso_segment(skb, 0); + if (IS_ERR(segments)) { + ret = PTR_ERR(segments); + net_err_ratelimited("%s: cannot segment packet: %d\n", + dev->name, ret); + dev_core_stats_tx_dropped_inc(ovpn->dev); + goto drop; + } + + consume_skb(skb); + skb = segments; + } + + /* from this moment on, "skb" might be a list */ + + __skb_queue_head_init(&skb_list); + skb_list_walk_safe(skb, curr, next) { + skb_mark_not_on_list(curr); + + curr = skb_share_check(curr, GFP_ATOMIC); + if (unlikely(!curr)) { + net_err_ratelimited("%s: skb_share_check failed\n", + dev->name); + dev_core_stats_tx_dropped_inc(ovpn->dev); + continue; + } + + __skb_queue_tail(&skb_list, curr); + } + skb_list.prev->next = NULL; + + ovpn_send(ovpn, skb_list.next, NULL); + + return NETDEV_TX_OK; + +drop: skb_tx_error(skb); - kfree_skb(skb); + kfree_skb_list(skb); return NET_XMIT_DROP; } diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c index d9788a0cc99b5839c466c35d1b2266cc6b95fb72..aff3e9e99b7d2dd2fa68484d9a396d43f75a6d0b 100644 --- a/drivers/net/ovpn/peer.c +++ b/drivers/net/ovpn/peer.c @@ -16,6 +16,7 @@ #include "main.h" #include "netlink.h" #include "peer.h" +#include "socket.h"
/** * ovpn_peer_new - allocate and initialize a new peer object @@ -64,8 +65,10 @@ struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id) */ static void ovpn_peer_release(struct ovpn_peer *peer) { - ovpn_bind_reset(peer, NULL); + if (peer->sock) + ovpn_socket_put(peer->sock);
+ ovpn_bind_reset(peer, NULL); dst_cache_destroy(&peer->dst_cache); netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker); kfree_rcu(peer, rcu); @@ -243,6 +246,38 @@ struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id) return peer; }
+/** + * ovpn_peer_get_by_dst - Lookup peer to send skb to + * @ovpn: the private data representing the current VPN session + * @skb: the skb to extract the destination address from + * + * This function takes a tunnel packet and looks up the peer to send it to + * after encapsulation. The skb is expected to be the in-tunnel packet, without + * any OpenVPN related header. + * + * Assume that the IP header is accessible in the skb data. + * + * Return: the peer if found or NULL otherwise. + */ +struct ovpn_peer *ovpn_peer_get_by_dst(struct ovpn_struct *ovpn, + struct sk_buff *skb) +{ + struct ovpn_peer *peer = NULL; + + /* in P2P mode, no matter the destination, packets are always sent to + * the single peer listening on the other side + */ + if (ovpn->mode == OVPN_MODE_P2P) { + rcu_read_lock(); + peer = rcu_dereference(ovpn->peer); + if (unlikely(peer && !ovpn_peer_hold(peer))) + peer = NULL; + rcu_read_unlock(); + } + + return peer; +} + /** * ovpn_peer_add_p2p - add peer to related tables in a P2P instance * @ovpn: the instance to add the peer to diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h index 6e0c6b14559de886d0677117f5a7ae029214e1f8..51955aa39f1aa85ce541e289c60e9635cadb9c48 100644 --- a/drivers/net/ovpn/peer.h +++ b/drivers/net/ovpn/peer.h @@ -19,6 +19,7 @@ * @vpn_addrs: IP addresses assigned over the tunnel * @vpn_addrs.ipv4: IPv4 assigned to peer on the tunnel * @vpn_addrs.ipv6: IPv6 assigned to peer on the tunnel + * @sock: the socket being used to talk to this peer * @dst_cache: cache for dst_entry used to send to peer * @bind: remote peer binding * @halt: true if ovpn_peer_mark_delete was called @@ -35,6 +36,7 @@ struct ovpn_peer { struct in_addr ipv4; struct in6_addr ipv6; } vpn_addrs; + struct ovpn_socket *sock; struct dst_cache dst_cache; struct ovpn_bind __rcu *bind; bool halt; @@ -75,5 +77,7 @@ void ovpn_peer_release_p2p(struct ovpn_struct *ovpn); struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn, struct sk_buff *skb); struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id); +struct ovpn_peer *ovpn_peer_get_by_dst(struct ovpn_struct *ovpn, + struct sk_buff *skb);
#endif /* _NET_OVPN_OVPNPEER_H_ */ diff --git a/drivers/net/ovpn/skb.h b/drivers/net/ovpn/skb.h new file mode 100644 index 0000000000000000000000000000000000000000..e070fe6f448c0b7a9631394ebef4554f6348ef44 --- /dev/null +++ b/drivers/net/ovpn/skb.h @@ -0,0 +1,51 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* OpenVPN data channel offload + * + * Copyright (C) 2020-2024 OpenVPN, Inc. + * + * Author: Antonio Quartulli antonio@openvpn.net + * James Yonan james@openvpn.net + */ + +#ifndef _NET_OVPN_SKB_H_ +#define _NET_OVPN_SKB_H_ + +#include <linux/in.h> +#include <linux/in6.h> +#include <linux/ip.h> +#include <linux/skbuff.h> +#include <linux/socket.h> +#include <linux/types.h> + +struct ovpn_cb { + struct ovpn_peer *peer; +}; + +static inline struct ovpn_cb *ovpn_skb_cb(struct sk_buff *skb) +{ + BUILD_BUG_ON(sizeof(struct ovpn_cb) > sizeof(skb->cb)); + return (struct ovpn_cb *)skb->cb; +} + +/* Return IP protocol version from skb header. + * Return 0 if protocol is not IPv4/IPv6 or cannot be read. + */ +static inline __be16 ovpn_ip_check_protocol(struct sk_buff *skb) +{ + __be16 proto = 0; + + /* skb could be non-linear, + * make sure IP header is in non-fragmented part + */ + if (!pskb_network_may_pull(skb, sizeof(struct iphdr))) + return 0; + + if (ip_hdr(skb)->version == 4) + proto = htons(ETH_P_IP); + else if (ip_hdr(skb)->version == 6) + proto = htons(ETH_P_IPV6); + + return proto; +} + +#endif /* _NET_OVPN_SKB_H_ */ diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c index c10474d252e19a0626d17a6f5dd328a5e5811551..d26d7566e9c8dfe91fa77f49c34fb179a9fb2239 100644 --- a/drivers/net/ovpn/udp.c +++ b/drivers/net/ovpn/udp.c @@ -7,14 +7,246 @@ */
#include <linux/netdevice.h> +#include <linux/inetdevice.h> #include <linux/socket.h> +#include <net/addrconf.h> +#include <net/dst_cache.h> +#include <net/route.h> +#include <net/ipv6_stubs.h> #include <net/udp.h> +#include <net/udp_tunnel.h>
#include "ovpnstruct.h" #include "main.h" +#include "bind.h" +#include "io.h" +#include "peer.h" #include "socket.h" #include "udp.h"
+/** + * ovpn_udp4_output - send IPv4 packet over udp socket + * @ovpn: the openvpn instance + * @bind: the binding related to the destination peer + * @cache: dst cache + * @sk: the socket to send the packet over + * @skb: the packet to send + * + * Return: 0 on success or a negative error code otherwise + */ +static int ovpn_udp4_output(struct ovpn_struct *ovpn, struct ovpn_bind *bind, + struct dst_cache *cache, struct sock *sk, + struct sk_buff *skb) +{ + struct rtable *rt; + struct flowi4 fl = { + .saddr = bind->local.ipv4.s_addr, + .daddr = bind->remote.in4.sin_addr.s_addr, + .fl4_sport = inet_sk(sk)->inet_sport, + .fl4_dport = bind->remote.in4.sin_port, + .flowi4_proto = sk->sk_protocol, + .flowi4_mark = sk->sk_mark, + }; + int ret; + + local_bh_disable(); + rt = dst_cache_get_ip4(cache, &fl.saddr); + if (rt) + goto transmit; + + if (unlikely(!inet_confirm_addr(sock_net(sk), NULL, 0, fl.saddr, + RT_SCOPE_HOST))) { + /* we may end up here when the cached address is not usable + * anymore. In this case we reset address/cache and perform a + * new look up + */ + fl.saddr = 0; + bind->local.ipv4.s_addr = 0; + dst_cache_reset(cache); + } + + rt = ip_route_output_flow(sock_net(sk), &fl, sk); + if (IS_ERR(rt) && PTR_ERR(rt) == -EINVAL) { + fl.saddr = 0; + bind->local.ipv4.s_addr = 0; + dst_cache_reset(cache); + + rt = ip_route_output_flow(sock_net(sk), &fl, sk); + } + + if (IS_ERR(rt)) { + ret = PTR_ERR(rt); + net_dbg_ratelimited("%s: no route to host %pISpc: %d\n", + ovpn->dev->name, &bind->remote.in4, ret); + goto err; + } + dst_cache_set_ip4(cache, &rt->dst, fl.saddr); + +transmit: + udp_tunnel_xmit_skb(rt, sk, skb, fl.saddr, fl.daddr, 0, + ip4_dst_hoplimit(&rt->dst), 0, fl.fl4_sport, + fl.fl4_dport, false, sk->sk_no_check_tx); + ret = 0; +err: + local_bh_enable(); + return ret; +} + +#if IS_ENABLED(CONFIG_IPV6) +/** + * ovpn_udp6_output - send IPv6 packet over udp socket + * @ovpn: the openvpn instance + * @bind: the binding related to the destination peer + * @cache: dst cache + * @sk: the socket to send the packet over + * @skb: the packet to send + * + * Return: 0 on success or a negative error code otherwise + */ +static int ovpn_udp6_output(struct ovpn_struct *ovpn, struct ovpn_bind *bind, + struct dst_cache *cache, struct sock *sk, + struct sk_buff *skb) +{ + struct dst_entry *dst; + int ret; + + struct flowi6 fl = { + .saddr = bind->local.ipv6, + .daddr = bind->remote.in6.sin6_addr, + .fl6_sport = inet_sk(sk)->inet_sport, + .fl6_dport = bind->remote.in6.sin6_port, + .flowi6_proto = sk->sk_protocol, + .flowi6_mark = sk->sk_mark, + .flowi6_oif = bind->remote.in6.sin6_scope_id, + }; + + local_bh_disable(); + dst = dst_cache_get_ip6(cache, &fl.saddr); + if (dst) + goto transmit; + + if (unlikely(!ipv6_chk_addr(sock_net(sk), &fl.saddr, NULL, 0))) { + /* we may end up here when the cached address is not usable + * anymore. 
In this case we reset address/cache and perform a + * new look up + */ + fl.saddr = in6addr_any; + bind->local.ipv6 = in6addr_any; + dst_cache_reset(cache); + } + + dst = ipv6_stub->ipv6_dst_lookup_flow(sock_net(sk), sk, &fl, NULL); + if (IS_ERR(dst)) { + ret = PTR_ERR(dst); + net_dbg_ratelimited("%s: no route to host %pISpc: %d\n", + ovpn->dev->name, &bind->remote.in6, ret); + goto err; + } + dst_cache_set_ip6(cache, dst, &fl.saddr); + +transmit: + udp_tunnel6_xmit_skb(dst, sk, skb, skb->dev, &fl.saddr, &fl.daddr, 0, + ip6_dst_hoplimit(dst), 0, fl.fl6_sport, + fl.fl6_dport, udp_get_no_check6_tx(sk)); + ret = 0; +err: + local_bh_enable(); + return ret; +} +#endif + +/** + * ovpn_udp_output - transmit skb using udp-tunnel + * @ovpn: the openvpn instance + * @bind: the binding related to the destination peer + * @cache: dst cache + * @sk: the socket to send the packet over + * @skb: the packet to send + * + * rcu_read_lock should be held on entry. + * On return, the skb is consumed. + * + * Return: 0 on success or a negative error code otherwise + */ +static int ovpn_udp_output(struct ovpn_struct *ovpn, struct ovpn_bind *bind, + struct dst_cache *cache, struct sock *sk, + struct sk_buff *skb) +{ + int ret; + + /* set sk to null if skb is already orphaned */ + if (!skb->destructor) + skb->sk = NULL; + + /* always permit openvpn-created packets to be (outside) fragmented */ + skb->ignore_df = 1; + + switch (bind->remote.in4.sin_family) { + case AF_INET: + ret = ovpn_udp4_output(ovpn, bind, cache, sk, skb); + break; +#if IS_ENABLED(CONFIG_IPV6) + case AF_INET6: + ret = ovpn_udp6_output(ovpn, bind, cache, sk, skb); + break; +#endif + default: + ret = -EAFNOSUPPORT; + break; + } + + return ret; +} + +/** + * ovpn_udp_send_skb - prepare skb and send it over via UDP + * @ovpn: the openvpn instance + * @peer: the destination peer + * @skb: the packet to send + */ +void ovpn_udp_send_skb(struct ovpn_struct *ovpn, struct ovpn_peer *peer, + struct sk_buff *skb) +{ + struct ovpn_bind *bind; + unsigned int pkt_len; + struct socket *sock; + int ret = -1; + + skb->dev = ovpn->dev; + /* no checksum performed at this layer */ + skb->ip_summed = CHECKSUM_NONE; + + /* get socket info */ + sock = peer->sock->sock; + if (unlikely(!sock)) { + net_warn_ratelimited("%s: no sock for remote peer\n", __func__); + goto out; + } + + rcu_read_lock(); + /* get binding */ + bind = rcu_dereference(peer->bind); + if (unlikely(!bind)) { + net_warn_ratelimited("%s: no bind for remote peer\n", __func__); + goto out_unlock; + } + + /* crypto layer -> transport (UDP) */ + pkt_len = skb->len; + ret = ovpn_udp_output(ovpn, bind, &peer->dst_cache, sock->sk, skb); + +out_unlock: + rcu_read_unlock(); +out: + if (unlikely(ret < 0)) { + dev_core_stats_tx_dropped_inc(ovpn->dev); + kfree_skb(skb); + return; + } + + dev_sw_netstats_tx_add(ovpn->dev, 1, pkt_len); +} + /** * ovpn_udp_socket_attach - set udp-tunnel CBs on socket and link it to ovpn * @sock: socket to configure diff --git a/drivers/net/ovpn/udp.h b/drivers/net/ovpn/udp.h index f2507f8f2c71ea9d5e5ac5446801e2d56f86700f..e60f8cd2b4ac8f910aabcf8ed546af59d6ca4be4 100644 --- a/drivers/net/ovpn/udp.h +++ b/drivers/net/ovpn/udp.h @@ -9,9 +9,17 @@ #ifndef _NET_OVPN_UDP_H_ #define _NET_OVPN_UDP_H_
+#include <linux/skbuff.h> +#include <net/sock.h> + +struct ovpn_peer; struct ovpn_struct; +struct sk_buff; struct socket;
int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn);
+void ovpn_udp_send_skb(struct ovpn_struct *ovpn, struct ovpn_peer *peer, + struct sk_buff *skb); + #endif /* _NET_OVPN_UDP_H_ */
2024-10-29, 11:47:21 +0100, Antonio Quartulli wrote:
+static void ovpn_send(struct ovpn_struct *ovpn, struct sk_buff *skb,
struct ovpn_peer *peer)
+{
- struct sk_buff *curr, *next;
- if (likely(!peer))
/* retrieve peer serving the destination IP of this packet */
peer = ovpn_peer_get_by_dst(ovpn, skb);
- if (unlikely(!peer)) {
net_dbg_ratelimited("%s: no peer to send data to\n",
ovpn->dev->name);
dev_core_stats_tx_dropped_inc(ovpn->dev);
goto drop;
- }
- /* this might be a GSO-segmented skb list: process each skb
* independently
*/
- skb_list_walk_safe(skb, curr, next)
nit (if you end up reposting): there should probably be some braces around the (multi-line) loop body.
if (unlikely(!ovpn_encrypt_one(peer, curr))) {
dev_core_stats_tx_dropped_inc(ovpn->dev);
kfree_skb(curr);
}
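i.e. with the loop body braced, roughly (sketch of the nit, not the actual patch):

	skb_list_walk_safe(skb, curr, next) {
		if (unlikely(!ovpn_encrypt_one(peer, curr))) {
			dev_core_stats_tx_dropped_inc(ovpn->dev);
			kfree_skb(curr);
		}
	}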
+void ovpn_udp_send_skb(struct ovpn_struct *ovpn, struct ovpn_peer *peer,
struct sk_buff *skb)
+{
[...]
- /* crypto layer -> transport (UDP) */
- pkt_len = skb->len;
- ret = ovpn_udp_output(ovpn, bind, &peer->dst_cache, sock->sk, skb);
+out_unlock:
- rcu_read_unlock();
+out:
- if (unlikely(ret < 0)) {
dev_core_stats_tx_dropped_inc(ovpn->dev);
kfree_skb(skb);
return;
- }
- dev_sw_netstats_tx_add(ovpn->dev, 1, pkt_len);
If I'm following things correctly, that's already been counted:
ovpn_udp_output -> ovpn_udp4_output -> udp_tunnel_xmit_skb -> iptunnel_xmit -> iptunnel_xmit_stats
which does (on success) the same thing as dev_sw_netstats_tx_add(). On failure it increments a different tx_dropped counter than dev_core_stats_tx_dropped_inc() does, but they should get summed in the end.
+}
On 30/10/2024 18:14, Sabrina Dubroca wrote:
2024-10-29, 11:47:21 +0100, Antonio Quartulli wrote:
+static void ovpn_send(struct ovpn_struct *ovpn, struct sk_buff *skb,
struct ovpn_peer *peer)
+{
- struct sk_buff *curr, *next;
- if (likely(!peer))
/* retrieve peer serving the destination IP of this packet */
peer = ovpn_peer_get_by_dst(ovpn, skb);
- if (unlikely(!peer)) {
net_dbg_ratelimited("%s: no peer to send data to\n",
ovpn->dev->name);
dev_core_stats_tx_dropped_inc(ovpn->dev);
goto drop;
- }
- /* this might be a GSO-segmented skb list: process each skb
* independently
*/
- skb_list_walk_safe(skb, curr, next)
nit (if you end up reposting): there should probably be some braces around the (multi-line) loop body.
ACK
if (unlikely(!ovpn_encrypt_one(peer, curr))) {
dev_core_stats_tx_dropped_inc(ovpn->dev);
kfree_skb(curr);
}
+void ovpn_udp_send_skb(struct ovpn_struct *ovpn, struct ovpn_peer *peer,
struct sk_buff *skb)
+{
[...]
- /* crypto layer -> transport (UDP) */
- pkt_len = skb->len;
- ret = ovpn_udp_output(ovpn, bind, &peer->dst_cache, sock->sk, skb);
+out_unlock:
- rcu_read_unlock();
+out:
- if (unlikely(ret < 0)) {
dev_core_stats_tx_dropped_inc(ovpn->dev);
kfree_skb(skb);
return;
- }
- dev_sw_netstats_tx_add(ovpn->dev, 1, pkt_len);
If I'm following things correctly, that's already been counted:
ovpn_udp_output -> ovpn_udp4_output -> udp_tunnel_xmit_skb -> iptunnel_xmit -> iptunnel_xmit_stats
which does (on success) the same thing as dev_sw_netstats_tx_add. On
Right. This means we can remove that call to tx_add().
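e.g. the tail of ovpn_udp_send_skb() would then become roughly this (sketch of this change only; the dropped-packet counter is discussed just below):

	/* crypto layer -> transport (UDP) */
	ret = ovpn_udp_output(ovpn, bind, &peer->dst_cache, sock->sk, skb);

out_unlock:
	rcu_read_unlock();
out:
	if (unlikely(ret < 0)) {
		dev_core_stats_tx_dropped_inc(ovpn->dev);
		kfree_skb(skb);
	}
	/* on success, tx bytes/packets are already accounted for by
	 * udp_tunnel_xmit_skb() -> iptunnel_xmit() -> iptunnel_xmit_stats(),
	 * so no extra dev_sw_netstats_tx_add() is needed here
	 */
}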
failure it increments a different tx_dropped counter than dev_core_stats_tx_dropped_inc() does, but they should get summed in the end.
It seems they are summed up in dev_get_tstats64(), therefore I should remove the tx_dropped_inc() call to avoid double counting.
Thanks!
Cheers,
+}
On 29.10.2024 12:47, Antonio Quartulli wrote:
Packets sent over the ovpn interface are processed and transmitted to the connected peer, if any.
Implementation is UDP only. TCP will be added by a later patch.
Note: no crypto/encapsulation exists yet. Packets are just captured and sent.
Signed-off-by: Antonio Quartulli antonio@openvpn.net
drivers/net/ovpn/io.c | 138 +++++++++++++++++++++++++++- drivers/net/ovpn/peer.c | 37 +++++++- drivers/net/ovpn/peer.h | 4 + drivers/net/ovpn/skb.h | 51 +++++++++++ drivers/net/ovpn/udp.c | 232 ++++++++++++++++++++++++++++++++++++++++++++++++ drivers/net/ovpn/udp.h | 8 ++ 6 files changed, 468 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c index ad3813419c33cbdfe7e8ad6f5c8b444a3540a69f..77ba4d33ae0bd2f52e8bd1c06a182d24285297b4 100644 --- a/drivers/net/ovpn/io.c +++ b/drivers/net/ovpn/io.c @@ -9,14 +9,150 @@ #include <linux/netdevice.h> #include <linux/skbuff.h> +#include <net/gso.h> #include "io.h" +#include "ovpnstruct.h" +#include "peer.h" +#include "udp.h" +#include "skb.h" +#include "socket.h"
+static void ovpn_encrypt_post(struct sk_buff *skb, int ret) +{
- struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer;
- if (unlikely(ret < 0))
goto err;
- skb_mark_not_on_list(skb);
- switch (peer->sock->sock->sk->sk_protocol) {
- case IPPROTO_UDP:
ovpn_udp_send_skb(peer->ovpn, peer, skb);
break;
- default:
/* no transport configured yet */
goto err;
- }
Did you consider calling the protocol-specific sending function indirectly? E.g.:
peer->sock->send(peer, skb);
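Something along these lines could work (hypothetical sketch; the 'send' member name and its signature are made up for illustration and mirror ovpn_udp_send_skb()):

struct ovpn_socket {
	struct ovpn_struct *ovpn;
	struct socket *sock;
	/* hypothetical per-transport TX hook, assigned at attach time
	 * (e.g. to ovpn_udp_send_skb() for UDP sockets)
	 */
	void (*send)(struct ovpn_struct *ovpn, struct ovpn_peer *peer,
		     struct sk_buff *skb);
	struct kref refcount;
	struct rcu_head rcu;
};

/* ovpn_encrypt_post() would then just do:
 *	peer->sock->send(peer->ovpn, peer, skb);
 */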
- /* skb passed down the stack - don't free it */
- skb = NULL;
+err:
- if (unlikely(skb))
dev_core_stats_tx_dropped_inc(peer->ovpn->dev);
- ovpn_peer_put(peer);
- kfree_skb(skb);
+}
+static bool ovpn_encrypt_one(struct ovpn_peer *peer, struct sk_buff *skb) +{
- ovpn_skb_cb(skb)->peer = peer;
- /* take a reference to the peer because the crypto code may run async.
* ovpn_encrypt_post() will release it upon completion
*/
- if (unlikely(!ovpn_peer_hold(peer))) {
DEBUG_NET_WARN_ON_ONCE(1);
return false;
- }
- ovpn_encrypt_post(skb, 0);
- return true;
+}
+/* send skb to connected peer, if any */ +static void ovpn_send(struct ovpn_struct *ovpn, struct sk_buff *skb,
struct ovpn_peer *peer)
+{
- struct sk_buff *curr, *next;
- if (likely(!peer))
/* retrieve peer serving the destination IP of this packet */
peer = ovpn_peer_get_by_dst(ovpn, skb);
- if (unlikely(!peer)) {
net_dbg_ratelimited("%s: no peer to send data to\n",
ovpn->dev->name);
dev_core_stats_tx_dropped_inc(ovpn->dev);
goto drop;
- }
The function is called only from ovpn_xmit_special() and from ovpn_net_xmit(). The keepalive always provides a peer object, while ovpn_net_xmit() never does. If we move the peer lookup call into ovpn_net_xmit(), then we can eliminate all the above peer checks.
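With the lookup moved to ovpn_net_xmit(), ovpn_send() could shrink to something like this (sketch of the suggestion, not the actual patch):

static void ovpn_send(struct ovpn_struct *ovpn, struct sk_buff *skb,
		      struct ovpn_peer *peer)
{
	struct sk_buff *curr, *next;

	/* the caller is expected to pass a valid peer and to hold a
	 * reference to it; the reference is released here once all
	 * segments have been handed to the crypto/transport layer
	 */
	skb_list_walk_safe(skb, curr, next) {
		if (unlikely(!ovpn_encrypt_one(peer, curr))) {
			dev_core_stats_tx_dropped_inc(ovpn->dev);
			kfree_skb(curr);
		}
	}

	ovpn_peer_put(peer);
}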
- /* this might be a GSO-segmented skb list: process each skb
* independently
*/
- skb_list_walk_safe(skb, curr, next)
if (unlikely(!ovpn_encrypt_one(peer, curr))) {
dev_core_stats_tx_dropped_inc(ovpn->dev);
kfree_skb(curr);
}
- /* skb passed over, no need to free */
- skb = NULL;
+drop:
- if (likely(peer))
ovpn_peer_put(peer);
- kfree_skb_list(skb);
+} /* Send user data to the network */ netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev) {
- struct ovpn_struct *ovpn = netdev_priv(dev);
- struct sk_buff *segments, *curr, *next;
- struct sk_buff_head skb_list;
- __be16 proto;
- int ret;
- /* reset netfilter state */
- nf_reset_ct(skb);
- /* verify IP header size in network packet */
- proto = ovpn_ip_check_protocol(skb);
- if (unlikely(!proto || skb->protocol != proto)) {
net_err_ratelimited("%s: dropping malformed payload packet\n",
dev->name);
dev_core_stats_tx_dropped_inc(ovpn->dev);
goto drop;
- }
- if (skb_is_gso(skb)) {
segments = skb_gso_segment(skb, 0);
if (IS_ERR(segments)) {
ret = PTR_ERR(segments);
net_err_ratelimited("%s: cannot segment packet: %d\n",
dev->name, ret);
dev_core_stats_tx_dropped_inc(ovpn->dev);
goto drop;
}
consume_skb(skb);
skb = segments;
- }
- /* from this moment on, "skb" might be a list */
- __skb_queue_head_init(&skb_list);
- skb_list_walk_safe(skb, curr, next) {
skb_mark_not_on_list(curr);
curr = skb_share_check(curr, GFP_ATOMIC);
if (unlikely(!curr)) {
net_err_ratelimited("%s: skb_share_check failed\n",
dev->name);
dev_core_stats_tx_dropped_inc(ovpn->dev);
continue;
}
__skb_queue_tail(&skb_list, curr);
- }
- skb_list.prev->next = NULL;
I believe the peer lookup should be done here to call ovpn_send() with a proper peer object and simplify it.
- ovpn_send(ovpn, skb_list.next, NULL);
- return NETDEV_TX_OK;
+drop: skb_tx_error(skb);
- kfree_skb(skb);
- kfree_skb_list(skb); return NET_XMIT_DROP; }
diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c index d9788a0cc99b5839c466c35d1b2266cc6b95fb72..aff3e9e99b7d2dd2fa68484d9a396d43f75a6d0b 100644 --- a/drivers/net/ovpn/peer.c +++ b/drivers/net/ovpn/peer.c @@ -16,6 +16,7 @@ #include "main.h" #include "netlink.h" #include "peer.h" +#include "socket.h" /**
- ovpn_peer_new - allocate and initialize a new peer object
@@ -64,8 +65,10 @@ struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id) */ static void ovpn_peer_release(struct ovpn_peer *peer) {
- ovpn_bind_reset(peer, NULL);
- if (peer->sock)
ovpn_socket_put(peer->sock);
- ovpn_bind_reset(peer, NULL); dst_cache_destroy(&peer->dst_cache); netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker); kfree_rcu(peer, rcu);
@@ -243,6 +246,38 @@ struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id) return peer; } +/**
- ovpn_peer_get_by_dst - Lookup peer to send skb to
- @ovpn: the private data representing the current VPN session
- @skb: the skb to extract the destination address from
- This function takes a tunnel packet and looks up the peer to send it to
- after encapsulation. The skb is expected to be the in-tunnel packet, without
- any OpenVPN related header.
- Assume that the IP header is accessible in the skb data.
- Return: the peer if found or NULL otherwise.
- */
+struct ovpn_peer *ovpn_peer_get_by_dst(struct ovpn_struct *ovpn,
struct sk_buff *skb)
+{
- struct ovpn_peer *peer = NULL;
- /* in P2P mode, no matter the destination, packets are always sent to
* the single peer listening on the other side
*/
- if (ovpn->mode == OVPN_MODE_P2P) {
rcu_read_lock();
peer = rcu_dereference(ovpn->peer);
if (unlikely(peer && !ovpn_peer_hold(peer)))
peer = NULL;
rcu_read_unlock();
- }
- return peer;
+}
- /**
- ovpn_peer_add_p2p - add peer to related tables in a P2P instance
- @ovpn: the instance to add the peer to
diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h index 6e0c6b14559de886d0677117f5a7ae029214e1f8..51955aa39f1aa85ce541e289c60e9635cadb9c48 100644 --- a/drivers/net/ovpn/peer.h +++ b/drivers/net/ovpn/peer.h @@ -19,6 +19,7 @@
- @vpn_addrs: IP addresses assigned over the tunnel
- @vpn_addrs.ipv4: IPv4 assigned to peer on the tunnel
- @vpn_addrs.ipv6: IPv6 assigned to peer on the tunnel
- @sock: the socket being used to talk to this peer
- @dst_cache: cache for dst_entry used to send to peer
- @bind: remote peer binding
- @halt: true if ovpn_peer_mark_delete was called
@@ -35,6 +36,7 @@ struct ovpn_peer { struct in_addr ipv4; struct in6_addr ipv6; } vpn_addrs;
- struct ovpn_socket *sock; struct dst_cache dst_cache; struct ovpn_bind __rcu *bind; bool halt;
@@ -75,5 +77,7 @@ void ovpn_peer_release_p2p(struct ovpn_struct *ovpn); struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn, struct sk_buff *skb); struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id); +struct ovpn_peer *ovpn_peer_get_by_dst(struct ovpn_struct *ovpn,
struct sk_buff *skb);
#endif /* _NET_OVPN_OVPNPEER_H_ */ diff --git a/drivers/net/ovpn/skb.h b/drivers/net/ovpn/skb.h new file mode 100644 index 0000000000000000000000000000000000000000..e070fe6f448c0b7a9631394ebef4554f6348ef44 --- /dev/null +++ b/drivers/net/ovpn/skb.h @@ -0,0 +1,51 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* OpenVPN data channel offload
- Copyright (C) 2020-2024 OpenVPN, Inc.
- Author: Antonio Quartulli antonio@openvpn.net
James Yonan <james@openvpn.net>
- */
+#ifndef _NET_OVPN_SKB_H_ +#define _NET_OVPN_SKB_H_
+#include <linux/in.h> +#include <linux/in6.h> +#include <linux/ip.h> +#include <linux/skbuff.h> +#include <linux/socket.h> +#include <linux/types.h>
+struct ovpn_cb {
- struct ovpn_peer *peer;
+};
+static inline struct ovpn_cb *ovpn_skb_cb(struct sk_buff *skb) +{
- BUILD_BUG_ON(sizeof(struct ovpn_cb) > sizeof(skb->cb));
- return (struct ovpn_cb *)skb->cb;
+}
+/* Return IP protocol version from skb header.
- Return 0 if protocol is not IPv4/IPv6 or cannot be read.
- */
+static inline __be16 ovpn_ip_check_protocol(struct sk_buff *skb) +{
- __be16 proto = 0;
- /* skb could be non-linear,
* make sure IP header is in non-fragmented part
*/
- if (!pskb_network_may_pull(skb, sizeof(struct iphdr)))
return 0;
- if (ip_hdr(skb)->version == 4)
proto = htons(ETH_P_IP);
- else if (ip_hdr(skb)->version == 6)
proto = htons(ETH_P_IPV6);
- return proto;
+}
+#endif /* _NET_OVPN_SKB_H_ */ diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c index c10474d252e19a0626d17a6f5dd328a5e5811551..d26d7566e9c8dfe91fa77f49c34fb179a9fb2239 100644 --- a/drivers/net/ovpn/udp.c +++ b/drivers/net/ovpn/udp.c @@ -7,14 +7,246 @@ */ #include <linux/netdevice.h> +#include <linux/inetdevice.h> #include <linux/socket.h> +#include <net/addrconf.h> +#include <net/dst_cache.h> +#include <net/route.h> +#include <net/ipv6_stubs.h> #include <net/udp.h> +#include <net/udp_tunnel.h> #include "ovpnstruct.h" #include "main.h" +#include "bind.h" +#include "io.h" +#include "peer.h" #include "socket.h" #include "udp.h" +/**
- ovpn_udp4_output - send IPv4 packet over udp socket
- @ovpn: the openvpn instance
- @bind: the binding related to the destination peer
- @cache: dst cache
- @sk: the socket to send the packet over
- @skb: the packet to send
- Return: 0 on success or a negative error code otherwise
- */
+static int ovpn_udp4_output(struct ovpn_struct *ovpn, struct ovpn_bind *bind,
struct dst_cache *cache, struct sock *sk,
struct sk_buff *skb)
+{
- struct rtable *rt;
- struct flowi4 fl = {
.saddr = bind->local.ipv4.s_addr,
.daddr = bind->remote.in4.sin_addr.s_addr,
.fl4_sport = inet_sk(sk)->inet_sport,
.fl4_dport = bind->remote.in4.sin_port,
.flowi4_proto = sk->sk_protocol,
.flowi4_mark = sk->sk_mark,
- };
- int ret;
- local_bh_disable();
- rt = dst_cache_get_ip4(cache, &fl.saddr);
- if (rt)
goto transmit;
- if (unlikely(!inet_confirm_addr(sock_net(sk), NULL, 0, fl.saddr,
RT_SCOPE_HOST))) {
/* we may end up here when the cached address is not usable
* anymore. In this case we reset address/cache and perform a
* new look up
*/
fl.saddr = 0;
bind->local.ipv4.s_addr = 0;
dst_cache_reset(cache);
- }
- rt = ip_route_output_flow(sock_net(sk), &fl, sk);
- if (IS_ERR(rt) && PTR_ERR(rt) == -EINVAL) {
fl.saddr = 0;
bind->local.ipv4.s_addr = 0;
dst_cache_reset(cache);
rt = ip_route_output_flow(sock_net(sk), &fl, sk);
- }
- if (IS_ERR(rt)) {
ret = PTR_ERR(rt);
net_dbg_ratelimited("%s: no route to host %pISpc: %d\n",
ovpn->dev->name, &bind->remote.in4, ret);
goto err;
- }
- dst_cache_set_ip4(cache, &rt->dst, fl.saddr);
+transmit:
- udp_tunnel_xmit_skb(rt, sk, skb, fl.saddr, fl.daddr, 0,
ip4_dst_hoplimit(&rt->dst), 0, fl.fl4_sport,
fl.fl4_dport, false, sk->sk_no_check_tx);
- ret = 0;
+err:
- local_bh_enable();
- return ret;
+}
+#if IS_ENABLED(CONFIG_IPV6) +/**
- ovpn_udp6_output - send IPv6 packet over udp socket
- @ovpn: the openvpn instance
- @bind: the binding related to the destination peer
- @cache: dst cache
- @sk: the socket to send the packet over
- @skb: the packet to send
- Return: 0 on success or a negative error code otherwise
- */
+static int ovpn_udp6_output(struct ovpn_struct *ovpn, struct ovpn_bind *bind,
struct dst_cache *cache, struct sock *sk,
struct sk_buff *skb)
+{
- struct dst_entry *dst;
- int ret;
- struct flowi6 fl = {
.saddr = bind->local.ipv6,
.daddr = bind->remote.in6.sin6_addr,
.fl6_sport = inet_sk(sk)->inet_sport,
.fl6_dport = bind->remote.in6.sin6_port,
.flowi6_proto = sk->sk_protocol,
.flowi6_mark = sk->sk_mark,
.flowi6_oif = bind->remote.in6.sin6_scope_id,
- };
- local_bh_disable();
- dst = dst_cache_get_ip6(cache, &fl.saddr);
- if (dst)
goto transmit;
- if (unlikely(!ipv6_chk_addr(sock_net(sk), &fl.saddr, NULL, 0))) {
/* we may end up here when the cached address is not usable
* anymore. In this case we reset address/cache and perform a
* new look up
*/
fl.saddr = in6addr_any;
bind->local.ipv6 = in6addr_any;
dst_cache_reset(cache);
- }
- dst = ipv6_stub->ipv6_dst_lookup_flow(sock_net(sk), sk, &fl, NULL);
- if (IS_ERR(dst)) {
ret = PTR_ERR(dst);
net_dbg_ratelimited("%s: no route to host %pISpc: %d\n",
ovpn->dev->name, &bind->remote.in6, ret);
goto err;
- }
- dst_cache_set_ip6(cache, dst, &fl.saddr);
+transmit:
- udp_tunnel6_xmit_skb(dst, sk, skb, skb->dev, &fl.saddr, &fl.daddr, 0,
ip6_dst_hoplimit(dst), 0, fl.fl6_sport,
fl.fl6_dport, udp_get_no_check6_tx(sk));
- ret = 0;
+err:
- local_bh_enable();
- return ret;
+} +#endif
+/**
- ovpn_udp_output - transmit skb using udp-tunnel
- @ovpn: the openvpn instance
- @bind: the binding related to the destination peer
- @cache: dst cache
- @sk: the socket to send the packet over
- @skb: the packet to send
- rcu_read_lock should be held on entry.
- On return, the skb is consumed.
- Return: 0 on success or a negative error code otherwise
- */
+static int ovpn_udp_output(struct ovpn_struct *ovpn, struct ovpn_bind *bind,
struct dst_cache *cache, struct sock *sk,
struct sk_buff *skb)
+{
- int ret;
- /* set sk to null if skb is already orphaned */
- if (!skb->destructor)
skb->sk = NULL;
- /* always permit openvpn-created packets to be (outside) fragmented */
- skb->ignore_df = 1;
- switch (bind->remote.in4.sin_family) {
- case AF_INET:
ret = ovpn_udp4_output(ovpn, bind, cache, sk, skb);
break;
+#if IS_ENABLED(CONFIG_IPV6)
- case AF_INET6:
ret = ovpn_udp6_output(ovpn, bind, cache, sk, skb);
break;
+#endif
- default:
ret = -EAFNOSUPPORT;
break;
- }
- return ret;
+}
+/**
- ovpn_udp_send_skb - prepare skb and send it over via UDP
- @ovpn: the openvpn instance
- @peer: the destination peer
- @skb: the packet to send
- */
+void ovpn_udp_send_skb(struct ovpn_struct *ovpn, struct ovpn_peer *peer,
struct sk_buff *skb)
+{
- struct ovpn_bind *bind;
- unsigned int pkt_len;
- struct socket *sock;
- int ret = -1;
- skb->dev = ovpn->dev;
- /* no checksum performed at this layer */
- skb->ip_summed = CHECKSUM_NONE;
- /* get socket info */
- sock = peer->sock->sock;
- if (unlikely(!sock)) {
net_warn_ratelimited("%s: no sock for remote peer\n", __func__);
If we do not have netdev_{err,warn,etc}_ratelimited() helper functions, can we at least emulate it like this:
net_warn_ratelimited("%s: no UDP sock for remote peer #%u\n", netdev_name(ovpn->dev), peer->id);
or just use netdev_warn_once(...) since the condition looks more speculative than expected.
Peer id and interface name are more informative than just a function name.
goto out;
- }
- rcu_read_lock();
- /* get binding */
- bind = rcu_dereference(peer->bind);
- if (unlikely(!bind)) {
net_warn_ratelimited("%s: no bind for remote peer\n", __func__);
Ditto
goto out_unlock;
- }
- /* crypto layer -> transport (UDP) */
- pkt_len = skb->len;
- ret = ovpn_udp_output(ovpn, bind, &peer->dst_cache, sock->sk, skb);
+out_unlock:
- rcu_read_unlock();
+out:
- if (unlikely(ret < 0)) {
dev_core_stats_tx_dropped_inc(ovpn->dev);
kfree_skb(skb);
return;
- }
- dev_sw_netstats_tx_add(ovpn->dev, 1, pkt_len);
+}
- /**
- ovpn_udp_socket_attach - set udp-tunnel CBs on socket and link it to ovpn
- @sock: socket to configure
diff --git a/drivers/net/ovpn/udp.h b/drivers/net/ovpn/udp.h index f2507f8f2c71ea9d5e5ac5446801e2d56f86700f..e60f8cd2b4ac8f910aabcf8ed546af59d6ca4be4 100644 --- a/drivers/net/ovpn/udp.h +++ b/drivers/net/ovpn/udp.h @@ -9,9 +9,17 @@ #ifndef _NET_OVPN_UDP_H_ #define _NET_OVPN_UDP_H_ +#include <linux/skbuff.h> +#include <net/sock.h>
+struct ovpn_peer; struct ovpn_struct; +struct sk_buff;
This declaration looks odd since we already have skbuff.h included above.
struct socket; int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn); +void ovpn_udp_send_skb(struct ovpn_struct *ovpn, struct ovpn_peer *peer,
struct sk_buff *skb);
- #endif /* _NET_OVPN_UDP_H_ */
2024-11-11, 00:32:51 +0200, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
+static void ovpn_encrypt_post(struct sk_buff *skb, int ret) +{
- struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer;
- if (unlikely(ret < 0))
goto err;
- skb_mark_not_on_list(skb);
- switch (peer->sock->sock->sk->sk_protocol) {
- case IPPROTO_UDP:
ovpn_udp_send_skb(peer->ovpn, peer, skb);
break;
- default:
/* no transport configured yet */
goto err;
- }
Did you consider calling the protocol-specific sending function indirectly? E.g.:
peer->sock->send(peer, skb);
In a case where:
- only 2 implementations exist
- no other implementation is likely to be added in the future
- both implementations are part of the same module
I don't think indirect calls are beneficial (especially after the meltdown/etc mitigations, see for example 4f24ed77dec9 ("udp: use indirect call wrappers for GRO socket lookup"), 0e219ae48c3b ("net: use indirect calls helpers for L3 handler hooks"), and many others similar patches).
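For reference, a minimal sketch (names assumed, not part of this series) of how the indirect call wrappers used by the cited commits keep the common case a direct call, should a transport callback ever be introduced:

#include <linux/indirect_call_wrapper.h>
#include <linux/skbuff.h>

struct ovpn_peer;

/* hypothetical per-transport hooks, declared here only for the sketch */
void ovpn_udp_send(struct ovpn_peer *peer, struct sk_buff *skb);
void ovpn_tcp_send(struct ovpn_peer *peer, struct sk_buff *skb);

static void ovpn_transport_send(void (*send)(struct ovpn_peer *,
					     struct sk_buff *),
				struct ovpn_peer *peer, struct sk_buff *skb)
{
	/* the pointer is compared against the listed targets first, so a
	 * match becomes a direct call and no retpoline is emitted for it
	 */
	INDIRECT_CALL_2(send, ovpn_udp_send, ovpn_tcp_send, peer, skb);
}

Even so, with only two in-module implementations, the plain switch on sk_protocol remains the simpler choice.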
[...]
- ovpn_send(ovpn, skb_list.next, NULL);
- return NETDEV_TX_OK;
+drop: skb_tx_error(skb);
- kfree_skb(skb);
- kfree_skb_list(skb); return NET_XMIT_DROP; }
diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c index d9788a0cc99b5839c466c35d1b2266cc6b95fb72..aff3e9e99b7d2dd2fa68484d9a396d43f75a6d0b 100644 --- a/drivers/net/ovpn/peer.c +++ b/drivers/net/ovpn/peer.c
[very long chunk of Antonio's patch quoted without comments]
Please trim your replies to only the necessary context.
On 10/11/2024 23:32, Sergey Ryazanov wrote: [...]
+/* send skb to connected peer, if any */ +static void ovpn_send(struct ovpn_struct *ovpn, struct sk_buff *skb, + struct ovpn_peer *peer) +{ + struct sk_buff *curr, *next;
+ if (likely(!peer)) + /* retrieve peer serving the destination IP of this packet */ + peer = ovpn_peer_get_by_dst(ovpn, skb); + if (unlikely(!peer)) { + net_dbg_ratelimited("%s: no peer to send data to\n", + ovpn->dev->name); + dev_core_stats_tx_dropped_inc(ovpn->dev); + goto drop; + }
The function is called only from ovpn_xmit_special() and from ovpn_net_xmit(). The keepalive path always provides a peer object, while ovpn_net_xmit() never does. If we move the peer lookup call into ovpn_net_xmit(), then we can eliminate all the above peer checks.
yeah, I think that's a good idea! See below..
+ /* this might be a GSO-segmented skb list: process each skb + * independently + */ + skb_list_walk_safe(skb, curr, next) + if (unlikely(!ovpn_encrypt_one(peer, curr))) { + dev_core_stats_tx_dropped_inc(ovpn->dev); + kfree_skb(curr); + }
+ /* skb passed over, no need to free */ + skb = NULL; +drop: + if (likely(peer)) + ovpn_peer_put(peer); + kfree_skb_list(skb); +}
..because this error path disappears as well.
And I can move the stats increment to ovpn_net_xmit() in order to avoid counting keepalive packets as vpn data.
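A rough sketch of what ovpn_send() could then shrink to, derived from the quoted code above with the NULL-peer handling dropped (not the final code):

/* send skb to the given peer; the caller did the lookup and passes a
 * valid reference, which is consumed here
 */
static void ovpn_send(struct ovpn_struct *ovpn, struct sk_buff *skb,
		      struct ovpn_peer *peer)
{
	struct sk_buff *curr, *next;

	/* this might be a GSO-segmented skb list: process each skb
	 * independently
	 */
	skb_list_walk_safe(skb, curr, next)
		if (unlikely(!ovpn_encrypt_one(peer, curr))) {
			dev_core_stats_tx_dropped_inc(ovpn->dev);
			kfree_skb(curr);
		}

	ovpn_peer_put(peer);
}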
/* Send user data to the network */ netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev) { + struct ovpn_struct *ovpn = netdev_priv(dev); + struct sk_buff *segments, *curr, *next; + struct sk_buff_head skb_list; + __be16 proto; + int ret;
+ /* reset netfilter state */ + nf_reset_ct(skb);
+ /* verify IP header size in network packet */ + proto = ovpn_ip_check_protocol(skb); + if (unlikely(!proto || skb->protocol != proto)) { + net_err_ratelimited("%s: dropping malformed payload packet\n", + dev->name); + dev_core_stats_tx_dropped_inc(ovpn->dev); + goto drop; + }
+ if (skb_is_gso(skb)) { + segments = skb_gso_segment(skb, 0); + if (IS_ERR(segments)) { + ret = PTR_ERR(segments); + net_err_ratelimited("%s: cannot segment packet: %d\n", + dev->name, ret); + dev_core_stats_tx_dropped_inc(ovpn->dev); + goto drop; + }
+ consume_skb(skb); + skb = segments; + }
+ /* from this moment on, "skb" might be a list */
+ __skb_queue_head_init(&skb_list); + skb_list_walk_safe(skb, curr, next) { + skb_mark_not_on_list(curr);
+ curr = skb_share_check(curr, GFP_ATOMIC); + if (unlikely(!curr)) { + net_err_ratelimited("%s: skb_share_check failed\n", + dev->name); + dev_core_stats_tx_dropped_inc(ovpn->dev); + continue; + }
+ __skb_queue_tail(&skb_list, curr); + } + skb_list.prev->next = NULL;
I believe the peer lookup should be done here, to call ovpn_send() with a proper peer object and simplify it.
ACK
+ ovpn_send(ovpn, skb_list.next, NULL);
+ return NETDEV_TX_OK;
+drop: skb_tx_error(skb); - kfree_skb(skb); + kfree_skb_list(skb); return NET_XMIT_DROP; }
[...]
+/**
- ovpn_udp_send_skb - prepare skb and send it over via UDP
- @ovpn: the openvpn instance
- @peer: the destination peer
- @skb: the packet to send
- */
+void ovpn_udp_send_skb(struct ovpn_struct *ovpn, struct ovpn_peer *peer, + struct sk_buff *skb) +{ + struct ovpn_bind *bind; + unsigned int pkt_len; + struct socket *sock; + int ret = -1;
+ skb->dev = ovpn->dev; + /* no checksum performed at this layer */ + skb->ip_summed = CHECKSUM_NONE;
+ /* get socket info */ + sock = peer->sock->sock; + if (unlikely(!sock)) { + net_warn_ratelimited("%s: no sock for remote peer\n", __func__);
If we do not have netdev_{err,warn,etc}_ratelimited() helper functions, can we at least emulate it like this:
net_warn_ratelimited("%s: no UDP sock for remote peer #%u\n", netdev_name(ovpn->dev), peer->id);
that's what I try to do, but some prints have escaped my axe. Will fix that, thanks!
or just use netdev_warn_once(...) since the condition looks more speculative than expected.
Peer id and interface name are more informative than just a function name.
Yeah, I use the function name in some debug messages, although not extremely useful.
Will make sure the iface name is always printed (there are similar occurrences like this)
+ goto out; + }
+ rcu_read_lock(); + /* get binding */ + bind = rcu_dereference(peer->bind); + if (unlikely(!bind)) { + net_warn_ratelimited("%s: no bind for remote peer\n", __func__);
Ditto
+ goto out_unlock; + }
+ /* crypto layer -> transport (UDP) */ + pkt_len = skb->len; + ret = ovpn_udp_output(ovpn, bind, &peer->dst_cache, sock->sk, skb);
+out_unlock: + rcu_read_unlock(); +out: + if (unlikely(ret < 0)) { + dev_core_stats_tx_dropped_inc(ovpn->dev); + kfree_skb(skb); + return; + }
+ dev_sw_netstats_tx_add(ovpn->dev, 1, pkt_len); +}
/** * ovpn_udp_socket_attach - set udp-tunnel CBs on socket and link it to ovpn * @sock: socket to configure diff --git a/drivers/net/ovpn/udp.h b/drivers/net/ovpn/udp.h index f2507f8f2c71ea9d5e5ac5446801e2d56f86700f..e60f8cd2b4ac8f910aabcf8ed546af59d6ca4be4 100644 --- a/drivers/net/ovpn/udp.h +++ b/drivers/net/ovpn/udp.h @@ -9,9 +9,17 @@ #ifndef _NET_OVPN_UDP_H_ #define _NET_OVPN_UDP_H_ +#include <linux/skbuff.h> +#include <net/sock.h>
+struct ovpn_peer; struct ovpn_struct; +struct sk_buff;
This declaration looks odd since we already have skbuff.h included above.
I believe originally there was no include, and then I needed to add it. Will double check.
Thanks a lot! Regards,
Another one forgotten question, sorry about this. Please find the question inlined.
On 29.10.2024 12:47, Antonio Quartulli wrote:
/* Send user data to the network */ netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev) {
- struct ovpn_struct *ovpn = netdev_priv(dev);
- struct sk_buff *segments, *curr, *next;
- struct sk_buff_head skb_list;
- __be16 proto;
- int ret;
- /* reset netfilter state */
- nf_reset_ct(skb);
- /* verify IP header size in network packet */
- proto = ovpn_ip_check_protocol(skb);
- if (unlikely(!proto || skb->protocol != proto)) {
net_err_ratelimited("%s: dropping malformed payload packet\n",
dev->name);
dev_core_stats_tx_dropped_inc(ovpn->dev);
goto drop;
- }
The above check implies that the kernel can feed a network device with an skb->protocol value that mismatches the actual skb content. Can you share an example of such a case?
If you just want to be sure that the user packet is either IPv4 or IPv6 then it can be done like this and without error messages:
/* Support only IPv4 or IPv6 traffic transporting */
if (unlikely(skb->protocol != ETH_P_IP && skb->protocol != ETH_P_IPV6))
        goto drop;
- if (skb_is_gso(skb)) {
segments = skb_gso_segment(skb, 0);
if (IS_ERR(segments)) {
ret = PTR_ERR(segments);
net_err_ratelimited("%s: cannot segment packet: %d\n",
dev->name, ret);
dev_core_stats_tx_dropped_inc(ovpn->dev);
goto drop;
}
consume_skb(skb);
skb = segments;
- }
- /* from this moment on, "skb" might be a list */
- __skb_queue_head_init(&skb_list);
- skb_list_walk_safe(skb, curr, next) {
skb_mark_not_on_list(curr);
curr = skb_share_check(curr, GFP_ATOMIC);
if (unlikely(!curr)) {
net_err_ratelimited("%s: skb_share_check failed\n",
dev->name);
dev_core_stats_tx_dropped_inc(ovpn->dev);
continue;
}
__skb_queue_tail(&skb_list, curr);
- }
- skb_list.prev->next = NULL;
- ovpn_send(ovpn, skb_list.next, NULL);
- return NETDEV_TX_OK;
+drop: skb_tx_error(skb);
- kfree_skb(skb);
- kfree_skb_list(skb); return NET_XMIT_DROP; }
-- Sergey
On 11/11/2024 00:54, Sergey Ryazanov wrote:
Another one forgotten question, sorry about this. Please find the question inlined.
On 29.10.2024 12:47, Antonio Quartulli wrote:
/* Send user data to the network */ netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev) { + struct ovpn_struct *ovpn = netdev_priv(dev); + struct sk_buff *segments, *curr, *next; + struct sk_buff_head skb_list; + __be16 proto; + int ret;
+ /* reset netfilter state */ + nf_reset_ct(skb);
+ /* verify IP header size in network packet */ + proto = ovpn_ip_check_protocol(skb); + if (unlikely(!proto || skb->protocol != proto)) { + net_err_ratelimited("%s: dropping malformed payload packet\n", + dev->name); + dev_core_stats_tx_dropped_inc(ovpn->dev); + goto drop; + }
The above check implies that the kernel can feed a network device with an skb->protocol value that mismatches the actual skb content. Can you share an example of such a case?
If you just want to be sure that the user packet is either IPv4 or IPv6 then it can be done like this and without error messages:
/* Support only IPv4 or IPv6 traffic transporting */
if (unlikely(skb->protocol != ETH_P_IP && skb->protocol != ETH_P_IPV6))
        goto drop;
It looks good, but I will still increase the drop counter, because something entered the interface and we are trashing it.
Why not print a message? The interface is not Ethernet based, so I think we should not expect anything other than v4 or v6, no?
Thanks.
Regards,
On 15.11.2024 16:39, Antonio Quartulli wrote:
On 11/11/2024 00:54, Sergey Ryazanov wrote:
Another one forgotten question, sorry about this. Please find the question inlined.
On 29.10.2024 12:47, Antonio Quartulli wrote:
/* Send user data to the network */ netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev) { + struct ovpn_struct *ovpn = netdev_priv(dev); + struct sk_buff *segments, *curr, *next; + struct sk_buff_head skb_list; + __be16 proto; + int ret;
+ /* reset netfilter state */ + nf_reset_ct(skb);
+ /* verify IP header size in network packet */ + proto = ovpn_ip_check_protocol(skb); + if (unlikely(!proto || skb->protocol != proto)) { + net_err_ratelimited("%s: dropping malformed payload packet\n", + dev->name); + dev_core_stats_tx_dropped_inc(ovpn->dev); + goto drop; + }
The above check implies that the kernel can feed a network device with an skb->protocol value that mismatches the actual skb content. Can you share an example of such a case?
If you just want to be sure that the user packet is either IPv4 or IPv6 then it can be done like this and without error messages:
/* Support only IPv4 or IPv6 traffic transporting */
if (unlikely(skb->protocol != ETH_P_IP && skb->protocol != ETH_P_IPV6))
        goto drop;
It looks good, but I will still increase the drop counter, because something entered the interface and we are trashing it.
Sure. I just shared a minimalistic example and don't mind if the case is counted. Just a small hint: the counter can be moved to the 'drop:' label below.
And sorry for misleading you: the '->protocol' field value is in network byte order, so the constants should be wrapped in htons():
if (unlikely(skb->protocol != htons(ETH_P_IP) &&
             skb->protocol != htons(ETH_P_IPV6)))
        goto drop;
Why not print a message? The interface is not Ethernet based, so I think we should not expect anything other than v4 or v6, no?
Non-Ethernet encapsulation doesn't give any guarantee that packets will be IPv4/IPv6 only. There are 65k possible 'protocols' and this is an interface function, which technically can be called with any protocol type.
Given this, nobody wants to flood the log with messages for every MPLS/LLDP/etc. packet, especially with messages saying that the packet is malformed while giving no clue why the packet was considered wrong.
-- Sergey
On 21/11/2024 01:29, Sergey Ryazanov wrote:
On 15.11.2024 16:39, Antonio Quartulli wrote:
On 11/11/2024 00:54, Sergey Ryazanov wrote:
Another one forgotten question, sorry about this. Please find the question inlined.
On 29.10.2024 12:47, Antonio Quartulli wrote:
/* Send user data to the network */ netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev) { + struct ovpn_struct *ovpn = netdev_priv(dev); + struct sk_buff *segments, *curr, *next; + struct sk_buff_head skb_list; + __be16 proto; + int ret;
+ /* reset netfilter state */ + nf_reset_ct(skb);
+ /* verify IP header size in network packet */ + proto = ovpn_ip_check_protocol(skb); + if (unlikely(!proto || skb->protocol != proto)) { + net_err_ratelimited("%s: dropping malformed payload packet\n", + dev->name); + dev_core_stats_tx_dropped_inc(ovpn->dev); + goto drop; + }
The above check implies that the kernel can feed a network device with an skb->protocol value that mismatches the actual skb content. Can you share an example of such a case?
If you just want to be sure that the user packet is either IPv4 or IPv6 then it can be done like this and without error messages:
/* Support only IPv4 or IPv6 traffic transporting */
if (unlikely(skb->protocol != ETH_P_IP && skb->protocol != ETH_P_IPV6))
        goto drop;
It looks good, but I will still increase the drop counter, because something entered the interface and we are trashing it.
Sure. I just shared a minimalistic example and don't mind if the case is counted. Just a small hint: the counter can be moved to the 'drop:' label below.
ok, will double check. thanks
And sorry for misleading you: the '->protocol' field value is in network byte order, so the constants should be wrapped in htons():
if (unlikely(skb->protocol != htons(ETH_P_IP) &&
             skb->protocol != htons(ETH_P_IPV6)))
        goto drop;
yap yap, already considered. thanks for pointing it out though.
Why not print a message? The interface is not Ethernet based, so I think we should not expect anything other than v4 or v6, no?
Non-Ethernet encapsulation doesn't give any guarantee that packets will be IPv4/IPv6 only. There are 65k possible 'protocols' and this is an interface function, which technically can be called with any protocol type.
Given this, nobody wants to flood the log with messages for every MPLS/LLDP/etc. packet, especially with messages saying that the packet is malformed while giving no clue why the packet was considered wrong.
Ok, I see. I am dropping the message then.
Regards,
-- Sergey
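Putting the conclusions of this exchange together (htons() constants, no log message, counter accounted under the drop label), the resulting check in ovpn_net_xmit() could look roughly like this sketch:

	/* support only IPv4 and IPv6 payloads; anything else (MPLS, LLDP,
	 * ...) is silently dropped and only accounted as a TX drop
	 */
	if (unlikely(skb->protocol != htons(ETH_P_IP) &&
		     skb->protocol != htons(ETH_P_IPV6)))
		goto drop;

	/* ... rest of the TX path ... */

drop:
	dev_core_stats_tx_dropped_inc(ovpn->dev);
	skb_tx_error(skb);
	kfree_skb_list(skb);
	return NET_XMIT_DROP;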
2024-10-29, 11:47:21 +0100, Antonio Quartulli wrote:
+static int ovpn_udp4_output(struct ovpn_struct *ovpn, struct ovpn_bind *bind,
struct dst_cache *cache, struct sock *sk,
struct sk_buff *skb)
+{
[...]
- if (unlikely(!inet_confirm_addr(sock_net(sk), NULL, 0, fl.saddr,
RT_SCOPE_HOST))) {
/* we may end up here when the cached address is not usable
* anymore. In this case we reset address/cache and perform a
* new look up
*/
fl.saddr = 0;
bind->local.ipv4.s_addr = 0;
Here we're updating bind->local without holding peer->lock, that's inconsistent with ovpn_peer_update_local_endpoint.
+static int ovpn_udp6_output(struct ovpn_struct *ovpn, struct ovpn_bind *bind,
struct dst_cache *cache, struct sock *sk,
struct sk_buff *skb)
+{
[...]
- if (unlikely(!ipv6_chk_addr(sock_net(sk), &fl.saddr, NULL, 0))) {
/* we may end up here when the cached address is not usable
* anymore. In this case we reset address/cache and perform a
* new look up
*/
fl.saddr = in6addr_any;
bind->local.ipv6 = in6addr_any;
And here as well.
On 20/11/2024 12:45, Sabrina Dubroca wrote:
2024-10-29, 11:47:21 +0100, Antonio Quartulli wrote:
+static int ovpn_udp4_output(struct ovpn_struct *ovpn, struct ovpn_bind *bind,
struct dst_cache *cache, struct sock *sk,
struct sk_buff *skb)
+{
[...]
- if (unlikely(!inet_confirm_addr(sock_net(sk), NULL, 0, fl.saddr,
RT_SCOPE_HOST))) {
/* we may end up here when the cached address is not usable
* anymore. In this case we reset address/cache and perform a
* new look up
*/
fl.saddr = 0;
bind->local.ipv4.s_addr = 0;
Here we're updating bind->local without holding peer->lock, that's inconsistent with ovpn_peer_update_local_endpoint.
ACK
+static int ovpn_udp6_output(struct ovpn_struct *ovpn, struct ovpn_bind *bind,
struct dst_cache *cache, struct sock *sk,
struct sk_buff *skb)
+{
[...]
- if (unlikely(!ipv6_chk_addr(sock_net(sk), &fl.saddr, NULL, 0))) {
/* we may end up here when the cached address is not usable
* anymore. In this case we reset address/cache and perform a
* new look up
*/
fl.saddr = in6addr_any;
bind->local.ipv6 = in6addr_any;
And here as well.
ACK
Will fix both. Thank you.
Regards,
Packets received over the socket are forwarded to the user device.
Implementation is UDP only. TCP will be added by a later patch.
Note: no decryption/decapsulation exists yet, packets are forwarded as they arrive without much processing.
Signed-off-by: Antonio Quartulli antonio@openvpn.net --- drivers/net/ovpn/io.c | 66 ++++++++++++++++++++++++++- drivers/net/ovpn/io.h | 2 + drivers/net/ovpn/main.c | 13 +++++- drivers/net/ovpn/ovpnstruct.h | 3 ++ drivers/net/ovpn/proto.h | 75 ++++++++++++++++++++++++++++++ drivers/net/ovpn/socket.c | 24 ++++++++++ drivers/net/ovpn/udp.c | 104 +++++++++++++++++++++++++++++++++++++++++- drivers/net/ovpn/udp.h | 3 +- 8 files changed, 286 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c index 77ba4d33ae0bd2f52e8bd1c06a182d24285297b4..791a1b117125118b179cb13cdfd5fbab6523a360 100644 --- a/drivers/net/ovpn/io.c +++ b/drivers/net/ovpn/io.c @@ -9,15 +9,79 @@
#include <linux/netdevice.h> #include <linux/skbuff.h> +#include <net/gro_cells.h> #include <net/gso.h>
-#include "io.h" #include "ovpnstruct.h" #include "peer.h" +#include "io.h" +#include "netlink.h" +#include "proto.h" #include "udp.h" #include "skb.h" #include "socket.h"
+/* Called after decrypt to write the IP packet to the device. + * This method is expected to manage/free the skb. + */ +static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff *skb) +{ + unsigned int pkt_len; + + /* we can't guarantee the packet wasn't corrupted before entering the + * VPN, therefore we give other layers a chance to check that + */ + skb->ip_summed = CHECKSUM_NONE; + + /* skb hash for transport packet no longer valid after decapsulation */ + skb_clear_hash(skb); + + /* post-decrypt scrub -- prepare to inject encapsulated packet onto the + * interface, based on __skb_tunnel_rx() in dst.h + */ + skb->dev = peer->ovpn->dev; + skb_set_queue_mapping(skb, 0); + skb_scrub_packet(skb, true); + + skb_reset_network_header(skb); + skb_reset_transport_header(skb); + skb_probe_transport_header(skb); + skb_reset_inner_headers(skb); + + memset(skb->cb, 0, sizeof(skb->cb)); + + /* cause packet to be "received" by the interface */ + pkt_len = skb->len; + if (likely(gro_cells_receive(&peer->ovpn->gro_cells, + skb) == NET_RX_SUCCESS)) + /* update RX stats with the size of decrypted packet */ + dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len); +} + +static void ovpn_decrypt_post(struct sk_buff *skb, int ret) +{ + struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer; + + if (unlikely(ret < 0)) + goto drop; + + ovpn_netdev_write(peer, skb); + /* skb is passed to upper layer - don't free it */ + skb = NULL; +drop: + if (unlikely(skb)) + dev_core_stats_rx_dropped_inc(peer->ovpn->dev); + ovpn_peer_put(peer); + kfree_skb(skb); +} + +/* pick next packet from RX queue, decrypt and forward it to the device */ +void ovpn_recv(struct ovpn_peer *peer, struct sk_buff *skb) +{ + ovpn_skb_cb(skb)->peer = peer; + ovpn_decrypt_post(skb, 0); +} + static void ovpn_encrypt_post(struct sk_buff *skb, int ret) { struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer; diff --git a/drivers/net/ovpn/io.h b/drivers/net/ovpn/io.h index aa259be66441f7b0262f39da12d6c3dce0a9b24c..9667a0a470e0b4b427524fffb5b9b395007e5a2f 100644 --- a/drivers/net/ovpn/io.h +++ b/drivers/net/ovpn/io.h @@ -12,4 +12,6 @@
netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev);
+void ovpn_recv(struct ovpn_peer *peer, struct sk_buff *skb); + #endif /* _NET_OVPN_OVPN_H_ */ diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index 5492ce07751d135c1484fe1ed8227c646df94969..73348765a8cf24321aa6be78e75f607d6dbffb1d 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -11,6 +11,7 @@ #include <linux/module.h> #include <linux/netdevice.h> #include <linux/inetdevice.h> +#include <net/gro_cells.h> #include <net/ip.h> #include <net/rtnetlink.h> #include <uapi/linux/if_arp.h> @@ -32,7 +33,16 @@ static void ovpn_struct_free(struct net_device *net)
static int ovpn_net_init(struct net_device *dev) { - return 0; + struct ovpn_struct *ovpn = netdev_priv(dev); + + return gro_cells_init(&ovpn->gro_cells, dev); +} + +static void ovpn_net_uninit(struct net_device *dev) +{ + struct ovpn_struct *ovpn = netdev_priv(dev); + + gro_cells_destroy(&ovpn->gro_cells); }
static int ovpn_net_open(struct net_device *dev) @@ -56,6 +66,7 @@ static int ovpn_net_stop(struct net_device *dev)
static const struct net_device_ops ovpn_netdev_ops = { .ndo_init = ovpn_net_init, + .ndo_uninit = ovpn_net_uninit, .ndo_open = ovpn_net_open, .ndo_stop = ovpn_net_stop, .ndo_start_xmit = ovpn_net_xmit, diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h index a22c5083381c131db01a28c0f51e661d690d4998..4a48fc048890ab1cda78bc104fe3034b4a49d226 100644 --- a/drivers/net/ovpn/ovpnstruct.h +++ b/drivers/net/ovpn/ovpnstruct.h @@ -10,6 +10,7 @@ #ifndef _NET_OVPN_OVPNSTRUCT_H_ #define _NET_OVPN_OVPNSTRUCT_H_
+#include <net/gro_cells.h> #include <net/net_trackers.h> #include <uapi/linux/if_link.h> #include <uapi/linux/ovpn.h> @@ -23,6 +24,7 @@ * @lock: protect this object * @peer: in P2P mode, this is the only remote peer * @dev_list: entry for the module wide device list + * @gro_cells: pointer to the Generic Receive Offload cell */ struct ovpn_struct { struct net_device *dev; @@ -32,6 +34,7 @@ struct ovpn_struct { spinlock_t lock; /* protect writing to the ovpn_struct object */ struct ovpn_peer __rcu *peer; struct list_head dev_list; + struct gro_cells gro_cells; };
#endif /* _NET_OVPN_OVPNSTRUCT_H_ */ diff --git a/drivers/net/ovpn/proto.h b/drivers/net/ovpn/proto.h new file mode 100644 index 0000000000000000000000000000000000000000..69604cf26bbf82539ee5cd5a7ac9c23920f555de --- /dev/null +++ b/drivers/net/ovpn/proto.h @@ -0,0 +1,75 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* OpenVPN data channel offload + * + * Copyright (C) 2020-2024 OpenVPN, Inc. + * + * Author: Antonio Quartulli antonio@openvpn.net + * James Yonan james@openvpn.net + */ + +#ifndef _NET_OVPN_OVPNPROTO_H_ +#define _NET_OVPN_OVPNPROTO_H_ + +#include "main.h" + +#include <linux/skbuff.h> + +/* Methods for operating on the initial command + * byte of the OpenVPN protocol. + */ + +/* packet opcode (high 5 bits) and key-id (low 3 bits) are combined in + * one byte + */ +#define OVPN_KEY_ID_MASK 0x07 +#define OVPN_OPCODE_SHIFT 3 +#define OVPN_OPCODE_MASK 0x1F +/* upper bounds on opcode and key ID */ +#define OVPN_KEY_ID_MAX (OVPN_KEY_ID_MASK + 1) +#define OVPN_OPCODE_MAX (OVPN_OPCODE_MASK + 1) +/* packet opcodes of interest to us */ +#define OVPN_DATA_V1 6 /* data channel V1 packet */ +#define OVPN_DATA_V2 9 /* data channel V2 packet */ +/* size of initial packet opcode */ +#define OVPN_OP_SIZE_V1 1 +#define OVPN_OP_SIZE_V2 4 +#define OVPN_PEER_ID_MASK 0x00FFFFFF +#define OVPN_PEER_ID_UNDEF 0x00FFFFFF +/* first byte of keepalive message */ +#define OVPN_KEEPALIVE_FIRST_BYTE 0x2a +/* first byte of exit message */ +#define OVPN_EXPLICIT_EXIT_NOTIFY_FIRST_BYTE 0x28 + +/** + * ovpn_opcode_from_skb - extract OP code from skb at specified offset + * @skb: the packet to extract the OP code from + * @offset: the offset in the data buffer where the OP code is located + * + * Note: this function assumes that the skb head was pulled enough + * to access the first byte. + * + * Return: the OP code + */ +static inline u8 ovpn_opcode_from_skb(const struct sk_buff *skb, u16 offset) +{ + u8 byte = *(skb->data + offset); + + return byte >> OVPN_OPCODE_SHIFT; +} + +/** + * ovpn_peer_id_from_skb - extract peer ID from skb at specified offset + * @skb: the packet to extract the OP code from + * @offset: the offset in the data buffer where the OP code is located + * + * Note: this function assumes that the skb head was pulled enough + * to access the first 4 bytes. + * + * Return: the peer ID. + */ +static inline u32 ovpn_peer_id_from_skb(const struct sk_buff *skb, u16 offset) +{ + return ntohl(*(__be32 *)(skb->data + offset)) & OVPN_PEER_ID_MASK; +} + +#endif /* _NET_OVPN_OVPNPROTO_H_ */ diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c index 090a3232ab0ec19702110f1a90f45c7f10889f6f..964b566de69f4132806a969a455cec7f6059a0bd 100644 --- a/drivers/net/ovpn/socket.c +++ b/drivers/net/ovpn/socket.c @@ -22,6 +22,9 @@ static void ovpn_socket_detach(struct socket *sock) if (!sock) return;
+ if (sock->sk->sk_protocol == IPPROTO_UDP) + ovpn_udp_socket_detach(sock); + sockfd_put(sock); }
@@ -71,6 +74,27 @@ static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer) return ret; }
+/* Retrieve the corresponding ovpn object from a UDP socket + * rcu_read_lock must be held on entry + */ +struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk) +{ + struct ovpn_socket *ovpn_sock; + + if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) != UDP_ENCAP_OVPNINUDP)) + return NULL; + + ovpn_sock = rcu_dereference_sk_user_data(sk); + if (unlikely(!ovpn_sock)) + return NULL; + + /* make sure that sk matches our stored transport socket */ + if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk)) + return NULL; + + return ovpn_sock->ovpn; +} + /** * ovpn_socket_new - create a new socket and initialize it * @sock: the kernel socket to embed diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c index d26d7566e9c8dfe91fa77f49c34fb179a9fb2239..d1e88ae83843f02d591e67a7995f2d6868720695 100644 --- a/drivers/net/ovpn/udp.c +++ b/drivers/net/ovpn/udp.c @@ -21,9 +21,95 @@ #include "bind.h" #include "io.h" #include "peer.h" +#include "proto.h" #include "socket.h" #include "udp.h"
+/** + * ovpn_udp_encap_recv - Start processing a received UDP packet. + * @sk: socket over which the packet was received + * @skb: the received packet + * + * If the first byte of the payload is DATA_V2, the packet is further processed, + * otherwise it is forwarded to the UDP stack for delivery to user space. + * + * Return: + * 0 if skb was consumed or dropped + * >0 if skb should be passed up to userspace as UDP (packet not consumed) + * <0 if skb should be resubmitted as proto -N (packet not consumed) + */ +static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb) +{ + struct ovpn_peer *peer = NULL; + struct ovpn_struct *ovpn; + u32 peer_id; + u8 opcode; + + ovpn = ovpn_from_udp_sock(sk); + if (unlikely(!ovpn)) { + net_err_ratelimited("%s: cannot obtain ovpn object from UDP socket\n", + __func__); + goto drop_noovpn; + } + + /* Make sure the first 4 bytes of the skb data buffer after the UDP + * header are accessible. + * They are required to fetch the OP code, the key ID and the peer ID. + */ + if (unlikely(!pskb_may_pull(skb, sizeof(struct udphdr) + + OVPN_OP_SIZE_V2))) { + net_dbg_ratelimited("%s: packet too small\n", __func__); + goto drop; + } + + opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr)); + if (unlikely(opcode != OVPN_DATA_V2)) { + /* DATA_V1 is not supported */ + if (opcode == OVPN_DATA_V1) + goto drop; + + /* unknown or control packet: let it bubble up to userspace */ + return 1; + } + + peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr)); + /* some OpenVPN server implementations send data packets with the + * peer-id set to undef. In this case we skip the peer lookup by peer-id + * and we try with the transport address + */ + if (peer_id != OVPN_PEER_ID_UNDEF) { + peer = ovpn_peer_get_by_id(ovpn, peer_id); + if (!peer) { + net_err_ratelimited("%s: received data from unknown peer (id: %d)\n", + __func__, peer_id); + goto drop; + } + } + + if (!peer) { + /* data packet with undef peer-id */ + peer = ovpn_peer_get_by_transp_addr(ovpn, skb); + if (unlikely(!peer)) { + net_dbg_ratelimited("%s: received data with undef peer-id from unknown source\n", + __func__); + goto drop; + } + } + + /* pop off outer UDP header */ + __skb_pull(skb, sizeof(struct udphdr)); + ovpn_recv(peer, skb); + return 0; + +drop: + if (peer) + ovpn_peer_put(peer); + dev_core_stats_rx_dropped_inc(ovpn->dev); +drop_noovpn: + kfree_skb(skb); + return 0; +} + /** * ovpn_udp4_output - send IPv4 packet over udp socket * @ovpn: the openvpn instance @@ -259,8 +345,12 @@ void ovpn_udp_send_skb(struct ovpn_struct *ovpn, struct ovpn_peer *peer, */ int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn) { + struct udp_tunnel_sock_cfg cfg = { + .encap_type = UDP_ENCAP_OVPNINUDP, + .encap_rcv = ovpn_udp_encap_recv, + }; struct ovpn_socket *old_data; - int ret = 0; + int ret;
/* sanity check */ if (sock->sk->sk_protocol != IPPROTO_UDP) { @@ -274,6 +364,7 @@ int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn) if (!old_data) { /* socket is currently unused - we can take it */ rcu_read_unlock(); + setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg); return 0; }
@@ -302,3 +393,14 @@ int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn)
return ret; } + +/** + * ovpn_udp_socket_detach - clean udp-tunnel status for this socket + * @sock: the socket to clean + */ +void ovpn_udp_socket_detach(struct socket *sock) +{ + struct udp_tunnel_sock_cfg cfg = { }; + + setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg); +} diff --git a/drivers/net/ovpn/udp.h b/drivers/net/ovpn/udp.h index e60f8cd2b4ac8f910aabcf8ed546af59d6ca4be4..fecb68464896bc1228315faf268453f9005e693d 100644 --- a/drivers/net/ovpn/udp.h +++ b/drivers/net/ovpn/udp.h @@ -18,8 +18,9 @@ struct sk_buff; struct socket;
int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn); - +void ovpn_udp_socket_detach(struct socket *sock); void ovpn_udp_send_skb(struct ovpn_struct *ovpn, struct ovpn_peer *peer, struct sk_buff *skb); +struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk);
#endif /* _NET_OVPN_UDP_H_ */
2024-10-29, 11:47:22 +0100, Antonio Quartulli wrote:
+static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb) +{
[...]
- opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr));
- if (unlikely(opcode != OVPN_DATA_V2)) {
/* DATA_V1 is not supported */
if (opcode == OVPN_DATA_V1)
The TCP encap code passes everything that's not V2 to userspace. Why not do that with UDP as well?
goto drop;
/* unknown or control packet: let it bubble up to userspace */
return 1;
- }
- peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr));
- /* some OpenVPN server implementations send data packets with the
* peer-id set to undef. In this case we skip the peer lookup by peer-id
* and we try with the transport address
*/
- if (peer_id != OVPN_PEER_ID_UNDEF) {
peer = ovpn_peer_get_by_id(ovpn, peer_id);
if (!peer) {
net_err_ratelimited("%s: received data from unknown peer (id: %d)\n",
__func__, peer_id);
goto drop;
}
- }
- if (!peer) {
nit: that could be an "else" combined with the previous case?
/* data packet with undef peer-id */
peer = ovpn_peer_get_by_transp_addr(ovpn, skb);
if (unlikely(!peer)) {
net_dbg_ratelimited("%s: received data with undef peer-id from unknown source\n",
__func__);
goto drop;
}
- }
On 31/10/2024 12:29, Sabrina Dubroca wrote:
2024-10-29, 11:47:22 +0100, Antonio Quartulli wrote:
+static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb) +{
[...]
- opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr));
- if (unlikely(opcode != OVPN_DATA_V2)) {
/* DATA_V1 is not supported */
if (opcode == OVPN_DATA_V1)
The TCP encap code passes everything that's not V2 to userspace. Why not do that with UDP as well?
If that's the case, then this is a bug in the TCP code.
DATA_Vx packets are part of the data channel and userspace can't do anything with them (userspace handles the control channel only when the ovpn module is in use).
I'll go check the TCP code then, because sending DATA_V1 to userspace is not expected. Thanks for noticing this discrepancy.
goto drop;
/* unknown or control packet: let it bubble up to userspace */
return 1;
- }
- peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr));
- /* some OpenVPN server implementations send data packets with the
* peer-id set to undef. In this case we skip the peer lookup by peer-id
* and we try with the transport address
*/
- if (peer_id != OVPN_PEER_ID_UNDEF) {
peer = ovpn_peer_get_by_id(ovpn, peer_id);
if (!peer) {
net_err_ratelimited("%s: received data from unknown peer (id: %d)\n",
__func__, peer_id);
goto drop;
}
- }
- if (!peer) {
nit: that could be an "else" combined with the previous case?
mhh that's true. Then I can combine the two "if (!peer)" checks into one block.
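A possible shape for the combined lookup (a sketch only, reusing the helpers from the quoted code; the unified debug message is made up for the example):

	/* some implementations send data packets with the peer-id set to
	 * undef: fall back to a lookup by transport address in that case
	 */
	if (peer_id == OVPN_PEER_ID_UNDEF)
		peer = ovpn_peer_get_by_transp_addr(ovpn, skb);
	else
		peer = ovpn_peer_get_by_id(ovpn, peer_id);

	if (unlikely(!peer)) {
		net_dbg_ratelimited("%s: received data from unknown peer\n",
				    netdev_name(ovpn->dev));
		goto drop;
	}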
Thanks! Regards,
/* data packet with undef peer-id */
peer = ovpn_peer_get_by_transp_addr(ovpn, skb);
if (unlikely(!peer)) {
net_dbg_ratelimited("%s: received data with undef peer-id from unknown source\n",
__func__);
goto drop;
}
- }
On 29.10.2024 12:47, Antonio Quartulli wrote:
Packets received over the socket are forwarded to the user device.
Implementation is UDP only. TCP will be added by a later patch.
Note: no decryption/decapsulation exists yet, packets are forwarded as they arrive without much processing.
Signed-off-by: Antonio Quartulli antonio@openvpn.net
drivers/net/ovpn/io.c | 66 ++++++++++++++++++++++++++- drivers/net/ovpn/io.h | 2 + drivers/net/ovpn/main.c | 13 +++++- drivers/net/ovpn/ovpnstruct.h | 3 ++ drivers/net/ovpn/proto.h | 75 ++++++++++++++++++++++++++++++ drivers/net/ovpn/socket.c | 24 ++++++++++ drivers/net/ovpn/udp.c | 104 +++++++++++++++++++++++++++++++++++++++++- drivers/net/ovpn/udp.h | 3 +- 8 files changed, 286 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c index 77ba4d33ae0bd2f52e8bd1c06a182d24285297b4..791a1b117125118b179cb13cdfd5fbab6523a360 100644 --- a/drivers/net/ovpn/io.c +++ b/drivers/net/ovpn/io.c @@ -9,15 +9,79 @@ #include <linux/netdevice.h> #include <linux/skbuff.h> +#include <net/gro_cells.h> #include <net/gso.h> -#include "io.h" #include "ovpnstruct.h" #include "peer.h" +#include "io.h" +#include "netlink.h" +#include "proto.h" #include "udp.h" #include "skb.h" #include "socket.h" +/* Called after decrypt to write the IP packet to the device.
- This method is expected to manage/free the skb.
- */
+static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff *skb) +{
- unsigned int pkt_len;
- /* we can't guarantee the packet wasn't corrupted before entering the
* VPN, therefore we give other layers a chance to check that
*/
- skb->ip_summed = CHECKSUM_NONE;
- /* skb hash for transport packet no longer valid after decapsulation */
- skb_clear_hash(skb);
- /* post-decrypt scrub -- prepare to inject encapsulated packet onto the
* interface, based on __skb_tunnel_rx() in dst.h
*/
- skb->dev = peer->ovpn->dev;
- skb_set_queue_mapping(skb, 0);
- skb_scrub_packet(skb, true);
The skb->protocol field is going to be updated in the upcoming patch in the caller (ovpn_decrypt_post). Shall we put a comment here clarifying why we do not touch the protocol field here?
- skb_reset_network_header(skb);
ovpn_decrypt_post() already reset the network header. Why do we need it here again?
- skb_reset_transport_header(skb);
- skb_probe_transport_header(skb);
- skb_reset_inner_headers(skb);
- memset(skb->cb, 0, sizeof(skb->cb));
Why do we need to zero the control buffer here?
- /* cause packet to be "received" by the interface */
- pkt_len = skb->len;
- if (likely(gro_cells_receive(&peer->ovpn->gro_cells,
skb) == NET_RX_SUCCESS))
/* update RX stats with the size of decrypted packet */
dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len);
+}
+static void ovpn_decrypt_post(struct sk_buff *skb, int ret) +{
- struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer;
- if (unlikely(ret < 0))
goto drop;
- ovpn_netdev_write(peer, skb);
- /* skb is passed to upper layer - don't free it */
- skb = NULL;
+drop:
- if (unlikely(skb))
dev_core_stats_rx_dropped_inc(peer->ovpn->dev);
- ovpn_peer_put(peer);
- kfree_skb(skb);
+}
+/* pick next packet from RX queue, decrypt and forward it to the device */
The function now receives packets from external callers. Should we update the above comment?
+void ovpn_recv(struct ovpn_peer *peer, struct sk_buff *skb) +{
- ovpn_skb_cb(skb)->peer = peer;
- ovpn_decrypt_post(skb, 0);
+}
- static void ovpn_encrypt_post(struct sk_buff *skb, int ret) { struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer;
diff --git a/drivers/net/ovpn/io.h b/drivers/net/ovpn/io.h index aa259be66441f7b0262f39da12d6c3dce0a9b24c..9667a0a470e0b4b427524fffb5b9b395007e5a2f 100644 --- a/drivers/net/ovpn/io.h +++ b/drivers/net/ovpn/io.h @@ -12,4 +12,6 @@ netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev); +void ovpn_recv(struct ovpn_peer *peer, struct sk_buff *skb);
- #endif /* _NET_OVPN_OVPN_H_ */
diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index 5492ce07751d135c1484fe1ed8227c646df94969..73348765a8cf24321aa6be78e75f607d6dbffb1d 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -11,6 +11,7 @@ #include <linux/module.h> #include <linux/netdevice.h> #include <linux/inetdevice.h> +#include <net/gro_cells.h> #include <net/ip.h> #include <net/rtnetlink.h> #include <uapi/linux/if_arp.h> @@ -32,7 +33,16 @@ static void ovpn_struct_free(struct net_device *net) static int ovpn_net_init(struct net_device *dev) {
- return 0;
- struct ovpn_struct *ovpn = netdev_priv(dev);
- return gro_cells_init(&ovpn->gro_cells, dev);
+}
+static void ovpn_net_uninit(struct net_device *dev) +{
- struct ovpn_struct *ovpn = netdev_priv(dev);
- gro_cells_destroy(&ovpn->gro_cells); }
static int ovpn_net_open(struct net_device *dev) @@ -56,6 +66,7 @@ static int ovpn_net_stop(struct net_device *dev) static const struct net_device_ops ovpn_netdev_ops = { .ndo_init = ovpn_net_init,
- .ndo_uninit = ovpn_net_uninit, .ndo_open = ovpn_net_open, .ndo_stop = ovpn_net_stop, .ndo_start_xmit = ovpn_net_xmit,
diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h index a22c5083381c131db01a28c0f51e661d690d4998..4a48fc048890ab1cda78bc104fe3034b4a49d226 100644 --- a/drivers/net/ovpn/ovpnstruct.h +++ b/drivers/net/ovpn/ovpnstruct.h @@ -10,6 +10,7 @@ #ifndef _NET_OVPN_OVPNSTRUCT_H_ #define _NET_OVPN_OVPNSTRUCT_H_ +#include <net/gro_cells.h> #include <net/net_trackers.h> #include <uapi/linux/if_link.h> #include <uapi/linux/ovpn.h> @@ -23,6 +24,7 @@
- @lock: protect this object
- @peer: in P2P mode, this is the only remote peer
- @dev_list: entry for the module wide device list
*/ struct ovpn_struct { struct net_device *dev;
- @gro_cells: pointer to the Generic Receive Offload cell
@@ -32,6 +34,7 @@ struct ovpn_struct { spinlock_t lock; /* protect writing to the ovpn_struct object */ struct ovpn_peer __rcu *peer; struct list_head dev_list;
- struct gro_cells gro_cells; };
#endif /* _NET_OVPN_OVPNSTRUCT_H_ */ diff --git a/drivers/net/ovpn/proto.h b/drivers/net/ovpn/proto.h new file mode 100644 index 0000000000000000000000000000000000000000..69604cf26bbf82539ee5cd5a7ac9c23920f555de --- /dev/null +++ b/drivers/net/ovpn/proto.h @@ -0,0 +1,75 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* OpenVPN data channel offload
- Copyright (C) 2020-2024 OpenVPN, Inc.
- Author: Antonio Quartulli antonio@openvpn.net
James Yonan <james@openvpn.net>
- */
+#ifndef _NET_OVPN_OVPNPROTO_H_ +#define _NET_OVPN_OVPNPROTO_H_
+#include "main.h"
+#include <linux/skbuff.h>
+/* Methods for operating on the initial command
- byte of the OpenVPN protocol.
- */
+/* packet opcode (high 5 bits) and key-id (low 3 bits) are combined in
- one byte
- */
+#define OVPN_KEY_ID_MASK 0x07 +#define OVPN_OPCODE_SHIFT 3 +#define OVPN_OPCODE_MASK 0x1F
Instead of defining mask(s) and shift(s), we can define only masks and use bitfield API (see below).
+/* upper bounds on opcode and key ID */ +#define OVPN_KEY_ID_MAX (OVPN_KEY_ID_MASK + 1) +#define OVPN_OPCODE_MAX (OVPN_OPCODE_MASK + 1) +/* packet opcodes of interest to us */ +#define OVPN_DATA_V1 6 /* data channel V1 packet */ +#define OVPN_DATA_V2 9 /* data channel V2 packet */ +/* size of initial packet opcode */ +#define OVPN_OP_SIZE_V1 1 +#define OVPN_OP_SIZE_V2 4 +#define OVPN_PEER_ID_MASK 0x00FFFFFF +#define OVPN_PEER_ID_UNDEF 0x00FFFFFF +/* first byte of keepalive message */ +#define OVPN_KEEPALIVE_FIRST_BYTE 0x2a +/* first byte of exit message */ +#define OVPN_EXPLICIT_EXIT_NOTIFY_FIRST_BYTE 0x28
From the above list of macros, OVPN_KEY_ID_MAX, OVPN_OPCODE_MAX, OVPN_OP_SIZE_V1, OVPN_KEEPALIVE_FIRST_BYTE, and OVPN_EXPLICIT_EXIT_NOTIFY_FIRST_BYTE are unused and it looks like they should be removed.
+/**
- ovpn_opcode_from_skb - extract OP code from skb at specified offset
- @skb: the packet to extract the OP code from
- @offset: the offset in the data buffer where the OP code is located
- Note: this function assumes that the skb head was pulled enough
- to access the first byte.
- Return: the OP code
- */
+static inline u8 ovpn_opcode_from_skb(const struct sk_buff *skb, u16 offset) +{
- u8 byte = *(skb->data + offset);
- return byte >> OVPN_OPCODE_SHIFT;
For example here, the shift can be replaced with bitfield macro:
#define OVPN_OPCODE_PKTTYPE_MSK 0xf8000000
#define OVPN_OPCODE_KEYID_MSK   0x07000000
#define OVPN_OPCODE_PEERID_MSK  0x00ffffff

static inline u8 ovpn_opcode_from_skb(...)
{
	u32 opcode = be32_to_cpu(*(__be32 *)(skb->data + offset));

	return FIELD_GET(OVPN_OPCODE_PKTTYPE_MSK, opcode);
}

And the upcoming ovpn_opcode_compose() can be implemented like this:

static inline u32 ovpn_opcode_compose(u8 opcode, u8 key_id, u32 peer_id)
{
	return FIELD_PREP(OVPN_OPCODE_PKTTYPE_MSK, opcode) |
	       FIELD_PREP(OVPN_OPCODE_KEYID_MSK, key_id) |
	       FIELD_PREP(OVPN_OPCODE_PEERID_MSK, peer_id);
}
And with this, it can even be embedded directly into ovpn_aead_encrypt() to make the header composition clearer.
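FIELD_GET()/FIELD_PREP() live in <linux/bitfield.h>, so proto.h would need that include. A small round-trip example with the masks suggested above (values purely illustrative):

#include <linux/bitfield.h>

	/* DATA_V2 (9), key id 2, peer id 0x123456 */
	u32 word = ovpn_opcode_compose(OVPN_DATA_V2, 2, 0x123456);

	/* word == 0x4a123456; the fields can be pulled back out with: */
	u8 op = FIELD_GET(OVPN_OPCODE_PKTTYPE_MSK, word);      /* 9 */
	u8 key_id = FIELD_GET(OVPN_OPCODE_KEYID_MSK, word);    /* 2 */
	u32 peer_id = FIELD_GET(OVPN_OPCODE_PEERID_MSK, word); /* 0x123456 */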
+}
+/**
- ovpn_peer_id_from_skb - extract peer ID from skb at specified offset
- @skb: the packet to extract the OP code from
- @offset: the offset in the data buffer where the OP code is located
- Note: this function assumes that the skb head was pulled enough
- to access the first 4 bytes.
- Return: the peer ID.
- */
+static inline u32 ovpn_peer_id_from_skb(const struct sk_buff *skb, u16 offset) +{
- return ntohl(*(__be32 *)(skb->data + offset)) & OVPN_PEER_ID_MASK;
+}
+#endif /* _NET_OVPN_OVPNPROTO_H_ */ diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c index 090a3232ab0ec19702110f1a90f45c7f10889f6f..964b566de69f4132806a969a455cec7f6059a0bd 100644 --- a/drivers/net/ovpn/socket.c +++ b/drivers/net/ovpn/socket.c @@ -22,6 +22,9 @@ static void ovpn_socket_detach(struct socket *sock) if (!sock) return;
- if (sock->sk->sk_protocol == IPPROTO_UDP)
ovpn_udp_socket_detach(sock);
- sockfd_put(sock); }
@@ -71,6 +74,27 @@ static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer) return ret; } +/* Retrieve the corresponding ovpn object from a UDP socket
- rcu_read_lock must be held on entry
- */
+struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk) +{
- struct ovpn_socket *ovpn_sock;
- if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) != UDP_ENCAP_OVPNINUDP))
return NULL;
- ovpn_sock = rcu_dereference_sk_user_data(sk);
- if (unlikely(!ovpn_sock))
return NULL;
- /* make sure that sk matches our stored transport socket */
- if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk))
return NULL;
- return ovpn_sock->ovpn;
Now, returning this pointer is safe. But the following TCP transport support releases the socket via scheduled work, which extends the socket lifetime and makes it possible to receive a UDP packet well after the interface private data has been released. Is that a correct assumption?
If the above is right, then shall we set ->ovpn = NULL before scheduling the socket-releasing work, or otherwise mark the socket as half-destroyed?
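A sketch of that suggestion (not code from the series; field names follow the quoted patch):

	/* in the detach path, before scheduling the release work: */
	WRITE_ONCE(ovpn_sock->ovpn, NULL);

	/* and in ovpn_from_udp_sock(): */
	ovpn_sock = rcu_dereference_sk_user_data(sk);
	if (unlikely(!ovpn_sock))
		return NULL;

	/* socket may already be half-destroyed: detach cleared ->ovpn */
	ovpn = READ_ONCE(ovpn_sock->ovpn);
	if (unlikely(!ovpn))
		return NULL;

	return ovpn;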
+}
- /**
- ovpn_socket_new - create a new socket and initialize it
- @sock: the kernel socket to embed
diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c index d26d7566e9c8dfe91fa77f49c34fb179a9fb2239..d1e88ae83843f02d591e67a7995f2d6868720695 100644 --- a/drivers/net/ovpn/udp.c +++ b/drivers/net/ovpn/udp.c @@ -21,9 +21,95 @@ #include "bind.h" #include "io.h" #include "peer.h" +#include "proto.h" #include "socket.h" #include "udp.h" +/**
- ovpn_udp_encap_recv - Start processing a received UDP packet.
- @sk: socket over which the packet was received
- @skb: the received packet
- If the first byte of the payload is DATA_V2, the packet is further processed,
- otherwise it is forwarded to the UDP stack for delivery to user space.
- Return:
- 0 if skb was consumed or dropped
- >0 if skb should be passed up to userspace as UDP (packet not consumed)
- <0 if skb should be resubmitted as proto -N (packet not consumed)
- */
+static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb) +{
- struct ovpn_peer *peer = NULL;
- struct ovpn_struct *ovpn;
- u32 peer_id;
- u8 opcode;
- ovpn = ovpn_from_udp_sock(sk);
- if (unlikely(!ovpn)) {
net_err_ratelimited("%s: cannot obtain ovpn object from UDP socket\n",
__func__);
Probably we should zero the ovpn pointer in ovpn_sock to survive the scheduled socket release (see the comment on ovpn_from_udp_sock). So, this print should be removed to avoid printing misleading errors.
goto drop_noovpn;
- }
- /* Make sure the first 4 bytes of the skb data buffer after the UDP
* header are accessible.
* They are required to fetch the OP code, the key ID and the peer ID.
*/
- if (unlikely(!pskb_may_pull(skb, sizeof(struct udphdr) +
OVPN_OP_SIZE_V2))) {
net_dbg_ratelimited("%s: packet too small\n", __func__);
goto drop;
- }
- opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr));
- if (unlikely(opcode != OVPN_DATA_V2)) {
/* DATA_V1 is not supported */
if (opcode == OVPN_DATA_V1)
goto drop;
This packet dropping turns the protocol accelerator, intended to speed up data packet processing, into a protocol enforcement entity, doesn't it? Shall we follow the principle of being liberal in what we accept and just forward everything besides data packets upstream to the userspace application?
/* unknown or control packet: let it bubble up to userspace */
return 1;
- }
- peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr));
- /* some OpenVPN server implementations send data packets with the
* peer-id set to undef. In this case we skip the peer lookup by peer-id
* and we try with the transport address
*/
- if (peer_id != OVPN_PEER_ID_UNDEF) {
peer = ovpn_peer_get_by_id(ovpn, peer_id);
if (!peer) {
net_err_ratelimited("%s: received data from unknown peer (id: %d)\n",
__func__, peer_id);
Why do we consider a peer sending us garbage to be our problem? Meaning, this peer miss may not be our fault, but rather a malformed packet from a 3rd party. E.g. nowadays I can see a lot of traces of these "active probers" in my OpenVPN logs. Shall we remove this message, or at least make it debug, to avoid bothering users with garbage traveling the Internet? Anyway, we cannot do anything about incoming traffic.
goto drop;
}
- }
- if (!peer) {
AFAIU, this condition can be true only in case of peer_id being equal to OVPN_PEER_ID_UNDEF, right? In this case the condition check can be replaced by a simple 'else' statement.
And to make the code better match the above comment regarding implementations that send an undefined peer-id, can we swap the sides of the lookup method selection? E.g.:
/* Comment about fancy implementations sending undefined peer-id */
if (peer_id == OVPN_PEER_ID_UNDEF) {
        /* Do transport address based lookup */
} else {
        /* Do peer-id based lookup */
}
/* data packet with undef peer-id */
peer = ovpn_peer_get_by_transp_addr(ovpn, skb);
if (unlikely(!peer)) {
net_dbg_ratelimited("%s: received data with undef peer-id from unknown source\n",
__func__);
goto drop;
}
- }
- /* pop off outer UDP header */
- __skb_pull(skb, sizeof(struct udphdr));
- ovpn_recv(peer, skb);
- return 0;
+drop:
- if (peer)
ovpn_peer_put(peer);
AFAIU, the peer is always NULL here. Shall we remove the above check?
- dev_core_stats_rx_dropped_inc(ovpn->dev);
+drop_noovpn:
- kfree_skb(skb);
- return 0;
+}
- /**
- ovpn_udp4_output - send IPv4 packet over udp socket
- @ovpn: the openvpn instance
@@ -259,8 +345,12 @@ void ovpn_udp_send_skb(struct ovpn_struct *ovpn, struct ovpn_peer *peer, */ int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn) {
- struct udp_tunnel_sock_cfg cfg = {
.encap_type = UDP_ENCAP_OVPNINUDP,
.encap_rcv = ovpn_udp_encap_recv,
- }; struct ovpn_socket *old_data;
- int ret = 0;
- int ret;
/* sanity check */ if (sock->sk->sk_protocol != IPPROTO_UDP) { @@ -274,6 +364,7 @@ int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn) if (!old_data) { /* socket is currently unused - we can take it */ rcu_read_unlock();
return 0; }setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg);
@@ -302,3 +393,14 @@ int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn) return ret; }
+/**
- ovpn_udp_socket_detach - clean udp-tunnel status for this socket
- @sock: the socket to clean
- */
+void ovpn_udp_socket_detach(struct socket *sock) +{
- struct udp_tunnel_sock_cfg cfg = { };
- setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg);
+} diff --git a/drivers/net/ovpn/udp.h b/drivers/net/ovpn/udp.h index e60f8cd2b4ac8f910aabcf8ed546af59d6ca4be4..fecb68464896bc1228315faf268453f9005e693d 100644 --- a/drivers/net/ovpn/udp.h +++ b/drivers/net/ovpn/udp.h @@ -18,8 +18,9 @@ struct sk_buff; struct socket; int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn);
+void ovpn_udp_socket_detach(struct socket *sock); void ovpn_udp_send_skb(struct ovpn_struct *ovpn, struct ovpn_peer *peer, struct sk_buff *skb); +struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk); #endif /* _NET_OVPN_UDP_H_ */
On 11/11/2024 02:54, Sergey Ryazanov wrote: [...]
+/* Called after decrypt to write the IP packet to the device.
- This method is expected to manage/free the skb.
- */
+static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff *skb) +{ + unsigned int pkt_len;
+ /* we can't guarantee the packet wasn't corrupted before entering the + * VPN, therefore we give other layers a chance to check that + */ + skb->ip_summed = CHECKSUM_NONE;
+ /* skb hash for transport packet no longer valid after decapsulation */ + skb_clear_hash(skb);
+ /* post-decrypt scrub -- prepare to inject encapsulated packet onto the + * interface, based on __skb_tunnel_rx() in dst.h + */ + skb->dev = peer->ovpn->dev; + skb_set_queue_mapping(skb, 0); + skb_scrub_packet(skb, true);
The skb->protocol field is going to be updated in the upcoming patch in the caller (ovpn_decrypt_post). Shall we put a comment here clarifying, why do not touch the protocol field here?
Well, I would personally not document missing details in a partly implemented code path.
+ skb_reset_network_header(skb);
ovpn_decrypt_post() already reset the network header. Why do we need it here again?
yeah, I think this can be removed.
+ skb_reset_transport_header(skb); + skb_probe_transport_header(skb); + skb_reset_inner_headers(skb);
+ memset(skb->cb, 0, sizeof(skb->cb));
Why do we need to zero the control buffer here?
To avoid the next layer to assume the cb is clean while it is not. Other drivers do the same as well.
I think this was recommended by Sabrina as well.
+ /* cause packet to be "received" by the interface */ + pkt_len = skb->len; + if (likely(gro_cells_receive(&peer->ovpn->gro_cells, + skb) == NET_RX_SUCCESS)) + /* update RX stats with the size of decrypted packet */ + dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len); +}
+static void ovpn_decrypt_post(struct sk_buff *skb, int ret) +{ + struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer;
+ if (unlikely(ret < 0)) + goto drop;
+ ovpn_netdev_write(peer, skb); + /* skb is passed to upper layer - don't free it */ + skb = NULL; +drop: + if (unlikely(skb)) + dev_core_stats_rx_dropped_inc(peer->ovpn->dev); + ovpn_peer_put(peer); + kfree_skb(skb); +}
+/* pick next packet from RX queue, decrypt and forward it to the device */
The function now receives packets from external callers. Should we update the above comment?
yap will do.
[...]
--- /dev/null +++ b/drivers/net/ovpn/proto.h @@ -0,0 +1,75 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* OpenVPN data channel offload
- * Copyright (C) 2020-2024 OpenVPN, Inc.
- * Author: Antonio Quartulli antonio@openvpn.net
- * James Yonan james@openvpn.net
- */
+#ifndef _NET_OVPN_OVPNPROTO_H_ +#define _NET_OVPN_OVPNPROTO_H_
+#include "main.h"
+#include <linux/skbuff.h>
+/* Methods for operating on the initial command
- byte of the OpenVPN protocol.
- */
+/* packet opcode (high 5 bits) and key-id (low 3 bits) are combined in
- one byte
- */
+#define OVPN_KEY_ID_MASK 0x07 +#define OVPN_OPCODE_SHIFT 3 +#define OVPN_OPCODE_MASK 0x1F
Instead of defining mask(s) and shift(s), we can define only masks and use the bitfield API (see below).
+/* upper bounds on opcode and key ID */ +#define OVPN_KEY_ID_MAX (OVPN_KEY_ID_MASK + 1) +#define OVPN_OPCODE_MAX (OVPN_OPCODE_MASK + 1) +/* packet opcodes of interest to us */ +#define OVPN_DATA_V1 6 /* data channel V1 packet */ +#define OVPN_DATA_V2 9 /* data channel V2 packet */ +/* size of initial packet opcode */ +#define OVPN_OP_SIZE_V1 1 +#define OVPN_OP_SIZE_V2 4 +#define OVPN_PEER_ID_MASK 0x00FFFFFF +#define OVPN_PEER_ID_UNDEF 0x00FFFFFF +/* first byte of keepalive message */ +#define OVPN_KEEPALIVE_FIRST_BYTE 0x2a +/* first byte of exit message */ +#define OVPN_EXPLICIT_EXIT_NOTIFY_FIRST_BYTE 0x28
From the above list of macros, OVPN_KEY_ID_MAX, OVPN_OPCODE_MAX, OVPN_OP_SIZE_V1, OVPN_KEEPALIVE_FIRST_BYTE, and OVPN_EXPLICIT_EXIT_NOTIFY_FIRST_BYTE are unused and it looks like they should be removed.
ACK
+/**
- ovpn_opcode_from_skb - extract OP code from skb at specified offset
- @skb: the packet to extract the OP code from
- @offset: the offset in the data buffer where the OP code is located
- Note: this function assumes that the skb head was pulled enough
- to access the first byte.
- Return: the OP code
- */
+static inline u8 ovpn_opcode_from_skb(const struct sk_buff *skb, u16 offset) +{ + u8 byte = *(skb->data + offset);
+ return byte >> OVPN_OPCODE_SHIFT;
For example here, the shift can be replaced with bitfield macro:
#define OVPN_OPCODE_PKTTYPE_MSK 0xf8000000
#define OVPN_OPCODE_KEYID_MSK   0x07000000
#define OVPN_OPCODE_PEERID_MSK  0x00ffffff
static inline u8 ovpn_opcode_from_skb(...)
{
	u32 opcode = be32_to_cpu(*(__be32 *)(skb->data + offset));

	return FIELD_GET(OVPN_OPCODE_PKTTYPE_MSK, opcode);
}
And the upcoming ovpn_opcode_compose() can be implemented like this:
static inline u32 ovpn_opcode_compose(u8 opcode, u8 key_id, u32 peer_id)
{
	return FIELD_PREP(OVPN_OPCODE_PKTTYPE_MSK, opcode) |
	       FIELD_PREP(OVPN_OPCODE_KEYID_MSK, key_id) |
	       FIELD_PREP(OVPN_OPCODE_PEERID_MSK, peer_id);
}
And with this, it can even be embedded into ovpn_aead_encrypt() to make the header composing clearer.
I wasn't aware of the bitfield API.
Yeah, it looks cleaner and gives a better definition of the first 4 bytes of the header.
There is also GENMASK(), which helps with creating masks instead of hardcoding the bits in hex.
Will give it a try, thanks!
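For reference, a minimal sketch of the GENMASK()/bitfield variant (the mask names follow Sergey's suggestion; return types and placement are illustrative, not the final code):

#include <linux/bitfield.h>
#include <linux/bits.h>

/* first 4 bytes of a DATA_V2 header: 5 bits opcode, 3 bits key-id,
 * 24 bits peer-id
 */
#define OVPN_OPCODE_PKTTYPE_MSK	GENMASK(31, 27)
#define OVPN_OPCODE_KEYID_MSK	GENMASK(26, 24)
#define OVPN_OPCODE_PEERID_MSK	GENMASK(23, 0)

static inline u8 ovpn_opcode_from_skb(const struct sk_buff *skb, u16 offset)
{
	u32 opcode = be32_to_cpu(*(__be32 *)(skb->data + offset));

	return FIELD_GET(OVPN_OPCODE_PKTTYPE_MSK, opcode);
}

static inline __be32 ovpn_opcode_compose(u8 opcode, u8 key_id, u32 peer_id)
{
	return cpu_to_be32(FIELD_PREP(OVPN_OPCODE_PKTTYPE_MSK, opcode) |
			   FIELD_PREP(OVPN_OPCODE_KEYID_MSK, key_id) |
			   FIELD_PREP(OVPN_OPCODE_PEERID_MSK, peer_id));
}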
+}
+/**
- ovpn_peer_id_from_skb - extract peer ID from skb at specified offset
- @skb: the packet to extract the OP code from
- @offset: the offset in the data buffer where the OP code is located
- Note: this function assumes that the skb head was pulled enough
- to access the first 4 bytes.
- Return: the peer ID.
- */
+static inline u32 ovpn_peer_id_from_skb(const struct sk_buff *skb, u16 offset) +{ + return ntohl(*(__be32 *)(skb->data + offset)) & OVPN_PEER_ID_MASK; +}
+#endif /* _NET_OVPN_OVPNPROTO_H_ */ diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c index 090a3232ab0ec19702110f1a90f45c7f10889f6f..964b566de69f4132806a969a455cec7f6059a0bd 100644 --- a/drivers/net/ovpn/socket.c +++ b/drivers/net/ovpn/socket.c @@ -22,6 +22,9 @@ static void ovpn_socket_detach(struct socket *sock) if (!sock) return; + if (sock->sk->sk_protocol == IPPROTO_UDP) + ovpn_udp_socket_detach(sock);
sockfd_put(sock); } @@ -71,6 +74,27 @@ static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer) return ret; } +/* Retrieve the corresponding ovpn object from a UDP socket
- rcu_read_lock must be held on entry
- */
+struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk) +{ + struct ovpn_socket *ovpn_sock;
+ if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) != UDP_ENCAP_OVPNINUDP)) + return NULL;
+ ovpn_sock = rcu_dereference_sk_user_data(sk); + if (unlikely(!ovpn_sock)) + return NULL;
+ /* make sure that sk matches our stored transport socket */ + if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk)) + return NULL;
+ return ovpn_sock->ovpn;
Now, returning this pointer is safe. But the following TCP transport support patch calls the socket release via a scheduled work, which extends the socket lifetime and makes it possible to receive a UDP packet way after the interface private data release. Is this a correct assumption?
Sorry, you lost me when saying "following *TCP* transport support calls". This function is invoked only in the UDP context. Was that a typo?
If the above is right, then shall we set ->ovpn = NULL before scheduling the socket releasing work, or somehow else mark the socket as half-destroyed?
+}
/** * ovpn_socket_new - create a new socket and initialize it * @sock: the kernel socket to embed diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c index d26d7566e9c8dfe91fa77f49c34fb179a9fb2239..d1e88ae83843f02d591e67a7995f2d6868720695 100644 --- a/drivers/net/ovpn/udp.c +++ b/drivers/net/ovpn/udp.c @@ -21,9 +21,95 @@ #include "bind.h" #include "io.h" #include "peer.h" +#include "proto.h" #include "socket.h" #include "udp.h" +/**
- ovpn_udp_encap_recv - Start processing a received UDP packet.
- @sk: socket over which the packet was received
- @skb: the received packet
- If the first byte of the payload is DATA_V2, the packet is further
processed,
- otherwise it is forwarded to the UDP stack for delivery to user
space.
- Return:
- * 0 if skb was consumed or dropped
- >0 if skb should be passed up to userspace as UDP (packet not consumed)
- <0 if skb should be resubmitted as proto -N (packet not consumed)
- */
+static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb) +{ + struct ovpn_peer *peer = NULL; + struct ovpn_struct *ovpn; + u32 peer_id; + u8 opcode;
+ ovpn = ovpn_from_udp_sock(sk); + if (unlikely(!ovpn)) { + net_err_ratelimited("%s: cannot obtain ovpn object from UDP socket\n", + __func__);
Probably we should zero the ovpn pointer in the ovpn_sock to survive the scheduled socket release (see comment in ovpn_from_udp_sock). So, this print should be removed to avoid printing misleading errors.
I am also not following this. ovpn is already NULL if we are entering this branch, no?
And I think this condition is quite improbable as well.
+ goto drop_noovpn; + }
+ /* Make sure the first 4 bytes of the skb data buffer after the UDP + * header are accessible. + * They are required to fetch the OP code, the key ID and the peer ID. + */ + if (unlikely(!pskb_may_pull(skb, sizeof(struct udphdr) + + OVPN_OP_SIZE_V2))) { + net_dbg_ratelimited("%s: packet too small\n", __func__); + goto drop; + }
+ opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr)); + if (unlikely(opcode != OVPN_DATA_V2)) { + /* DATA_V1 is not supported */ + if (opcode == OVPN_DATA_V1) + goto drop;
This packet dropping makes protocol accelerator, intendent to speed up the data packets processing, a protocol enforcement entity, isn't it? Shall we follow the principle of beeing liberal in what we accept and just forward everything besides data packets upstream to a userspace application?
'ovpn' only supports DATA_V2. When ovpn is in use, userspace does not expect any DATA packet to bubble up, as it would not know what to do with it.
So any decision regarding data packets should stay in 'ovpn'.
We just decided to support the modern DATA_V2 (DATA_V1 is seldom used nowadays).
Moreover, it's nearly impossible that a peer will send us DATA_V1 if it passed the userspace handshake and negotiation.
+ /* unknown or control packet: let it bubble up to userspace */ + return 1; + }
+ peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr)); + /* some OpenVPN server implementations send data packets with the + * peer-id set to undef. In this case we skip the peer lookup by peer-id + * and we try with the transport address + */ + if (peer_id != OVPN_PEER_ID_UNDEF) { + peer = ovpn_peer_get_by_id(ovpn, peer_id); + if (!peer) { + net_err_ratelimited("%s: received data from unknown peer (id: %d)\n", + __func__, peer_id);
Why do we consider a peer sending us garbage our problem? Meaning, this peer miss can be not our fault but a malformed packet from a 3rd party side. E.g. nowadays I can see a lot of traces of these "active probers" in my OpenVPN logs. Shall we remove this message, or at least make it debug, to avoid bothering users with garbage traveling the Internet? Anyway, we cannot do anything about incoming traffic.
It could also be a peer that believes it is connected while 'ovpn' dropped it earlier on. So this message would help the admin/user understand what's going on, no?
Maybe make it an info/notice instead of error?
+ goto drop; + } + }
+ if (!peer) {
AFAIU, this condition can be true only in case of peer_id being equal to OVPN_PEER_ID_UNDEF, right? In this case the condition check can be replaced by a simple 'else' statement.
This part was actually rewritten already, so better to wait for v12 before discussing it further.
And to make the code correspond better to the above comment regarding implementations that send an undefined peer-id, can we swap the sides of the lookup method selection? E.g.

/* Comment about fancy implementations sending undefined peer-id */
if (peer_id == OVPN_PEER_ID_UNDEF) {
	/* Do transport address based lookup */
} else {
	/* Do peer-id based lookup */
}
+ /* data packet with undef peer-id */ + peer = ovpn_peer_get_by_transp_addr(ovpn, skb); + if (unlikely(!peer)) { + net_dbg_ratelimited("%s: received data with undef peer-id from unknown source\n", + __func__); + goto drop; + } + }
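A minimal sketch of the swapped lookup order discussed above, assuming the existing ovpn_peer_get_by_id()/ovpn_peer_get_by_transp_addr() helpers keep their semantics (illustrative only, not the v12 code):

	/* some OpenVPN implementations send data packets with the peer-id
	 * set to undef: fall back to a lookup by transport address
	 */
	if (peer_id == OVPN_PEER_ID_UNDEF)
		peer = ovpn_peer_get_by_transp_addr(ovpn, skb);
	else
		peer = ovpn_peer_get_by_id(ovpn, peer_id);

	if (unlikely(!peer)) {
		net_dbg_ratelimited("%s: received data from unknown peer\n",
				    __func__);
		goto drop;
	}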
+ /* pop off outer UDP header */ + __skb_pull(skb, sizeof(struct udphdr)); + ovpn_recv(peer, skb); + return 0;
+drop: + if (peer) + ovpn_peer_put(peer);
AFAIU, the peer is always NULL here. Shall we remove the above check?
yeah, this was simplified already as well.
Thanks!
Regards,
On 15.11.2024 17:02, Antonio Quartulli wrote:
On 11/11/2024 02:54, Sergey Ryazanov wrote: [...]
+/* Called after decrypt to write the IP packet to the device.
- This method is expected to manage/free the skb.
- */
+static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff *skb) +{ + unsigned int pkt_len;
+ /* we can't guarantee the packet wasn't corrupted before entering the + * VPN, therefore we give other layers a chance to check that + */ + skb->ip_summed = CHECKSUM_NONE;
+ /* skb hash for transport packet no longer valid after decapsulation */ + skb_clear_hash(skb);
+ /* post-decrypt scrub -- prepare to inject encapsulated packet onto the + * interface, based on __skb_tunnel_rx() in dst.h + */ + skb->dev = peer->ovpn->dev; + skb_set_queue_mapping(skb, 0); + skb_scrub_packet(skb, true);
The skb->protocol field is going to be updated in the upcoming patch in the caller (ovpn_decrypt_post). Shall we put a comment here clarifying, why do not touch the protocol field here?
Well, I would personally not document missing details in a partly implemented code path.
Looks like the question wasn't precisely phrased. My bad. Let me elaborate in more detail:
1. usually skb->protocol is updated just before a packet leaves a module
2. I've not found it where it was expected
3. skb->protocol is updated in the caller function - ovpn_decrypt_post(), along with the skb_reset_network_header() call.
The question is, shall we put some comment here in the ovpn_netdev_write() function elaborating that this was done in the caller? Or is such a comment odd?
+ skb_reset_network_header(skb);
ovpn_decrypt_post() already reset the network header. Why do we need it here again?
yeah, I think this can be removed.
+ skb_reset_transport_header(skb); + skb_probe_transport_header(skb); + skb_reset_inner_headers(skb);
+ memset(skb->cb, 0, sizeof(skb->cb));
Why do we need to zero the control buffer here?
To avoid the next layer to assume the cb is clean while it is not. Other drivers do the same as well.
AFAIR, there is no convention to clean the control buffer before handing it over. The common practice is a bit the opposite: a programmer shall not assume that the control buffer has been zeroed.
Not a big deal to clean it here, we can just save some CPU cycles by avoiding it.
I think this was recommended by Sabrina as well.
Curious. Is it macsec that does not zero it, or have I not understood how it was done?
+ /* cause packet to be "received" by the interface */ + pkt_len = skb->len; + if (likely(gro_cells_receive(&peer->ovpn->gro_cells, + skb) == NET_RX_SUCCESS)) + /* update RX stats with the size of decrypted packet */ + dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len); +}
[...]
diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c index 090a3232ab0ec19702110f1a90f45c7f10889f6f..964b566de69f4132806a969a455cec7f6059a0bd 100644 --- a/drivers/net/ovpn/socket.c +++ b/drivers/net/ovpn/socket.c @@ -22,6 +22,9 @@ static void ovpn_socket_detach(struct socket *sock) if (!sock) return; + if (sock->sk->sk_protocol == IPPROTO_UDP) + ovpn_udp_socket_detach(sock);
sockfd_put(sock); } @@ -71,6 +74,27 @@ static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer) return ret; } +/* Retrieve the corresponding ovpn object from a UDP socket
- rcu_read_lock must be held on entry
- */
+struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk) +{ + struct ovpn_socket *ovpn_sock;
+ if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) != UDP_ENCAP_OVPNINUDP)) + return NULL;
+ ovpn_sock = rcu_dereference_sk_user_data(sk); + if (unlikely(!ovpn_sock)) + return NULL;
+ /* make sure that sk matches our stored transport socket */ + if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk)) + return NULL;
+ return ovpn_sock->ovpn;
Now, returning this pointer is safe. But the following TCP transport support patch calls the socket release via a scheduled work, which extends the socket lifetime and makes it possible to receive a UDP packet way after the interface private data release. Is this a correct assumption?
Sorry, you lost me when saying "following *TCP* transport support calls". This function is invoked only in the UDP context. Was that a typo?
Yeah, you are right. The question sounds like a riddle. I should eventually stop composing emails at midnight. Let me paraphrase it.
The potential issue is tricky since we create it patch-by-patch.
Up to this patch the socket releasing procedure looks solid and reliable. E.g. the P2P netdev destroying:
ovpn_netdev_notifier_call(NETDEV_UNREGISTER)
  ovpn_peer_release_p2p
    ovpn_peer_del_p2p
      ovpn_peer_put
        ovpn_peer_release_kref
          ovpn_peer_release
            ovpn_socket_put
              ovpn_socket_release_kref
                ovpn_socket_detach
                  ovpn_udp_socket_detach
                    setup_udp_tunnel_sock
netdev_run_todo
  rcu_barrier <- no running ovpn_udp_encap_recv after this point
free_netdev
After the setup_udp_tunnel_sock() call no new ovpn_udp_encap_recv() will be spawned. And after the rcu_barrier() all running ovpn_udp_encap_recv() will be done. All good.
Then, the following patch 'ovpn: implement TCP transport' disjoins ovpn_socket_release_kref() and ovpn_socket_detach() by scheduling the socket detach function call:
ovpn_socket_release_kref
  ovpn_socket_schedule_release
    schedule_work(&sock->work)
And a long time after that, the socket will be actually detached:
ovpn_socket_release_work
  ovpn_socket_detach
    ovpn_udp_socket_detach
      setup_udp_tunnel_sock
And until this detaching takes place, the UDP handler can call ovpn_udp_encap_recv() any number of times.
So, we can end up with this scenario:
ovpn_netdev_notifier_call(NETDEV_UNREGISTER)
  ovpn_peer_release_p2p
    ovpn_peer_del_p2p
      ovpn_peer_put
        ovpn_peer_release_kref
          ovpn_peer_release
            ovpn_socket_put
              ovpn_socket_release_kref
                ovpn_socket_schedule_release
                  schedule_work(&sock->work)
netdev_run_todo
  rcu_barrier
free_netdev
ovpn_udp_encap_recv  <- called for an incoming UDP packet
  ovpn_from_udp_sock <- returns pointer to freed memory
  // Any access to the ovpn pointer is a use-after-free
ovpn_socket_release_work <- kernel finally invokes the work
  ovpn_socket_detach
    ovpn_udp_socket_detach
      setup_udp_tunnel_sock
To address the issue, I see two possible solutions:
1. flush the workqueue somewhere before the netdev release
2. set ovpn_sock->ovpn = NULL before scheduling the socket detach
If the above is right then shall we set ->ovpn = NULL before scheduling the socket releasing work or somehow else mark the socket as half- destroyed?
+}
/** * ovpn_socket_new - create a new socket and initialize it * @sock: the kernel socket to embed diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c index d26d7566e9c8dfe91fa77f49c34fb179a9fb2239..d1e88ae83843f02d591e67a7995f2d6868720695 100644 --- a/drivers/net/ovpn/udp.c +++ b/drivers/net/ovpn/udp.c @@ -21,9 +21,95 @@ #include "bind.h" #include "io.h" #include "peer.h" +#include "proto.h" #include "socket.h" #include "udp.h" +/**
- ovpn_udp_encap_recv - Start processing a received UDP packet.
- @sk: socket over which the packet was received
- @skb: the received packet
- If the first byte of the payload is DATA_V2, the packet is
further processed,
- otherwise it is forwarded to the UDP stack for delivery to user
space.
- Return:
- * 0 if skb was consumed or dropped
- >0 if skb should be passed up to userspace as UDP (packet not consumed)
- <0 if skb should be resubmitted as proto -N (packet not consumed)
- */
+static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb) +{ + struct ovpn_peer *peer = NULL; + struct ovpn_struct *ovpn; + u32 peer_id; + u8 opcode;
+ ovpn = ovpn_from_udp_sock(sk); + if (unlikely(!ovpn)) { + net_err_ratelimited("%s: cannot obtain ovpn object from UDP socket\n", + __func__);
Probably we should zero the ovpn pointer in the ovpn_sock to survive the scheduled socket release (see comment in ovpn_from_udp_sock). So, this print should be removed to avoid printing misleading errors.
I am also not following this. ovpn is already NULL if we are entering this branch, no?
And I think this condition is quite improbable as well.
Here, due to the scheduled nature of the detach function invocation, ovpn_from_udp_sock() can return us a pointer to the freed memory.
So we should prevent ovpn_udp_encap_recv() invocation after the netdev release by flushing the workqueue. Or we can set ovpn_sock->ovpn = NULL even before scheduling the socket detaching. And in this case, ovpn_from_udp_sock() returning NULL will be a legitimate case and we should drop the error printing.
+ goto drop_noovpn; + }
+ /* Make sure the first 4 bytes of the skb data buffer after the UDP + * header are accessible. + * They are required to fetch the OP code, the key ID and the peer ID. + */ + if (unlikely(!pskb_may_pull(skb, sizeof(struct udphdr) + + OVPN_OP_SIZE_V2))) { + net_dbg_ratelimited("%s: packet too small\n", __func__); + goto drop; + }
+ opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr)); + if (unlikely(opcode != OVPN_DATA_V2)) { + /* DATA_V1 is not supported */ + if (opcode == OVPN_DATA_V1) + goto drop;
This packet dropping makes the protocol accelerator, intended to speed up data packet processing, a protocol enforcement entity, doesn't it? Shall we follow the principle of being liberal in what we accept and just forward everything besides data packets upstream to the userspace application?
'ovpn' only supports DATA_V2. When ovpn is in use, userspace does not expect any DATA packet to bubble up, as it would not know what to do with it.
So any decision regarding data packets should stay in 'ovpn'.
We just decided to support the modern DATA_V2 (DATA_V1 is seldom used nowadays).
Moreover, it's nearly impossible that a peer will send us DATA_V1 if it passed the userspace handshake and negotiation.
The question was about the special handling of this packet type. If this packet type is unlikely, then why should the kernel take special care of it? Is this specific packet type going to crash the userspace application?
+ /* unknown or control packet: let it bubble up to userspace */ + return 1; + }
+ peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr)); + /* some OpenVPN server implementations send data packets with the + * peer-id set to undef. In this case we skip the peer lookup by peer-id + * and we try with the transport address + */ + if (peer_id != OVPN_PEER_ID_UNDEF) { + peer = ovpn_peer_get_by_id(ovpn, peer_id); + if (!peer) { + net_err_ratelimited("%s: received data from unknown peer (id: %d)\n", + __func__, peer_id);
Why do we consider a peer sending us garbage our problem? Meaning, this peer miss can be not our fault but a malformed packet from a 3rd party side. E.g. nowadays I can see a lot of traces of these "active probers" in my OpenVPN logs. Shall we remove this message, or at least make it debug, to avoid bothering users with garbage traveling the Internet? Anyway, we cannot do anything about incoming traffic.
It could also be a peer that believes it is connected while 'ovpn' dropped it earlier on. So this message would help the admin/user understand what's going on, no?
It could help troubleshooting, potentially. On the other hand, it will flood the kernel log with whatever junk is floating around the Internet. For sure.
Maybe make it an info/notice instead of error?
At best it can be a debug message for developers. But IMHO the best choice really is to get rid of it.
+ goto drop; + } + }
-- Sergey
On 26/11/2024 01:32, Sergey Ryazanov wrote:
On 15.11.2024 17:02, Antonio Quartulli wrote:
On 11/11/2024 02:54, Sergey Ryazanov wrote: [...]
+/* Called after decrypt to write the IP packet to the device.
- This method is expected to manage/free the skb.
- */
+static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff *skb) +{ + unsigned int pkt_len;
+ /* we can't guarantee the packet wasn't corrupted before entering the + * VPN, therefore we give other layers a chance to check that + */ + skb->ip_summed = CHECKSUM_NONE;
+ /* skb hash for transport packet no longer valid after decapsulation */ + skb_clear_hash(skb);
+ /* post-decrypt scrub -- prepare to inject encapsulated packet onto the + * interface, based on __skb_tunnel_rx() in dst.h + */ + skb->dev = peer->ovpn->dev; + skb_set_queue_mapping(skb, 0); + skb_scrub_packet(skb, true);
The skb->protocol field is going to be updated in the upcoming patch in the caller (ovpn_decrypt_post). Shall we put a comment here clarifying, why do not touch the protocol field here?
Well, I would personally not document missing details in a partly implemented code path.
Looks like the question wasn't precisely phrased. My bad. Let me elaborate in more detail:
- usually skb->protocol is updated just before a packet leaves a module
- I've not found it where it was expected
- skb->protocol is updated in the caller function -
ovpn_decrypt_post(), along with the skb_reset_network_header() call.
The question is, shall we put some comment here in the ovpn_netdev_write() function elaborating that this was done in the caller? Or is such a comment odd?
Ok, got it. Mah, personally I don't think it's truly needed. But I have no strong opinion.
+ skb_reset_network_header(skb);
ovpn_decrypt_post() already reset the network header. Why do we need it here again?
yeah, I think this can be removed.
+ skb_reset_transport_header(skb); + skb_probe_transport_header(skb); + skb_reset_inner_headers(skb);
+ memset(skb->cb, 0, sizeof(skb->cb));
Why do we need to zero the control buffer here?
To avoid the next layer to assume the cb is clean while it is not. Other drivers do the same as well.
AFAIR, there is no convention to clean the control buffer before handing it over. The common practice is a bit the opposite: a programmer shall not assume that the control buffer has been zeroed.
Not a big deal to clean it here, we can just save some CPU cycles by avoiding it.
If there is no convention, then I agree with you and I'd remove it.
I think this was recommended by Sabrina as well.
Curious. Is it macsec that does not zero it, or have I not understood how it was done?
I don't see it being zero'd. So I possibly misunderstood the suggestion. I'll remove the memset.
+ /* cause packet to be "received" by the interface */ + pkt_len = skb->len; + if (likely(gro_cells_receive(&peer->ovpn->gro_cells, + skb) == NET_RX_SUCCESS)) + /* update RX stats with the size of decrypted packet */ + dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len); +}
[...]
diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c index 090a3232ab0ec19702110f1a90f45c7f10889f6f..964b566de69f4132806a969a455cec7f6059a0bd 100644 --- a/drivers/net/ovpn/socket.c +++ b/drivers/net/ovpn/socket.c @@ -22,6 +22,9 @@ static void ovpn_socket_detach(struct socket *sock) if (!sock) return; + if (sock->sk->sk_protocol == IPPROTO_UDP) + ovpn_udp_socket_detach(sock);
sockfd_put(sock); } @@ -71,6 +74,27 @@ static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer) return ret; } +/* Retrieve the corresponding ovpn object from a UDP socket
- rcu_read_lock must be held on entry
- */
+struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk) +{ + struct ovpn_socket *ovpn_sock;
+ if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) != UDP_ENCAP_OVPNINUDP)) + return NULL;
+ ovpn_sock = rcu_dereference_sk_user_data(sk); + if (unlikely(!ovpn_sock)) + return NULL;
+ /* make sure that sk matches our stored transport socket */ + if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk)) + return NULL;
+ return ovpn_sock->ovpn;
Now, returning this pointer is safe. But the following TCP transport support patch calls the socket release via a scheduled work, which extends the socket lifetime and makes it possible to receive a UDP packet way after the interface private data release. Is this a correct assumption?
Sorry, you lost me when saying "following *TCP* transport support calls". This function is invoked only in the UDP context. Was that a typo?
Yeah, you are right. The question sounds like a riddle. I should eventually stop composing emails at midnight. Let me paraphrase it.
:)
The potential issue is tricky since we create it patch-by-patch.
Up to this patch the socket releasing procedure looks solid and reliable. E.g. the P2P netdev destroying:
ovpn_netdev_notifier_call(NETDEV_UNREGISTER) ovpn_peer_release_p2p ovpn_peer_del_p2p ovpn_peer_put ovpn_peer_release_kref ovpn_peer_release ovpn_socket_put ovpn_socket_release_kref ovpn_socket_detach ovpn_udp_socket_detach setup_udp_tunnel_sock netdev_run_todo rcu_barrier <- no running ovpn_udp_encap_recv after this point free_netdev
After the setup_udp_tunnel_sock() call no new ovpn_udp_encap_recv() will be spawned. And after the rcu_barrier() all running ovpn_udp_encap_recv() will be done. All good.
ok
Then, the following patch 'ovpn: implement TCP transport' disjoins ovpn_socket_release_kref() and ovpn_socket_detach() by scheduling the socket detach function call:
ovpn_socket_release_kref ovpn_socket_schedule_release schedule_work(&sock->work)
And a long time after that, the socket will be actually detached:
ovpn_socket_release_work ovpn_socket_detach ovpn_udp_socket_detach setup_udp_tunnel_sock
And until this detaching takes place, the UDP handler can call ovpn_udp_encap_recv() any number of times.
So, we can end up with this scenario:
ovpn_netdev_notifier_call(NETDEV_UNREGISTER) ovpn_peer_release_p2p ovpn_peer_del_p2p ovpn_peer_put ovpn_peer_release_kref ovpn_peer_release ovpn_socket_put ovpn_socket_release_kref ovpn_socket_schedule_release schedule_work(&sock->work) netdev_run_todo rcu_barrier free_netdev
ovpn_udp_encap_recv <- called for an incoming UDP packet ovpn_from_udp_sock <- returns pointer to freed memory // Any access to ovpn pointer is the use-after-free
ovpn_socket_release_work <- kernel finally ivoke the work ovpn_socket_detach ovpn_udp_socket_detach setup_udp_tunnel_sock
To address the issue, I see two possible solutions:
- flush the workqueue somewhere before the netdev release
yes! This is what I was missing. This will also solve the "how can the module wait for all workers to be done before unloading?" question.
- set ovpn_sock->ovpn = NULL before scheduling the socket detach
This makes sense too. But 1 is definitely what we need.
If the above is right then shall we set ->ovpn = NULL before scheduling the socket releasing work or somehow else mark the socket as half- destroyed?
Will think about it, it may make sense to nullify ->ovpn as well.
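A rough sketch of option 1, assuming the socket release work is queued on a driver-private workqueue rather than system_wq (the ovpn->release_wq field and the ovpn_netdev_uninit() hook are hypothetical names):

static void ovpn_netdev_uninit(struct net_device *dev)
{
	struct ovpn_struct *ovpn = netdev_priv(dev);

	/* all peers/sockets have been put at this point; wait for every
	 * pending ovpn_socket_detach() so that no ovpn_udp_encap_recv()
	 * can dereference freed private data after free_netdev()
	 */
	flush_workqueue(ovpn->release_wq);
}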
+}
/** * ovpn_socket_new - create a new socket and initialize it * @sock: the kernel socket to embed diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c index d26d7566e9c8dfe91fa77f49c34fb179a9fb2239..d1e88ae83843f02d591e67a7995f2d6868720695 100644 --- a/drivers/net/ovpn/udp.c +++ b/drivers/net/ovpn/udp.c @@ -21,9 +21,95 @@ #include "bind.h" #include "io.h" #include "peer.h" +#include "proto.h" #include "socket.h" #include "udp.h" +/**
- ovpn_udp_encap_recv - Start processing a received UDP packet.
- @sk: socket over which the packet was received
- @skb: the received packet
- If the first byte of the payload is DATA_V2, the packet is
further processed,
- otherwise it is forwarded to the UDP stack for delivery to user
space.
- Return:
- * 0 if skb was consumed or dropped
- >0 if skb should be passed up to userspace as UDP (packet not consumed)
- <0 if skb should be resubmitted as proto -N (packet not consumed)
- */
+static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb) +{ + struct ovpn_peer *peer = NULL; + struct ovpn_struct *ovpn; + u32 peer_id; + u8 opcode;
+ ovpn = ovpn_from_udp_sock(sk); + if (unlikely(!ovpn)) { + net_err_ratelimited("%s: cannot obtain ovpn object from UDP socket\n", + __func__);
Probably we should zero the ovpn pointer in the ovpn_sock to survive the scheduled socket release (see comment in ovpn_from_udp_sock). So, this print should be removed to avoid printing misleading errors.
I am also not following this. ovpn is already NULL if we are entering this branch, no?
And I think this condition is quite improbable as well.
Here, due to the scheduled nature of the detach function invocation, ovpn_from_udp_sock() can return us a pointer to the freed memory.
So we should prevent ovpn_udp_encap_recv() invocation after the netdev release by flushing the workqueue. Or we can set ovpn_sock->ovpn = NULL even before scheduling the socket detaching. And in this case, ovpn_from_udp_sock() returning NULL will be a legitimate case and we should drop the error printing.
ok got it. it is related with the comment above.
+ goto drop_noovpn; + }
+ /* Make sure the first 4 bytes of the skb data buffer after the UDP + * header are accessible. + * They are required to fetch the OP code, the key ID and the peer ID. + */ + if (unlikely(!pskb_may_pull(skb, sizeof(struct udphdr) + + OVPN_OP_SIZE_V2))) { + net_dbg_ratelimited("%s: packet too small\n", __func__); + goto drop; + }
+ opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr)); + if (unlikely(opcode != OVPN_DATA_V2)) { + /* DATA_V1 is not supported */ + if (opcode == OVPN_DATA_V1) + goto drop;
This packet dropping makes the protocol accelerator, intended to speed up data packet processing, a protocol enforcement entity, doesn't it? Shall we follow the principle of being liberal in what we accept and just forward everything besides data packets upstream to the userspace application?
'ovpn' only supports DATA_V2. When ovpn is in use, userspace does not expect any DATA packet to bubble up, as it would not know what to do with it.
So any decision regarding data packets should stay in 'ovpn'.
We just decided to support the modern DATA_V2 (DATA_V1 is seldom used nowadays).
Moreover, it's nearly impossible that a peer will send us DATA_V1 if it passed the userspace handshake and negotiation.
The question was about the special handling of this packet type. If this packet type is unlikely, then why should the kernel take special care of it? Is this specific packet type going to crash the userspace application?
Not crash (hopefully) but will create confusion because it is unexpected. The userspace dataplane path is technically inactive when 'ovpn' is in use.
The idea is that any DATA_V* packet should be handled in kernelspace and userspace should not need to care.
+ /* unknown or control packet: let it bubble up to userspace */ + return 1; + }
+ peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr)); + /* some OpenVPN server implementations send data packets with the + * peer-id set to undef. In this case we skip the peer lookup by peer-id + * and we try with the transport address + */ + if (peer_id != OVPN_PEER_ID_UNDEF) { + peer = ovpn_peer_get_by_id(ovpn, peer_id); + if (!peer) { + net_err_ratelimited("%s: received data from unknown peer (id: %d)\n", + __func__, peer_id);
Why do we consider a peer sending us garbage our problem? Meaning, this peer miss can be not our fault but a malformed packet from a 3rd party side. E.g. nowadays I can see a lot of traces of these "active probers" in my OpenVPN logs. Shall we remove this message, or at least make it debug, to avoid bothering users with garbage traveling the Internet? Anyway, we cannot do anything about incoming traffic.
It could also be a peer that believes it is connected while 'ovpn' dropped it earlier on. So this message would help the admin/user understand what's going on, no?
It could help troubleshooting, potentially. On the other hand, it will flood the kernel log with whatever junk is floating around the Internet. For sure.
Well, only packets having the right opcode in them and being large enough, because we have already dropped anything that doesn't look like a DATA_V2 packet at this point.
Maybe make it an info/notice instead of error?
At best it can be a debug message for developers. But IMHO the best choice really is to get rid of it.
But yeah, I agree with you. Will just silently drop.
+ goto drop; + } + }
-- Sergey
Thanks, Regards,
On 26/11/2024 09:49, Antonio Quartulli wrote: [...]
The potential issue is tricky since we create it patch-by-patch.
Up to this patch the socket releasing procedure looks solid and reliable. E.g. the P2P netdev destroying:
ovpn_netdev_notifier_call(NETDEV_UNREGISTER) ovpn_peer_release_p2p ovpn_peer_del_p2p ovpn_peer_put ovpn_peer_release_kref ovpn_peer_release ovpn_socket_put ovpn_socket_release_kref ovpn_socket_detach ovpn_udp_socket_detach setup_udp_tunnel_sock netdev_run_todo rcu_barrier <- no running ovpn_udp_encap_recv after this point free_netdev
After the setup_udp_tunnel_sock() call no new ovpn_udp_encap_recv() will be spawned. And after the rcu_barrier() all running ovpn_udp_encap_recv() will be done. All good.
ok
Then, the following patch 'ovpn: implement TCP transport' disjoin ovpn_socket_release_kref() and ovpn_socket_detach() by scheduling the socket detach function call:
ovpn_socket_release_kref ovpn_socket_schedule_release schedule_work(&sock->work)
And long time after the socket will be actually detached:
ovpn_socket_release_work ovpn_socket_detach ovpn_udp_socket_detach setup_udp_tunnel_sock
And until this detaching will take a place, UDP handler can call ovpn_udp_encap_recv() whatever number of times.
So, we can end up with this scenario:
ovpn_netdev_notifier_call(NETDEV_UNREGISTER) ovpn_peer_release_p2p ovpn_peer_del_p2p ovpn_peer_put ovpn_peer_release_kref ovpn_peer_release ovpn_socket_put ovpn_socket_release_kref ovpn_socket_schedule_release schedule_work(&sock->work) netdev_run_todo rcu_barrier free_netdev
ovpn_udp_encap_recv <- called for an incoming UDP packet ovpn_from_udp_sock <- returns pointer to freed memory // Any access to ovpn pointer is the use-after-free
ovpn_socket_release_work <- kernel finally ivoke the work ovpn_socket_detach ovpn_udp_socket_detach setup_udp_tunnel_sock
To address the issue, I see two possible solutions:
- flush the workqueue somewhere before the netdev release
yes! This is what I was missing. This will also solve the "how can the module wait for all workers to be done before unloading?"
Actually there might be an even simpler solution: each ovpn_socket will hold a reference to an ovpn_peer (TCP) or to an ovpn_priv (UDP). I can simply increase the refcounter of those objects while they are referenced by the socket and decrease it when the socket is fully released (in the detach() function called by the worker).
This way the netdev cannot be released until all sockets (and all peers) are gone.
This approach doesn't require any local workqueue or any other special coordination as we'll just force the whole cleanup to happen in a specific order.
Does it make sense?
Regards,
2024-11-27, 02:40:02 +0100, Antonio Quartulli wrote:
On 26/11/2024 09:49, Antonio Quartulli wrote: [...]
The potential issue is tricky since we create it patch-by-patch.
Up to this patch the socket releasing procedure looks solid and reliable. E.g. the P2P netdev destroying:
ovpn_netdev_notifier_call(NETDEV_UNREGISTER) ovpn_peer_release_p2p ovpn_peer_del_p2p ovpn_peer_put ovpn_peer_release_kref ovpn_peer_release ovpn_socket_put ovpn_socket_release_kref ovpn_socket_detach ovpn_udp_socket_detach setup_udp_tunnel_sock netdev_run_todo rcu_barrier <- no running ovpn_udp_encap_recv after this point free_netdev
After the setup_udp_tunnel_sock() call no new ovpn_udp_encap_recv() will be spawned. And after the rcu_barrier() all running ovpn_udp_encap_recv() will be done. All good.
ok
Then, the following patch 'ovpn: implement TCP transport' disjoin ovpn_socket_release_kref() and ovpn_socket_detach() by scheduling the socket detach function call:
ovpn_socket_release_kref ovpn_socket_schedule_release schedule_work(&sock->work)
And long time after the socket will be actually detached:
ovpn_socket_release_work ovpn_socket_detach ovpn_udp_socket_detach setup_udp_tunnel_sock
And until this detaching will take a place, UDP handler can call ovpn_udp_encap_recv() whatever number of times.
So, we can end up with this scenario:
ovpn_netdev_notifier_call(NETDEV_UNREGISTER) ovpn_peer_release_p2p ovpn_peer_del_p2p ovpn_peer_put ovpn_peer_release_kref ovpn_peer_release ovpn_socket_put ovpn_socket_release_kref ovpn_socket_schedule_release schedule_work(&sock->work) netdev_run_todo rcu_barrier free_netdev
ovpn_udp_encap_recv <- called for an incoming UDP packet ovpn_from_udp_sock <- returns pointer to freed memory // Any access to ovpn pointer is the use-after-free
ovpn_socket_release_work <- kernel finally ivoke the work ovpn_socket_detach ovpn_udp_socket_detach setup_udp_tunnel_sock
To address the issue, I see two possible solutions:
- flush the workqueue somewhere before the netdev release
yes! This is what I was missing. This will also solve the "how can the module wait for all workers to be done before unloading?"
Actually there might be an even simpler solution: each ovpn_socket will hold a reference to an ovpn_peer (TCP) or to an ovpn_priv (UDP). I can simply increase the refcounter of those objects while they are referenced by the socket and decrease it when the socket is fully released (in the detach() function called by the worker).
This way the netdev cannot be released until all sockets (and all peers) are gone.
This approach doesn't require any local workqueue or any other special coordination as we'll just force the whole cleanup to happen in a specific order.
Does it make sense?
This dependency between refcounts worries me. I'm already having a hard time remembering how all objects interact together.
And since ovpn_peer_release already calls ovpn_socket_put, you'd get a refcount loop if ovpn_socket now also has a ref on the peer, no?
On 29/11/2024 14:20, Sabrina Dubroca wrote:
2024-11-27, 02:40:02 +0100, Antonio Quartulli wrote:
On 26/11/2024 09:49, Antonio Quartulli wrote: [...]
The potential issue is tricky since we create it patch-by-patch.
Up to this patch the socket releasing procedure looks solid and reliable. E.g. the P2P netdev destroying:
ovpn_netdev_notifier_call(NETDEV_UNREGISTER) ovpn_peer_release_p2p ovpn_peer_del_p2p ovpn_peer_put ovpn_peer_release_kref ovpn_peer_release ovpn_socket_put ovpn_socket_release_kref ovpn_socket_detach ovpn_udp_socket_detach setup_udp_tunnel_sock netdev_run_todo rcu_barrier <- no running ovpn_udp_encap_recv after this point free_netdev
After the setup_udp_tunnel_sock() call no new ovpn_udp_encap_recv() will be spawned. And after the rcu_barrier() all running ovpn_udp_encap_recv() will be done. All good.
ok
Then, the following patch 'ovpn: implement TCP transport' disjoin ovpn_socket_release_kref() and ovpn_socket_detach() by scheduling the socket detach function call:
ovpn_socket_release_kref ovpn_socket_schedule_release schedule_work(&sock->work)
And long time after the socket will be actually detached:
ovpn_socket_release_work ovpn_socket_detach ovpn_udp_socket_detach setup_udp_tunnel_sock
And until this detaching will take a place, UDP handler can call ovpn_udp_encap_recv() whatever number of times.
So, we can end up with this scenario:
ovpn_netdev_notifier_call(NETDEV_UNREGISTER) ovpn_peer_release_p2p ovpn_peer_del_p2p ovpn_peer_put ovpn_peer_release_kref ovpn_peer_release ovpn_socket_put ovpn_socket_release_kref ovpn_socket_schedule_release schedule_work(&sock->work) netdev_run_todo rcu_barrier free_netdev
ovpn_udp_encap_recv <- called for an incoming UDP packet ovpn_from_udp_sock <- returns pointer to freed memory // Any access to ovpn pointer is the use-after-free
ovpn_socket_release_work <- kernel finally ivoke the work ovpn_socket_detach ovpn_udp_socket_detach setup_udp_tunnel_sock
To address the issue, I see two possible solutions:
- flush the workqueue somewhere before the netdev release
yes! This is what I was missing. This will also solve the "how can the module wait for all workers to be done before unloading?"
Actually there might be even a simpler solution: each ovpn_socket will hold a reference to an ovpn_peer (TCP) or to an ovpn_priv (UDP). I can simply increase the refcounter those objects while they are referenced by the socket and decrease it when the socket is fully released (in the detach() function called by the worker).
This way the netdev cannot be released until all socket (and all peers) are gone.
This approach doesn't require any local workqueue or any other special coordination as we'll just force the whole cleanup to happen in a specific order.
Does it make sense?
This dependency between refcounts worries me. I'm already having a hard time remembering how all objects interact together.
And since ovpn_peer_release already calls ovpn_socket_put, you'd get a refcount loop if ovpn_socket now also has a ref on the peer, no?
You're right. Therefore I started playing with the following approach:
* implement ovpn_peer_remove() that is invoked by ovpn_peer_del(), i.e. when ovpn wants to remove the peer from its state
* ovpn_peer_remove() will do all kinds of cleanup and unhashing, including calling ovpn_socket_put()
* in turn, when the socket is released from all other contexts, it will also call ovpn_peer_put() and allow the peer to be free'd for good.
On one hand it sounds a bit clumsy, but on the other hand it allows each component to keep relying on any reference it is holding until the end.
The only downside is that we will start shutting down a peer and then keep it around until any reference is dropped. But it should work.
Regards,
2024-11-26, 02:32:38 +0200, Sergey Ryazanov wrote:
On 15.11.2024 17:02, Antonio Quartulli wrote:
On 11/11/2024 02:54, Sergey Ryazanov wrote: [...]
+ skb_reset_transport_header(skb); + skb_probe_transport_header(skb); + skb_reset_inner_headers(skb);
+ memset(skb->cb, 0, sizeof(skb->cb));
Why do we need to zero the control buffer here?
To avoid the next layer to assume the cb is clean while it is not. Other drivers do the same as well.
AFAIR, there is no convention to clean the control buffer before handing it over. The common practice is a bit the opposite: a programmer shall not assume that the control buffer has been zeroed.
Not a big deal to clean it here, we can just save some CPU cycles by avoiding it.
I think this was recommended by Sabrina as well.
Curious. It's macsec that does not zero it, or I've not understood how it was done.
I only remember discussing a case [1] where one function within ovpn was expecting a cleared skb->cb to behave correctly but the caller did not clear it. In general, as you said, clearing cb "to be nice to other layers" is not expected. Sorry if some comments I made were confusing.
[1] https://lore.kernel.org/netdev/ZtXOw-NcL9lvwWa8@hog
+struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk) +{ + struct ovpn_socket *ovpn_sock;
+ if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) != UDP_ENCAP_OVPNINUDP)) + return NULL;
+ ovpn_sock = rcu_dereference_sk_user_data(sk); + if (unlikely(!ovpn_sock)) + return NULL;
+ /* make sure that sk matches our stored transport socket */ + if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk)) + return NULL;
+ return ovpn_sock->ovpn;
Now, returning this pointer is safe. But the following TCP transport support patch calls the socket release via a scheduled work, which extends the socket lifetime and makes it possible to receive a UDP packet way after the interface private data release. Is this a correct assumption?
Sorry, you lost me when saying "following *TCP* transport support calls". This function is invoked only in the UDP context. Was that a typo?
Yeah, you are right. The question sounds like a riddle. I should eventually stop composing emails at midnight. Let me paraphrase it.
The potential issue is tricky since we create it patch-by-patch.
Up to this patch the socket releasing procedure looks solid and reliable. E.g. the P2P netdev destroying:
ovpn_netdev_notifier_call(NETDEV_UNREGISTER) ovpn_peer_release_p2p ovpn_peer_del_p2p ovpn_peer_put ovpn_peer_release_kref ovpn_peer_release ovpn_socket_put ovpn_socket_release_kref ovpn_socket_detach ovpn_udp_socket_detach setup_udp_tunnel_sock netdev_run_todo rcu_barrier <- no running ovpn_udp_encap_recv after this point
It's more the synchronize_net in unregister_netdevice_many_notify? rcu_barrier waits for pending kfree_rcu/call_rcu, synchronize_rcu waits for rcu_read_lock sections (see the comments for rcu_barrier and synchronize_rcu in kernel/rcu/tree.c).
free_netdev
After the setup_udp_tunnel_sock() call no new ovpn_udp_encap_recv() will be spawned. And after the rcu_barrier() all running ovpn_udp_encap_recv() will be done. All good.
Then, the following patch 'ovpn: implement TCP transport' disjoin ovpn_socket_release_kref() and ovpn_socket_detach() by scheduling the socket detach function call:
ovpn_socket_release_kref ovpn_socket_schedule_release schedule_work(&sock->work)
And long time after the socket will be actually detached:
ovpn_socket_release_work ovpn_socket_detach ovpn_udp_socket_detach setup_udp_tunnel_sock
And until this detaching will take a place, UDP handler can call ovpn_udp_encap_recv() whatever number of times.
So, we can end up with this scenario:
ovpn_netdev_notifier_call(NETDEV_UNREGISTER) ovpn_peer_release_p2p ovpn_peer_del_p2p ovpn_peer_put ovpn_peer_release_kref ovpn_peer_release ovpn_socket_put ovpn_socket_release_kref ovpn_socket_schedule_release schedule_work(&sock->work) netdev_run_todo rcu_barrier free_netdev
ovpn_udp_encap_recv <- called for an incoming UDP packet ovpn_from_udp_sock <- returns pointer to freed memory // Any access to ovpn pointer is the use-after-free
ovpn_socket_release_work <- kernel finally ivoke the work ovpn_socket_detach ovpn_udp_socket_detach setup_udp_tunnel_sock
To address the issue, I see two possible solutions:
- flush the workqueue somewhere before the netdev release
- set ovpn_sock->ovpn = NULL before scheduling the socket detach
Going with #2, we could fully split detach into a synchronous part and async part (with async not needed for UDP). detach_sync clears the pointers (CBs, strp_stop(), ovpn_sock->ovpn, setup_udp_tunnel_sock) so that no more packets will be sent through the ovpn driver.
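To illustrate the synchronous UDP half, a minimal sketch (the function name is hypothetical and it assumes the RX path reads ovpn_sock->ovpn with READ_ONCE()):

static void ovpn_udp_socket_detach_sync(struct ovpn_socket *ovpn_sock)
{
	struct udp_tunnel_sock_cfg cfg = { };

	/* unhook the encap callback so no new ovpn_udp_encap_recv() starts */
	setup_udp_tunnel_sock(sock_net(ovpn_sock->sock->sk),
			      ovpn_sock->sock, &cfg);

	/* callbacks already in flight will see NULL in ovpn_from_udp_sock()
	 * and drop the packet instead of touching freed driver state
	 */
	WRITE_ONCE(ovpn_sock->ovpn, NULL);
}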
Related to that topic, I'm not sure what's keeping a reference on the peer to guarantee it doesn't get freed before we're done with peer->tcp.tx_work at the end of ovpn_tcp_socket_detach. Maybe all this tcp stuff should move from the peer to ovpn_socket?
On 29/11/2024 17:10, Sabrina Dubroca wrote:
2024-11-26, 02:32:38 +0200, Sergey Ryazanov wrote:
On 15.11.2024 17:02, Antonio Quartulli wrote:
On 11/11/2024 02:54, Sergey Ryazanov wrote: [...]
+ skb_reset_transport_header(skb); + skb_probe_transport_header(skb); + skb_reset_inner_headers(skb);
+ memset(skb->cb, 0, sizeof(skb->cb));
Why do we need to zero the control buffer here?
To avoid the next layer to assume the cb is clean while it is not. Other drivers do the same as well.
AFAIR, there is no convention to clean the control buffer before the handing over. The common practice is a bit opposite, programmer shall not assume that the control buffer has been zeroed.
Not a big deal to clean it here, we just can save some CPU cycles avoiding it.
I think this was recommended by Sabrina as well.
Curious. It's macsec that does not zero it, or I've not understood how it was done.
I only remember discussing a case [1] where one function within ovpn was expecting a cleared skb->cb to behave correctly but the caller did not clear it. In general, as you said, clearing cb "to be nice to other layers" is not expected. Sorry if some comments I made were confusing.
No problem at all. I misunderstood some statement and went the wrong route. Thanks a lot Sergey for pointing this out.
I am only clearing the cb before usage as required by internal assumptions.
[1] https://lore.kernel.org/netdev/ZtXOw-NcL9lvwWa8@hog
+struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk) +{ + struct ovpn_socket *ovpn_sock;
+ if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) != UDP_ENCAP_OVPNINUDP)) + return NULL;
+ ovpn_sock = rcu_dereference_sk_user_data(sk); + if (unlikely(!ovpn_sock)) + return NULL;
+ /* make sure that sk matches our stored transport socket */ + if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk)) + return NULL;
+ return ovpn_sock->ovpn;
Now, returning of this pointer is safe. But the following TCP transport support calls the socket release via a scheduled work. What extends socket lifetime and makes it possible to receive a UDP packet way after the interface private data release. Is it correct assumption?
Sorry you lost me when sayng "following *TCP* transp[ort support calls". This function is invoked only in UDP context. Was that a typ0?
Yeah, you are right. The question sounds like a riddle. I should eventually stop composing emails at midnight. Let me paraphrase it.
The potential issue is tricky since we create it patch-by-patch.
Up to this patch the socket releasing procedure looks solid and reliable. E.g. the P2P netdev destroying:
ovpn_netdev_notifier_call(NETDEV_UNREGISTER) ovpn_peer_release_p2p ovpn_peer_del_p2p ovpn_peer_put ovpn_peer_release_kref ovpn_peer_release ovpn_socket_put ovpn_socket_release_kref ovpn_socket_detach ovpn_udp_socket_detach setup_udp_tunnel_sock netdev_run_todo rcu_barrier <- no running ovpn_udp_encap_recv after this point
It's more the synchronize_net in unregister_netdevice_many_notify? rcu_barrier waits for pending kfree_rcu/call_rcu, synchronize_rcu waits for rcu_read_lock sections (see the comments for rcu_barrier and synchronize_rcu in kernel/rcu/tree.c).
free_netdev
After the setup_udp_tunnel_sock() call no new ovpn_udp_encap_recv() will be spawned. And after the rcu_barrier() all running ovpn_udp_encap_recv() will be done. All good.
Then, the following patch 'ovpn: implement TCP transport' disjoin ovpn_socket_release_kref() and ovpn_socket_detach() by scheduling the socket detach function call:
ovpn_socket_release_kref ovpn_socket_schedule_release schedule_work(&sock->work)
And long time after the socket will be actually detached:
ovpn_socket_release_work ovpn_socket_detach ovpn_udp_socket_detach setup_udp_tunnel_sock
And until this detaching will take a place, UDP handler can call ovpn_udp_encap_recv() whatever number of times.
So, we can end up with this scenario:
ovpn_netdev_notifier_call(NETDEV_UNREGISTER) ovpn_peer_release_p2p ovpn_peer_del_p2p ovpn_peer_put ovpn_peer_release_kref ovpn_peer_release ovpn_socket_put ovpn_socket_release_kref ovpn_socket_schedule_release schedule_work(&sock->work) netdev_run_todo rcu_barrier free_netdev
ovpn_udp_encap_recv <- called for an incoming UDP packet ovpn_from_udp_sock <- returns pointer to freed memory // Any access to ovpn pointer is the use-after-free
ovpn_socket_release_work <- kernel finally ivoke the work ovpn_socket_detach ovpn_udp_socket_detach setup_udp_tunnel_sock
To address the issue, I see two possible solutions:
- flush the workqueue somewhere before the netdev release
- set ovpn_sock->ovpn = NULL before scheduling the socket detach
Going with #2, we could fully split detach into a synchronous part and async part (with async not needed for UDP). detach_sync clears the pointers (CBs, strp_stop(), ovpn_sock->ovpn, setup_udp_tunnel_sock) so that no more packets will be sent through the ovpn driver.
Related to that topic, I'm not sure what's keeping a reference on the peer to guarantee it doesn't get freed before we're done with peer->tcp.tx_work at the end of ovpn_tcp_socket_detach. Maybe all this tcp stuff should move from the peer to ovpn_socket?
Good point. It may make sense to move everything to ovpn_socket and avoid this extra dependency on the peer, since it is not needed at all.
I will play with it and see what comes out.
Thanks!
Regards,
On 29/11/2024 17:10, Sabrina Dubroca wrote:
Related to that topic, I'm not sure what's keeping a reference on the peer to guarantee it doesn't get freed before we're done with peer->tcp.tx_work at the end of ovpn_tcp_socket_detach. Maybe all this tcp stuff should move from the peer to ovpn_socket?
Actually, with the new approach of "keeping the reference to the peer until the socket is gone" we can simply ensure the reference is dropped at the very end of the detach, after cancel_work_sync() is done.
This way we know for sure that every activity is done and can release the peer.
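A rough sketch of how the TCP detach tail could then look (hypothetical, assuming the socket keeps a reference on the peer as described above):

static void ovpn_tcp_socket_detach(struct socket *sock)
{
	struct ovpn_socket *ovpn_sock = rcu_dereference_sk_user_data(sock->sk);
	struct ovpn_peer *peer = ovpn_sock->peer;

	/* restore the original socket callbacks, stop the strparser, etc.
	 * (omitted here)
	 */

	/* wait for any queued TX work before the peer may go away */
	cancel_work_sync(&peer->tcp.tx_work);

	/* drop the reference the socket held on the peer: only now may the
	 * peer (and eventually the netdev) be released
	 */
	ovpn_peer_put(peer);
}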
Regards,
On 29.10.2024 12:47, Antonio Quartulli wrote:
+static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff *skb)
+{
+	unsigned int pkt_len;
+
+	/* we can't guarantee the packet wasn't corrupted before entering the
+	 * VPN, therefore we give other layers a chance to check that
+	 */
+	skb->ip_summed = CHECKSUM_NONE;
+
+	/* skb hash for transport packet no longer valid after decapsulation */
+	skb_clear_hash(skb);
+
+	/* post-decrypt scrub -- prepare to inject encapsulated packet onto the
+	 * interface, based on __skb_tunnel_rx() in dst.h
+	 */
+	skb->dev = peer->ovpn->dev;
+	skb_set_queue_mapping(skb, 0);
+	skb_scrub_packet(skb, true);
+
+	skb_reset_network_header(skb);
+	skb_reset_transport_header(skb);
+	skb_probe_transport_header(skb);
+	skb_reset_inner_headers(skb);
+
+	memset(skb->cb, 0, sizeof(skb->cb));
+
+	/* cause packet to be "received" by the interface */
+	pkt_len = skb->len;
+	if (likely(gro_cells_receive(&peer->ovpn->gro_cells,
+				     skb) == NET_RX_SUCCESS))
nit: to improve readability, the packet delivery call can be composed like this:
pkt_len = skb->len;
res = gro_cells_receive(&peer->ovpn->gro_cells, skb);
if (likely(res == NET_RX_SUCCESS))
+		/* update RX stats with the size of decrypted packet */
+		dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len);
+}
On 12/11/2024 01:16, Sergey Ryazanov wrote:
On 29.10.2024 12:47, Antonio Quartulli wrote:
+static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff *skb)
+{
+	unsigned int pkt_len;
+
+	/* we can't guarantee the packet wasn't corrupted before entering the
+	 * VPN, therefore we give other layers a chance to check that
+	 */
+	skb->ip_summed = CHECKSUM_NONE;
+
+	/* skb hash for transport packet no longer valid after decapsulation */
+	skb_clear_hash(skb);
+
+	/* post-decrypt scrub -- prepare to inject encapsulated packet onto the
+	 * interface, based on __skb_tunnel_rx() in dst.h
+	 */
+	skb->dev = peer->ovpn->dev;
+	skb_set_queue_mapping(skb, 0);
+	skb_scrub_packet(skb, true);
+
+	skb_reset_network_header(skb);
+	skb_reset_transport_header(skb);
+	skb_probe_transport_header(skb);
+	skb_reset_inner_headers(skb);
+
+	memset(skb->cb, 0, sizeof(skb->cb));
+
+	/* cause packet to be "received" by the interface */
+	pkt_len = skb->len;
+	if (likely(gro_cells_receive(&peer->ovpn->gro_cells,
+				     skb) == NET_RX_SUCCESS))
nit: to improve readability, the packet delivery call can be composed like this:
pkt_len = skb->len;
res = gro_cells_receive(&peer->ovpn->gro_cells, skb);
if (likely(res == NET_RX_SUCCESS))
hm, you don't like calls on two lines? :-)
ok, will change it.
Regards,
+		/* update RX stats with the size of decrypted packet */
+		dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len);
+}
This change implements encryption/decryption and encapsulation/decapsulation of OpenVPN packets.
Support for generic crypto state is added along with a wrapper for the AEAD crypto kernel API.
Signed-off-by: Antonio Quartulli antonio@openvpn.net --- drivers/net/ovpn/Makefile | 3 + drivers/net/ovpn/crypto.c | 153 +++++++++++++++++ drivers/net/ovpn/crypto.h | 139 ++++++++++++++++ drivers/net/ovpn/crypto_aead.c | 367 +++++++++++++++++++++++++++++++++++++++++ drivers/net/ovpn/crypto_aead.h | 31 ++++ drivers/net/ovpn/io.c | 146 ++++++++++++++-- drivers/net/ovpn/io.h | 3 + drivers/net/ovpn/packet.h | 2 +- drivers/net/ovpn/peer.c | 29 ++++ drivers/net/ovpn/peer.h | 6 + drivers/net/ovpn/pktid.c | 130 +++++++++++++++ drivers/net/ovpn/pktid.h | 87 ++++++++++ drivers/net/ovpn/proto.h | 31 ++++ drivers/net/ovpn/skb.h | 4 + 14 files changed, 1120 insertions(+), 11 deletions(-)
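For orientation while reading the diff below: the AEAD nonce used here is 12 bytes, built from the 32-bit packet ID that travels on the wire plus an 8-byte "nonce tail" derived from the key material and never transmitted (see the comments in crypto_aead.c and pktid.h). A standalone illustration, with made-up names and locally redefined sizes, mirroring ovpn_pktid_aead_write():

#include <linux/types.h>
#include <linux/string.h>
#include <asm/byteorder.h>

#define SKETCH_NONCE_WIRE_SIZE	4	/* packet ID, sent on the wire */
#define SKETCH_NONCE_TAIL_SIZE	8	/* from key material, never sent */
#define SKETCH_NONCE_SIZE	(SKETCH_NONCE_WIRE_SIZE + SKETCH_NONCE_TAIL_SIZE)

/* compose the 12-byte AEAD IV: 4-byte big endian packet ID followed by
 * the 8-byte nonce tail; the same 4 wire bytes (plus the 4-byte op word)
 * also serve as the AEAD additional data
 */
static void sketch_build_iv(u32 pktid, const u8 tail[SKETCH_NONCE_TAIL_SIZE],
			    u8 iv[SKETCH_NONCE_SIZE])
{
	__be32 wire = cpu_to_be32(pktid);

	memcpy(iv, &wire, SKETCH_NONCE_WIRE_SIZE);
	memcpy(iv + SKETCH_NONCE_WIRE_SIZE, tail, SKETCH_NONCE_TAIL_SIZE);
}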
diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile index 56bddc9bef83e0befde6af3c3565bb91731d7b22..ccdaeced1982c851475657860a005ff2b9dfbd13 100644 --- a/drivers/net/ovpn/Makefile +++ b/drivers/net/ovpn/Makefile @@ -8,10 +8,13 @@
obj-$(CONFIG_OVPN) := ovpn.o ovpn-y += bind.o +ovpn-y += crypto.o +ovpn-y += crypto_aead.o ovpn-y += main.o ovpn-y += io.o ovpn-y += netlink.o ovpn-y += netlink-gen.o ovpn-y += peer.o +ovpn-y += pktid.o ovpn-y += socket.o ovpn-y += udp.o diff --git a/drivers/net/ovpn/crypto.c b/drivers/net/ovpn/crypto.c new file mode 100644 index 0000000000000000000000000000000000000000..f1f7510e2f735e367f96eb4982ba82c9af3c8bfc --- /dev/null +++ b/drivers/net/ovpn/crypto.c @@ -0,0 +1,153 @@ +// SPDX-License-Identifier: GPL-2.0 +/* OpenVPN data channel offload + * + * Copyright (C) 2020-2024 OpenVPN, Inc. + * + * Author: James Yonan james@openvpn.net + * Antonio Quartulli antonio@openvpn.net + */ + +#include <linux/types.h> +#include <linux/net.h> +#include <linux/netdevice.h> +#include <uapi/linux/ovpn.h> + +#include "ovpnstruct.h" +#include "main.h" +#include "packet.h" +#include "pktid.h" +#include "crypto_aead.h" +#include "crypto.h" + +static void ovpn_ks_destroy_rcu(struct rcu_head *head) +{ + struct ovpn_crypto_key_slot *ks; + + ks = container_of(head, struct ovpn_crypto_key_slot, rcu); + ovpn_aead_crypto_key_slot_destroy(ks); +} + +void ovpn_crypto_key_slot_release(struct kref *kref) +{ + struct ovpn_crypto_key_slot *ks; + + ks = container_of(kref, struct ovpn_crypto_key_slot, refcount); + call_rcu(&ks->rcu, ovpn_ks_destroy_rcu); +} + +/* can only be invoked when all peer references have been dropped (i.e. RCU + * release routine) + */ +void ovpn_crypto_state_release(struct ovpn_crypto_state *cs) +{ + struct ovpn_crypto_key_slot *ks; + + ks = rcu_access_pointer(cs->slots[0]); + if (ks) { + RCU_INIT_POINTER(cs->slots[0], NULL); + ovpn_crypto_key_slot_put(ks); + } + + ks = rcu_access_pointer(cs->slots[1]); + if (ks) { + RCU_INIT_POINTER(cs->slots[1], NULL); + ovpn_crypto_key_slot_put(ks); + } +} + +/* Reset the ovpn_crypto_state object in a way that is atomic + * to RCU readers. 
+ */ +int ovpn_crypto_state_reset(struct ovpn_crypto_state *cs, + const struct ovpn_peer_key_reset *pkr) +{ + struct ovpn_crypto_key_slot *old = NULL, *new; + u8 idx; + + if (pkr->slot != OVPN_KEY_SLOT_PRIMARY && + pkr->slot != OVPN_KEY_SLOT_SECONDARY) + return -EINVAL; + + new = ovpn_aead_crypto_key_slot_new(&pkr->key); + if (IS_ERR(new)) + return PTR_ERR(new); + + spin_lock_bh(&cs->lock); + idx = cs->primary_idx; + switch (pkr->slot) { + case OVPN_KEY_SLOT_PRIMARY: + old = rcu_replace_pointer(cs->slots[idx], new, + lockdep_is_held(&cs->lock)); + break; + case OVPN_KEY_SLOT_SECONDARY: + old = rcu_replace_pointer(cs->slots[!idx], new, + lockdep_is_held(&cs->lock)); + break; + } + spin_unlock_bh(&cs->lock); + + if (old) + ovpn_crypto_key_slot_put(old); + + return 0; +} + +void ovpn_crypto_key_slot_delete(struct ovpn_crypto_state *cs, + enum ovpn_key_slot slot) +{ + struct ovpn_crypto_key_slot *ks = NULL; + u8 idx; + + if (slot != OVPN_KEY_SLOT_PRIMARY && + slot != OVPN_KEY_SLOT_SECONDARY) { + pr_warn("Invalid slot to release: %u\n", slot); + return; + } + + spin_lock_bh(&cs->lock); + idx = cs->primary_idx; + switch (slot) { + case OVPN_KEY_SLOT_PRIMARY: + ks = rcu_replace_pointer(cs->slots[idx], NULL, + lockdep_is_held(&cs->lock)); + break; + case OVPN_KEY_SLOT_SECONDARY: + ks = rcu_replace_pointer(cs->slots[!idx], NULL, + lockdep_is_held(&cs->lock)); + break; + } + spin_unlock_bh(&cs->lock); + + if (!ks) { + pr_debug("Key slot already released: %u\n", slot); + return; + } + + pr_debug("deleting key slot %u, key_id=%u\n", slot, ks->key_id); + ovpn_crypto_key_slot_put(ks); +} + +/* this swap is not atomic, but there will be a very short time frame where the + * old_secondary key won't be available. This should not be a big deal as most + * likely both peers are already using the new primary at this point. + */ +void ovpn_crypto_key_slots_swap(struct ovpn_crypto_state *cs) +{ + const struct ovpn_crypto_key_slot *old_primary, *old_secondary; + u8 idx; + + spin_lock_bh(&cs->lock); + idx = cs->primary_idx; + old_primary = rcu_dereference_protected(cs->slots[idx], + lockdep_is_held(&cs->lock)); + old_secondary = rcu_dereference_protected(cs->slots[!idx], + lockdep_is_held(&cs->lock)); + /* perform real swap by switching the index of the primary key */ + cs->primary_idx = !cs->primary_idx; + + pr_debug("key swapped: (old primary) %d <-> (new primary) %d\n", + old_primary ? old_primary->key_id : -1, + old_secondary ? old_secondary->key_id : -1); + + spin_unlock_bh(&cs->lock); +} diff --git a/drivers/net/ovpn/crypto.h b/drivers/net/ovpn/crypto.h new file mode 100644 index 0000000000000000000000000000000000000000..3b437d26b531c3034cca5343c755ef9c7ef57276 --- /dev/null +++ b/drivers/net/ovpn/crypto.h @@ -0,0 +1,139 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* OpenVPN data channel offload + * + * Copyright (C) 2020-2024 OpenVPN, Inc. 
+ * + * Author: James Yonan james@openvpn.net + * Antonio Quartulli antonio@openvpn.net + */ + +#ifndef _NET_OVPN_OVPNCRYPTO_H_ +#define _NET_OVPN_OVPNCRYPTO_H_ + +#include "packet.h" +#include "pktid.h" + +/* info needed for both encrypt and decrypt directions */ +struct ovpn_key_direction { + const u8 *cipher_key; + size_t cipher_key_size; + const u8 *nonce_tail; /* only needed for GCM modes */ + size_t nonce_tail_size; /* only needed for GCM modes */ +}; + +/* all info for a particular symmetric key (primary or secondary) */ +struct ovpn_key_config { + enum ovpn_cipher_alg cipher_alg; + u8 key_id; + struct ovpn_key_direction encrypt; + struct ovpn_key_direction decrypt; +}; + +/* used to pass settings from netlink to the crypto engine */ +struct ovpn_peer_key_reset { + enum ovpn_key_slot slot; + struct ovpn_key_config key; +}; + +struct ovpn_crypto_key_slot { + u8 key_id; + + struct crypto_aead *encrypt; + struct crypto_aead *decrypt; + struct ovpn_nonce_tail nonce_tail_xmit; + struct ovpn_nonce_tail nonce_tail_recv; + + struct ovpn_pktid_recv pid_recv ____cacheline_aligned_in_smp; + struct ovpn_pktid_xmit pid_xmit ____cacheline_aligned_in_smp; + struct kref refcount; + struct rcu_head rcu; +}; + +struct ovpn_crypto_state { + struct ovpn_crypto_key_slot __rcu *slots[2]; + u8 primary_idx; + + /* protects primary and secondary slots */ + spinlock_t lock; +}; + +static inline bool ovpn_crypto_key_slot_hold(struct ovpn_crypto_key_slot *ks) +{ + return kref_get_unless_zero(&ks->refcount); +} + +static inline void ovpn_crypto_state_init(struct ovpn_crypto_state *cs) +{ + RCU_INIT_POINTER(cs->slots[0], NULL); + RCU_INIT_POINTER(cs->slots[1], NULL); + cs->primary_idx = 0; + spin_lock_init(&cs->lock); +} + +static inline struct ovpn_crypto_key_slot * +ovpn_crypto_key_id_to_slot(const struct ovpn_crypto_state *cs, u8 key_id) +{ + struct ovpn_crypto_key_slot *ks; + u8 idx; + + if (unlikely(!cs)) + return NULL; + + rcu_read_lock(); + idx = cs->primary_idx; + ks = rcu_dereference(cs->slots[idx]); + if (ks && ks->key_id == key_id) { + if (unlikely(!ovpn_crypto_key_slot_hold(ks))) + ks = NULL; + goto out; + } + + ks = rcu_dereference(cs->slots[idx ^ 1]); + if (ks && ks->key_id == key_id) { + if (unlikely(!ovpn_crypto_key_slot_hold(ks))) + ks = NULL; + goto out; + } + + /* when both key slots are occupied but no matching key ID is found, ks + * has to be reset to NULL to avoid carrying a stale pointer + */ + ks = NULL; +out: + rcu_read_unlock(); + + return ks; +} + +static inline struct ovpn_crypto_key_slot * +ovpn_crypto_key_slot_primary(const struct ovpn_crypto_state *cs) +{ + struct ovpn_crypto_key_slot *ks; + + rcu_read_lock(); + ks = rcu_dereference(cs->slots[cs->primary_idx]); + if (unlikely(ks && !ovpn_crypto_key_slot_hold(ks))) + ks = NULL; + rcu_read_unlock(); + + return ks; +} + +void ovpn_crypto_key_slot_release(struct kref *kref); + +static inline void ovpn_crypto_key_slot_put(struct ovpn_crypto_key_slot *ks) +{ + kref_put(&ks->refcount, ovpn_crypto_key_slot_release); +} + +int ovpn_crypto_state_reset(struct ovpn_crypto_state *cs, + const struct ovpn_peer_key_reset *pkr); + +void ovpn_crypto_key_slot_delete(struct ovpn_crypto_state *cs, + enum ovpn_key_slot slot); + +void ovpn_crypto_state_release(struct ovpn_crypto_state *cs); + +void ovpn_crypto_key_slots_swap(struct ovpn_crypto_state *cs); + +#endif /* _NET_OVPN_OVPNCRYPTO_H_ */ diff --git a/drivers/net/ovpn/crypto_aead.c b/drivers/net/ovpn/crypto_aead.c new file mode 100644 index 
0000000000000000000000000000000000000000..f9e3feb297b19868b1084048933796fcc7a47d6e --- /dev/null +++ b/drivers/net/ovpn/crypto_aead.c @@ -0,0 +1,367 @@ +// SPDX-License-Identifier: GPL-2.0 +/* OpenVPN data channel offload + * + * Copyright (C) 2020-2024 OpenVPN, Inc. + * + * Author: James Yonan james@openvpn.net + * Antonio Quartulli antonio@openvpn.net + */ + +#include <crypto/aead.h> +#include <linux/skbuff.h> +#include <net/ip.h> +#include <net/ipv6.h> +#include <net/udp.h> + +#include "ovpnstruct.h" +#include "main.h" +#include "io.h" +#include "packet.h" +#include "pktid.h" +#include "crypto_aead.h" +#include "crypto.h" +#include "peer.h" +#include "proto.h" +#include "skb.h" + +#define AUTH_TAG_SIZE 16 + +#define ALG_NAME_AES "gcm(aes)" +#define ALG_NAME_CHACHAPOLY "rfc7539(chacha20,poly1305)" + +static int ovpn_aead_encap_overhead(const struct ovpn_crypto_key_slot *ks) +{ + return OVPN_OP_SIZE_V2 + /* OP header size */ + 4 + /* Packet ID */ + crypto_aead_authsize(ks->encrypt); /* Auth Tag */ +} + +int ovpn_aead_encrypt(struct ovpn_peer *peer, struct ovpn_crypto_key_slot *ks, + struct sk_buff *skb) +{ + const unsigned int tag_size = crypto_aead_authsize(ks->encrypt); + const unsigned int head_size = ovpn_aead_encap_overhead(ks); + struct aead_request *req; + struct sk_buff *trailer; + struct scatterlist *sg; + u8 iv[NONCE_SIZE]; + int nfrags, ret; + u32 pktid, op; + + ovpn_skb_cb(skb)->peer = peer; + ovpn_skb_cb(skb)->ks = ks; + + /* Sample AEAD header format: + * 48000001 00000005 7e7046bd 444a7e28 cc6387b1 64a4d6c1 380275a... + * [ OP32 ] [seq # ] [ auth tag ] [ payload ... ] + * [4-byte + * IV head] + */ + + /* check that there's enough headroom in the skb for packet + * encapsulation, after adding network header and encryption overhead + */ + if (unlikely(skb_cow_head(skb, OVPN_HEAD_ROOM + head_size))) + return -ENOBUFS; + + /* get number of skb frags and ensure that packet data is writable */ + nfrags = skb_cow_data(skb, 0, &trailer); + if (unlikely(nfrags < 0)) + return nfrags; + + if (unlikely(nfrags + 2 > (MAX_SKB_FRAGS + 2))) + return -ENOSPC; + + ovpn_skb_cb(skb)->sg = kmalloc(sizeof(*ovpn_skb_cb(skb)->sg) * + (nfrags + 2), GFP_ATOMIC); + if (unlikely(!ovpn_skb_cb(skb)->sg)) + return -ENOMEM; + + sg = ovpn_skb_cb(skb)->sg; + + /* sg table: + * 0: op, wire nonce (AD, len=OVPN_OP_SIZE_V2+NONCE_WIRE_SIZE), + * 1, 2, 3, ..., n: payload, + * n+1: auth_tag (len=tag_size) + */ + sg_init_table(sg, nfrags + 2); + + /* build scatterlist to encrypt packet payload */ + ret = skb_to_sgvec_nomark(skb, sg + 1, 0, skb->len); + if (unlikely(nfrags != ret)) { + ret = -EINVAL; + goto free_sg; + } + + /* append auth_tag onto scatterlist */ + __skb_push(skb, tag_size); + sg_set_buf(sg + nfrags + 1, skb->data, tag_size); + + /* obtain packet ID, which is used both as a first + * 4 bytes of nonce and last 4 bytes of associated data. 
+ */ + ret = ovpn_pktid_xmit_next(&ks->pid_xmit, &pktid); + if (unlikely(ret < 0)) + goto free_sg; + + /* concat 4 bytes packet id and 8 bytes nonce tail into 12 bytes + * nonce + */ + ovpn_pktid_aead_write(pktid, &ks->nonce_tail_xmit, iv); + + /* make space for packet id and push it to the front */ + __skb_push(skb, NONCE_WIRE_SIZE); + memcpy(skb->data, iv, NONCE_WIRE_SIZE); + + /* add packet op as head of additional data */ + op = ovpn_opcode_compose(OVPN_DATA_V2, ks->key_id, peer->id); + __skb_push(skb, OVPN_OP_SIZE_V2); + BUILD_BUG_ON(sizeof(op) != OVPN_OP_SIZE_V2); + *((__force __be32 *)skb->data) = htonl(op); + + /* AEAD Additional data */ + sg_set_buf(sg, skb->data, OVPN_OP_SIZE_V2 + NONCE_WIRE_SIZE); + + req = aead_request_alloc(ks->encrypt, GFP_ATOMIC); + if (unlikely(!req)) { + ret = -ENOMEM; + goto free_sg; + } + + ovpn_skb_cb(skb)->req = req; + + /* setup async crypto operation */ + aead_request_set_tfm(req, ks->encrypt); + aead_request_set_callback(req, 0, ovpn_encrypt_post, skb); + aead_request_set_crypt(req, sg, sg, skb->len - head_size, iv); + aead_request_set_ad(req, OVPN_OP_SIZE_V2 + NONCE_WIRE_SIZE); + + /* encrypt it */ + return crypto_aead_encrypt(req); +free_sg: + kfree(ovpn_skb_cb(skb)->sg); + ovpn_skb_cb(skb)->sg = NULL; + return ret; +} + +int ovpn_aead_decrypt(struct ovpn_peer *peer, struct ovpn_crypto_key_slot *ks, + struct sk_buff *skb) +{ + const unsigned int tag_size = crypto_aead_authsize(ks->decrypt); + int ret, payload_len, nfrags; + unsigned int payload_offset; + struct aead_request *req; + struct sk_buff *trailer; + struct scatterlist *sg; + unsigned int sg_len; + u8 iv[NONCE_SIZE]; + + payload_offset = OVPN_OP_SIZE_V2 + NONCE_WIRE_SIZE + tag_size; + payload_len = skb->len - payload_offset; + + ovpn_skb_cb(skb)->payload_offset = payload_offset; + ovpn_skb_cb(skb)->peer = peer; + ovpn_skb_cb(skb)->ks = ks; + + /* sanity check on packet size, payload size must be >= 0 */ + if (unlikely(payload_len < 0)) + return -EINVAL; + + /* Prepare the skb data buffer to be accessed up until the auth tag. + * This is required because this area is directly mapped into the sg + * list. 
+ */ + if (unlikely(!pskb_may_pull(skb, payload_offset))) + return -ENODATA; + + /* get number of skb frags and ensure that packet data is writable */ + nfrags = skb_cow_data(skb, 0, &trailer); + if (unlikely(nfrags < 0)) + return nfrags; + + if (unlikely(nfrags + 2 > (MAX_SKB_FRAGS + 2))) + return -ENOSPC; + + ovpn_skb_cb(skb)->sg = kmalloc(sizeof(*ovpn_skb_cb(skb)->sg) * + (nfrags + 2), GFP_ATOMIC); + if (unlikely(!ovpn_skb_cb(skb)->sg)) + return -ENOMEM; + + sg = ovpn_skb_cb(skb)->sg; + + /* sg table: + * 0: op, wire nonce (AD, len=OVPN_OP_SIZE_V2+NONCE_WIRE_SIZE), + * 1, 2, 3, ..., n: payload, + * n+1: auth_tag (len=tag_size) + */ + sg_init_table(sg, nfrags + 2); + + /* packet op is head of additional data */ + sg_len = OVPN_OP_SIZE_V2 + NONCE_WIRE_SIZE; + sg_set_buf(sg, skb->data, sg_len); + + /* build scatterlist to decrypt packet payload */ + ret = skb_to_sgvec_nomark(skb, sg + 1, payload_offset, payload_len); + if (unlikely(nfrags != ret)) { + ret = -EINVAL; + goto free_sg; + } + + /* append auth_tag onto scatterlist */ + sg_set_buf(sg + nfrags + 1, skb->data + sg_len, tag_size); + + /* copy nonce into IV buffer */ + memcpy(iv, skb->data + OVPN_OP_SIZE_V2, NONCE_WIRE_SIZE); + memcpy(iv + NONCE_WIRE_SIZE, ks->nonce_tail_recv.u8, + sizeof(struct ovpn_nonce_tail)); + + req = aead_request_alloc(ks->decrypt, GFP_ATOMIC); + if (unlikely(!req)) { + ret = -ENOMEM; + goto free_sg; + } + + ovpn_skb_cb(skb)->req = req; + + /* setup async crypto operation */ + aead_request_set_tfm(req, ks->decrypt); + aead_request_set_callback(req, 0, ovpn_decrypt_post, skb); + aead_request_set_crypt(req, sg, sg, payload_len + tag_size, iv); + + aead_request_set_ad(req, NONCE_WIRE_SIZE + OVPN_OP_SIZE_V2); + + /* decrypt it */ + return crypto_aead_decrypt(req); +free_sg: + kfree(ovpn_skb_cb(skb)->sg); + ovpn_skb_cb(skb)->sg = NULL; + return ret; +} + +/* Initialize a struct crypto_aead object */ +struct crypto_aead *ovpn_aead_init(const char *title, const char *alg_name, + const unsigned char *key, + unsigned int keylen) +{ + struct crypto_aead *aead; + int ret; + + aead = crypto_alloc_aead(alg_name, 0, 0); + if (IS_ERR(aead)) { + ret = PTR_ERR(aead); + pr_err("%s crypto_alloc_aead failed, err=%d\n", title, ret); + aead = NULL; + goto error; + } + + ret = crypto_aead_setkey(aead, key, keylen); + if (ret) { + pr_err("%s crypto_aead_setkey size=%u failed, err=%d\n", title, + keylen, ret); + goto error; + } + + ret = crypto_aead_setauthsize(aead, AUTH_TAG_SIZE); + if (ret) { + pr_err("%s crypto_aead_setauthsize failed, err=%d\n", title, + ret); + goto error; + } + + /* basic AEAD assumption */ + if (crypto_aead_ivsize(aead) != NONCE_SIZE) { + pr_err("%s IV size must be %d\n", title, NONCE_SIZE); + ret = -EINVAL; + goto error; + } + + pr_debug("********* Cipher %s (%s)\n", alg_name, title); + pr_debug("*** IV size=%u\n", crypto_aead_ivsize(aead)); + pr_debug("*** req size=%u\n", crypto_aead_reqsize(aead)); + pr_debug("*** block size=%u\n", crypto_aead_blocksize(aead)); + pr_debug("*** auth size=%u\n", crypto_aead_authsize(aead)); + pr_debug("*** alignmask=0x%x\n", crypto_aead_alignmask(aead)); + + return aead; + +error: + crypto_free_aead(aead); + return ERR_PTR(ret); +} + +void ovpn_aead_crypto_key_slot_destroy(struct ovpn_crypto_key_slot *ks) +{ + if (!ks) + return; + + crypto_free_aead(ks->encrypt); + crypto_free_aead(ks->decrypt); + kfree(ks); +} + +struct ovpn_crypto_key_slot * +ovpn_aead_crypto_key_slot_new(const struct ovpn_key_config *kc) +{ + struct ovpn_crypto_key_slot *ks = NULL; + const char *alg_name; 
+ int ret; + + /* validate crypto alg */ + switch (kc->cipher_alg) { + case OVPN_CIPHER_ALG_AES_GCM: + alg_name = ALG_NAME_AES; + break; + case OVPN_CIPHER_ALG_CHACHA20_POLY1305: + alg_name = ALG_NAME_CHACHAPOLY; + break; + default: + return ERR_PTR(-EOPNOTSUPP); + } + + if (sizeof(struct ovpn_nonce_tail) != kc->encrypt.nonce_tail_size || + sizeof(struct ovpn_nonce_tail) != kc->decrypt.nonce_tail_size) + return ERR_PTR(-EINVAL); + + /* build the key slot */ + ks = kmalloc(sizeof(*ks), GFP_KERNEL); + if (!ks) + return ERR_PTR(-ENOMEM); + + ks->encrypt = NULL; + ks->decrypt = NULL; + kref_init(&ks->refcount); + ks->key_id = kc->key_id; + + ks->encrypt = ovpn_aead_init("encrypt", alg_name, + kc->encrypt.cipher_key, + kc->encrypt.cipher_key_size); + if (IS_ERR(ks->encrypt)) { + ret = PTR_ERR(ks->encrypt); + ks->encrypt = NULL; + goto destroy_ks; + } + + ks->decrypt = ovpn_aead_init("decrypt", alg_name, + kc->decrypt.cipher_key, + kc->decrypt.cipher_key_size); + if (IS_ERR(ks->decrypt)) { + ret = PTR_ERR(ks->decrypt); + ks->decrypt = NULL; + goto destroy_ks; + } + + memcpy(ks->nonce_tail_xmit.u8, kc->encrypt.nonce_tail, + sizeof(struct ovpn_nonce_tail)); + memcpy(ks->nonce_tail_recv.u8, kc->decrypt.nonce_tail, + sizeof(struct ovpn_nonce_tail)); + + /* init packet ID generation/validation */ + ovpn_pktid_xmit_init(&ks->pid_xmit); + ovpn_pktid_recv_init(&ks->pid_recv); + + return ks; + +destroy_ks: + ovpn_aead_crypto_key_slot_destroy(ks); + return ERR_PTR(ret); +} diff --git a/drivers/net/ovpn/crypto_aead.h b/drivers/net/ovpn/crypto_aead.h new file mode 100644 index 0000000000000000000000000000000000000000..77ee8141599bc06b0dc664c5b0a4dae660a89238 --- /dev/null +++ b/drivers/net/ovpn/crypto_aead.h @@ -0,0 +1,31 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* OpenVPN data channel offload + * + * Copyright (C) 2020-2024 OpenVPN, Inc. + * + * Author: James Yonan james@openvpn.net + * Antonio Quartulli antonio@openvpn.net + */ + +#ifndef _NET_OVPN_OVPNAEAD_H_ +#define _NET_OVPN_OVPNAEAD_H_ + +#include "crypto.h" + +#include <asm/types.h> +#include <linux/skbuff.h> + +struct crypto_aead *ovpn_aead_init(const char *title, const char *alg_name, + const unsigned char *key, + unsigned int keylen); + +int ovpn_aead_encrypt(struct ovpn_peer *peer, struct ovpn_crypto_key_slot *ks, + struct sk_buff *skb); +int ovpn_aead_decrypt(struct ovpn_peer *peer, struct ovpn_crypto_key_slot *ks, + struct sk_buff *skb); + +struct ovpn_crypto_key_slot * +ovpn_aead_crypto_key_slot_new(const struct ovpn_key_config *kc); +void ovpn_aead_crypto_key_slot_destroy(struct ovpn_crypto_key_slot *ks); + +#endif /* _NET_OVPN_OVPNAEAD_H_ */ diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c index 791a1b117125118b179cb13cdfd5fbab6523a360..4c81c4547d35d2a73f680ef1f5d8853ffbd952e0 100644 --- a/drivers/net/ovpn/io.c +++ b/drivers/net/ovpn/io.c @@ -7,6 +7,7 @@ * Antonio Quartulli antonio@openvpn.net */
+#include <crypto/aead.h> #include <linux/netdevice.h> #include <linux/skbuff.h> #include <net/gro_cells.h> @@ -15,6 +16,9 @@ #include "ovpnstruct.h" #include "peer.h" #include "io.h" +#include "bind.h" +#include "crypto.h" +#include "crypto_aead.h" #include "netlink.h" #include "proto.h" #include "udp.h" @@ -58,33 +62,136 @@ static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff *skb) dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len); }
-static void ovpn_decrypt_post(struct sk_buff *skb, int ret) +void ovpn_decrypt_post(void *data, int ret) { - struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer; + struct ovpn_crypto_key_slot *ks; + unsigned int payload_offset = 0; + struct sk_buff *skb = data; + struct ovpn_peer *peer; + __be16 proto; + __be32 *pid; + + /* crypto is happening asynchronously. this function will be called + * again later by the crypto callback with a proper return code + */ + if (unlikely(ret == -EINPROGRESS)) + return; + + payload_offset = ovpn_skb_cb(skb)->payload_offset; + ks = ovpn_skb_cb(skb)->ks; + peer = ovpn_skb_cb(skb)->peer; + + /* crypto is done, cleanup skb CB and its members */ + + if (likely(ovpn_skb_cb(skb)->sg)) + kfree(ovpn_skb_cb(skb)->sg); + + if (likely(ovpn_skb_cb(skb)->req)) + aead_request_free(ovpn_skb_cb(skb)->req);
if (unlikely(ret < 0)) goto drop;
+ /* PID sits after the op */ + pid = (__force __be32 *)(skb->data + OVPN_OP_SIZE_V2); + ret = ovpn_pktid_recv(&ks->pid_recv, ntohl(*pid), 0); + if (unlikely(ret < 0)) { + net_err_ratelimited("%s: PKT ID RX error: %d\n", + peer->ovpn->dev->name, ret); + goto drop; + } + + /* point to encapsulated IP packet */ + __skb_pull(skb, payload_offset); + + /* check if this is a valid datapacket that has to be delivered to the + * ovpn interface + */ + skb_reset_network_header(skb); + proto = ovpn_ip_check_protocol(skb); + if (unlikely(!proto)) { + /* check if null packet */ + if (unlikely(!pskb_may_pull(skb, 1))) { + net_info_ratelimited("%s: NULL packet received from peer %u\n", + peer->ovpn->dev->name, peer->id); + goto drop; + } + + net_info_ratelimited("%s: unsupported protocol received from peer %u\n", + peer->ovpn->dev->name, peer->id); + goto drop; + } + skb->protocol = proto; + + /* perform Reverse Path Filtering (RPF) */ + if (unlikely(!ovpn_peer_check_by_src(peer->ovpn, skb, peer))) { + if (skb_protocol_to_family(skb) == AF_INET6) + net_dbg_ratelimited("%s: RPF dropped packet from peer %u, src: %pI6c\n", + peer->ovpn->dev->name, peer->id, + &ipv6_hdr(skb)->saddr); + else + net_dbg_ratelimited("%s: RPF dropped packet from peer %u, src: %pI4\n", + peer->ovpn->dev->name, peer->id, + &ip_hdr(skb)->saddr); + goto drop; + } + ovpn_netdev_write(peer, skb); /* skb is passed to upper layer - don't free it */ skb = NULL; drop: if (unlikely(skb)) dev_core_stats_rx_dropped_inc(peer->ovpn->dev); - ovpn_peer_put(peer); + if (likely(peer)) + ovpn_peer_put(peer); + if (likely(ks)) + ovpn_crypto_key_slot_put(ks); kfree_skb(skb); }
/* pick next packet from RX queue, decrypt and forward it to the device */ void ovpn_recv(struct ovpn_peer *peer, struct sk_buff *skb) { - ovpn_skb_cb(skb)->peer = peer; - ovpn_decrypt_post(skb, 0); + struct ovpn_crypto_key_slot *ks; + u8 key_id; + + /* get the key slot matching the key ID in the received packet */ + key_id = ovpn_key_id_from_skb(skb); + ks = ovpn_crypto_key_id_to_slot(&peer->crypto, key_id); + if (unlikely(!ks)) { + net_info_ratelimited("%s: no available key for peer %u, key-id: %u\n", + peer->ovpn->dev->name, peer->id, key_id); + dev_core_stats_rx_dropped_inc(peer->ovpn->dev); + kfree_skb(skb); + return; + } + + memset(ovpn_skb_cb(skb), 0, sizeof(struct ovpn_cb)); + ovpn_decrypt_post(skb, ovpn_aead_decrypt(peer, ks, skb)); }
-static void ovpn_encrypt_post(struct sk_buff *skb, int ret) +void ovpn_encrypt_post(void *data, int ret) { - struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer; + struct ovpn_crypto_key_slot *ks; + struct sk_buff *skb = data; + struct ovpn_peer *peer; + + /* encryption is happening asynchronously. This function will be + * called later by the crypto callback with a proper return value + */ + if (unlikely(ret == -EINPROGRESS)) + return; + + ks = ovpn_skb_cb(skb)->ks; + peer = ovpn_skb_cb(skb)->peer; + + /* crypto is done, cleanup skb CB and its members */ + + if (likely(ovpn_skb_cb(skb)->sg)) + kfree(ovpn_skb_cb(skb)->sg); + + if (likely(ovpn_skb_cb(skb)->req)) + aead_request_free(ovpn_skb_cb(skb)->req);
if (unlikely(ret < 0)) goto err; @@ -104,13 +211,31 @@ static void ovpn_encrypt_post(struct sk_buff *skb, int ret) err: if (unlikely(skb)) dev_core_stats_tx_dropped_inc(peer->ovpn->dev); - ovpn_peer_put(peer); + if (likely(peer)) + ovpn_peer_put(peer); + if (likely(ks)) + ovpn_crypto_key_slot_put(ks); kfree_skb(skb); }
static bool ovpn_encrypt_one(struct ovpn_peer *peer, struct sk_buff *skb) { - ovpn_skb_cb(skb)->peer = peer; + struct ovpn_crypto_key_slot *ks; + + if (unlikely(skb->ip_summed == CHECKSUM_PARTIAL && + skb_checksum_help(skb))) { + net_warn_ratelimited("%s: cannot compute checksum for outgoing packet\n", + peer->ovpn->dev->name); + return false; + } + + /* get primary key to be used for encrypting data */ + ks = ovpn_crypto_key_slot_primary(&peer->crypto); + if (unlikely(!ks)) { + net_warn_ratelimited("%s: error while retrieving primary key slot for peer %u\n", + peer->ovpn->dev->name, peer->id); + return false; + }
/* take a reference to the peer because the crypto code may run async. * ovpn_encrypt_post() will release it upon completion @@ -120,7 +245,8 @@ static bool ovpn_encrypt_one(struct ovpn_peer *peer, struct sk_buff *skb) return false; }
- ovpn_encrypt_post(skb, 0); + memset(ovpn_skb_cb(skb), 0, sizeof(struct ovpn_cb)); + ovpn_encrypt_post(skb, ovpn_aead_encrypt(peer, ks, skb)); return true; }
diff --git a/drivers/net/ovpn/io.h b/drivers/net/ovpn/io.h index 9667a0a470e0b4b427524fffb5b9b395007e5a2f..ad81dd86924689309b3299573575a1705eddaf99 100644 --- a/drivers/net/ovpn/io.h +++ b/drivers/net/ovpn/io.h @@ -14,4 +14,7 @@ netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev);
void ovpn_recv(struct ovpn_peer *peer, struct sk_buff *skb);
+void ovpn_encrypt_post(void *data, int ret); +void ovpn_decrypt_post(void *data, int ret); + #endif /* _NET_OVPN_OVPN_H_ */ diff --git a/drivers/net/ovpn/packet.h b/drivers/net/ovpn/packet.h index 7ed146f5932a25f448af6da58738a7eae81007fe..e14c9bf464f742e6d27fe3133dd175996970845e 100644 --- a/drivers/net/ovpn/packet.h +++ b/drivers/net/ovpn/packet.h @@ -10,7 +10,7 @@ #ifndef _NET_OVPN_PACKET_H_ #define _NET_OVPN_PACKET_H_
-/* When the OpenVPN protocol is ran in AEAD mode, use +/* When the OpenVPN protocol is run in AEAD mode, use * the OpenVPN packet ID as the AEAD nonce: * * 00000005 521c3b01 4308c041 diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c index aff3e9e99b7d2dd2fa68484d9a396d43f75a6d0b..98ae7662f1e76811e625dc5f4b4c5c884856fbd6 100644 --- a/drivers/net/ovpn/peer.c +++ b/drivers/net/ovpn/peer.c @@ -12,6 +12,8 @@
#include "ovpnstruct.h" #include "bind.h" +#include "pktid.h" +#include "crypto.h" #include "io.h" #include "main.h" #include "netlink.h" @@ -43,6 +45,7 @@ struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id) peer->vpn_addrs.ipv6 = in6addr_any;
RCU_INIT_POINTER(peer->bind, NULL); + ovpn_crypto_state_init(&peer->crypto); spin_lock_init(&peer->lock); kref_init(&peer->refcount);
@@ -68,6 +71,7 @@ static void ovpn_peer_release(struct ovpn_peer *peer) if (peer->sock) ovpn_socket_put(peer->sock);
+ ovpn_crypto_state_release(&peer->crypto); ovpn_bind_reset(peer, NULL); dst_cache_destroy(&peer->dst_cache); netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker); @@ -278,6 +282,31 @@ struct ovpn_peer *ovpn_peer_get_by_dst(struct ovpn_struct *ovpn, return peer; }
+/** + * ovpn_peer_check_by_src - check that skb source is routed via peer + * @ovpn: the openvpn instance to search + * @skb: the packet to extract source address from + * @peer: the peer to check against the source address + * + * Return: true if the peer is matching or false otherwise + */ +bool ovpn_peer_check_by_src(struct ovpn_struct *ovpn, struct sk_buff *skb, + struct ovpn_peer *peer) +{ + bool match = false; + + if (ovpn->mode == OVPN_MODE_P2P) { + /* in P2P mode, no matter the destination, packets are always + * sent to the single peer listening on the other side + */ + rcu_read_lock(); + match = (peer == rcu_dereference(ovpn->peer)); + rcu_read_unlock(); + } + + return match; +} + /** * ovpn_peer_add_p2p - add peer to related tables in a P2P instance * @ovpn: the instance to add the peer to diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h index 51955aa39f1aa85ce541e289c60e9635cadb9c48..754fea470d1b4787f64a931d6c6adc24182fc16f 100644 --- a/drivers/net/ovpn/peer.h +++ b/drivers/net/ovpn/peer.h @@ -12,6 +12,8 @@
#include <net/dst_cache.h>
+#include "crypto.h" + /** * struct ovpn_peer - the main remote peer object * @ovpn: main openvpn instance this peer belongs to @@ -20,6 +22,7 @@ * @vpn_addrs.ipv4: IPv4 assigned to peer on the tunnel * @vpn_addrs.ipv6: IPv6 assigned to peer on the tunnel * @sock: the socket being used to talk to this peer + * @crypto: the crypto configuration (ciphers, keys, etc..) * @dst_cache: cache for dst_entry used to send to peer * @bind: remote peer binding * @halt: true if ovpn_peer_mark_delete was called @@ -37,6 +40,7 @@ struct ovpn_peer { struct in6_addr ipv6; } vpn_addrs; struct ovpn_socket *sock; + struct ovpn_crypto_state crypto; struct dst_cache dst_cache; struct ovpn_bind __rcu *bind; bool halt; @@ -79,5 +83,7 @@ struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn, struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id); struct ovpn_peer *ovpn_peer_get_by_dst(struct ovpn_struct *ovpn, struct sk_buff *skb); +bool ovpn_peer_check_by_src(struct ovpn_struct *ovpn, struct sk_buff *skb, + struct ovpn_peer *peer);
#endif /* _NET_OVPN_OVPNPEER_H_ */ diff --git a/drivers/net/ovpn/pktid.c b/drivers/net/ovpn/pktid.c new file mode 100644 index 0000000000000000000000000000000000000000..96dc876356706eb6e2104cf8291c1487b4441b1f --- /dev/null +++ b/drivers/net/ovpn/pktid.c @@ -0,0 +1,130 @@ +// SPDX-License-Identifier: GPL-2.0 +/* OpenVPN data channel offload + * + * Copyright (C) 2020-2024 OpenVPN, Inc. + * + * Author: Antonio Quartulli antonio@openvpn.net + * James Yonan james@openvpn.net + */ + +#include <linux/atomic.h> +#include <linux/jiffies.h> +#include <linux/net.h> +#include <linux/netdevice.h> +#include <linux/types.h> + +#include "ovpnstruct.h" +#include "main.h" +#include "packet.h" +#include "pktid.h" + +void ovpn_pktid_xmit_init(struct ovpn_pktid_xmit *pid) +{ + atomic64_set(&pid->seq_num, 1); +} + +void ovpn_pktid_recv_init(struct ovpn_pktid_recv *pr) +{ + memset(pr, 0, sizeof(*pr)); + spin_lock_init(&pr->lock); +} + +/* Packet replay detection. + * Allows ID backtrack of up to REPLAY_WINDOW_SIZE - 1. + */ +int ovpn_pktid_recv(struct ovpn_pktid_recv *pr, u32 pkt_id, u32 pkt_time) +{ + const unsigned long now = jiffies; + int ret; + + /* ID must not be zero */ + if (unlikely(pkt_id == 0)) + return -EINVAL; + + spin_lock_bh(&pr->lock); + + /* expire backtracks at or below pr->id after PKTID_RECV_EXPIRE time */ + if (unlikely(time_after_eq(now, pr->expire))) + pr->id_floor = pr->id; + + /* time changed? */ + if (unlikely(pkt_time != pr->time)) { + if (pkt_time > pr->time) { + /* time moved forward, accept */ + pr->base = 0; + pr->extent = 0; + pr->id = 0; + pr->time = pkt_time; + pr->id_floor = 0; + } else { + /* time moved backward, reject */ + ret = -ETIME; + goto out; + } + } + + if (likely(pkt_id == pr->id + 1)) { + /* well-formed ID sequence (incremented by 1) */ + pr->base = REPLAY_INDEX(pr->base, -1); + pr->history[pr->base / 8] |= (1 << (pr->base % 8)); + if (pr->extent < REPLAY_WINDOW_SIZE) + ++pr->extent; + pr->id = pkt_id; + } else if (pkt_id > pr->id) { + /* ID jumped forward by more than one */ + const unsigned int delta = pkt_id - pr->id; + + if (delta < REPLAY_WINDOW_SIZE) { + unsigned int i; + + pr->base = REPLAY_INDEX(pr->base, -delta); + pr->history[pr->base / 8] |= (1 << (pr->base % 8)); + pr->extent += delta; + if (pr->extent > REPLAY_WINDOW_SIZE) + pr->extent = REPLAY_WINDOW_SIZE; + for (i = 1; i < delta; ++i) { + unsigned int newb = REPLAY_INDEX(pr->base, i); + + pr->history[newb / 8] &= ~BIT(newb % 8); + } + } else { + pr->base = 0; + pr->extent = REPLAY_WINDOW_SIZE; + memset(pr->history, 0, sizeof(pr->history)); + pr->history[0] = 1; + } + pr->id = pkt_id; + } else { + /* ID backtrack */ + const unsigned int delta = pr->id - pkt_id; + + if (delta > pr->max_backtrack) + pr->max_backtrack = delta; + if (delta < pr->extent) { + if (pkt_id > pr->id_floor) { + const unsigned int ri = REPLAY_INDEX(pr->base, + delta); + u8 *p = &pr->history[ri / 8]; + const u8 mask = (1 << (ri % 8)); + + if (*p & mask) { + ret = -EINVAL; + goto out; + } + *p |= mask; + } else { + ret = -EINVAL; + goto out; + } + } else { + ret = -EINVAL; + goto out; + } + } + + pr->expire = now + PKTID_RECV_EXPIRE; + ret = 0; +out: + spin_unlock_bh(&pr->lock); + return ret; +} diff --git a/drivers/net/ovpn/pktid.h b/drivers/net/ovpn/pktid.h new file mode 100644 index 0000000000000000000000000000000000000000..fe02f0667e1a88a8c866fe4da4e5cebfba9efbcf --- /dev/null +++ b/drivers/net/ovpn/pktid.h @@ -0,0 +1,87 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* OpenVPN data channel offload + * + * Copyright (C) 
2020-2024 OpenVPN, Inc. + * + * Author: Antonio Quartulli antonio@openvpn.net + * James Yonan james@openvpn.net + */ + +#ifndef _NET_OVPN_OVPNPKTID_H_ +#define _NET_OVPN_OVPNPKTID_H_ + +#include "packet.h" + +/* If no packets received for this length of time, set a backtrack floor + * at highest received packet ID thus far. + */ +#define PKTID_RECV_EXPIRE (30 * HZ) + +/* Packet-ID state for transmitter */ +struct ovpn_pktid_xmit { + atomic64_t seq_num; +}; + +/* replay window sizing in bytes = 2^REPLAY_WINDOW_ORDER */ +#define REPLAY_WINDOW_ORDER 8 + +#define REPLAY_WINDOW_BYTES BIT(REPLAY_WINDOW_ORDER) +#define REPLAY_WINDOW_SIZE (REPLAY_WINDOW_BYTES * 8) +#define REPLAY_INDEX(base, i) (((base) + (i)) & (REPLAY_WINDOW_SIZE - 1)) + +/* Packet-ID state for receiver. + * Other than lock member, can be zeroed to initialize. + */ +struct ovpn_pktid_recv { + /* "sliding window" bitmask of recent packet IDs received */ + u8 history[REPLAY_WINDOW_BYTES]; + /* bit position of deque base in history */ + unsigned int base; + /* extent (in bits) of deque in history */ + unsigned int extent; + /* expiration of history in jiffies */ + unsigned long expire; + /* highest sequence number received */ + u32 id; + /* highest time stamp received */ + u32 time; + /* we will only accept backtrack IDs > id_floor */ + u32 id_floor; + unsigned int max_backtrack; + /* protects entire pktd ID state */ + spinlock_t lock; +}; + +/* Get the next packet ID for xmit */ +static inline int ovpn_pktid_xmit_next(struct ovpn_pktid_xmit *pid, u32 *pktid) +{ + const s64 seq_num = atomic64_fetch_add_unless(&pid->seq_num, 1, + 0x100000000LL); + /* when the 32bit space is over, we return an error because the packet + * ID is used to create the cipher IV and we do not want to reuse the + * same value more than once + */ + if (unlikely(seq_num == 0x100000000LL)) + return -ERANGE; + + *pktid = (u32)seq_num; + + return 0; +} + +/* Write 12-byte AEAD IV to dest */ +static inline void ovpn_pktid_aead_write(const u32 pktid, + const struct ovpn_nonce_tail *nt, + unsigned char *dest) +{ + *(__force __be32 *)(dest) = htonl(pktid); + BUILD_BUG_ON(4 + sizeof(struct ovpn_nonce_tail) != NONCE_SIZE); + memcpy(dest + 4, nt->u8, sizeof(struct ovpn_nonce_tail)); +} + +void ovpn_pktid_xmit_init(struct ovpn_pktid_xmit *pid); +void ovpn_pktid_recv_init(struct ovpn_pktid_recv *pr); + +int ovpn_pktid_recv(struct ovpn_pktid_recv *pr, u32 pkt_id, u32 pkt_time); + +#endif /* _NET_OVPN_OVPNPKTID_H_ */ diff --git a/drivers/net/ovpn/proto.h b/drivers/net/ovpn/proto.h index 69604cf26bbf82539ee5cd5a7ac9c23920f555de..32af6b8e574381fb719a1b3b9de3ae1071cc4846 100644 --- a/drivers/net/ovpn/proto.h +++ b/drivers/net/ovpn/proto.h @@ -72,4 +72,35 @@ static inline u32 ovpn_peer_id_from_skb(const struct sk_buff *skb, u16 offset) return ntohl(*(__be32 *)(skb->data + offset)) & OVPN_PEER_ID_MASK; }
+/** + * ovpn_key_id_from_skb - extract key ID from the skb head + * @skb: the packet to extract the key ID code from + * + * Note: this function assumes that the skb head was pulled enough + * to access the first byte. + * + * Return: the key ID + */ +static inline u8 ovpn_key_id_from_skb(const struct sk_buff *skb) +{ + return *skb->data & OVPN_KEY_ID_MASK; +} + +/** + * ovpn_opcode_compose - combine OP code, key ID and peer ID to wire format + * @opcode: the OP code + * @key_id: the key ID + * @peer_id: the peer ID + * + * Return: a 4 bytes integer obtained combining all input values following the + * OpenVPN wire format. This integer can then be written to the packet header. + */ +static inline u32 ovpn_opcode_compose(u8 opcode, u8 key_id, u32 peer_id) +{ + const u8 op = (opcode << OVPN_OPCODE_SHIFT) | + (key_id & OVPN_KEY_ID_MASK); + + return (op << 24) | (peer_id & OVPN_PEER_ID_MASK); +} + #endif /* _NET_OVPN_OVPNPROTO_H_ */ diff --git a/drivers/net/ovpn/skb.h b/drivers/net/ovpn/skb.h index e070fe6f448c0b7a9631394ebef4554f6348ef44..2a75cef403845e2262f033a78b3fa1369b8c3b5e 100644 --- a/drivers/net/ovpn/skb.h +++ b/drivers/net/ovpn/skb.h @@ -19,6 +19,10 @@
struct ovpn_cb { struct ovpn_peer *peer; + struct ovpn_crypto_key_slot *ks; + struct aead_request *req; + struct scatterlist *sg; + unsigned int payload_offset; };
static inline struct ovpn_cb *ovpn_skb_cb(struct sk_buff *skb)
Byte/packet counters for in-tunnel and transport streams are now initialized and updated as needed.
To be exported via netlink.
Signed-off-by: Antonio Quartulli antonio@openvpn.net --- drivers/net/ovpn/Makefile | 1 + drivers/net/ovpn/crypto_aead.c | 2 ++ drivers/net/ovpn/io.c | 11 ++++++++++ drivers/net/ovpn/peer.c | 2 ++ drivers/net/ovpn/peer.h | 5 +++++ drivers/net/ovpn/skb.h | 1 + drivers/net/ovpn/stats.c | 21 +++++++++++++++++++ drivers/net/ovpn/stats.h | 47 ++++++++++++++++++++++++++++++++++++++++++ 8 files changed, 90 insertions(+)
diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile index ccdaeced1982c851475657860a005ff2b9dfbd13..d43fda72646bdc7644d9a878b56da0a0e5680c98 100644 --- a/drivers/net/ovpn/Makefile +++ b/drivers/net/ovpn/Makefile @@ -17,4 +17,5 @@ ovpn-y += netlink-gen.o ovpn-y += peer.o ovpn-y += pktid.o ovpn-y += socket.o +ovpn-y += stats.o ovpn-y += udp.o diff --git a/drivers/net/ovpn/crypto_aead.c b/drivers/net/ovpn/crypto_aead.c index f9e3feb297b19868b1084048933796fcc7a47d6e..072bb0881764752520e8e26e18337c1274ce1aa4 100644 --- a/drivers/net/ovpn/crypto_aead.c +++ b/drivers/net/ovpn/crypto_aead.c @@ -48,6 +48,7 @@ int ovpn_aead_encrypt(struct ovpn_peer *peer, struct ovpn_crypto_key_slot *ks, int nfrags, ret; u32 pktid, op;
+ ovpn_skb_cb(skb)->orig_len = skb->len; ovpn_skb_cb(skb)->peer = peer; ovpn_skb_cb(skb)->ks = ks;
@@ -159,6 +160,7 @@ int ovpn_aead_decrypt(struct ovpn_peer *peer, struct ovpn_crypto_key_slot *ks, payload_offset = OVPN_OP_SIZE_V2 + NONCE_WIRE_SIZE + tag_size; payload_len = skb->len - payload_offset;
+ ovpn_skb_cb(skb)->orig_len = skb->len; ovpn_skb_cb(skb)->payload_offset = payload_offset; ovpn_skb_cb(skb)->peer = peer; ovpn_skb_cb(skb)->ks = ks; diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c index 4c81c4547d35d2a73f680ef1f5d8853ffbd952e0..d56e74660c7be9020b5bdf7971322d41afd436d6 100644 --- a/drivers/net/ovpn/io.c +++ b/drivers/net/ovpn/io.c @@ -12,6 +12,7 @@ #include <linux/skbuff.h> #include <net/gro_cells.h> #include <net/gso.h> +#include <net/ip.h>
#include "ovpnstruct.h" #include "peer.h" @@ -68,6 +69,7 @@ void ovpn_decrypt_post(void *data, int ret) unsigned int payload_offset = 0; struct sk_buff *skb = data; struct ovpn_peer *peer; + unsigned int orig_len; __be16 proto; __be32 *pid;
@@ -80,6 +82,7 @@ void ovpn_decrypt_post(void *data, int ret) payload_offset = ovpn_skb_cb(skb)->payload_offset; ks = ovpn_skb_cb(skb)->ks; peer = ovpn_skb_cb(skb)->peer; + orig_len = ovpn_skb_cb(skb)->orig_len;
/* crypto is done, cleanup skb CB and its members */
@@ -136,6 +139,10 @@ void ovpn_decrypt_post(void *data, int ret) goto drop; }
+ /* increment RX stats */ + ovpn_peer_stats_increment_rx(&peer->vpn_stats, skb->len); + ovpn_peer_stats_increment_rx(&peer->link_stats, orig_len); + ovpn_netdev_write(peer, skb); /* skb is passed to upper layer - don't free it */ skb = NULL; @@ -175,6 +182,7 @@ void ovpn_encrypt_post(void *data, int ret) struct ovpn_crypto_key_slot *ks; struct sk_buff *skb = data; struct ovpn_peer *peer; + unsigned int orig_len;
/* encryption is happening asynchronously. This function will be * called later by the crypto callback with a proper return value @@ -184,6 +192,7 @@ void ovpn_encrypt_post(void *data, int ret)
ks = ovpn_skb_cb(skb)->ks; peer = ovpn_skb_cb(skb)->peer; + orig_len = ovpn_skb_cb(skb)->orig_len;
/* crypto is done, cleanup skb CB and its members */
@@ -197,6 +206,8 @@ void ovpn_encrypt_post(void *data, int ret) goto err;
skb_mark_not_on_list(skb); + ovpn_peer_stats_increment_tx(&peer->link_stats, skb->len); + ovpn_peer_stats_increment_tx(&peer->vpn_stats, orig_len);
switch (peer->sock->sock->sk->sk_protocol) { case IPPROTO_UDP: diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c index 98ae7662f1e76811e625dc5f4b4c5c884856fbd6..5025bfb759d6a5f31e3f2ec094fe561fbdb9f451 100644 --- a/drivers/net/ovpn/peer.c +++ b/drivers/net/ovpn/peer.c @@ -48,6 +48,8 @@ struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id) ovpn_crypto_state_init(&peer->crypto); spin_lock_init(&peer->lock); kref_init(&peer->refcount); + ovpn_peer_stats_init(&peer->vpn_stats); + ovpn_peer_stats_init(&peer->link_stats);
ret = dst_cache_init(&peer->dst_cache, GFP_KERNEL); if (ret < 0) { diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h index 754fea470d1b4787f64a931d6c6adc24182fc16f..eb1e31e854fbfff25d07fba8026789e41a76c113 100644 --- a/drivers/net/ovpn/peer.h +++ b/drivers/net/ovpn/peer.h @@ -13,6 +13,7 @@ #include <net/dst_cache.h>
#include "crypto.h" +#include "stats.h"
/** * struct ovpn_peer - the main remote peer object @@ -26,6 +27,8 @@ * @dst_cache: cache for dst_entry used to send to peer * @bind: remote peer binding * @halt: true if ovpn_peer_mark_delete was called + * @vpn_stats: per-peer in-VPN TX/RX stays + * @link_stats: per-peer link/transport TX/RX stats * @delete_reason: why peer was deleted (i.e. timeout, transport error, ..) * @lock: protects binding to peer (bind) * @refcount: reference counter @@ -44,6 +47,8 @@ struct ovpn_peer { struct dst_cache dst_cache; struct ovpn_bind __rcu *bind; bool halt; + struct ovpn_peer_stats vpn_stats; + struct ovpn_peer_stats link_stats; enum ovpn_del_peer_reason delete_reason; spinlock_t lock; /* protects bind */ struct kref refcount; diff --git a/drivers/net/ovpn/skb.h b/drivers/net/ovpn/skb.h index 2a75cef403845e2262f033a78b3fa1369b8c3b5e..96afa01466ab1a3456d1f3ca0ffd397302460d53 100644 --- a/drivers/net/ovpn/skb.h +++ b/drivers/net/ovpn/skb.h @@ -22,6 +22,7 @@ struct ovpn_cb { struct ovpn_crypto_key_slot *ks; struct aead_request *req; struct scatterlist *sg; + unsigned int orig_len; unsigned int payload_offset; };
diff --git a/drivers/net/ovpn/stats.c b/drivers/net/ovpn/stats.c new file mode 100644 index 0000000000000000000000000000000000000000..a383842c3449b73694c318837b0b92eb9afaec22 --- /dev/null +++ b/drivers/net/ovpn/stats.c @@ -0,0 +1,21 @@ +// SPDX-License-Identifier: GPL-2.0 +/* OpenVPN data channel offload + * + * Copyright (C) 2020-2024 OpenVPN, Inc. + * + * Author: James Yonan james@openvpn.net + * Antonio Quartulli antonio@openvpn.net + */ + +#include <linux/atomic.h> + +#include "stats.h" + +void ovpn_peer_stats_init(struct ovpn_peer_stats *ps) +{ + atomic64_set(&ps->rx.bytes, 0); + atomic64_set(&ps->rx.packets, 0); + + atomic64_set(&ps->tx.bytes, 0); + atomic64_set(&ps->tx.packets, 0); +} diff --git a/drivers/net/ovpn/stats.h b/drivers/net/ovpn/stats.h new file mode 100644 index 0000000000000000000000000000000000000000..868f49d25eaa8fef04a02a61c363d95f9c9ef80a --- /dev/null +++ b/drivers/net/ovpn/stats.h @@ -0,0 +1,47 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* OpenVPN data channel offload + * + * Copyright (C) 2020-2024 OpenVPN, Inc. + * + * Author: James Yonan james@openvpn.net + * Antonio Quartulli antonio@openvpn.net + * Lev Stipakov lev@openvpn.net + */ + +#ifndef _NET_OVPN_OVPNSTATS_H_ +#define _NET_OVPN_OVPNSTATS_H_ + +/* one stat */ +struct ovpn_peer_stat { + atomic64_t bytes; + atomic64_t packets; +}; + +/* rx and tx stats combined */ +struct ovpn_peer_stats { + struct ovpn_peer_stat rx; + struct ovpn_peer_stat tx; +}; + +void ovpn_peer_stats_init(struct ovpn_peer_stats *ps); + +static inline void ovpn_peer_stats_increment(struct ovpn_peer_stat *stat, + const unsigned int n) +{ + atomic64_add(n, &stat->bytes); + atomic64_inc(&stat->packets); +} + +static inline void ovpn_peer_stats_increment_rx(struct ovpn_peer_stats *stats, + const unsigned int n) +{ + ovpn_peer_stats_increment(&stats->rx, n); +} + +static inline void ovpn_peer_stats_increment_tx(struct ovpn_peer_stats *stats, + const unsigned int n) +{ + ovpn_peer_stats_increment(&stats->tx, n); +} + +#endif /* _NET_OVPN_OVPNSTATS_H_ */
2024-10-29, 11:47:24 +0100, Antonio Quartulli wrote:
@@ -136,6 +139,10 @@ void ovpn_decrypt_post(void *data, int ret) goto drop; }
+	/* increment RX stats */
+	ovpn_peer_stats_increment_rx(&peer->vpn_stats, skb->len);
+	ovpn_peer_stats_increment_rx(&peer->link_stats, orig_len);
[I don't know much about the userspace implementation, so maybe this is a silly question]
What's the value of keeping track of 2 separate stats if they are incremented exactly at the same time? Packet count will be the same, and the difference in bytes will be just measuring the encap overhead.
Should one of them be "packets/individual messages that get received over the UDP/TCP link" and the other "packets that get passed up to the stack"?
@@ -197,6 +206,8 @@ void ovpn_encrypt_post(void *data, int ret) goto err; skb_mark_not_on_list(skb);
+	ovpn_peer_stats_increment_tx(&peer->link_stats, skb->len);
+	ovpn_peer_stats_increment_tx(&peer->vpn_stats, orig_len);
switch (peer->sock->sock->sk->sk_protocol) { case IPPROTO_UDP:
And on TX maybe something like "packets that the stack wants to send through the tunnel" and "packets that actually make it onto the UDP/TCP socket after encap/encrypt"?
On 31/10/2024 12:37, Sabrina Dubroca wrote:
2024-10-29, 11:47:24 +0100, Antonio Quartulli wrote:
@@ -136,6 +139,10 @@ void ovpn_decrypt_post(void *data, int ret) goto drop; }
+	/* increment RX stats */
+	ovpn_peer_stats_increment_rx(&peer->vpn_stats, skb->len);
+	ovpn_peer_stats_increment_rx(&peer->link_stats, orig_len);
[I don't know much about the userspace implementation, so maybe this is a silly question]
What's the value of keeping track of 2 separate stats if they are incremented exactly at the same time? Packet count will be the same, and the difference in bytes will be just measuring the encap overhead.
Should one of them be "packets/individual messages that get received over the UDP/TCP link" and the other "packets that get passed up to the stack"?
You're correct: link_stats is what was "received over the TCP/UDP socket", while vpn_stats is what is passing through the ovpn virtual device.
Packet counts may not match though, for example when something happens between "received packet on the link" and "packet passed up to the device" (e.g. a decryption error).
This makes me wonder why we increment them at the very same place... link_stats should be increased upon RX from the socket, while vpn_stats just before delivery to the device. I'll double check.
@@ -197,6 +206,8 @@ void ovpn_encrypt_post(void *data, int ret) goto err; skb_mark_not_on_list(skb);
+	ovpn_peer_stats_increment_tx(&peer->link_stats, skb->len);
+	ovpn_peer_stats_increment_tx(&peer->vpn_stats, orig_len);
switch (peer->sock->sock->sk->sk_protocol) { case IPPROTO_UDP:
And on TX maybe something like "packets that the stack wants to send through the tunnel" and "packets that actually make it onto the UDP/TCP socket after encap/encrypt"?
Correct.
Same issue here. Increments should not happen back to back.
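Roughly something like this (helper names are invented here just to mark the hook points; the real increments would live directly in io.c):

/* RX: link_stats is bumped as soon as a packet comes off the transport
 * socket, vpn_stats only for packets that survive decrypt/PID/RPF and
 * are about to hit the device.
 */
static void sketch_rx_from_transport(struct ovpn_peer *peer,
				     struct sk_buff *skb)
{
	/* e.g. in ovpn_recv(), before ovpn_aead_decrypt() */
	ovpn_peer_stats_increment_rx(&peer->link_stats, skb->len);
}

static void sketch_rx_to_device(struct ovpn_peer *peer, struct sk_buff *skb)
{
	/* e.g. in ovpn_decrypt_post(), right before ovpn_netdev_write() */
	ovpn_peer_stats_increment_rx(&peer->vpn_stats, skb->len);
}

/* TX mirrors it: vpn_stats when the device hands the packet to ovpn,
 * link_stats only once the encrypted packet is handed to UDP/TCP.
 */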
Thanks a lot for spotting these.
Regards,
With this change ovpn can also communicate with peers via TCP. Parsing of incoming messages is implemented through the strparser API.
Signed-off-by: Antonio Quartulli antonio@openvpn.net --- drivers/net/Kconfig | 1 + drivers/net/ovpn/Makefile | 1 + drivers/net/ovpn/io.c | 4 + drivers/net/ovpn/main.c | 3 + drivers/net/ovpn/peer.h | 37 ++++ drivers/net/ovpn/socket.c | 44 +++- drivers/net/ovpn/socket.h | 9 +- drivers/net/ovpn/tcp.c | 506 ++++++++++++++++++++++++++++++++++++++++++++++ drivers/net/ovpn/tcp.h | 44 ++++ 9 files changed, 643 insertions(+), 6 deletions(-)
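For readers not familiar with strparser: the driver provides two callbacks, parse_msg() to tell the parser how long the next message is (OpenVPN prepends a 2-byte big-endian length to every packet sent over TCP) and rcv_msg(), which is invoked with each fully reassembled message. A condensed, illustrative sketch of the wiring (names are made up; the actual patch code follows below):

#include <linux/skbuff.h>
#include <net/strparser.h>

/* return the full length of the next message (2-byte length prefix plus
 * payload), 0 if more data is needed, or a negative error
 */
static int sketch_parse(struct strparser *strp, struct sk_buff *skb)
{
	struct strp_msg *rxm = strp_msg(skb);
	__be16 blen;

	if (skb->len < rxm->offset + 2)
		return 0;
	if (skb_copy_bits(skb, rxm->offset, &blen, sizeof(blen)) < 0)
		return -EINVAL;

	return be16_to_cpu(blen) + 2;
}

/* called with one complete, framed message */
static void sketch_rcv(struct strparser *strp, struct sk_buff *skb)
{
	/* strip the 2-byte length prefix and hand the packet to the
	 * data-channel RX path (or queue it for userspace)
	 */
}

static int sketch_attach(struct sock *sk, struct strparser *strp)
{
	struct strp_callbacks cb = {
		.rcv_msg = sketch_rcv,
		.parse_msg = sketch_parse,
	};

	return strp_init(strp, sk, &cb);
}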
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index 269b73fcfd348a48174fb96b8f8d4f8788636fa8..f37ce285e61fbee3201f4095ada3230305df511b 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -118,6 +118,7 @@ config WIREGUARD_DEBUG config OVPN tristate "OpenVPN data channel offload" depends on NET && INET + select STREAM_PARSER select NET_UDP_TUNNEL select DST_CACHE select CRYPTO diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile index d43fda72646bdc7644d9a878b56da0a0e5680c98..f4d4bd87c851c8dd5b81e357315c4b22de4bd092 100644 --- a/drivers/net/ovpn/Makefile +++ b/drivers/net/ovpn/Makefile @@ -18,4 +18,5 @@ ovpn-y += peer.o ovpn-y += pktid.o ovpn-y += socket.o ovpn-y += stats.o +ovpn-y += tcp.o ovpn-y += udp.o diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c index d56e74660c7be9020b5bdf7971322d41afd436d6..deda19ab87391f86964ba43088b7847d22420eee 100644 --- a/drivers/net/ovpn/io.c +++ b/drivers/net/ovpn/io.c @@ -22,6 +22,7 @@ #include "crypto_aead.h" #include "netlink.h" #include "proto.h" +#include "tcp.h" #include "udp.h" #include "skb.h" #include "socket.h" @@ -213,6 +214,9 @@ void ovpn_encrypt_post(void *data, int ret) case IPPROTO_UDP: ovpn_udp_send_skb(peer->ovpn, peer, skb); break; + case IPPROTO_TCP: + ovpn_tcp_send_skb(peer, skb); + break; default: /* no transport configured yet */ goto err; diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index 73348765a8cf24321aa6be78e75f607d6dbffb1d..0488e395eb27d3dba1efc8ff39c023e0ac4a38dd 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -22,6 +22,7 @@ #include "io.h" #include "packet.h" #include "peer.h" +#include "tcp.h"
/* Driver info */ #define DRV_DESCRIPTION "OpenVPN data channel offload (ovpn)" @@ -237,6 +238,8 @@ static int __init ovpn_init(void) goto unreg_rtnl; }
+ ovpn_tcp_init(); + return 0;
unreg_rtnl: diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h index eb1e31e854fbfff25d07fba8026789e41a76c113..2b7fa9510e362ef3646157bb0d361bab19ddaa99 100644 --- a/drivers/net/ovpn/peer.h +++ b/drivers/net/ovpn/peer.h @@ -11,6 +11,7 @@ #define _NET_OVPN_OVPNPEER_H_
#include <net/dst_cache.h> +#include <net/strparser.h>
#include "crypto.h" #include "stats.h" @@ -23,6 +24,18 @@ * @vpn_addrs.ipv4: IPv4 assigned to peer on the tunnel * @vpn_addrs.ipv6: IPv6 assigned to peer on the tunnel * @sock: the socket being used to talk to this peer + * @tcp: keeps track of TCP specific state + * @tcp.strp: stream parser context (TCP only) + * @tcp.tx_work: work for deferring outgoing packet processing (TCP only) + * @tcp.user_queue: received packets that have to go to userspace (TCP only) + * @tcp.tx_in_progress: true if TX is already ongoing (TCP only) + * @tcp.out_msg.skb: packet scheduled for sending (TCP only) + * @tcp.out_msg.offset: offset where next send should start (TCP only) + * @tcp.out_msg.len: remaining data to send within packet (TCP only) + * @tcp.sk_cb.sk_data_ready: pointer to original cb (TCP only) + * @tcp.sk_cb.sk_write_space: pointer to original cb (TCP only) + * @tcp.sk_cb.prot: pointer to original prot object (TCP only) + * @tcp.sk_cb.ops: pointer to the original prot_ops object (TCP only) * @crypto: the crypto configuration (ciphers, keys, etc..) * @dst_cache: cache for dst_entry used to send to peer * @bind: remote peer binding @@ -43,6 +56,30 @@ struct ovpn_peer { struct in6_addr ipv6; } vpn_addrs; struct ovpn_socket *sock; + + /* state of the TCP reading. Needed to keep track of how much of a + * single packet has already been read from the stream and how much is + * missing + */ + struct { + struct strparser strp; + struct work_struct tx_work; + struct sk_buff_head user_queue; + bool tx_in_progress; + + struct { + struct sk_buff *skb; + int offset; + int len; + } out_msg; + + struct { + void (*sk_data_ready)(struct sock *sk); + void (*sk_write_space)(struct sock *sk); + struct proto *prot; + const struct proto_ops *ops; + } sk_cb; + } tcp; struct ovpn_crypto_state crypto; struct dst_cache dst_cache; struct ovpn_bind __rcu *bind; diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c index 964b566de69f4132806a969a455cec7f6059a0bd..a0c2a02ff20541ecef48b6b0ecc40d558d0e3e7b 100644 --- a/drivers/net/ovpn/socket.c +++ b/drivers/net/ovpn/socket.c @@ -15,6 +15,7 @@ #include "io.h" #include "peer.h" #include "socket.h" +#include "tcp.h" #include "udp.h"
static void ovpn_socket_detach(struct socket *sock) @@ -24,10 +25,26 @@ static void ovpn_socket_detach(struct socket *sock)
if (sock->sk->sk_protocol == IPPROTO_UDP) ovpn_udp_socket_detach(sock); + else if (sock->sk->sk_protocol == IPPROTO_TCP) + ovpn_tcp_socket_detach(sock);
sockfd_put(sock); }
+static void ovpn_socket_release_work(struct work_struct *work) +{ + struct ovpn_socket *sock = container_of(work, struct ovpn_socket, work); + + ovpn_socket_detach(sock->sock); + kfree_rcu(sock, rcu); +} + +static void ovpn_socket_schedule_release(struct ovpn_socket *sock) +{ + INIT_WORK(&sock->work, ovpn_socket_release_work); + schedule_work(&sock->work); +} + /** * ovpn_socket_release_kref - kref_put callback * @kref: the kref object @@ -37,8 +54,7 @@ void ovpn_socket_release_kref(struct kref *kref) struct ovpn_socket *sock = container_of(kref, struct ovpn_socket, refcount);
- ovpn_socket_detach(sock->sock); - kfree_rcu(sock, rcu); + ovpn_socket_schedule_release(sock); }
static bool ovpn_socket_hold(struct ovpn_socket *sock) @@ -70,6 +86,8 @@ static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer)
if (sock->sk->sk_protocol == IPPROTO_UDP) ret = ovpn_udp_socket_attach(sock, peer->ovpn); + else if (sock->sk->sk_protocol == IPPROTO_TCP) + ret = ovpn_tcp_socket_attach(sock, peer);
return ret; } @@ -131,14 +149,30 @@ struct ovpn_socket *ovpn_socket_new(struct socket *sock, struct ovpn_peer *peer) }
ovpn_sock = kzalloc(sizeof(*ovpn_sock), GFP_KERNEL); - if (!ovpn_sock) - return ERR_PTR(-ENOMEM); + if (!ovpn_sock) { + ret = -ENOMEM; + goto err; + }
- ovpn_sock->ovpn = peer->ovpn; ovpn_sock->sock = sock; kref_init(&ovpn_sock->refcount);
+ /* TCP sockets are per-peer, therefore they are linked to their unique + * peer + */ + if (sock->sk->sk_protocol == IPPROTO_TCP) { + ovpn_sock->peer = peer; + } else { + /* in UDP we only link the ovpn instance since the socket is + * shared among multiple peers + */ + ovpn_sock->ovpn = peer->ovpn; + } + rcu_assign_sk_user_data(sock->sk, ovpn_sock);
return ovpn_sock; +err: + ovpn_socket_detach(sock); + return ERR_PTR(ret); } diff --git a/drivers/net/ovpn/socket.h b/drivers/net/ovpn/socket.h index 5ad9c5073b085482da95ee8ebf40acf20bf2e4b3..bc22fff453ad8726f647a47f98ffc2219fba7b82 100644 --- a/drivers/net/ovpn/socket.h +++ b/drivers/net/ovpn/socket.h @@ -20,14 +20,21 @@ struct ovpn_peer; /** * struct ovpn_socket - a kernel socket referenced in the ovpn code * @ovpn: ovpn instance owning this socket (UDP only) + * @peer: unique peer transmitting over this socket (TCP only) * @sock: the low level sock object * @refcount: amount of contexts currently referencing this object + * @work: member used to schedule release routine (it may block) * @rcu: member used to schedule RCU destructor callback */ struct ovpn_socket { - struct ovpn_struct *ovpn; + union { + struct ovpn_struct *ovpn; + struct ovpn_peer *peer; + }; + struct socket *sock; struct kref refcount; + struct work_struct work; struct rcu_head rcu; };
diff --git a/drivers/net/ovpn/tcp.c b/drivers/net/ovpn/tcp.c new file mode 100644 index 0000000000000000000000000000000000000000..d6f377a116ef029d217bdc76304f75c3d1fb062c --- /dev/null +++ b/drivers/net/ovpn/tcp.c @@ -0,0 +1,506 @@ +// SPDX-License-Identifier: GPL-2.0 +/* OpenVPN data channel offload + * + * Copyright (C) 2019-2024 OpenVPN, Inc. + * + * Author: Antonio Quartulli antonio@openvpn.net + */ + +#include <linux/skbuff.h> +#include <net/hotdata.h> +#include <net/inet_common.h> +#include <net/tcp.h> +#include <net/route.h> +#include <trace/events/sock.h> + +#include "ovpnstruct.h" +#include "main.h" +#include "io.h" +#include "packet.h" +#include "peer.h" +#include "proto.h" +#include "skb.h" +#include "tcp.h" + +static struct proto ovpn_tcp_prot __ro_after_init; +static struct proto_ops ovpn_tcp_ops __ro_after_init; +static struct proto ovpn_tcp6_prot; +static struct proto_ops ovpn_tcp6_ops; +static DEFINE_MUTEX(tcp6_prot_mutex); + +static int ovpn_tcp_parse(struct strparser *strp, struct sk_buff *skb) +{ + struct strp_msg *rxm = strp_msg(skb); + __be16 blen; + u16 len; + int err; + + /* when packets are written to the TCP stream, they are prepended with + * two bytes indicating the actual packet size. + * Here we read those two bytes and move the skb data pointer to the + * beginning of the packet + */ + + if (skb->len < rxm->offset + 2) + return 0; + + err = skb_copy_bits(skb, rxm->offset, &blen, sizeof(blen)); + if (err < 0) + return err; + + len = be16_to_cpu(blen); + if (len < 2) + return -EINVAL; + + return len + 2; +} + +/* queue skb for sending to userspace via recvmsg on the socket */ +static void ovpn_tcp_to_userspace(struct ovpn_peer *peer, struct sock *sk, + struct sk_buff *skb) +{ + skb_set_owner_r(skb, sk); + memset(skb->cb, 0, sizeof(skb->cb)); + skb_queue_tail(&peer->tcp.user_queue, skb); + peer->tcp.sk_cb.sk_data_ready(sk); +} + +static void ovpn_tcp_rcv(struct strparser *strp, struct sk_buff *skb) +{ + struct ovpn_peer *peer = container_of(strp, struct ovpn_peer, tcp.strp); + struct strp_msg *msg = strp_msg(skb); + size_t pkt_len = msg->full_len - 2; + size_t off = msg->offset + 2; + + /* ensure skb->data points to the beginning of the openvpn packet */ + if (!pskb_pull(skb, off)) { + net_warn_ratelimited("%s: packet too small\n", + peer->ovpn->dev->name); + goto err; + } + + /* strparser does not trim the skb for us, therefore we do it now */ + if (pskb_trim(skb, pkt_len) != 0) { + net_warn_ratelimited("%s: trimming skb failed\n", + peer->ovpn->dev->name); + goto err; + } + + /* we need the first byte of data to be accessible + * to extract the opcode and the key ID later on + */ + if (!pskb_may_pull(skb, 1)) { + net_warn_ratelimited("%s: packet too small to fetch opcode\n", + peer->ovpn->dev->name); + goto err; + } + + /* DATA_V2 packets are handled in kernel, the rest goes to user space */ + if (likely(ovpn_opcode_from_skb(skb, 0) == OVPN_DATA_V2)) { + /* hold reference to peer as required by ovpn_recv(). 
+ * + * NOTE: in this context we should already be holding a + * reference to this peer, therefore ovpn_peer_hold() is + * not expected to fail + */ + if (WARN_ON(!ovpn_peer_hold(peer))) + goto err; + + ovpn_recv(peer, skb); + } else { + /* The packet size header must be there when sending the packet + * to userspace, therefore we put it back + */ + skb_push(skb, 2); + ovpn_tcp_to_userspace(peer, strp->sk, skb); + } + + return; +err: + netdev_err(peer->ovpn->dev, + "cannot process incoming TCP data for peer %u\n", peer->id); + dev_core_stats_rx_dropped_inc(peer->ovpn->dev); + kfree_skb(skb); + ovpn_peer_del(peer, OVPN_DEL_PEER_REASON_TRANSPORT_ERROR); +} + +static int ovpn_tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, + int flags, int *addr_len) +{ + int err = 0, off, copied = 0, ret; + struct ovpn_socket *sock; + struct ovpn_peer *peer; + struct sk_buff *skb; + + rcu_read_lock(); + sock = rcu_dereference_sk_user_data(sk); + if (!sock || !sock->peer) { + rcu_read_unlock(); + return -EBADF; + } + /* we take a reference to the peer linked to this TCP socket, because + * in turn the peer holds a reference to the socket itself. + * By doing so we also ensure that the peer stays alive along with + * the socket while executing this function + */ + ovpn_peer_hold(sock->peer); + peer = sock->peer; + rcu_read_unlock(); + + skb = __skb_recv_datagram(sk, &peer->tcp.user_queue, flags, &off, &err); + if (!skb) { + if (err == -EAGAIN && sk->sk_shutdown & RCV_SHUTDOWN) { + ret = 0; + goto out; + } + ret = err; + goto out; + } + + copied = len; + if (copied > skb->len) + copied = skb->len; + else if (copied < skb->len) + msg->msg_flags |= MSG_TRUNC; + + err = skb_copy_datagram_msg(skb, 0, msg, copied); + if (unlikely(err)) { + kfree_skb(skb); + ret = err; + goto out; + } + + if (flags & MSG_TRUNC) + copied = skb->len; + kfree_skb(skb); + ret = copied; +out: + ovpn_peer_put(peer); + return ret; +} + +void ovpn_tcp_socket_detach(struct socket *sock) +{ + struct ovpn_socket *ovpn_sock; + struct ovpn_peer *peer; + + if (!sock) + return; + + rcu_read_lock(); + ovpn_sock = rcu_dereference_sk_user_data(sock->sk); + + if (!ovpn_sock->peer) { + rcu_read_unlock(); + return; + } + + peer = ovpn_sock->peer; + strp_stop(&peer->tcp.strp); + + skb_queue_purge(&peer->tcp.user_queue); + + /* restore CBs that were saved in ovpn_sock_set_tcp_cb() */ + sock->sk->sk_data_ready = peer->tcp.sk_cb.sk_data_ready; + sock->sk->sk_write_space = peer->tcp.sk_cb.sk_write_space; + sock->sk->sk_prot = peer->tcp.sk_cb.prot; + sock->sk->sk_socket->ops = peer->tcp.sk_cb.ops; + rcu_assign_sk_user_data(sock->sk, NULL); + + rcu_read_unlock(); + + /* cancel any ongoing work. 
Done after removing the CBs so that these + * workers cannot be re-armed + */ + cancel_work_sync(&peer->tcp.tx_work); + strp_done(&peer->tcp.strp); +} + +static void ovpn_tcp_send_sock(struct ovpn_peer *peer) +{ + struct sk_buff *skb = peer->tcp.out_msg.skb; + + if (!skb) + return; + + if (peer->tcp.tx_in_progress) + return; + + peer->tcp.tx_in_progress = true; + + do { + int ret = skb_send_sock_locked(peer->sock->sock->sk, skb, + peer->tcp.out_msg.offset, + peer->tcp.out_msg.len); + if (unlikely(ret < 0)) { + if (ret == -EAGAIN) + goto out; + + net_warn_ratelimited("%s: TCP error to peer %u: %d\n", + peer->ovpn->dev->name, peer->id, + ret); + + /* in case of TCP error we can't recover the VPN + * stream therefore we abort the connection + */ + ovpn_peer_del(peer, + OVPN_DEL_PEER_REASON_TRANSPORT_ERROR); + break; + } + + peer->tcp.out_msg.len -= ret; + peer->tcp.out_msg.offset += ret; + } while (peer->tcp.out_msg.len > 0); + + if (!peer->tcp.out_msg.len) + dev_sw_netstats_tx_add(peer->ovpn->dev, 1, skb->len); + + kfree_skb(peer->tcp.out_msg.skb); + peer->tcp.out_msg.skb = NULL; + peer->tcp.out_msg.len = 0; + peer->tcp.out_msg.offset = 0; + +out: + peer->tcp.tx_in_progress = false; +} + +static void ovpn_tcp_tx_work(struct work_struct *work) +{ + struct ovpn_peer *peer; + + peer = container_of(work, struct ovpn_peer, tcp.tx_work); + + lock_sock(peer->sock->sock->sk); + ovpn_tcp_send_sock(peer); + release_sock(peer->sock->sock->sk); +} + +void ovpn_tcp_send_sock_skb(struct ovpn_peer *peer, struct sk_buff *skb) +{ + if (peer->tcp.out_msg.skb) + return; + + peer->tcp.out_msg.skb = skb; + peer->tcp.out_msg.len = skb->len; + peer->tcp.out_msg.offset = 0; + + ovpn_tcp_send_sock(peer); +} + +static int ovpn_tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) +{ + struct ovpn_socket *sock; + int ret, linear = PAGE_SIZE; + struct ovpn_peer *peer; + struct sk_buff *skb; + + rcu_read_lock(); + sock = rcu_dereference_sk_user_data(sk); + peer = sock->peer; + if (unlikely(!ovpn_peer_hold(peer))) { + rcu_read_unlock(); + return -EIO; + } + rcu_read_unlock(); + + if (msg->msg_flags & ~MSG_DONTWAIT) { + ret = -EOPNOTSUPP; + goto peer_free; + } + + lock_sock(sk); + + if (peer->tcp.out_msg.skb) { + ret = -EAGAIN; + goto unlock; + } + + if (size < linear) + linear = size; + + skb = sock_alloc_send_pskb(sk, linear, size - linear, + msg->msg_flags & MSG_DONTWAIT, &ret, 0); + if (!skb) { + net_err_ratelimited("%s: skb alloc failed: %d\n", + sock->peer->ovpn->dev->name, ret); + goto unlock; + } + + skb_put(skb, linear); + skb->len = size; + skb->data_len = size - linear; + + ret = skb_copy_datagram_from_iter(skb, 0, &msg->msg_iter, size); + if (ret) { + kfree_skb(skb); + net_err_ratelimited("%s: skb copy from iter failed: %d\n", + sock->peer->ovpn->dev->name, ret); + goto unlock; + } + + ovpn_tcp_send_sock_skb(sock->peer, skb); + ret = size; +unlock: + release_sock(sk); +peer_free: + ovpn_peer_put(peer); + return ret; +} + +static void ovpn_tcp_data_ready(struct sock *sk) +{ + struct ovpn_socket *sock; + + trace_sk_data_ready(sk); + + rcu_read_lock(); + sock = rcu_dereference_sk_user_data(sk); + strp_data_ready(&sock->peer->tcp.strp); + rcu_read_unlock(); +} + +static void ovpn_tcp_write_space(struct sock *sk) +{ + struct ovpn_socket *sock; + + rcu_read_lock(); + sock = rcu_dereference_sk_user_data(sk); + schedule_work(&sock->peer->tcp.tx_work); + sock->peer->tcp.sk_cb.sk_write_space(sk); + rcu_read_unlock(); +} + +static void ovpn_tcp_build_protos(struct proto *new_prot, + struct proto_ops *new_ops, + 
const struct proto *orig_prot, + const struct proto_ops *orig_ops); + +/* Set TCP encapsulation callbacks */ +int ovpn_tcp_socket_attach(struct socket *sock, struct ovpn_peer *peer) +{ + struct strp_callbacks cb = { + .rcv_msg = ovpn_tcp_rcv, + .parse_msg = ovpn_tcp_parse, + }; + int ret; + + /* make sure no pre-existing encapsulation handler exists */ + if (sock->sk->sk_user_data) + return -EBUSY; + + /* sanity check */ + if (sock->sk->sk_protocol != IPPROTO_TCP) { + netdev_err(peer->ovpn->dev, + "provided socket is not TCP as expected\n"); + return -EINVAL; + } + + /* only a fully connected socket are expected. Connection should be + * handled in userspace + */ + if (sock->sk->sk_state != TCP_ESTABLISHED) { + netdev_err(peer->ovpn->dev, + "provided TCP socket is not in ESTABLISHED state: %d\n", + sock->sk->sk_state); + return -EINVAL; + } + + lock_sock(sock->sk); + + ret = strp_init(&peer->tcp.strp, sock->sk, &cb); + if (ret < 0) { + DEBUG_NET_WARN_ON_ONCE(1); + release_sock(sock->sk); + return ret; + } + + INIT_WORK(&peer->tcp.tx_work, ovpn_tcp_tx_work); + __sk_dst_reset(sock->sk); + skb_queue_head_init(&peer->tcp.user_queue); + + /* save current CBs so that they can be restored upon socket release */ + peer->tcp.sk_cb.sk_data_ready = sock->sk->sk_data_ready; + peer->tcp.sk_cb.sk_write_space = sock->sk->sk_write_space; + peer->tcp.sk_cb.prot = sock->sk->sk_prot; + peer->tcp.sk_cb.ops = sock->sk->sk_socket->ops; + + /* assign our static CBs and prot/ops */ + sock->sk->sk_data_ready = ovpn_tcp_data_ready; + sock->sk->sk_write_space = ovpn_tcp_write_space; + + if (sock->sk->sk_family == AF_INET) { + sock->sk->sk_prot = &ovpn_tcp_prot; + sock->sk->sk_socket->ops = &ovpn_tcp_ops; + } else { + mutex_lock(&tcp6_prot_mutex); + if (!ovpn_tcp6_prot.recvmsg) + ovpn_tcp_build_protos(&ovpn_tcp6_prot, &ovpn_tcp6_ops, + sock->sk->sk_prot, + sock->sk->sk_socket->ops); + mutex_unlock(&tcp6_prot_mutex); + + sock->sk->sk_prot = &ovpn_tcp6_prot; + sock->sk->sk_socket->ops = &ovpn_tcp6_ops; + } + + /* avoid using task_frag */ + sock->sk->sk_allocation = GFP_ATOMIC; + sock->sk->sk_use_task_frag = false; + + /* enqueue the RX worker */ + strp_check_rcv(&peer->tcp.strp); + + release_sock(sock->sk); + return 0; +} + +static void ovpn_tcp_close(struct sock *sk, long timeout) +{ + struct ovpn_socket *sock; + + rcu_read_lock(); + sock = rcu_dereference_sk_user_data(sk); + + strp_stop(&sock->peer->tcp.strp); + barrier(); + + tcp_close(sk, timeout); + + ovpn_peer_del(sock->peer, OVPN_DEL_PEER_REASON_TRANSPORT_ERROR); + rcu_read_unlock(); +} + +static __poll_t ovpn_tcp_poll(struct file *file, struct socket *sock, + poll_table *wait) +{ + __poll_t mask = datagram_poll(file, sock, wait); + struct ovpn_socket *ovpn_sock; + + rcu_read_lock(); + ovpn_sock = rcu_dereference_sk_user_data(sock->sk); + if (!skb_queue_empty(&ovpn_sock->peer->tcp.user_queue)) + mask |= EPOLLIN | EPOLLRDNORM; + rcu_read_unlock(); + + return mask; +} + +static void ovpn_tcp_build_protos(struct proto *new_prot, + struct proto_ops *new_ops, + const struct proto *orig_prot, + const struct proto_ops *orig_ops) +{ + memcpy(new_prot, orig_prot, sizeof(*new_prot)); + memcpy(new_ops, orig_ops, sizeof(*new_ops)); + new_prot->recvmsg = ovpn_tcp_recvmsg; + new_prot->sendmsg = ovpn_tcp_sendmsg; + new_prot->close = ovpn_tcp_close; + new_ops->poll = ovpn_tcp_poll; +} + +/* Initialize TCP static objects */ +void __init ovpn_tcp_init(void) +{ + ovpn_tcp_build_protos(&ovpn_tcp_prot, &ovpn_tcp_ops, &tcp_prot, + &inet_stream_ops); +} diff --git 
a/drivers/net/ovpn/tcp.h b/drivers/net/ovpn/tcp.h new file mode 100644 index 0000000000000000000000000000000000000000..fb2cd0b606b4d21114b2729c6a34212f9920c3d1 --- /dev/null +++ b/drivers/net/ovpn/tcp.h @@ -0,0 +1,44 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* OpenVPN data channel offload + * + * Copyright (C) 2019-2024 OpenVPN, Inc. + * + * Author: Antonio Quartulli antonio@openvpn.net + */ + +#ifndef _NET_OVPN_TCP_H_ +#define _NET_OVPN_TCP_H_ + +#include <linux/net.h> +#include <linux/skbuff.h> +#include <linux/types.h> + +#include "peer.h" +#include "skb.h" +#include "socket.h" + +void __init ovpn_tcp_init(void); + +int ovpn_tcp_socket_attach(struct socket *sock, struct ovpn_peer *peer); +void ovpn_tcp_socket_detach(struct socket *sock); +void ovpn_tcp_send_sock_skb(struct ovpn_peer *peer, struct sk_buff *skb); + +/* Prepare skb and enqueue it for sending to peer. + * + * Preparation consist in prepending the skb payload with its size. + * Required by the OpenVPN protocol in order to extract packets from + * the TCP stream on the receiver side. + */ +static inline void ovpn_tcp_send_skb(struct ovpn_peer *peer, + struct sk_buff *skb) +{ + u16 len = skb->len; + + *(__be16 *)__skb_push(skb, sizeof(u16)) = htons(len); + + bh_lock_sock(peer->sock->sock->sk); + ovpn_tcp_send_sock_skb(peer, skb); + bh_unlock_sock(peer->sock->sock->sk); +} + +#endif /* _NET_OVPN_TCP_H_ */
On 29/10/2024 11:47, Antonio Quartulli wrote: [...]
- /* DATA_V2 packets are handled in kernel, the rest goes to user space */
- if (likely(ovpn_opcode_from_skb(skb, 0) == OVPN_DATA_V2)) {
/* hold reference to peer as required by ovpn_recv().
*
* NOTE: in this context we should already be holding a
* reference to this peer, therefore ovpn_peer_hold() is
* not expected to fail
*/
if (WARN_ON(!ovpn_peer_hold(peer)))
goto err;
ovpn_recv(peer, skb);
- } else {
As pointed out by Sabrina, we are indeed sending DATA_V1 packets to userspace. Not a big deal because userspace will likely ignore or drop them.
However, I will change this and mirror what we do for UDP.
Thanks.
Regards,
/* The packet size header must be there when sending the packet
* to userspace, therefore we put it back
*/
skb_push(skb, 2);
ovpn_tcp_to_userspace(peer, strp->sk, skb);
- }
- return;
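A minimal sketch of what mirroring the UDP behaviour could look like in ovpn_tcp_rcv() (illustration only, assuming that, as on the UDP path, DATA_V1 is counted and dropped while control packets still bounce to userspace; OVPN_DATA_V1 is expected to be defined in proto.h):

	switch (ovpn_opcode_from_skb(skb, 0)) {
	case OVPN_DATA_V2:
		/* hold reference to peer as required by ovpn_recv() */
		if (WARN_ON(!ovpn_peer_hold(peer)))
			goto err;
		ovpn_recv(peer, skb);
		break;
	case OVPN_DATA_V1:
		/* not supported by the kernel data path: count and drop */
		dev_core_stats_rx_dropped_inc(peer->ovpn->dev);
		kfree_skb(skb);
		break;
	default:
		/* control packets go to userspace; restore the 2-byte size
		 * header that was pulled earlier
		 */
		skb_push(skb, 2);
		ovpn_tcp_to_userspace(peer, strp->sk, skb);
		break;
	}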
2024-10-29, 11:47:25 +0100, Antonio Quartulli wrote:
+static void ovpn_socket_release_work(struct work_struct *work) +{
- struct ovpn_socket *sock = container_of(work, struct ovpn_socket, work);
- ovpn_socket_detach(sock->sock);
- kfree_rcu(sock, rcu);
+}
+static void ovpn_socket_schedule_release(struct ovpn_socket *sock) +{
- INIT_WORK(&sock->work, ovpn_socket_release_work);
- schedule_work(&sock->work);
How does module unloading know that it has to wait for this work to complete? Will ovpn_cleanup get stuck until some refcount gets released by this work?
[...]
+static void ovpn_tcp_rcv(struct strparser *strp, struct sk_buff *skb) +{
- struct ovpn_peer *peer = container_of(strp, struct ovpn_peer, tcp.strp);
- struct strp_msg *msg = strp_msg(skb);
- size_t pkt_len = msg->full_len - 2;
- size_t off = msg->offset + 2;
- /* ensure skb->data points to the beginning of the openvpn packet */
- if (!pskb_pull(skb, off)) {
net_warn_ratelimited("%s: packet too small\n",
peer->ovpn->dev->name);
goto err;
- }
- /* strparser does not trim the skb for us, therefore we do it now */
- if (pskb_trim(skb, pkt_len) != 0) {
net_warn_ratelimited("%s: trimming skb failed\n",
peer->ovpn->dev->name);
goto err;
- }
- /* we need the first byte of data to be accessible
* to extract the opcode and the key ID later on
*/
- if (!pskb_may_pull(skb, 1)) {
net_warn_ratelimited("%s: packet too small to fetch opcode\n",
peer->ovpn->dev->name);
goto err;
- }
- /* DATA_V2 packets are handled in kernel, the rest goes to user space */
- if (likely(ovpn_opcode_from_skb(skb, 0) == OVPN_DATA_V2)) {
/* hold reference to peer as required by ovpn_recv().
*
* NOTE: in this context we should already be holding a
* reference to this peer, therefore ovpn_peer_hold() is
* not expected to fail
*/
if (WARN_ON(!ovpn_peer_hold(peer)))
goto err;
ovpn_recv(peer, skb);
- } else {
/* The packet size header must be there when sending the packet
* to userspace, therefore we put it back
*/
skb_push(skb, 2);
ovpn_tcp_to_userspace(peer, strp->sk, skb);
- }
- return;
+err:
- netdev_err(peer->ovpn->dev,
"cannot process incoming TCP data for peer %u\n", peer->id);
This should also be ratelimited, and maybe just combined with the net_warn_ratelimited just before each goto.
- dev_core_stats_rx_dropped_inc(peer->ovpn->dev);
- kfree_skb(skb);
- ovpn_peer_del(peer, OVPN_DEL_PEER_REASON_TRANSPORT_ERROR);
+}
[...]
+void ovpn_tcp_socket_detach(struct socket *sock) +{
[...]
- /* restore CBs that were saved in ovpn_sock_set_tcp_cb() */
- sock->sk->sk_data_ready = peer->tcp.sk_cb.sk_data_ready;
- sock->sk->sk_write_space = peer->tcp.sk_cb.sk_write_space;
- sock->sk->sk_prot = peer->tcp.sk_cb.prot;
- sock->sk->sk_socket->ops = peer->tcp.sk_cb.ops;
- rcu_assign_sk_user_data(sock->sk, NULL);
- rcu_read_unlock();
- /* cancel any ongoing work. Done after removing the CBs so that these
* workers cannot be re-armed
*/
I'm not sure whether a barrier is needed to prevent compiler/CPU reordering here.
- cancel_work_sync(&peer->tcp.tx_work);
- strp_done(&peer->tcp.strp);
+}
+static void ovpn_tcp_send_sock(struct ovpn_peer *peer) +{
- struct sk_buff *skb = peer->tcp.out_msg.skb;
- if (!skb)
return;
- if (peer->tcp.tx_in_progress)
return;
- peer->tcp.tx_in_progress = true;
Sorry, I never answered your question about my concerns in a previous review here.
We can reach ovpn_tcp_send_sock in two different contexts:
- lock_sock (either from ovpn_tcp_sendmsg or ovpn_tcp_tx_work)
- bh_lock_sock (from ovpn_tcp_send_skb, ie "data path")
These are not fully mutually exclusive. lock_sock grabs bh_lock_sock (a spinlock) for a brief period to mark the (sleeping/mutex) lock as taken, and then releases it.
So when bh_lock_sock is held, it's not possible to grab lock_sock. But when lock_sock is taken, it's still possible to grab bh_lock_sock.
The buggy scenario would be:
 (data path encrypt)                    (sendmsg)
  ovpn_tcp_send_skb                      lock_sock
                                           bh_lock_sock + owned=1 + bh_unlock_sock
   bh_lock_sock
   ovpn_tcp_send_sock_skb                ovpn_tcp_send_sock_skb
     !peer->tcp.out_msg.skb                !peer->tcp.out_msg.skb
     peer->tcp.out_msg.skb = ...           peer->tcp.out_msg.skb = ...
     ovpn_tcp_send_sock                    ovpn_tcp_send_sock
       !peer->tcp.tx_in_progress             !peer->tcp.tx_in_progress
       peer->tcp.tx_in_progress = true       peer->tcp.tx_in_progress = true
       // proceed                            // proceed
That's 2 similar races, one on out_msg.skb and one on tx_in_progress. It's a bit unlikely (but not impossible) that we'll have 2 cpus trying to call skb_send_sock_locked at the same time, but if they just overwrite each other's skb/len it's already pretty bad. The end of ovpn_tcp_send_sock might also reset peer->tcp.out_msg.* just as ovpn_tcp_send_skb -> ovpn_tcp_send_sock_skb starts setting it up (peer->tcp.out_msg.skb gets cleared, ovpn_tcp_send_sock_skb proceeds and sets skb+len, then maybe len gets reset to 0 by ovpn_tcp_send_sock).
To avoid this problem, esp_output_tcp_finish (net/ipv4/esp4.c) does:
	bh_lock_sock(sk);
	if (sock_owned_by_user(sk))
		err = espintcp_queue_out(sk, skb);
	else
		err = espintcp_push_skb(sk, skb);
	bh_unlock_sock(sk);
(espintcp_push_skb is roughly equivalent to ovpn_tcp_send_sock_skb)
- do {
int ret = skb_send_sock_locked(peer->sock->sock->sk, skb,
peer->tcp.out_msg.offset,
peer->tcp.out_msg.len);
if (unlikely(ret < 0)) {
if (ret == -EAGAIN)
goto out;
net_warn_ratelimited("%s: TCP error to peer %u: %d\n",
peer->ovpn->dev->name, peer->id,
ret);
/* in case of TCP error we can't recover the VPN
* stream therefore we abort the connection
*/
ovpn_peer_del(peer,
OVPN_DEL_PEER_REASON_TRANSPORT_ERROR);
break;
}
peer->tcp.out_msg.len -= ret;
peer->tcp.out_msg.offset += ret;
- } while (peer->tcp.out_msg.len > 0);
- if (!peer->tcp.out_msg.len)
dev_sw_netstats_tx_add(peer->ovpn->dev, 1, skb->len);
- kfree_skb(peer->tcp.out_msg.skb);
- peer->tcp.out_msg.skb = NULL;
- peer->tcp.out_msg.len = 0;
- peer->tcp.out_msg.offset = 0;
+out:
- peer->tcp.tx_in_progress = false;
+}
+static void ovpn_tcp_tx_work(struct work_struct *work) +{
- struct ovpn_peer *peer;
- peer = container_of(work, struct ovpn_peer, tcp.tx_work);
- lock_sock(peer->sock->sock->sk);
- ovpn_tcp_send_sock(peer);
- release_sock(peer->sock->sock->sk);
+}
+void ovpn_tcp_send_sock_skb(struct ovpn_peer *peer, struct sk_buff *skb) +{
- if (peer->tcp.out_msg.skb)
return;
That's leaking the skb? (and not counting the drop)
- peer->tcp.out_msg.skb = skb;
- peer->tcp.out_msg.len = skb->len;
- peer->tcp.out_msg.offset = 0;
- ovpn_tcp_send_sock(peer);
+}
+static int ovpn_tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) +{
[...]
- ret = skb_copy_datagram_from_iter(skb, 0, &msg->msg_iter, size);
- if (ret) {
kfree_skb(skb);
net_err_ratelimited("%s: skb copy from iter failed: %d\n",
sock->peer->ovpn->dev->name, ret);
goto unlock;
- }
- ovpn_tcp_send_sock_skb(sock->peer, skb);
If we didn't send the packet (because one was already queued/in progress), we should either stash it, or tell userspace that it wasn't sent and it should retry later.
- ret = size;
+unlock:
- release_sock(sk);
+peer_free:
- ovpn_peer_put(peer);
- return ret;
+}
On 31/10/2024 16:25, Sabrina Dubroca wrote:
2024-10-29, 11:47:25 +0100, Antonio Quartulli wrote:
+static void ovpn_socket_release_work(struct work_struct *work) +{
- struct ovpn_socket *sock = container_of(work, struct ovpn_socket, work);
- ovpn_socket_detach(sock->sock);
- kfree_rcu(sock, rcu);
+}
+static void ovpn_socket_schedule_release(struct ovpn_socket *sock) +{
- INIT_WORK(&sock->work, ovpn_socket_release_work);
- schedule_work(&sock->work);
How does module unloading know that it has to wait for this work to complete? Will ovpn_cleanup get stuck until some refcount gets released by this work?
No, we have no such mechanism. Any idea how other modules handle this?
Actually this makes me wonder how module unloading coordinates with the code being executed. Unload may happen at any time - how do we prevent killing the code in the middle of something (regardless of scheduled workers)?
[...]
+static void ovpn_tcp_rcv(struct strparser *strp, struct sk_buff *skb) +{
- struct ovpn_peer *peer = container_of(strp, struct ovpn_peer, tcp.strp);
- struct strp_msg *msg = strp_msg(skb);
- size_t pkt_len = msg->full_len - 2;
- size_t off = msg->offset + 2;
- /* ensure skb->data points to the beginning of the openvpn packet */
- if (!pskb_pull(skb, off)) {
net_warn_ratelimited("%s: packet too small\n",
peer->ovpn->dev->name);
goto err;
- }
- /* strparser does not trim the skb for us, therefore we do it now */
- if (pskb_trim(skb, pkt_len) != 0) {
net_warn_ratelimited("%s: trimming skb failed\n",
peer->ovpn->dev->name);
goto err;
- }
- /* we need the first byte of data to be accessible
* to extract the opcode and the key ID later on
*/
- if (!pskb_may_pull(skb, 1)) {
net_warn_ratelimited("%s: packet too small to fetch opcode\n",
peer->ovpn->dev->name);
goto err;
- }
- /* DATA_V2 packets are handled in kernel, the rest goes to user space */
- if (likely(ovpn_opcode_from_skb(skb, 0) == OVPN_DATA_V2)) {
/* hold reference to peer as required by ovpn_recv().
*
* NOTE: in this context we should already be holding a
* reference to this peer, therefore ovpn_peer_hold() is
* not expected to fail
*/
if (WARN_ON(!ovpn_peer_hold(peer)))
goto err;
ovpn_recv(peer, skb);
- } else {
/* The packet size header must be there when sending the packet
* to userspace, therefore we put it back
*/
skb_push(skb, 2);
ovpn_tcp_to_userspace(peer, strp->sk, skb);
- }
- return;
+err:
- netdev_err(peer->ovpn->dev,
"cannot process incoming TCP data for peer %u\n", peer->id);
This should also be ratelimited, and maybe just combined with the net_warn_ratelimited just before each goto.
ACK.
- dev_core_stats_rx_dropped_inc(peer->ovpn->dev);
- kfree_skb(skb);
- ovpn_peer_del(peer, OVPN_DEL_PEER_REASON_TRANSPORT_ERROR);
+}
[...]
+void ovpn_tcp_socket_detach(struct socket *sock) +{
[...]
- /* restore CBs that were saved in ovpn_sock_set_tcp_cb() */
- sock->sk->sk_data_ready = peer->tcp.sk_cb.sk_data_ready;
- sock->sk->sk_write_space = peer->tcp.sk_cb.sk_write_space;
- sock->sk->sk_prot = peer->tcp.sk_cb.prot;
- sock->sk->sk_socket->ops = peer->tcp.sk_cb.ops;
- rcu_assign_sk_user_data(sock->sk, NULL);
- rcu_read_unlock();
- /* cancel any ongoing work. Done after removing the CBs so that these
* workers cannot be re-armed
*/
I'm not sure whether a barrier is needed to prevent compiler/CPU reordering here.
I see ipsec has one in espintcp.c. I think it makes sense to add it right here.
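For reference, a minimal sketch of the detach tail with such a barrier added, following the espintcp precedent (illustration only, not the actual patch):

	/* restore CBs that were saved in ovpn_tcp_socket_attach() */
	sock->sk->sk_data_ready = peer->tcp.sk_cb.sk_data_ready;
	sock->sk->sk_write_space = peer->tcp.sk_cb.sk_write_space;
	sock->sk->sk_prot = peer->tcp.sk_cb.prot;
	sock->sk->sk_socket->ops = peer->tcp.sk_cb.ops;
	rcu_assign_sk_user_data(sock->sk, NULL);

	rcu_read_unlock();

	/* keep the compiler from reordering the callback restore above past
	 * the work cancellation below (barrier() is from linux/compiler.h)
	 */
	barrier();

	/* cancel any ongoing work. Done after removing the CBs so that these
	 * workers cannot be re-armed
	 */
	cancel_work_sync(&peer->tcp.tx_work);
	strp_done(&peer->tcp.strp);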
- cancel_work_sync(&peer->tcp.tx_work);
- strp_done(&peer->tcp.strp);
+}
+static void ovpn_tcp_send_sock(struct ovpn_peer *peer) +{
- struct sk_buff *skb = peer->tcp.out_msg.skb;
- if (!skb)
return;
- if (peer->tcp.tx_in_progress)
return;
- peer->tcp.tx_in_progress = true;
Sorry, I never answered your question about my concerns in a previous review here.
We can reach ovpn_tcp_send_sock in two different contexts:
- lock_sock (either from ovpn_tcp_sendmsg or ovpn_tcp_tx_work)
- bh_lock_sock (from ovpn_tcp_send_skb, ie "data path")
These are not fully mutually exclusive. lock_sock grabs bh_lock_sock (a spinlock) for a brief period to mark the (sleeping/mutex) lock as taken, and then releases it.
So when bh_lock_sock is held, it's not possible to grab lock_sock. But when lock_sock is taken, it's still possible to grab bh_lock_sock.
The buggy scenario would be:
 (data path encrypt)                    (sendmsg)
  ovpn_tcp_send_skb                      lock_sock
                                           bh_lock_sock + owned=1 + bh_unlock_sock
   bh_lock_sock
   ovpn_tcp_send_sock_skb                ovpn_tcp_send_sock_skb
     !peer->tcp.out_msg.skb                !peer->tcp.out_msg.skb
     peer->tcp.out_msg.skb = ...           peer->tcp.out_msg.skb = ...
     ovpn_tcp_send_sock                    ovpn_tcp_send_sock
       !peer->tcp.tx_in_progress             !peer->tcp.tx_in_progress
       peer->tcp.tx_in_progress = true       peer->tcp.tx_in_progress = true
       // proceed                            // proceed
That's 2 similar races, one on out_msg.skb and one on tx_in_progress. It's a bit unlikely (but not impossible) that we'll have 2 cpus trying to call skb_send_sock_locked at the same time, but if they just overwrite each other's skb/len it's already pretty bad. The end of ovpn_tcp_send_sock might also reset peer->tcp.out_msg.* just as ovpn_tcp_send_skb -> ovpn_tcp_send_sock_skb starts setting it up (peer->tcp.out_msg.skb gets cleared, ovpn_tcp_send_sock_skb proceeds and sets skb+len, then maybe len gets reset to 0 by ovpn_tcp_send_sock).
To avoid this problem, esp_output_tcp_finish (net/ipv4/esp4.c) does:
	bh_lock_sock(sk);
	if (sock_owned_by_user(sk))
		err = espintcp_queue_out(sk, skb);
	else
		err = espintcp_push_skb(sk, skb);
	bh_unlock_sock(sk);
(espintcp_push_skb is roughly equivalent to ovpn_tcp_send_sock_skb)
Mh, I see... so basically while sendmsg is running we should stash all packets and send them out in one go upon lock release (via release_cb).
Will get that done in v12! Thanks!
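A possible shape of that approach, as a sketch only (peer->tcp.out_queue is a hypothetical sk_buff_head used to stash packets while the socket lock is owned; the actual v12 code may differ):

	static void ovpn_tcp_send_skb(struct ovpn_peer *peer, struct sk_buff *skb)
	{
		struct sock *sk = peer->sock->sock->sk;
		u16 len = skb->len;

		*(__be16 *)__skb_push(skb, sizeof(u16)) = htons(len);

		bh_lock_sock(sk);
		if (sock_owned_by_user(sk)) {
			/* sendmsg (or the tx worker) holds lock_sock: stash
			 * the packet and let it be flushed once the lock is
			 * released (e.g. via release_cb or the tx work)
			 */
			skb_queue_tail(&peer->tcp.out_queue, skb);
			schedule_work(&peer->tcp.tx_work);
		} else {
			ovpn_tcp_send_sock_skb(peer, skb);
		}
		bh_unlock_sock(sk);
	}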
- do {
int ret = skb_send_sock_locked(peer->sock->sock->sk, skb,
peer->tcp.out_msg.offset,
peer->tcp.out_msg.len);
if (unlikely(ret < 0)) {
if (ret == -EAGAIN)
goto out;
net_warn_ratelimited("%s: TCP error to peer %u: %d\n",
peer->ovpn->dev->name, peer->id,
ret);
/* in case of TCP error we can't recover the VPN
* stream therefore we abort the connection
*/
ovpn_peer_del(peer,
OVPN_DEL_PEER_REASON_TRANSPORT_ERROR);
break;
}
peer->tcp.out_msg.len -= ret;
peer->tcp.out_msg.offset += ret;
- } while (peer->tcp.out_msg.len > 0);
- if (!peer->tcp.out_msg.len)
dev_sw_netstats_tx_add(peer->ovpn->dev, 1, skb->len);
- kfree_skb(peer->tcp.out_msg.skb);
- peer->tcp.out_msg.skb = NULL;
- peer->tcp.out_msg.len = 0;
- peer->tcp.out_msg.offset = 0;
+out:
- peer->tcp.tx_in_progress = false;
+}
+static void ovpn_tcp_tx_work(struct work_struct *work) +{
- struct ovpn_peer *peer;
- peer = container_of(work, struct ovpn_peer, tcp.tx_work);
- lock_sock(peer->sock->sock->sk);
- ovpn_tcp_send_sock(peer);
- release_sock(peer->sock->sock->sk);
+}
+void ovpn_tcp_send_sock_skb(struct ovpn_peer *peer, struct sk_buff *skb) +{
- if (peer->tcp.out_msg.skb)
return;
That's leaking the skb? (and not counting the drop)
We should not lose this packet... [continues below]
- peer->tcp.out_msg.skb = skb;
- peer->tcp.out_msg.len = skb->len;
- peer->tcp.out_msg.offset = 0;
- ovpn_tcp_send_sock(peer);
+}
+static int ovpn_tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) +{
[...]
- ret = skb_copy_datagram_from_iter(skb, 0, &msg->msg_iter, size);
- if (ret) {
kfree_skb(skb);
net_err_ratelimited("%s: skb copy from iter failed: %d\n",
sock->peer->ovpn->dev->name, ret);
goto unlock;
- }
- ovpn_tcp_send_sock_skb(sock->peer, skb);
If we didn't send the packet (because one was already queued/in progress), we should either stash it, or tell userspace that it wasn't sent and it should retry later.
In the part you omitted we have the following (while holding lock_sock):
	if (peer->tcp.out_msg.skb) {
		ret = -EAGAIN;
		goto unlock;
	}
Therefore ovpn_tcp_send_sock_skb() should never end up dropping the packet, since this condition has already been evaluated as false.
But if I understood all your notes correctly, the data path may still overlap here, because it never holds the lock_sock and can thus fill .skb after ovpn_tcp_sendmsg() has checked it. Am I right?
However, this race should not be possible anymore once I implement the sock_owned_by_user() check you suggested above, because ovpn_tcp_sendmsg() will then run with owned=1, which prevents the data path from attempting to send more packets.
Do you agree?
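Even with the owned=1 check, a defensive shape for ovpn_tcp_send_sock_skb() that accounts the drop instead of silently leaking the skb could look like this (sketch only):

	void ovpn_tcp_send_sock_skb(struct ovpn_peer *peer, struct sk_buff *skb)
	{
		if (peer->tcp.out_msg.skb) {
			/* a packet is already pending: count and drop instead
			 * of leaking the new skb
			 */
			dev_core_stats_tx_dropped_inc(peer->ovpn->dev);
			kfree_skb(skb);
			return;
		}

		peer->tcp.out_msg.skb = skb;
		peer->tcp.out_msg.len = skb->len;
		peer->tcp.out_msg.offset = 0;

		ovpn_tcp_send_sock(peer);
	}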
Thanks a lot for pointing this out! The inner mechanisms of TCP socks are still a bit obscure to me.
Regards,
Hi Antonio,
the question was addressed to Sabrina, but since I've already touched this topic in another patch, let me put my 2c here.
On 16.11.2024 02:33, Antonio Quartulli wrote:
On 31/10/2024 16:25, Sabrina Dubroca wrote:
2024-10-29, 11:47:25 +0100, Antonio Quartulli wrote:
+static void ovpn_socket_release_work(struct work_struct *work) +{ + struct ovpn_socket *sock = container_of(work, struct ovpn_socket, work);
+ ovpn_socket_detach(sock->sock); + kfree_rcu(sock, rcu); +}
+static void ovpn_socket_schedule_release(struct ovpn_socket *sock) +{ + INIT_WORK(&sock->work, ovpn_socket_release_work); + schedule_work(&sock->work);
How does module unloading know that it has to wait for this work to complete? Will ovpn_cleanup get stuck until some refcount gets released by this work?
No, we have no such mechanism. Any idea how other modules handle this?
Actually this makes me wonder how module unloading coordinates with the code being executed. Unload may happen at any time - how do we prevent killing the code in the middle of something (regardless of scheduled workers)?
Good question! There is a workqueue flushing API intended for synchronizing with work execution.
Here, the system workqueue was used, so technically a flush_scheduled_work() call somewhere in the module_exit handler would be enough.
On the other hand, flushing the system workqueue is considered bad practice. It's recommended to use a local workqueue instead. You can find a good example of switching from the system workqueue to a local one in cc271ab86606 ("wwan_hwsim: Avoid flush_scheduled_work() usage").
And if the workqueue is definitely empty at the time of module unloading, e.g. due to flushing on netdev removal, there is no requirement to flush it again.
-- Sergey
On 26/11/2024 02:05, Sergey Ryazanov wrote:
Hi Antonio,
the question was addressed to Sabrina, but since I've already touched this topic in another patch, let me put my 2c here.
On 16.11.2024 02:33, Antonio Quartulli wrote:
On 31/10/2024 16:25, Sabrina Dubroca wrote:
2024-10-29, 11:47:25 +0100, Antonio Quartulli wrote:
+static void ovpn_socket_release_work(struct work_struct *work) +{ + struct ovpn_socket *sock = container_of(work, struct ovpn_socket, work);
+ ovpn_socket_detach(sock->sock); + kfree_rcu(sock, rcu); +}
+static void ovpn_socket_schedule_release(struct ovpn_socket *sock) +{ + INIT_WORK(&sock->work, ovpn_socket_release_work); + schedule_work(&sock->work);
How does module unloading know that it has to wait for this work to complete? Will ovpn_cleanup get stuck until some refcount gets released by this work?
No, we have no such mechanism. Any idea how other modules handle this?
Actually this makes me wonder how module unloading coordinates with the code being executed. Unload may happen at any time - how do we prevent killing the code in the middle of something (regardless of scheduled workers)?
Good question! There is a workqueue flushing API intended for synchronizing with work execution.
Here, the system workqueue was used, so technically a flush_scheduled_work() call somewhere in the module_exit handler would be enough.
On the other hand, flushing the system workqueue is considered bad practice. It's recommended to use a local workqueue instead. You can find a good example of switching from the system workqueue to a local one in cc271ab86606 ("wwan_hwsim: Avoid flush_scheduled_work() usage").
And if the workqueue is definitely empty at the time of module unloading, e.g. due to flushing on netdev removal, there is no requirement to flush it again.
ACK. I wanted to avoid using a local workqueue, but if we have pending work that needs flushing I indeed see no other way.
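For reference, a sketch of that approach with a driver-local workqueue (names such as ovpn_wq are hypothetical):

	static struct workqueue_struct *ovpn_wq;

	static void ovpn_socket_schedule_release(struct ovpn_socket *sock)
	{
		INIT_WORK(&sock->work, ovpn_socket_release_work);
		queue_work(ovpn_wq, &sock->work);
	}

	static int __init ovpn_init(void)
	{
		ovpn_wq = alloc_workqueue("ovpn", 0, 0);
		if (!ovpn_wq)
			return -ENOMEM;
		/* ... existing init ... */
		return 0;
	}

	static void __exit ovpn_cleanup(void)
	{
		/* ... existing cleanup, which deletes the peers and thereby
		 * queues the socket release works ...
		 */
		destroy_workqueue(ovpn_wq); /* drains pending works first */
	}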
Regards,
-- Sergey
With this change an ovpn instance will be able to stay connected to multiple remote endpoints.
This functionality is strictly required when running ovpn on an OpenVPN server.
Signed-off-by: Antonio Quartulli antonio@openvpn.net --- drivers/net/ovpn/main.c | 55 +++++++++++++- drivers/net/ovpn/ovpnstruct.h | 19 +++++ drivers/net/ovpn/peer.c | 166 ++++++++++++++++++++++++++++++++++++++++-- drivers/net/ovpn/peer.h | 9 +++ 4 files changed, 243 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index 0488e395eb27d3dba1efc8ff39c023e0ac4a38dd..c7453127ab640d7268c1ce919a87cc5419fac9ee 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -30,6 +30,9 @@
static void ovpn_struct_free(struct net_device *net) { + struct ovpn_struct *ovpn = netdev_priv(net); + + kfree(ovpn->peers); }
static int ovpn_net_init(struct net_device *dev) @@ -133,12 +136,52 @@ static void ovpn_setup(struct net_device *dev) SET_NETDEV_DEVTYPE(dev, &ovpn_type); }
+static int ovpn_mp_alloc(struct ovpn_struct *ovpn) +{ + struct in_device *dev_v4; + int i; + + if (ovpn->mode != OVPN_MODE_MP) + return 0; + + dev_v4 = __in_dev_get_rtnl(ovpn->dev); + if (dev_v4) { + /* disable redirects as Linux gets confused by ovpn + * handling same-LAN routing. + * This happens because a multipeer interface is used as + * relay point between hosts in the same subnet, while + * in a classic LAN this would not be needed because the + * two hosts would be able to talk directly. + */ + IN_DEV_CONF_SET(dev_v4, SEND_REDIRECTS, false); + IPV4_DEVCONF_ALL(dev_net(ovpn->dev), SEND_REDIRECTS) = false; + } + + /* the peer container is fairly large, therefore we allocate it only in + * MP mode + */ + ovpn->peers = kzalloc(sizeof(*ovpn->peers), GFP_KERNEL); + if (!ovpn->peers) + return -ENOMEM; + + spin_lock_init(&ovpn->peers->lock); + + for (i = 0; i < ARRAY_SIZE(ovpn->peers->by_id); i++) { + INIT_HLIST_HEAD(&ovpn->peers->by_id[i]); + INIT_HLIST_NULLS_HEAD(&ovpn->peers->by_vpn_addr[i], i); + INIT_HLIST_NULLS_HEAD(&ovpn->peers->by_transp_addr[i], i); + } + + return 0; +} + static int ovpn_newlink(struct net *src_net, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { struct ovpn_struct *ovpn = netdev_priv(dev); enum ovpn_mode mode = OVPN_MODE_P2P; + int err;
if (data && data[IFLA_OVPN_MODE]) { mode = nla_get_u8(data[IFLA_OVPN_MODE]); @@ -149,6 +192,10 @@ static int ovpn_newlink(struct net *src_net, struct net_device *dev, ovpn->mode = mode; spin_lock_init(&ovpn->lock);
+ err = ovpn_mp_alloc(ovpn); + if (err < 0) + return err; + /* turn carrier explicitly off after registration, this way state is * clearly defined */ @@ -197,8 +244,14 @@ static int ovpn_netdev_notifier_call(struct notifier_block *nb, netif_carrier_off(dev); ovpn->registered = false;
- if (ovpn->mode == OVPN_MODE_P2P) + switch (ovpn->mode) { + case OVPN_MODE_P2P: ovpn_peer_release_p2p(ovpn); + break; + case OVPN_MODE_MP: + ovpn_peers_free(ovpn); + break; + } break; case NETDEV_POST_INIT: case NETDEV_GOING_DOWN: diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h index 4a48fc048890ab1cda78bc104fe3034b4a49d226..12ed5e22c2108c9f143d1984048eb40c887cac63 100644 --- a/drivers/net/ovpn/ovpnstruct.h +++ b/drivers/net/ovpn/ovpnstruct.h @@ -15,6 +15,23 @@ #include <uapi/linux/if_link.h> #include <uapi/linux/ovpn.h>
+/** + * struct ovpn_peer_collection - container of peers for MultiPeer mode + * @by_id: table of peers index by ID + * @by_vpn_addr: table of peers indexed by VPN IP address (items can be + * rehashed on the fly due to peer IP change) + * @by_transp_addr: table of peers indexed by transport address (items can be + * rehashed on the fly due to peer IP change) + * @lock: protects writes to peer tables + */ +struct ovpn_peer_collection { + DECLARE_HASHTABLE(by_id, 12); + struct hlist_nulls_head by_vpn_addr[1 << 12]; + struct hlist_nulls_head by_transp_addr[1 << 12]; + + spinlock_t lock; /* protects writes to peer tables */ +}; + /** * struct ovpn_struct - per ovpn interface state * @dev: the actual netdev representing the tunnel @@ -22,6 +39,7 @@ * @registered: whether dev is still registered with netdev or not * @mode: device operation mode (i.e. p2p, mp, ..) * @lock: protect this object + * @peers: data structures holding multi-peer references * @peer: in P2P mode, this is the only remote peer * @dev_list: entry for the module wide device list * @gro_cells: pointer to the Generic Receive Offload cell @@ -32,6 +50,7 @@ struct ovpn_struct { bool registered; enum ovpn_mode mode; spinlock_t lock; /* protect writing to the ovpn_struct object */ + struct ovpn_peer_collection *peers; struct ovpn_peer __rcu *peer; struct list_head dev_list; struct gro_cells gro_cells; diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c index 5025bfb759d6a5f31e3f2ec094fe561fbdb9f451..73ef509faab9701192a45ffe78a46dbbbeab01c2 100644 --- a/drivers/net/ovpn/peer.c +++ b/drivers/net/ovpn/peer.c @@ -9,6 +9,7 @@
#include <linux/skbuff.h> #include <linux/list.h> +#include <linux/hashtable.h>
#include "ovpnstruct.h" #include "bind.h" @@ -64,17 +65,16 @@ struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id) return peer; }
-/** - * ovpn_peer_release - release peer private members - * @peer: the peer to release - */ static void ovpn_peer_release(struct ovpn_peer *peer) { if (peer->sock) ovpn_socket_put(peer->sock);
ovpn_crypto_state_release(&peer->crypto); + spin_lock_bh(&peer->lock); ovpn_bind_reset(peer, NULL); + spin_unlock_bh(&peer->lock); + dst_cache_destroy(&peer->dst_cache); netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker); kfree_rcu(peer, rcu); @@ -309,6 +309,89 @@ bool ovpn_peer_check_by_src(struct ovpn_struct *ovpn, struct sk_buff *skb, return match; }
+#define ovpn_get_hash_head(_tbl, _key, _key_len) ({ \ + typeof(_tbl) *__tbl = &(_tbl); \ + (&(*__tbl)[jhash(_key, _key_len, 0) % HASH_SIZE(*__tbl)]); }) \ + +/** + * ovpn_peer_add_mp - add peer to related tables in a MP instance + * @ovpn: the instance to add the peer to + * @peer: the peer to add + * + * Return: 0 on success or a negative error code otherwise + */ +static int ovpn_peer_add_mp(struct ovpn_struct *ovpn, struct ovpn_peer *peer) +{ + struct sockaddr_storage sa = { 0 }; + struct hlist_nulls_head *nhead; + struct sockaddr_in6 *sa6; + struct sockaddr_in *sa4; + struct ovpn_bind *bind; + struct ovpn_peer *tmp; + size_t salen; + int ret = 0; + + spin_lock_bh(&ovpn->peers->lock); + /* do not add duplicates */ + tmp = ovpn_peer_get_by_id(ovpn, peer->id); + if (tmp) { + ovpn_peer_put(tmp); + ret = -EEXIST; + goto out; + } + + bind = rcu_dereference_protected(peer->bind, true); + /* peers connected via TCP have bind == NULL */ + if (bind) { + switch (bind->remote.in4.sin_family) { + case AF_INET: + sa4 = (struct sockaddr_in *)&sa; + + sa4->sin_family = AF_INET; + sa4->sin_addr.s_addr = bind->remote.in4.sin_addr.s_addr; + sa4->sin_port = bind->remote.in4.sin_port; + salen = sizeof(*sa4); + break; + case AF_INET6: + sa6 = (struct sockaddr_in6 *)&sa; + + sa6->sin6_family = AF_INET6; + sa6->sin6_addr = bind->remote.in6.sin6_addr; + sa6->sin6_port = bind->remote.in6.sin6_port; + salen = sizeof(*sa6); + break; + default: + ret = -EPROTONOSUPPORT; + goto out; + } + + nhead = ovpn_get_hash_head(ovpn->peers->by_transp_addr, &sa, + salen); + hlist_nulls_add_head_rcu(&peer->hash_entry_transp_addr, nhead); + } + + hlist_add_head_rcu(&peer->hash_entry_id, + ovpn_get_hash_head(ovpn->peers->by_id, &peer->id, + sizeof(peer->id))); + + if (peer->vpn_addrs.ipv4.s_addr != htonl(INADDR_ANY)) { + nhead = ovpn_get_hash_head(ovpn->peers->by_vpn_addr, + &peer->vpn_addrs.ipv4, + sizeof(peer->vpn_addrs.ipv4)); + hlist_nulls_add_head_rcu(&peer->hash_entry_addr4, nhead); + } + + if (!ipv6_addr_any(&peer->vpn_addrs.ipv6)) { + nhead = ovpn_get_hash_head(ovpn->peers->by_vpn_addr, + &peer->vpn_addrs.ipv6, + sizeof(peer->vpn_addrs.ipv6)); + hlist_nulls_add_head_rcu(&peer->hash_entry_addr6, nhead); + } +out: + spin_unlock_bh(&ovpn->peers->lock); + return ret; +} + /** * ovpn_peer_add_p2p - add peer to related tables in a P2P instance * @ovpn: the instance to add the peer to @@ -349,6 +432,8 @@ static int ovpn_peer_add_p2p(struct ovpn_struct *ovpn, struct ovpn_peer *peer) int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer) { switch (ovpn->mode) { + case OVPN_MODE_MP: + return ovpn_peer_add_mp(ovpn, peer); case OVPN_MODE_P2P: return ovpn_peer_add_p2p(ovpn, peer); default: @@ -356,6 +441,51 @@ int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer) } }
+/** + * ovpn_peer_unhash - remove peer reference from all hashtables + * @peer: the peer to remove + * @reason: the delete reason to attach to the peer + */ +static void ovpn_peer_unhash(struct ovpn_peer *peer, + enum ovpn_del_peer_reason reason) + __must_hold(&ovpn->peers->lock) +{ + hlist_del_init_rcu(&peer->hash_entry_id); + + hlist_nulls_del_init_rcu(&peer->hash_entry_addr4); + hlist_nulls_del_init_rcu(&peer->hash_entry_addr6); + hlist_nulls_del_init_rcu(&peer->hash_entry_transp_addr); + + ovpn_peer_put(peer); + peer->delete_reason = reason; +} + +/** + * ovpn_peer_del_mp - delete peer from related tables in a MP instance + * @peer: the peer to delete + * @reason: reason why the peer was deleted (sent to userspace) + * + * Return: 0 on success or a negative error code otherwise + */ +static int ovpn_peer_del_mp(struct ovpn_peer *peer, + enum ovpn_del_peer_reason reason) + __must_hold(&peer->ovpn->peers->lock) +{ + struct ovpn_peer *tmp; + int ret = -ENOENT; + + tmp = ovpn_peer_get_by_id(peer->ovpn, peer->id); + if (tmp == peer) { + ovpn_peer_unhash(peer, reason); + ret = 0; + } + + if (tmp) + ovpn_peer_put(tmp); + + return ret; +} + /** * ovpn_peer_del_p2p - delete peer from related tables in a P2P instance * @peer: the peer to delete @@ -411,10 +541,36 @@ void ovpn_peer_release_p2p(struct ovpn_struct *ovpn) */ int ovpn_peer_del(struct ovpn_peer *peer, enum ovpn_del_peer_reason reason) { + int ret; + switch (peer->ovpn->mode) { + case OVPN_MODE_MP: + spin_lock_bh(&peer->ovpn->peers->lock); + ret = ovpn_peer_del_mp(peer, reason); + spin_unlock_bh(&peer->ovpn->peers->lock); + return ret; case OVPN_MODE_P2P: - return ovpn_peer_del_p2p(peer, reason); + spin_lock_bh(&peer->ovpn->lock); + ret = ovpn_peer_del_p2p(peer, reason); + spin_unlock_bh(&peer->ovpn->lock); + return ret; default: return -EOPNOTSUPP; } } + +/** + * ovpn_peers_free - free all peers in the instance + * @ovpn: the instance whose peers should be released + */ +void ovpn_peers_free(struct ovpn_struct *ovpn) +{ + struct hlist_node *tmp; + struct ovpn_peer *peer; + int bkt; + + spin_lock_bh(&ovpn->peers->lock); + hash_for_each_safe(ovpn->peers->by_id, bkt, tmp, peer, hash_entry_id) + ovpn_peer_unhash(peer, OVPN_DEL_PEER_REASON_TEARDOWN); + spin_unlock_bh(&ovpn->peers->lock); +} diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h index 2b7fa9510e362ef3646157bb0d361bab19ddaa99..942b90c84a0fb9e6fbb96f6df7f7842a9f738caf 100644 --- a/drivers/net/ovpn/peer.h +++ b/drivers/net/ovpn/peer.h @@ -23,6 +23,10 @@ * @vpn_addrs: IP addresses assigned over the tunnel * @vpn_addrs.ipv4: IPv4 assigned to peer on the tunnel * @vpn_addrs.ipv6: IPv6 assigned to peer on the tunnel + * @hash_entry_id: entry in the peer ID hashtable + * @hash_entry_addr4: entry in the peer IPv4 hashtable + * @hash_entry_addr6: entry in the peer IPv6 hashtable + * @hash_entry_transp_addr: entry in the peer transport address hashtable * @sock: the socket being used to talk to this peer * @tcp: keeps track of TCP specific state * @tcp.strp: stream parser context (TCP only) @@ -55,6 +59,10 @@ struct ovpn_peer { struct in_addr ipv4; struct in6_addr ipv6; } vpn_addrs; + struct hlist_node hash_entry_id; + struct hlist_nulls_node hash_entry_addr4; + struct hlist_nulls_node hash_entry_addr6; + struct hlist_nulls_node hash_entry_transp_addr; struct ovpn_socket *sock;
/* state of the TCP reading. Needed to keep track of how much of a @@ -119,6 +127,7 @@ struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id); int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer); int ovpn_peer_del(struct ovpn_peer *peer, enum ovpn_del_peer_reason reason); void ovpn_peer_release_p2p(struct ovpn_struct *ovpn); +void ovpn_peers_free(struct ovpn_struct *ovpn);
struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn, struct sk_buff *skb);
In a multi-peer scenario there are a number of situations when a specific peer needs to be looked up.
We may want to look up a peer by:
1. its ID
2. its VPN destination IP
3. its transport IP/port pair
For each of the above, there is a specific routing table referencing all peers for fast look up.
Case 2. is a bit special in the sense that an outgoing packet may not be sent to the peer VPN IP directly, but rather to a network behind it. For this reason we first perform a nexthop lookup in the system routing table and then we use the retrieved nexthop as peer search key.
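Condensed, the case-2 flow implemented below boils down to something like this (IPv4 shown, sketch only; P2P handling and peer refcounting omitted):

	/* resolve the nexthop for the outgoing packet, then use it as key
	 * in the VPN-address hashtable
	 */
	addr4 = ovpn_nexthop_from_skb4(skb);
	peer = ovpn_peer_get_by_vpn_addr4(ovpn, addr4);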
Signed-off-by: Antonio Quartulli antonio@openvpn.net --- drivers/net/ovpn/peer.c | 272 ++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 264 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c index 73ef509faab9701192a45ffe78a46dbbbeab01c2..c7dc9032c2b55fd42befc1f3e7a0eca893a96576 100644 --- a/drivers/net/ovpn/peer.c +++ b/drivers/net/ovpn/peer.c @@ -10,6 +10,7 @@ #include <linux/skbuff.h> #include <linux/list.h> #include <linux/hashtable.h> +#include <net/ip6_route.h>
#include "ovpnstruct.h" #include "bind.h" @@ -125,6 +126,94 @@ static bool ovpn_peer_skb_to_sockaddr(struct sk_buff *skb, return true; }
+/** + * ovpn_nexthop_from_skb4 - retrieve IPv4 nexthop for outgoing skb + * @skb: the outgoing packet + * + * Return: the IPv4 of the nexthop + */ +static __be32 ovpn_nexthop_from_skb4(struct sk_buff *skb) +{ + const struct rtable *rt = skb_rtable(skb); + + if (rt && rt->rt_uses_gateway) + return rt->rt_gw4; + + return ip_hdr(skb)->daddr; +} + +/** + * ovpn_nexthop_from_skb6 - retrieve IPv6 nexthop for outgoing skb + * @skb: the outgoing packet + * + * Return: the IPv6 of the nexthop + */ +static struct in6_addr ovpn_nexthop_from_skb6(struct sk_buff *skb) +{ + const struct rt6_info *rt = skb_rt6_info(skb); + + if (!rt || !(rt->rt6i_flags & RTF_GATEWAY)) + return ipv6_hdr(skb)->daddr; + + return rt->rt6i_gateway; +} + +#define ovpn_get_hash_head(_tbl, _key, _key_len) ({ \ + typeof(_tbl) *__tbl = &(_tbl); \ + (&(*__tbl)[jhash(_key, _key_len, 0) % HASH_SIZE(*__tbl)]); }) \ + +/** + * ovpn_peer_get_by_vpn_addr4 - retrieve peer by its VPN IPv4 address + * @ovpn: the openvpn instance to search + * @addr: VPN IPv4 to use as search key + * + * Refcounter is not increased for the returned peer. + * + * Return: the peer if found or NULL otherwise + */ +static struct ovpn_peer *ovpn_peer_get_by_vpn_addr4(struct ovpn_struct *ovpn, + __be32 addr) +{ + struct hlist_nulls_head *nhead; + struct hlist_nulls_node *ntmp; + struct ovpn_peer *tmp; + + nhead = ovpn_get_hash_head(ovpn->peers->by_vpn_addr, &addr, + sizeof(addr)); + + hlist_nulls_for_each_entry_rcu(tmp, ntmp, nhead, hash_entry_addr4) + if (addr == tmp->vpn_addrs.ipv4.s_addr) + return tmp; + + return NULL; +} + +/** + * ovpn_peer_get_by_vpn_addr6 - retrieve peer by its VPN IPv6 address + * @ovpn: the openvpn instance to search + * @addr: VPN IPv6 to use as search key + * + * Refcounter is not increased for the returned peer. + * + * Return: the peer if found or NULL otherwise + */ +static struct ovpn_peer *ovpn_peer_get_by_vpn_addr6(struct ovpn_struct *ovpn, + struct in6_addr *addr) +{ + struct hlist_nulls_head *nhead; + struct hlist_nulls_node *ntmp; + struct ovpn_peer *tmp; + + nhead = ovpn_get_hash_head(ovpn->peers->by_vpn_addr, addr, + sizeof(*addr)); + + hlist_nulls_for_each_entry_rcu(tmp, ntmp, nhead, hash_entry_addr6) + if (ipv6_addr_equal(addr, &tmp->vpn_addrs.ipv6)) + return tmp; + + return NULL; +} + /** * ovpn_peer_transp_match - check if sockaddr and peer binding match * @peer: the peer to get the binding from @@ -202,14 +291,44 @@ ovpn_peer_get_by_transp_addr_p2p(struct ovpn_struct *ovpn, struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn, struct sk_buff *skb) { - struct ovpn_peer *peer = NULL; + struct ovpn_peer *tmp, *peer = NULL; struct sockaddr_storage ss = { 0 }; + struct hlist_nulls_head *nhead; + struct hlist_nulls_node *ntmp; + size_t sa_len;
if (unlikely(!ovpn_peer_skb_to_sockaddr(skb, &ss))) return NULL;
if (ovpn->mode == OVPN_MODE_P2P) - peer = ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss); + return ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss); + + switch (ss.ss_family) { + case AF_INET: + sa_len = sizeof(struct sockaddr_in); + break; + case AF_INET6: + sa_len = sizeof(struct sockaddr_in6); + break; + default: + return NULL; + } + + nhead = ovpn_get_hash_head(ovpn->peers->by_transp_addr, &ss, sa_len); + + rcu_read_lock(); + hlist_nulls_for_each_entry_rcu(tmp, ntmp, nhead, + hash_entry_transp_addr) { + if (!ovpn_peer_transp_match(tmp, &ss)) + continue; + + if (!ovpn_peer_hold(tmp)) + continue; + + peer = tmp; + break; + } + rcu_read_unlock();
return peer; } @@ -244,10 +363,27 @@ static struct ovpn_peer *ovpn_peer_get_by_id_p2p(struct ovpn_struct *ovpn, */ struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id) { - struct ovpn_peer *peer = NULL; + struct ovpn_peer *tmp, *peer = NULL; + struct hlist_head *head;
if (ovpn->mode == OVPN_MODE_P2P) - peer = ovpn_peer_get_by_id_p2p(ovpn, peer_id); + return ovpn_peer_get_by_id_p2p(ovpn, peer_id); + + head = ovpn_get_hash_head(ovpn->peers->by_id, &peer_id, + sizeof(peer_id)); + + rcu_read_lock(); + hlist_for_each_entry_rcu(tmp, head, hash_entry_id) { + if (tmp->id != peer_id) + continue; + + if (!ovpn_peer_hold(tmp)) + continue; + + peer = tmp; + break; + } + rcu_read_unlock();
return peer; } @@ -269,6 +405,8 @@ struct ovpn_peer *ovpn_peer_get_by_dst(struct ovpn_struct *ovpn, struct sk_buff *skb) { struct ovpn_peer *peer = NULL; + struct in6_addr addr6; + __be32 addr4;
/* in P2P mode, no matter the destination, packets are always sent to * the single peer listening on the other side @@ -279,11 +417,109 @@ struct ovpn_peer *ovpn_peer_get_by_dst(struct ovpn_struct *ovpn, if (unlikely(peer && !ovpn_peer_hold(peer))) peer = NULL; rcu_read_unlock(); + return peer; }
+ rcu_read_lock(); + switch (skb_protocol_to_family(skb)) { + case AF_INET: + addr4 = ovpn_nexthop_from_skb4(skb); + peer = ovpn_peer_get_by_vpn_addr4(ovpn, addr4); + break; + case AF_INET6: + addr6 = ovpn_nexthop_from_skb6(skb); + peer = ovpn_peer_get_by_vpn_addr6(ovpn, &addr6); + break; + } + + if (unlikely(peer && !ovpn_peer_hold(peer))) + peer = NULL; + rcu_read_unlock(); + return peer; }
+/** + * ovpn_nexthop_from_rt4 - look up the IPv4 nexthop for the given destination + * @ovpn: the private data representing the current VPN session + * @dest: the destination to be looked up + * + * Looks up in the IPv4 system routing table the IP of the nexthop to be used + * to reach the destination passed as argument. If no nexthop can be found, the + * destination itself is returned as it probably has to be used as nexthop. + * + * Return: the IP of the next hop if found or dest itself otherwise + */ +static __be32 ovpn_nexthop_from_rt4(struct ovpn_struct *ovpn, __be32 dest) +{ + struct rtable *rt; + struct flowi4 fl = { + .daddr = dest + }; + + rt = ip_route_output_flow(dev_net(ovpn->dev), &fl, NULL); + if (IS_ERR(rt)) { + net_dbg_ratelimited("%s: no route to host %pI4\n", __func__, + &dest); + /* if we end up here this packet is probably going to be + * thrown away later + */ + return dest; + } + + if (!rt->rt_uses_gateway) + goto out; + + dest = rt->rt_gw4; +out: + ip_rt_put(rt); + return dest; +} + +/** + * ovpn_nexthop_from_rt6 - look up the IPv6 nexthop for the given destination + * @ovpn: the private data representing the current VPN session + * @dest: the destination to be looked up + * + * Looks up in the IPv6 system routing table the IP of the nexthop to be used + * to reach the destination passed as argument. If no nexthop can be found, the + * destination itself is returned as it probably has to be used as nexthop. + * + * Return: the IP of the next hop if found or dest itself otherwise + */ +static struct in6_addr ovpn_nexthop_from_rt6(struct ovpn_struct *ovpn, + struct in6_addr dest) +{ +#if IS_ENABLED(CONFIG_IPV6) + struct dst_entry *entry; + struct rt6_info *rt; + struct flowi6 fl = { + .daddr = dest, + }; + + entry = ipv6_stub->ipv6_dst_lookup_flow(dev_net(ovpn->dev), NULL, &fl, + NULL); + if (IS_ERR(entry)) { + net_dbg_ratelimited("%s: no route to host %pI6c\n", __func__, + &dest); + /* if we end up here this packet is probably going to be + * thrown away later + */ + return dest; + } + + rt = dst_rt6_info(entry); + + if (!(rt->rt6i_flags & RTF_GATEWAY)) + goto out; + + dest = rt->rt6i_gateway; +out: + dst_release((struct dst_entry *)rt); +#endif + return dest; +} + /** * ovpn_peer_check_by_src - check that skb source is routed via peer * @ovpn: the openvpn instance to search @@ -296,6 +532,8 @@ bool ovpn_peer_check_by_src(struct ovpn_struct *ovpn, struct sk_buff *skb, struct ovpn_peer *peer) { bool match = false; + struct in6_addr addr6; + __be32 addr4;
if (ovpn->mode == OVPN_MODE_P2P) { /* in P2P mode, no matter the destination, packets are always @@ -304,15 +542,33 @@ bool ovpn_peer_check_by_src(struct ovpn_struct *ovpn, struct sk_buff *skb, rcu_read_lock(); match = (peer == rcu_dereference(ovpn->peer)); rcu_read_unlock(); + return match; + } + + /* This function performs a reverse path check, therefore we now + * lookup the nexthop we would use if we wanted to route a packet + * to the source IP. If the nexthop matches the sender we know the + * latter is valid and we allow the packet to come in + */ + + switch (skb_protocol_to_family(skb)) { + case AF_INET: + addr4 = ovpn_nexthop_from_rt4(ovpn, ip_hdr(skb)->saddr); + rcu_read_lock(); + match = (peer == ovpn_peer_get_by_vpn_addr4(ovpn, addr4)); + rcu_read_unlock(); + break; + case AF_INET6: + addr6 = ovpn_nexthop_from_rt6(ovpn, ipv6_hdr(skb)->saddr); + rcu_read_lock(); + match = (peer == ovpn_peer_get_by_vpn_addr6(ovpn, &addr6)); + rcu_read_unlock(); + break; }
return match; }
-#define ovpn_get_hash_head(_tbl, _key, _key_len) ({ \ - typeof(_tbl) *__tbl = &(_tbl); \ - (&(*__tbl)[jhash(_key, _key_len, 0) % HASH_SIZE(*__tbl)]); }) \ - /** * ovpn_peer_add_mp - add peer to related tables in a MP instance * @ovpn: the instance to add the peer to
2024-10-29, 11:47:27 +0100, Antonio Quartulli wrote:
 struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn,
 						struct sk_buff *skb)
 {
-	struct ovpn_peer *peer = NULL;
+	struct ovpn_peer *tmp, *peer = NULL;
 	struct sockaddr_storage ss = { 0 };
+	struct hlist_nulls_head *nhead;
+	struct hlist_nulls_node *ntmp;
+	size_t sa_len;

 	if (unlikely(!ovpn_peer_skb_to_sockaddr(skb, &ss)))
 		return NULL;

 	if (ovpn->mode == OVPN_MODE_P2P)
-		peer = ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss);
+		return ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss);
+
+	switch (ss.ss_family) {
+	case AF_INET:
+		sa_len = sizeof(struct sockaddr_in);
+		break;
+	case AF_INET6:
+		sa_len = sizeof(struct sockaddr_in6);
+		break;
+	default:
+		return NULL;
+	}
You could get rid of that switch by having ovpn_peer_skb_to_sockaddr also set sa_len (or return 0/the size).
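In the caller that would collapse to something like this (a sketch only, assuming the helper is changed to return the filled sockaddr length, and 0 on failure):

	sa_len = ovpn_peer_skb_to_sockaddr(skb, &ss);
	if (unlikely(!sa_len))
		return NULL;

	if (ovpn->mode == OVPN_MODE_P2P)
		return ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss);

	nhead = ovpn_get_hash_head(ovpn->peers->by_transp_addr, &ss, sa_len);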
+	nhead = ovpn_get_hash_head(ovpn->peers->by_transp_addr, &ss, sa_len);
+
+	rcu_read_lock();
+	hlist_nulls_for_each_entry_rcu(tmp, ntmp, nhead,
+				       hash_entry_transp_addr) {
I think that's missing the retry in case we ended up in the wrong bucket due to a peer rehash?
+		if (!ovpn_peer_transp_match(tmp, &ss))
+			continue;
+
+		if (!ovpn_peer_hold(tmp))
+			continue;
+
+		peer = tmp;
+		break;
+	}
+	rcu_read_unlock();

 	return peer;
 }
On 04.11.2024 13:26, Sabrina Dubroca wrote:
2024-10-29, 11:47:27 +0100, Antonio Quartulli wrote:
 struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn,
 						struct sk_buff *skb)
 {
-	struct ovpn_peer *peer = NULL;
+	struct ovpn_peer *tmp, *peer = NULL;
 	struct sockaddr_storage ss = { 0 };
+	struct hlist_nulls_head *nhead;
+	struct hlist_nulls_node *ntmp;
+	size_t sa_len;

 	if (unlikely(!ovpn_peer_skb_to_sockaddr(skb, &ss)))
 		return NULL;

 	if (ovpn->mode == OVPN_MODE_P2P)
-		peer = ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss);
+		return ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss);
+
+	switch (ss.ss_family) {
+	case AF_INET:
+		sa_len = sizeof(struct sockaddr_in);
+		break;
+	case AF_INET6:
+		sa_len = sizeof(struct sockaddr_in6);
+		break;
+	default:
+		return NULL;
+	}
You could get rid of that switch by having ovpn_peer_skb_to_sockaddr also set sa_len (or return 0/the size).
+	nhead = ovpn_get_hash_head(ovpn->peers->by_transp_addr, &ss, sa_len);
+
+	rcu_read_lock();
+	hlist_nulls_for_each_entry_rcu(tmp, ntmp, nhead,
+				       hash_entry_transp_addr) {
I think that's missing the retry in case we ended up in the wrong bucket due to a peer rehash?
Nice catch! I am also wondering why the 'nulls' variant was selected, given that there is no nulls-value check and lookup restart here.
Since we started discussing the list API: why is the 'nulls' variant used for the address hash tables while the normal variant is used for the peer-id lookup?
+		if (!ovpn_peer_transp_match(tmp, &ss))
+			continue;
+
+		if (!ovpn_peer_hold(tmp))
+			continue;
+
+		peer = tmp;
+		break;
+	}
+	rcu_read_unlock();

 	return peer;
 }
-- Sergey
On 12/11/2024 02:18, Sergey Ryazanov wrote:
On 04.11.2024 13:26, Sabrina Dubroca wrote:
2024-10-29, 11:47:27 +0100, Antonio Quartulli wrote:
struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn, struct sk_buff *skb) { - struct ovpn_peer *peer = NULL; + struct ovpn_peer *tmp, *peer = NULL; struct sockaddr_storage ss = { 0 }; + struct hlist_nulls_head *nhead; + struct hlist_nulls_node *ntmp; + size_t sa_len; if (unlikely(!ovpn_peer_skb_to_sockaddr(skb, &ss))) return NULL; if (ovpn->mode == OVPN_MODE_P2P) - peer = ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss); + return ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss);
+ switch (ss.ss_family) { + case AF_INET: + sa_len = sizeof(struct sockaddr_in); + break; + case AF_INET6: + sa_len = sizeof(struct sockaddr_in6); + break; + default: + return NULL; + }
You could get rid of that switch by having ovpn_peer_skb_to_sockaddr also set sa_len (or return 0/the size).
Yeah, makes sense. Thanks!
+ nhead = ovpn_get_hash_head(ovpn->peers->by_transp_addr, &ss, sa_len);
+ rcu_read_lock(); + hlist_nulls_for_each_entry_rcu(tmp, ntmp, nhead, + hash_entry_transp_addr) {
I think that's missing the retry in case we ended up in the wrong bucket due to a peer rehash?
Oh, for some reason I convinced myself that this is handled behind the scene, but indeed the lookup must be explicitly restarted.
will fix it, thanks for pointing this out!
Nice catch! I am also wondering why the 'nulls' variant was selected, given that there is no nulls-value check and lookup restart here.
Since we started discussing the list API: why is the 'nulls' variant used for the address hash tables while the normal variant is used for the peer-id lookup?
Because the nulls variant is used only for tables where a re-hash can happen.
The peer-id table does not expect its objects to be re-used or re-hashed since the ID of a peer cannot change throughout its lifetime.
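For reference, the restart discussed above typically looks like this (just a sketch; it assumes the by_transp_addr buckets are initialized with their bucket index as nulls value, the way the UDP/TCP socket hash tables do it, and 'slot' below stands for that bucket index):

	rcu_read_lock();
begin:
	hlist_nulls_for_each_entry_rcu(tmp, ntmp, nhead,
				       hash_entry_transp_addr) {
		if (!ovpn_peer_transp_match(tmp, &ss))
			continue;

		if (!ovpn_peer_hold(tmp))
			continue;

		peer = tmp;
		break;
	}
	/* nothing found: if the terminating nulls value does not match the
	 * bucket we walked, the entry may have been moved by a concurrent
	 * rehash, so the lookup must be restarted
	 */
	if (!peer && get_nulls_value(ntmp) != slot)
		goto begin;
	rcu_read_unlock();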
Regards,
+ if (!ovpn_peer_transp_match(tmp, &ss)) + continue;
+ if (!ovpn_peer_hold(tmp)) + continue;
+ peer = tmp; + break; + } + rcu_read_unlock(); return peer; }
-- Sergey
OpenVPN supports configuring a periodic keepalive message to allow the remote endpoint to detect link failures.
This change implements the keepalive sending and timer expiring logic.
Signed-off-by: Antonio Quartulli antonio@openvpn.net --- drivers/net/ovpn/io.c | 77 +++++++++++++++++ drivers/net/ovpn/io.h | 5 ++ drivers/net/ovpn/main.c | 3 + drivers/net/ovpn/ovpnstruct.h | 2 + drivers/net/ovpn/peer.c | 188 ++++++++++++++++++++++++++++++++++++++++++ drivers/net/ovpn/peer.h | 15 ++++ drivers/net/ovpn/proto.h | 2 - 7 files changed, 290 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c index deda19ab87391f86964ba43088b7847d22420eee..63c140138bf98e5d1df79a2565b666d86513323d 100644 --- a/drivers/net/ovpn/io.c +++ b/drivers/net/ovpn/io.c @@ -27,6 +27,33 @@ #include "skb.h" #include "socket.h"
+const unsigned char ovpn_keepalive_message[OVPN_KEEPALIVE_SIZE] = { + 0x2a, 0x18, 0x7b, 0xf3, 0x64, 0x1e, 0xb4, 0xcb, + 0x07, 0xed, 0x2d, 0x0a, 0x98, 0x1f, 0xc7, 0x48 +}; + +/** + * ovpn_is_keepalive - check if skb contains a keepalive message + * @skb: packet to check + * + * Assumes that the first byte of skb->data is defined. + * + * Return: true if skb contains a keepalive or false otherwise + */ +static bool ovpn_is_keepalive(struct sk_buff *skb) +{ + if (*skb->data != ovpn_keepalive_message[0]) + return false; + + if (skb->len != OVPN_KEEPALIVE_SIZE) + return false; + + if (!pskb_may_pull(skb, OVPN_KEEPALIVE_SIZE)) + return false; + + return !memcmp(skb->data, ovpn_keepalive_message, OVPN_KEEPALIVE_SIZE); +} + /* Called after decrypt to write the IP packet to the device. * This method is expected to manage/free the skb. */ @@ -105,6 +132,9 @@ void ovpn_decrypt_post(void *data, int ret) goto drop; }
+ /* keep track of last received authenticated packet for keepalive */ + peer->last_recv = ktime_get_real_seconds(); + /* point to encapsulated IP packet */ __skb_pull(skb, payload_offset);
@@ -121,6 +151,12 @@ void ovpn_decrypt_post(void *data, int ret) goto drop; }
+ if (ovpn_is_keepalive(skb)) { + net_dbg_ratelimited("%s: ping received from peer %u\n", + peer->ovpn->dev->name, peer->id); + goto drop; + } + net_info_ratelimited("%s: unsupported protocol received from peer %u\n", peer->ovpn->dev->name, peer->id); goto drop; @@ -221,6 +257,10 @@ void ovpn_encrypt_post(void *data, int ret) /* no transport configured yet */ goto err; } + + /* keep track of last sent packet for keepalive */ + peer->last_sent = ktime_get_real_seconds(); + /* skb passed down the stack - don't free it */ skb = NULL; err: @@ -361,3 +401,40 @@ netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev) kfree_skb_list(skb); return NET_XMIT_DROP; } + +/** + * ovpn_xmit_special - encrypt and transmit an out-of-band message to peer + * @peer: peer to send the message to + * @data: message content + * @len: message length + * + * Assumes that caller holds a reference to peer + */ +void ovpn_xmit_special(struct ovpn_peer *peer, const void *data, + const unsigned int len) +{ + struct ovpn_struct *ovpn; + struct sk_buff *skb; + + ovpn = peer->ovpn; + if (unlikely(!ovpn)) + return; + + skb = alloc_skb(256 + len, GFP_ATOMIC); + if (unlikely(!skb)) + return; + + skb_reserve(skb, 128); + skb->priority = TC_PRIO_BESTEFFORT; + __skb_put_data(skb, data, len); + + /* increase reference counter when passing peer to sending queue */ + if (!ovpn_peer_hold(peer)) { + netdev_dbg(ovpn->dev, "%s: cannot hold peer reference for sending special packet\n", + __func__); + kfree_skb(skb); + return; + } + + ovpn_send(ovpn, skb, peer); +} diff --git a/drivers/net/ovpn/io.h b/drivers/net/ovpn/io.h index ad81dd86924689309b3299573575a1705eddaf99..eb224114152c29f42aadf026212e8d278006b490 100644 --- a/drivers/net/ovpn/io.h +++ b/drivers/net/ovpn/io.h @@ -10,9 +10,14 @@ #ifndef _NET_OVPN_OVPN_H_ #define _NET_OVPN_OVPN_H_
+#define OVPN_KEEPALIVE_SIZE 16 +extern const unsigned char ovpn_keepalive_message[OVPN_KEEPALIVE_SIZE]; + netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev);
void ovpn_recv(struct ovpn_peer *peer, struct sk_buff *skb); +void ovpn_xmit_special(struct ovpn_peer *peer, const void *data, + const unsigned int len);
void ovpn_encrypt_post(void *data, int ret); void ovpn_decrypt_post(void *data, int ret); diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index c7453127ab640d7268c1ce919a87cc5419fac9ee..1bd563e3f16f49dd01c897fbe79cbd90f4b8e9aa 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -191,6 +191,7 @@ static int ovpn_newlink(struct net *src_net, struct net_device *dev, ovpn->dev = dev; ovpn->mode = mode; spin_lock_init(&ovpn->lock); + INIT_DELAYED_WORK(&ovpn->keepalive_work, ovpn_peer_keepalive_work);
err = ovpn_mp_alloc(ovpn); if (err < 0) @@ -244,6 +245,8 @@ static int ovpn_netdev_notifier_call(struct notifier_block *nb, netif_carrier_off(dev); ovpn->registered = false;
+ cancel_delayed_work_sync(&ovpn->keepalive_work); + switch (ovpn->mode) { case OVPN_MODE_P2P: ovpn_peer_release_p2p(ovpn); diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h index 12ed5e22c2108c9f143d1984048eb40c887cac63..4ac00d550ecb9f84c6c132dd2bdc0a3fc0ab342c 100644 --- a/drivers/net/ovpn/ovpnstruct.h +++ b/drivers/net/ovpn/ovpnstruct.h @@ -43,6 +43,7 @@ struct ovpn_peer_collection { * @peer: in P2P mode, this is the only remote peer * @dev_list: entry for the module wide device list * @gro_cells: pointer to the Generic Receive Offload cell + * @keepalive_work: struct used to schedule keepalive periodic job */ struct ovpn_struct { struct net_device *dev; @@ -54,6 +55,7 @@ struct ovpn_struct { struct ovpn_peer __rcu *peer; struct list_head dev_list; struct gro_cells gro_cells; + struct delayed_work keepalive_work; };
#endif /* _NET_OVPN_OVPNSTRUCT_H_ */ diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c index c7dc9032c2b55fd42befc1f3e7a0eca893a96576..e8a42212af391916b5321e729f7e8a864d0a541f 100644 --- a/drivers/net/ovpn/peer.c +++ b/drivers/net/ovpn/peer.c @@ -22,6 +22,34 @@ #include "peer.h" #include "socket.h"
+/** + * ovpn_peer_keepalive_set - configure keepalive values for peer + * @peer: the peer to configure + * @interval: outgoing keepalive interval + * @timeout: incoming keepalive timeout + */ +void ovpn_peer_keepalive_set(struct ovpn_peer *peer, u32 interval, u32 timeout) +{ + time64_t now = ktime_get_real_seconds(); + + netdev_dbg(peer->ovpn->dev, + "%s: scheduling keepalive for peer %u: interval=%u timeout=%u\n", + __func__, peer->id, interval, timeout); + + peer->keepalive_interval = interval; + peer->last_sent = now; + peer->keepalive_xmit_exp = now + interval; + + peer->keepalive_timeout = timeout; + peer->last_recv = now; + peer->keepalive_recv_exp = now + timeout; + + /* now that interval and timeout have been changed, kick + * off the worker so that the next delay can be recomputed + */ + mod_delayed_work(system_wq, &peer->ovpn->keepalive_work, 0); +} + /** * ovpn_peer_new - allocate and initialize a new peer object * @ovpn: the openvpn instance inside which the peer should be created @@ -815,6 +843,19 @@ int ovpn_peer_del(struct ovpn_peer *peer, enum ovpn_del_peer_reason reason) } }
+static int ovpn_peer_del_nolock(struct ovpn_peer *peer, + enum ovpn_del_peer_reason reason) +{ + switch (peer->ovpn->mode) { + case OVPN_MODE_MP: + return ovpn_peer_del_mp(peer, reason); + case OVPN_MODE_P2P: + return ovpn_peer_del_p2p(peer, reason); + default: + return -EOPNOTSUPP; + } +} + /** * ovpn_peers_free - free all peers in the instance * @ovpn: the instance whose peers should be released @@ -830,3 +871,150 @@ void ovpn_peers_free(struct ovpn_struct *ovpn) ovpn_peer_unhash(peer, OVPN_DEL_PEER_REASON_TEARDOWN); spin_unlock_bh(&ovpn->peers->lock); } + +static time64_t ovpn_peer_keepalive_work_single(struct ovpn_peer *peer, + time64_t now) +{ + time64_t next_run1, next_run2, delta; + unsigned long timeout, interval; + bool expired; + + spin_lock_bh(&peer->lock); + /* we expect both timers to be configured at the same time, + * therefore bail out if either is not set + */ + if (!peer->keepalive_timeout || !peer->keepalive_interval) { + spin_unlock_bh(&peer->lock); + return 0; + } + + /* check for peer timeout */ + expired = false; + timeout = peer->keepalive_timeout; + delta = now - peer->last_recv; + if (delta < timeout) { + peer->keepalive_recv_exp = now + timeout - delta; + next_run1 = peer->keepalive_recv_exp; + } else if (peer->keepalive_recv_exp > now) { + next_run1 = peer->keepalive_recv_exp; + } else { + expired = true; + } + + if (expired) { + /* peer is dead -> kill it and move on */ + spin_unlock_bh(&peer->lock); + netdev_dbg(peer->ovpn->dev, "peer %u expired\n", + peer->id); + ovpn_peer_del_nolock(peer, OVPN_DEL_PEER_REASON_EXPIRED); + return 0; + } + + /* check for peer keepalive */ + expired = false; + interval = peer->keepalive_interval; + delta = now - peer->last_sent; + if (delta < interval) { + peer->keepalive_xmit_exp = now + interval - delta; + next_run2 = peer->keepalive_xmit_exp; + } else if (peer->keepalive_xmit_exp > now) { + next_run2 = peer->keepalive_xmit_exp; + } else { + expired = true; + next_run2 = now + interval; + } + spin_unlock_bh(&peer->lock); + + if (expired) { + /* a keepalive packet is required */ + netdev_dbg(peer->ovpn->dev, + "sending keepalive to peer %u\n", + peer->id); + ovpn_xmit_special(peer, ovpn_keepalive_message, + sizeof(ovpn_keepalive_message)); + } + + if (next_run1 < next_run2) + return next_run1; + + return next_run2; +} + +static time64_t ovpn_peer_keepalive_work_mp(struct ovpn_struct *ovpn, + time64_t now) +{ + time64_t tmp_next_run, next_run = 0; + struct hlist_node *tmp; + struct ovpn_peer *peer; + int bkt; + + spin_lock_bh(&ovpn->peers->lock); + hash_for_each_safe(ovpn->peers->by_id, bkt, tmp, peer, hash_entry_id) { + tmp_next_run = ovpn_peer_keepalive_work_single(peer, now); + if (!tmp_next_run) + continue; + + /* the next worker run will be scheduled based on the shortest + * required interval across all peers + */ + if (!next_run || tmp_next_run < next_run) + next_run = tmp_next_run; + } + spin_unlock_bh(&ovpn->peers->lock); + + return next_run; +} + +static time64_t ovpn_peer_keepalive_work_p2p(struct ovpn_struct *ovpn, + time64_t now) +{ + struct ovpn_peer *peer; + time64_t next_run = 0; + + spin_lock_bh(&ovpn->lock); + peer = rcu_dereference_protected(ovpn->peer, + lockdep_is_held(&ovpn->lock)); + if (peer) + next_run = ovpn_peer_keepalive_work_single(peer, now); + spin_unlock_bh(&ovpn->lock); + + return next_run; +} + +/** + * ovpn_peer_keepalive_work - run keepalive logic on each known peer + * @work: pointer to the work member of the related ovpn object + * + * Each peer has two timers (if configured): + * 1. 
peer timeout: when no data is received for a certain interval, + * the peer is considered dead and it gets killed. + * 2. peer keepalive: when no data is sent to a certain peer for a + * certain interval, a special 'keepalive' packet is explicitly sent. + * + * This function iterates across the whole peer collection while + * checking the timers described above. + */ +void ovpn_peer_keepalive_work(struct work_struct *work) +{ + struct ovpn_struct *ovpn = container_of(work, struct ovpn_struct, + keepalive_work.work); + time64_t next_run = 0, now = ktime_get_real_seconds(); + + switch (ovpn->mode) { + case OVPN_MODE_MP: + next_run = ovpn_peer_keepalive_work_mp(ovpn, now); + break; + case OVPN_MODE_P2P: + next_run = ovpn_peer_keepalive_work_p2p(ovpn, now); + break; + } + + /* prevent rearming if the interface is being destroyed */ + if (next_run > 0 && ovpn->registered) { + netdev_dbg(ovpn->dev, + "scheduling keepalive work: now=%llu next_run=%llu delta=%llu\n", + next_run, now, next_run - now); + schedule_delayed_work(&ovpn->keepalive_work, + (next_run - now) * HZ); + } +} diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h index 942b90c84a0fb9e6fbb96f6df7f7842a9f738caf..952927ae78a3ab753aaf2c6cc6f77121bdac34be 100644 --- a/drivers/net/ovpn/peer.h +++ b/drivers/net/ovpn/peer.h @@ -43,6 +43,12 @@ * @crypto: the crypto configuration (ciphers, keys, etc..) * @dst_cache: cache for dst_entry used to send to peer * @bind: remote peer binding + * @keepalive_interval: seconds after which a new keepalive should be sent + * @keepalive_xmit_exp: future timestamp when next keepalive should be sent + * @last_sent: timestamp of the last successfully sent packet + * @keepalive_timeout: seconds after which an inactive peer is considered dead + * @keepalive_recv_exp: future timestamp when the peer should expire + * @last_recv: timestamp of the last authenticated received packet * @halt: true if ovpn_peer_mark_delete was called * @vpn_stats: per-peer in-VPN TX/RX stays * @link_stats: per-peer link/transport TX/RX stats @@ -91,6 +97,12 @@ struct ovpn_peer { struct ovpn_crypto_state crypto; struct dst_cache dst_cache; struct ovpn_bind __rcu *bind; + unsigned long keepalive_interval; + unsigned long keepalive_xmit_exp; + time64_t last_sent; + unsigned long keepalive_timeout; + unsigned long keepalive_recv_exp; + time64_t last_recv; bool halt; struct ovpn_peer_stats vpn_stats; struct ovpn_peer_stats link_stats; @@ -137,4 +149,7 @@ struct ovpn_peer *ovpn_peer_get_by_dst(struct ovpn_struct *ovpn, bool ovpn_peer_check_by_src(struct ovpn_struct *ovpn, struct sk_buff *skb, struct ovpn_peer *peer);
+void ovpn_peer_keepalive_set(struct ovpn_peer *peer, u32 interval, u32 timeout); +void ovpn_peer_keepalive_work(struct work_struct *work); + #endif /* _NET_OVPN_OVPNPEER_H_ */ diff --git a/drivers/net/ovpn/proto.h b/drivers/net/ovpn/proto.h index 32af6b8e574381fb719a1b3b9de3ae1071cc4846..0de8bafadc89ebb85ce40de95ef394588738a4ad 100644 --- a/drivers/net/ovpn/proto.h +++ b/drivers/net/ovpn/proto.h @@ -35,8 +35,6 @@ #define OVPN_OP_SIZE_V2 4 #define OVPN_PEER_ID_MASK 0x00FFFFFF #define OVPN_PEER_ID_UNDEF 0x00FFFFFF -/* first byte of keepalive message */ -#define OVPN_KEEPALIVE_FIRST_BYTE 0x2a /* first byte of exit message */ #define OVPN_EXPLICIT_EXIT_NOTIFY_FIRST_BYTE 0x28
2024-10-29, 11:47:28 +0100, Antonio Quartulli wrote:
@@ -105,6 +132,9 @@ void ovpn_decrypt_post(void *data, int ret) goto drop; }
+	/* keep track of last received authenticated packet for keepalive */
+	peer->last_recv = ktime_get_real_seconds();
It doesn't look like we're locking the peer here so that should be a WRITE_ONCE() (and READ_ONCE(peer->last_recv) for all reads).
 	/* point to encapsulated IP packet */
 	__skb_pull(skb, payload_offset);
@@ -121,6 +151,12 @@ void ovpn_decrypt_post(void *data, int ret) goto drop; }
+	if (ovpn_is_keepalive(skb)) {
+		net_dbg_ratelimited("%s: ping received from peer %u\n",
+				    peer->ovpn->dev->name, peer->id);
+		goto drop;

To help with debugging connectivity issues, maybe keepalives shouldn't be counted as drops? (consume_skb instead of kfree_skb, and not incrementing rx_dropped) The packet was successfully received and did all it had to do.

+	}

 	net_info_ratelimited("%s: unsupported protocol received from peer %u\n",
 			     peer->ovpn->dev->name, peer->id);
 	goto drop;
@@ -221,6 +257,10 @@ void ovpn_encrypt_post(void *data, int ret) /* no transport configured yet */ goto err; }
+	/* keep track of last sent packet for keepalive */
+	peer->last_sent = ktime_get_real_seconds();
And another WRITE_ONCE() here (also paired with READ_ONCE() on the read side).
+static int ovpn_peer_del_nolock(struct ovpn_peer *peer,
+				enum ovpn_del_peer_reason reason)
+{
+	switch (peer->ovpn->mode) {
+	case OVPN_MODE_MP:

I think it would be nice to add

	lockdep_assert_held(&peer->ovpn->peers->lock);

+		return ovpn_peer_del_mp(peer, reason);
+	case OVPN_MODE_P2P:

and here

	lockdep_assert_held(&peer->ovpn->lock);

(I had to check that ovpn_peer_del_nolock is indeed called with those locks held since they're taken by ovpn_peer_keepalive_work_{mp,p2p}, adding these assertions would make it clear that ovpn_peer_del_nolock is not an unsafe version of ovpn_peer_del)

+		return ovpn_peer_del_p2p(peer, reason);
+	default:
+		return -EOPNOTSUPP;
+	}
+}
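Putting both assertions in, the helper would read roughly like this (a sketch based on the suggestion above):

static int ovpn_peer_del_nolock(struct ovpn_peer *peer,
				enum ovpn_del_peer_reason reason)
{
	switch (peer->ovpn->mode) {
	case OVPN_MODE_MP:
		/* MP peers live in the hash tables protected by peers->lock */
		lockdep_assert_held(&peer->ovpn->peers->lock);
		return ovpn_peer_del_mp(peer, reason);
	case OVPN_MODE_P2P:
		/* the single P2P peer is protected by the instance lock */
		lockdep_assert_held(&peer->ovpn->lock);
		return ovpn_peer_del_p2p(peer, reason);
	default:
		return -EOPNOTSUPP;
	}
}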
/**
- ovpn_peers_free - free all peers in the instance
- @ovpn: the instance whose peers should be released
@@ -830,3 +871,150 @@ void ovpn_peers_free(struct ovpn_struct *ovpn) ovpn_peer_unhash(peer, OVPN_DEL_PEER_REASON_TEARDOWN); spin_unlock_bh(&ovpn->peers->lock); }
+static time64_t ovpn_peer_keepalive_work_single(struct ovpn_peer *peer,
+						time64_t now)
+{
+	time64_t next_run1, next_run2, delta;
+	unsigned long timeout, interval;
+	bool expired;
+
+	spin_lock_bh(&peer->lock);
+	/* we expect both timers to be configured at the same time,
+	 * therefore bail out if either is not set
+	 */
+	if (!peer->keepalive_timeout || !peer->keepalive_interval) {
+		spin_unlock_bh(&peer->lock);
+		return 0;
+	}
+	/* check for peer timeout */
+	expired = false;
+	timeout = peer->keepalive_timeout;
+	delta = now - peer->last_recv;
I'm not sure that's always > 0 if we finish decrypting a packet just as the workqueue starts:
ovpn_peer_keepalive_work now = ...
ovpn_decrypt_post peer->last_recv = ...
ovpn_peer_keepalive_work_single delta: now < peer->last_recv
+	if (delta < timeout) {
+		peer->keepalive_recv_exp = now + timeout - delta;
I'd shorten that to
peer->keepalive_recv_exp = peer->last_recv + timeout;
it's a bit more readable to my eyes and avoids risks of wrapping values.
So I'd probably get rid of delta and go with:
	last_recv = READ_ONCE(peer->last_recv)
	if (now < last_recv + timeout) {
		peer->keepalive_recv_exp = last_recv + timeout;
		next_run1 = peer->keepalive_recv_exp;
	} else if ...
+		next_run1 = peer->keepalive_recv_exp;
+	} else if (peer->keepalive_recv_exp > now) {
+		next_run1 = peer->keepalive_recv_exp;
+	} else {
+		expired = true;
+	}
[...]
+	/* check for peer keepalive */
+	expired = false;
+	interval = peer->keepalive_interval;
+	delta = now - peer->last_sent;
+	if (delta < interval) {
+		peer->keepalive_xmit_exp = now + interval - delta;
+		next_run2 = peer->keepalive_xmit_exp;
and same here
On 05/11/2024 19:10, Sabrina Dubroca wrote:
2024-10-29, 11:47:28 +0100, Antonio Quartulli wrote:
@@ -105,6 +132,9 @@ void ovpn_decrypt_post(void *data, int ret) goto drop; }
- /* keep track of last received authenticated packet for keepalive */
- peer->last_recv = ktime_get_real_seconds();
It doesn't look like we're locking the peer here so that should be a WRITE_ONCE() (and READ_ONCE(peer->last_recv) for all reads).
Is that because last_recv is 64 bit long (and might be more than one word on certain architectures)?
I don't remember having to do so for reading/writing 32 bit long integers.
I presume we need a WRITE_ONCE also upon initialization in ovpn_peer_keepalive_set() right? We still want to coordinate that with other reads/writes.
- /* point to encapsulated IP packet */ __skb_pull(skb, payload_offset);
@@ -121,6 +151,12 @@ void ovpn_decrypt_post(void *data, int ret) goto drop; }
if (ovpn_is_keepalive(skb)) {
net_dbg_ratelimited("%s: ping received from peer %u\n",
peer->ovpn->dev->name, peer->id);
goto drop;
To help with debugging connectivity issues, maybe keepalives shouldn't be counted as drops? (consume_skb instead of kfree_skb, and not incrementing rx_dropped) The packet was successfully received and did all it had to do.
you're absolutely right. Will change that.
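i.e. something along these lines (a sketch only; how exactly the peer reference and exit path are handled depends on how the shared drop label ends up being restructured, the 'out' label below is hypothetical):

	if (ovpn_is_keepalive(skb)) {
		net_dbg_ratelimited("%s: ping received from peer %u\n",
				    peer->ovpn->dev->name, peer->id);
		/* the keepalive did its job: release the skb without
		 * accounting it as a drop
		 */
		consume_skb(skb);
		goto out;	/* hypothetical exit skipping kfree_skb/rx_dropped */
	}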
}
- net_info_ratelimited("%s: unsupported protocol received from peer %u\n", peer->ovpn->dev->name, peer->id); goto drop;
@@ -221,6 +257,10 @@ void ovpn_encrypt_post(void *data, int ret) /* no transport configured yet */ goto err; }
- /* keep track of last sent packet for keepalive */
- peer->last_sent = ktime_get_real_seconds();
And another WRITE_ONCE() here (also paired with READ_ONCE() on the read side).
Yap
+static int ovpn_peer_del_nolock(struct ovpn_peer *peer,
enum ovpn_del_peer_reason reason)
+{
- switch (peer->ovpn->mode) {
- case OVPN_MODE_MP:
I think it would be nice to add
lockdep_assert_held(&peer->ovpn->peers->lock);
return ovpn_peer_del_mp(peer, reason);
- case OVPN_MODE_P2P:
and here
lockdep_assert_held(&peer->ovpn->lock);
Yeah, good idea. __must_hold() can't work here, so lockdep_assert_held is definitely the way to go.
(I had to check that ovpn_peer_del_nolock is indeed called with those locks held since they're taken by ovpn_peer_keepalive_work_{mp,p2p}, adding these assertions would make it clear that ovpn_peer_del_nolock is not an unsafe version of ovpn_peer_del)
Right, it makes sense.
return ovpn_peer_del_p2p(peer, reason);
- default:
return -EOPNOTSUPP;
- }
+}
- /**
- ovpn_peers_free - free all peers in the instance
- @ovpn: the instance whose peers should be released
@@ -830,3 +871,150 @@ void ovpn_peers_free(struct ovpn_struct *ovpn) ovpn_peer_unhash(peer, OVPN_DEL_PEER_REASON_TEARDOWN); spin_unlock_bh(&ovpn->peers->lock); }
+static time64_t ovpn_peer_keepalive_work_single(struct ovpn_peer *peer,
time64_t now)
+{
- time64_t next_run1, next_run2, delta;
- unsigned long timeout, interval;
- bool expired;
- spin_lock_bh(&peer->lock);
- /* we expect both timers to be configured at the same time,
* therefore bail out if either is not set
*/
- if (!peer->keepalive_timeout || !peer->keepalive_interval) {
spin_unlock_bh(&peer->lock);
return 0;
- }
- /* check for peer timeout */
- expired = false;
- timeout = peer->keepalive_timeout;
- delta = now - peer->last_recv;
I'm not sure that's always > 0 if we finish decrypting a packet just as the workqueue starts:
ovpn_peer_keepalive_work now = ...
ovpn_decrypt_post peer->last_recv = ...
ovpn_peer_keepalive_work_single delta: now < peer->last_recv
Yeah, there is nothing preventing this from happening...but is this truly a problem? The math should still work, no?
However:
- if (delta < timeout) {
peer->keepalive_recv_exp = now + timeout - delta;
I'd shorten that to
peer->keepalive_recv_exp = peer->last_recv + timeout;
it's a bit more readable to my eyes and avoids risks of wrapping values.
So I'd probably get rid of delta and go with:
last_recv = READ_ONCE(peer->last_recv) if (now < last_recv + timeout) { peer->keepalive_recv_exp = last_recv + timeout; next_run1 = peer->keepalive_recv_exp; } else if ...
next_run1 = peer->keepalive_recv_exp;
- } else if (peer->keepalive_recv_exp > now) {
next_run1 = peer->keepalive_recv_exp;
- } else {
expired = true;
- }
I agree this is simpler to read and gets rid of some extra operations.
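For the record, the simplified receive-timeout branch then becomes roughly the following (a sketch of the suggestion above, with last_sent/keepalive_xmit_exp handled the same way):

	last_recv = READ_ONCE(peer->last_recv);
	if (now < last_recv + timeout) {
		/* peer is still alive: remember when it would expire */
		peer->keepalive_recv_exp = last_recv + timeout;
		next_run1 = peer->keepalive_recv_exp;
	} else if (peer->keepalive_recv_exp > now) {
		next_run1 = peer->keepalive_recv_exp;
	} else {
		expired = true;
	}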
[note: I took inspiration from nat_keepalive_work_single() - it could be simplified as well I guess]
[...]
- /* check for peer keepalive */
- expired = false;
- interval = peer->keepalive_interval;
- delta = now - peer->last_sent;
- if (delta < interval) {
peer->keepalive_xmit_exp = now + interval - delta;
next_run2 = peer->keepalive_xmit_exp;
and same here
Yeah, will change both. Thanks!
Regards,
2024-11-12, 14:20:45 +0100, Antonio Quartulli wrote:
On 05/11/2024 19:10, Sabrina Dubroca wrote:
2024-10-29, 11:47:28 +0100, Antonio Quartulli wrote:
@@ -105,6 +132,9 @@ void ovpn_decrypt_post(void *data, int ret) goto drop; }
- /* keep track of last received authenticated packet for keepalive */
- peer->last_recv = ktime_get_real_seconds();
It doesn't look like we're locking the peer here so that should be a WRITE_ONCE() (and READ_ONCE(peer->last_recv) for all reads).
Is that because last_recv is 64 bit long (and might be more than one word on certain architectures)?
I don't remember having to do so for reading/writing 32 bit long integers.
AFAIK it's not just that. The compiler is free to do the read/write in any way it wants when you don't specify _ONCE. On the read side, it could read from memory a single time or multiple times (getting possibly different values each time), or maybe split the load (possibly reading chunks from different values being written in parallel).
I presume we need a WRITE_ONCE also upon initialization in ovpn_peer_keepalive_set() right? We still want to coordinate that with other reads/writes.
I think it makes sense, yes.
- /* point to encapsulated IP packet */ __skb_pull(skb, payload_offset);
@@ -121,6 +151,12 @@ void ovpn_decrypt_post(void *data, int ret) goto drop; }
if (ovpn_is_keepalive(skb)) {
net_dbg_ratelimited("%s: ping received from peer %u\n",
peer->ovpn->dev->name, peer->id);
goto drop;
To help with debugging connectivity issues, maybe keepalives shouldn't be counted as drops? (consume_skb instead of kfree_skb, and not incrementing rx_dropped) The packet was successfully received and did all it had to do.
you're absolutely right. Will change that.
Thanks.
- /* check for peer timeout */
- expired = false;
- timeout = peer->keepalive_timeout;
- delta = now - peer->last_recv;
I'm not sure that's always > 0 if we finish decrypting a packet just as the workqueue starts:
ovpn_peer_keepalive_work now = ...
ovpn_decrypt_post peer->last_recv = ...
ovpn_peer_keepalive_work_single delta: now < peer->last_recv
Yeah, there is nothing preventing this from happening...but is this truly a problem? The math should still work, no?
We'll fail "delta < timeout" (which we shouldn't), so we'll end up either in the "expired = true" case, or not updating keepalive_recv_exp. Both of these seem not ideal.
However:
- if (delta < timeout) {
peer->keepalive_recv_exp = now + timeout - delta;
I'd shorten that to
peer->keepalive_recv_exp = peer->last_recv + timeout;
it's a bit more readable to my eyes and avoids risks of wrapping values.
So I'd probably get rid of delta and go with:
last_recv = READ_ONCE(peer->last_recv) if (now < last_recv + timeout) { peer->keepalive_recv_exp = last_recv + timeout; next_run1 = peer->keepalive_recv_exp; } else if ...
next_run1 = peer->keepalive_recv_exp;
- } else if (peer->keepalive_recv_exp > now) {
next_run1 = peer->keepalive_recv_exp;
- } else {
expired = true;
- }
I agree this is simpler to read and gets rid of some extra operations.
[note: I took inspiration from nat_keepalive_work_single() - it could be simplified as well I guess]
Ah, ok. I wanted to review this code when it was posted but didn't have time :(
[...]
- /* check for peer keepalive */
- expired = false;
- interval = peer->keepalive_interval;
- delta = now - peer->last_sent;
- if (delta < interval) {
peer->keepalive_xmit_exp = now + interval - delta;
next_run2 = peer->keepalive_xmit_exp;
and same here
Yeah, will change both. Thanks!
Thanks.
On 13/11/2024 11:36, Sabrina Dubroca wrote:
2024-11-12, 14:20:45 +0100, Antonio Quartulli wrote:
On 05/11/2024 19:10, Sabrina Dubroca wrote:
2024-10-29, 11:47:28 +0100, Antonio Quartulli wrote:
@@ -105,6 +132,9 @@ void ovpn_decrypt_post(void *data, int ret) goto drop; }
- /* keep track of last received authenticated packet for keepalive */
- peer->last_recv = ktime_get_real_seconds();
It doesn't look like we're locking the peer here so that should be a WRITE_ONCE() (and READ_ONCE(peer->last_recv) for all reads).
Is that because last_recv is 64 bit long (and might be more than one word on certain architectures)?
I don't remember having to do so for reading/writing 32 bit long integers.
AFAIK it's not just that. The compiler is free to do the read/write in any way it wants when you don't specify _ONCE. On the read side, it could read from memory a single time or multiple times (getting possibly different values each time), or maybe split the load (possibly reading chunks from different values being written in parallel).
Ok, thanks. Will switch to WRITE/READ_ONCE then.
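i.e. roughly (sketch):

	/* writers in the data path (no peer->lock held) */
	WRITE_ONCE(peer->last_recv, ktime_get_real_seconds());
	WRITE_ONCE(peer->last_sent, ktime_get_real_seconds());

	/* reader in the keepalive worker */
	last_recv = READ_ONCE(peer->last_recv);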
I presume we need a WRITE_ONCE also upon initialization in ovpn_peer_keepalive_set() right? We still want to coordinate that with other reads/writes.
I think it makes sense, yes.
ACK
[...]
- /* check for peer timeout */
- expired = false;
- timeout = peer->keepalive_timeout;
- delta = now - peer->last_recv;
I'm not sure that's always > 0 if we finish decrypting a packet just as the workqueue starts:
ovpn_peer_keepalive_work now = ... ovpn_decrypt_post peer->last_recv = ... ovpn_peer_keepalive_work_single delta: now < peer->last_recv
Yeah, there is nothing preventing this from happening...but is this truly a problem? The math should still work, no?
We'll fail "delta < timeout" (which we shouldn't), so we'll end up either in the "expired = true" case, or not updating keepalive_recv_exp. Both of these seem not ideal.
delta is signed, so it'll end up being a negative value and "delta < timeout" should not fail then. Unless I am missing something.
Anyway, this was just an exercise to understand what was going on. I already changed the code as per your suggestion (the fact that we are still discussing this chunk proves that it needed to be simplified :))
However:
- if (delta < timeout) {
peer->keepalive_recv_exp = now + timeout - delta;
I'd shorten that to
peer->keepalive_recv_exp = peer->last_recv + timeout;
it's a bit more readable to my eyes and avoids risks of wrapping values.
So I'd probably get rid of delta and go with:
last_recv = READ_ONCE(peer->last_recv) if (now < last_recv + timeout) { peer->keepalive_recv_exp = last_recv + timeout; next_run1 = peer->keepalive_recv_exp; } else if ...
next_run1 = peer->keepalive_recv_exp;
- } else if (peer->keepalive_recv_exp > now) {
next_run1 = peer->keepalive_recv_exp;
- } else {
expired = true;
- }
I agree this is simpler to read and gets rid of some extra operations.
[note: I took inspiration from nat_keepalive_work_single() - it could be simplified as well I guess]
Ah, ok. I wanted to review this code when it was posted but didn't have time :(
It can still be fixed ;)
Thanks. Regards,
2024-11-14, 09:12:01 +0100, Antonio Quartulli wrote:
On 13/11/2024 11:36, Sabrina Dubroca wrote:
2024-11-12, 14:20:45 +0100, Antonio Quartulli wrote:
On 05/11/2024 19:10, Sabrina Dubroca wrote:
2024-10-29, 11:47:28 +0100, Antonio Quartulli wrote:
- /* check for peer timeout */
- expired = false;
- timeout = peer->keepalive_timeout;
- delta = now - peer->last_recv;
I'm not sure that's always > 0 if we finish decrypting a packet just as the workqueue starts:
ovpn_peer_keepalive_work now = ... ovpn_decrypt_post peer->last_recv = ... ovpn_peer_keepalive_work_single delta: now < peer->last_recv
Yeah, there is nothing preventing this from happening...but is this truly a problem? The math should still work, no?
We'll fail "delta < timeout" (which we shouldn't), so we'll end up either in the "expired = true" case, or not updating keepalive_recv_exp. Both of these seem not ideal.
delta is signed, so it'll end up being a negative value and "delta < timeout" should not fail then. Unless I am missing something.
But timeout is "unsigned long", so the comparison will be done as unsigned.
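The promotion is easy to reproduce outside the driver (a standalone sketch, not driver code, for a 64-bit build):

	long long delta = -5;		/* 'now' slightly behind last_recv */
	unsigned long timeout = 60;

	/* both operands are converted to a 64-bit unsigned type, so the
	 * negative delta wraps around to a huge value and the test is false
	 */
	if (delta < timeout)
		printf("not expired\n");	/* not reached on 64-bit */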
Anyway, this was just an exercise to understand what was going on. I already changed the code as per your suggestion (the fact that we are still discussing this chunk proves that it needed to be simplified :))
:)
On 12/11/2024 14:20, Antonio Quartulli wrote: [...]
+static int ovpn_peer_del_nolock(struct ovpn_peer *peer, + enum ovpn_del_peer_reason reason) +{ + switch (peer->ovpn->mode) { + case OVPN_MODE_MP:
I think it would be nice to add
lockdep_assert_held(&peer->ovpn->peers->lock);
Sabrina, in other places I have used the sparse notation __must_hold() instead. Is there any preference in regards to lockdep vs sparse?
I could switch them all to lockdep_assert_held if needed.
Regards,
+ return ovpn_peer_del_mp(peer, reason); + case OVPN_MODE_P2P:
and here
lockdep_assert_held(&peer->ovpn->lock);
Yeah, good idea. __must_hold() can't work here, so lockdep_assert_held is definitely the way to go.
(I had to check that ovpn_peer_del_nolock is indeed called with those locks held since they're taken by ovpn_peer_keepalive_work_{mp,p2p}, adding these assertions would make it clear that ovpn_peer_del_nolock is not an unsafe version of ovpn_peer_del)
Right, it makes sense.
+ return ovpn_peer_del_p2p(peer, reason); + default: + return -EOPNOTSUPP; + } +}
/** * ovpn_peers_free - free all peers in the instance * @ovpn: the instance whose peers should be released @@ -830,3 +871,150 @@ void ovpn_peers_free(struct ovpn_struct *ovpn) ovpn_peer_unhash(peer, OVPN_DEL_PEER_REASON_TEARDOWN); spin_unlock_bh(&ovpn->peers->lock); }
+static time64_t ovpn_peer_keepalive_work_single(struct ovpn_peer *peer, + time64_t now) +{ + time64_t next_run1, next_run2, delta; + unsigned long timeout, interval; + bool expired;
+ spin_lock_bh(&peer->lock); + /* we expect both timers to be configured at the same time, + * therefore bail out if either is not set + */ + if (!peer->keepalive_timeout || !peer->keepalive_interval) { + spin_unlock_bh(&peer->lock); + return 0; + }
+ /* check for peer timeout */ + expired = false; + timeout = peer->keepalive_timeout; + delta = now - peer->last_recv;
I'm not sure that's always > 0 if we finish decrypting a packet just as the workqueue starts:
ovpn_peer_keepalive_work now = ...
ovpn_decrypt_post peer->last_recv = ...
ovpn_peer_keepalive_work_single delta: now < peer->last_recv
Yeah, there is nothing preventing this from happening...but is this truly a problem? The math should still work, no?
However:
+ if (delta < timeout) { + peer->keepalive_recv_exp = now + timeout - delta;
I'd shorten that to
peer->keepalive_recv_exp = peer->last_recv + timeout;
it's a bit more readable to my eyes and avoids risks of wrapping values.
So I'd probably get rid of delta and go with:
last_recv = READ_ONCE(peer->last_recv) if (now < last_recv + timeout) { peer->keepalive_recv_exp = last_recv + timeout; next_run1 = peer->keepalive_recv_exp; } else if ...
+ next_run1 = peer->keepalive_recv_exp; + } else if (peer->keepalive_recv_exp > now) { + next_run1 = peer->keepalive_recv_exp; + } else { + expired = true; + }
I agree this is simpler to read and gets rid of some extra operations.
[note: I took inspiration from nat_keepalive_work_single() - it could be simplified as well I guess]
[...]
+ /* check for peer keepalive */ + expired = false; + interval = peer->keepalive_interval; + delta = now - peer->last_sent; + if (delta < interval) { + peer->keepalive_xmit_exp = now + interval - delta; + next_run2 = peer->keepalive_xmit_exp;
and same here
Yeah, will change both. Thanks!
Regards,
2024-11-22, 10:41:26 +0100, Antonio Quartulli wrote:
On 12/11/2024 14:20, Antonio Quartulli wrote: [...]
+static int ovpn_peer_del_nolock(struct ovpn_peer *peer, + enum ovpn_del_peer_reason reason) +{ + switch (peer->ovpn->mode) { + case OVPN_MODE_MP:
I think it would be nice to add
lockdep_assert_held(&peer->ovpn->peers->lock);
Sabrina, in other places I have used the sparse notation __must_hold() instead. Is there any preference in regards to lockdep vs sparse?
I could switch them all to lockdep_assert_held if needed.
__must_hold has the advantage of being checked at compile time (though I'm not sure it's that reliable [1]), so you don't need to run a test that actually hits that particular code path.
In this case I see lockdep_assert_held as mainly documenting that the locking that makes ovpn_peer_del_nolock safe (as safe as ovpn_peer_del) is provided by its caller. The splat for incorrect use on debug kernels is a bonus. Sprinkling lockdep_assert_held all over ovpn might be bloating the code too much, but I'm not opposed to adding them if it helps.
[1] I ran sparse on drivers/net/ovpn/peer.c before/after removing the locking from ovpn_peer_del and didn't get any warnings. sparse is good to detect imbalances (function that locks without unlocking), but maybe don't trust __must_hold for more than documenting expectations.
[note: if you end up merging ovpn->peers->lock with ovpn->lock as we've discussed somewhere else, the locking around keepalive and ovpn_peer_del becomes a bit less hairy]
On 22/11/2024 17:18, Sabrina Dubroca wrote:
2024-11-22, 10:41:26 +0100, Antonio Quartulli wrote:
On 12/11/2024 14:20, Antonio Quartulli wrote: [...]
+static int ovpn_peer_del_nolock(struct ovpn_peer *peer, + enum ovpn_del_peer_reason reason) +{ + switch (peer->ovpn->mode) { + case OVPN_MODE_MP:
I think it would be nice to add
lockdep_assert_held(&peer->ovpn->peers->lock);
Sabrina, in other places I have used the sparse notation __must_hold() instead. Is there any preference in regards to lockdep vs sparse?
I could switch them all to lockdep_assert_held if needed.
__must_hold has the advantage of being checked at compile time (though I'm not sure it's that reliable [1]), so you don't need to run a test that actually hits that particular code path.
In this case I see lockdep_assert_held as mainly documenting that the locking that makes ovpn_peer_del_nolock safe (as safe as ovpn_peer_del) is provided by its caller. The splat for incorrect use on debug kernels is a bonus. Sprinkling lockdep_assert_held all over ovpn might be bloating the code too much, but I'm not opposed to adding them if it helps.
[1] I ran sparse on drivers/net/ovpn/peer.c before/after removing the locking from ovpn_peer_del and didn't get any warnings. sparse is good to detect imbalances (function that locks without unlocking), but maybe don't trust __must_hold for more than documenting expectations.
Same here. I didn't expect that. Then I think it's better to rely on lockdep_assert_held() for this kind of assumption.
[note: if you end up merging ovpn->peers->lock with ovpn->lock as we've discussed somewhere else, the locking around keepalive and ovpn_peer_del becomes a bit less hairy]
Yeah, this is happening.
Thanks a lot!
Regards,
In case of UDP links, the local endpoint used to communicate with a given peer may change without a connection restart.
Add support for learning the new address in case of change.
Signed-off-by: Antonio Quartulli antonio@openvpn.net --- drivers/net/ovpn/peer.c | 45 +++++++++++++++++++++++++++++++++++++++++++++ drivers/net/ovpn/peer.h | 3 +++ 2 files changed, 48 insertions(+)
diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c index e8a42212af391916b5321e729f7e8a864d0a541f..3f67d200e283213fcb732d10f9edeb53e0a0e9ee 100644 --- a/drivers/net/ovpn/peer.c +++ b/drivers/net/ovpn/peer.c @@ -416,6 +416,51 @@ struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id) return peer; }
+/** + * ovpn_peer_update_local_endpoint - update local endpoint for peer + * @peer: peer to update the endpoint for + * @skb: incoming packet to retrieve the destination address (local) from + */ +void ovpn_peer_update_local_endpoint(struct ovpn_peer *peer, + struct sk_buff *skb) +{ + struct ovpn_bind *bind; + + rcu_read_lock(); + bind = rcu_dereference(peer->bind); + if (unlikely(!bind)) + goto unlock; + + spin_lock_bh(&peer->lock); + switch (skb_protocol_to_family(skb)) { + case AF_INET: + if (unlikely(bind->local.ipv4.s_addr != ip_hdr(skb)->daddr)) { + netdev_dbg(peer->ovpn->dev, + "%s: learning local IPv4 for peer %d (%pI4 -> %pI4)\n", + __func__, peer->id, &bind->local.ipv4.s_addr, + &ip_hdr(skb)->daddr); + bind->local.ipv4.s_addr = ip_hdr(skb)->daddr; + } + break; + case AF_INET6: + if (unlikely(!ipv6_addr_equal(&bind->local.ipv6, + &ipv6_hdr(skb)->daddr))) { + netdev_dbg(peer->ovpn->dev, + "%s: learning local IPv6 for peer %d (%pI6c -> %pI6c\n", + __func__, peer->id, &bind->local.ipv6, + &ipv6_hdr(skb)->daddr); + bind->local.ipv6 = ipv6_hdr(skb)->daddr; + } + break; + default: + break; + } + spin_unlock_bh(&peer->lock); + +unlock: + rcu_read_unlock(); +} + /** * ovpn_peer_get_by_dst - Lookup peer to send skb to * @ovpn: the private data representing the current VPN session diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h index 952927ae78a3ab753aaf2c6cc6f77121bdac34be..1a8638d266b11a4a80ee2f088394d47a7798c3af 100644 --- a/drivers/net/ovpn/peer.h +++ b/drivers/net/ovpn/peer.h @@ -152,4 +152,7 @@ bool ovpn_peer_check_by_src(struct ovpn_struct *ovpn, struct sk_buff *skb, void ovpn_peer_keepalive_set(struct ovpn_peer *peer, u32 interval, u32 timeout); void ovpn_peer_keepalive_work(struct work_struct *work);
+void ovpn_peer_update_local_endpoint(struct ovpn_peer *peer, + struct sk_buff *skb); + #endif /* _NET_OVPN_OVPNPEER_H_ */
A peer connected via UDP may change its IP address without reconnecting (float).
Add support for detecting and updating the new peer IP/port in case of floating.
Signed-off-by: Antonio Quartulli antonio@openvpn.net --- drivers/net/ovpn/bind.c | 10 ++-- drivers/net/ovpn/io.c | 9 ++++ drivers/net/ovpn/peer.c | 129 ++++++++++++++++++++++++++++++++++++++++++++++-- drivers/net/ovpn/peer.h | 2 + 4 files changed, 139 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ovpn/bind.c b/drivers/net/ovpn/bind.c index b4d2ccec2ceddf43bc445b489cc62a578ef0ad0a..d17d078c5730bf4336dc87f45cdba3f6b8cad770 100644 --- a/drivers/net/ovpn/bind.c +++ b/drivers/net/ovpn/bind.c @@ -47,12 +47,8 @@ struct ovpn_bind *ovpn_bind_from_sockaddr(const struct sockaddr_storage *ss) * @new: the new bind to assign */ void ovpn_bind_reset(struct ovpn_peer *peer, struct ovpn_bind *new) + __must_hold(&peer->lock) { - struct ovpn_bind *old; - - spin_lock_bh(&peer->lock); - old = rcu_replace_pointer(peer->bind, new, true); - spin_unlock_bh(&peer->lock); - - kfree_rcu(old, rcu); + kfree_rcu(rcu_replace_pointer(peer->bind, new, + lockdep_is_held(&peer->lock)), rcu); } diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c index 63c140138bf98e5d1df79a2565b666d86513323d..0e8a6f2c76bc7b2ccc287ad1187cf50f033bf261 100644 --- a/drivers/net/ovpn/io.c +++ b/drivers/net/ovpn/io.c @@ -135,6 +135,15 @@ void ovpn_decrypt_post(void *data, int ret) /* keep track of last received authenticated packet for keepalive */ peer->last_recv = ktime_get_real_seconds();
+	if (peer->sock->sock->sk->sk_protocol == IPPROTO_UDP) {
+		/* check if this peer changed its IP address and update
+		 * state
+		 */
+		ovpn_peer_float(peer, skb);
+		/* update source endpoint for this peer */
+		ovpn_peer_update_local_endpoint(peer, skb);
+	}
+
 	/* point to encapsulated IP packet */
 	__skb_pull(skb, payload_offset);
diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c index 3f67d200e283213fcb732d10f9edeb53e0a0e9ee..da6215bbb643592e4567e61e4b4976d367ed109c 100644 --- a/drivers/net/ovpn/peer.c +++ b/drivers/net/ovpn/peer.c @@ -94,6 +94,131 @@ struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id) return peer; }
+/** + * ovpn_peer_reset_sockaddr - recreate binding for peer + * @peer: peer to recreate the binding for + * @ss: sockaddr to use as remote endpoint for the binding + * @local_ip: local IP for the binding + * + * Return: 0 on success or a negative error code otherwise + */ +static int ovpn_peer_reset_sockaddr(struct ovpn_peer *peer, + const struct sockaddr_storage *ss, + const u8 *local_ip) + __must_hold(&peer->lock) +{ + struct ovpn_bind *bind; + size_t ip_len; + + /* create new ovpn_bind object */ + bind = ovpn_bind_from_sockaddr(ss); + if (IS_ERR(bind)) + return PTR_ERR(bind); + + if (local_ip) { + if (ss->ss_family == AF_INET) { + ip_len = sizeof(struct in_addr); + } else if (ss->ss_family == AF_INET6) { + ip_len = sizeof(struct in6_addr); + } else { + netdev_dbg(peer->ovpn->dev, "%s: invalid family for remote endpoint\n", + __func__); + kfree(bind); + return -EINVAL; + } + + memcpy(&bind->local, local_ip, ip_len); + } + + /* set binding */ + ovpn_bind_reset(peer, bind); + + return 0; +} + +#define ovpn_get_hash_head(_tbl, _key, _key_len) ({ \ + typeof(_tbl) *__tbl = &(_tbl); \ + (&(*__tbl)[jhash(_key, _key_len, 0) % HASH_SIZE(*__tbl)]); }) \ + +/** + * ovpn_peer_float - update remote endpoint for peer + * @peer: peer to update the remote endpoint for + * @skb: incoming packet to retrieve the source address (remote) from + */ +void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb) +{ + struct hlist_nulls_head *nhead; + struct sockaddr_storage ss; + const u8 *local_ip = NULL; + struct sockaddr_in6 *sa6; + struct sockaddr_in *sa; + struct ovpn_bind *bind; + sa_family_t family; + size_t salen; + + rcu_read_lock(); + bind = rcu_dereference(peer->bind); + if (unlikely(!bind)) { + rcu_read_unlock(); + return; + } + + spin_lock_bh(&peer->lock); + if (likely(ovpn_bind_skb_src_match(bind, skb))) + goto unlock; + + family = skb_protocol_to_family(skb); + + if (bind->remote.in4.sin_family == family) + local_ip = (u8 *)&bind->local; + + switch (family) { + case AF_INET: + sa = (struct sockaddr_in *)&ss; + sa->sin_family = AF_INET; + sa->sin_addr.s_addr = ip_hdr(skb)->saddr; + sa->sin_port = udp_hdr(skb)->source; + salen = sizeof(*sa); + break; + case AF_INET6: + sa6 = (struct sockaddr_in6 *)&ss; + sa6->sin6_family = AF_INET6; + sa6->sin6_addr = ipv6_hdr(skb)->saddr; + sa6->sin6_port = udp_hdr(skb)->source; + sa6->sin6_scope_id = ipv6_iface_scope_id(&ipv6_hdr(skb)->saddr, + skb->skb_iif); + salen = sizeof(*sa6); + break; + default: + goto unlock; + } + + netdev_dbg(peer->ovpn->dev, "%s: peer %d floated to %pIScp", __func__, + peer->id, &ss); + ovpn_peer_reset_sockaddr(peer, (struct sockaddr_storage *)&ss, + local_ip); + spin_unlock_bh(&peer->lock); + rcu_read_unlock(); + + /* rehashing is required only in MP mode as P2P has one peer + * only and thus there is no hashtable + */ + if (peer->ovpn->mode == OVPN_MODE_MP) { + spin_lock_bh(&peer->ovpn->peers->lock); + /* remove old hashing */ + hlist_nulls_del_init_rcu(&peer->hash_entry_transp_addr); + /* re-add with new transport address */ + nhead = ovpn_get_hash_head(peer->ovpn->peers->by_transp_addr, + &ss, salen); + hlist_nulls_add_head_rcu(&peer->hash_entry_transp_addr, nhead); + spin_unlock_bh(&peer->ovpn->peers->lock); + } + return; +unlock: + spin_unlock_bh(&peer->lock); + rcu_read_unlock(); +} + static void ovpn_peer_release(struct ovpn_peer *peer) { if (peer->sock) @@ -186,10 +311,6 @@ static struct in6_addr ovpn_nexthop_from_skb6(struct sk_buff *skb) return rt->rt6i_gateway; }
-#define ovpn_get_hash_head(_tbl, _key, _key_len) ({ \ - typeof(_tbl) *__tbl = &(_tbl); \ - (&(*__tbl)[jhash(_key, _key_len, 0) % HASH_SIZE(*__tbl)]); }) \ - /** * ovpn_peer_get_by_vpn_addr4 - retrieve peer by its VPN IPv4 address * @ovpn: the openvpn instance to search diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h index 1a8638d266b11a4a80ee2f088394d47a7798c3af..940cea5372ec0375cfe3e673154a1e0248978409 100644 --- a/drivers/net/ovpn/peer.h +++ b/drivers/net/ovpn/peer.h @@ -155,4 +155,6 @@ void ovpn_peer_keepalive_work(struct work_struct *work); void ovpn_peer_update_local_endpoint(struct ovpn_peer *peer, struct sk_buff *skb);
+void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb); + #endif /* _NET_OVPN_OVPNPEER_H_ */
2024-10-29, 11:47:30 +0100, Antonio Quartulli wrote:
+static int ovpn_peer_reset_sockaddr(struct ovpn_peer *peer,
+				    const struct sockaddr_storage *ss,
+				    const u8 *local_ip)
+	__must_hold(&peer->lock)
+{
+	struct ovpn_bind *bind;
+	size_t ip_len;
+
+	/* create new ovpn_bind object */
+	bind = ovpn_bind_from_sockaddr(ss);
+	if (IS_ERR(bind))
+		return PTR_ERR(bind);
+
+	if (local_ip) {
+		if (ss->ss_family == AF_INET) {
+			ip_len = sizeof(struct in_addr);
+		} else if (ss->ss_family == AF_INET6) {
+			ip_len = sizeof(struct in6_addr);
+		} else {
+			netdev_dbg(peer->ovpn->dev, "%s: invalid family for remote endpoint\n",
+				   __func__);
ratelimited since that can be triggered from packet processing?
[...]
+void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb) +{
[...]
+	switch (family) {
+	case AF_INET:
+		sa = (struct sockaddr_in *)&ss;
+		sa->sin_family = AF_INET;
+		sa->sin_addr.s_addr = ip_hdr(skb)->saddr;
+		sa->sin_port = udp_hdr(skb)->source;
+		salen = sizeof(*sa);
+		break;
+	case AF_INET6:
+		sa6 = (struct sockaddr_in6 *)&ss;
+		sa6->sin6_family = AF_INET6;
+		sa6->sin6_addr = ipv6_hdr(skb)->saddr;
+		sa6->sin6_port = udp_hdr(skb)->source;
+		sa6->sin6_scope_id = ipv6_iface_scope_id(&ipv6_hdr(skb)->saddr,
+							 skb->skb_iif);
+		salen = sizeof(*sa6);
+		break;
+	default:
+		goto unlock;
+	}
- netdev_dbg(peer->ovpn->dev, "%s: peer %d floated to %pIScp", __func__,
%u for peer->id?
and ratelimited too, probably.
(also in ovpn_peer_update_local_endpoint in the previous patch)
peer->id, &ss);
- ovpn_peer_reset_sockaddr(peer, (struct sockaddr_storage *)&ss,
local_ip);
skip the rehash if this fails? peer->bind will still be the old one so moving it to the new hash chain won't help (the lookup will fail).
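i.e. something like this (a sketch, reusing the function's existing 'unlock' label):

	if (ovpn_peer_reset_sockaddr(peer, (struct sockaddr_storage *)&ss,
				     local_ip) < 0) {
		/* peer->bind was not updated: keep the old hashing */
		goto unlock;
	}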
On 04/11/2024 12:24, Sabrina Dubroca wrote:
2024-10-29, 11:47:30 +0100, Antonio Quartulli wrote:
+static int ovpn_peer_reset_sockaddr(struct ovpn_peer *peer,
const struct sockaddr_storage *ss,
const u8 *local_ip)
- __must_hold(&peer->lock)
+{
- struct ovpn_bind *bind;
- size_t ip_len;
- /* create new ovpn_bind object */
- bind = ovpn_bind_from_sockaddr(ss);
- if (IS_ERR(bind))
return PTR_ERR(bind);
- if (local_ip) {
if (ss->ss_family == AF_INET) {
ip_len = sizeof(struct in_addr);
} else if (ss->ss_family == AF_INET6) {
ip_len = sizeof(struct in6_addr);
} else {
netdev_dbg(peer->ovpn->dev, "%s: invalid family for remote endpoint\n",
__func__);
ratelimited since that can be triggered from packet processing?
ACK
[...]
+void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb) +{
[...]
- switch (family) {
- case AF_INET:
sa = (struct sockaddr_in *)&ss;
sa->sin_family = AF_INET;
sa->sin_addr.s_addr = ip_hdr(skb)->saddr;
sa->sin_port = udp_hdr(skb)->source;
salen = sizeof(*sa);
break;
- case AF_INET6:
sa6 = (struct sockaddr_in6 *)&ss;
sa6->sin6_family = AF_INET6;
sa6->sin6_addr = ipv6_hdr(skb)->saddr;
sa6->sin6_port = udp_hdr(skb)->source;
sa6->sin6_scope_id = ipv6_iface_scope_id(&ipv6_hdr(skb)->saddr,
skb->skb_iif);
salen = sizeof(*sa6);
break;
- default:
goto unlock;
- }
- netdev_dbg(peer->ovpn->dev, "%s: peer %d floated to %pIScp", __func__,
%u for peer->id?
and ratelimited too, probably.
(also in ovpn_peer_update_local_endpoint in the previous patch)
Technically we don't expect float/endpoint updates to happen that frequently, but should they happen... better to be protected.
ACK
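i.e. something like the following (a sketch, following the net_dbg_ratelimited() pattern already used in the RX path, since there is no ratelimited variant of netdev_dbg):

	net_dbg_ratelimited("%s: peer %u floated to %pIScp\n",
			    peer->ovpn->dev->name, peer->id, &ss);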
peer->id, &ss);
- ovpn_peer_reset_sockaddr(peer, (struct sockaddr_storage *)&ss,
local_ip);
skip the rehash if this fails? peer->bind will still be the old one so moving it to the new hash chain won't help (the lookup will fail).
Yeah, it makes sense.
Thanks a lot. Regards,
2024-10-29, 11:47:30 +0100, Antonio Quartulli wrote:
diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c index 63c140138bf98e5d1df79a2565b666d86513323d..0e8a6f2c76bc7b2ccc287ad1187cf50f033bf261 100644 --- a/drivers/net/ovpn/io.c +++ b/drivers/net/ovpn/io.c @@ -135,6 +135,15 @@ void ovpn_decrypt_post(void *data, int ret) /* keep track of last received authenticated packet for keepalive */ peer->last_recv = ktime_get_real_seconds();
- if (peer->sock->sock->sk->sk_protocol == IPPROTO_UDP) {
What prevents peer->sock from being replaced and released concurrently?
Or possibly reading the error value that ovpn_socket_new can return before peer->sock is reset to NULL, just noticed this in ovpn_nl_peer_modify:
if (attrs[OVPN_A_PEER_SOCKET]) { // ... peer->sock = ovpn_socket_new(sock, peer); if (IS_ERR(peer->sock)) { // ... peer->sock = NULL;
(ovpn_encrypt_post has a similar check on peer->sock->sock->sk->sk_protocol that I don't think is safe either)
/* check if this peer changed its IP address and update
* state
*/
ovpn_peer_float(peer, skb);
/* update source endpoint for this peer */
ovpn_peer_update_local_endpoint(peer, skb);
Why not do both in the same function? They're not called anywhere else (at least in this version of the series). They both modify peer->bind depending on skb_protocol_to_family(skb), and operate under peer->lock.
+void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb) +{
- struct hlist_nulls_head *nhead;
- struct sockaddr_storage ss;
- const u8 *local_ip = NULL;
- struct sockaddr_in6 *sa6;
- struct sockaddr_in *sa;
- struct ovpn_bind *bind;
- sa_family_t family;
- size_t salen;
- rcu_read_lock();
- bind = rcu_dereference(peer->bind);
- if (unlikely(!bind)) {
rcu_read_unlock();
return;
- }
- spin_lock_bh(&peer->lock);
You could take the lock from the start, instead of using rcu_read_lock to get peer->bind. It would guarantee that the bind we got isn't already being replaced just as we wait to update it. And same in ovpn_peer_update_local_endpoint, it would make sure we're updating the local IP for the active bind.
(sorry I didn't think about that last time we discussed this)
- if (likely(ovpn_bind_skb_src_match(bind, skb)))
goto unlock;
- family = skb_protocol_to_family(skb);
On 12/11/2024 11:56, Sabrina Dubroca wrote:
2024-10-29, 11:47:30 +0100, Antonio Quartulli wrote:
diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index 63c140138bf98e5d1df79a2565b666d86513323d..0e8a6f2c76bc7b2ccc287ad1187cf50f033bf261 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -135,6 +135,15 @@ void ovpn_decrypt_post(void *data, int ret)
 	/* keep track of last received authenticated packet for keepalive */
 	peer->last_recv = ktime_get_real_seconds();
- if (peer->sock->sock->sk->sk_protocol == IPPROTO_UDP) {
What prevents peer->sock from being replaced and released concurrently?
Technically nothing. Userspace currently does not even support updating a peer socket at runtime, but I wanted ovpn to be flexible enough from the beginning.
One approach might be to go back to peer->sock being immutable and forget about this.
OTOH, if we want to keep this flexibility (which I think is nice), I think I should make peer->sock an RCU pointer and access it accordingly. Does it make sense?
Or possibly reading the error value that ovpn_socket_new can return before peer->sock is reset to NULL, just noticed this in ovpn_nl_peer_modify:
if (attrs[OVPN_A_PEER_SOCKET]) {
	// ...
	peer->sock = ovpn_socket_new(sock, peer);
	if (IS_ERR(peer->sock)) {
		// ...
		peer->sock = NULL;
(ovpn_encrypt_post has a similar check on peer->sock->sock->sk->sk_protocol that I don't think is safe either)
Yap, agreed.
/* check if this peer changed its IP address and update
* state
*/
ovpn_peer_float(peer, skb);
/* update source endpoint for this peer */
ovpn_peer_update_local_endpoint(peer, skb);
Why not do both in the same function? They're not called anywhere else (at least in this version of the series). They both modify peer->bind depending on skb_protocol_to_family(skb), and operate under peer->lock.
I never considered doing so, as I always assumed the two to be separate features/routines.
I think it's a good idea and I would get rid of a few common instructions (along with acquiring the lock twice). Thanks!
+void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb) +{
- struct hlist_nulls_head *nhead;
- struct sockaddr_storage ss;
- const u8 *local_ip = NULL;
- struct sockaddr_in6 *sa6;
- struct sockaddr_in *sa;
- struct ovpn_bind *bind;
- sa_family_t family;
- size_t salen;
- rcu_read_lock();
- bind = rcu_dereference(peer->bind);
- if (unlikely(!bind)) {
rcu_read_unlock();
return;
- }
- spin_lock_bh(&peer->lock);
You could take the lock from the start, instead of using rcu_read_lock to get peer->bind. It would guarantee that the bind we got isn't already being replaced just as we wait to update it. And same in ovpn_peer_update_local_endpoint, it would make sure we're updating the local IP for the active bind.
(sorry I didn't think about that last time we discussed this)
no worries :) and I like the idea. will do that, thanks.
- if (likely(ovpn_bind_skb_src_match(bind, skb)))
goto unlock;
- family = skb_protocol_to_family(skb);
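For reference, a minimal sketch (untested) of the locking change agreed above: take peer->lock up front and fetch the bind under it, so the bind being compared cannot be replaced while we wait for the lock. It assumes, as the __must_hold() annotation in the quoted patch suggests, that peer->lock is the lock serializing updates to peer->bind; everything elided is unchanged from the quoted code.

    void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb)
    {
            struct ovpn_bind *bind;

            spin_lock_bh(&peer->lock);
            /* peer->lock is the update-side protection for peer->bind,
             * so no separate RCU read section is needed here
             */
            bind = rcu_dereference_protected(peer->bind,
                                             lockdep_is_held(&peer->lock));
            if (unlikely(!bind))
                    goto unlock;

            if (likely(ovpn_bind_skb_src_match(bind, skb)))
                    goto unlock;

            /* ... build the new sockaddr from the skb and call
             * ovpn_peer_reset_sockaddr() exactly as in the quoted code ...
             */
    unlock:
            spin_unlock_bh(&peer->lock);
    }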
2024-11-12, 15:03:00 +0100, Antonio Quartulli wrote:
On 12/11/2024 11:56, Sabrina Dubroca wrote:
2024-10-29, 11:47:30 +0100, Antonio Quartulli wrote:
diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index 63c140138bf98e5d1df79a2565b666d86513323d..0e8a6f2c76bc7b2ccc287ad1187cf50f033bf261 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -135,6 +135,15 @@ void ovpn_decrypt_post(void *data, int ret)
 	/* keep track of last received authenticated packet for keepalive */
 	peer->last_recv = ktime_get_real_seconds();
- if (peer->sock->sock->sk->sk_protocol == IPPROTO_UDP) {
What prevents peer->sock from being replaced and released concurrently?
Technically nothing. Userspace currently does not even support updating a peer socket at runtime, but I wanted ovpn to be flexible enough from the beginning.
Is there a reason to do that? With TCP the peer would have to reconnect, and I guess fully restart the whole process (become a new peer with a new ID etc). With UDP, do you need to replace the socket?
One approach might be to go back to peer->sock being immutable and forget about this.
OTOH, if we want to keep this flexibility (which I think is nice), I think I should make peer->sock an RCU pointer and access it accordingly.
You already use kfree_rcu for ovpn_socket, so the only difference would be the __rcu annotation and helpers? (+ rcu_read_lock/unlock in a few places)
Adding rcu_read_lock for peer->sock in ovpn_tcp_tx_work looks painful... (another place that I missed where things could go bad if the socket was updated in the current implementation, btw)
Maybe save that for later since you don't have a use case for it yet?
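As a rough, untested sketch of the __rcu variant being discussed here (the thread below settles on keeping peer->sock immutable for now, so this is only the deferred idea), the pointer would gain an __rcu annotation and the datapath check would dereference it under RCU:

    /* peer.h: the pointer gains an __rcu annotation */
    struct ovpn_socket __rcu *sock;

    /* ovpn_decrypt_post(): dereference under RCU instead of a plain read */
    struct ovpn_socket *sock;

    rcu_read_lock();
    sock = rcu_dereference(peer->sock);
    if (sock && sock->sock->sk->sk_protocol == IPPROTO_UDP) {
            /* check if this peer changed its IP address and update state */
            ovpn_peer_float(peer, skb);
            /* update source endpoint for this peer */
            ovpn_peer_update_local_endpoint(peer, skb);
    }
    rcu_read_unlock();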
On 13/11/2024 12:25, Sabrina Dubroca wrote:
2024-11-12, 15:03:00 +0100, Antonio Quartulli wrote:
On 12/11/2024 11:56, Sabrina Dubroca wrote:
2024-10-29, 11:47:30 +0100, Antonio Quartulli wrote:
diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index 63c140138bf98e5d1df79a2565b666d86513323d..0e8a6f2c76bc7b2ccc287ad1187cf50f033bf261 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -135,6 +135,15 @@ void ovpn_decrypt_post(void *data, int ret)
 	/* keep track of last received authenticated packet for keepalive */
 	peer->last_recv = ktime_get_real_seconds();
- if (peer->sock->sock->sk->sk_protocol == IPPROTO_UDP) {
What prevents peer->sock from being replaced and released concurrently?
Technically nothing. Userspace currently does not even support updating a peer socket at runtime, but I wanted ovpn to be flexible enough from the beginning.
Is there a reason to do that? With TCP the peer would have to reconnect, and I guess fully restart the whole process (become a new peer with a new ID etc). With UDP, do you need to replace the socket?
At the moment userspace won't try to do that, but I can foresee some future use cases: i.e. a peer that switches to a different interface and needs to open a new socket to keep sending data.
Moreover, in userspace we're currently working on multisocket support (theoretically server side only), therefore I can imagine a peer floating from one socket to the other while keeping the session alive.
This is all work in progress, but not that far in the future.
For TCP, you're right, although at some point we may even implement transport reconnections without losing the VPN state (this is not even planned, just a brain dump).
One approach might be to go back to peer->sock being immutable and forget about this.
OTOH, if we want to keep this flexibility (which I think is nice), I think I should make peer->sock an RCU pointer and access it accordingly.
You already use kfree_rcu for ovpn_socket, so the only difference would be the __rcu annotation and helpers? (+ rcu_read_lock/unlock in a few places)
Adding rcu_read_lock for peer->sock in ovpn_tcp_tx_work looks painful... (another place that I missed where things could go bad if the socket was updated in the current implementation, btw)
Maybe save that for later since you don't have a use case for it yet?
I agree with you. I'll make the socket immutable again and I'll work on this later on.
Thanks a lot for digging with me into this.
Regards,
This change introduces the netlink command needed to add, delete and retrieve/dump known peers. Userspace is expected to use these commands to handle known peer lifecycles.
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
 drivers/net/ovpn/netlink.c | 578 ++++++++++++++++++++++++++++++++++++++++++++-
 drivers/net/ovpn/peer.c    |  48 ++--
 drivers/net/ovpn/peer.h    |   5 +
 3 files changed, 609 insertions(+), 22 deletions(-)
diff --git a/drivers/net/ovpn/netlink.c b/drivers/net/ovpn/netlink.c
index 2cc34eb1d1d870c6705714cb971c3c5dfb04afda..d504445325ef82db04f87367c858adaf025f6297 100644
--- a/drivers/net/ovpn/netlink.c
+++ b/drivers/net/ovpn/netlink.c
@@ -7,6 +7,7 @@
  */
 #include <linux/netdevice.h>
+#include <linux/types.h>
 #include <net/genetlink.h>
 #include <uapi/linux/ovpn.h>
@@ -16,6 +17,10 @@
 #include "io.h"
 #include "netlink.h"
 #include "netlink-gen.h"
+#include "bind.h"
+#include "packet.h"
+#include "peer.h"
+#include "socket.h"
MODULE_ALIAS_GENL_FAMILY(OVPN_FAMILY_NAME);
@@ -86,29 +91,592 @@ void ovpn_nl_post_doit(const struct genl_split_ops *ops, struct sk_buff *skb,
 	netdev_put(ovpn->dev, &ovpn->dev_tracker);
 }
+static int ovpn_nl_attr_sockaddr_remote(struct nlattr **attrs, + struct sockaddr_storage *ss) +{ + struct sockaddr_in6 *sin6; + struct sockaddr_in *sin; + struct in6_addr *in6; + __be16 port = 0; + __be32 *in; + int af; + + ss->ss_family = AF_UNSPEC; + + if (attrs[OVPN_A_PEER_REMOTE_PORT]) + port = nla_get_be16(attrs[OVPN_A_PEER_REMOTE_PORT]); + + if (attrs[OVPN_A_PEER_REMOTE_IPV4]) { + af = AF_INET; + ss->ss_family = AF_INET; + in = nla_data(attrs[OVPN_A_PEER_REMOTE_IPV4]); + } else if (attrs[OVPN_A_PEER_REMOTE_IPV6]) { + af = AF_INET6; + ss->ss_family = AF_INET6; + in6 = nla_data(attrs[OVPN_A_PEER_REMOTE_IPV6]); + } else { + return AF_UNSPEC; + } + + switch (ss->ss_family) { + case AF_INET6: + /* If this is a regular IPv6 just break and move on, + * otherwise switch to AF_INET and extract the IPv4 accordingly + */ + if (!ipv6_addr_v4mapped(in6)) { + sin6 = (struct sockaddr_in6 *)ss; + sin6->sin6_port = port; + memcpy(&sin6->sin6_addr, in6, sizeof(*in6)); + break; + } + + /* v4-mapped-v6 address */ + ss->ss_family = AF_INET; + in = &in6->s6_addr32[3]; + fallthrough; + case AF_INET: + sin = (struct sockaddr_in *)ss; + sin->sin_port = port; + sin->sin_addr.s_addr = *in; + break; + } + + /* don't return ss->ss_family as it may have changed in case of + * v4-mapped-v6 address + */ + return af; +} + +static u8 *ovpn_nl_attr_local_ip(struct nlattr **attrs) +{ + u8 *addr6; + + if (!attrs[OVPN_A_PEER_LOCAL_IPV4] && !attrs[OVPN_A_PEER_LOCAL_IPV6]) + return NULL; + + if (attrs[OVPN_A_PEER_LOCAL_IPV4]) + return nla_data(attrs[OVPN_A_PEER_LOCAL_IPV4]); + + addr6 = nla_data(attrs[OVPN_A_PEER_LOCAL_IPV6]); + /* this is an IPv4-mapped IPv6 address, therefore extract the actual + * v4 address from the last 4 bytes + */ + if (ipv6_addr_v4mapped((struct in6_addr *)addr6)) + return addr6 + 12; + + return addr6; +} + +static int ovpn_nl_peer_precheck(struct ovpn_struct *ovpn, + struct genl_info *info, + struct nlattr **attrs) +{ + if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs, + OVPN_A_PEER_ID)) + return -EINVAL; + + if (attrs[OVPN_A_PEER_REMOTE_IPV4] && attrs[OVPN_A_PEER_REMOTE_IPV6]) { + NL_SET_ERR_MSG_MOD(info->extack, + "cannot specify both remote IPv4 or IPv6 address"); + return -EINVAL; + } + + if (!attrs[OVPN_A_PEER_REMOTE_IPV4] && + !attrs[OVPN_A_PEER_REMOTE_IPV6] && attrs[OVPN_A_PEER_REMOTE_PORT]) { + NL_SET_ERR_MSG_MOD(info->extack, + "cannot specify remote port without IP address"); + return -EINVAL; + } + + if (!attrs[OVPN_A_PEER_REMOTE_IPV4] && + attrs[OVPN_A_PEER_LOCAL_IPV4]) { + NL_SET_ERR_MSG_MOD(info->extack, + "cannot specify local IPv4 address without remote"); + return -EINVAL; + } + + if (!attrs[OVPN_A_PEER_REMOTE_IPV6] && + attrs[OVPN_A_PEER_LOCAL_IPV6]) { + NL_SET_ERR_MSG_MOD(info->extack, + "cannot specify local IPV6 address without remote"); + return -EINVAL; + } + + if (!attrs[OVPN_A_PEER_REMOTE_IPV6] && + attrs[OVPN_A_PEER_REMOTE_IPV6_SCOPE_ID]) { + NL_SET_ERR_MSG_MOD(info->extack, + "cannot specify scope id without remote IPv6 address"); + return -EINVAL; + } + + /* VPN IPs are needed only in MP mode for selecting the right peer */ + if (ovpn->mode == OVPN_MODE_P2P && (attrs[OVPN_A_PEER_VPN_IPV4] || + attrs[OVPN_A_PEER_VPN_IPV6])) { + NL_SET_ERR_MSG_FMT_MOD(info->extack, + "VPN IP unexpected in P2P mode"); + return -EINVAL; + } + + if ((attrs[OVPN_A_PEER_KEEPALIVE_INTERVAL] && + !attrs[OVPN_A_PEER_KEEPALIVE_TIMEOUT]) || + (!attrs[OVPN_A_PEER_KEEPALIVE_INTERVAL] && + attrs[OVPN_A_PEER_KEEPALIVE_TIMEOUT])) { + NL_SET_ERR_MSG_FMT_MOD(info->extack, + "keepalive 
interval and timeout are required together"); + return -EINVAL; + } + + return 0; +} + +/** + * ovpn_nl_peer_modify - modify the peer attributes according to the incoming msg + * @peer: the peer to modify + * @info: generic netlink info from the user request + * @attrs: the attributes from the user request + * + * Return: a negative error code in case of failure, 0 on success or 1 on + * success and the VPN IPs have been modified (requires rehashing in MP + * mode) + */ +static int ovpn_nl_peer_modify(struct ovpn_peer *peer, struct genl_info *info, + struct nlattr **attrs) +{ + struct sockaddr_storage ss = {}; + u32 sockfd, interv, timeout; + struct socket *sock = NULL; + u8 *local_ip = NULL; + bool rehash = false; + int ret; + + if (attrs[OVPN_A_PEER_SOCKET]) { + /* lookup the fd in the kernel table and extract the socket + * object + */ + sockfd = nla_get_u32(attrs[OVPN_A_PEER_SOCKET]); + /* sockfd_lookup() increases sock's refcounter */ + sock = sockfd_lookup(sockfd, &ret); + if (!sock) { + NL_SET_ERR_MSG_FMT_MOD(info->extack, + "cannot lookup peer socket (fd=%u): %d", + sockfd, ret); + return -ENOTSOCK; + } + + /* Only when using UDP as transport protocol the remote endpoint + * can be configured so that ovpn knows where to send packets + * to. + * + * In case of TCP, the socket is connected to the peer and ovpn + * will just send bytes over it, without the need to specify a + * destination. + */ + if (sock->sk->sk_protocol != IPPROTO_UDP && + (attrs[OVPN_A_PEER_REMOTE_IPV4] || + attrs[OVPN_A_PEER_REMOTE_IPV6])) { + NL_SET_ERR_MSG_FMT_MOD(info->extack, + "unexpected remote IP address for non UDP socket"); + sockfd_put(sock); + return -EINVAL; + } + + if (peer->sock) + ovpn_socket_put(peer->sock); + + peer->sock = ovpn_socket_new(sock, peer); + if (IS_ERR(peer->sock)) { + NL_SET_ERR_MSG_FMT_MOD(info->extack, + "cannot encapsulate socket: %ld", + PTR_ERR(peer->sock)); + sockfd_put(sock); + peer->sock = NULL; + return -ENOTSOCK; + } + } + + if (ovpn_nl_attr_sockaddr_remote(attrs, &ss) != AF_UNSPEC) { + /* we carry the local IP in a generic container. + * ovpn_peer_reset_sockaddr() will properly interpret it + * based on ss.ss_family + */ + local_ip = ovpn_nl_attr_local_ip(attrs); + + spin_lock_bh(&peer->lock); + /* set peer sockaddr */ + ret = ovpn_peer_reset_sockaddr(peer, &ss, local_ip); + if (ret < 0) { + NL_SET_ERR_MSG_FMT_MOD(info->extack, + "cannot set peer sockaddr: %d", + ret); + spin_unlock_bh(&peer->lock); + return ret; + } + spin_unlock_bh(&peer->lock); + } + + if (attrs[OVPN_A_PEER_VPN_IPV4]) { + rehash = true; + peer->vpn_addrs.ipv4.s_addr = + nla_get_in_addr(attrs[OVPN_A_PEER_VPN_IPV4]); + } + + if (attrs[OVPN_A_PEER_VPN_IPV6]) { + rehash = true; + peer->vpn_addrs.ipv6 = + nla_get_in6_addr(attrs[OVPN_A_PEER_VPN_IPV6]); + } + + /* when setting the keepalive, both parameters have to be configured */ + if (attrs[OVPN_A_PEER_KEEPALIVE_INTERVAL] && + attrs[OVPN_A_PEER_KEEPALIVE_TIMEOUT]) { + interv = nla_get_u32(attrs[OVPN_A_PEER_KEEPALIVE_INTERVAL]); + timeout = nla_get_u32(attrs[OVPN_A_PEER_KEEPALIVE_TIMEOUT]); + ovpn_peer_keepalive_set(peer, interv, timeout); + } + + netdev_dbg(peer->ovpn->dev, + "%s: peer id=%u endpoint=%pIScp/%s VPN-IPv4=%pI4 VPN-IPv6=%pI6c\n", + __func__, peer->id, &ss, + peer->sock->sock->sk->sk_prot_creator->name, + &peer->vpn_addrs.ipv4.s_addr, &peer->vpn_addrs.ipv6); + + return rehash ? 
1 : 0; +} + int ovpn_nl_peer_new_doit(struct sk_buff *skb, struct genl_info *info) { - return -EOPNOTSUPP; + struct nlattr *attrs[OVPN_A_PEER_MAX + 1]; + struct ovpn_struct *ovpn = info->user_ptr[0]; + struct ovpn_peer *peer; + u32 peer_id; + int ret; + + if (GENL_REQ_ATTR_CHECK(info, OVPN_A_PEER)) + return -EINVAL; + + ret = nla_parse_nested(attrs, OVPN_A_PEER_MAX, info->attrs[OVPN_A_PEER], + ovpn_peer_nl_policy, info->extack); + if (ret) + return ret; + + ret = ovpn_nl_peer_precheck(ovpn, info, attrs); + if (ret < 0) + return ret; + + if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs, + OVPN_A_PEER_SOCKET)) + return -EINVAL; + + peer_id = nla_get_u32(attrs[OVPN_A_PEER_ID]); + peer = ovpn_peer_new(ovpn, peer_id); + if (IS_ERR(peer)) { + NL_SET_ERR_MSG_FMT_MOD(info->extack, + "cannot create new peer object for peer %u: %ld", + peer_id, PTR_ERR(peer)); + return PTR_ERR(peer); + } + + ret = ovpn_nl_peer_modify(peer, info, attrs); + if (ret < 0) + goto peer_release; + + ret = ovpn_peer_add(ovpn, peer); + if (ret < 0) { + NL_SET_ERR_MSG_FMT_MOD(info->extack, + "cannot add new peer (id=%u) to hashtable: %d\n", + peer->id, ret); + goto peer_release; + } + + return 0; + +peer_release: + /* release right away because peer is not used in any context */ + ovpn_peer_release(peer); + + return ret; }
int ovpn_nl_peer_set_doit(struct sk_buff *skb, struct genl_info *info) { - return -EOPNOTSUPP; + struct nlattr *attrs[OVPN_A_PEER_MAX + 1]; + struct ovpn_struct *ovpn = info->user_ptr[0]; + struct ovpn_peer *peer; + u32 peer_id; + int ret; + + if (GENL_REQ_ATTR_CHECK(info, OVPN_A_PEER)) + return -EINVAL; + + ret = nla_parse_nested(attrs, OVPN_A_PEER_MAX, info->attrs[OVPN_A_PEER], + ovpn_peer_nl_policy, info->extack); + if (ret) + return ret; + + ret = ovpn_nl_peer_precheck(ovpn, info, attrs); + if (ret < 0) + return ret; + + peer_id = nla_get_u32(attrs[OVPN_A_PEER_ID]); + peer = ovpn_peer_get_by_id(ovpn, peer_id); + if (!peer) + return -ENOENT; + + ret = ovpn_nl_peer_modify(peer, info, attrs); + if (ret < 0) { + ovpn_peer_put(peer); + return ret; + } + + /* ret == 1 means that VPN IPv4/6 has been modified and rehashing + * is required + */ + if (ret > 0) { + spin_lock_bh(&ovpn->peers->lock); + ovpn_peer_hash_vpn_ip(peer); + spin_unlock_bh(&ovpn->peers->lock); + } + + ovpn_peer_put(peer); + + return 0; +} + +static int ovpn_nl_send_peer(struct sk_buff *skb, const struct genl_info *info, + const struct ovpn_peer *peer, u32 portid, u32 seq, + int flags) +{ + const struct ovpn_bind *bind; + struct nlattr *attr; + void *hdr; + + hdr = genlmsg_put(skb, portid, seq, &ovpn_nl_family, flags, + OVPN_CMD_PEER_GET); + if (!hdr) + return -ENOBUFS; + + attr = nla_nest_start(skb, OVPN_A_PEER); + if (!attr) + goto err; + + if (nla_put_u32(skb, OVPN_A_PEER_ID, peer->id)) + goto err; + + if (peer->vpn_addrs.ipv4.s_addr != htonl(INADDR_ANY)) + if (nla_put_in_addr(skb, OVPN_A_PEER_VPN_IPV4, + peer->vpn_addrs.ipv4.s_addr)) + goto err; + + if (!ipv6_addr_equal(&peer->vpn_addrs.ipv6, &in6addr_any)) + if (nla_put_in6_addr(skb, OVPN_A_PEER_VPN_IPV6, + &peer->vpn_addrs.ipv6)) + goto err; + + if (nla_put_u32(skb, OVPN_A_PEER_KEEPALIVE_INTERVAL, + peer->keepalive_interval) || + nla_put_u32(skb, OVPN_A_PEER_KEEPALIVE_TIMEOUT, + peer->keepalive_timeout)) + goto err; + + rcu_read_lock(); + bind = rcu_dereference(peer->bind); + if (bind) { + if (bind->remote.in4.sin_family == AF_INET) { + if (nla_put_in_addr(skb, OVPN_A_PEER_REMOTE_IPV4, + bind->remote.in4.sin_addr.s_addr) || + nla_put_net16(skb, OVPN_A_PEER_REMOTE_PORT, + bind->remote.in4.sin_port) || + nla_put_in_addr(skb, OVPN_A_PEER_LOCAL_IPV4, + bind->local.ipv4.s_addr)) + goto err_unlock; + } else if (bind->remote.in4.sin_family == AF_INET6) { + if (nla_put_in6_addr(skb, OVPN_A_PEER_REMOTE_IPV6, + &bind->remote.in6.sin6_addr) || + nla_put_u32(skb, OVPN_A_PEER_REMOTE_IPV6_SCOPE_ID, + bind->remote.in6.sin6_scope_id) || + nla_put_net16(skb, OVPN_A_PEER_REMOTE_PORT, + bind->remote.in6.sin6_port) || + nla_put_in6_addr(skb, OVPN_A_PEER_LOCAL_IPV6, + &bind->local.ipv6)) + goto err_unlock; + } + } + rcu_read_unlock(); + + if (nla_put_net16(skb, OVPN_A_PEER_LOCAL_PORT, + inet_sk(peer->sock->sock->sk)->inet_sport) || + /* VPN RX stats */ + nla_put_uint(skb, OVPN_A_PEER_VPN_RX_BYTES, + atomic64_read(&peer->vpn_stats.rx.bytes)) || + nla_put_uint(skb, OVPN_A_PEER_VPN_RX_PACKETS, + atomic64_read(&peer->vpn_stats.rx.packets)) || + /* VPN TX stats */ + nla_put_uint(skb, OVPN_A_PEER_VPN_TX_BYTES, + atomic64_read(&peer->vpn_stats.tx.bytes)) || + nla_put_uint(skb, OVPN_A_PEER_VPN_TX_PACKETS, + atomic64_read(&peer->vpn_stats.tx.packets)) || + /* link RX stats */ + nla_put_uint(skb, OVPN_A_PEER_LINK_RX_BYTES, + atomic64_read(&peer->link_stats.rx.bytes)) || + nla_put_uint(skb, OVPN_A_PEER_LINK_RX_PACKETS, + atomic64_read(&peer->link_stats.rx.packets)) || + /* link TX stats */ + 
nla_put_uint(skb, OVPN_A_PEER_LINK_TX_BYTES, + atomic64_read(&peer->link_stats.tx.bytes)) || + nla_put_uint(skb, OVPN_A_PEER_LINK_TX_PACKETS, + atomic64_read(&peer->link_stats.tx.packets))) + goto err; + + nla_nest_end(skb, attr); + genlmsg_end(skb, hdr); + + return 0; +err_unlock: + rcu_read_unlock(); +err: + genlmsg_cancel(skb, hdr); + return -EMSGSIZE; }
int ovpn_nl_peer_get_doit(struct sk_buff *skb, struct genl_info *info) { - return -EOPNOTSUPP; + struct nlattr *attrs[OVPN_A_PEER_MAX + 1]; + struct ovpn_struct *ovpn = info->user_ptr[0]; + struct ovpn_peer *peer; + struct sk_buff *msg; + u32 peer_id; + int ret; + + if (GENL_REQ_ATTR_CHECK(info, OVPN_A_PEER)) + return -EINVAL; + + ret = nla_parse_nested(attrs, OVPN_A_PEER_MAX, info->attrs[OVPN_A_PEER], + ovpn_peer_nl_policy, info->extack); + if (ret) + return ret; + + if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs, + OVPN_A_PEER_ID)) + return -EINVAL; + + peer_id = nla_get_u32(attrs[OVPN_A_PEER_ID]); + peer = ovpn_peer_get_by_id(ovpn, peer_id); + if (!peer) { + NL_SET_ERR_MSG_FMT_MOD(info->extack, + "cannot find peer with id %u", peer_id); + return -ENOENT; + } + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) { + ret = -ENOMEM; + goto err; + } + + ret = ovpn_nl_send_peer(msg, info, peer, info->snd_portid, + info->snd_seq, 0); + if (ret < 0) { + nlmsg_free(msg); + goto err; + } + + ret = genlmsg_reply(msg, info); +err: + ovpn_peer_put(peer); + return ret; }
int ovpn_nl_peer_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb) { - return -EOPNOTSUPP; + const struct genl_info *info = genl_info_dump(cb); + int bkt, last_idx = cb->args[1], dumped = 0; + struct ovpn_struct *ovpn; + struct ovpn_peer *peer; + + ovpn = ovpn_get_dev_from_attrs(sock_net(cb->skb->sk), info); + if (IS_ERR(ovpn)) + return PTR_ERR(ovpn); + + if (ovpn->mode == OVPN_MODE_P2P) { + /* if we already dumped a peer it means we are done */ + if (last_idx) + goto out; + + rcu_read_lock(); + peer = rcu_dereference(ovpn->peer); + if (peer) { + if (ovpn_nl_send_peer(skb, info, peer, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, + NLM_F_MULTI) == 0) + dumped++; + } + rcu_read_unlock(); + } else { + rcu_read_lock(); + hash_for_each_rcu(ovpn->peers->by_id, bkt, peer, + hash_entry_id) { + /* skip already dumped peers that were dumped by + * previous invocations + */ + if (last_idx > 0) { + last_idx--; + continue; + } + + if (ovpn_nl_send_peer(skb, info, peer, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, + NLM_F_MULTI) < 0) + break; + + /* count peers being dumped during this invocation */ + dumped++; + } + rcu_read_unlock(); + } + +out: + netdev_put(ovpn->dev, &ovpn->dev_tracker); + + /* sum up peers dumped in this message, so that at the next invocation + * we can continue from where we left + */ + cb->args[1] += dumped; + return skb->len; }
int ovpn_nl_peer_del_doit(struct sk_buff *skb, struct genl_info *info) { - return -EOPNOTSUPP; + struct nlattr *attrs[OVPN_A_PEER_MAX + 1]; + struct ovpn_struct *ovpn = info->user_ptr[0]; + struct ovpn_peer *peer; + u32 peer_id; + int ret; + + if (GENL_REQ_ATTR_CHECK(info, OVPN_A_PEER)) + return -EINVAL; + + ret = nla_parse_nested(attrs, OVPN_A_PEER_MAX, info->attrs[OVPN_A_PEER], + ovpn_peer_nl_policy, info->extack); + if (ret) + return ret; + + if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs, + OVPN_A_PEER_ID)) + return -EINVAL; + + peer_id = nla_get_u32(attrs[OVPN_A_PEER_ID]); + + peer = ovpn_peer_get_by_id(ovpn, peer_id); + if (!peer) + return -ENOENT; + + netdev_dbg(ovpn->dev, "%s: peer id=%u\n", __func__, peer->id); + ret = ovpn_peer_del(peer, OVPN_DEL_PEER_REASON_USERSPACE); + ovpn_peer_put(peer); + + return ret; }
int ovpn_nl_key_new_doit(struct sk_buff *skb, struct genl_info *info) diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c index da6215bbb643592e4567e61e4b4976d367ed109c..8cfe1997ec116ae4fe74cd7105d228569e2a66a9 100644 --- a/drivers/net/ovpn/peer.c +++ b/drivers/net/ovpn/peer.c @@ -102,9 +102,9 @@ struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id) * * Return: 0 on success or a negative error code otherwise */ -static int ovpn_peer_reset_sockaddr(struct ovpn_peer *peer, - const struct sockaddr_storage *ss, - const u8 *local_ip) +int ovpn_peer_reset_sockaddr(struct ovpn_peer *peer, + const struct sockaddr_storage *ss, + const u8 *local_ip) __must_hold(&peer->lock) { struct ovpn_bind *bind; @@ -219,7 +219,7 @@ void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb) rcu_read_unlock(); }
-static void ovpn_peer_release(struct ovpn_peer *peer) +void ovpn_peer_release(struct ovpn_peer *peer) { if (peer->sock) ovpn_socket_put(peer->sock); @@ -763,6 +763,32 @@ bool ovpn_peer_check_by_src(struct ovpn_struct *ovpn, struct sk_buff *skb, return match; }
+void ovpn_peer_hash_vpn_ip(struct ovpn_peer *peer) + __must_hold(&peer->ovpn->peers->lock) +{ + struct hlist_nulls_head *nhead; + + if (peer->vpn_addrs.ipv4.s_addr != htonl(INADDR_ANY)) { + /* remove potential old hashing */ + hlist_nulls_del_init_rcu(&peer->hash_entry_transp_addr); + + nhead = ovpn_get_hash_head(peer->ovpn->peers->by_vpn_addr, + &peer->vpn_addrs.ipv4, + sizeof(peer->vpn_addrs.ipv4)); + hlist_nulls_add_head_rcu(&peer->hash_entry_addr4, nhead); + } + + if (!ipv6_addr_any(&peer->vpn_addrs.ipv6)) { + /* remove potential old hashing */ + hlist_nulls_del_init_rcu(&peer->hash_entry_transp_addr); + + nhead = ovpn_get_hash_head(peer->ovpn->peers->by_vpn_addr, + &peer->vpn_addrs.ipv6, + sizeof(peer->vpn_addrs.ipv6)); + hlist_nulls_add_head_rcu(&peer->hash_entry_addr6, nhead); + } +} + /** * ovpn_peer_add_mp - add peer to related tables in a MP instance * @ovpn: the instance to add the peer to @@ -824,19 +850,7 @@ static int ovpn_peer_add_mp(struct ovpn_struct *ovpn, struct ovpn_peer *peer) ovpn_get_hash_head(ovpn->peers->by_id, &peer->id, sizeof(peer->id)));
- if (peer->vpn_addrs.ipv4.s_addr != htonl(INADDR_ANY)) { - nhead = ovpn_get_hash_head(ovpn->peers->by_vpn_addr, - &peer->vpn_addrs.ipv4, - sizeof(peer->vpn_addrs.ipv4)); - hlist_nulls_add_head_rcu(&peer->hash_entry_addr4, nhead); - } - - if (!ipv6_addr_any(&peer->vpn_addrs.ipv6)) { - nhead = ovpn_get_hash_head(ovpn->peers->by_vpn_addr, - &peer->vpn_addrs.ipv6, - sizeof(peer->vpn_addrs.ipv6)); - hlist_nulls_add_head_rcu(&peer->hash_entry_addr6, nhead); - } + ovpn_peer_hash_vpn_ip(peer); out: spin_unlock_bh(&ovpn->peers->lock); return ret; diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h index 940cea5372ec0375cfe3e673154a1e0248978409..1adecd0f79f8f4a202110543223fc448c09e5175 100644 --- a/drivers/net/ovpn/peer.h +++ b/drivers/net/ovpn/peer.h @@ -124,6 +124,7 @@ static inline bool ovpn_peer_hold(struct ovpn_peer *peer) return kref_get_unless_zero(&peer->refcount); }
+void ovpn_peer_release(struct ovpn_peer *peer); void ovpn_peer_release_kref(struct kref *kref);
/** @@ -146,6 +147,7 @@ struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn, struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id); struct ovpn_peer *ovpn_peer_get_by_dst(struct ovpn_struct *ovpn, struct sk_buff *skb); +void ovpn_peer_hash_vpn_ip(struct ovpn_peer *peer); bool ovpn_peer_check_by_src(struct ovpn_struct *ovpn, struct sk_buff *skb, struct ovpn_peer *peer);
@@ -156,5 +158,8 @@ void ovpn_peer_update_local_endpoint(struct ovpn_peer *peer, struct sk_buff *skb);
void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb); +int ovpn_peer_reset_sockaddr(struct ovpn_peer *peer, + const struct sockaddr_storage *ss, + const u8 *local_ip);
#endif /* _NET_OVPN_OVPNPEER_H_ */
2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote:
+static int ovpn_nl_peer_precheck(struct ovpn_struct *ovpn,
struct genl_info *info,
struct nlattr **attrs)
+{
- if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs,
OVPN_A_PEER_ID))
return -EINVAL;
- if (attrs[OVPN_A_PEER_REMOTE_IPV4] && attrs[OVPN_A_PEER_REMOTE_IPV6]) {
NL_SET_ERR_MSG_MOD(info->extack,
"cannot specify both remote IPv4 or IPv6 address");
return -EINVAL;
- }
- if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
!attrs[OVPN_A_PEER_REMOTE_IPV6] && attrs[OVPN_A_PEER_REMOTE_PORT]) {
NL_SET_ERR_MSG_MOD(info->extack,
"cannot specify remote port without IP address");
return -EINVAL;
- }
- if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
attrs[OVPN_A_PEER_LOCAL_IPV4]) {
NL_SET_ERR_MSG_MOD(info->extack,
"cannot specify local IPv4 address without remote");
return -EINVAL;
- }
- if (!attrs[OVPN_A_PEER_REMOTE_IPV6] &&
attrs[OVPN_A_PEER_LOCAL_IPV6]) {
I think these consistency checks should account for v4mapped addresses. With remote=v4mapped and local=v6 we'll end up with an incorrect ipv4 "local" address (taken out of the ipv6 address's first 4B by ovpn_peer_reset_sockaddr). With remote=ipv6 and local=v4mapped, we'll pass the last 4B of OVPN_A_PEER_LOCAL_IPV6 to ovpn_peer_reset_sockaddr and try to read 16B (the full ipv6 address) out of that.
NL_SET_ERR_MSG_MOD(info->extack,
"cannot specify local IPV6 address without remote");
return -EINVAL;
- }
[...]
int ovpn_nl_peer_set_doit(struct sk_buff *skb, struct genl_info *info) {
[...]
- ret = ovpn_nl_peer_modify(peer, info, attrs);
- if (ret < 0) {
ovpn_peer_put(peer);
return ret;
- }
- /* ret == 1 means that VPN IPv4/6 has been modified and rehashing
* is required
*/
- if (ret > 0) {
&& mode == MP ?
I don't see ovpn_nl_peer_modify checking that before returning 1, and in P2P mode ovpn->peers will be NULL.
spin_lock_bh(&ovpn->peers->lock);
ovpn_peer_hash_vpn_ip(peer);
spin_unlock_bh(&ovpn->peers->lock);
- }
- ovpn_peer_put(peer);
- return 0;
+}
int ovpn_nl_peer_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb) {
[...]
- } else {
rcu_read_lock();
hash_for_each_rcu(ovpn->peers->by_id, bkt, peer,
hash_entry_id) {
/* skip already dumped peers that were dumped by
* previous invocations
*/
if (last_idx > 0) {
last_idx--;
continue;
}
If a peer that was dumped during a previous invocation is removed in between, we'll miss one that's still present in the overall dump. I don't know how much it matters (I guess it depends on how the results of this dump are used by userspace), so I'll let you decide if this needs to be fixed immediately or if it can be ignored for now.
if (ovpn_nl_send_peer(skb, info, peer,
NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq,
NLM_F_MULTI) < 0)
break;
/* count peers being dumped during this invocation */
dumped++;
}
rcu_read_unlock();
- }
+out:
- netdev_put(ovpn->dev, &ovpn->dev_tracker);
- /* sum up peers dumped in this message, so that at the next invocation
* we can continue from where we left
*/
- cb->args[1] += dumped;
- return skb->len;
} int ovpn_nl_peer_del_doit(struct sk_buff *skb, struct genl_info *info) {
- return -EOPNOTSUPP;
- struct nlattr *attrs[OVPN_A_PEER_MAX + 1];
- struct ovpn_struct *ovpn = info->user_ptr[0];
- struct ovpn_peer *peer;
- u32 peer_id;
- int ret;
- if (GENL_REQ_ATTR_CHECK(info, OVPN_A_PEER))
return -EINVAL;
- ret = nla_parse_nested(attrs, OVPN_A_PEER_MAX, info->attrs[OVPN_A_PEER],
ovpn_peer_nl_policy, info->extack);
- if (ret)
return ret;
- if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs,
OVPN_A_PEER_ID))
return -EINVAL;
- peer_id = nla_get_u32(attrs[OVPN_A_PEER_ID]);
- peer = ovpn_peer_get_by_id(ovpn, peer_id);
- if (!peer)
maybe c/p the extack from ovpn_nl_peer_get_doit?
return -ENOENT;
- netdev_dbg(ovpn->dev, "%s: peer id=%u\n", __func__, peer->id);
- ret = ovpn_peer_del(peer, OVPN_DEL_PEER_REASON_USERSPACE);
- ovpn_peer_put(peer);
- return ret;
}
On 04/11/2024 16:14, Sabrina Dubroca wrote:
2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote:
+static int ovpn_nl_peer_precheck(struct ovpn_struct *ovpn,
struct genl_info *info,
struct nlattr **attrs)
+{
- if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs,
OVPN_A_PEER_ID))
return -EINVAL;
- if (attrs[OVPN_A_PEER_REMOTE_IPV4] && attrs[OVPN_A_PEER_REMOTE_IPV6]) {
NL_SET_ERR_MSG_MOD(info->extack,
"cannot specify both remote IPv4 or IPv6 address");
return -EINVAL;
- }
- if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
!attrs[OVPN_A_PEER_REMOTE_IPV6] && attrs[OVPN_A_PEER_REMOTE_PORT]) {
NL_SET_ERR_MSG_MOD(info->extack,
"cannot specify remote port without IP address");
return -EINVAL;
- }
- if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
attrs[OVPN_A_PEER_LOCAL_IPV4]) {
NL_SET_ERR_MSG_MOD(info->extack,
"cannot specify local IPv4 address without remote");
return -EINVAL;
- }
- if (!attrs[OVPN_A_PEER_REMOTE_IPV6] &&
attrs[OVPN_A_PEER_LOCAL_IPV6]) {
I think these consistency checks should account for v4mapped addresses. With remote=v4mapped and local=v6 we'll end up with an incorrect ipv4 "local" address (taken out of the ipv6 address's first 4B by ovpn_peer_reset_sockaddr). With remote=ipv6 and local=v4mapped, we'll pass the last 4B of OVPN_A_PEER_LOCAL_IPV6 to ovpn_peer_reset_sockaddr and try to read 16B (the full ipv6 address) out of that.
Right, a v4mapped address would fool this check. How about checking if both or none addresses are v4mapped? This way we should prevent such cases.
NL_SET_ERR_MSG_MOD(info->extack,
"cannot specify local IPV6 address without remote");
return -EINVAL;
- }
[...]
int ovpn_nl_peer_set_doit(struct sk_buff *skb, struct genl_info *info) {
[...]
- ret = ovpn_nl_peer_modify(peer, info, attrs);
- if (ret < 0) {
ovpn_peer_put(peer);
return ret;
- }
- /* ret == 1 means that VPN IPv4/6 has been modified and rehashing
* is required
*/
- if (ret > 0) {
&& mode == MP ?
I don't see ovpn_nl_peer_modify checking that before returning 1, and in P2P mode ovpn->peers will be NULL.
Right. I was wondering if it's better to add the check on the return statement of ovpn_nl_peer_modify...but I think it's more functional to add it here, as per your suggestion.
spin_lock_bh(&ovpn->peers->lock);
ovpn_peer_hash_vpn_ip(peer);
spin_unlock_bh(&ovpn->peers->lock);
- }
- ovpn_peer_put(peer);
- return 0;
+}
int ovpn_nl_peer_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb) {
[...]
- } else {
rcu_read_lock();
hash_for_each_rcu(ovpn->peers->by_id, bkt, peer,
hash_entry_id) {
/* skip already dumped peers that were dumped by
* previous invocations
*/
if (last_idx > 0) {
last_idx--;
continue;
}
If a peer that was dumped during a previous invocation is removed in between, we'll miss one that's still present in the overall dump. I don't know how much it matters (I guess it depends on how the results of this dump are used by userspace), so I'll let you decide if this needs to be fixed immediately or if it can be ignored for now.
True, this is a risk I assumed. Not extremely important if you ask me, but do you have any suggestion how to avoid this in an elegant and lockless way?
IIRC I got inspired by the station dump in the mac80211 code, which probably assumes the same risk.
if (ovpn_nl_send_peer(skb, info, peer,
NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq,
NLM_F_MULTI) < 0)
break;
/* count peers being dumped during this invocation */
dumped++;
}
rcu_read_unlock();
- }
+out:
- netdev_put(ovpn->dev, &ovpn->dev_tracker);
- /* sum up peers dumped in this message, so that at the next invocation
* we can continue from where we left
*/
- cb->args[1] += dumped;
- return skb->len; }
int ovpn_nl_peer_del_doit(struct sk_buff *skb, struct genl_info *info) {
- return -EOPNOTSUPP;
- struct nlattr *attrs[OVPN_A_PEER_MAX + 1];
- struct ovpn_struct *ovpn = info->user_ptr[0];
- struct ovpn_peer *peer;
- u32 peer_id;
- int ret;
- if (GENL_REQ_ATTR_CHECK(info, OVPN_A_PEER))
return -EINVAL;
- ret = nla_parse_nested(attrs, OVPN_A_PEER_MAX, info->attrs[OVPN_A_PEER],
ovpn_peer_nl_policy, info->extack);
- if (ret)
return ret;
- if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs,
OVPN_A_PEER_ID))
return -EINVAL;
- peer_id = nla_get_u32(attrs[OVPN_A_PEER_ID]);
- peer = ovpn_peer_get_by_id(ovpn, peer_id);
- if (!peer)
maybe c/p the extack from ovpn_nl_peer_get_doit?
Yes, will do.
Thanks a lot. Regards,
return -ENOENT;
- netdev_dbg(ovpn->dev, "%s: peer id=%u\n", __func__, peer->id);
- ret = ovpn_peer_del(peer, OVPN_DEL_PEER_REASON_USERSPACE);
- ovpn_peer_put(peer);
- return ret; }
2024-11-12, 15:19:50 +0100, Antonio Quartulli wrote:
On 04/11/2024 16:14, Sabrina Dubroca wrote:
2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote:
+static int ovpn_nl_peer_precheck(struct ovpn_struct *ovpn,
struct genl_info *info,
struct nlattr **attrs)
+{
- if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs,
OVPN_A_PEER_ID))
return -EINVAL;
- if (attrs[OVPN_A_PEER_REMOTE_IPV4] && attrs[OVPN_A_PEER_REMOTE_IPV6]) {
NL_SET_ERR_MSG_MOD(info->extack,
"cannot specify both remote IPv4 or IPv6 address");
return -EINVAL;
- }
- if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
!attrs[OVPN_A_PEER_REMOTE_IPV6] && attrs[OVPN_A_PEER_REMOTE_PORT]) {
NL_SET_ERR_MSG_MOD(info->extack,
"cannot specify remote port without IP address");
return -EINVAL;
- }
- if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
attrs[OVPN_A_PEER_LOCAL_IPV4]) {
NL_SET_ERR_MSG_MOD(info->extack,
"cannot specify local IPv4 address without remote");
return -EINVAL;
- }
- if (!attrs[OVPN_A_PEER_REMOTE_IPV6] &&
attrs[OVPN_A_PEER_LOCAL_IPV6]) {
I think these consistency checks should account for v4mapped addresses. With remote=v4mapped and local=v6 we'll end up with an incorrect ipv4 "local" address (taken out of the ipv6 address's first 4B by ovpn_peer_reset_sockaddr). With remote=ipv6 and local=v4mapped, we'll pass the last 4B of OVPN_A_PEER_LOCAL_IPV6 to ovpn_peer_reset_sockaddr and try to read 16B (the full ipv6 address) out of that.
Right, a v4mapped address would fool this check. How about checking if both or none addresses are v4mapped? This way we should prevent such cases.
I don't know when userspace would use v4mapped addresses, but treating a v4mapped address as a "proper" ipv4 address should work with the rest of the code, since you already have the conversion in ovpn_nl_attr_local_ip and ovpn_nl_attr_sockaddr_remote. So maybe you could do something like (rough idea and completely untested):
static int get_family(attr_v4, attr_v6)
{
	if (attr_v4)
		return AF_INET;

	if (attr_v6) {
		if (ipv6_addr_v4mapped(attr_v6))
			return AF_INET;
		return AF_INET6;
	}

	return AF_UNSPEC;
}
// in _precheck:
// keep the attrs[OVPN_A_PEER_REMOTE_IPV4] && attrs[OVPN_A_PEER_REMOTE_IPV6] check
// maybe add a similar one for LOCAL_IPV4 && LOCAL_IPV6
remote_family = get_family(attrs[OVPN_A_PEER_REMOTE_IPV4],
			   attrs[OVPN_A_PEER_REMOTE_IPV6]);
local_family = get_family(attrs[OVPN_A_PEER_LOCAL_IPV4],
			   attrs[OVPN_A_PEER_LOCAL_IPV6]);
if (remote_family != local_family) {
	extack "incompatible address families";
	return -EINVAL;
}
That would mirror the conversion that ovpn_nl_attr_local_ip/ovpn_nl_attr_sockaddr_remote do.
int ovpn_nl_peer_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb) {
[...]
- } else {
rcu_read_lock();
hash_for_each_rcu(ovpn->peers->by_id, bkt, peer,
hash_entry_id) {
/* skip already dumped peers that were dumped by
* previous invocations
*/
if (last_idx > 0) {
last_idx--;
continue;
}
If a peer that was dumped during a previous invocation is removed in between, we'll miss one that's still present in the overall dump. I don't know how much it matters (I guess it depends on how the results of this dump are used by userspace), so I'll let you decide if this needs to be fixed immediately or if it can be ignored for now.
True, this is a risk I assumed. Not extremely important if you ask me, but do you have any suggestion how to avoid this in an elegant and lockless way?
No, inconsistent dumps are an old problem with netlink, so I'm just mentioning it as something to be aware of. You can add genl_dump_check_consistent to let userspace know that it may have gotten incorrect information (you'll need to keep a counter and increment it when a peer is added/removed). On a very busy server you may never manage to get a consistent dump, if peers are going up and down very fast.
There's been some progress for dumping netdevices in commit 759ab1edb56c ("net: store netdevs in an xarray"), but that can still return incorrect data.
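A rough, untested sketch of the genl_dump_check_consistent() idea mentioned above; "peers_generation" is a hypothetical counter that would have to be added to struct ovpn_struct and incremented on every peer add/delete, and the netlink_callback would have to be passed down to ovpn_nl_send_peer() from the dump path:

    /* in ovpn_nl_peer_get_dumpit(), before walking the peers: */
    cb->seq = READ_ONCE(ovpn->peers_generation);

    /* in ovpn_nl_send_peer(), right after genlmsg_put(), dump path only:
     * the core flags the message with NLM_F_DUMP_INTR if the generation
     * changed between invocations, so userspace knows the dump may be
     * inconsistent and can retry
     */
    genl_dump_check_consistent(cb, hdr);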
On 13/11/2024 17:56, Sabrina Dubroca wrote:
2024-11-12, 15:19:50 +0100, Antonio Quartulli wrote:
On 04/11/2024 16:14, Sabrina Dubroca wrote:
2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote:
+static int ovpn_nl_peer_precheck(struct ovpn_struct *ovpn,
struct genl_info *info,
struct nlattr **attrs)
+{
- if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs,
OVPN_A_PEER_ID))
return -EINVAL;
- if (attrs[OVPN_A_PEER_REMOTE_IPV4] && attrs[OVPN_A_PEER_REMOTE_IPV6]) {
NL_SET_ERR_MSG_MOD(info->extack,
"cannot specify both remote IPv4 or IPv6 address");
return -EINVAL;
- }
- if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
!attrs[OVPN_A_PEER_REMOTE_IPV6] && attrs[OVPN_A_PEER_REMOTE_PORT]) {
NL_SET_ERR_MSG_MOD(info->extack,
"cannot specify remote port without IP address");
return -EINVAL;
- }
- if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
attrs[OVPN_A_PEER_LOCAL_IPV4]) {
NL_SET_ERR_MSG_MOD(info->extack,
"cannot specify local IPv4 address without remote");
return -EINVAL;
- }
- if (!attrs[OVPN_A_PEER_REMOTE_IPV6] &&
attrs[OVPN_A_PEER_LOCAL_IPV6]) {
I think these consistency checks should account for v4mapped addresses. With remote=v4mapped and local=v6 we'll end up with an incorrect ipv4 "local" address (taken out of the ipv6 address's first 4B by ovpn_peer_reset_sockaddr). With remote=ipv6 and local=v4mapped, we'll pass the last 4B of OVPN_A_PEER_LOCAL_IPV6 to ovpn_peer_reset_sockaddr and try to read 16B (the full ipv6 address) out of that.
Right, a v4mapped address would fool this check. How about checking if both or none addresses are v4mapped? This way we should prevent such cases.
I don't know when userspace would use v4mapped addresses,
It happens when listening on [::] with a v6 socket that has no "IPV6_V6ONLY" set to true (you can check ipv6(7) for more details). This socket can receive IPv4 connections, which are implemented using v4mapped addresses. In this case both remote and local are going to be v4mapped. However, the sanity check should make sure nobody can inject bogus combinations.
but treating a v4mapped address as a "proper" ipv4 address should work with the rest of the code, since you already have the conversion in ovpn_nl_attr_local_ip and ovpn_nl_attr_sockaddr_remote. So maybe you could do something like (rough idea and completely untested):
static int get_family(attr_v4, attr_v6)
{
	if (attr_v4)
		return AF_INET;

	if (attr_v6) {
		if (ipv6_addr_v4mapped(attr_v6))
			return AF_INET;
		return AF_INET6;
	}

	return AF_UNSPEC;
}

// in _precheck:
// keep the attrs[OVPN_A_PEER_REMOTE_IPV4] && attrs[OVPN_A_PEER_REMOTE_IPV6] check
// maybe add a similar one for LOCAL_IPV4 && LOCAL_IPV6
the latter is already covered by:
if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
    attrs[OVPN_A_PEER_LOCAL_IPV4]) {
	NL_SET_ERR_MSG_MOD(info->extack,
			   "cannot specify local IPv4 address without remote");
	return -EINVAL;
}

if (!attrs[OVPN_A_PEER_REMOTE_IPV6] &&
    attrs[OVPN_A_PEER_LOCAL_IPV6]) {
	NL_SET_ERR_MSG_MOD(info->extack,
			   "cannot specify local IPV6 address without remote");
	return -EINVAL;
}
remote_family = get_family(attrs[OVPN_A_PEER_REMOTE_IPV4],
			   attrs[OVPN_A_PEER_REMOTE_IPV6]);
local_family = get_family(attrs[OVPN_A_PEER_LOCAL_IPV4],
			   attrs[OVPN_A_PEER_LOCAL_IPV6]);
if (remote_family != local_family) {
	extack "incompatible address families";
	return -EINVAL;
}
That would mirror the conversion that ovpn_nl_attr_local_ip/ovpn_nl_attr_sockaddr_remote do.
Yeah, pretty much what I was suggesting, but in a more explicit manner. I like it.
int ovpn_nl_peer_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb) {
[...]
- } else {
rcu_read_lock();
hash_for_each_rcu(ovpn->peers->by_id, bkt, peer,
hash_entry_id) {
/* skip already dumped peers that were dumped by
* previous invocations
*/
if (last_idx > 0) {
last_idx--;
continue;
}
If a peer that was dumped during a previous invocation is removed in between, we'll miss one that's still present in the overall dump. I don't know how much it matters (I guess it depends on how the results of this dump are used by userspace), so I'll let you decide if this needs to be fixed immediately or if it can be ignored for now.
True, this is a risk I assumed. Not extremely important if you ask me, but do you have any suggestion how to avoid this in an elegant and lockless way?
No, inconsistent dumps are an old problem with netlink, so I'm just mentioning it as something to be aware of. You can add genl_dump_check_consistent to let userspace know that it may have gotten incorrect information (you'll need to keep a counter and increment it when a peer is added/removed). On a very busy server you may never manage to get a consistent dump, if peers are going up and down very fast.
There's been some progress for dumping netdevices in commit 759ab1edb56c ("net: store netdevs in an xarray"), but that can still return incorrect data.
Got it. I'll keep it as it is for now, since this is not critical.
Thanks a lot.
Regards,
2024-11-14, 10:21:18 +0100, Antonio Quartulli wrote:
On 13/11/2024 17:56, Sabrina Dubroca wrote:
2024-11-12, 15:19:50 +0100, Antonio Quartulli wrote:
On 04/11/2024 16:14, Sabrina Dubroca wrote:
2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote:
+static int ovpn_nl_peer_precheck(struct ovpn_struct *ovpn,
struct genl_info *info,
struct nlattr **attrs)
+{
- if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs,
OVPN_A_PEER_ID))
return -EINVAL;
- if (attrs[OVPN_A_PEER_REMOTE_IPV4] && attrs[OVPN_A_PEER_REMOTE_IPV6]) {
NL_SET_ERR_MSG_MOD(info->extack,
"cannot specify both remote IPv4 or IPv6 address");
return -EINVAL;
- }
- if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
!attrs[OVPN_A_PEER_REMOTE_IPV6] && attrs[OVPN_A_PEER_REMOTE_PORT]) {
NL_SET_ERR_MSG_MOD(info->extack,
"cannot specify remote port without IP address");
return -EINVAL;
- }
- if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
attrs[OVPN_A_PEER_LOCAL_IPV4]) {
NL_SET_ERR_MSG_MOD(info->extack,
"cannot specify local IPv4 address without remote");
return -EINVAL;
- }
- if (!attrs[OVPN_A_PEER_REMOTE_IPV6] &&
attrs[OVPN_A_PEER_LOCAL_IPV6]) {
I think these consistency checks should account for v4mapped addresses. With remote=v4mapped and local=v6 we'll end up with an incorrect ipv4 "local" address (taken out of the ipv6 address's first 4B by ovpn_peer_reset_sockaddr). With remote=ipv6 and local=v4mapped, we'll pass the last 4B of OVPN_A_PEER_LOCAL_IPV6 to ovpn_peer_reset_sockaddr and try to read 16B (the full ipv6 address) out of that.
Right, a v4mapped address would fool this check. How about checking if both or none addresses are v4mapped? This way we should prevent such cases.
I don't know when userspace would use v4mapped addresses,
It happens when listening on [::] with a v6 socket that has no "IPV6_V6ONLY" set to true (you can check ipv6(7) for more details). This socket can receive IPv4 connections, which are implemented using v4mapped addresses. In this case both remote and local are going to be v4mapped.
I'm familiar with v4mapped addresses, but I wasn't sure the userspace part would actually pass them as the peer address. But I guess it would when the peer connects over ipv4 on an ipv6 socket.
So the combination of PEER_IPV4 with LOCAL_IPV6(v4mapped) should never happen? In that case I guess we just need to check that we got 2 attributes of the same type (both _IPV4 or both _IPV6) and if we got _IPV6, that they're either both v4mapped or both not. Might be a tiny bit simpler than what I was suggesting below.
However, the sanity check should make sure nobody can inject bogus combinations.
but treating a v4mapped address as a "proper" ipv4 address should work with the rest of the code, since you already have the conversion in ovpn_nl_attr_local_ip and ovpn_nl_attr_sockaddr_remote. So maybe you could do something like (rough idea and completely untested):
static int get_family(attr_v4, attr_v6)
{
	if (attr_v4)
		return AF_INET;

	if (attr_v6) {
		if (ipv6_addr_v4mapped(attr_v6))
			return AF_INET;
		return AF_INET6;
	}

	return AF_UNSPEC;
}

// in _precheck:
// keep the attrs[OVPN_A_PEER_REMOTE_IPV4] && attrs[OVPN_A_PEER_REMOTE_IPV6] check
// maybe add a similar one for LOCAL_IPV4 && LOCAL_IPV6
the latter is already covered by:
if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
    attrs[OVPN_A_PEER_LOCAL_IPV4]) {
	NL_SET_ERR_MSG_MOD(info->extack,
			   "cannot specify local IPv4 address without remote");
	return -EINVAL;
}

if (!attrs[OVPN_A_PEER_REMOTE_IPV6] &&
    attrs[OVPN_A_PEER_LOCAL_IPV6]) {
	NL_SET_ERR_MSG_MOD(info->extack,
			   "cannot specify local IPV6 address without remote");
	return -EINVAL;
}
LOCAL_IPV4 combined with REMOTE_IPV6 should be fine if the remote is v4mapped. And conversely, LOCAL_IPV6 combined with REMOTE_IPV6 isn't ok if remote is v4mapped. So those checks should go away and be replaced with the "get_family" thing, but that requires at most one of the _IPV4/_IPV6 attributes to be present to behave consistently.
remote_family = get_family(attrs[OVPN_A_PEER_REMOTE_IPV4],
			   attrs[OVPN_A_PEER_REMOTE_IPV6]);
local_family = get_family(attrs[OVPN_A_PEER_LOCAL_IPV4],
			   attrs[OVPN_A_PEER_LOCAL_IPV6]);
if (remote_family != local_family) {
	extack "incompatible address families";
	return -EINVAL;
}
That would mirror the conversion that ovpn_nl_attr_local_ip/ovpn_nl_attr_sockaddr_remote do.
Yeah, pretty much what I was suggesting, but in a more explicit manner. I like it.
Cool.
BTW, I guess scope_id should only be used when it's not a v4mapped address? So the "cannot specify scope id without remote IPv6 address" check should probably use:
if (remote_family != AF_INET6)
(or split it into !attrs[OVPN_A_PEER_REMOTE_IPV6] and remote_family != AF_INET6 to have a fully specific extack message, but maybe that's overkill)
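A small sketch (untested) of that split check, with remote_family coming from the get_family() helper sketched earlier in the thread:

    if (attrs[OVPN_A_PEER_REMOTE_IPV6_SCOPE_ID]) {
            if (!attrs[OVPN_A_PEER_REMOTE_IPV6]) {
                    NL_SET_ERR_MSG_MOD(info->extack,
                                       "cannot specify scope id without remote IPv6 address");
                    return -EINVAL;
            }
            if (remote_family != AF_INET6) {
                    NL_SET_ERR_MSG_MOD(info->extack,
                                       "cannot specify scope id with a v4-mapped remote address");
                    return -EINVAL;
            }
    }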
On 20/11/2024 12:12, Sabrina Dubroca wrote:
2024-11-14, 10:21:18 +0100, Antonio Quartulli wrote:
On 13/11/2024 17:56, Sabrina Dubroca wrote:
2024-11-12, 15:19:50 +0100, Antonio Quartulli wrote:
On 04/11/2024 16:14, Sabrina Dubroca wrote:
2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote:
+static int ovpn_nl_peer_precheck(struct ovpn_struct *ovpn,
struct genl_info *info,
struct nlattr **attrs)
+{
- if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs,
OVPN_A_PEER_ID))
return -EINVAL;
- if (attrs[OVPN_A_PEER_REMOTE_IPV4] && attrs[OVPN_A_PEER_REMOTE_IPV6]) {
NL_SET_ERR_MSG_MOD(info->extack,
"cannot specify both remote IPv4 or IPv6 address");
return -EINVAL;
- }
- if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
!attrs[OVPN_A_PEER_REMOTE_IPV6] && attrs[OVPN_A_PEER_REMOTE_PORT]) {
NL_SET_ERR_MSG_MOD(info->extack,
"cannot specify remote port without IP address");
return -EINVAL;
- }
- if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
attrs[OVPN_A_PEER_LOCAL_IPV4]) {
NL_SET_ERR_MSG_MOD(info->extack,
"cannot specify local IPv4 address without remote");
return -EINVAL;
- }
- if (!attrs[OVPN_A_PEER_REMOTE_IPV6] &&
attrs[OVPN_A_PEER_LOCAL_IPV6]) {
I think these consistency checks should account for v4mapped addresses. With remote=v4mapped and local=v6 we'll end up with an incorrect ipv4 "local" address (taken out of the ipv6 address's first 4B by ovpn_peer_reset_sockaddr). With remote=ipv6 and local=v4mapped, we'll pass the last 4B of OVPN_A_PEER_LOCAL_IPV6 to ovpn_peer_reset_sockaddr and try to read 16B (the full ipv6 address) out of that.
Right, a v4mapped address would fool this check. How about checking if both or none addresses are v4mapped? This way we should prevent such cases.
I don't know when userspace would use v4mapped addresses,
It happens when listening on [::] with a v6 socket that has no "IPV6_V6ONLY" set to true (you can check ipv6(7) for more details). This socket can receive IPv4 connections, which are implemented using v4mapped addresses. In this case both remote and local are going to be v4mapped.
I'm familiar with v4mapped addresses, but I wasn't sure the userspace part would actually pass them as the peer address. But I guess it would when the peer connects over ipv4 on an ipv6 socket.
So the combination of PEER_IPV4 with LOCAL_IPV6(v4mapped) should never happen? In that case I guess we just need to check that we got 2 attributes of the same type (both _IPV4 or both _IPV6) and if we got _IPV6, that they're either both v4mapped or both not. Might be a tiny bit simpler than what I was suggesting below.
Exactly - this is what I was originally suggesting, but your solution is just a bit cleaner imho.
However, the sanity check should make sure nobody can inject bogus combinations.
but treating a v4mapped address as a "proper" ipv4 address should work with the rest of the code, since you already have the conversion in ovpn_nl_attr_local_ip and ovpn_nl_attr_sockaddr_remote. So maybe you could do something like (rough idea and completely untested):
static int get_family(attr_v4, attr_v6)
{
	if (attr_v4)
		return AF_INET;

	if (attr_v6) {
		if (ipv6_addr_v4mapped(attr_v6))
			return AF_INET;
		return AF_INET6;
	}

	return AF_UNSPEC;
}

// in _precheck:
// keep the attrs[OVPN_A_PEER_REMOTE_IPV4] && attrs[OVPN_A_PEER_REMOTE_IPV6] check
// maybe add a similar one for LOCAL_IPV4 && LOCAL_IPV6
the latter is already covered by:
if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
    attrs[OVPN_A_PEER_LOCAL_IPV4]) {
	NL_SET_ERR_MSG_MOD(info->extack,
			   "cannot specify local IPv4 address without remote");
	return -EINVAL;
}

if (!attrs[OVPN_A_PEER_REMOTE_IPV6] &&
    attrs[OVPN_A_PEER_LOCAL_IPV6]) {
	NL_SET_ERR_MSG_MOD(info->extack,
			   "cannot specify local IPV6 address without remote");
	return -EINVAL;
}
LOCAL_IPV4 combined with REMOTE_IPV6 should be fine if the remote is v4mapped. And conversely, LOCAL_IPV6 combined with REMOTE_IPV6 isn't ok if remote is v4mapped. So those checks should go away and be replaced with the "get_family" thing, but that requires at most one of the _IPV4/_IPV6 attributes to be present to behave consistently.
I don't expect to receive a mix of _IPV4 and _IPV6, because the assumption is that either both addresses are v4mapped or none.
Userspace fetches the addresses from the received packet, so I presume they will both be exposed as v4mapped if we are in this special case.
Hence, I don't truly want to allow combining them.
Does it make sense?
remote_family = get_family(attrs[OVPN_A_PEER_REMOTE_IPV4],
			   attrs[OVPN_A_PEER_REMOTE_IPV6]);
local_family = get_family(attrs[OVPN_A_PEER_LOCAL_IPV4],
			   attrs[OVPN_A_PEER_LOCAL_IPV6]);
if (remote_family != local_family) {
	extack "incompatible address families";
	return -EINVAL;
}
That would mirror the conversion that ovpn_nl_attr_local_ip/ovpn_nl_attr_sockaddr_remote do.
Yeah, pretty much what I was suggesting, but in a more explicit manner. I like it.
Cool.
BTW, I guess scope_id should only be used when it's not a v4mapped address? So the "cannot specify scope id without remote IPv6 address" check should probably use:
if (remote_family != AF_INET6)
Right!
(or split it into !attrs[OVPN_A_PEER_REMOTE_IPV6] and remote_family != AF_INET6 to have a fully specific extack message, but maybe that's overkill)
Yeah, maybe splitting works better.
Thanks a lot!
Regards,
2024-11-20, 12:34:08 +0100, Antonio Quartulli wrote:
On 20/11/2024 12:12, Sabrina Dubroca wrote:
[...]
I don't know when userspace would use v4mapped addresses,
It happens when listening on [::] with a v6 socket that has no "IPV6_V6ONLY" set to true (you can check ipv6(7) for more details). This socket can receive IPv4 connections, which are implemented using v4mapped addresses. In this case both remote and local are going to be v4mapped.
I'm familiar with v4mapped addresses, but I wasn't sure the userspace part would actually pass them as the peer address. But I guess it would when the peer connects over ipv4 on an ipv6 socket.
So the combination of PEER_IPV4 with LOCAL_IPV6(v4mapped) should never happen? In that case I guess we just need to check that we got 2 attributes of the same type (both _IPV4 or both _IPV6) and if we got _IPV6, that they're either both v4mapped or both not. Might be a tiny bit simpler than what I was suggesting below.
Exactly - this is what I was originally suggesting, but your solution is just a bit cleaner imho.
Ok.
However, the sanity check should make sure nobody can inject bogus combinations.
but treating a v4mapped address as a "proper" ipv4 address should work with the rest of the code, since you already have the conversion in ovpn_nl_attr_local_ip and ovpn_nl_attr_sockaddr_remote. So maybe you could do something like (rough idea and completely untested):
static int get_family(attr_v4, attr_v6)
{
	if (attr_v4)
		return AF_INET;

	if (attr_v6) {
		if (ipv6_addr_v4mapped(attr_v6))
			return AF_INET;
		return AF_INET6;
	}

	return AF_UNSPEC;
}

// in _precheck:
// keep the attrs[OVPN_A_PEER_REMOTE_IPV4] && attrs[OVPN_A_PEER_REMOTE_IPV6] check
// maybe add a similar one for LOCAL_IPV4 && LOCAL_IPV6
the latter is already covered by:
	if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
	    attrs[OVPN_A_PEER_LOCAL_IPV4]) {
		NL_SET_ERR_MSG_MOD(info->extack,
				   "cannot specify local IPv4 address without remote");
		return -EINVAL;
	}

	if (!attrs[OVPN_A_PEER_REMOTE_IPV6] &&
	    attrs[OVPN_A_PEER_LOCAL_IPV6]) {
		NL_SET_ERR_MSG_MOD(info->extack,
				   "cannot specify local IPV6 address without remote");
		return -EINVAL;
	}
LOCAL_IPV4 combined with REMOTE_IPV6 should be fine if the remote is v4mapped. And conversely, LOCAL_IPV6 combined with REMOTE_IPV6 isn't ok if the remote is v4mapped but the local isn't. So those checks should go away and be replaced with the "get_family" thing, which behaves consistently only as long as at most one of the _IPV4/_IPV6 attributes is present for each endpoint.
I don't expect to receive a mix of _IPV4 and _IPV6, because the assumption is that either both addresses are v4mapped or none.
Userspace fetches the addresses from the received packet, so I presume they will both be exposed as v4mapped if we are in this special case.
Hence, I don't truly want to allow combining them.
Does it make sense?
Yup, thanks.
2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote:
+static int ovpn_nl_peer_modify(struct ovpn_peer *peer, struct genl_info *info,
struct nlattr **attrs)
+{
- struct sockaddr_storage ss = {};
- u32 sockfd, interv, timeout;
- struct socket *sock = NULL;
- u8 *local_ip = NULL;
- bool rehash = false;
- int ret;
- if (attrs[OVPN_A_PEER_SOCKET]) {
/* lookup the fd in the kernel table and extract the socket
* object
*/
sockfd = nla_get_u32(attrs[OVPN_A_PEER_SOCKET]);
/* sockfd_lookup() increases sock's refcounter */
sock = sockfd_lookup(sockfd, &ret);
if (!sock) {
NL_SET_ERR_MSG_FMT_MOD(info->extack,
"cannot lookup peer socket (fd=%u): %d",
sockfd, ret);
return -ENOTSOCK;
}
/* Only when using UDP as transport protocol the remote endpoint
* can be configured so that ovpn knows where to send packets
* to.
*
* In case of TCP, the socket is connected to the peer and ovpn
* will just send bytes over it, without the need to specify a
* destination.
*/
if (sock->sk->sk_protocol != IPPROTO_UDP &&
(attrs[OVPN_A_PEER_REMOTE_IPV4] ||
attrs[OVPN_A_PEER_REMOTE_IPV6])) {
NL_SET_ERR_MSG_FMT_MOD(info->extack,
"unexpected remote IP address for non UDP socket");
sockfd_put(sock);
return -EINVAL;
}
if (peer->sock)
ovpn_socket_put(peer->sock);
peer->sock = ovpn_socket_new(sock, peer);
I don't see anything preventing concurrent updates of peer->sock. I think peer->lock should be taken from the start of ovpn_nl_peer_modify. Concurrent changes to peer->vpn_addrs and peer->keepalive_* are also not prevented with the current code.
if (IS_ERR(peer->sock)) {
NL_SET_ERR_MSG_FMT_MOD(info->extack,
"cannot encapsulate socket: %ld",
PTR_ERR(peer->sock));
sockfd_put(sock);
peer->sock = NULL;
return -ENOTSOCK;
}
- }
- if (ovpn_nl_attr_sockaddr_remote(attrs, &ss) != AF_UNSPEC) {
/* we carry the local IP in a generic container.
* ovpn_peer_reset_sockaddr() will properly interpret it
* based on ss.ss_family
*/
local_ip = ovpn_nl_attr_local_ip(attrs);
spin_lock_bh(&peer->lock);
/* set peer sockaddr */
ret = ovpn_peer_reset_sockaddr(peer, &ss, local_ip);
if (ret < 0) {
NL_SET_ERR_MSG_FMT_MOD(info->extack,
"cannot set peer sockaddr: %d",
ret);
spin_unlock_bh(&peer->lock);
return ret;
}
spin_unlock_bh(&peer->lock);
- }
- if (attrs[OVPN_A_PEER_VPN_IPV4]) {
rehash = true;
peer->vpn_addrs.ipv4.s_addr =
nla_get_in_addr(attrs[OVPN_A_PEER_VPN_IPV4]);
- }
- if (attrs[OVPN_A_PEER_VPN_IPV6]) {
rehash = true;
peer->vpn_addrs.ipv6 =
nla_get_in6_addr(attrs[OVPN_A_PEER_VPN_IPV6]);
- }
- /* when setting the keepalive, both parameters have to be configured */
- if (attrs[OVPN_A_PEER_KEEPALIVE_INTERVAL] &&
attrs[OVPN_A_PEER_KEEPALIVE_TIMEOUT]) {
interv = nla_get_u32(attrs[OVPN_A_PEER_KEEPALIVE_INTERVAL]);
timeout = nla_get_u32(attrs[OVPN_A_PEER_KEEPALIVE_TIMEOUT]);
ovpn_peer_keepalive_set(peer, interv, timeout);
- }
- netdev_dbg(peer->ovpn->dev,
"%s: peer id=%u endpoint=%pIScp/%s VPN-IPv4=%pI4 VPN-IPv6=%pI6c\n",
__func__, peer->id, &ss,
peer->sock->sock->sk->sk_prot_creator->name,
&peer->vpn_addrs.ipv4.s_addr, &peer->vpn_addrs.ipv6);
- return rehash ? 1 : 0;
+}
[...]
+void ovpn_peer_hash_vpn_ip(struct ovpn_peer *peer)
- __must_hold(&peer->ovpn->peers->lock)
Changes to peer->vpn_addrs are not protected by peers->lock, so those could be getting updated while we're rehashing (and taking peer->lock in ovpn_nl_peer_modify as I'm suggesting above also wouldn't prevent that).
+{
- struct hlist_nulls_head *nhead;
- if (peer->vpn_addrs.ipv4.s_addr != htonl(INADDR_ANY)) {
/* remove potential old hashing */
hlist_nulls_del_init_rcu(&peer->hash_entry_transp_addr);
nhead = ovpn_get_hash_head(peer->ovpn->peers->by_vpn_addr,
&peer->vpn_addrs.ipv4,
sizeof(peer->vpn_addrs.ipv4));
hlist_nulls_add_head_rcu(&peer->hash_entry_addr4, nhead);
- }
- if (!ipv6_addr_any(&peer->vpn_addrs.ipv6)) {
/* remove potential old hashing */
hlist_nulls_del_init_rcu(&peer->hash_entry_transp_addr);
nhead = ovpn_get_hash_head(peer->ovpn->peers->by_vpn_addr,
&peer->vpn_addrs.ipv6,
sizeof(peer->vpn_addrs.ipv6));
hlist_nulls_add_head_rcu(&peer->hash_entry_addr6, nhead);
- }
+}
On 11/11/2024 16:41, Sabrina Dubroca wrote:
2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote:
+static int ovpn_nl_peer_modify(struct ovpn_peer *peer, struct genl_info *info,
struct nlattr **attrs)
+{
- struct sockaddr_storage ss = {};
- u32 sockfd, interv, timeout;
- struct socket *sock = NULL;
- u8 *local_ip = NULL;
- bool rehash = false;
- int ret;
- if (attrs[OVPN_A_PEER_SOCKET]) {
/* lookup the fd in the kernel table and extract the socket
* object
*/
sockfd = nla_get_u32(attrs[OVPN_A_PEER_SOCKET]);
/* sockfd_lookup() increases sock's refcounter */
sock = sockfd_lookup(sockfd, &ret);
if (!sock) {
NL_SET_ERR_MSG_FMT_MOD(info->extack,
"cannot lookup peer socket (fd=%u): %d",
sockfd, ret);
return -ENOTSOCK;
}
/* Only when using UDP as transport protocol the remote endpoint
* can be configured so that ovpn knows where to send packets
* to.
*
* In case of TCP, the socket is connected to the peer and ovpn
* will just send bytes over it, without the need to specify a
* destination.
*/
if (sock->sk->sk_protocol != IPPROTO_UDP &&
(attrs[OVPN_A_PEER_REMOTE_IPV4] ||
attrs[OVPN_A_PEER_REMOTE_IPV6])) {
NL_SET_ERR_MSG_FMT_MOD(info->extack,
"unexpected remote IP address for non UDP socket");
sockfd_put(sock);
return -EINVAL;
}
if (peer->sock)
ovpn_socket_put(peer->sock);
peer->sock = ovpn_socket_new(sock, peer);
I don't see anything preventing concurrent updates of peer->sock. I think peer->lock should be taken from the start of ovpn_nl_peer_modify. Concurrent changes to peer->vpn_addrs and peer->keepalive_* are also not prevented with the current code.
Yeah, this came up to my mind as well when checking the keepalive worker code.
I'll make sure all updates happen under lock.
if (IS_ERR(peer->sock)) {
NL_SET_ERR_MSG_FMT_MOD(info->extack,
"cannot encapsulate socket: %ld",
PTR_ERR(peer->sock));
sockfd_put(sock);
peer->sock = NULL;
return -ENOTSOCK;
}
- }
- if (ovpn_nl_attr_sockaddr_remote(attrs, &ss) != AF_UNSPEC) {
/* we carry the local IP in a generic container.
* ovpn_peer_reset_sockaddr() will properly interpret it
* based on ss.ss_family
*/
local_ip = ovpn_nl_attr_local_ip(attrs);
spin_lock_bh(&peer->lock);
/* set peer sockaddr */
ret = ovpn_peer_reset_sockaddr(peer, &ss, local_ip);
if (ret < 0) {
NL_SET_ERR_MSG_FMT_MOD(info->extack,
"cannot set peer sockaddr: %d",
ret);
spin_unlock_bh(&peer->lock);
return ret;
}
spin_unlock_bh(&peer->lock);
- }
- if (attrs[OVPN_A_PEER_VPN_IPV4]) {
rehash = true;
peer->vpn_addrs.ipv4.s_addr =
nla_get_in_addr(attrs[OVPN_A_PEER_VPN_IPV4]);
- }
- if (attrs[OVPN_A_PEER_VPN_IPV6]) {
rehash = true;
peer->vpn_addrs.ipv6 =
nla_get_in6_addr(attrs[OVPN_A_PEER_VPN_IPV6]);
- }
- /* when setting the keepalive, both parameters have to be configured */
- if (attrs[OVPN_A_PEER_KEEPALIVE_INTERVAL] &&
attrs[OVPN_A_PEER_KEEPALIVE_TIMEOUT]) {
interv = nla_get_u32(attrs[OVPN_A_PEER_KEEPALIVE_INTERVAL]);
timeout = nla_get_u32(attrs[OVPN_A_PEER_KEEPALIVE_TIMEOUT]);
ovpn_peer_keepalive_set(peer, interv, timeout);
- }
- netdev_dbg(peer->ovpn->dev,
"%s: peer id=%u endpoint=%pIScp/%s VPN-IPv4=%pI4 VPN-IPv6=%pI6c\n",
__func__, peer->id, &ss,
peer->sock->sock->sk->sk_prot_creator->name,
&peer->vpn_addrs.ipv4.s_addr, &peer->vpn_addrs.ipv6);
- return rehash ? 1 : 0;
+}
[...]
+void ovpn_peer_hash_vpn_ip(struct ovpn_peer *peer)
- __must_hold(&peer->ovpn->peers->lock)
Changes to peer->vpn_addrs are not protected by peers->lock, so those could be getting updated while we're rehashing (and taking peer->lock in ovpn_nl_peer_modify as I'm suggesting above also wouldn't prevent that).
/me screams :-D
Indeed peers->lock is only about protecting the lists, not the content of the listed objects.
How about acquiring the peers->lock before calling ovpn_nl_peer_modify()? This way we prevent concurrent updates from interfering with each other, while at the same time avoiding concurrent adds/dels of the peer (the latter should already be protected as of today).
None of them is time critical and the lock should avoid the issue you mentioned.
Thanks a lot.
Regards,
+{
- struct hlist_nulls_head *nhead;
- if (peer->vpn_addrs.ipv4.s_addr != htonl(INADDR_ANY)) {
/* remove potential old hashing */
hlist_nulls_del_init_rcu(&peer->hash_entry_transp_addr);
nhead = ovpn_get_hash_head(peer->ovpn->peers->by_vpn_addr,
&peer->vpn_addrs.ipv4,
sizeof(peer->vpn_addrs.ipv4));
hlist_nulls_add_head_rcu(&peer->hash_entry_addr4, nhead);
- }
- if (!ipv6_addr_any(&peer->vpn_addrs.ipv6)) {
/* remove potential old hashing */
hlist_nulls_del_init_rcu(&peer->hash_entry_transp_addr);
nhead = ovpn_get_hash_head(peer->ovpn->peers->by_vpn_addr,
&peer->vpn_addrs.ipv6,
sizeof(peer->vpn_addrs.ipv6));
hlist_nulls_add_head_rcu(&peer->hash_entry_addr6, nhead);
- }
+}
2024-11-12, 15:26:59 +0100, Antonio Quartulli wrote:
On 11/11/2024 16:41, Sabrina Dubroca wrote:
2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote:
+void ovpn_peer_hash_vpn_ip(struct ovpn_peer *peer)
- __must_hold(&peer->ovpn->peers->lock)
Changes to peer->vpn_addrs are not protected by peers->lock, so those could be getting updated while we're rehashing (and taking peer->lock in ovpn_nl_peer_modify as I'm suggesting above also wouldn't prevent that).
/me screams :-D
Sorry :)
Indeed peers->lock is only about protecting the lists, not the content of the listed objects.
How about acquiring the peers->lock before calling ovpn_nl_peer_modify()?
It seems like it would work. Maybe a bit weird to have conditional locking (MP mode only), but ok. You already have this lock ordering (hold peers->lock before taking peer->lock) in ovpn_peer_keepalive_work_mp, so there should be no deadlock from doing the same thing in the netlink code.
Then I would also do that in ovpn_peer_float to protect that rehash.
It feels like peers->lock is turning into a duplicate of ovpn->lock. ovpn->lock used for P2P mode, peers->lock used equivalently for MP mode. You might consider merging them (but I wouldn't see it as necessary for merging the series unless there's a locking issue with the current proposal).
On 13/11/2024 12:05, Sabrina Dubroca wrote:
2024-11-12, 15:26:59 +0100, Antonio Quartulli wrote:
On 11/11/2024 16:41, Sabrina Dubroca wrote:
2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote:
+void ovpn_peer_hash_vpn_ip(struct ovpn_peer *peer)
- __must_hold(&peer->ovpn->peers->lock)
Changes to peer->vpn_addrs are not protected by peers->lock, so those could be getting updated while we're rehashing (and taking peer->lock in ovpn_nl_peer_modify as I'm suggesting above also wouldn't prevent that).
/me screams :-D
Sorry :)
Indeed peers->lock is only about protecting the lists, not the content of the listed objects.
How about acquiring the peers->lock before calling ovpn_nl_peer_modify()?
It seems like it would work. Maybe a bit weird to have conditional locking (MP mode only), but ok. You already have this lock ordering (hold peers->lock before taking peer->lock) in ovpn_peer_keepalive_work_mp, so there should be no deadlock from doing the same thing in the netlink code.
Yeah.
Then I would also do that in ovpn_peer_float to protect that rehash.
I am not extremely comfortable with this, because it means acquiring peers->lock on every packet (right now we do so only on peer->lock) and it may defeat the advantage of the RCU locking on the hashtables. Wouldn't you agree?
An alternative would be to hold peer->lock for the entire function, but this would lead to deadlocks... no go either.
It feels like peers->lock is turning into a duplicate of ovpn->lock. ovpn->lock used for P2P mode, peers->lock used equivalently for MP mode. You might consider merging them (but I wouldn't see it as necessary for merging the series unless there's a locking issue with the current proposal).
I agree: ovpn->lock was introduced to protect ovpn's fields, but actually the only one we protect is peer.
They are truly the same and I could therefore get rid of ovpn->peers->lock and always use ovpn->lock.
Will see how invasive this is and decide whether to commit it to v12 or not.
Thanks!
Regards,
2024-11-14, 11:32:36 +0100, Antonio Quartulli wrote:
On 13/11/2024 12:05, Sabrina Dubroca wrote:
2024-11-12, 15:26:59 +0100, Antonio Quartulli wrote:
On 11/11/2024 16:41, Sabrina Dubroca wrote:
2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote:
+void ovpn_peer_hash_vpn_ip(struct ovpn_peer *peer)
- __must_hold(&peer->ovpn->peers->lock)
Changes to peer->vpn_addrs are not protected by peers->lock, so those could be getting updated while we're rehashing (and taking peer->lock in ovpn_nl_peer_modify as I'm suggesting above also wouldn't prevent that).
/me screams :-D
Sorry :)
Indeed peers->lock is only about protecting the lists, not the content of the listed objects.
How about acquiring the peers->lock before calling ovpn_nl_peer_modify()?
It seems like it would work. Maybe a bit weird to have conditional locking (MP mode only), but ok. You already have this lock ordering (hold peers->lock before taking peer->lock) in ovpn_peer_keepalive_work_mp, so there should be no deadlock from doing the same thing in the netlink code.
Yeah.
Then I would also do that in ovpn_peer_float to protect that rehash.
I am not extremely comfortable with this, because it means acquiring peers->lock on every packet (right now we do so only on peer->lock) and it may defeat the advantage of the RCU locking on the hashtables. Wouldn't you agree?
Hmpf, yeah. Then I think you could keep most of the current code, except doing the rehash under both locks (peers + peer), and get ss+sa_len for the rehash directly from peer->bind (instead of using the ones we just defined locally in ovpn_peer_float, since they may have changed while we released peer->lock to grab peers->lock). We may end up "rehashing" twice into the same bucket if we have 2 concurrent peer_float calls (call 1 sets remote r1, call 2 sets a new one r2, call 1 hashes according to r2, call 2 also rehashes based on r2). That should be ok (it can happen anyway that a "real" rehash lands in the same bucket).
peer_float {
	spin_lock(peer)
	match/update bind
	spin_unlock(peer)

	if (MP) {
		spin_lock(peers)
		spin_lock(peer)
		rehash using peer->bind->remote rather than ss
		spin_unlock(peer)
		spin_unlock(peers)
	}
}
Does that sound reasonable?
On 29/11/2024 18:00, Sabrina Dubroca wrote:
2024-11-14, 11:32:36 +0100, Antonio Quartulli wrote:
On 13/11/2024 12:05, Sabrina Dubroca wrote:
2024-11-12, 15:26:59 +0100, Antonio Quartulli wrote:
On 11/11/2024 16:41, Sabrina Dubroca wrote:
2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote:
+void ovpn_peer_hash_vpn_ip(struct ovpn_peer *peer)
- __must_hold(&peer->ovpn->peers->lock)
Changes to peer->vpn_addrs are not protected by peers->lock, so those could be getting updated while we're rehashing (and taking peer->lock in ovpn_nl_peer_modify as I'm suggesting above also wouldn't prevent that).
/me screams :-D
Sorry :)
Indeed peers->lock is only about protecting the lists, not the content of the listed objects.
How about acquiring the peers->lock before calling ovpn_nl_peer_modify()?
It seems like it would work. Maybe a bit weird to have conditional locking (MP mode only), but ok. You already have this lock ordering (hold peers->lock before taking peer->lock) in ovpn_peer_keepalive_work_mp, so there should be no deadlock from doing the same thing in the netlink code.
Yeah.
Then I would also do that in ovpn_peer_float to protect that rehash.
I am not extremely comfortable with this, because it means acquiring peers->lock on every packet (right now we do so only on peer->lock) and it may defeat the advantage of the RCU locking on the hashtables. Wouldn't you agree?
Hmpf, yeah. Then I think you could keep most of the current code, except doing the rehash under both locks (peers + peer), and get ss+sa_len for the rehash directly from peer->bind (instead of using the ones we just defined locally in ovpn_peer_float, since they may have changed while we released peer->lock to grab peers->lock). We may end up "rehashing" twice into the same bucket if we have 2 concurrent peer_float calls (call 1 sets remote r1, call 2 sets a new one r2, call 1 hashes according to r2, call 2 also rehashes based on r2). That should be ok (it can happen anyway that a "real" rehash lands in the same bucket).
I think the double rehashing is ok. It's a double float happening so we expect a double rehashing in any case.
peer_float {
	spin_lock(peer)
	match/update bind
	spin_unlock(peer)

	if (MP) {
		spin_lock(peers)
		spin_lock(peer)
		rehash using peer->bind->remote rather than ss
		spin_unlock(peer)
		spin_unlock(peers)
	}
}
Does that sound reasonable?
Yeah, not very elegant, but this is what we need :)
Thanks!
Regards,
[I'm still thinking about the locking problems for ovpn_peer_float, but just noticed this while staring at the rehash code]
2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote:
+void ovpn_peer_hash_vpn_ip(struct ovpn_peer *peer)
- __must_hold(&peer->ovpn->peers->lock)
+{
- struct hlist_nulls_head *nhead;
- if (peer->vpn_addrs.ipv4.s_addr != htonl(INADDR_ANY)) {
/* remove potential old hashing */
hlist_nulls_del_init_rcu(&peer->hash_entry_transp_addr);
s/hash_entry_transp_addr/hash_entry_addr4/ ?
nhead = ovpn_get_hash_head(peer->ovpn->peers->by_vpn_addr,
&peer->vpn_addrs.ipv4,
sizeof(peer->vpn_addrs.ipv4));
hlist_nulls_add_head_rcu(&peer->hash_entry_addr4, nhead);
- }
- if (!ipv6_addr_any(&peer->vpn_addrs.ipv6)) {
/* remove potential old hashing */
hlist_nulls_del_init_rcu(&peer->hash_entry_transp_addr);
s/hash_entry_transp_addr/hash_entry_addr6/ ?
nhead = ovpn_get_hash_head(peer->ovpn->peers->by_vpn_addr,
&peer->vpn_addrs.ipv6,
sizeof(peer->vpn_addrs.ipv6));
hlist_nulls_add_head_rcu(&peer->hash_entry_addr6, nhead);
- }
+}
On 21/11/2024 17:02, Sabrina Dubroca wrote:
[I'm still thinking about the locking problems for ovpn_peer_float, but just noticed this while staring at the rehash code]
2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote:
+void ovpn_peer_hash_vpn_ip(struct ovpn_peer *peer)
- __must_hold(&peer->ovpn->peers->lock)
+{
- struct hlist_nulls_head *nhead;
- if (peer->vpn_addrs.ipv4.s_addr != htonl(INADDR_ANY)) {
/* remove potential old hashing */
hlist_nulls_del_init_rcu(&peer->hash_entry_transp_addr);
s/hash_entry_transp_addr/hash_entry_addr4/ ?
cr0p. very good catch! Thanks
nhead = ovpn_get_hash_head(peer->ovpn->peers->by_vpn_addr,
&peer->vpn_addrs.ipv4,
sizeof(peer->vpn_addrs.ipv4));
hlist_nulls_add_head_rcu(&peer->hash_entry_addr4, nhead);
- }
- if (!ipv6_addr_any(&peer->vpn_addrs.ipv6)) {
/* remove potential old hashing */
hlist_nulls_del_init_rcu(&peer->hash_entry_transp_addr);
s/hash_entry_transp_addr/hash_entry_addr6/ ?
Thanks² This is what happens when you copy/paste code around.
nhead = ovpn_get_hash_head(peer->ovpn->peers->by_vpn_addr,
&peer->vpn_addrs.ipv6,
sizeof(peer->vpn_addrs.ipv6));
hlist_nulls_add_head_rcu(&peer->hash_entry_addr6, nhead);
- }
+}
Regards,
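For reference, with both substitutions applied (deleting from the hash chain that matches the entry being re-added), the function would presumably read:

void ovpn_peer_hash_vpn_ip(struct ovpn_peer *peer)
	__must_hold(&peer->ovpn->peers->lock)
{
	struct hlist_nulls_head *nhead;

	if (peer->vpn_addrs.ipv4.s_addr != htonl(INADDR_ANY)) {
		/* remove potential old hashing */
		hlist_nulls_del_init_rcu(&peer->hash_entry_addr4);
		nhead = ovpn_get_hash_head(peer->ovpn->peers->by_vpn_addr,
					   &peer->vpn_addrs.ipv4,
					   sizeof(peer->vpn_addrs.ipv4));
		hlist_nulls_add_head_rcu(&peer->hash_entry_addr4, nhead);
	}

	if (!ipv6_addr_any(&peer->vpn_addrs.ipv6)) {
		/* remove potential old hashing */
		hlist_nulls_del_init_rcu(&peer->hash_entry_addr6);
		nhead = ovpn_get_hash_head(peer->ovpn->peers->by_vpn_addr,
					   &peer->vpn_addrs.ipv6,
					   sizeof(peer->vpn_addrs.ipv6));
		hlist_nulls_add_head_rcu(&peer->hash_entry_addr6, nhead);
	}
}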
This change introduces the netlink commands needed to add, get, delete and swap keys for a specific peer.
Userspace is expected to use these commands to create, inspect (non sensible data only), destroy and rotate session keys for a specific peer.
Signed-off-by: Antonio Quartulli antonio@openvpn.net --- drivers/net/ovpn/crypto.c | 42 ++++++ drivers/net/ovpn/crypto.h | 4 + drivers/net/ovpn/crypto_aead.c | 17 +++ drivers/net/ovpn/crypto_aead.h | 2 + drivers/net/ovpn/netlink.c | 308 ++++++++++++++++++++++++++++++++++++++++- 5 files changed, 369 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ovpn/crypto.c b/drivers/net/ovpn/crypto.c index f1f7510e2f735e367f96eb4982ba82c9af3c8bfc..cfb014c947b968752ba3dab84ec42dc8ec086379 100644 --- a/drivers/net/ovpn/crypto.c +++ b/drivers/net/ovpn/crypto.c @@ -151,3 +151,45 @@ void ovpn_crypto_key_slots_swap(struct ovpn_crypto_state *cs)
spin_unlock_bh(&cs->lock); } + +/** + * ovpn_crypto_config_get - populate keyconf object with non-sensible key data + * @cs: the crypto state to extract the key data from + * @slot: the specific slot to inspect + * @keyconf: the output object to populate + * + * Return: 0 on success or a negative error code otherwise + */ +int ovpn_crypto_config_get(struct ovpn_crypto_state *cs, + enum ovpn_key_slot slot, + struct ovpn_key_config *keyconf) +{ + struct ovpn_crypto_key_slot *ks; + int idx; + + switch (slot) { + case OVPN_KEY_SLOT_PRIMARY: + idx = cs->primary_idx; + break; + case OVPN_KEY_SLOT_SECONDARY: + idx = !cs->primary_idx; + break; + default: + return -EINVAL; + } + + rcu_read_lock(); + ks = rcu_dereference(cs->slots[idx]); + if (!ks || (ks && !ovpn_crypto_key_slot_hold(ks))) { + rcu_read_unlock(); + return -ENOENT; + } + rcu_read_unlock(); + + keyconf->cipher_alg = ovpn_aead_crypto_alg(ks); + keyconf->key_id = ks->key_id; + + ovpn_crypto_key_slot_put(ks); + + return 0; +} diff --git a/drivers/net/ovpn/crypto.h b/drivers/net/ovpn/crypto.h index 3b437d26b531c3034cca5343c755ef9c7ef57276..96fd41f4b81b74f8a3ecfe33ee24ba0122d222fe 100644 --- a/drivers/net/ovpn/crypto.h +++ b/drivers/net/ovpn/crypto.h @@ -136,4 +136,8 @@ void ovpn_crypto_state_release(struct ovpn_crypto_state *cs);
void ovpn_crypto_key_slots_swap(struct ovpn_crypto_state *cs);
+int ovpn_crypto_config_get(struct ovpn_crypto_state *cs, + enum ovpn_key_slot slot, + struct ovpn_key_config *keyconf); + #endif /* _NET_OVPN_OVPNCRYPTO_H_ */ diff --git a/drivers/net/ovpn/crypto_aead.c b/drivers/net/ovpn/crypto_aead.c index 072bb0881764752520e8e26e18337c1274ce1aa4..25e4e4a453b2bc499aec9a192fe3d86ba1aac511 100644 --- a/drivers/net/ovpn/crypto_aead.c +++ b/drivers/net/ovpn/crypto_aead.c @@ -367,3 +367,20 @@ ovpn_aead_crypto_key_slot_new(const struct ovpn_key_config *kc) ovpn_aead_crypto_key_slot_destroy(ks); return ERR_PTR(ret); } + +enum ovpn_cipher_alg ovpn_aead_crypto_alg(struct ovpn_crypto_key_slot *ks) +{ + const char *alg_name; + + if (!ks->encrypt) + return OVPN_CIPHER_ALG_NONE; + + alg_name = crypto_tfm_alg_name(crypto_aead_tfm(ks->encrypt)); + + if (!strcmp(alg_name, ALG_NAME_AES)) + return OVPN_CIPHER_ALG_AES_GCM; + else if (!strcmp(alg_name, ALG_NAME_CHACHAPOLY)) + return OVPN_CIPHER_ALG_CHACHA20_POLY1305; + else + return OVPN_CIPHER_ALG_NONE; +} diff --git a/drivers/net/ovpn/crypto_aead.h b/drivers/net/ovpn/crypto_aead.h index 77ee8141599bc06b0dc664c5b0a4dae660a89238..fb65be82436edd7ff89b171f7a89c9103b617d1f 100644 --- a/drivers/net/ovpn/crypto_aead.h +++ b/drivers/net/ovpn/crypto_aead.h @@ -28,4 +28,6 @@ struct ovpn_crypto_key_slot * ovpn_aead_crypto_key_slot_new(const struct ovpn_key_config *kc); void ovpn_aead_crypto_key_slot_destroy(struct ovpn_crypto_key_slot *ks);
+enum ovpn_cipher_alg ovpn_aead_crypto_alg(struct ovpn_crypto_key_slot *ks); + #endif /* _NET_OVPN_OVPNAEAD_H_ */ diff --git a/drivers/net/ovpn/netlink.c b/drivers/net/ovpn/netlink.c index d504445325ef82db04f87367c858adaf025f6297..fe9377b9b8145784917460cd5f222bc7fae4d8db 100644 --- a/drivers/net/ovpn/netlink.c +++ b/drivers/net/ovpn/netlink.c @@ -18,6 +18,7 @@ #include "netlink.h" #include "netlink-gen.h" #include "bind.h" +#include "crypto.h" #include "packet.h" #include "peer.h" #include "socket.h" @@ -679,24 +680,323 @@ int ovpn_nl_peer_del_doit(struct sk_buff *skb, struct genl_info *info) return ret; }
+static int ovpn_nl_get_key_dir(struct genl_info *info, struct nlattr *key, + enum ovpn_cipher_alg cipher, + struct ovpn_key_direction *dir) +{ + struct nlattr *attrs[OVPN_A_KEYDIR_MAX + 1]; + int ret; + + ret = nla_parse_nested(attrs, OVPN_A_KEYDIR_MAX, key, + ovpn_keydir_nl_policy, info->extack); + if (ret) + return ret; + + switch (cipher) { + case OVPN_CIPHER_ALG_AES_GCM: + case OVPN_CIPHER_ALG_CHACHA20_POLY1305: + if (NL_REQ_ATTR_CHECK(info->extack, key, attrs, + OVPN_A_KEYDIR_CIPHER_KEY) || + NL_REQ_ATTR_CHECK(info->extack, key, attrs, + OVPN_A_KEYDIR_NONCE_TAIL)) + return -EINVAL; + + dir->cipher_key = nla_data(attrs[OVPN_A_KEYDIR_CIPHER_KEY]); + dir->cipher_key_size = nla_len(attrs[OVPN_A_KEYDIR_CIPHER_KEY]); + + /* These algorithms require a 96bit nonce, + * Construct it by combining 4-bytes packet id and + * 8-bytes nonce-tail from userspace + */ + dir->nonce_tail = nla_data(attrs[OVPN_A_KEYDIR_NONCE_TAIL]); + dir->nonce_tail_size = nla_len(attrs[OVPN_A_KEYDIR_NONCE_TAIL]); + break; + default: + NL_SET_ERR_MSG_MOD(info->extack, "unsupported cipher"); + return -EINVAL; + } + + return 0; +} + +/** + * ovpn_nl_key_new_doit - configure a new key for the specified peer + * @skb: incoming netlink message + * @info: genetlink metadata + * + * This function allows the user to install a new key in the peer crypto + * state. + * Each peer has two 'slots', namely 'primary' and 'secondary', where + * keys can be installed. The key in the 'primary' slot is used for + * encryption, while both keys can be used for decryption by matching the + * key ID carried in the incoming packet. + * + * The user is responsible for rotating keys when necessary. The user + * may fetch peer traffic statistics via netlink in order to better + * identify the right time to rotate keys. + * The renegotiation follows these steps: + * 1. a new key is computed by the user and is installed in the 'secondary' + * slot + * 2. at user discretion (usually after a predetermined time) 'primary' and + * 'secondary' contents are swapped and the new key starts being used for + * encryption, while the old key is kept around for decryption of late + * packets. + * + * Return: 0 on success or a negative error code otherwise. 
+ */ int ovpn_nl_key_new_doit(struct sk_buff *skb, struct genl_info *info) { - return -EOPNOTSUPP; + struct nlattr *attrs[OVPN_A_KEYCONF_MAX + 1]; + struct ovpn_struct *ovpn = info->user_ptr[0]; + struct ovpn_peer_key_reset pkr; + struct ovpn_peer *peer; + u32 peer_id; + int ret; + + if (GENL_REQ_ATTR_CHECK(info, OVPN_A_KEYCONF)) + return -EINVAL; + + ret = nla_parse_nested(attrs, OVPN_A_KEYCONF_MAX, + info->attrs[OVPN_A_KEYCONF], + ovpn_keyconf_nl_policy, info->extack); + if (ret) + return ret; + + if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_KEYCONF], attrs, + OVPN_A_KEYCONF_PEER_ID)) + return -EINVAL; + + if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_KEYCONF], attrs, + OVPN_A_KEYCONF_SLOT) || + NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_KEYCONF], attrs, + OVPN_A_KEYCONF_KEY_ID) || + NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_KEYCONF], attrs, + OVPN_A_KEYCONF_CIPHER_ALG) || + NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_KEYCONF], attrs, + OVPN_A_KEYCONF_ENCRYPT_DIR) || + NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_KEYCONF], attrs, + OVPN_A_KEYCONF_DECRYPT_DIR)) + return -EINVAL; + + peer_id = nla_get_u32(attrs[OVPN_A_KEYCONF_PEER_ID]); + pkr.slot = nla_get_u8(attrs[OVPN_A_KEYCONF_SLOT]); + pkr.key.key_id = nla_get_u16(attrs[OVPN_A_KEYCONF_KEY_ID]); + pkr.key.cipher_alg = nla_get_u16(attrs[OVPN_A_KEYCONF_CIPHER_ALG]); + + ret = ovpn_nl_get_key_dir(info, attrs[OVPN_A_KEYCONF_ENCRYPT_DIR], + pkr.key.cipher_alg, &pkr.key.encrypt); + if (ret < 0) + return ret; + + ret = ovpn_nl_get_key_dir(info, attrs[OVPN_A_KEYCONF_DECRYPT_DIR], + pkr.key.cipher_alg, &pkr.key.decrypt); + if (ret < 0) + return ret; + + peer = ovpn_peer_get_by_id(ovpn, peer_id); + if (!peer) { + NL_SET_ERR_MSG_FMT_MOD(info->extack, + "no peer with id %u to set key for", + peer_id); + return -ENOENT; + } + + ret = ovpn_crypto_state_reset(&peer->crypto, &pkr); + if (ret < 0) { + NL_SET_ERR_MSG_FMT_MOD(info->extack, + "cannot install new key for peer %u", + peer_id); + goto out; + } + + netdev_dbg(ovpn->dev, "%s: new key installed (id=%u) for peer %u\n", + __func__, pkr.key.key_id, peer_id); +out: + ovpn_peer_put(peer); + return ret; +} + +static int ovpn_nl_send_key(struct sk_buff *skb, const struct genl_info *info, + u32 peer_id, enum ovpn_key_slot slot, + const struct ovpn_key_config *keyconf, u32 portid, + u32 seq, int flags) +{ + struct nlattr *attr; + void *hdr; + + hdr = genlmsg_put(skb, portid, seq, &ovpn_nl_family, flags, + OVPN_CMD_KEY_GET); + if (!hdr) + return -ENOBUFS; + + attr = nla_nest_start(skb, OVPN_A_KEYCONF); + if (!attr) + goto err; + + if (nla_put_u32(skb, OVPN_A_KEYCONF_PEER_ID, peer_id)) + goto err; + + if (nla_put_u32(skb, OVPN_A_KEYCONF_SLOT, slot) || + nla_put_u32(skb, OVPN_A_KEYCONF_KEY_ID, keyconf->key_id) || + nla_put_u32(skb, OVPN_A_KEYCONF_CIPHER_ALG, keyconf->cipher_alg)) + goto err; + + nla_nest_end(skb, attr); + genlmsg_end(skb, hdr); + + return 0; +err: + genlmsg_cancel(skb, hdr); + return -EMSGSIZE; }
int ovpn_nl_key_get_doit(struct sk_buff *skb, struct genl_info *info) { - return -EOPNOTSUPP; + struct nlattr *attrs[OVPN_A_KEYCONF_MAX + 1]; + struct ovpn_struct *ovpn = info->user_ptr[0]; + struct ovpn_key_config keyconf = { 0 }; + enum ovpn_key_slot slot; + struct ovpn_peer *peer; + struct sk_buff *msg; + u32 peer_id; + int ret; + + if (GENL_REQ_ATTR_CHECK(info, OVPN_A_KEYCONF)) + return -EINVAL; + + ret = nla_parse_nested(attrs, OVPN_A_KEYCONF_MAX, + info->attrs[OVPN_A_KEYCONF], + ovpn_keyconf_nl_policy, info->extack); + if (ret) + return ret; + + if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_KEYCONF], attrs, + OVPN_A_KEYCONF_PEER_ID)) + return -EINVAL; + + peer_id = nla_get_u32(attrs[OVPN_A_KEYCONF_PEER_ID]); + + peer = ovpn_peer_get_by_id(ovpn, peer_id); + if (!peer) { + NL_SET_ERR_MSG_FMT_MOD(info->extack, + "cannot find peer with id %u", 0); + return -ENOENT; + } + + if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_KEYCONF], attrs, + OVPN_A_KEYCONF_SLOT)) + return -EINVAL; + + slot = nla_get_u32(attrs[OVPN_A_KEYCONF_SLOT]); + + ret = ovpn_crypto_config_get(&peer->crypto, slot, &keyconf); + if (ret < 0) { + NL_SET_ERR_MSG_FMT_MOD(info->extack, + "cannot extract key from slot %u for peer %u", + slot, peer_id); + goto err; + } + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) { + ret = -ENOMEM; + goto err; + } + + ret = ovpn_nl_send_key(msg, info, peer->id, slot, &keyconf, + info->snd_portid, info->snd_seq, 0); + if (ret < 0) { + nlmsg_free(msg); + goto err; + } + + ret = genlmsg_reply(msg, info); +err: + ovpn_peer_put(peer); + return ret; }
int ovpn_nl_key_swap_doit(struct sk_buff *skb, struct genl_info *info) { - return -EOPNOTSUPP; + struct ovpn_struct *ovpn = info->user_ptr[0]; + struct nlattr *attrs[OVPN_A_PEER_MAX + 1]; + struct ovpn_peer *peer; + u32 peer_id; + int ret; + + if (GENL_REQ_ATTR_CHECK(info, OVPN_A_KEYCONF)) + return -EINVAL; + + ret = nla_parse_nested(attrs, OVPN_A_KEYCONF_MAX, + info->attrs[OVPN_A_KEYCONF], + ovpn_keyconf_nl_policy, info->extack); + if (ret) + return ret; + + if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_KEYCONF], attrs, + OVPN_A_KEYCONF_PEER_ID)) + return -EINVAL; + + peer_id = nla_get_u32(attrs[OVPN_A_KEYCONF_PEER_ID]); + + peer = ovpn_peer_get_by_id(ovpn, peer_id); + if (!peer) { + NL_SET_ERR_MSG_FMT_MOD(info->extack, + "no peer with id %u to swap keys for", + peer_id); + return -ENOENT; + } + + ovpn_crypto_key_slots_swap(&peer->crypto); + ovpn_peer_put(peer); + + return 0; }
int ovpn_nl_key_del_doit(struct sk_buff *skb, struct genl_info *info) { - return -EOPNOTSUPP; + struct nlattr *attrs[OVPN_A_KEYCONF_MAX + 1]; + struct ovpn_struct *ovpn = info->user_ptr[0]; + enum ovpn_key_slot slot; + struct ovpn_peer *peer; + u32 peer_id; + int ret; + + if (GENL_REQ_ATTR_CHECK(info, OVPN_A_KEYCONF)) + return -EINVAL; + + ret = nla_parse_nested(attrs, OVPN_A_KEYCONF_MAX, + info->attrs[OVPN_A_KEYCONF], + ovpn_keyconf_nl_policy, info->extack); + if (ret) + return ret; + + if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_KEYCONF], attrs, + OVPN_A_KEYCONF_PEER_ID)) + return -EINVAL; + + if (ret) + return ret; + + if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_KEYCONF], attrs, + OVPN_A_KEYCONF_SLOT)) + return -EINVAL; + + peer_id = nla_get_u32(attrs[OVPN_A_KEYCONF_PEER_ID]); + slot = nla_get_u8(attrs[OVPN_A_KEYCONF_SLOT]); + + peer = ovpn_peer_get_by_id(ovpn, peer_id); + if (!peer) { + NL_SET_ERR_MSG_FMT_MOD(info->extack, + "no peer with id %u to delete key for", + peer_id); + return -ENOENT; + } + + ovpn_crypto_key_slot_delete(&peer->crypto, slot); + ovpn_peer_put(peer); + + return 0; }
/**
2024-10-29, 11:47:32 +0100, Antonio Quartulli wrote:
This change introduces the netlink commands needed to add, get, delete and swap keys for a specific peer.
Userspace is expected to use these commands to create, inspect (non sensible data only), destroy and rotate session keys for a specific
nit: s/sensible/sensitive/
+int ovpn_crypto_config_get(struct ovpn_crypto_state *cs,
enum ovpn_key_slot slot,
struct ovpn_key_config *keyconf)
+{
[...]
- rcu_read_lock();
- ks = rcu_dereference(cs->slots[idx]);
- if (!ks || (ks && !ovpn_crypto_key_slot_hold(ks))) {
rcu_read_unlock();
return -ENOENT;
- }
- rcu_read_unlock();
You could stay under rcu_read_lock a little bit longer and avoid taking a reference just to release it immediately.
- keyconf->cipher_alg = ovpn_aead_crypto_alg(ks);
- keyconf->key_id = ks->key_id;
- ovpn_crypto_key_slot_put(ks);
- return 0;
+}
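A rough sketch of the suggested change (untested): read the key data directly under the RCU read-side critical section instead of taking and immediately dropping a reference.

	rcu_read_lock();
	ks = rcu_dereference(cs->slots[idx]);
	if (!ks) {
		rcu_read_unlock();
		return -ENOENT;
	}

	/* only non-sensitive fields are copied out, so no reference on the
	 * slot is needed beyond the RCU section
	 */
	keyconf->cipher_alg = ovpn_aead_crypto_alg(ks);
	keyconf->key_id = ks->key_id;
	rcu_read_unlock();

	return 0;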
[...]
int ovpn_nl_key_get_doit(struct sk_buff *skb, struct genl_info *info) {
[...]
- if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_KEYCONF], attrs,
OVPN_A_KEYCONF_PEER_ID))
return -EINVAL;
- peer_id = nla_get_u32(attrs[OVPN_A_KEYCONF_PEER_ID]);
- peer = ovpn_peer_get_by_id(ovpn, peer_id);
- if (!peer) {
NL_SET_ERR_MSG_FMT_MOD(info->extack,
"cannot find peer with id %u", 0);
peer_id?
return -ENOENT;
- }
- if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_KEYCONF], attrs,
OVPN_A_KEYCONF_SLOT))
return -EINVAL;
Move this check before ovpn_peer_get_by_id? We're leaking a reference on the peer.
- slot = nla_get_u32(attrs[OVPN_A_KEYCONF_SLOT]);
- ret = ovpn_crypto_config_get(&peer->crypto, slot, &keyconf);
- if (ret < 0) {
NL_SET_ERR_MSG_FMT_MOD(info->extack,
"cannot extract key from slot %u for peer %u",
slot, peer_id);
goto err;
- }
- msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
- if (!msg) {
ret = -ENOMEM;
goto err;
- }
- ret = ovpn_nl_send_key(msg, info, peer->id, slot, &keyconf,
info->snd_portid, info->snd_seq, 0);
info->snd_portid and info->snd_seq can be extracted from info directly in ovpn_nl_send_key since there's no other caller, and flags=0 can be skipped as well.
- if (ret < 0) {
nlmsg_free(msg);
goto err;
- }
- ret = genlmsg_reply(msg, info);
+err:
- ovpn_peer_put(peer);
- return ret;
}
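Putting the two comments above together, a rough (untested) sketch of the reworked prologue: validate and fetch all attributes first, then look up the peer, so no reference can be leaked and the extack message reports the actual peer_id.

	if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_KEYCONF], attrs,
			      OVPN_A_KEYCONF_PEER_ID) ||
	    NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_KEYCONF], attrs,
			      OVPN_A_KEYCONF_SLOT))
		return -EINVAL;

	peer_id = nla_get_u32(attrs[OVPN_A_KEYCONF_PEER_ID]);
	slot = nla_get_u32(attrs[OVPN_A_KEYCONF_SLOT]);

	peer = ovpn_peer_get_by_id(ovpn, peer_id);
	if (!peer) {
		NL_SET_ERR_MSG_FMT_MOD(info->extack,
				       "cannot find peer with id %u", peer_id);
		return -ENOENT;
	}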
[...]
int ovpn_nl_key_del_doit(struct sk_buff *skb, struct genl_info *info) {
- return -EOPNOTSUPP;
- struct nlattr *attrs[OVPN_A_KEYCONF_MAX + 1];
- struct ovpn_struct *ovpn = info->user_ptr[0];
- enum ovpn_key_slot slot;
- struct ovpn_peer *peer;
- u32 peer_id;
- int ret;
- if (GENL_REQ_ATTR_CHECK(info, OVPN_A_KEYCONF))
return -EINVAL;
- ret = nla_parse_nested(attrs, OVPN_A_KEYCONF_MAX,
info->attrs[OVPN_A_KEYCONF],
ovpn_keyconf_nl_policy, info->extack);
- if (ret)
return ret;
- if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_KEYCONF], attrs,
OVPN_A_KEYCONF_PEER_ID))
return -EINVAL;
- if (ret)
return ret;
leftover?
- if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_KEYCONF], attrs,
OVPN_A_KEYCONF_SLOT))
return -EINVAL;
On 05/11/2024 11:16, Sabrina Dubroca wrote:
2024-10-29, 11:47:32 +0100, Antonio Quartulli wrote:
This change introduces the netlink commands needed to add, get, delete and swap keys for a specific peer.
Userspace is expected to use these commands to create, inspect (non sensible data only), destroy and rotate session keys for a specific
nit: s/sensible/sensitive/
+int ovpn_crypto_config_get(struct ovpn_crypto_state *cs,
enum ovpn_key_slot slot,
struct ovpn_key_config *keyconf)
+{
[...]
- rcu_read_lock();
- ks = rcu_dereference(cs->slots[idx]);
- if (!ks || (ks && !ovpn_crypto_key_slot_hold(ks))) {
rcu_read_unlock();
return -ENOENT;
- }
- rcu_read_unlock();
You could stay under rcu_read_lock a little bit longer and avoid taking a reference just to release it immediately.
ACK.
- keyconf->cipher_alg = ovpn_aead_crypto_alg(ks);
- keyconf->key_id = ks->key_id;
- ovpn_crypto_key_slot_put(ks);
- return 0;
+}
[...]
int ovpn_nl_key_get_doit(struct sk_buff *skb, struct genl_info *info) {
[...]
- if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_KEYCONF], attrs,
OVPN_A_KEYCONF_PEER_ID))
return -EINVAL;
- peer_id = nla_get_u32(attrs[OVPN_A_KEYCONF_PEER_ID]);
- peer = ovpn_peer_get_by_id(ovpn, peer_id);
- if (!peer) {
NL_SET_ERR_MSG_FMT_MOD(info->extack,
"cannot find peer with id %u", 0);
peer_id?
return -ENOENT;
- }
- if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_KEYCONF], attrs,
OVPN_A_KEYCONF_SLOT))
return -EINVAL;
Move this check before ovpn_peer_get_by_id? We're leaking a reference on the peer.
ACK
- slot = nla_get_u32(attrs[OVPN_A_KEYCONF_SLOT]);
- ret = ovpn_crypto_config_get(&peer->crypto, slot, &keyconf);
- if (ret < 0) {
NL_SET_ERR_MSG_FMT_MOD(info->extack,
"cannot extract key from slot %u for peer %u",
slot, peer_id);
goto err;
- }
- msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
- if (!msg) {
ret = -ENOMEM;
goto err;
- }
- ret = ovpn_nl_send_key(msg, info, peer->id, slot, &keyconf,
info->snd_portid, info->snd_seq, 0);
info->snd_portid and info->snd_seq can be extracted from info directly in ovpn_nl_send_key since there's no other caller, and flags=0 can be skipped as well.
I tried to keep the signature similar to send_peer, but indeed they can both be simplified.
- if (ret < 0) {
nlmsg_free(msg);
goto err;
- }
- ret = genlmsg_reply(msg, info);
+err:
- ovpn_peer_put(peer);
- return ret; }
[...]
int ovpn_nl_key_del_doit(struct sk_buff *skb, struct genl_info *info) {
- return -EOPNOTSUPP;
- struct nlattr *attrs[OVPN_A_KEYCONF_MAX + 1];
- struct ovpn_struct *ovpn = info->user_ptr[0];
- enum ovpn_key_slot slot;
- struct ovpn_peer *peer;
- u32 peer_id;
- int ret;
- if (GENL_REQ_ATTR_CHECK(info, OVPN_A_KEYCONF))
return -EINVAL;
- ret = nla_parse_nested(attrs, OVPN_A_KEYCONF_MAX,
info->attrs[OVPN_A_KEYCONF],
ovpn_keyconf_nl_policy, info->extack);
- if (ret)
return ret;
- if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_KEYCONF], attrs,
OVPN_A_KEYCONF_PEER_ID))
return -EINVAL;
- if (ret)
return ret;
leftover?
very likely.
Thanks a lot
Regards,
- if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_KEYCONF], attrs,
OVPN_A_KEYCONF_SLOT))
return -EINVAL;
IV wrap-around is cryptographically dangerous for a number of ciphers, therefore kill the key and inform userspace (via netlink) should the IV space be exhausted.
Userspace has two ways of deciding when the key has to be renewed before exhausting the IV space: 1) time based approach: after X seconds/minutes userspace generates a new key and sends it to the kernel. This is based on a guesstimate and the default timer value normally works well.
2) packet count based approach: after X packets/bytes userspace generates a new key and sends it to the kernel. Userspace keeps track of the amount of traffic by periodically polling GET_PEER and fetching the VPN/LINK stats.
Signed-off-by: Antonio Quartulli antonio@openvpn.net --- drivers/net/ovpn/crypto.c | 19 ++++++++++++++++ drivers/net/ovpn/crypto.h | 2 ++ drivers/net/ovpn/io.c | 13 +++++++++++ drivers/net/ovpn/netlink.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++ drivers/net/ovpn/netlink.h | 2 ++ 5 files changed, 91 insertions(+)
diff --git a/drivers/net/ovpn/crypto.c b/drivers/net/ovpn/crypto.c index cfb014c947b968752ba3dab84ec42dc8ec086379..a2346bc630be9b60604282d20a33321c277bc56f 100644 --- a/drivers/net/ovpn/crypto.c +++ b/drivers/net/ovpn/crypto.c @@ -55,6 +55,25 @@ void ovpn_crypto_state_release(struct ovpn_crypto_state *cs) } }
+/* removes the key matching the specified id from the crypto context */ +void ovpn_crypto_kill_key(struct ovpn_crypto_state *cs, u8 key_id) +{ + struct ovpn_crypto_key_slot *ks = NULL; + + spin_lock_bh(&cs->lock); + if (rcu_access_pointer(cs->slots[0])->key_id == key_id) { + ks = rcu_replace_pointer(cs->slots[0], NULL, + lockdep_is_held(&cs->lock)); + } else if (rcu_access_pointer(cs->slots[1])->key_id == key_id) { + ks = rcu_replace_pointer(cs->slots[1], NULL, + lockdep_is_held(&cs->lock)); + } + spin_unlock_bh(&cs->lock); + + if (ks) + ovpn_crypto_key_slot_put(ks); +} + /* Reset the ovpn_crypto_state object in a way that is atomic * to RCU readers. */ diff --git a/drivers/net/ovpn/crypto.h b/drivers/net/ovpn/crypto.h index 96fd41f4b81b74f8a3ecfe33ee24ba0122d222fe..b7a7be752d54f1f8bcd548e0a714511efcaf68a8 100644 --- a/drivers/net/ovpn/crypto.h +++ b/drivers/net/ovpn/crypto.h @@ -140,4 +140,6 @@ int ovpn_crypto_config_get(struct ovpn_crypto_state *cs, enum ovpn_key_slot slot, struct ovpn_key_config *keyconf);
+void ovpn_crypto_kill_key(struct ovpn_crypto_state *cs, u8 key_id); + #endif /* _NET_OVPN_OVPNCRYPTO_H_ */ diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c index 0e8a6f2c76bc7b2ccc287ad1187cf50f033bf261..c04791a508e5c0ae292b7b5d8098096c676b2f99 100644 --- a/drivers/net/ovpn/io.c +++ b/drivers/net/ovpn/io.c @@ -248,6 +248,19 @@ void ovpn_encrypt_post(void *data, int ret) if (likely(ovpn_skb_cb(skb)->req)) aead_request_free(ovpn_skb_cb(skb)->req);
+ if (unlikely(ret == -ERANGE)) { + /* we ran out of IVs and we must kill the key as it can't be + * use anymore + */ + netdev_warn(peer->ovpn->dev, + "killing key %u for peer %u\n", ks->key_id, + peer->id); + ovpn_crypto_kill_key(&peer->crypto, ks->key_id); + /* let userspace know so that a new key must be negotiated */ + ovpn_nl_key_swap_notify(peer, ks->key_id); + goto err; + } + if (unlikely(ret < 0)) goto err;
diff --git a/drivers/net/ovpn/netlink.c b/drivers/net/ovpn/netlink.c index fe9377b9b8145784917460cd5f222bc7fae4d8db..2b2ba1a810a0e87fb9ffb43b988fa52725a9589b 100644 --- a/drivers/net/ovpn/netlink.c +++ b/drivers/net/ovpn/netlink.c @@ -999,6 +999,61 @@ int ovpn_nl_key_del_doit(struct sk_buff *skb, struct genl_info *info) return 0; }
+/** + * ovpn_nl_key_swap_notify - notify userspace peer's key must be renewed + * @peer: the peer whose key needs to be renewed + * @key_id: the ID of the key that needs to be renewed + * + * Return: 0 on success or a negative error code otherwise + */ +int ovpn_nl_key_swap_notify(struct ovpn_peer *peer, u8 key_id) +{ + struct nlattr *k_attr; + struct sk_buff *msg; + int ret = -EMSGSIZE; + void *hdr; + + netdev_info(peer->ovpn->dev, "peer with id %u must rekey - primary key unusable.\n", + peer->id); + + msg = nlmsg_new(100, GFP_ATOMIC); + if (!msg) + return -ENOMEM; + + hdr = genlmsg_put(msg, 0, 0, &ovpn_nl_family, 0, OVPN_CMD_KEY_SWAP_NTF); + if (!hdr) { + ret = -ENOBUFS; + goto err_free_msg; + } + + if (nla_put_u32(msg, OVPN_A_IFINDEX, peer->ovpn->dev->ifindex)) + goto err_cancel_msg; + + k_attr = nla_nest_start(msg, OVPN_A_KEYCONF); + if (!k_attr) + goto err_cancel_msg; + + if (nla_put_u32(msg, OVPN_A_KEYCONF_PEER_ID, peer->id)) + goto err_cancel_msg; + + if (nla_put_u16(msg, OVPN_A_KEYCONF_KEY_ID, key_id)) + goto err_cancel_msg; + + nla_nest_end(msg, k_attr); + genlmsg_end(msg, hdr); + + genlmsg_multicast_netns(&ovpn_nl_family, dev_net(peer->ovpn->dev), msg, + 0, OVPN_NLGRP_PEERS, GFP_ATOMIC); + + return 0; + +err_cancel_msg: + genlmsg_cancel(msg, hdr); +err_free_msg: + nlmsg_free(msg); + return ret; +} + /** * ovpn_nl_register - perform any needed registration in the NL subsustem * diff --git a/drivers/net/ovpn/netlink.h b/drivers/net/ovpn/netlink.h index 9e87cf11d1e9813b7a75ddf3705ab7d5fabe899f..33390b13c8904d40b629662005a9eb92ff617c3b 100644 --- a/drivers/net/ovpn/netlink.h +++ b/drivers/net/ovpn/netlink.h @@ -12,4 +12,6 @@ int ovpn_nl_register(void); void ovpn_nl_unregister(void);
+int ovpn_nl_key_swap_notify(struct ovpn_peer *peer, u8 key_id); + #endif /* _NET_OVPN_NETLINK_H_ */
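As an aside, a minimal (untested) libnl sketch of how a userspace daemon could subscribe to these notifications; the family name "ovpn" and the multicast group name "peers" are assumptions based on the spec and OVPN_NLGRP_PEERS in this series:

#include <stdio.h>
#include <netlink/netlink.h>
#include <netlink/socket.h>
#include <netlink/handlers.h>
#include <netlink/genl/genl.h>
#include <netlink/genl/ctrl.h>

static int ntf_cb(struct nl_msg *msg, void *arg)
{
	(void)msg;
	(void)arg;
	/* a real client would parse OVPN_A_KEYCONF/OVPN_A_PEER here and
	 * trigger a rekey or peer cleanup accordingly
	 */
	printf("received ovpn notification\n");
	return NL_OK;
}

int listen_ovpn_notifications(void)
{
	struct nl_sock *sk = nl_socket_alloc();
	int grp;

	if (!sk)
		return -1;

	/* multicast notifications are not sequenced replies */
	nl_socket_disable_seq_check(sk);
	nl_socket_modify_cb(sk, NL_CB_VALID, NL_CB_CUSTOM, ntf_cb, NULL);

	if (genl_connect(sk))
		goto err;

	grp = genl_ctrl_resolve_grp(sk, "ovpn", "peers");
	if (grp < 0 || nl_socket_add_membership(sk, grp))
		goto err;

	while (1)
		nl_recvmsgs_default(sk);

err:
	nl_socket_free(sk);
	return -1;
}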
2024-10-29, 11:47:33 +0100, Antonio Quartulli wrote:
+int ovpn_nl_key_swap_notify(struct ovpn_peer *peer, u8 key_id) +{
[...]
- nla_nest_end(msg, k_attr);
- genlmsg_end(msg, hdr);
- genlmsg_multicast_netns(&ovpn_nl_family, dev_net(peer->ovpn->dev), msg,
0, OVPN_NLGRP_PEERS, GFP_ATOMIC);
Is openvpn meant to support moving the device to a different netns? In that case I'm not sure the netns the ovpn netdevice is in is the right one, the userspace client will be in the encap socket's netns instead of the netdevice's?
(same thing in the next patch)
On 05/11/2024 11:33, Sabrina Dubroca wrote:
2024-10-29, 11:47:33 +0100, Antonio Quartulli wrote:
+int ovpn_nl_key_swap_notify(struct ovpn_peer *peer, u8 key_id) +{
[...]
- nla_nest_end(msg, k_attr);
- genlmsg_end(msg, hdr);
- genlmsg_multicast_netns(&ovpn_nl_family, dev_net(peer->ovpn->dev), msg,
0, OVPN_NLGRP_PEERS, GFP_ATOMIC);
Is openvpn meant to support moving the device to a different netns? In that case I'm not sure the netns the ovpn netdevice is in is the right one, the userspace client will be in the encap socket's netns instead of the netdevice's?
(same thing in the next patch)
Well, moving between netns's may not be among the most common use cases, but I can see people doing all kinds of weird things, if not forbidden.
Hence, I would not assume the netdevice to always stay in the same netns all time long.
This said, what you say assumes that the userspace process won't change netns after having added the peer. I think we can live with that.
I will change this call to use the sock's netns then.
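A minimal sketch of that change (assuming peer->sock and its underlying sk are guaranteed valid at this point):

	/* deliver the notification in the netns the encap socket lives in,
	 * where the userspace process that created the peer is listening
	 */
	genlmsg_multicast_netns(&ovpn_nl_family,
				sock_net(peer->sock->sock->sk), msg,
				0, OVPN_NLGRP_PEERS, GFP_ATOMIC);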
Thanks a lot!
Regards,
2024-11-12, 16:44:09 +0100, Antonio Quartulli wrote:
On 05/11/2024 11:33, Sabrina Dubroca wrote:
2024-10-29, 11:47:33 +0100, Antonio Quartulli wrote:
+int ovpn_nl_key_swap_notify(struct ovpn_peer *peer, u8 key_id) +{
[...]
- nla_nest_end(msg, k_attr);
- genlmsg_end(msg, hdr);
- genlmsg_multicast_netns(&ovpn_nl_family, dev_net(peer->ovpn->dev), msg,
0, OVPN_NLGRP_PEERS, GFP_ATOMIC);
Is openvpn meant to support moving the device to a different netns? In that case I'm not sure the netns the ovpn netdevice is in is the right one, the userspace client will be in the encap socket's netns instead of the netdevice's?
(same thing in the next patch)
Well, moving between netns's may not be among the most common use cases, but I can see people doing all kinds of weird things, if not forbidden.
The idea would be that only the ovpn device is in one particular netns, so that no packets can make it out of that netns unencrypted. I don't know if anybody actually does that.
Hence, I would not assume the netdevice to always stay in the same netns all time long.
This said, what you say assumes that the userspace process won't change netns after having added the peer.
That shouldn't matter as long as it's still listening to multicast messages in the original netns.
Around that same "which netns" question, ovpn_udp{4,6}_output uses the socket's, but ovpn_nexthop_from_rt{4,6} uses the netdev's.
On 13/11/2024 15:28, Sabrina Dubroca wrote:
2024-11-12, 16:44:09 +0100, Antonio Quartulli wrote:
On 05/11/2024 11:33, Sabrina Dubroca wrote:
2024-10-29, 11:47:33 +0100, Antonio Quartulli wrote:
+int ovpn_nl_key_swap_notify(struct ovpn_peer *peer, u8 key_id) +{
[...]
- nla_nest_end(msg, k_attr);
- genlmsg_end(msg, hdr);
- genlmsg_multicast_netns(&ovpn_nl_family, dev_net(peer->ovpn->dev), msg,
0, OVPN_NLGRP_PEERS, GFP_ATOMIC);
Is openvpn meant to support moving the device to a different netns? In that case I'm not sure the netns the ovpn netdevice is in is the right one, the userspace client will be in the encap socket's netns instead of the netdevice's?
(same thing in the next patch)
Well, moving between netns's may not be among the most common use cases, but I can see people doing all kind of weird things, if not forbidden.
The idea would be that only the ovpn device is in one particular netns, so that no packets can make it out of that netns unencrypted. I don't know if anybody actually does that.
Well, I can imagine starting openvpn in the main netns and moving the device afterwards to something more restrictive (e.g. even a docker-specific netns).
Hence, I would not assume the netdevice to always stay in the same netns all time long.
This said, what you say assumes that the userspace process won't change netns after having added the peer.
That shouldn't matter as long as it's still listening to multicast messages in the original netns.
Oky.
Around that same "which netns" question, ovpn_udp{4,6}_output uses the socket's, but ovpn_nexthop_from_rt{4,6} uses the netdev's.
I think this is ok, because routing-related decisions should be taken in the netns where the device is, while the transport layer should make decisions based on where the socket lives.
Regards,
2024-11-14, 11:38:51 +0100, Antonio Quartulli wrote:
On 13/11/2024 15:28, Sabrina Dubroca wrote:
Around that same "which netns" question, ovpn_udp{4,6}_output uses the socket's, but ovpn_nexthop_from_rt{4,6} uses the netdev's.
I think this is ok, because routing related decision should be taken in the netns where the device is, but the transport layer should make decisions based on where the socket lives.
Right, thanks for checking.
Whenever a peer is deleted, send a notification to userspace so that it can react accordingly.
This is most important when a peer is deleted due to ping timeout, because it all happens in kernelspace and thus userspace has no direct way to learn about it.
Signed-off-by: Antonio Quartulli antonio@openvpn.net --- drivers/net/ovpn/netlink.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++ drivers/net/ovpn/netlink.h | 1 + drivers/net/ovpn/peer.c | 1 + 3 files changed, 57 insertions(+)
diff --git a/drivers/net/ovpn/netlink.c b/drivers/net/ovpn/netlink.c index 2b2ba1a810a0e87fb9ffb43b988fa52725a9589b..4d7d835cb47fd1f03d7cdafa2eda9f03065b8024 100644 --- a/drivers/net/ovpn/netlink.c +++ b/drivers/net/ovpn/netlink.c @@ -999,6 +999,61 @@ int ovpn_nl_key_del_doit(struct sk_buff *skb, struct genl_info *info) return 0; }
+/** + * ovpn_nl_peer_del_notify - notify userspace about peer being deleted + * @peer: the peer being deleted + * + * Return: 0 on success or a negative error code otherwise + */ +int ovpn_nl_peer_del_notify(struct ovpn_peer *peer) +{ + struct sk_buff *msg; + struct nlattr *attr; + int ret = -EMSGSIZE; + void *hdr; + + netdev_info(peer->ovpn->dev, "deleting peer with id %u, reason %d\n", + peer->id, peer->delete_reason); + + msg = nlmsg_new(100, GFP_ATOMIC); + if (!msg) + return -ENOMEM; + + hdr = genlmsg_put(msg, 0, 0, &ovpn_nl_family, 0, OVPN_CMD_PEER_DEL_NTF); + if (!hdr) { + ret = -ENOBUFS; + goto err_free_msg; + } + + if (nla_put_u32(msg, OVPN_A_IFINDEX, peer->ovpn->dev->ifindex)) + goto err_cancel_msg; + + attr = nla_nest_start(msg, OVPN_A_PEER); + if (!attr) + goto err_cancel_msg; + + if (nla_put_u8(msg, OVPN_A_PEER_DEL_REASON, peer->delete_reason)) + goto err_cancel_msg; + + if (nla_put_u32(msg, OVPN_A_PEER_ID, peer->id)) + goto err_cancel_msg; + + nla_nest_end(msg, attr); + + genlmsg_end(msg, hdr); + + genlmsg_multicast_netns(&ovpn_nl_family, dev_net(peer->ovpn->dev), msg, + 0, OVPN_NLGRP_PEERS, GFP_ATOMIC); + + return 0; + +err_cancel_msg: + genlmsg_cancel(msg, hdr); +err_free_msg: + nlmsg_free(msg); + return ret; +} + /** * ovpn_nl_key_swap_notify - notify userspace peer's key must be renewed * @peer: the peer whose key needs to be renewed diff --git a/drivers/net/ovpn/netlink.h b/drivers/net/ovpn/netlink.h index 33390b13c8904d40b629662005a9eb92ff617c3b..4ab3abcf23dba11f6b92e3d69e700693adbc671b 100644 --- a/drivers/net/ovpn/netlink.h +++ b/drivers/net/ovpn/netlink.h @@ -12,6 +12,7 @@ int ovpn_nl_register(void); void ovpn_nl_unregister(void);
+int ovpn_nl_peer_del_notify(struct ovpn_peer *peer); int ovpn_nl_key_swap_notify(struct ovpn_peer *peer, u8 key_id);
#endif /* _NET_OVPN_NETLINK_H_ */ diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c index 8cfe1997ec116ae4fe74cd7105d228569e2a66a9..91c608f1ffa1d9dd1535ba308b6adc933dbbf1f1 100644 --- a/drivers/net/ovpn/peer.c +++ b/drivers/net/ovpn/peer.c @@ -242,6 +242,7 @@ void ovpn_peer_release_kref(struct kref *kref) { struct ovpn_peer *peer = container_of(kref, struct ovpn_peer, refcount);
+ ovpn_nl_peer_del_notify(peer); ovpn_peer_release(peer); }
Implement support for basic ethtool functionality.
Note that ovpn is a virtual device driver, therefore various ethtool APIs are just not meaningful and thus not implemented.
Signed-off-by: Antonio Quartulli antonio@openvpn.net Reviewed-by: Andrew Lunn andrew@lunn.ch --- drivers/net/ovpn/main.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+)
diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index 1bd563e3f16f49dd01c897fbe79cbd90f4b8e9aa..9dcf51ae1497dda17d418b762011b04bfd0521df 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -7,6 +7,7 @@ * James Yonan james@openvpn.net */
+#include <linux/ethtool.h> #include <linux/genetlink.h> #include <linux/module.h> #include <linux/netdevice.h> @@ -96,6 +97,19 @@ bool ovpn_dev_is_valid(const struct net_device *dev) return dev->netdev_ops->ndo_start_xmit == ovpn_net_xmit; }
+static void ovpn_get_drvinfo(struct net_device *dev, + struct ethtool_drvinfo *info) +{ + strscpy(info->driver, OVPN_FAMILY_NAME, sizeof(info->driver)); + strscpy(info->bus_info, "ovpn", sizeof(info->bus_info)); +} + +static const struct ethtool_ops ovpn_ethtool_ops = { + .get_drvinfo = ovpn_get_drvinfo, + .get_link = ethtool_op_get_link, + .get_ts_info = ethtool_op_get_ts_info, +}; + static void ovpn_setup(struct net_device *dev) { /* compute the overhead considering AEAD encryption */ @@ -111,6 +125,7 @@ static void ovpn_setup(struct net_device *dev)
dev->pcpu_stat_type = NETDEV_PCPU_STAT_TSTATS;
+ dev->ethtool_ops = &ovpn_ethtool_ops; dev->netdev_ops = &ovpn_netdev_ops;
dev->priv_destructor = ovpn_struct_free;
The ovpn-cli tool can be compiled and used as selftest for the ovpn kernel module.
[NOTE: it depends on libmbedtls for decoding base64-encoded keys]
ovpn-cli implements the netlink and RTNL APIs and can thus be integrated in any script for more automated testing.
Along with the tool, 4 scripts are provided that perform basic functionality tests by means of network namespaces. These scripts take part in the kselftest automation.
The output of the tool, which will appear in the kselftest reports, is a list of steps performed by the scripts plus some output coming from the execution of `ping` and `iperf`. In general it is useful only in case of failure, in order to understand which step has failed and why.
Cc: linux-kselftest@vger.kernel.org Signed-off-by: Antonio Quartulli antonio@openvpn.net Reviewed-by: Shuah Khan skhan@linuxfoundation.org --- MAINTAINERS | 1 + tools/testing/selftests/Makefile | 1 + tools/testing/selftests/net/ovpn/.gitignore | 2 + tools/testing/selftests/net/ovpn/Makefile | 17 + tools/testing/selftests/net/ovpn/config | 10 + tools/testing/selftests/net/ovpn/data64.key | 5 + tools/testing/selftests/net/ovpn/ovpn-cli.c | 2370 ++++++++++++++++++++ tools/testing/selftests/net/ovpn/tcp_peers.txt | 5 + .../testing/selftests/net/ovpn/test-chachapoly.sh | 9 + tools/testing/selftests/net/ovpn/test-float.sh | 9 + tools/testing/selftests/net/ovpn/test-tcp.sh | 9 + tools/testing/selftests/net/ovpn/test.sh | 183 ++ tools/testing/selftests/net/ovpn/udp_peers.txt | 5 + 13 files changed, 2626 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS index cf3d55c3e98aaea8f8817faed99dd7499cd59a71..110485aec73ae5bfeef4f228490ed76e28e01870 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -17295,6 +17295,7 @@ T: git https://github.com/OpenVPN/linux-kernel-ovpn.git F: Documentation/netlink/specs/ovpn.yaml F: drivers/net/ovpn/ F: include/uapi/linux/ovpn.h +F: tools/testing/selftests/net/ovpn/
OPENVSWITCH M: Pravin B Shelar pshelar@ovn.org diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index 363d031a16f7e14152c904e6b68dab1f90c98392..be42906ecb11d4b0f9866d2c04b0e8fb27a2b995 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -68,6 +68,7 @@ TARGETS += net/hsr TARGETS += net/mptcp TARGETS += net/netfilter TARGETS += net/openvswitch +TARGETS += net/ovpn TARGETS += net/packetdrill TARGETS += net/rds TARGETS += net/tcp_ao diff --git a/tools/testing/selftests/net/ovpn/.gitignore b/tools/testing/selftests/net/ovpn/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..ee44c081ca7c089933659689303c303a9fa9713b --- /dev/null +++ b/tools/testing/selftests/net/ovpn/.gitignore @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0+ +ovpn-cli diff --git a/tools/testing/selftests/net/ovpn/Makefile b/tools/testing/selftests/net/ovpn/Makefile new file mode 100644 index 0000000000000000000000000000000000000000..c76d8fd953c5674941c8c2787813063b1bce180f --- /dev/null +++ b/tools/testing/selftests/net/ovpn/Makefile @@ -0,0 +1,17 @@ +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) 2020-2024 OpenVPN, Inc. +# +CFLAGS = -pedantic -Wextra -Wall -Wl,--no-as-needed -g -O0 -ggdb $(KHDR_INCLUDES) +CFLAGS += $(shell pkg-config --cflags libnl-3.0 libnl-genl-3.0) + +LDFLAGS = -lmbedtls -lmbedcrypto +LDFLAGS += $(shell pkg-config --libs libnl-3.0 libnl-genl-3.0) + +TEST_PROGS = test.sh \ + test-chachapoly.sh \ + test-tcp.sh \ + test-float.sh + +TEST_GEN_FILES = ovpn-cli + +include ../../lib.mk diff --git a/tools/testing/selftests/net/ovpn/config b/tools/testing/selftests/net/ovpn/config new file mode 100644 index 0000000000000000000000000000000000000000..71946ba9fa175c191725e369eb9b973503d9d9c4 --- /dev/null +++ b/tools/testing/selftests/net/ovpn/config @@ -0,0 +1,10 @@ +CONFIG_NET=y +CONFIG_INET=y +CONFIG_STREAM_PARSER=y +CONFIG_NET_UDP_TUNNEL=y +CONFIG_DST_CACHE=y +CONFIG_CRYPTO=y +CONFIG_CRYPTO_AES=y +CONFIG_CRYPTO_GCM=y +CONFIG_CRYPTO_CHACHA20POLY1305=y +CONFIG_OVPN=m diff --git a/tools/testing/selftests/net/ovpn/data64.key b/tools/testing/selftests/net/ovpn/data64.key new file mode 100644 index 0000000000000000000000000000000000000000..a99e88c4e290f58b12f399b857b873f308d9ba09 --- /dev/null +++ b/tools/testing/selftests/net/ovpn/data64.key @@ -0,0 +1,5 @@ +jRqMACN7d7/aFQNT8S7jkrBD8uwrgHbG5OQZP2eu4R1Y7tfpS2bf5RHv06Vi163CGoaIiTX99R3B +ia9ycAH8Wz1+9PWv51dnBLur9jbShlgZ2QHLtUc4a/gfT7zZwULXuuxdLnvR21DDeMBaTbkgbai9 +uvAa7ne1liIgGFzbv+Bas4HDVrygxIxuAnP5Qgc3648IJkZ0QEXPF+O9f0n5+QIvGCxkAUVx+5K6 +KIs+SoeWXnAopELmoGSjUpFtJbagXK82HfdqpuUxT2Tnuef0/14SzVE/vNleBNu2ZbyrSAaah8tE +BofkPJUBFY+YQcfZNM5Dgrw3i+Bpmpq/gpdg5w== diff --git a/tools/testing/selftests/net/ovpn/ovpn-cli.c b/tools/testing/selftests/net/ovpn/ovpn-cli.c new file mode 100644 index 0000000000000000000000000000000000000000..046dd069aaaf4e5b091947bd57ed79f8519a780f --- /dev/null +++ b/tools/testing/selftests/net/ovpn/ovpn-cli.c @@ -0,0 +1,2370 @@ +// SPDX-License-Identifier: GPL-2.0 +/* OpenVPN data channel accelerator + * + * Copyright (C) 2020-2024 OpenVPN, Inc. 
+ * + * Author: Antonio Quartulli antonio@openvpn.net + */ + +#include <stdio.h> +#include <inttypes.h> +#include <stdbool.h> +#include <string.h> +#include <errno.h> +#include <unistd.h> +#include <arpa/inet.h> +#include <net/if.h> +#include <netinet/in.h> +#include <time.h> + +#include <linux/ovpn.h> +#include <linux/types.h> +#include <linux/netlink.h> + +#include <netlink/socket.h> +#include <netlink/netlink.h> +#include <netlink/genl/genl.h> +#include <netlink/genl/family.h> +#include <netlink/genl/ctrl.h> + +#include <mbedtls/base64.h> +#include <mbedtls/error.h> + +#include <sys/socket.h> + +/* defines to make checkpatch happy */ +#define strscpy strncpy +#define __always_unused __attribute__((__unused__)) + +/* libnl < 3.5.0 does not set the NLA_F_NESTED on its own, therefore we + * have to explicitly do it to prevent the kernel from failing upon + * parsing of the message + */ +#define nla_nest_start(_msg, _type) \ + nla_nest_start(_msg, (_type) | NLA_F_NESTED) + +uint64_t nla_get_uint(struct nlattr *attr) +{ + if (nla_len(attr) == sizeof(uint32_t)) + return nla_get_u32(attr); + else + return nla_get_u64(attr); +} + +typedef int (*ovpn_nl_cb)(struct nl_msg *msg, void *arg); + +enum ovpn_key_direction { + KEY_DIR_IN = 0, + KEY_DIR_OUT, +}; + +#define KEY_LEN (256 / 8) +#define NONCE_LEN 8 + +#define PEER_ID_UNDEF 0x00FFFFFF + +struct nl_ctx { + struct nl_sock *nl_sock; + struct nl_msg *nl_msg; + struct nl_cb *nl_cb; + + int ovpn_dco_id; +}; + +enum ovpn_cmd { + CMD_INVALID, + CMD_NEW_IFACE, + CMD_DEL_IFACE, + CMD_LISTEN, + CMD_CONNECT, + CMD_NEW_PEER, + CMD_NEW_MULTI_PEER, + CMD_SET_PEER, + CMD_DEL_PEER, + CMD_GET_PEER, + CMD_NEW_KEY, + CMD_DEL_KEY, + CMD_GET_KEY, + CMD_SWAP_KEYS, + CMD_LISTEN_MCAST, +}; + +struct ovpn_ctx { + enum ovpn_cmd cmd; + + __u8 key_enc[KEY_LEN]; + __u8 key_dec[KEY_LEN]; + __u8 nonce[NONCE_LEN]; + + enum ovpn_cipher_alg cipher; + + sa_family_t sa_family; + + unsigned long peer_id; + unsigned long lport; + + union { + struct sockaddr_in in4; + struct sockaddr_in6 in6; + } remote; + + union { + struct sockaddr_in in4; + struct sockaddr_in6 in6; + } peer_ip; + + bool peer_ip_set; + + unsigned int ifindex; + char ifname[IFNAMSIZ]; + enum ovpn_mode mode; + bool mode_set; + + int socket; + int cli_socket; + + __u32 keepalive_interval; + __u32 keepalive_timeout; + + enum ovpn_key_direction key_dir; + enum ovpn_key_slot key_slot; + int key_id; + + const char *peers_file; +}; + +static int ovpn_nl_recvmsgs(struct nl_ctx *ctx) +{ + int ret; + + ret = nl_recvmsgs(ctx->nl_sock, ctx->nl_cb); + + switch (ret) { + case -NLE_INTR: + fprintf(stderr, + "netlink received interrupt due to signal - ignoring\n"); + break; + case -NLE_NOMEM: + fprintf(stderr, "netlink out of memory error\n"); + break; + case -NLE_AGAIN: + fprintf(stderr, + "netlink reports blocking read - aborting wait\n"); + break; + default: + if (ret) + fprintf(stderr, "netlink reports error (%d): %s\n", + ret, nl_geterror(-ret)); + break; + } + + return ret; +} + +static struct nl_ctx *nl_ctx_alloc_flags(struct ovpn_ctx *ovpn, int cmd, + int flags) +{ + struct nl_ctx *ctx; + int err, ret; + + ctx = calloc(1, sizeof(*ctx)); + if (!ctx) + return NULL; + + ctx->nl_sock = nl_socket_alloc(); + if (!ctx->nl_sock) { + fprintf(stderr, "cannot allocate netlink socket\n"); + goto err_free; + } + + nl_socket_set_buffer_size(ctx->nl_sock, 8192, 8192); + + ret = genl_connect(ctx->nl_sock); + if (ret) { + fprintf(stderr, "cannot connect to generic netlink: %s\n", + nl_geterror(ret)); + goto err_sock; + } + + /* enable 
Extended ACK for detailed error reporting */ + err = 1; + setsockopt(nl_socket_get_fd(ctx->nl_sock), SOL_NETLINK, NETLINK_EXT_ACK, + &err, sizeof(err)); + + ctx->ovpn_dco_id = genl_ctrl_resolve(ctx->nl_sock, OVPN_FAMILY_NAME); + if (ctx->ovpn_dco_id < 0) { + fprintf(stderr, "cannot find ovpn_dco netlink component: %d\n", + ctx->ovpn_dco_id); + goto err_free; + } + + ctx->nl_msg = nlmsg_alloc(); + if (!ctx->nl_msg) { + fprintf(stderr, "cannot allocate netlink message\n"); + goto err_sock; + } + + ctx->nl_cb = nl_cb_alloc(NL_CB_DEFAULT); + if (!ctx->nl_cb) { + fprintf(stderr, "failed to allocate netlink callback\n"); + goto err_msg; + } + + nl_socket_set_cb(ctx->nl_sock, ctx->nl_cb); + + genlmsg_put(ctx->nl_msg, 0, 0, ctx->ovpn_dco_id, 0, flags, cmd, 0); + + if (ovpn->ifindex > 0) + NLA_PUT_U32(ctx->nl_msg, OVPN_A_IFINDEX, ovpn->ifindex); + + return ctx; +nla_put_failure: +err_msg: + nlmsg_free(ctx->nl_msg); +err_sock: + nl_socket_free(ctx->nl_sock); +err_free: + free(ctx); + return NULL; +} + +static struct nl_ctx *nl_ctx_alloc(struct ovpn_ctx *ovpn, int cmd) +{ + return nl_ctx_alloc_flags(ovpn, cmd, 0); +} + +static void nl_ctx_free(struct nl_ctx *ctx) +{ + if (!ctx) + return; + + nl_socket_free(ctx->nl_sock); + nlmsg_free(ctx->nl_msg); + nl_cb_put(ctx->nl_cb); + free(ctx); +} + +static int ovpn_nl_cb_error(struct sockaddr_nl (*nla)__always_unused, + struct nlmsgerr *err, void *arg) +{ + struct nlmsghdr *nlh = (struct nlmsghdr *)err - 1; + struct nlattr *tb_msg[NLMSGERR_ATTR_MAX + 1]; + int len = nlh->nlmsg_len; + struct nlattr *attrs; + int *ret = arg; + int ack_len = sizeof(*nlh) + sizeof(int) + sizeof(*nlh); + + *ret = err->error; + + if (!(nlh->nlmsg_flags & NLM_F_ACK_TLVS)) + return NL_STOP; + + if (!(nlh->nlmsg_flags & NLM_F_CAPPED)) + ack_len += err->msg.nlmsg_len - sizeof(*nlh); + + if (len <= ack_len) + return NL_STOP; + + attrs = (void *)((uint8_t *)nlh + ack_len); + len -= ack_len; + + nla_parse(tb_msg, NLMSGERR_ATTR_MAX, attrs, len, NULL); + if (tb_msg[NLMSGERR_ATTR_MSG]) { + len = strnlen((char *)nla_data(tb_msg[NLMSGERR_ATTR_MSG]), + nla_len(tb_msg[NLMSGERR_ATTR_MSG])); + fprintf(stderr, "kernel error: %*s\n", len, + (char *)nla_data(tb_msg[NLMSGERR_ATTR_MSG])); + } + + if (tb_msg[NLMSGERR_ATTR_MISS_NEST]) { + fprintf(stderr, "missing required nesting type %u\n", + nla_get_u32(tb_msg[NLMSGERR_ATTR_MISS_NEST])); + } + + if (tb_msg[NLMSGERR_ATTR_MISS_TYPE]) { + fprintf(stderr, "missing required attribute type %u\n", + nla_get_u32(tb_msg[NLMSGERR_ATTR_MISS_TYPE])); + } + + return NL_STOP; +} + +static int ovpn_nl_cb_finish(struct nl_msg (*msg)__always_unused, + void *arg) +{ + int *status = arg; + + *status = 0; + return NL_SKIP; +} + +static int ovpn_nl_cb_ack(struct nl_msg (*msg)__always_unused, + void *arg) +{ + int *status = arg; + + *status = 0; + return NL_STOP; +} + +static int ovpn_nl_msg_send(struct nl_ctx *ctx, ovpn_nl_cb cb) +{ + int status = 1; + + nl_cb_err(ctx->nl_cb, NL_CB_CUSTOM, ovpn_nl_cb_error, &status); + nl_cb_set(ctx->nl_cb, NL_CB_FINISH, NL_CB_CUSTOM, ovpn_nl_cb_finish, + &status); + nl_cb_set(ctx->nl_cb, NL_CB_ACK, NL_CB_CUSTOM, ovpn_nl_cb_ack, &status); + + if (cb) + nl_cb_set(ctx->nl_cb, NL_CB_VALID, NL_CB_CUSTOM, cb, ctx); + + nl_send_auto_complete(ctx->nl_sock, ctx->nl_msg); + + while (status == 1) + ovpn_nl_recvmsgs(ctx); + + if (status < 0) + fprintf(stderr, "failed to send netlink message: %s (%d)\n", + strerror(-status), status); + + return status; +} + +static int ovpn_parse_key(const char *file, struct ovpn_ctx *ctx) +{ + int idx_enc, idx_dec, 
ret = -1; + unsigned char *ckey = NULL; + __u8 *bkey = NULL; + size_t olen = 0; + long ckey_len; + FILE *fp; + + fp = fopen(file, "r"); + if (!fp) { + fprintf(stderr, "cannot open: %s\n", file); + return -1; + } + + /* get file size */ + fseek(fp, 0L, SEEK_END); + ckey_len = ftell(fp); + rewind(fp); + + /* if the file is longer, let's just read a portion */ + if (ckey_len > 256) + ckey_len = 256; + + ckey = malloc(ckey_len); + if (!ckey) + goto err; + + ret = fread(ckey, 1, ckey_len, fp); + if (ret != ckey_len) { + fprintf(stderr, + "couldn't read enough data from key file: %dbytes read\n", + ret); + goto err; + } + + olen = 0; + ret = mbedtls_base64_decode(NULL, 0, &olen, ckey, ckey_len); + if (ret != MBEDTLS_ERR_BASE64_BUFFER_TOO_SMALL) { + char buf[256]; + + mbedtls_strerror(ret, buf, sizeof(buf)); + fprintf(stderr, "unexpected base64 error1: %s (%d)\n", buf, + ret); + + goto err; + } + + bkey = malloc(olen); + if (!bkey) { + fprintf(stderr, "cannot allocate binary key buffer\n"); + goto err; + } + + ret = mbedtls_base64_decode(bkey, olen, &olen, ckey, ckey_len); + if (ret) { + char buf[256]; + + mbedtls_strerror(ret, buf, sizeof(buf)); + fprintf(stderr, "unexpected base64 error2: %s (%d)\n", buf, + ret); + + goto err; + } + + if (olen < 2 * KEY_LEN + NONCE_LEN) { + fprintf(stderr, + "not enough data in key file, found %zdB but needs %dB\n", + olen, 2 * KEY_LEN + NONCE_LEN); + goto err; + } + + switch (ctx->key_dir) { + case KEY_DIR_IN: + idx_enc = 0; + idx_dec = 1; + break; + case KEY_DIR_OUT: + idx_enc = 1; + idx_dec = 0; + break; + default: + goto err; + } + + memcpy(ctx->key_enc, bkey + KEY_LEN * idx_enc, KEY_LEN); + memcpy(ctx->key_dec, bkey + KEY_LEN * idx_dec, KEY_LEN); + memcpy(ctx->nonce, bkey + 2 * KEY_LEN, NONCE_LEN); + + ret = 0; + +err: + fclose(fp); + free(bkey); + free(ckey); + + return ret; +} + +static int ovpn_parse_cipher(const char *cipher, struct ovpn_ctx *ctx) +{ + if (strcmp(cipher, "aes") == 0) + ctx->cipher = OVPN_CIPHER_ALG_AES_GCM; + else if (strcmp(cipher, "chachapoly") == 0) + ctx->cipher = OVPN_CIPHER_ALG_CHACHA20_POLY1305; + else if (strcmp(cipher, "none") == 0) + ctx->cipher = OVPN_CIPHER_ALG_NONE; + else + return -ENOTSUP; + + return 0; +} + +static int ovpn_parse_key_direction(const char *dir, struct ovpn_ctx *ctx) +{ + int in_dir; + + in_dir = strtoll(dir, NULL, 10); + switch (in_dir) { + case KEY_DIR_IN: + case KEY_DIR_OUT: + ctx->key_dir = in_dir; + break; + default: + fprintf(stderr, + "invalid key direction provided. 
Can be 0 or 1 only\n"); + return -1; + } + + return 0; +} + +static int ovpn_socket(struct ovpn_ctx *ctx, sa_family_t family, int proto) +{ + struct sockaddr_storage local_sock = { 0 }; + struct sockaddr_in6 *in6; + struct sockaddr_in *in; + int ret, s, sock_type; + size_t sock_len; + + if (proto == IPPROTO_UDP) + sock_type = SOCK_DGRAM; + else if (proto == IPPROTO_TCP) + sock_type = SOCK_STREAM; + else + return -EINVAL; + + s = socket(family, sock_type, 0); + if (s < 0) { + perror("cannot create socket"); + return -1; + } + + switch (family) { + case AF_INET: + in = (struct sockaddr_in *)&local_sock; + in->sin_family = family; + in->sin_port = htons(ctx->lport); + in->sin_addr.s_addr = htonl(INADDR_ANY); + sock_len = sizeof(*in); + break; + case AF_INET6: + in6 = (struct sockaddr_in6 *)&local_sock; + in6->sin6_family = family; + in6->sin6_port = htons(ctx->lport); + in6->sin6_addr = in6addr_any; + sock_len = sizeof(*in6); + break; + default: + return -1; + } + + int opt = 1; + + ret = setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt)); + + if (ret < 0) { + perror("setsockopt for SO_REUSEADDR"); + return ret; + } + + ret = setsockopt(s, SOL_SOCKET, SO_REUSEPORT, &opt, sizeof(opt)); + if (ret < 0) { + perror("setsockopt for SO_REUSEPORT"); + return ret; + } + + if (family == AF_INET6) { + opt = 0; + if (setsockopt(s, IPPROTO_IPV6, IPV6_V6ONLY, &opt, + sizeof(opt))) { + perror("failed to set IPV6_V6ONLY"); + return -1; + } + } + + ret = bind(s, (struct sockaddr *)&local_sock, sock_len); + if (ret < 0) { + perror("cannot bind socket"); + goto err_socket; + } + + ctx->socket = s; + ctx->sa_family = family; + return 0; + +err_socket: + close(s); + return -1; +} + +static int ovpn_udp_socket(struct ovpn_ctx *ctx, sa_family_t family) +{ + return ovpn_socket(ctx, family, IPPROTO_UDP); +} + +static int ovpn_listen(struct ovpn_ctx *ctx, sa_family_t family) +{ + int ret; + + ret = ovpn_socket(ctx, family, IPPROTO_TCP); + if (ret < 0) + return ret; + + ret = listen(ctx->socket, 10); + if (ret < 0) { + perror("listen"); + close(ctx->socket); + return -1; + } + + return 0; +} + +static int ovpn_accept(struct ovpn_ctx *ctx) +{ + socklen_t socklen; + int ret; + + socklen = sizeof(ctx->remote); + ret = accept(ctx->socket, (struct sockaddr *)&ctx->remote, &socklen); + if (ret < 0) { + perror("accept"); + goto err; + } + + fprintf(stderr, "Connection received!\n"); + + switch (socklen) { + case sizeof(struct sockaddr_in): + case sizeof(struct sockaddr_in6): + break; + default: + fprintf(stderr, "error: expecting IPv4 or IPv6 connection\n"); + close(ret); + ret = -EINVAL; + goto err; + } + + return ret; +err: + close(ctx->socket); + return ret; +} + +static int ovpn_connect(struct ovpn_ctx *ovpn) +{ + socklen_t socklen; + int s, ret; + + s = socket(ovpn->remote.in4.sin_family, SOCK_STREAM, 0); + if (s < 0) { + perror("cannot create socket"); + return -1; + } + + switch (ovpn->remote.in4.sin_family) { + case AF_INET: + socklen = sizeof(struct sockaddr_in); + break; + case AF_INET6: + socklen = sizeof(struct sockaddr_in6); + break; + default: + return -EOPNOTSUPP; + } + + ret = connect(s, (struct sockaddr *)&ovpn->remote, socklen); + if (ret < 0) { + perror("connect"); + goto err; + } + + fprintf(stderr, "connected\n"); + + ovpn->socket = s; + + return 0; +err: + close(s); + return ret; +} + +static int ovpn_new_peer(struct ovpn_ctx *ovpn, bool is_tcp) +{ + struct nlattr *attr; + struct nl_ctx *ctx; + int ret = -1; + + ctx = nl_ctx_alloc(ovpn, OVPN_CMD_PEER_NEW); + if (!ctx) + return -ENOMEM; + + attr = 
nla_nest_start(ctx->nl_msg, OVPN_A_PEER); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_ID, ovpn->peer_id); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_SOCKET, ovpn->socket); + + if (!is_tcp) { + switch (ovpn->remote.in4.sin_family) { + case AF_INET: + NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_REMOTE_IPV4, + ovpn->remote.in4.sin_addr.s_addr); + NLA_PUT_U16(ctx->nl_msg, OVPN_A_PEER_REMOTE_PORT, + ovpn->remote.in4.sin_port); + break; + case AF_INET6: + NLA_PUT(ctx->nl_msg, OVPN_A_PEER_REMOTE_IPV6, + sizeof(ovpn->remote.in6.sin6_addr), + &ovpn->remote.in6.sin6_addr); + NLA_PUT_U32(ctx->nl_msg, + OVPN_A_PEER_REMOTE_IPV6_SCOPE_ID, + ovpn->remote.in6.sin6_scope_id); + NLA_PUT_U16(ctx->nl_msg, OVPN_A_PEER_REMOTE_PORT, + ovpn->remote.in6.sin6_port); + break; + default: + fprintf(stderr, + "Invalid family for remote socket address\n"); + goto nla_put_failure; + } + } + + if (ovpn->peer_ip_set) { + switch (ovpn->peer_ip.in4.sin_family) { + case AF_INET: + NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_VPN_IPV4, + ovpn->peer_ip.in4.sin_addr.s_addr); + break; + case AF_INET6: + NLA_PUT(ctx->nl_msg, OVPN_A_PEER_VPN_IPV6, + sizeof(struct in6_addr), + &ovpn->peer_ip.in6.sin6_addr); + break; + default: + fprintf(stderr, "Invalid family for peer address\n"); + goto nla_put_failure; + } + } + + nla_nest_end(ctx->nl_msg, attr); + + ret = ovpn_nl_msg_send(ctx, NULL); +nla_put_failure: + nl_ctx_free(ctx); + return ret; +} + +static int ovpn_set_peer(struct ovpn_ctx *ovpn) +{ + struct nlattr *attr; + struct nl_ctx *ctx; + int ret = -1; + + ctx = nl_ctx_alloc(ovpn, OVPN_CMD_PEER_SET); + if (!ctx) + return -ENOMEM; + + attr = nla_nest_start(ctx->nl_msg, OVPN_A_PEER); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_ID, ovpn->peer_id); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_KEEPALIVE_INTERVAL, + ovpn->keepalive_interval); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_KEEPALIVE_TIMEOUT, + ovpn->keepalive_timeout); + nla_nest_end(ctx->nl_msg, attr); + + ret = ovpn_nl_msg_send(ctx, NULL); +nla_put_failure: + nl_ctx_free(ctx); + return ret; +} + +static int ovpn_del_peer(struct ovpn_ctx *ovpn) +{ + struct nlattr *attr; + struct nl_ctx *ctx; + int ret = -1; + + ctx = nl_ctx_alloc(ovpn, OVPN_CMD_PEER_DEL); + if (!ctx) + return -ENOMEM; + + attr = nla_nest_start(ctx->nl_msg, OVPN_A_PEER); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_ID, ovpn->peer_id); + nla_nest_end(ctx->nl_msg, attr); + + ret = ovpn_nl_msg_send(ctx, NULL); +nla_put_failure: + nl_ctx_free(ctx); + return ret; +} + +static int ovpn_handle_peer(struct nl_msg *msg, void (*arg)__always_unused) +{ + struct nlattr *pattrs[OVPN_A_PEER_MAX + 1]; + struct genlmsghdr *gnlh = nlmsg_data(nlmsg_hdr(msg)); + struct nlattr *attrs[OVPN_A_MAX + 1]; + __u16 rport = 0, lport = 0; + + nla_parse(attrs, OVPN_A_MAX, genlmsg_attrdata(gnlh, 0), + genlmsg_attrlen(gnlh, 0), NULL); + + if (!attrs[OVPN_A_PEER]) { + fprintf(stderr, "no packet content in netlink message\n"); + return NL_SKIP; + } + + nla_parse(pattrs, OVPN_A_PEER_MAX, nla_data(attrs[OVPN_A_PEER]), + nla_len(attrs[OVPN_A_PEER]), NULL); + + if (pattrs[OVPN_A_PEER_ID]) + fprintf(stderr, "* Peer %u\n", + nla_get_u32(pattrs[OVPN_A_PEER_ID])); + + if (pattrs[OVPN_A_PEER_VPN_IPV4]) { + char buf[INET_ADDRSTRLEN]; + + inet_ntop(AF_INET, nla_data(pattrs[OVPN_A_PEER_VPN_IPV4]), + buf, sizeof(buf)); + fprintf(stderr, "\tVPN IPv4: %s\n", buf); + } + + if (pattrs[OVPN_A_PEER_VPN_IPV6]) { + char buf[INET6_ADDRSTRLEN]; + + inet_ntop(AF_INET6, nla_data(pattrs[OVPN_A_PEER_VPN_IPV6]), + buf, sizeof(buf)); + fprintf(stderr, "\tVPN IPv6: %s\n", buf); + } + + if 
(pattrs[OVPN_A_PEER_LOCAL_PORT]) + lport = ntohs(nla_get_u16(pattrs[OVPN_A_PEER_LOCAL_PORT])); + + if (pattrs[OVPN_A_PEER_REMOTE_PORT]) + rport = ntohs(nla_get_u16(pattrs[OVPN_A_PEER_REMOTE_PORT])); + + if (pattrs[OVPN_A_PEER_REMOTE_IPV6]) { + void *ip = pattrs[OVPN_A_PEER_REMOTE_IPV6]; + char buf[INET6_ADDRSTRLEN]; + int scope_id = -1; + + if (pattrs[OVPN_A_PEER_REMOTE_IPV6_SCOPE_ID]) { + void *p = pattrs[OVPN_A_PEER_REMOTE_IPV6_SCOPE_ID]; + + scope_id = nla_get_u32(p); + } + + inet_ntop(AF_INET6, nla_data(ip), buf, sizeof(buf)); + fprintf(stderr, "\tRemote: %s:%hu (scope-id: %u)\n", buf, rport, + scope_id); + + if (pattrs[OVPN_A_PEER_LOCAL_IPV6]) { + void *ip = pattrs[OVPN_A_PEER_LOCAL_IPV6]; + + inet_ntop(AF_INET6, nla_data(ip), buf, sizeof(buf)); + fprintf(stderr, "\tLocal: %s:%hu\n", buf, lport); + } + } + + if (pattrs[OVPN_A_PEER_REMOTE_IPV4]) { + void *ip = pattrs[OVPN_A_PEER_REMOTE_IPV4]; + char buf[INET_ADDRSTRLEN]; + + inet_ntop(AF_INET, nla_data(ip), buf, sizeof(buf)); + fprintf(stderr, "\tRemote: %s:%hu\n", buf, rport); + + if (pattrs[OVPN_A_PEER_LOCAL_IPV4]) { + void *p = pattrs[OVPN_A_PEER_LOCAL_IPV4]; + + inet_ntop(AF_INET, nla_data(p), buf, sizeof(buf)); + fprintf(stderr, "\tLocal: %s:%hu\n", buf, lport); + } + } + + if (pattrs[OVPN_A_PEER_KEEPALIVE_INTERVAL]) { + void *p = pattrs[OVPN_A_PEER_KEEPALIVE_INTERVAL]; + + fprintf(stderr, "\tKeepalive interval: %u sec\n", + nla_get_u32(p)); + } + + if (pattrs[OVPN_A_PEER_KEEPALIVE_TIMEOUT]) + fprintf(stderr, "\tKeepalive timeout: %u sec\n", + nla_get_u32(pattrs[OVPN_A_PEER_KEEPALIVE_TIMEOUT])); + + if (pattrs[OVPN_A_PEER_VPN_RX_BYTES]) + fprintf(stderr, "\tVPN RX bytes: %" PRIu64 "\n", + nla_get_uint(pattrs[OVPN_A_PEER_VPN_RX_BYTES])); + + if (pattrs[OVPN_A_PEER_VPN_TX_BYTES]) + fprintf(stderr, "\tVPN TX bytes: %" PRIu64 "\n", + nla_get_uint(pattrs[OVPN_A_PEER_VPN_TX_BYTES])); + + if (pattrs[OVPN_A_PEER_VPN_RX_PACKETS]) + fprintf(stderr, "\tVPN RX packets: %" PRIu64 "\n", + nla_get_uint(pattrs[OVPN_A_PEER_VPN_RX_PACKETS])); + + if (pattrs[OVPN_A_PEER_VPN_TX_PACKETS]) + fprintf(stderr, "\tVPN TX packets: %" PRIu64 "\n", + nla_get_uint(pattrs[OVPN_A_PEER_VPN_TX_PACKETS])); + + if (pattrs[OVPN_A_PEER_LINK_RX_BYTES]) + fprintf(stderr, "\tLINK RX bytes: %" PRIu64 "\n", + nla_get_uint(pattrs[OVPN_A_PEER_LINK_RX_BYTES])); + + if (pattrs[OVPN_A_PEER_LINK_TX_BYTES]) + fprintf(stderr, "\tLINK TX bytes: %" PRIu64 "\n", + nla_get_uint(pattrs[OVPN_A_PEER_LINK_TX_BYTES])); + + if (pattrs[OVPN_A_PEER_LINK_RX_PACKETS]) + fprintf(stderr, "\tLINK RX packets: %" PRIu64 "\n", + nla_get_uint(pattrs[OVPN_A_PEER_LINK_RX_PACKETS])); + + if (pattrs[OVPN_A_PEER_LINK_TX_PACKETS]) + fprintf(stderr, "\tLINK TX packets: %" PRIu64 "\n", + nla_get_uint(pattrs[OVPN_A_PEER_LINK_TX_PACKETS])); + + return NL_SKIP; +} + +static int ovpn_get_peer(struct ovpn_ctx *ovpn) +{ + int flags = 0, ret = -1; + struct nlattr *attr; + struct nl_ctx *ctx; + + if (ovpn->peer_id == PEER_ID_UNDEF) + flags = NLM_F_DUMP; + + ctx = nl_ctx_alloc_flags(ovpn, OVPN_CMD_PEER_GET, flags); + if (!ctx) + return -ENOMEM; + + if (ovpn->peer_id != PEER_ID_UNDEF) { + attr = nla_nest_start(ctx->nl_msg, OVPN_A_PEER); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_PEER_ID, ovpn->peer_id); + nla_nest_end(ctx->nl_msg, attr); + } + + ret = ovpn_nl_msg_send(ctx, ovpn_handle_peer); +nla_put_failure: + nl_ctx_free(ctx); + return ret; +} + +static int ovpn_new_key(struct ovpn_ctx *ovpn) +{ + struct nlattr *keyconf, *key_dir; + struct nl_ctx *ctx; + int ret = -1; + + ctx = nl_ctx_alloc(ovpn, OVPN_CMD_KEY_NEW); + if 
(!ctx) + return -ENOMEM; + + keyconf = nla_nest_start(ctx->nl_msg, OVPN_A_KEYCONF); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_KEYCONF_PEER_ID, ovpn->peer_id); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_KEYCONF_SLOT, ovpn->key_slot); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_KEYCONF_KEY_ID, ovpn->key_id); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_KEYCONF_CIPHER_ALG, ovpn->cipher); + + key_dir = nla_nest_start(ctx->nl_msg, OVPN_A_KEYCONF_ENCRYPT_DIR); + NLA_PUT(ctx->nl_msg, OVPN_A_KEYDIR_CIPHER_KEY, KEY_LEN, ovpn->key_enc); + NLA_PUT(ctx->nl_msg, OVPN_A_KEYDIR_NONCE_TAIL, NONCE_LEN, ovpn->nonce); + nla_nest_end(ctx->nl_msg, key_dir); + + key_dir = nla_nest_start(ctx->nl_msg, OVPN_A_KEYCONF_DECRYPT_DIR); + NLA_PUT(ctx->nl_msg, OVPN_A_KEYDIR_CIPHER_KEY, KEY_LEN, ovpn->key_dec); + NLA_PUT(ctx->nl_msg, OVPN_A_KEYDIR_NONCE_TAIL, NONCE_LEN, ovpn->nonce); + nla_nest_end(ctx->nl_msg, key_dir); + + nla_nest_end(ctx->nl_msg, keyconf); + + ret = ovpn_nl_msg_send(ctx, NULL); +nla_put_failure: + nl_ctx_free(ctx); + return ret; +} + +static int ovpn_del_key(struct ovpn_ctx *ovpn) +{ + struct nlattr *keyconf; + struct nl_ctx *ctx; + int ret = -1; + + ctx = nl_ctx_alloc(ovpn, OVPN_CMD_KEY_DEL); + if (!ctx) + return -ENOMEM; + + keyconf = nla_nest_start(ctx->nl_msg, OVPN_A_KEYCONF); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_KEYCONF_PEER_ID, ovpn->peer_id); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_KEYCONF_SLOT, ovpn->key_slot); + nla_nest_end(ctx->nl_msg, keyconf); + + ret = ovpn_nl_msg_send(ctx, NULL); +nla_put_failure: + nl_ctx_free(ctx); + return ret; +} + +static int ovpn_handle_key(struct nl_msg *msg, void (*arg)__always_unused) +{ + struct nlattr *kattrs[OVPN_A_KEYCONF_MAX + 1]; + struct genlmsghdr *gnlh = nlmsg_data(nlmsg_hdr(msg)); + struct nlattr *attrs[OVPN_A_MAX + 1]; + + nla_parse(attrs, OVPN_A_MAX, genlmsg_attrdata(gnlh, 0), + genlmsg_attrlen(gnlh, 0), NULL); + + if (!attrs[OVPN_A_KEYCONF]) { + fprintf(stderr, "no packet content in netlink message\n"); + return NL_SKIP; + } + + nla_parse(kattrs, OVPN_A_KEYCONF_MAX, nla_data(attrs[OVPN_A_KEYCONF]), + nla_len(attrs[OVPN_A_KEYCONF]), NULL); + + if (kattrs[OVPN_A_KEYCONF_PEER_ID]) + fprintf(stderr, "* Peer %u\n", + nla_get_u32(kattrs[OVPN_A_KEYCONF_PEER_ID])); + if (kattrs[OVPN_A_KEYCONF_SLOT]) { + fprintf(stderr, "\t- Slot: "); + switch (nla_get_u32(kattrs[OVPN_A_KEYCONF_SLOT])) { + case OVPN_KEY_SLOT_PRIMARY: + fprintf(stderr, "primary\n"); + break; + case OVPN_KEY_SLOT_SECONDARY: + fprintf(stderr, "secondary\n"); + break; + default: + fprintf(stderr, "invalid (%u)\n", + nla_get_u32(kattrs[OVPN_A_KEYCONF_SLOT])); + break; + } + } + if (kattrs[OVPN_A_KEYCONF_KEY_ID]) + fprintf(stderr, "\t- Key ID: %u\n", + nla_get_u32(kattrs[OVPN_A_KEYCONF_KEY_ID])); + if (kattrs[OVPN_A_KEYCONF_CIPHER_ALG]) { + fprintf(stderr, "\t- Cipher: "); + switch (nla_get_u32(kattrs[OVPN_A_KEYCONF_CIPHER_ALG])) { + case OVPN_CIPHER_ALG_NONE: + fprintf(stderr, "none\n"); + break; + case OVPN_CIPHER_ALG_AES_GCM: + fprintf(stderr, "aes-gcm\n"); + break; + case OVPN_CIPHER_ALG_CHACHA20_POLY1305: + fprintf(stderr, "chacha20poly1305\n"); + break; + default: + fprintf(stderr, "invalid (%u)\n", + nla_get_u32(kattrs[OVPN_A_KEYCONF_CIPHER_ALG])); + break; + } + } + + return NL_SKIP; +} + +static int ovpn_get_key(struct ovpn_ctx *ovpn) +{ + struct nlattr *keyconf; + struct nl_ctx *ctx; + int ret = -1; + + ctx = nl_ctx_alloc(ovpn, OVPN_CMD_KEY_GET); + if (!ctx) + return -ENOMEM; + + keyconf = nla_nest_start(ctx->nl_msg, OVPN_A_KEYCONF); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_KEYCONF_PEER_ID, ovpn->peer_id); + NLA_PUT_U32(ctx->nl_msg, 
OVPN_A_KEYCONF_SLOT, ovpn->key_slot); + nla_nest_end(ctx->nl_msg, keyconf); + + ret = ovpn_nl_msg_send(ctx, ovpn_handle_key); +nla_put_failure: + nl_ctx_free(ctx); + return ret; +} + +static int ovpn_swap_keys(struct ovpn_ctx *ovpn) +{ + struct nl_ctx *ctx; + struct nlattr *kc; + int ret = -1; + + ctx = nl_ctx_alloc(ovpn, OVPN_CMD_KEY_SWAP); + if (!ctx) + return -ENOMEM; + + kc = nla_nest_start(ctx->nl_msg, OVPN_A_KEYCONF); + NLA_PUT_U32(ctx->nl_msg, OVPN_A_KEYCONF_PEER_ID, ovpn->peer_id); + nla_nest_end(ctx->nl_msg, kc); + + ret = ovpn_nl_msg_send(ctx, NULL); +nla_put_failure: + nl_ctx_free(ctx); + return ret; +} + +/** + * Helper function used to easily add attributes to a rtnl message + */ +static int ovpn_addattr(struct nlmsghdr *n, int maxlen, int type, + const void *data, int alen) +{ + int len = RTA_LENGTH(alen); + struct rtattr *rta; + + if ((int)(NLMSG_ALIGN(n->nlmsg_len) + RTA_ALIGN(len)) > maxlen) { + fprintf(stderr, "%s: rtnl: message exceeded bound of %d\n", + __func__, maxlen); + return -EMSGSIZE; + } + + rta = nlmsg_tail(n); + rta->rta_type = type; + rta->rta_len = len; + + if (!data) + memset(RTA_DATA(rta), 0, alen); + else + memcpy(RTA_DATA(rta), data, alen); + + n->nlmsg_len = NLMSG_ALIGN(n->nlmsg_len) + RTA_ALIGN(len); + + return 0; +} + +static struct rtattr *ovpn_nest_start(struct nlmsghdr *msg, size_t max_size, + int attr) +{ + struct rtattr *nest = nlmsg_tail(msg); + + if (ovpn_addattr(msg, max_size, attr, NULL, 0) < 0) + return NULL; + + return nest; +} + +static void ovpn_nest_end(struct nlmsghdr *msg, struct rtattr *nest) +{ + nest->rta_len = (uint8_t *)nlmsg_tail(msg) - (uint8_t *)nest; +} + +#define RT_SNDBUF_SIZE (1024 * 2) +#define RT_RCVBUF_SIZE (1024 * 4) + +/** + * Open RTNL socket + */ +static int ovpn_rt_socket(void) +{ + int sndbuf = RT_SNDBUF_SIZE, rcvbuf = RT_RCVBUF_SIZE, fd; + + fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE); + if (fd < 0) { + fprintf(stderr, "%s: cannot open netlink socket\n", __func__); + return fd; + } + + if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, + sizeof(sndbuf)) < 0) { + fprintf(stderr, "%s: SO_SNDBUF\n", __func__); + close(fd); + return -1; + } + + if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, + sizeof(rcvbuf)) < 0) { + fprintf(stderr, "%s: SO_RCVBUF\n", __func__); + close(fd); + return -1; + } + + return fd; +} + +/** + * Bind socket to Netlink subsystem + */ +static int ovpn_rt_bind(int fd, uint32_t groups) +{ + struct sockaddr_nl local = { 0 }; + socklen_t addr_len; + + local.nl_family = AF_NETLINK; + local.nl_groups = groups; + + if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) { + fprintf(stderr, "%s: cannot bind netlink socket: %d\n", + __func__, errno); + return -errno; + } + + addr_len = sizeof(local); + if (getsockname(fd, (struct sockaddr *)&local, &addr_len) < 0) { + fprintf(stderr, "%s: cannot getsockname: %d\n", __func__, + errno); + return -errno; + } + + if (addr_len != sizeof(local)) { + fprintf(stderr, "%s: wrong address length %d\n", __func__, + addr_len); + return -EINVAL; + } + + if (local.nl_family != AF_NETLINK) { + fprintf(stderr, "%s: wrong address family %d\n", __func__, + local.nl_family); + return -EINVAL; + } + + return 0; +} + +typedef int (*ovpn_parse_reply_cb)(struct nlmsghdr *msg, void *arg); + +/** + * Send Netlink message and run callback on reply (if specified) + */ +static int ovpn_rt_send(struct nlmsghdr *payload, pid_t peer, + unsigned int groups, ovpn_parse_reply_cb cb, + void *arg_cb) +{ + int len, rem_len, fd, ret, rcv_len; + struct sockaddr_nl nladdr = { 0 }; 
+ struct nlmsgerr *err; + struct nlmsghdr *h; + char buf[1024 * 16]; + struct iovec iov = { + .iov_base = payload, + .iov_len = payload->nlmsg_len, + }; + struct msghdr nlmsg = { + .msg_name = &nladdr, + .msg_namelen = sizeof(nladdr), + .msg_iov = &iov, + .msg_iovlen = 1, + }; + + nladdr.nl_family = AF_NETLINK; + nladdr.nl_pid = peer; + nladdr.nl_groups = groups; + + payload->nlmsg_seq = time(NULL); + + /* no need to send reply */ + if (!cb) + payload->nlmsg_flags |= NLM_F_ACK; + + fd = ovpn_rt_socket(); + if (fd < 0) { + fprintf(stderr, "%s: can't open rtnl socket\n", __func__); + return -errno; + } + + ret = ovpn_rt_bind(fd, 0); + if (ret < 0) { + fprintf(stderr, "%s: can't bind rtnl socket\n", __func__); + ret = -errno; + goto out; + } + + ret = sendmsg(fd, &nlmsg, 0); + if (ret < 0) { + fprintf(stderr, "%s: rtnl: error on sendmsg()\n", __func__); + ret = -errno; + goto out; + } + + /* prepare buffer to store RTNL replies */ + memset(buf, 0, sizeof(buf)); + iov.iov_base = buf; + + while (1) { + /* + * iov_len is modified by recvmsg(), therefore has to be initialized before + * using it again + */ + iov.iov_len = sizeof(buf); + rcv_len = recvmsg(fd, &nlmsg, 0); + if (rcv_len < 0) { + if (errno == EINTR || errno == EAGAIN) { + fprintf(stderr, "%s: interrupted call\n", + __func__); + continue; + } + fprintf(stderr, "%s: rtnl: error on recvmsg()\n", + __func__); + ret = -errno; + goto out; + } + + if (rcv_len == 0) { + fprintf(stderr, + "%s: rtnl: socket reached unexpected EOF\n", + __func__); + ret = -EIO; + goto out; + } + + if (nlmsg.msg_namelen != sizeof(nladdr)) { + fprintf(stderr, + "%s: sender address length: %u (expected %zu)\n", + __func__, nlmsg.msg_namelen, sizeof(nladdr)); + ret = -EIO; + goto out; + } + + h = (struct nlmsghdr *)buf; + while (rcv_len >= (int)sizeof(*h)) { + len = h->nlmsg_len; + rem_len = len - sizeof(*h); + + if (rem_len < 0 || len > rcv_len) { + if (nlmsg.msg_flags & MSG_TRUNC) { + fprintf(stderr, "%s: truncated message\n", + __func__); + ret = -EIO; + goto out; + } + fprintf(stderr, "%s: malformed message: len=%d\n", + __func__, len); + ret = -EIO; + goto out; + } + + if (h->nlmsg_type == NLMSG_DONE) { + ret = 0; + goto out; + } + + if (h->nlmsg_type == NLMSG_ERROR) { + err = (struct nlmsgerr *)NLMSG_DATA(h); + if (rem_len < (int)sizeof(struct nlmsgerr)) { + fprintf(stderr, "%s: ERROR truncated\n", + __func__); + ret = -EIO; + goto out; + } + + if (err->error) { + fprintf(stderr, "%s: (%d) %s\n", + __func__, err->error, + strerror(-err->error)); + ret = err->error; + goto out; + } + + ret = 0; + if (cb) { + int r = cb(h, arg_cb); + + if (r <= 0) + ret = r; + } + goto out; + } + + if (cb) { + int r = cb(h, arg_cb); + + if (r <= 0) { + ret = r; + goto out; + } + } else { + fprintf(stderr, "%s: RTNL: unexpected reply\n", + __func__); + } + + rcv_len -= NLMSG_ALIGN(len); + h = (struct nlmsghdr *)((uint8_t *)h + + NLMSG_ALIGN(len)); + } + + if (nlmsg.msg_flags & MSG_TRUNC) { + fprintf(stderr, "%s: message truncated\n", __func__); + continue; + } + + if (rcv_len) { + fprintf(stderr, "%s: rtnl: %d not parsed bytes\n", + __func__, rcv_len); + ret = -1; + goto out; + } + } +out: + close(fd); + + return ret; +} + +struct ovpn_link_req { + struct nlmsghdr n; + struct ifinfomsg i; + char buf[256]; +}; + +static int ovpn_new_iface(struct ovpn_ctx *ovpn) +{ + struct rtattr *linkinfo, *data; + struct ovpn_link_req req = { 0 }; + int ret = -1; + + fprintf(stdout, "Creating interface %s with mode %u\n", ovpn->ifname, + ovpn->mode); + + req.n.nlmsg_len = 
NLMSG_LENGTH(sizeof(req.i)); + req.n.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL; + req.n.nlmsg_type = RTM_NEWLINK; + + if (ovpn_addattr(&req.n, sizeof(req), IFLA_IFNAME, ovpn->ifname, + strlen(ovpn->ifname) + 1) < 0) + goto err; + + linkinfo = ovpn_nest_start(&req.n, sizeof(req), IFLA_LINKINFO); + if (!linkinfo) + goto err; + + if (ovpn_addattr(&req.n, sizeof(req), IFLA_INFO_KIND, OVPN_FAMILY_NAME, + strlen(OVPN_FAMILY_NAME) + 1) < 0) + goto err; + + if (ovpn->mode_set) { + data = ovpn_nest_start(&req.n, sizeof(req), IFLA_INFO_DATA); + if (!data) + goto err; + + if (ovpn_addattr(&req.n, sizeof(req), IFLA_OVPN_MODE, + &ovpn->mode, sizeof(uint8_t)) < 0) + goto err; + + ovpn_nest_end(&req.n, data); + } + + ovpn_nest_end(&req.n, linkinfo); + + req.i.ifi_family = AF_PACKET; + + ret = ovpn_rt_send(&req.n, 0, 0, NULL, NULL); +err: + return ret; +} + +static int ovpn_del_iface(struct ovpn_ctx *ovpn) +{ + struct ovpn_link_req req = { 0 }; + + fprintf(stdout, "Deleting interface %s ifindex %u\n", ovpn->ifname, + ovpn->ifindex); + + req.n.nlmsg_len = NLMSG_LENGTH(sizeof(req.i)); + req.n.nlmsg_flags = NLM_F_REQUEST; + req.n.nlmsg_type = RTM_DELLINK; + + req.i.ifi_family = AF_PACKET; + req.i.ifi_index = ovpn->ifindex; + + return ovpn_rt_send(&req.n, 0, 0, NULL, NULL); +} + +static int nl_seq_check(struct nl_msg (*msg)__always_unused, + void (*arg)__always_unused) +{ + return NL_OK; +} + +struct mcast_handler_args { + const char *group; + int id; +}; + +static int mcast_family_handler(struct nl_msg *msg, void *arg) +{ + struct mcast_handler_args *grp = arg; + struct nlattr *tb[CTRL_ATTR_MAX + 1]; + struct genlmsghdr *gnlh = nlmsg_data(nlmsg_hdr(msg)); + struct nlattr *mcgrp; + int rem_mcgrp; + + nla_parse(tb, CTRL_ATTR_MAX, genlmsg_attrdata(gnlh, 0), + genlmsg_attrlen(gnlh, 0), NULL); + + if (!tb[CTRL_ATTR_MCAST_GROUPS]) + return NL_SKIP; + + nla_for_each_nested(mcgrp, tb[CTRL_ATTR_MCAST_GROUPS], rem_mcgrp) { + struct nlattr *tb_mcgrp[CTRL_ATTR_MCAST_GRP_MAX + 1]; + + nla_parse(tb_mcgrp, CTRL_ATTR_MCAST_GRP_MAX, + nla_data(mcgrp), nla_len(mcgrp), NULL); + + if (!tb_mcgrp[CTRL_ATTR_MCAST_GRP_NAME] || + !tb_mcgrp[CTRL_ATTR_MCAST_GRP_ID]) + continue; + if (strncmp(nla_data(tb_mcgrp[CTRL_ATTR_MCAST_GRP_NAME]), + grp->group, nla_len(tb_mcgrp[CTRL_ATTR_MCAST_GRP_NAME]))) + continue; + grp->id = nla_get_u32(tb_mcgrp[CTRL_ATTR_MCAST_GRP_ID]); + break; + } + + return NL_SKIP; +} + +static int mcast_error_handler(struct sockaddr_nl (*nla)__always_unused, + struct nlmsgerr *err, void *arg) +{ + int *ret = arg; + + *ret = err->error; + return NL_STOP; +} + +static int mcast_ack_handler(struct nl_msg (*msg)__always_unused, void *arg) +{ + int *ret = arg; + + *ret = 0; + return NL_STOP; +} + +static int ovpn_handle_msg(struct nl_msg *msg, void *arg) +{ + struct genlmsghdr *gnlh = nlmsg_data(nlmsg_hdr(msg)); + struct nlattr *attrs[OVPN_A_MAX + 1]; + struct nlmsghdr *nlh = nlmsg_hdr(msg); + //enum ovpn_del_peer_reason reason; + char ifname[IF_NAMESIZE]; + int *ret = arg; + __u32 ifindex; + + fprintf(stderr, "received message from ovpn-dco\n"); + + *ret = -1; + + if (!genlmsg_valid_hdr(nlh, 0)) { + fprintf(stderr, "invalid header\n"); + return NL_STOP; + } + + if (nla_parse(attrs, OVPN_A_MAX, genlmsg_attrdata(gnlh, 0), + genlmsg_attrlen(gnlh, 0), NULL)) { + fprintf(stderr, "received bogus data from ovpn-dco\n"); + return NL_STOP; + } + + if (!attrs[OVPN_A_IFINDEX]) { + fprintf(stderr, "no ifindex in this message\n"); + return NL_STOP; + } + + ifindex = nla_get_u32(attrs[OVPN_A_IFINDEX]); + if 
(!if_indextoname(ifindex, ifname)) { + fprintf(stderr, "cannot resolve ifname for ifindex: %u\n", + ifindex); + return NL_STOP; + } + + switch (gnlh->cmd) { + case OVPN_CMD_PEER_DEL_NTF: + /*if (!attrs[OVPN_A_DEL_PEER_REASON]) { + * fprintf(stderr, "no reason in DEL_PEER message\n"); + * return NL_STOP; + *} + * + *reason = nla_get_u8(attrs[OVPN_A_DEL_PEER_REASON]); + *fprintf(stderr, + * "received CMD_DEL_PEER, ifname: %s reason: %d\n", + * ifname, reason); + */ + fprintf(stdout, "received CMD_PEER_DEL_NTF\n"); + break; + case OVPN_CMD_KEY_SWAP_NTF: + fprintf(stdout, "received CMD_KEY_SWAP_NTF\n"); + break; + default: + fprintf(stderr, "received unknown command: %d\n", gnlh->cmd); + return NL_STOP; + } + + *ret = 0; + return NL_OK; +} + +static int ovpn_get_mcast_id(struct nl_sock *sock, const char *family, + const char *group) +{ + struct nl_msg *msg; + struct nl_cb *cb; + int ret, ctrlid; + struct mcast_handler_args grp = { + .group = group, + .id = -ENOENT, + }; + + msg = nlmsg_alloc(); + if (!msg) + return -ENOMEM; + + cb = nl_cb_alloc(NL_CB_DEFAULT); + if (!cb) { + ret = -ENOMEM; + goto out_fail_cb; + } + + ctrlid = genl_ctrl_resolve(sock, "nlctrl"); + + genlmsg_put(msg, 0, 0, ctrlid, 0, 0, CTRL_CMD_GETFAMILY, 0); + + ret = -ENOBUFS; + NLA_PUT_STRING(msg, CTRL_ATTR_FAMILY_NAME, family); + + ret = nl_send_auto_complete(sock, msg); + if (ret < 0) + goto nla_put_failure; + + ret = 1; + + nl_cb_err(cb, NL_CB_CUSTOM, mcast_error_handler, &ret); + nl_cb_set(cb, NL_CB_ACK, NL_CB_CUSTOM, mcast_ack_handler, &ret); + nl_cb_set(cb, NL_CB_VALID, NL_CB_CUSTOM, mcast_family_handler, &grp); + + while (ret > 0) + nl_recvmsgs(sock, cb); + + if (ret == 0) + ret = grp.id; + nla_put_failure: + nl_cb_put(cb); + out_fail_cb: + nlmsg_free(msg); + return ret; +} + +static int ovpn_listen_mcast(void) +{ + struct nl_sock *sock; + struct nl_cb *cb; + int mcid, ret; + + sock = nl_socket_alloc(); + if (!sock) { + fprintf(stderr, "cannot allocate netlink socket\n"); + goto err_free; + } + + nl_socket_set_buffer_size(sock, 8192, 8192); + + ret = genl_connect(sock); + if (ret < 0) { + fprintf(stderr, "cannot connect to generic netlink: %s\n", + nl_geterror(ret)); + goto err_free; + } + + mcid = ovpn_get_mcast_id(sock, OVPN_FAMILY_NAME, OVPN_MCGRP_PEERS); + if (mcid < 0) { + fprintf(stderr, "cannot get mcast group: %s\n", + nl_geterror(mcid)); + goto err_free; + } + + ret = nl_socket_add_membership(sock, mcid); + if (ret) { + fprintf(stderr, "failed to join mcast group: %d\n", ret); + goto err_free; + } + + ret = 1; + cb = nl_cb_alloc(NL_CB_DEFAULT); + nl_cb_set(cb, NL_CB_SEQ_CHECK, NL_CB_CUSTOM, nl_seq_check, NULL); + nl_cb_set(cb, NL_CB_VALID, NL_CB_CUSTOM, ovpn_handle_msg, &ret); + nl_cb_err(cb, NL_CB_CUSTOM, ovpn_nl_cb_error, &ret); + + while (ret == 1) { + int err = nl_recvmsgs(sock, cb); + + if (err < 0) { + fprintf(stderr, + "cannot receive netlink message: (%d) %s\n", + err, nl_geterror(-err)); + ret = -1; + break; + } + } + + nl_cb_put(cb); +err_free: + nl_socket_free(sock); + return ret; +} + +static void usage(const char *cmd) +{ + fprintf(stderr, + "Usage %s <command> <iface> [arguments..]\n", + cmd); + fprintf(stderr, "where <command> can be one of the following\n\n"); + + fprintf(stderr, "* new_iface <iface> [mode]: create new ovpn interface\n"); + fprintf(stderr, "\tiface: ovpn interface name\n"); + fprintf(stderr, "\tmode:\n"); + fprintf(stderr, "\t\t- P2P for peer-to-peer mode (i.e. client)\n"); + fprintf(stderr, "\t\t- MP for multi-peer mode (i.e. 
server)\n"); + + fprintf(stderr, "* del_iface <iface>: delete ovpn interface\n"); + fprintf(stderr, "\tiface: ovpn interface name\n"); + + fprintf(stderr, + "* listen <iface> <lport> <peers_file> [ipv6]: listen for incoming peer TCP connections\n"); + fprintf(stderr, "\tiface: ovpn interface name\n"); + fprintf(stderr, "\tlport: TCP port to listen to\n"); + fprintf(stderr, + "\tpeers_file: file containing one peer per line: Line format:\n"); + fprintf(stderr, "\t\t<peer_id> <vpnaddr>\n"); + fprintf(stderr, + "\tipv6: whether the socket should listen to the IPv6 wildcard address\n"); + + fprintf(stderr, + "* connect <iface> <peer_id> <raddr> <rport> [key_file]: start connecting peer of TCP-based VPN session\n"); + fprintf(stderr, "\tiface: ovpn interface name\n"); + fprintf(stderr, "\tpeer_id: peer ID of the connecting peer\n"); + fprintf(stderr, "\traddr: peer IP address to connect to\n"); + fprintf(stderr, "\trport: peer TCP port to connect to\n"); + fprintf(stderr, + "\tkey_file: file containing the symmetric key for encryption\n"); + + fprintf(stderr, + "* new_peer <iface> <peer_id> <lport> <raddr> <rport> [vpnaddr]: add new peer\n"); + fprintf(stderr, "\tiface: ovpn interface name\n"); + fprintf(stderr, "\tlport: local UDP port to bind to\n"); + fprintf(stderr, + "\tpeer_id: peer ID to be used in data packets to/from this peer\n"); + fprintf(stderr, "\traddr: peer IP address\n"); + fprintf(stderr, "\trport: peer UDP port\n"); + fprintf(stderr, "\tvpnaddr: peer VPN IP\n"); + + fprintf(stderr, + "* new_multi_peer <iface> <lport> <peers_file>: add multiple peers as listed in the file\n"); + fprintf(stderr, "\tiface: ovpn interface name\n"); + fprintf(stderr, "\tlport: local UDP port to bind to\n"); + fprintf(stderr, + "\tpeers_file: text file containing one peer per line. Line format:\n"); + fprintf(stderr, "\t\t<peer_id> <raddr> <rport> <vpnaddr>\n"); + + fprintf(stderr, + "* set_peer <iface> <peer_id> <keepalive_interval> <keepalive_timeout>: set peer attributes\n"); + fprintf(stderr, "\tiface: ovpn interface name\n"); + fprintf(stderr, "\tpeer_id: peer ID of the peer to modify\n"); + fprintf(stderr, + "\tkeepalive_interval: interval for sending ping messages\n"); + fprintf(stderr, + "\tkeepalive_timeout: time after which a peer is timed out\n"); + + fprintf(stderr, "* del_peer <iface> <peer_id>: delete peer\n"); + fprintf(stderr, "\tiface: ovpn interface name\n"); + fprintf(stderr, "\tpeer_id: peer ID of the peer to delete\n"); + + fprintf(stderr, "* get_peer <iface> [peer_id]: retrieve peer(s) status\n"); + fprintf(stderr, "\tiface: ovpn interface name\n"); + fprintf(stderr, + "\tpeer_id: peer ID of the peer to query. 
All peers are returned if omitted\n"); + + fprintf(stderr, + "* new_key <iface> <peer_id> <slot> <key_id> <cipher> <key_dir> <key_file>: set data channel key\n"); + fprintf(stderr, "\tiface: ovpn interface name\n"); + fprintf(stderr, + "\tpeer_id: peer ID of the peer to configure the key for\n"); + fprintf(stderr, "\tslot: either 1 (primary) or 2 (secondary)\n"); + fprintf(stderr, "\tkey_id: an ID from 0 to 7\n"); + fprintf(stderr, + "\tcipher: cipher to use, supported: aes (AES-GCM), chachapoly (CHACHA20POLY1305)\n"); + fprintf(stderr, + "\tkey_dir: key direction, must 0 on one host and 1 on the other\n"); + fprintf(stderr, "\tkey_file: file containing the pre-shared key\n"); + + fprintf(stderr, + "* del_key <iface> <peer_id> [slot]: erase existing data channel key\n"); + fprintf(stderr, "\tiface: ovpn interface name\n"); + fprintf(stderr, "\tpeer_id: peer ID of the peer to modify\n"); + fprintf(stderr, "\tslot: slot to erase. PRIMARY if omitted\n"); + + fprintf(stderr, + "* get_key <iface> <peer_id> <slot>: retrieve non sensible key data\n"); + fprintf(stderr, "\tiface: ovpn interface name\n"); + fprintf(stderr, "\tpeer_id: peer ID of the peer to query\n"); + fprintf(stderr, "\tslot: either 1 (primary) or 2 (secondary)\n"); + + fprintf(stderr, + "* swap_keys <iface> <peer_id>: swap content of primary and secondary key slots\n"); + fprintf(stderr, "\tiface: ovpn interface name\n"); + fprintf(stderr, "\tpeer_id: peer ID of the peer to modify\n"); + + fprintf(stderr, + "* listen_mcast: listen to ovpn netlink multicast messages\n"); +} + +static int ovpn_parse_remote(struct ovpn_ctx *ovpn, const char *host, + const char *service, const char *vpnip) +{ + int ret; + struct addrinfo *result; + struct addrinfo hints = { + .ai_family = ovpn->sa_family, + .ai_socktype = SOCK_DGRAM, + .ai_protocol = IPPROTO_UDP + }; + + if (host) { + ret = getaddrinfo(host, service, &hints, &result); + if (ret == EAI_NONAME || ret == EAI_FAIL) + return -1; + + if (!(result->ai_family == AF_INET && + result->ai_addrlen == sizeof(struct sockaddr_in)) && + !(result->ai_family == AF_INET6 && + result->ai_addrlen == sizeof(struct sockaddr_in6))) { + ret = -EINVAL; + goto out; + } + + memcpy(&ovpn->remote, result->ai_addr, result->ai_addrlen); + } + + if (vpnip) { + ret = getaddrinfo(vpnip, NULL, &hints, &result); + if (ret == EAI_NONAME || ret == EAI_FAIL) + return -1; + + if (!(result->ai_family == AF_INET && + result->ai_addrlen == sizeof(struct sockaddr_in)) && + !(result->ai_family == AF_INET6 && + result->ai_addrlen == sizeof(struct sockaddr_in6))) { + ret = -EINVAL; + goto out; + } + + memcpy(&ovpn->peer_ip, result->ai_addr, result->ai_addrlen); + ovpn->sa_family = result->ai_family; + + ovpn->peer_ip_set = true; + } + + ret = 0; +out: + freeaddrinfo(result); + return ret; +} + +static int ovpn_parse_new_peer(struct ovpn_ctx *ovpn, const char *peer_id, + const char *raddr, const char *rport, + const char *vpnip) +{ + ovpn->peer_id = strtoul(peer_id, NULL, 10); + if (errno == ERANGE || ovpn->peer_id > PEER_ID_UNDEF) { + fprintf(stderr, "peer ID value out of range\n"); + return -1; + } + + return ovpn_parse_remote(ovpn, raddr, rport, vpnip); +} + +static int ovpn_parse_key_slot(const char *arg, struct ovpn_ctx *ovpn) +{ + int slot = strtoul(arg, NULL, 10); + + if (errno == ERANGE || slot < 1 || slot > 2) { + fprintf(stderr, "key slot out of range\n"); + return -1; + } + + switch (slot) { + case 1: + ovpn->key_slot = OVPN_KEY_SLOT_PRIMARY; + break; + case 2: + ovpn->key_slot = OVPN_KEY_SLOT_SECONDARY; + break; + } + 
+ return 0; +} + +static int ovpn_send_tcp_data(int socket) +{ + uint16_t len = htons(1000); + uint8_t buf[1002]; + int ret; + + memcpy(buf, &len, sizeof(len)); + memset(buf + sizeof(len), 0x86, sizeof(buf) - sizeof(len)); + + ret = send(socket, buf, sizeof(buf), 0); + + fprintf(stdout, "Sent %u bytes over TCP socket\n", ret); + + return ret > 0 ? 0 : ret; +} + +static int ovpn_recv_tcp_data(int socket) +{ + uint8_t buf[1002]; + uint16_t len; + int ret; + + ret = recv(socket, buf, sizeof(buf), 0); + + if (ret < 2) { + fprintf(stderr, ">>>> Error while reading TCP data: %d\n", ret); + return ret; + } + + memcpy(&len, buf, sizeof(len)); + len = ntohs(len); + + fprintf(stdout, ">>>> Received %u bytes over TCP socket, header: %u\n", + ret, len); + +/* int i; + * for (i = 2; i < ret; i++) { + * fprintf(stdout, "0x%.2x ", buf[i]); + * if (i && !((i - 2) % 16)) + * fprintf(stdout, "\n"); + * } + * fprintf(stdout, "\n"); + */ + return 0; +} + +static enum ovpn_cmd ovpn_parse_cmd(const char *cmd) +{ + if (!strcmp(cmd, "new_iface")) + return CMD_NEW_IFACE; + + if (!strcmp(cmd, "del_iface")) + return CMD_DEL_IFACE; + + if (!strcmp(cmd, "listen")) + return CMD_LISTEN; + + if (!strcmp(cmd, "connect")) + return CMD_CONNECT; + + if (!strcmp(cmd, "new_peer")) + return CMD_NEW_PEER; + + if (!strcmp(cmd, "new_multi_peer")) + return CMD_NEW_MULTI_PEER; + + if (!strcmp(cmd, "set_peer")) + return CMD_SET_PEER; + + if (!strcmp(cmd, "del_peer")) + return CMD_DEL_PEER; + + if (!strcmp(cmd, "get_peer")) + return CMD_GET_PEER; + + if (!strcmp(cmd, "new_key")) + return CMD_NEW_KEY; + + if (!strcmp(cmd, "del_key")) + return CMD_DEL_KEY; + + if (!strcmp(cmd, "get_key")) + return CMD_GET_KEY; + + if (!strcmp(cmd, "swap_keys")) + return CMD_SWAP_KEYS; + + if (!strcmp(cmd, "listen_mcast")) + return CMD_LISTEN_MCAST; + + return CMD_INVALID; +} + +static int ovpn_run_cmd(struct ovpn_ctx *ovpn) +{ + char peer_id[10], vpnip[INET6_ADDRSTRLEN], raddr[128], rport[10]; + int n, ret; + FILE *fp; + + switch (ovpn->cmd) { + case CMD_NEW_IFACE: + ret = ovpn_new_iface(ovpn); + break; + case CMD_DEL_IFACE: + ret = ovpn_del_iface(ovpn); + break; + case CMD_LISTEN: + ret = ovpn_listen(ovpn, ovpn->sa_family); + if (ret < 0) { + fprintf(stderr, "cannot listen on TCP socket\n"); + return ret; + } + + fp = fopen(ovpn->peers_file, "r"); + if (!fp) { + fprintf(stderr, "cannot open file: %s\n", + ovpn->peers_file); + return -1; + } + + while ((n = fscanf(fp, "%s %s\n", peer_id, vpnip)) == 2) { + struct ovpn_ctx peer_ctx = { 0 }; + + peer_ctx.ifindex = ovpn->ifindex; + peer_ctx.sa_family = ovpn->sa_family; + + peer_ctx.socket = ovpn_accept(ovpn); + if (peer_ctx.socket < 0) { + fprintf(stderr, "cannot accept connection!\n"); + return -1; + } + + /* store the socket of the first peer to test TCP I/O */ + if (ovpn->cli_socket < 0) + ovpn->cli_socket = peer_ctx.socket; + + ret = ovpn_parse_new_peer(&peer_ctx, peer_id, NULL, + NULL, vpnip); + if (ret < 0) { + fprintf(stderr, "error while parsing line\n"); + return -1; + } + + ret = ovpn_new_peer(&peer_ctx, true); + if (ret < 0) { + fprintf(stderr, + "cannot add peer to VPN: %s %s\n", + peer_id, vpnip); + return ret; + } + } + + if (ovpn->cli_socket >= 0) + ret = ovpn_recv_tcp_data(ovpn->cli_socket); + + break; + case CMD_CONNECT: + ret = ovpn_connect(ovpn); + if (ret < 0) { + fprintf(stderr, "cannot connect TCP socket\n"); + return ret; + } + + ret = ovpn_new_peer(ovpn, true); + if (ret < 0) { + fprintf(stderr, "cannot add peer to VPN\n"); + close(ovpn->socket); + return ret; + } + + if 
(ovpn->cipher != OVPN_CIPHER_ALG_NONE) { + ret = ovpn_new_key(ovpn); + if (ret < 0) { + fprintf(stderr, "cannot set key\n"); + return ret; + } + } + + ret = ovpn_send_tcp_data(ovpn->socket); + break; + case CMD_NEW_PEER: + ret = ovpn_udp_socket(ovpn, AF_INET6); //ovpn->sa_family ? + if (ret < 0) + return ret; + + ret = ovpn_new_peer(ovpn, false); + break; + case CMD_NEW_MULTI_PEER: + ret = ovpn_udp_socket(ovpn, AF_INET6); + if (ret < 0) + return ret; + + fp = fopen(ovpn->peers_file, "r"); + if (!fp) { + fprintf(stderr, "cannot open file: %s\n", + ovpn->peers_file); + return -1; + } + + while ((n = fscanf(fp, "%s %s %s %s\n", peer_id, raddr, rport, + vpnip)) == 4) { + struct ovpn_ctx peer_ctx = { 0 }; + + peer_ctx.ifindex = ovpn->ifindex; + peer_ctx.socket = ovpn->socket; + peer_ctx.sa_family = AF_UNSPEC; + + ret = ovpn_parse_new_peer(&peer_ctx, peer_id, raddr, + rport, vpnip); + if (ret < 0) { + fprintf(stderr, "error while parsing line\n"); + return -1; + } + + ret = ovpn_new_peer(&peer_ctx, false); + if (ret < 0) { + fprintf(stderr, + "cannot add peer to VPN: %s %s %s %s\n", + peer_id, raddr, rport, vpnip); + return ret; + } + } + break; + case CMD_SET_PEER: + ret = ovpn_set_peer(ovpn); + break; + case CMD_DEL_PEER: + ret = ovpn_del_peer(ovpn); + break; + case CMD_GET_PEER: + if (ovpn->peer_id == PEER_ID_UNDEF) + fprintf(stderr, "List of peers connected to: %s\n", + ovpn->ifname); + + ret = ovpn_get_peer(ovpn); + break; + case CMD_NEW_KEY: + ret = ovpn_new_key(ovpn); + break; + case CMD_DEL_KEY: + ret = ovpn_del_key(ovpn); + break; + case CMD_GET_KEY: + ret = ovpn_get_key(ovpn); + break; + case CMD_SWAP_KEYS: + ret = ovpn_swap_keys(ovpn); + break; + case CMD_LISTEN_MCAST: + ret = ovpn_listen_mcast(); + break; + case CMD_INVALID: + break; + } + + return ret; +} + +static int ovpn_parse_cmd_args(struct ovpn_ctx *ovpn, int argc, char *argv[]) +{ + int ret; + + /* no args required for LISTEN_MCAST */ + if (ovpn->cmd == CMD_LISTEN_MCAST) + return 0; + + /* all commands need an ifname */ + if (argc < 3) + return -EINVAL; + + strscpy(ovpn->ifname, argv[2], IFNAMSIZ - 1); + ovpn->ifname[IFNAMSIZ - 1] = '\0'; + + /* all commands, except NEW_IFNAME, needs an ifindex */ + if (ovpn->cmd != CMD_NEW_IFACE) { + ovpn->ifindex = if_nametoindex(ovpn->ifname); + if (!ovpn->ifindex) { + fprintf(stderr, "cannot find interface: %s\n", + strerror(errno)); + return -1; + } + } + + switch (ovpn->cmd) { + case CMD_NEW_IFACE: + if (argc < 4) + break; + + if (!strcmp(argv[3], "P2P")) { + ovpn->mode = OVPN_MODE_P2P; + } else if (!strcmp(argv[3], "MP")) { + ovpn->mode = OVPN_MODE_MP; + } else { + fprintf(stderr, "Cannot parse iface mode: %s\n", + argv[3]); + return -1; + } + ovpn->mode_set = true; + break; + case CMD_DEL_IFACE: + break; + case CMD_LISTEN: + if (argc < 5) + return -EINVAL; + + ovpn->lport = strtoul(argv[3], NULL, 10); + if (errno == ERANGE || ovpn->lport > 65535) { + fprintf(stderr, "lport value out of range\n"); + return -1; + } + + ovpn->peers_file = argv[4]; + + if (argc > 5 && !strcmp(argv[5], "ipv6")) + ovpn->sa_family = AF_INET6; + break; + case CMD_CONNECT: + if (argc < 6) + return -EINVAL; + + ovpn->sa_family = AF_INET; + + ret = ovpn_parse_new_peer(ovpn, argv[3], argv[4], argv[5], + NULL); + if (ret < 0) { + fprintf(stderr, "Cannot parse remote peer data\n"); + return -1; + } + + if (argc > 6) { + ovpn->key_slot = OVPN_KEY_SLOT_PRIMARY; + ovpn->key_id = 0; + ovpn->cipher = OVPN_CIPHER_ALG_AES_GCM; + ovpn->key_dir = KEY_DIR_OUT; + + ret = ovpn_parse_key(argv[6], ovpn); + if (ret) + return -1; 
+ } + break; + case CMD_NEW_PEER: + if (argc < 7) + return -EINVAL; + + ovpn->lport = strtoul(argv[4], NULL, 10); + if (errno == ERANGE || ovpn->lport > 65535) { + fprintf(stderr, "lport value out of range\n"); + return -1; + } + + const char *vpnip = (argc > 7) ? argv[7] : NULL; + + ret = ovpn_parse_new_peer(ovpn, argv[3], argv[5], argv[6], + vpnip); + if (ret < 0) + return -1; + break; + case CMD_NEW_MULTI_PEER: + if (argc < 5) + return -EINVAL; + + ovpn->lport = strtoul(argv[3], NULL, 10); + if (errno == ERANGE || ovpn->lport > 65535) { + fprintf(stderr, "lport value out of range\n"); + return -1; + } + + ovpn->peers_file = argv[4]; + break; + case CMD_SET_PEER: + if (argc < 6) + return -EINVAL; + + ovpn->peer_id = strtoul(argv[3], NULL, 10); + if (errno == ERANGE || ovpn->peer_id > PEER_ID_UNDEF) { + fprintf(stderr, "peer ID value out of range\n"); + return -1; + } + + ovpn->keepalive_interval = strtoul(argv[4], NULL, 10); + if (errno == ERANGE) { + fprintf(stderr, + "keepalive interval value out of range\n"); + return -1; + } + + ovpn->keepalive_timeout = strtoul(argv[5], NULL, 10); + if (errno == ERANGE) { + fprintf(stderr, + "keepalive interval value out of range\n"); + return -1; + } + break; + case CMD_DEL_PEER: + if (argc < 4) + return -EINVAL; + + ovpn->peer_id = strtoul(argv[3], NULL, 10); + if (errno == ERANGE || ovpn->peer_id > PEER_ID_UNDEF) { + fprintf(stderr, "peer ID value out of range\n"); + return -1; + } + break; + case CMD_GET_PEER: + ovpn->peer_id = PEER_ID_UNDEF; + if (argc > 3) { + ovpn->peer_id = strtoul(argv[3], NULL, 10); + if (errno == ERANGE || ovpn->peer_id > PEER_ID_UNDEF) { + fprintf(stderr, "peer ID value out of range\n"); + return -1; + } + } + break; + case CMD_NEW_KEY: + if (argc < 9) + return -EINVAL; + + ovpn->peer_id = strtoul(argv[3], NULL, 10); + if (errno == ERANGE) { + fprintf(stderr, "peer ID value out of range\n"); + return -1; + } + + ret = ovpn_parse_key_slot(argv[4], ovpn); + if (ret) + return -1; + + ovpn->key_id = strtoul(argv[5], NULL, 10); + if (errno == ERANGE || ovpn->key_id > 2) { + fprintf(stderr, "key ID out of range\n"); + return -1; + } + + ret = ovpn_parse_cipher(argv[6], ovpn); + if (ret < 0) + return -1; + + ret = ovpn_parse_key_direction(argv[7], ovpn); + if (ret < 0) + return -1; + + ret = ovpn_parse_key(argv[8], ovpn); + if (ret) + return -1; + break; + case CMD_DEL_KEY: + if (argc < 4) + return -EINVAL; + + ovpn->peer_id = strtoul(argv[3], NULL, 10); + if (errno == ERANGE) { + fprintf(stderr, "peer ID value out of range\n"); + return -1; + } + + ret = ovpn_parse_key_slot(argv[4], ovpn); + if (ret) + return ret; + break; + case CMD_GET_KEY: + if (argc < 5) + return -EINVAL; + + ovpn->peer_id = strtoul(argv[3], NULL, 10); + if (errno == ERANGE) { + fprintf(stderr, "peer ID value out of range\n"); + return -1; + } + + ret = ovpn_parse_key_slot(argv[4], ovpn); + if (ret) + return ret; + break; + case CMD_SWAP_KEYS: + if (argc < 4) + return -EINVAL; + + ovpn->peer_id = strtoul(argv[3], NULL, 10); + if (errno == ERANGE) { + fprintf(stderr, "peer ID value out of range\n"); + return -1; + } + break; + case CMD_LISTEN_MCAST: + break; + case CMD_INVALID: + break; + } + + return 0; +} + +int main(int argc, char *argv[]) +{ + struct ovpn_ctx ovpn; + int ret; + + if (argc < 2) { + usage(argv[0]); + return -1; + } + + memset(&ovpn, 0, sizeof(ovpn)); + ovpn.sa_family = AF_INET; + ovpn.cipher = OVPN_CIPHER_ALG_NONE; + ovpn.cli_socket = -1; + + ovpn.cmd = ovpn_parse_cmd(argv[1]); + if (ovpn.cmd == CMD_INVALID) { + fprintf(stderr, "Error: 
unknown command.\n\n"); + usage(argv[0]); + return -1; + } + + ret = ovpn_parse_cmd_args(&ovpn, argc, argv); + if (ret < 0) { + fprintf(stderr, "Error: invalid arguments.\n\n"); + if (ret == -EINVAL) + usage(argv[0]); + return ret; + } + + ret = ovpn_run_cmd(&ovpn); + if (ret) + fprintf(stderr, "Cannot execute command: %s (%d)\n", + strerror(-ret), ret); + + return ret; +} diff --git a/tools/testing/selftests/net/ovpn/tcp_peers.txt b/tools/testing/selftests/net/ovpn/tcp_peers.txt new file mode 100644 index 0000000000000000000000000000000000000000..d753eebe8716ed3588334ad766981e883ed2469a --- /dev/null +++ b/tools/testing/selftests/net/ovpn/tcp_peers.txt @@ -0,0 +1,5 @@ +1 5.5.5.2 +2 5.5.5.3 +3 5.5.5.4 +4 5.5.5.5 +5 5.5.5.6 diff --git a/tools/testing/selftests/net/ovpn/test-chachapoly.sh b/tools/testing/selftests/net/ovpn/test-chachapoly.sh new file mode 100755 index 0000000000000000000000000000000000000000..79788f10d33b9682ed27590a48d136eb50b2202c --- /dev/null +++ b/tools/testing/selftests/net/ovpn/test-chachapoly.sh @@ -0,0 +1,9 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) 2024 OpenVPN, Inc. +# +# Author: Antonio Quartulli antonio@openvpn.net + +ALG="chachapoly" + +source test.sh diff --git a/tools/testing/selftests/net/ovpn/test-float.sh b/tools/testing/selftests/net/ovpn/test-float.sh new file mode 100755 index 0000000000000000000000000000000000000000..93e1b729861d6b3f9f3f2e19d84e524c293ee3cf --- /dev/null +++ b/tools/testing/selftests/net/ovpn/test-float.sh @@ -0,0 +1,9 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) 2024 OpenVPN, Inc. +# +# Author: Antonio Quartulli antonio@openvpn.net + +FLOAT="1" + +source test.sh diff --git a/tools/testing/selftests/net/ovpn/test-tcp.sh b/tools/testing/selftests/net/ovpn/test-tcp.sh new file mode 100755 index 0000000000000000000000000000000000000000..7542f595cc5696396513ed029cb96fe3b922d0e4 --- /dev/null +++ b/tools/testing/selftests/net/ovpn/test-tcp.sh @@ -0,0 +1,9 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) 2024 OpenVPN, Inc. +# +# Author: Antonio Quartulli antonio@openvpn.net + +PROTO="TCP" + +source test.sh diff --git a/tools/testing/selftests/net/ovpn/test.sh b/tools/testing/selftests/net/ovpn/test.sh new file mode 100755 index 0000000000000000000000000000000000000000..07f3a82df8f3cb8e4d18cc4cbbee3bd6880396b0 --- /dev/null +++ b/tools/testing/selftests/net/ovpn/test.sh @@ -0,0 +1,183 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) 2020-2024 OpenVPN, Inc. 
+# +# Author: Antonio Quartulli antonio@openvpn.net + +#set -x +set -e + +UDP_PEERS_FILE=${UDP_PEERS_FILE:-udp_peers.txt} +TCP_PEERS_FILE=${TCP_PEERS_FILE:-tcp_peers.txt} +OVPN_CLI=${OVPN_CLI:-./ovpn-cli} +ALG=${ALG:-aes} +PROTO=${PROTO:-UDP} +FLOAT=${FLOAT:-0} + +create_ns() { + ip netns add peer${1} +} + +setup_ns() { + MODE="P2P" + + if [ ${1} -eq 0 ]; then + MODE="MP" + for p in $(seq 1 ${NUM_PEERS}); do + ip link add veth${p} netns peer0 type veth peer name veth${p} netns peer${p} + + ip -n peer0 addr add 10.10.${p}.1/24 dev veth${p} + ip -n peer0 link set veth${p} up + + ip -n peer${p} addr add 10.10.${p}.2/24 dev veth${p} + ip -n peer${p} link set veth${p} up + done + fi + + ip netns exec peer${1} ${OVPN_CLI} new_iface tun${1} $MODE + ip -n peer${1} addr add ${2} dev tun${1} + ip -n peer${1} link set tun${1} up +} + +add_peer() { + if [ "${PROTO}" == "UDP" ]; then + if [ ${1} -eq 0 ]; then + ip netns exec peer0 ${OVPN_CLI} new_multi_peer tun0 1 ${UDP_PEERS_FILE} + + for p in $(seq 1 ${NUM_PEERS}); do + ip netns exec peer0 ${OVPN_CLI} new_key tun0 ${p} 1 0 ${ALG} 0 \ + data64.key + done + else + ip netns exec peer${1} ${OVPN_CLI} new_peer tun${1} ${1} 1 10.10.${1}.1 1 + ip netns exec peer${1} ${OVPN_CLI} new_key tun${1} ${1} 1 0 ${ALG} 1 \ + data64.key + fi + else + if [ ${1} -eq 0 ]; then + (ip netns exec peer0 ${OVPN_CLI} listen tun0 1 ${TCP_PEERS_FILE} && { + for p in $(seq 1 ${NUM_PEERS}); do + ip netns exec peer0 ${OVPN_CLI} new_key tun0 ${p} 1 0 \ + ${ALG} 0 data64.key + done + }) & + sleep 5 + else + ip netns exec peer${1} ${OVPN_CLI} connect tun${1} ${1} 10.10.${1}.1 1 \ + data64.key + fi + fi +} + +cleanup() { + for p in $(seq 1 10); do + ip -n peer0 link del veth${p} 2>/dev/null || true + done + for p in $(seq 0 10); do + ip netns exec peer${p} ${OVPN_CLI} del_iface tun${p} 2>/dev/null || true + ip netns del peer${p} 2>/dev/null || true + done +} + +if [ "${PROTO}" == "UDP" ]; then + NUM_PEERS=${NUM_PEERS:-$(wc -l ${UDP_PEERS_FILE} | awk '{print $1}')} +else + NUM_PEERS=${NUM_PEERS:-$(wc -l ${TCP_PEERS_FILE} | awk '{print $1}')} +fi + +cleanup + +modprobe -q ovpn || true + +for p in $(seq 0 ${NUM_PEERS}); do + create_ns ${p} +done + +for p in $(seq 0 ${NUM_PEERS}); do + setup_ns ${p} 5.5.5.$((${p} + 1))/24 +done + +for p in $(seq 0 ${NUM_PEERS}); do + add_peer ${p} +done + +for p in $(seq 1 ${NUM_PEERS}); do + ip netns exec peer0 ${OVPN_CLI} set_peer tun0 ${p} 60 120 + ip netns exec peer${p} ${OVPN_CLI} set_peer tun${p} ${p} 60 120 +done + +for p in $(seq 1 ${NUM_PEERS}); do + ip netns exec peer0 ping -qfc 1000 -w 5 5.5.5.$((${p} + 1)) +done + +if [ "$FLOAT" == "1" ]; then + # make clients float.. 
+ for p in $(seq 1 ${NUM_PEERS}); do + ip -n peer${p} addr del 10.10.${p}.2/24 dev veth${p} + ip -n peer${p} addr add 10.10.${p}.3/24 dev veth${p} + done + for p in $(seq 1 ${NUM_PEERS}); do + ip netns exec peer${p} ping -qfc 1000 -w 5 5.5.5.1 + done +fi + +ip netns exec peer0 iperf3 -1 -s & +sleep 1 +ip netns exec peer1 iperf3 -Z -t 3 -c 5.5.5.1 + +echo "Adding secondary key and then swap:" +for p in $(seq 1 ${NUM_PEERS}); do + ip netns exec peer0 ${OVPN_CLI} new_key tun0 ${p} 2 1 ${ALG} 0 data64.key + ip netns exec peer${p} ${OVPN_CLI} new_key tun${p} ${p} 2 1 ${ALG} 1 data64.key + ip netns exec peer${p} ${OVPN_CLI} swap_keys tun${p} ${p} +done + +sleep 1 +echo "Querying all peers:" +ip netns exec peer0 ${OVPN_CLI} get_peer tun0 +ip netns exec peer1 ${OVPN_CLI} get_peer tun1 + +echo "Querying peer 1:" +ip netns exec peer0 ${OVPN_CLI} get_peer tun0 1 + +echo "Querying non-existent peer 10:" +ip netns exec peer0 ${OVPN_CLI} get_peer tun0 10 || true + +echo "Deleting peer 1:" +ip netns exec peer0 ${OVPN_CLI} del_peer tun0 1 +ip netns exec peer1 ${OVPN_CLI} del_peer tun1 1 + +echo "Querying keys:" +for p in $(seq 2 ${NUM_PEERS}); do + ip netns exec peer${p} ${OVPN_CLI} get_key tun${p} ${p} 1 + ip netns exec peer${p} ${OVPN_CLI} get_key tun${p} ${p} 2 +done + +echo "Deleting keys:" +for p in $(seq 2 ${NUM_PEERS}); do + ip netns exec peer${p} ${OVPN_CLI} del_key tun${p} ${p} 1 + ip netns exec peer${p} ${OVPN_CLI} del_key tun${p} ${p} 2 +done + +echo "Setting timeout to 10s MP:" +# bring ifaces down to prevent traffic being sent +for p in $(seq 0 ${NUM_PEERS}); do + ip -n peer${p} link set tun${p} down +done +# set short timeout +for p in $(seq 2 ${NUM_PEERS}); do + ip netns exec peer0 ${OVPN_CLI} set_peer tun0 ${p} 10 10 || true + ip netns exec peer${p} ${OVPN_CLI} set_peer tun${p} ${p} 0 0 +done +# wait for peers to timeout +sleep 15 + +echo "Setting timeout to 10s P2P:" +for p in $(seq 2 ${NUM_PEERS}); do + ip netns exec peer${p} ${OVPN_CLI} set_peer tun${p} ${p} 10 10 +done +sleep 15 + +cleanup + +modprobe -r ovpn || true diff --git a/tools/testing/selftests/net/ovpn/udp_peers.txt b/tools/testing/selftests/net/ovpn/udp_peers.txt new file mode 100644 index 0000000000000000000000000000000000000000..32f14bd9347a63e58438311b6d880b9fef768aa2 --- /dev/null +++ b/tools/testing/selftests/net/ovpn/udp_peers.txt @@ -0,0 +1,5 @@ +1 10.10.1.2 1 5.5.5.2 +2 10.10.2.2 1 5.5.5.3 +3 10.10.3.2 1 5.5.5.4 +4 10.10.4.2 1 5.5.5.5 +5 10.10.5.2 1 5.5.5.6
It seems some little changes to ovpn.yaml were not reflected in the generated files I committed.
Specifically I changed some U32 to BE32 (IPv4 addresses) and files were not regenerated before committing.
(I saw the failure in patchwork about this)
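For illustration only (the attribute and policy names below are placeholders, not necessarily the ones in ovpn.yaml): once the spec declares an IPv4 address attribute as big-endian, regenerating netlink-gen.c should turn the corresponding policy entry from NLA_U32 into something like NLA_BE32:

static const struct nla_policy ovpn_peer_nl_policy[] = {
        /* IPv4 addresses are declared big-endian in the spec,
         * so the regenerated entry uses NLA_BE32 instead of NLA_U32
         */
        [OVPN_A_PEER_REMOTE_IPV4] = { .type = NLA_BE32 },
};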
It seems I'll have to send v12 after all.
Cheers,
On 29/10/2024 11:47, Antonio Quartulli wrote:
[...]
Best regards,
On Thu, 31 Oct 2024 11:00:05 +0100 Antonio Quartulli wrote:
It seems some little changes to ovpn.yaml were not reflected in the generated files I committed.
Specifically I changed some U32 to BE32 (IPv4 addresses) and files were not regenerated before committing.
(I saw the failure in patchwork about this)
It seems I'll have to send v12 after all.
I'll apply patch 1 already; it's only tangentially related, and there's no point rebasing it.
Hello:
This series was applied to netdev/net-next.git (main) by Jakub Kicinski kuba@kernel.org:
On Tue, 29 Oct 2024 11:47:13 +0100 you wrote:
[...]
Here is the summary with links:
  - [net-next,v11,01/23] netlink: add NLA_POLICY_MAX_LEN macro
    https://git.kernel.org/netdev/net-next/c/4138e9ec0093
  - [net-next,v11,02/23] net: introduce OpenVPN Data Channel Offload (ovpn)
    (no matching commit)
  - [net-next,v11,03/23] ovpn: add basic netlink support
    (no matching commit)
  - [net-next,v11,04/23] ovpn: add basic interface creation/destruction/management routines
    (no matching commit)
  - [net-next,v11,05/23] ovpn: keep carrier always on
    (no matching commit)
  - [net-next,v11,06/23] ovpn: introduce the ovpn_peer object
    (no matching commit)
  - [net-next,v11,07/23] ovpn: introduce the ovpn_socket object
    (no matching commit)
  - [net-next,v11,08/23] ovpn: implement basic TX path (UDP)
    (no matching commit)
  - [net-next,v11,09/23] ovpn: implement basic RX path (UDP)
    (no matching commit)
  - [net-next,v11,10/23] ovpn: implement packet processing
    (no matching commit)
  - [net-next,v11,11/23] ovpn: store tunnel and transport statistics
    (no matching commit)
  - [net-next,v11,12/23] ovpn: implement TCP transport
    (no matching commit)
  - [net-next,v11,13/23] ovpn: implement multi-peer support
    (no matching commit)
  - [net-next,v11,14/23] ovpn: implement peer lookup logic
    (no matching commit)
  - [net-next,v11,15/23] ovpn: implement keepalive mechanism
    (no matching commit)
  - [net-next,v11,16/23] ovpn: add support for updating local UDP endpoint
    (no matching commit)
  - [net-next,v11,17/23] ovpn: add support for peer floating
    (no matching commit)
  - [net-next,v11,18/23] ovpn: implement peer add/get/dump/delete via netlink
    (no matching commit)
  - [net-next,v11,19/23] ovpn: implement key add/get/del/swap via netlink
    (no matching commit)
  - [net-next,v11,20/23] ovpn: kill key and notify userspace in case of IV exhaustion
    (no matching commit)
  - [net-next,v11,21/23] ovpn: notify userspace when a peer is deleted
    (no matching commit)
  - [net-next,v11,22/23] ovpn: add basic ethtool support
    (no matching commit)
  - [net-next,v11,23/23] testing/selftests: add test tool and scripts for ovpn module
    (no matching commit)
You are awesome, thank you!
Hi Antonio,
On 29.10.2024 12:47, Antonio Quartulli wrote:
[...]
As I promised many months ago, I am starting to publish some nitpicks regarding the series. The review started when the series was at v3, and I kept "rebasing" it onto every new version so I could publish everything at once. But I lost that race to the pace of new releases :) So I am going to publish it patch by patch.
Anyway, you and all the participants have made great progress toward making this accelerator part of the kernel. Most of the substantial issues are already resolved, so please do not wait for me to finish picking every nit.
Regarding "big" topics I have only two concerns: link creation using RTNL and a switch statement usage. In the corresponding thread, I asked Jiri to clarify that "should" regarding .newlink implementation. Hope he will have a chance to find a time to reply.
For the 'switch' statement, I see a repeating pattern of handling mode- or family-specific cases like this:
int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer)
{
        switch (ovpn->mode) {
        case OVPN_MODE_MP:
                return ovpn_peer_add_mp(ovpn, peer);
        case OVPN_MODE_P2P:
                return ovpn_peer_add_p2p(ovpn, peer);
        default:
                return -EOPNOTSUPP;
        }
}
or
void ovpn_encrypt_post(void *data, int ret)
{
        ...
        switch (peer->sock->sock->sk->sk_protocol) {
        case IPPROTO_UDP:
                ovpn_udp_send_skb(peer->ovpn, peer, skb);
                break;
        case IPPROTO_TCP:
                ovpn_tcp_send_skb(peer, skb);
                break;
        default:
                /* no transport configured yet */
                goto err;
        }
        ...
}
or
void ovpn_peer_keepalive_work(...)
{
        ...
        switch (ovpn->mode) {
        case OVPN_MODE_MP:
                next_run = ovpn_peer_keepalive_work_mp(ovpn, now);
                break;
        case OVPN_MODE_P2P:
                next_run = ovpn_peer_keepalive_work_p2p(ovpn, now);
                break;
        }
        ...
}
Did you consider implementing the mode-specific operations as a set of ops like this:
struct ovpn_ops {
        int (*peer_add)(struct ovpn_struct *ovpn, struct ovpn_peer *peer);
        int (*peer_del)(struct ovpn_peer *peer, enum ovpn_del_peer_reason reason);
        void (*send_skb)(struct ovpn_peer *peer, struct sk_buff *skb);
        time64_t (*keepalive_work)(...);
};
Initialize them during interface creation and invoke these operations indirectly, e.g.:
int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer)
{
        return ovpn->ops->peer_add(ovpn, peer);
}
void ovpn_encrypt_post(void *data, int ret)
{
        ...
        ovpn->ops->send_skb(peer, skb);
        ...
}
void ovpn_peer_keepalive_work(...)
{
        ...
        next_run = ovpn->ops->keepalive_work(ovpn, now);
        ...
}
Anyway, the module knows all these option values in advance, at network interface creation time, and I believe replacing the 'switch' statements with indirect calls would make the code easier to read.
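To make the idea a bit more concrete, a rough sketch of the initialization side could look like this (ovpn_set_mode_ops and the two ops tables are made-up names, not code from the series; only the mode-specific callbacks shown in the examples above are filled in):

static const struct ovpn_ops ovpn_mp_ops = {
        .peer_add       = ovpn_peer_add_mp,
        .keepalive_work = ovpn_peer_keepalive_work_mp,
        /* remaining MP-specific ops filled in the same way */
};

static const struct ovpn_ops ovpn_p2p_ops = {
        .peer_add       = ovpn_peer_add_p2p,
        .keepalive_work = ovpn_peer_keepalive_work_p2p,
        /* remaining P2P-specific ops filled in the same way */
};

/* called once from the interface creation (.newlink) path */
static int ovpn_set_mode_ops(struct ovpn_struct *ovpn, enum ovpn_mode mode)
{
        switch (mode) {
        case OVPN_MODE_MP:
                ovpn->ops = &ovpn_mp_ops;
                return 0;
        case OVPN_MODE_P2P:
                ovpn->ops = &ovpn_p2p_ops;
                return 0;
        default:
                return -EOPNOTSUPP;
        }
}

After that, the hot paths only ever do indirect calls through ovpn->ops, as in the examples above.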
-- Sergey
On 06/11/2024 02:18, Sergey Ryazanov wrote:
Hi Antonio,
[...]
As I promised many months ago, I am starting to publish some nitpicks regarding the series.
Thanks and welcome back!
The review started when the series was at v3, and I kept "rebasing" it onto every new version so I could publish everything at once. But I lost that race to the pace of new releases :) So I am going to publish it patch by patch.
Anyway, you and all the participants have made great progress toward making this accelerator part of the kernel. Most of the substantial issues are already resolved, so please do not wait for me to finish picking every nit.
I'll go through them all and judge what's meaningful to add to v12 and what can be postponed for later improvements.
Regarding "big" topics I have only two concerns: link creation using RTNL and a switch statement usage. In the corresponding thread, I asked Jiri to clarify that "should" regarding .newlink implementation. Hope he will have a chance to find a time to reply.
True, but to be honest at this point I am fine with sticking to RTNL, also because we will soon introduce the ability to create 'persistent' ifaces, which a user should be able to create before starting openvpn.
Going through RTNL for this is the best choice IMHO, so it gives us an extra use case in favour of this approach (next to what Jiri already mentioned).
For the 'switch' statement, I see a repeating pattern of handling mode- or family-specific cases like this:
[...]
I see this was already discussed with Sabrina under another patch and I have the same opinion.
To me the switch/case approach looks cleaner and I truly like it, especially when enums are involved.
ops/callbacks are fine when they can be redefined at runtime (e.g. a protocol that can be registered by another module), but that is not the case here. I also feel that with ops it is not easy to understand which call is actually being made just by looking at the caller context, so reading the code can be harder.
So I really prefer to stick to this scheme.
Thanks a lot for sharing your point though.
Regards,
On 14.11.2024 17:33, Antonio Quartulli wrote:
On 06/11/2024 02:18, Sergey Ryazanov wrote:
Regarding "big" topics I have only two concerns: link creation using RTNL and a switch statement usage. In the corresponding thread, I asked Jiri to clarify that "should" regarding .newlink implementation. Hope he will have a chance to find a time to reply.
True, but to be honest at this point I am fine with sticking to RTNL, also because we will soon introduce the ability to create 'persistent' ifaces, which a user should be able to create before starting openvpn.
Could you share the use case for this functionality?
Going through RTNL for this is the best choice IMHO, so it gives us an extra use case in favour of this approach (next to what Jiri already mentioned).
In the absence of arguments it's hard to understand what "best" means here. So I'm still not sure it is worth splitting the uAPI between two interfaces. Anyway, it's up to the maintainers to decide whether it is mergeable in this form or not. I just shared some arguments for keeping the full management interface in GENL.
-- Sergey
On 14/11/2024 23:10, Sergey Ryazanov wrote:
On 14.11.2024 17:33, Antonio Quartulli wrote:
On 06/11/2024 02:18, Sergey Ryazanov wrote:
Regarding "big" topics I have only two concerns: link creation using RTNL and a switch statement usage. In the corresponding thread, I asked Jiri to clarify that "should" regarding .newlink implementation. Hope he will have a chance to find a time to reply.
True, but to be honest at this point I am fine with sticking to RTNL, also because we will soon introduce the ability to create 'persistent' ifaces, which a user should be able to create before starting openvpn.
Could you share the use case for this functionality?
This is better asked on the users mailing list, but, as an example, we recently had a discussion about this with the VyOS guys: they want to create the interface and have it fit into their firewall/routing/etc. logic before any daemon is started.
In OpenVPN userspace we already support the --mktun directive to help with this specific scenario, so it's a long-standing use case.
Going through RTNL for this is the best choice IMHO, so it gives us an extra use case in favour of this approach (next to what Jiri already mentioned).
In the absence of arguments it's hard to understand what "best" means here.
well, that's why I added "IMHO" :)
So I'm still not sure it is worth splitting the uAPI between two interfaces. Anyway, it's up to the maintainers to decide whether it is mergeable in this form or not. I just shared some arguments for keeping the full management interface in GENL.
Well, doing things differently from all the other virtual drivers also requires a good reason IMHO.
Anyway, I like the idea that iproute2 can be used to create interfaces, without the need to have another userspace tool for that.
Regards,