This is similar to TCP-MD5 in functionality but it's sufficiently different that packet formats and interfaces are incompatible. Compared to TCP-MD5 more algorithms are supported and multiple keys can be used on the same connection but there is still no negotiation mechanism.
The expected use case is protecting long-duration BGP/LDP connections between routers using pre-shared keys. The goal of this series is to allow routers using the Linux TCP stack to interoperate with vendors such as Cisco and Juniper. A fully-featured userspace implementation using this patchset exists but it is not open.
A completely unrelated series that implements the same features was posted recently: https://lore.kernel.org/netdev/20220818170005.747015-1-dima@arista.com/
The biggest difference is that this series puts TCP-AO keys on a global list instead of a per-socket list, and that it attempts to make key selection decisions in the kernel instead of very strictly requiring userspace to make all decisions.
I believe my approach greatly simplifies userspace implementation. The biggest difference in this iteration of the patch series is adding per-key lifetime values based on RFC8177 in order to implement kernel-mode key rollover.
Older versions still required userspace to tweak the NOSEND/NORECV flags and always pick rnextkeyid explicitly, but now no active "key management" should be required on an established socket: just set the correct flags and expiration dates and the kernel can perform key rollover itself. You can see a (simple) test of that behavior here:
https://github.com/cdleonard/tcp-authopt-test/blob/main/tcp_authopt_test/tes...
The main implementation of this behavior is patch 17.
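As a rough illustration of lifetime-based rollover (not code from this series), the sketch below picks a send key from a list using only the current time. The struct, its field names and the "most recently started key wins" rule are all assumptions made for the example; the real fields and selection logic live in the lifetime and key selection patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key record; names are illustrative, not the patch ABI. */
struct demo_key {
	uint8_t send_id;
	uint64_t send_begin;	/* first second at which the key may sign */
	uint64_t send_end;	/* first second at which it may not; 0 = never */
};

/* A key may sign if "now" falls inside its [send_begin, send_end) window. */
static inline int demo_key_valid_for_send(const struct demo_key *k, uint64_t now)
{
	if (now < k->send_begin)
		return 0;
	return k->send_end == 0 || now < k->send_end;
}

/* Assumed rule: among valid keys prefer the one that started most recently.
 * Returns NULL if no key is valid, in which case the caller should fail
 * instead of sending unsigned traffic.
 */
static inline const struct demo_key *
demo_pick_send_key(const struct demo_key *keys, size_t n, uint64_t now)
{
	const struct demo_key *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!demo_key_valid_for_send(&keys[i], now))
			continue;
		if (!best || keys[i].send_begin > best->send_begin)
			best = &keys[i];
	}
	return best;
}
```

With two keys whose windows overlap, the sender switches to the newer key as soon as its window opens, without any setsockopt call on the established socket.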
Very old versions of this series had per-socket keys, but that approach was prone to an issue where a key change made on a listen socket between "synack" and "accept" did not affect the new socket.
My solution was to make keys global; the Arista solution is to require userspace to query the key list on accepted sockets and update them. This offloads responsibility for an ABI race to userspace. It can be made to work.
Here are some known flaws and limitations:
* The crypto API is used with buffers on the stack and inside struct sock; this might not work on all arches. I'm currently only testing x64 VMs.
* Interaction with FASTOPEN is not tested.
* The traffic key is not cached (reducing performance).
* All lookups examine all keys, ignoring optimization opportunities.
* Overlapping MKTs can be configured despite what RFC5925 says. This is considered "misconfiguration by userspace" and it would make sense for the kernel to be more aggressive here.
Some testing support is included in nettest and fcnal-test.sh, similar to the current level of tcp-md5 testing.
A more elaborate test suite using pytest and scapy is available out of tree: https://github.com/cdleonard/tcp-authopt-test There is an automatic system that runs that test suite in vagrant in gitlab-ci: https://gitlab.com/cdleonard/vagrantcpao That test suite fully covers the ABI of this patchset.
Changes for frr (obsolete): https://github.com/FRRouting/frr/pull/9442 That PR was made early for ABI feedback, it has many issues.
Changes for yabgp (obsolete): https://github.com/cdleonard/yabgp/commits/tcp_authopt This was used for interoperability testing with Cisco. It would need updates for global keys to avoid leaks.
Changes since PATCH v7:
* Add lifetime fields to struct tcp_authopt_key
* Fix not checking MD5 after unexpected AO.
Link to v7: https://lore.kernel.org/netdev/cover.1660852705.git.cdleonard@gmail.com/
Changes since PATCH v6:
* Squash "remove unused noops" patch (forgot to do this before v5 send).
* Make TCP_REPAIR_AUTHOPT fail if (!tp->repair)
* Add {snd,rcv}_seq to struct tcp_repair_authopt next to {snd,rcv}_sne. The fact that internally snd_sne is maintained as a 64-bit extension of snd_nxt is a problem for a TCP_REPAIR implementation in userspace, which might not have access to snd_nxt during live traffic. By exposing a full 64-bit “recent sequence number” to userspace it's possible to ignore which exact SEQ number the SNE value is an extension of.
* Fix ipv6_addr_is_prefix helper; it was incorrect and dependent on uninitialized stack memory. This was caught by the test suite after many rebases.
* Implement ipv4-mapped-ipv6 support, requested by Eric Dumazet
Link: https://lore.kernel.org/netdev/cover.1658815925.git.cdleonard@gmail.com/
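To illustrate why a 64-bit "recent sequence number" is convenient, here is a minimal sketch (not taken from the series) of how the SNE for any nearby 32-bit SEQ can be recovered from it. The function names are hypothetical, and the code assumes the SEQ being extended lies within 2^31 of the recent value.

```c
#include <assert.h>
#include <stdint.h>

/* Extend a 32-bit SEQ to 64 bits by choosing the value closest to "recent".
 * The uint32_t -> int32_t conversion relies on two's complement wrapping,
 * which holds on all common compilers.
 */
static inline uint64_t demo_seq_extend(uint64_t recent, uint32_t seq)
{
	int32_t delta = (int32_t)(seq - (uint32_t)recent);

	return recent + (uint64_t)(int64_t)delta;
}

/* The Sequence Number Extension is the upper 32 bits of the extended value. */
static inline uint32_t demo_sne(uint64_t recent, uint32_t seq)
{
	return (uint32_t)(demo_seq_extend(recent, seq) >> 32);
}
```

A late segment from just before a wrap gets the previous SNE and a segment just after the wrap gets the next one, regardless of which exact SEQ the stored value corresponds to.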
Changes since PATCH v5:
* Rebased on recent net-next, including recent changes refactoring md5
* Use skb_drop_reason
* Fix using sock_kmalloc for key alloc but regular kfree for free. Use kmalloc because keys are global
* Fix mentioning non-existent copy_from_sockopt in doc for _copy_from_sockptr_tolerant
* If no valid keys are available for a destination then report a socket error instead of sending unsigned traffic
* Remove several noop implementations which are always called from ifdef
* Fix build issues in all scenarios, including -Werror at every point.
* Split "tcp: Refactor tcp_inbound_md5_hash into tcp_inbound_sig_hash" into a separate commit.
* Add TCP_AUTHOPT_FLAG_ACTIVE to distinguish between "keys configured for socket" and "connection authenticated". A listen socket with authentication enabled will return other sockets with authentication enabled on accept() but if no key is configured for the peer then authentication will be inactive.
* Add support for new TCP_REPAIR_AUTHOPT sockopts which load/save the AO-specific information.
Link: https://lore.kernel.org/netdev/cover.1643026076.git.cdleonard@gmail.com/
Changes since PATCH v4:
* Move the traffic_key context_bytes header to stack. If it's a constant string then ahash can fail unexpectedly.
* Fix allowing unsigned traffic if all keys are marked norecv.
* Fix crashing in __tcp_authopt_alg_init on failure.
* Try to respect the rnextkeyid from SYN on SYNACK (new patch)
* Fix incorrect check for TCP_AUTHOPT_KEY_DEL in __tcp_authopt_select_key
* Improve docs on __tcp_authopt_select_key
* Fix build with CONFIG_PROC_FS=n (kernel build robot)
* Fix build with CONFIG_IPV6=n (kernel build robot)
Link: https://lore.kernel.org/netdev/cover.1640273966.git.cdleonard@gmail.com/
Changes since PATCH v3:
* Made keys global (per-netns rather than per-sock).
* Add /proc/net/tcp_authopt with a table of keys (not sockets).
* Fix part of the shash/ahash conversion having slipped from patch 3 to patch 5
* Fix tcp_parse_sig_options assigning NULL incorrectly when both MD5 and AO are disabled (kernel build robot)
* Fix sparse endianness warnings in prefix match (kernel build robot)
* Fix several incorrect RCU annotations reported by sparse (kernel build robot)
Link: https://lore.kernel.org/netdev/cover.1638962992.git.cdleonard@gmail.com/
Changes since PATCH v2:
* Protect tcp_authopt_alg_get/put_tfm with local_bh_disable instead of preempt_disable. Using preempt_disable caused signature corruption when a send path executing with BH enabled was interrupted by recv.
* Fix accepted keyids not configured locally as "unexpected". If any key is configured that matches the peer then traffic MUST be signed.
* Fix issues related to sne rollover during handshake itself. (Francesco)
* Implement and test prefixlen (David)
* Replace shash with ahash and reuse some of the MD5 code (Dmitry)
* Parse md5+ao options only once in the same function (Dmitry)
* Pass tcp_authopt_info into inbound check path, this avoids second rcu dereference for same packet.
* Pass tcp_request_socket into inbound check path instead of just listen socket. This is required for SNE rollover during handshake and clarifies ISN handling.
* Do not allow disabling via sysctl after enabling once, this is difficult to support well (David)
* Verbose check for sysctl_tcp_authopt (Dmitry)
* Use netif_index_is_l3_master (David)
* Cleanup ipvx_addr_match (David)
* Add a #define tcp_authopt_needed to wrap static key usage because it looks nicer.
* Replace rcu_read_lock with rcu_dereference_protected in SNE updates (Eric)
* Remove test suite
Link: https://lore.kernel.org/netdev/cover.1635784253.git.cdleonard@gmail.com/
Changes since PATCH v1:
* Implement Sequence Number Extension
* Implement l3index for vrf: TCP_AUTHOPT_KEY_IFINDEX as equivalent of TCP_MD5SIG_FLAG_IFINDEX
* Expand TCP-AO tests in fcnal-test.sh to near-parity with md5.
* Show addr/port on failure similar to md5
* Remove tox dependency from test suite (create venv directly)
* Switch default pytest output format to TAP (kselftest standard)
* Fix _copy_from_sockptr_tolerant stack corruption on short sockopts. This was covered in test but error was invisible without STACKPROTECTOR=y
* Fix sysctl_tcp_authopt check in tcp_get_authopt_val before memset. This was harmless because error code is checked in getsockopt anyway.
* Fix dropping md5 packets on all sockets with AO enabled
* Fix checking (key->recv_id & TCP_AUTHOPT_KEY_ADDR_BIND) instead of key->flags in tcp_authopt_key_match_exact
* Fix PATCH 1/19 not compiling due to missing "int err" declaration
* Add ratelimited message for AO and MD5 both present
* Export all symbols required by CONFIG_IPV6=m (again)
* Fix compilation with CONFIG_TCP_AUTHOPT=y CONFIG_TCP_MD5SIG=n
* Fix checkpatch issues
* Pass -rrequirements.txt to tox to avoid dependency variation.
Link: https://lore.kernel.org/netdev/cover.1632240523.git.cdleonard@gmail.com/
Changes since RFCv3:
* Implement TCP_AUTHOPT handling for timewait and reset replies. Write tests to execute these paths by injecting packets with scapy
* Handle combining md5 and authopt: if both are configured use authopt.
* Fix locking issues around send_key, introduced in one of the later patches.
* Handle IPv4-mapped-IPv6 addresses: it used to be that an ipv4 SYN sent to an ipv6 socket with TCP-AO triggered WARN
* Implement an un-namespaced sysctl that disables this feature by default
* Allocate new key before removing any old one in setsockopt (Dmitry)
* Remove tcp_authopt_key_info.local_id because it's no longer used (Dmitry)
* Propagate errors from TCP_AUTHOPT getsockopt (Dmitry)
* Fix no-longer-correct TCP_AUTHOPT_KEY_DEL docs (Dmitry)
* Simplify crypto allocation (Eric)
* Use kzalloc instead of __GFP_ZERO (Eric)
* Add static_key_false tcp_authopt_needed (Eric)
* Clear authopt_info copied from oldsk in __tcp_authopt_openreq (Eric)
* Replace memcmp in ipv4 and ipv6 addr comparisons (Eric)
* Export symbols for CONFIG_IPV6=m (kernel test robot)
* Mark more functions static (kernel test robot)
* Fix build with CONFIG_PROVE_RCU_LIST=y (kernel test robot)
Link: https://lore.kernel.org/netdev/cover.1629840814.git.cdleonard@gmail.com/
Changes since RFCv2:
* Removed local_id from ABI and match on send_id/recv_id/addr
* Add all relevant out-of-tree tests to tools/testing/selftests
* Return an error instead of ignoring unknown flags, hopefully this makes it easier to extend.
* Check sk_family before __tcp_authopt_info_get_or_create in tcp_set_authopt_key
* Use sock_owned_by_me instead of WARN_ON(!lockdep_sock_is_held(sk))
* Fix some intermediate build failures reported by kbuild robot
* Improve documentation
Link: https://lore.kernel.org/netdev/cover.1628544649.git.cdleonard@gmail.com/
Changes since RFC:
* Split into per-topic commits for ease of review. The intermediate commits compile with a few "unused function" warnings and don't do anything useful by themselves.
* Add ABI documentation including kernel-doc on uapi
* Fix lockdep warnings from crypto by creating pools with one shash for each cpu
* Accept short options to setsockopt by padding with zeros; this approach allows increasing the size of the structs in the future.
* Support for aes-128-cmac-96
* Support for binding addresses to keys in a way similar to old tcp_md5
* Add support for retrieving received keyid/rnextkeyid and controlling the keyid/rnextkeyid being sent.
Link: https://lore.kernel.org/netdev/01383a8751e97ef826ef2adf93bfde3a08195a43.1626...
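The "accept short options by padding with zeros" rule can be modelled in plain userspace C as below. This is a simplified sketch of the idea behind _copy_from_sockptr_tolerant, operating on ordinary buffers instead of a sockptr_t; the function name demo_copy_tolerant is made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Copy an option struct of srclen bytes into a dstlen-byte destination.
 *
 * - Shorter source (older caller): the rest of dst is filled with zeros.
 * - Longer source (newer caller): accepted only if the extra tail is all
 *   zeros, so unknown new fields cannot be silently ignored.
 */
static inline int demo_copy_tolerant(unsigned char *dst, size_t dstlen,
				     const unsigned char *src, size_t srclen)
{
	size_t n = srclen < dstlen ? srclen : dstlen;
	size_t i;

	/* Loop body only runs when srclen > dstlen. */
	for (i = dstlen; i < srclen; i++)
		if (src[i] != 0)
			return -EINVAL;

	memcpy(dst, src, n);
	if (srclen < dstlen)
		memset(dst + srclen, 0, dstlen - srclen);
	return 0;
}
```

This lets the struct grow over time as long as a zero value in any new field means "no effect".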
Leonard Crestez (26):
  tcp: authopt: Initial support and key management
  docs: Add user documentation for tcp_authopt
  tcp: authopt: Add crypto initialization
  tcp: Refactor tcp_sig_hash_skb_data for AO
  tcp: authopt: Compute packet signatures
  tcp: Refactor tcp_inbound_md5_hash into tcp_inbound_sig_hash
  tcp: authopt: Hook into tcp core
  tcp: authopt: Disable via sysctl by default
  tcp: authopt: Implement Sequence Number Extension
  tcp: ipv6: Add AO signing for tcp_v6_send_response
  tcp: authopt: Add support for signing skb-less replies
  tcp: ipv4: Add AO signing for skb-less replies
  tcp: authopt: Add NOSEND/NORECV flags
  tcp: authopt: Add initial l3index support
  tcp: authopt: Add prefixlen support
  tcp: authopt: Add send/recv lifetime support
  tcp: authopt: Add key selection controls
  tcp: authopt: Add v4mapped ipv6 address support
  tcp: authopt: Add /proc/net/tcp_authopt listing all keys
  tcp: authopt: If no keys are valid for send report an error
  tcp: authopt: Try to respect rnextkeyid from SYN on SYNACK
  tcp: authopt: Initial support for TCP_AUTHOPT_FLAG_ACTIVE
  tcp: authopt: Initial implementation of TCP_REPAIR_AUTHOPT
  selftests: nettest: Rename md5_prefix to key_addr_prefix
  selftests: nettest: Initial tcp_authopt support
  selftests: net/fcnal: Initial tcp_authopt support
 Documentation/networking/index.rst        |    1 +
 Documentation/networking/ip-sysctl.rst    |    6 +
 Documentation/networking/tcp_authopt.rst  |   95 +
 include/linux/tcp.h                       |   15 +
 include/net/dropreason.h                  |   16 +
 include/net/net_namespace.h               |    4 +
 include/net/netns/tcp_authopt.h           |   12 +
 include/net/tcp.h                         |   55 +-
 include/net/tcp_authopt.h                 |  269 +++
 include/uapi/linux/snmp.h                 |    1 +
 include/uapi/linux/tcp.h                  |  188 ++
 net/ipv4/Kconfig                          |   14 +
 net/ipv4/Makefile                         |    1 +
 net/ipv4/proc.c                           |    1 +
 net/ipv4/sysctl_net_ipv4.c                |   39 +
 net/ipv4/tcp.c                            |  126 +-
 net/ipv4/tcp_authopt.c                    | 2044 +++++++++++++++++++++
 net/ipv4/tcp_input.c                      |   55 +-
 net/ipv4/tcp_ipv4.c                       |  100 +-
 net/ipv4/tcp_minisocks.c                  |   12 +
 net/ipv4/tcp_output.c                     |  106 +-
 net/ipv6/tcp_ipv6.c                       |   70 +-
 tools/testing/selftests/net/fcnal-test.sh |  329 +++-
 tools/testing/selftests/net/nettest.c     |  204 +-
 24 files changed, 3675 insertions(+), 88 deletions(-)
 create mode 100644 Documentation/networking/tcp_authopt.rst
 create mode 100644 include/net/netns/tcp_authopt.h
 create mode 100644 include/net/tcp_authopt.h
 create mode 100644 net/ipv4/tcp_authopt.c
-- 2.25.1
This commit adds support to add and remove keys but does not use them further.
Similar to tcp md5 a single pointer to a struct tcp_authopt_info* struct is added to struct tcp_sock, this avoids increasing memory usage. The data structures related to tcp_authopt are initialized on setsockopt and only freed on socket close.
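For readers who want to try the interface, here is a hedged userspace sketch of adding a key. The struct layout and constant values are copied from the uapi hunk of this patch under DEMO_ names because they are not available from installed system headers; the helper functions are hypothetical. On a kernel without this series the option number may be unassigned or mean something else, so treat this as an illustration only.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Local copies of the definitions added to include/uapi/linux/tcp.h here. */
#define DEMO_TCP_AUTHOPT_KEY			39
#define DEMO_TCP_AUTHOPT_MAXKEYLEN		80
#define DEMO_TCP_AUTHOPT_KEY_ADDR_BIND		(1 << 2)
#define DEMO_TCP_AUTHOPT_ALG_HMAC_SHA_1_96	1

struct demo_tcp_authopt_key {
	uint32_t flags;
	uint8_t send_id;
	uint8_t recv_id;
	uint8_t alg;
	uint8_t keylen;
	uint8_t key[DEMO_TCP_AUTHOPT_MAXKEYLEN];
	struct sockaddr_storage addr;	/* used only with ADDR_BIND */
};

/* Fill a key; if peer is non-NULL the key is bound to that IPv4 address. */
static inline int demo_fill_key(struct demo_tcp_authopt_key *k,
				uint8_t send_id, uint8_t recv_id,
				const void *secret, size_t len,
				const struct sockaddr_in *peer)
{
	if (len > DEMO_TCP_AUTHOPT_MAXKEYLEN)
		return -EINVAL;
	memset(k, 0, sizeof(*k));
	k->send_id = send_id;
	k->recv_id = recv_id;
	k->alg = DEMO_TCP_AUTHOPT_ALG_HMAC_SHA_1_96;
	k->keylen = (uint8_t)len;
	memcpy(k->key, secret, len);
	if (peer) {
		k->flags |= DEMO_TCP_AUTHOPT_KEY_ADDR_BIND;
		memcpy(&k->addr, peer, sizeof(*peer));
	}
	return 0;
}

/* Returns 0 on success, -1 with errno set otherwise (plain setsockopt). */
static inline int demo_add_key(int fd, const struct demo_tcp_authopt_key *k)
{
	return setsockopt(fd, IPPROTO_TCP, DEMO_TCP_AUTHOPT_KEY, k, sizeof(*k));
}
```

Per the code in this patch, the caller needs CAP_NET_ADMIN in the socket's network namespace, and an address-bound key must use the same address family as the socket.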
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 include/linux/tcp.h             |   9 +
 include/net/net_namespace.h     |   4 +
 include/net/netns/tcp_authopt.h |  12 ++
 include/net/tcp.h               |   1 +
 include/net/tcp_authopt.h       |  70 +++++++
 include/uapi/linux/tcp.h        |  81 ++++++++
 net/ipv4/Kconfig                |  14 ++
 net/ipv4/Makefile               |   1 +
 net/ipv4/tcp.c                  |  32 ++++
 net/ipv4/tcp_authopt.c          | 317 ++++++++++++++++++++++++++++++++
 net/ipv4/tcp_ipv4.c             |   2 +
 11 files changed, 543 insertions(+)
 create mode 100644 include/net/netns/tcp_authopt.h
 create mode 100644 include/net/tcp_authopt.h
 create mode 100644 net/ipv4/tcp_authopt.c
diff --git a/include/linux/tcp.h b/include/linux/tcp.h index a9fbe22732c3..551942883f06 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -170,10 +170,12 @@ struct tcp_request_sock { static inline struct tcp_request_sock *tcp_rsk(const struct request_sock *req) { return (struct tcp_request_sock *)req; }
+struct tcp_authopt_info; + struct tcp_sock { /* inet_connection_sock has to be the first member of tcp_sock */ struct inet_connection_sock inet_conn; u16 tcp_header_len; /* Bytes of tcp header to send */ u16 gso_segs; /* Max number of segs per GSO packet */ @@ -434,10 +436,14 @@ struct tcp_sock {
/* TCP MD5 Signature Option information */ struct tcp_md5sig_info __rcu *md5sig_info; #endif
+#ifdef CONFIG_TCP_AUTHOPT + struct tcp_authopt_info __rcu *authopt_info; +#endif + /* TCP fastopen related information */ struct tcp_fastopen_request *fastopen_req; /* fastopen_rsk points to request_sock that resulted in this big * socket. Used to retransmit SYNACKs etc. */ @@ -484,10 +490,13 @@ struct tcp_timewait_sock { int tw_ts_recent_stamp; u32 tw_tx_delay; #ifdef CONFIG_TCP_MD5SIG struct tcp_md5sig_key *tw_md5_key; #endif +#ifdef CONFIG_TCP_AUTHOPT + struct tcp_authopt_info *tw_authopt_info; +#endif };
static inline struct tcp_timewait_sock *tcp_twsk(const struct sock *sk) { return (struct tcp_timewait_sock *)sk; diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index 8c3587d5c308..30964366951d 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -35,10 +35,11 @@ #include <net/netns/can.h> #include <net/netns/xdp.h> #include <net/netns/smc.h> #include <net/netns/bpf.h> #include <net/netns/mctp.h> +#include <net/netns/tcp_authopt.h> #include <net/net_trackers.h> #include <linux/ns_common.h> #include <linux/idr.h> #include <linux/skbuff.h> #include <linux/notifier.h> @@ -184,10 +185,13 @@ struct net { #endif struct sock *diag_nlsk; #if IS_ENABLED(CONFIG_SMC) struct netns_smc smc; #endif +#if IS_ENABLED(CONFIG_TCP_AUTHOPT) + struct netns_tcp_authopt tcp_authopt; +#endif } __randomize_layout;
#include <linux/seq_file_net.h>
/* Init's network namespace */ diff --git a/include/net/netns/tcp_authopt.h b/include/net/netns/tcp_authopt.h new file mode 100644 index 000000000000..03b7f4e58448 --- /dev/null +++ b/include/net/netns/tcp_authopt.h @@ -0,0 +1,12 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __NETNS_TCP_AUTHOPT_H__ +#define __NETNS_TCP_AUTHOPT_H__ + +#include <linux/mutex.h> + +struct netns_tcp_authopt { + struct hlist_head head; + struct mutex mutex; +}; + +#endif /* __NETNS_TCP_AUTHOPT_H__ */ diff --git a/include/net/tcp.h b/include/net/tcp.h index d10962b9f0d0..9955a88faf9b 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -184,10 +184,11 @@ void tcp_time_wait(struct sock *sk, int state, int timeo); #define TCPOPT_WINDOW 3 /* Window scaling */ #define TCPOPT_SACK_PERM 4 /* SACK Permitted */ #define TCPOPT_SACK 5 /* SACK Block */ #define TCPOPT_TIMESTAMP 8 /* Better RTT estimations/PAWS */ #define TCPOPT_MD5SIG 19 /* MD5 Signature (RFC2385) */ +#define TCPOPT_AUTHOPT 29 /* Auth Option (RFC5925) */ #define TCPOPT_MPTCP 30 /* Multipath TCP (RFC6824) */ #define TCPOPT_FASTOPEN 34 /* Fast open (RFC7413) */ #define TCPOPT_EXP 254 /* Experimental */ /* Magic number to be after the option value for sharing TCP * experimental options. See draft-ietf-tcpm-experimental-options-00.txt diff --git a/include/net/tcp_authopt.h b/include/net/tcp_authopt.h new file mode 100644 index 000000000000..bc2cff82830d --- /dev/null +++ b/include/net/tcp_authopt.h @@ -0,0 +1,70 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef _LINUX_TCP_AUTHOPT_H +#define _LINUX_TCP_AUTHOPT_H + +#include <uapi/linux/tcp.h> +#include <net/netns/tcp_authopt.h> +#include <linux/tcp.h> + +/** + * struct tcp_authopt_key_info - Representation of a Master Key Tuple as per RFC5925 + * + * Key structure lifetime is protected by RCU so send/recv code needs to hold a + * single rcu_read_lock until they're done with the key. + * + * Global keys can be cached in sockets, this requires increasing kref. 
+ */ +struct tcp_authopt_key_info { + /** @node: node in &netns_tcp_authopt.head list */ + struct hlist_node node; + /** @rcu: for kfree_rcu */ + struct rcu_head rcu; + /** @ref: for kref_put */ + struct kref ref; + /** @flags: Combination of &enum tcp_authopt_key_flag */ + u32 flags; + /** @send_id: Same as &tcp_authopt_key.send_id */ + u8 send_id; + /** @recv_id: Same as &tcp_authopt_key.recv_id */ + u8 recv_id; + /** @alg_id: Same as &tcp_authopt_key.alg */ + u8 alg_id; + /** @keylen: Same as &tcp_authopt_key.keylen */ + u8 keylen; + /** @key: Same as &tcp_authopt_key.key */ + u8 key[TCP_AUTHOPT_MAXKEYLEN]; + /** @addr: Same as &tcp_authopt_key.addr */ + struct sockaddr_storage addr; +}; + +/** + * struct tcp_authopt_info - Per-socket information regarding tcp_authopt + * + * This is lazy-initialized in order to avoid increasing memory usage for + * regular TCP sockets. Once created it is only destroyed on socket close. + */ +struct tcp_authopt_info { + /** @rcu: for kfree_rcu */ + struct rcu_head rcu; + /** @flags: Combination of &enum tcp_authopt_flag */ + u32 flags; + /** @src_isn: Local Initial Sequence Number */ + u32 src_isn; + /** @dst_isn: Remote Initial Sequence Number */ + u32 dst_isn; +}; + +#ifdef CONFIG_TCP_AUTHOPT +DECLARE_STATIC_KEY_FALSE(tcp_authopt_needed_key); +#define tcp_authopt_needed (static_branch_unlikely(&tcp_authopt_needed_key)) +void tcp_authopt_clear(struct sock *sk); +int tcp_set_authopt(struct sock *sk, sockptr_t optval, unsigned int optlen); +int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *key); +int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen); +#else +static inline void tcp_authopt_clear(struct sock *sk) +{ +} +#endif + +#endif /* _LINUX_TCP_AUTHOPT_H */ diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h index 8fc09e8638b3..76d7be6b27f4 100644 --- a/include/uapi/linux/tcp.h +++ b/include/uapi/linux/tcp.h @@ -126,10 +126,12 @@ enum { #define TCP_INQ 36 /* Notify bytes 
available to read as a cmsg on read */
#define TCP_CM_INQ TCP_INQ
#define TCP_TX_DELAY 37 /* delay outgoing packets by XX usec */ +#define TCP_AUTHOPT 38 /* TCP Authentication Option (RFC5925) */ +#define TCP_AUTHOPT_KEY 39 /* TCP Authentication Option Key (RFC5925) */
#define TCP_REPAIR_ON 1 #define TCP_REPAIR_OFF 0 #define TCP_REPAIR_OFF_NO_WP -1 /* Turn off without window probes */ @@ -340,10 +342,89 @@ struct tcp_diag_md5sig { __u16 tcpm_keylen; __be32 tcpm_addr[4]; __u8 tcpm_key[TCP_MD5SIG_MAXKEYLEN]; };
+/** + * enum tcp_authopt_flag - flags for `tcp_authopt.flags` + */ +enum tcp_authopt_flag { + /** + * @TCP_AUTHOPT_FLAG_REJECT_UNEXPECTED: + * Configure behavior of segments with TCP-AO coming from hosts for which no + * key is configured. The default recommended by RFC is to silently accept + * such connections. + */ + TCP_AUTHOPT_FLAG_REJECT_UNEXPECTED = (1 << 2), +}; + +/** + * struct tcp_authopt - Per-socket options related to TCP Authentication Option + */ +struct tcp_authopt { + /** @flags: Combination of &enum tcp_authopt_flag */ + __u32 flags; +}; + +/** + * enum tcp_authopt_key_flag - flags for `tcp_authopt.flags` + * + * @TCP_AUTHOPT_KEY_DEL: Delete the key and ignore non-id fields + * @TCP_AUTHOPT_KEY_EXCLUDE_OPTS: Exclude TCP options from signature + * @TCP_AUTHOPT_KEY_ADDR_BIND: Key only valid for `tcp_authopt.addr` + */ +enum tcp_authopt_key_flag { + TCP_AUTHOPT_KEY_DEL = (1 << 0), + TCP_AUTHOPT_KEY_EXCLUDE_OPTS = (1 << 1), + TCP_AUTHOPT_KEY_ADDR_BIND = (1 << 2), +}; + +/** + * enum tcp_authopt_alg - Algorithms for TCP Authentication Option + */ +enum tcp_authopt_alg { + /** @TCP_AUTHOPT_ALG_HMAC_SHA_1_96: HMAC-SHA-1-96 as described in RFC5926 */ + TCP_AUTHOPT_ALG_HMAC_SHA_1_96 = 1, + /** @TCP_AUTHOPT_ALG_AES_128_CMAC_96: AES-128-CMAC-96 as described in RFC5926 */ + TCP_AUTHOPT_ALG_AES_128_CMAC_96 = 2, +}; + +/* for TCP_AUTHOPT_KEY socket option */ +#define TCP_AUTHOPT_MAXKEYLEN 80 + +/** + * struct tcp_authopt_key - TCP Authentication KEY + * + * Key are identified by the combination of: + * - send_id + * - recv_id + * - addr (iff TCP_AUTHOPT_KEY_ADDR_BIND) + * + * RFC5925 requires that key ids must not overlap for the same TCP connection. + * This is not enforced by linux. 
+ */ +struct tcp_authopt_key { + /** @flags: Combination of &enum tcp_authopt_key_flag */ + __u32 flags; + /** @send_id: keyid value for send */ + __u8 send_id; + /** @recv_id: keyid value for receive */ + __u8 recv_id; + /** @alg: One of &enum tcp_authopt_alg */ + __u8 alg; + /** @keylen: Length of the key buffer */ + __u8 keylen; + /** @key: Secret key */ + __u8 key[TCP_AUTHOPT_MAXKEYLEN]; + /** + * @addr: Key is only valid for this address + * + * Ignored unless TCP_AUTHOPT_KEY_ADDR_BIND flag is set + */ + struct __kernel_sockaddr_storage addr; +}; + /* setsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) */
#define TCP_RECEIVE_ZEROCOPY_FLAG_TLB_CLEAN_HINT 0x1 struct tcp_zerocopy_receive { __u64 address; /* in: address of mapping */ diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig index e983bb0c5012..75f7e3c75ea6 100644 --- a/net/ipv4/Kconfig +++ b/net/ipv4/Kconfig @@ -739,5 +739,19 @@ config TCP_MD5SIG RFC2385 specifies a method of giving MD5 protection to TCP sessions. Its main (only?) use is to protect BGP sessions between core routers on the Internet.
If unsure, say N. + +config TCP_AUTHOPT + bool "TCP: Authentication Option support (RFC5925)" + select CRYPTO + select CRYPTO_SHA1 + select CRYPTO_HMAC + select CRYPTO_AES + select CRYPTO_CMAC + help + RFC5925 specifies a new method of giving protection to TCP sessions. + Its intended use is to protect BGP sessions between core routers + on the Internet. It obsoletes TCP MD5 (RFC2385) but is incompatible. + + If unsure, say N. diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile index bbdd9c44f14e..d336f32ce177 100644 --- a/net/ipv4/Makefile +++ b/net/ipv4/Makefile @@ -59,10 +59,11 @@ obj-$(CONFIG_TCP_CONG_NV) += tcp_nv.o obj-$(CONFIG_TCP_CONG_VENO) += tcp_veno.o obj-$(CONFIG_TCP_CONG_SCALABLE) += tcp_scalable.o obj-$(CONFIG_TCP_CONG_LP) += tcp_lp.o obj-$(CONFIG_TCP_CONG_YEAH) += tcp_yeah.o obj-$(CONFIG_TCP_CONG_ILLINOIS) += tcp_illinois.o +obj-$(CONFIG_TCP_AUTHOPT) += tcp_authopt.o obj-$(CONFIG_NET_SOCK_MSG) += tcp_bpf.o obj-$(CONFIG_BPF_SYSCALL) += udp_bpf.o obj-$(CONFIG_NETLABEL) += cipso_ipv4.o
obj-$(CONFIG_XFRM) += xfrm4_policy.o xfrm4_state.o xfrm4_input.o \ diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 306b94dedc8d..6a0357cf05b5 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -270,10 +270,11 @@
#include <net/icmp.h> #include <net/inet_common.h> #include <net/tcp.h> #include <net/mptcp.h> +#include <net/tcp_authopt.h> #include <net/xfrm.h> #include <net/ip.h> #include <net/sock.h>
#include <linux/uaccess.h> @@ -3703,10 +3704,18 @@ static int do_tcp_setsockopt(struct sock *sk, int level, int optname, #ifdef CONFIG_TCP_MD5SIG case TCP_MD5SIG: case TCP_MD5SIG_EXT: err = tp->af_specific->md5_parse(sk, optname, optval, optlen); break; +#endif +#ifdef CONFIG_TCP_AUTHOPT + case TCP_AUTHOPT: + err = tcp_set_authopt(sk, optval, optlen); + break; + case TCP_AUTHOPT_KEY: + err = tcp_set_authopt_key(sk, optval, optlen); + break; #endif case TCP_USER_TIMEOUT: /* Cap the max time in ms TCP will retry or probe the window * before giving up and aborting (ETIMEDOUT) a connection. */ @@ -4354,10 +4363,33 @@ static int do_tcp_getsockopt(struct sock *sk, int level, if (!err && copy_to_user(optval, &zc, len)) err = -EFAULT; return err; } #endif +#ifdef CONFIG_TCP_AUTHOPT + case TCP_AUTHOPT: { + struct tcp_authopt info; + int err; + + if (get_user(len, optlen)) + return -EFAULT; + + lock_sock(sk); + err = tcp_get_authopt_val(sk, &info); + release_sock(sk); + + if (err) + return err; + len = min_t(unsigned int, len, sizeof(info)); + if (put_user(len, optlen)) + return -EFAULT; + if (copy_to_user(optval, &info, len)) + return -EFAULT; + return 0; + } +#endif + default: return -ENOPROTOOPT; }
if (put_user(len, optlen)) diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c new file mode 100644 index 000000000000..d38e9c89c89d --- /dev/null +++ b/net/ipv4/tcp_authopt.c @@ -0,0 +1,317 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +#include <net/tcp_authopt.h> +#include <net/ipv6.h> +#include <net/tcp.h> +#include <linux/kref.h> + +/* This is enabled when first struct tcp_authopt_info is allocated and never released */ +DEFINE_STATIC_KEY_FALSE(tcp_authopt_needed_key); +EXPORT_SYMBOL(tcp_authopt_needed_key); + +static inline struct netns_tcp_authopt *sock_net_tcp_authopt(const struct sock *sk) +{ + return &sock_net(sk)->tcp_authopt; +} + +static void tcp_authopt_key_release_kref(struct kref *ref) +{ + struct tcp_authopt_key_info *key = container_of(ref, struct tcp_authopt_key_info, ref); + + kfree_rcu(key, rcu); +} + +static void tcp_authopt_key_put(struct tcp_authopt_key_info *key) +{ + if (key) + kref_put(&key->ref, tcp_authopt_key_release_kref); +} + +static void tcp_authopt_key_del(struct netns_tcp_authopt *net, + struct tcp_authopt_key_info *key) +{ + lockdep_assert_held(&net->mutex); + hlist_del_rcu(&key->node); + key->flags |= TCP_AUTHOPT_KEY_DEL; + kref_put(&key->ref, tcp_authopt_key_release_kref); +} + +/* Free info and keys. + * Don't touch tp->authopt_info, it might not even be assigned yes. + */ +void tcp_authopt_free(struct sock *sk, struct tcp_authopt_info *info) +{ + kfree_rcu(info, rcu); +} + +/* Free everything and clear tcp_sock.authopt_info to NULL */ +void tcp_authopt_clear(struct sock *sk) +{ + struct tcp_authopt_info *info; + + info = rcu_dereference_protected(tcp_sk(sk)->authopt_info, lockdep_sock_is_held(sk)); + if (info) { + tcp_authopt_free(sk, info); + tcp_sk(sk)->authopt_info = NULL; + } +} + +/* checks that ipv4 or ipv6 addr matches. 
*/ +static bool ipvx_addr_match(struct sockaddr_storage *a1, + struct sockaddr_storage *a2) +{ + if (a1->ss_family != a2->ss_family) + return false; + if (a1->ss_family == AF_INET && + (((struct sockaddr_in *)a1)->sin_addr.s_addr != + ((struct sockaddr_in *)a2)->sin_addr.s_addr)) + return false; + if (a1->ss_family == AF_INET6 && + !ipv6_addr_equal(&((struct sockaddr_in6 *)a1)->sin6_addr, + &((struct sockaddr_in6 *)a2)->sin6_addr)) + return false; + return true; +} + +static bool tcp_authopt_key_match_exact(struct tcp_authopt_key_info *info, + struct tcp_authopt_key *key) +{ + if (info->send_id != key->send_id) + return false; + if (info->recv_id != key->recv_id) + return false; + if ((info->flags & TCP_AUTHOPT_KEY_ADDR_BIND) != (key->flags & TCP_AUTHOPT_KEY_ADDR_BIND)) + return false; + if (info->flags & TCP_AUTHOPT_KEY_ADDR_BIND) + if (!ipvx_addr_match(&info->addr, &key->addr)) + return false; + + return true; +} + +static struct tcp_authopt_key_info *tcp_authopt_key_lookup_exact(const struct sock *sk, + struct netns_tcp_authopt *net, + struct tcp_authopt_key *ukey) +{ + struct tcp_authopt_key_info *key_info; + + hlist_for_each_entry_rcu(key_info, &net->head, node, lockdep_is_held(&net->mutex)) + if (tcp_authopt_key_match_exact(key_info, ukey)) + return key_info; + + return NULL; +} + +static struct tcp_authopt_info *__tcp_authopt_info_get_or_create(struct sock *sk) +{ + struct tcp_sock *tp = tcp_sk(sk); + struct tcp_authopt_info *info; + + info = rcu_dereference_check(tp->authopt_info, lockdep_sock_is_held(sk)); + if (info) + return info; + + info = kzalloc(sizeof(*info), GFP_KERNEL); + if (!info) + return ERR_PTR(-ENOMEM); + + /* Never released: */ + static_branch_inc(&tcp_authopt_needed_key); + sk_gso_disable(sk); + rcu_assign_pointer(tp->authopt_info, info); + + return info; +} + +#define TCP_AUTHOPT_KNOWN_FLAGS ( \ + TCP_AUTHOPT_FLAG_REJECT_UNEXPECTED) + +/* Like copy_from_sockptr except tolerate different optlen for compatibility reasons + * + * If the src 
is shorter then it's from an old userspace and the rest of dst is + * filled with zeros. + * + * If the dst is shorter then src is from a newer userspace and we only accept + * if the rest of the option is all zeros. + * + * This allows sockopts to grow as long as for new fields zeros has no effect. + */ +static int _copy_from_sockptr_tolerant(u8 *dst, + unsigned int dstlen, + sockptr_t src, + unsigned int srclen) +{ + int err; + + /* If userspace optlen is too short fill the rest with zeros */ + if (srclen > dstlen) { + if (sockptr_is_kernel(src)) + return -EINVAL; + err = check_zeroed_user(src.user + dstlen, srclen - dstlen); + if (err < 0) + return err; + if (err == 0) + return -EINVAL; + } + err = copy_from_sockptr(dst, src, min(srclen, dstlen)); + if (err) + return err; + if (srclen < dstlen) + memset(dst + srclen, 0, dstlen - srclen); + + return err; +} + +int tcp_set_authopt(struct sock *sk, sockptr_t optval, unsigned int optlen) +{ + struct tcp_authopt opt; + struct tcp_authopt_info *info; + int err; + + sock_owned_by_me(sk); + + err = _copy_from_sockptr_tolerant((u8 *)&opt, sizeof(opt), optval, optlen); + if (err) + return err; + + if (opt.flags & ~TCP_AUTHOPT_KNOWN_FLAGS) + return -EINVAL; + + info = __tcp_authopt_info_get_or_create(sk); + if (IS_ERR(info)) + return PTR_ERR(info); + + info->flags = opt.flags & TCP_AUTHOPT_KNOWN_FLAGS; + + return 0; +} + +int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *opt) +{ + struct tcp_sock *tp = tcp_sk(sk); + struct tcp_authopt_info *info; + + memset(opt, 0, sizeof(*opt)); + sock_owned_by_me(sk); + + info = rcu_dereference_check(tp->authopt_info, lockdep_sock_is_held(sk)); + if (!info) + return -ENOENT; + + opt->flags = info->flags & TCP_AUTHOPT_KNOWN_FLAGS; + + return 0; +} + +#define TCP_AUTHOPT_KEY_KNOWN_FLAGS ( \ + TCP_AUTHOPT_KEY_DEL | \ + TCP_AUTHOPT_KEY_EXCLUDE_OPTS | \ + TCP_AUTHOPT_KEY_ADDR_BIND) + +int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen) +{ + struct 
tcp_authopt_key opt; + struct tcp_authopt_info *info; + struct tcp_authopt_key_info *key_info, *old_key_info; + struct netns_tcp_authopt *net = sock_net_tcp_authopt(sk); + int err; + + sock_owned_by_me(sk); + if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) + return -EPERM; + + err = _copy_from_sockptr_tolerant((u8 *)&opt, sizeof(opt), optval, optlen); + if (err) + return err; + + if (opt.flags & ~TCP_AUTHOPT_KEY_KNOWN_FLAGS) + return -EINVAL; + + if (opt.keylen > TCP_AUTHOPT_MAXKEYLEN) + return -EINVAL; + + /* Delete is a special case: */ + if (opt.flags & TCP_AUTHOPT_KEY_DEL) { + mutex_lock(&net->mutex); + key_info = tcp_authopt_key_lookup_exact(sk, net, &opt); + if (key_info) { + tcp_authopt_key_del(net, key_info); + err = 0; + } else { + err = -ENOENT; + } + mutex_unlock(&net->mutex); + return err; + } + + /* check key family */ + if (opt.flags & TCP_AUTHOPT_KEY_ADDR_BIND) { + if (sk->sk_family != opt.addr.ss_family) + return -EINVAL; + } + + /* Initialize tcp_authopt_info if not already set */ + info = __tcp_authopt_info_get_or_create(sk); + if (IS_ERR(info)) + return PTR_ERR(info); + + key_info = kmalloc(sizeof(*key_info), GFP_KERNEL | __GFP_ZERO); + if (!key_info) + return -ENOMEM; + mutex_lock(&net->mutex); + kref_init(&key_info->ref); + /* If an old key exists with exact ID then remove and replace. + * RCU-protected readers might observe both and pick any. 
+ */ + old_key_info = tcp_authopt_key_lookup_exact(sk, net, &opt); + if (old_key_info) + tcp_authopt_key_del(net, old_key_info); + key_info->flags = opt.flags & TCP_AUTHOPT_KEY_KNOWN_FLAGS; + key_info->send_id = opt.send_id; + key_info->recv_id = opt.recv_id; + key_info->alg_id = opt.alg; + key_info->keylen = opt.keylen; + memcpy(key_info->key, opt.key, opt.keylen); + memcpy(&key_info->addr, &opt.addr, sizeof(key_info->addr)); + hlist_add_head_rcu(&key_info->node, &net->head); + mutex_unlock(&net->mutex); + + return 0; +} + +static int tcp_authopt_init_net(struct net *full_net) +{ + struct netns_tcp_authopt *net = &full_net->tcp_authopt; + + mutex_init(&net->mutex); + INIT_HLIST_HEAD(&net->head); + + return 0; +} + +static void tcp_authopt_exit_net(struct net *full_net) +{ + struct netns_tcp_authopt *net = &full_net->tcp_authopt; + struct tcp_authopt_key_info *key; + struct hlist_node *n; + + mutex_lock(&net->mutex); + + hlist_for_each_entry_safe(key, n, &net->head, node) { + hlist_del_rcu(&key->node); + tcp_authopt_key_put(key); + } + + mutex_unlock(&net->mutex); +} + +static struct pernet_operations net_ops = { + .init = tcp_authopt_init_net, + .exit = tcp_authopt_exit_net, +}; + +static int __init tcp_authopt_init(void) +{ + return register_pernet_subsys(&net_ops); +} +late_initcall(tcp_authopt_init); diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 01b31f5c7aba..f6d1dba31ca4 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -60,10 +60,11 @@
#include <net/net_namespace.h> #include <net/icmp.h> #include <net/inet_hashtables.h> #include <net/tcp.h> +#include <net/tcp_authopt.h> #include <net/transp_v6.h> #include <net/ipv6.h> #include <net/inet_common.h> #include <net/timewait_sock.h> #include <net/xfrm.h> @@ -2267,10 +2268,11 @@ void tcp_v4_destroy_sock(struct sock *sk) tcp_clear_md5_list(sk); kfree_rcu(rcu_dereference_protected(tp->md5sig_info, 1), rcu); tp->md5sig_info = NULL; } #endif + tcp_authopt_clear(sk);
/* Clean up a referenced TCP bind bucket. */ if (inet_csk(sk)->icsk_bind_hash) inet_put_port(sk);
On Mon, Sep 5, 2022 at 12:06 AM Leonard Crestez cdleonard@gmail.com wrote:
This commit adds support to add and remove keys but does not use them further.
Similar to TCP-MD5, a single pointer to a struct tcp_authopt_info is added to struct tcp_sock; this avoids increasing memory usage for sockets that do not use the feature. The data structures related to tcp_authopt are initialized on setsockopt and only freed on socket close.
Thanks Leonard.
Small points from my side, please find them attached.
Signed-off-by: Leonard Crestez cdleonard@gmail.com
include/linux/tcp.h | 9 + include/net/net_namespace.h | 4 + include/net/netns/tcp_authopt.h | 12 ++ include/net/tcp.h | 1 + include/net/tcp_authopt.h | 70 +++++++ include/uapi/linux/tcp.h | 81 ++++++++ net/ipv4/Kconfig | 14 ++ net/ipv4/Makefile | 1 + net/ipv4/tcp.c | 32 ++++ net/ipv4/tcp_authopt.c | 317 ++++++++++++++++++++++++++++++++ net/ipv4/tcp_ipv4.c | 2 + 11 files changed, 543 insertions(+) create mode 100644 include/net/netns/tcp_authopt.h create mode 100644 include/net/tcp_authopt.h create mode 100644 net/ipv4/tcp_authopt.c
diff --git a/include/linux/tcp.h b/include/linux/tcp.h index a9fbe22732c3..551942883f06 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -170,10 +170,12 @@ struct tcp_request_sock { static inline struct tcp_request_sock *tcp_rsk(const struct request_sock *req) { return (struct tcp_request_sock *)req; }
+struct tcp_authopt_info;
struct tcp_sock { /* inet_connection_sock has to be the first member of tcp_sock */ struct inet_connection_sock inet_conn; u16 tcp_header_len; /* Bytes of tcp header to send */ u16 gso_segs; /* Max number of segs per GSO packet */ @@ -434,10 +436,14 @@ struct tcp_sock {
/* TCP MD5 Signature Option information */ struct tcp_md5sig_info __rcu *md5sig_info; #endif
+#ifdef CONFIG_TCP_AUTHOPT
struct tcp_authopt_info __rcu *authopt_info;
+#endif
/* TCP fastopen related information */ struct tcp_fastopen_request *fastopen_req; /* fastopen_rsk points to request_sock that resulted in this big * socket. Used to retransmit SYNACKs etc. */ @@ -484,10 +490,13 @@ struct tcp_timewait_sock { int tw_ts_recent_stamp; u32 tw_tx_delay; #ifdef CONFIG_TCP_MD5SIG struct tcp_md5sig_key *tw_md5_key; #endif +#ifdef CONFIG_TCP_AUTHOPT
struct tcp_authopt_info *tw_authopt_info;
+#endif };
static inline struct tcp_timewait_sock *tcp_twsk(const struct sock *sk) { return (struct tcp_timewait_sock *)sk; diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index 8c3587d5c308..30964366951d 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -35,10 +35,11 @@ #include <net/netns/can.h> #include <net/netns/xdp.h> #include <net/netns/smc.h> #include <net/netns/bpf.h> #include <net/netns/mctp.h> +#include <net/netns/tcp_authopt.h> #include <net/net_trackers.h> #include <linux/ns_common.h> #include <linux/idr.h> #include <linux/skbuff.h> #include <linux/notifier.h> @@ -184,10 +185,13 @@ struct net { #endif struct sock *diag_nlsk; #if IS_ENABLED(CONFIG_SMC) struct netns_smc smc; #endif +#if IS_ENABLED(CONFIG_TCP_AUTHOPT)
struct netns_tcp_authopt tcp_authopt;
+#endif } __randomize_layout;
#include <linux/seq_file_net.h>
/* Init's network namespace */ diff --git a/include/net/netns/tcp_authopt.h b/include/net/netns/tcp_authopt.h new file mode 100644 index 000000000000..03b7f4e58448 --- /dev/null +++ b/include/net/netns/tcp_authopt.h @@ -0,0 +1,12 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __NETNS_TCP_AUTHOPT_H__ +#define __NETNS_TCP_AUTHOPT_H__
+#include <linux/mutex.h>
+struct netns_tcp_authopt {
struct hlist_head head;
struct mutex mutex;
+};
+#endif /* __NETNS_TCP_AUTHOPT_H__ */ diff --git a/include/net/tcp.h b/include/net/tcp.h index d10962b9f0d0..9955a88faf9b 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -184,10 +184,11 @@ void tcp_time_wait(struct sock *sk, int state, int timeo); #define TCPOPT_WINDOW 3 /* Window scaling */ #define TCPOPT_SACK_PERM 4 /* SACK Permitted */ #define TCPOPT_SACK 5 /* SACK Block */ #define TCPOPT_TIMESTAMP 8 /* Better RTT estimations/PAWS */ #define TCPOPT_MD5SIG 19 /* MD5 Signature (RFC2385) */ +#define TCPOPT_AUTHOPT 29 /* Auth Option (RFC5925) */ #define TCPOPT_MPTCP 30 /* Multipath TCP (RFC6824) */ #define TCPOPT_FASTOPEN 34 /* Fast open (RFC7413) */ #define TCPOPT_EXP 254 /* Experimental */ /* Magic number to be after the option value for sharing TCP
- experimental options. See draft-ietf-tcpm-experimental-options-00.txt
diff --git a/include/net/tcp_authopt.h b/include/net/tcp_authopt.h new file mode 100644 index 000000000000..bc2cff82830d --- /dev/null +++ b/include/net/tcp_authopt.h @@ -0,0 +1,70 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef _LINUX_TCP_AUTHOPT_H +#define _LINUX_TCP_AUTHOPT_H
+#include <uapi/linux/tcp.h> +#include <net/netns/tcp_authopt.h> +#include <linux/tcp.h>
+/**
- struct tcp_authopt_key_info - Representation of a Master Key Tuple as per RFC5925
- Key structure lifetime is protected by RCU so send/recv code needs to hold a
- single rcu_read_lock until they're done with the key.
- Global keys can be cached in sockets, this requires increasing kref.
- */
+struct tcp_authopt_key_info {
/** @node: node in &netns_tcp_authopt.head list */
struct hlist_node node;
/** @rcu: for kfree_rcu */
struct rcu_head rcu;
/** @ref: for kref_put */
struct kref ref;
/** @flags: Combination of &enum tcp_authopt_key_flag */
u32 flags;
/** @send_id: Same as &tcp_authopt_key.send_id */
u8 send_id;
/** @recv_id: Same as &tcp_authopt_key.recv_id */
u8 recv_id;
/** @alg_id: Same as &tcp_authopt_key.alg */
u8 alg_id;
/** @keylen: Same as &tcp_authopt_key.keylen */
u8 keylen;
/** @key: Same as &tcp_authopt_key.key */
u8 key[TCP_AUTHOPT_MAXKEYLEN];
/** @addr: Same as &tcp_authopt_key.addr */
struct sockaddr_storage addr;
+};
+/**
- struct tcp_authopt_info - Per-socket information regarding tcp_authopt
- This is lazy-initialized in order to avoid increasing memory usage for
- regular TCP sockets. Once created it is only destroyed on socket close.
- */
+struct tcp_authopt_info {
/** @rcu: for kfree_rcu */
struct rcu_head rcu;
/** @flags: Combination of &enum tcp_authopt_flag */
u32 flags;
/** @src_isn: Local Initial Sequence Number */
u32 src_isn;
/** @dst_isn: Remote Initial Sequence Number */
u32 dst_isn;
+};
+#ifdef CONFIG_TCP_AUTHOPT +DECLARE_STATIC_KEY_FALSE(tcp_authopt_needed_key); +#define tcp_authopt_needed (static_branch_unlikely(&tcp_authopt_needed_key)) +void tcp_authopt_clear(struct sock *sk); +int tcp_set_authopt(struct sock *sk, sockptr_t optval, unsigned int optlen); +int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *key); +int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen); +#else +static inline void tcp_authopt_clear(struct sock *sk) +{ +} +#endif
+#endif /* _LINUX_TCP_AUTHOPT_H */ diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h index 8fc09e8638b3..76d7be6b27f4 100644 --- a/include/uapi/linux/tcp.h +++ b/include/uapi/linux/tcp.h @@ -126,10 +126,12 @@ enum { #define TCP_INQ 36 /* Notify bytes available to read as a cmsg on read */
#define TCP_CM_INQ TCP_INQ
#define TCP_TX_DELAY 37 /* delay outgoing packets by XX usec */ +#define TCP_AUTHOPT 38 /* TCP Authentication Option (RFC5925) */ +#define TCP_AUTHOPT_KEY 39 /* TCP Authentication Option Key (RFC5925) */
#define TCP_REPAIR_ON 1 #define TCP_REPAIR_OFF 0 #define TCP_REPAIR_OFF_NO_WP -1 /* Turn off without window probes */ @@ -340,10 +342,89 @@ struct tcp_diag_md5sig { __u16 tcpm_keylen; __be32 tcpm_addr[4]; __u8 tcpm_key[TCP_MD5SIG_MAXKEYLEN]; };
+/**
- enum tcp_authopt_flag - flags for `tcp_authopt.flags`
- */
+enum tcp_authopt_flag {
/**
* @TCP_AUTHOPT_FLAG_REJECT_UNEXPECTED:
* Configure behavior of segments with TCP-AO coming from hosts for which no
* key is configured. The default recommended by RFC is to silently accept
* such connections.
*/
TCP_AUTHOPT_FLAG_REJECT_UNEXPECTED = (1 << 2),
+};
+/**
- struct tcp_authopt - Per-socket options related to TCP Authentication Option
- */
+struct tcp_authopt {
/** @flags: Combination of &enum tcp_authopt_flag */
__u32 flags;
+};
+/**
- enum tcp_authopt_key_flag - flags for `tcp_authopt_key.flags`
- @TCP_AUTHOPT_KEY_DEL: Delete the key and ignore non-id fields
- @TCP_AUTHOPT_KEY_EXCLUDE_OPTS: Exclude TCP options from signature
- @TCP_AUTHOPT_KEY_ADDR_BIND: Key only valid for `tcp_authopt_key.addr`
- */
+enum tcp_authopt_key_flag {
TCP_AUTHOPT_KEY_DEL = (1 << 0),
TCP_AUTHOPT_KEY_EXCLUDE_OPTS = (1 << 1),
TCP_AUTHOPT_KEY_ADDR_BIND = (1 << 2),
+};
+/**
- enum tcp_authopt_alg - Algorithms for TCP Authentication Option
- */
+enum tcp_authopt_alg {
/** @TCP_AUTHOPT_ALG_HMAC_SHA_1_96: HMAC-SHA-1-96 as described in RFC5926 */
TCP_AUTHOPT_ALG_HMAC_SHA_1_96 = 1,
/** @TCP_AUTHOPT_ALG_AES_128_CMAC_96: AES-128-CMAC-96 as described in RFC5926 */
TCP_AUTHOPT_ALG_AES_128_CMAC_96 = 2,
+};
+/* for TCP_AUTHOPT_KEY socket option */ +#define TCP_AUTHOPT_MAXKEYLEN 80
+/**
- struct tcp_authopt_key - TCP Authentication KEY
- Keys are identified by the combination of:
- send_id
- recv_id
- addr (iff TCP_AUTHOPT_KEY_ADDR_BIND)
- RFC5925 requires that key ids must not overlap for the same TCP connection.
- This is not enforced by linux.
- */
+struct tcp_authopt_key {
/** @flags: Combination of &enum tcp_authopt_key_flag */
__u32 flags;
/** @send_id: keyid value for send */
__u8 send_id;
/** @recv_id: keyid value for receive */
__u8 recv_id;
/** @alg: One of &enum tcp_authopt_alg */
__u8 alg;
/** @keylen: Length of the key buffer */
__u8 keylen;
/** @key: Secret key */
__u8 key[TCP_AUTHOPT_MAXKEYLEN];
/**
* @addr: Key is only valid for this address
*
* Ignored unless TCP_AUTHOPT_KEY_ADDR_BIND flag is set
*/
struct __kernel_sockaddr_storage addr;
+};
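To make the proposed ABI concrete, here is a minimal userspace sketch of filling a `struct tcp_authopt_key` and passing it to `setsockopt`. The struct, flag values and the option number 39 are copied from the diff above and are assumptions about this patch series only; they may not match any released kernel header. The helper names `authopt_key_fill_v4` and `authopt_key_add` are hypothetical.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Local copies of the UAPI proposed in this patch (assumption: not taken
 * from an installed kernel header). */
#define TCP_AUTHOPT_KEY		39
#define TCP_AUTHOPT_MAXKEYLEN	80

enum {
	TCP_AUTHOPT_KEY_DEL		= 1 << 0,
	TCP_AUTHOPT_KEY_EXCLUDE_OPTS	= 1 << 1,
	TCP_AUTHOPT_KEY_ADDR_BIND	= 1 << 2,
};

enum {
	TCP_AUTHOPT_ALG_HMAC_SHA_1_96	= 1,
	TCP_AUTHOPT_ALG_AES_128_CMAC_96	= 2,
};

struct tcp_authopt_key {
	uint32_t flags;
	uint8_t send_id;
	uint8_t recv_id;
	uint8_t alg;
	uint8_t keylen;
	uint8_t key[TCP_AUTHOPT_MAXKEYLEN];
	struct sockaddr_storage addr;
};

/* Hypothetical helper: describe an HMAC-SHA-1-96 key bound to one IPv4 peer.
 * peer_be is the peer address in network byte order.
 * Returns 0 on success, -1 if the secret does not fit. */
int authopt_key_fill_v4(struct tcp_authopt_key *k, uint8_t send_id,
			uint8_t recv_id, const void *secret,
			size_t secret_len, uint32_t peer_be)
{
	struct sockaddr_in *sin;

	assert(k != NULL);
	if (secret_len > TCP_AUTHOPT_MAXKEYLEN)
		return -1;

	/* Zero everything so unused fields stay at their no-effect value. */
	memset(k, 0, sizeof(*k));
	k->flags = TCP_AUTHOPT_KEY_ADDR_BIND;
	k->send_id = send_id;
	k->recv_id = recv_id;
	k->alg = TCP_AUTHOPT_ALG_HMAC_SHA_1_96;
	k->keylen = (uint8_t)secret_len;
	memcpy(k->key, secret, secret_len);

	sin = (struct sockaddr_in *)&k->addr;
	sin->sin_family = AF_INET;
	sin->sin_addr.s_addr = peer_be;
	return 0;
}

/* Hypothetical helper: install the key through the socket option.
 * Returns the setsockopt() result. */
int authopt_key_add(int fd, const struct tcp_authopt_key *k)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_AUTHOPT_KEY, k, sizeof(*k));
}
```

With this patch the caller needs CAP_NET_ADMIN in the socket's network namespace, and for an address-bound key the address family must match the socket family. Deleting a key would reuse the same struct with TCP_AUTHOPT_KEY_DEL set; only send_id, recv_id and (if bound) addr are compared.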
/* setsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) */
#define TCP_RECEIVE_ZEROCOPY_FLAG_TLB_CLEAN_HINT 0x1 struct tcp_zerocopy_receive { __u64 address; /* in: address of mapping */ diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig index e983bb0c5012..75f7e3c75ea6 100644 --- a/net/ipv4/Kconfig +++ b/net/ipv4/Kconfig @@ -739,5 +739,19 @@ config TCP_MD5SIG RFC2385 specifies a method of giving MD5 protection to TCP sessions. Its main (only?) use is to protect BGP sessions between core routers on the Internet.
If unsure, say N.
+config TCP_AUTHOPT
bool "TCP: Authentication Option support (RFC5925)"
select CRYPTO
select CRYPTO_SHA1
select CRYPTO_HMAC
select CRYPTO_AES
select CRYPTO_CMAC
help
RFC5925 specifies a new method of giving protection to TCP sessions.
Its intended use is to protect BGP sessions between core routers
on the Internet. It obsoletes TCP MD5 (RFC2385) but is incompatible.
If unsure, say N.
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile index bbdd9c44f14e..d336f32ce177 100644 --- a/net/ipv4/Makefile +++ b/net/ipv4/Makefile @@ -59,10 +59,11 @@ obj-$(CONFIG_TCP_CONG_NV) += tcp_nv.o obj-$(CONFIG_TCP_CONG_VENO) += tcp_veno.o obj-$(CONFIG_TCP_CONG_SCALABLE) += tcp_scalable.o obj-$(CONFIG_TCP_CONG_LP) += tcp_lp.o obj-$(CONFIG_TCP_CONG_YEAH) += tcp_yeah.o obj-$(CONFIG_TCP_CONG_ILLINOIS) += tcp_illinois.o +obj-$(CONFIG_TCP_AUTHOPT) += tcp_authopt.o obj-$(CONFIG_NET_SOCK_MSG) += tcp_bpf.o obj-$(CONFIG_BPF_SYSCALL) += udp_bpf.o obj-$(CONFIG_NETLABEL) += cipso_ipv4.o
obj-$(CONFIG_XFRM) += xfrm4_policy.o xfrm4_state.o xfrm4_input.o \ diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 306b94dedc8d..6a0357cf05b5 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -270,10 +270,11 @@
#include <net/icmp.h> #include <net/inet_common.h> #include <net/tcp.h> #include <net/mptcp.h> +#include <net/tcp_authopt.h> #include <net/xfrm.h> #include <net/ip.h> #include <net/sock.h>
#include <linux/uaccess.h> @@ -3703,10 +3704,18 @@ static int do_tcp_setsockopt(struct sock *sk, int level, int optname, #ifdef CONFIG_TCP_MD5SIG case TCP_MD5SIG: case TCP_MD5SIG_EXT: err = tp->af_specific->md5_parse(sk, optname, optval, optlen); break; +#endif +#ifdef CONFIG_TCP_AUTHOPT
case TCP_AUTHOPT:
err = tcp_set_authopt(sk, optval, optlen);
break;
case TCP_AUTHOPT_KEY:
err = tcp_set_authopt_key(sk, optval, optlen);
break;
#endif case TCP_USER_TIMEOUT: /* Cap the max time in ms TCP will retry or probe the window * before giving up and aborting (ETIMEDOUT) a connection. */ @@ -4354,10 +4363,33 @@ static int do_tcp_getsockopt(struct sock *sk, int level, if (!err && copy_to_user(optval, &zc, len)) err = -EFAULT; return err; } #endif +#ifdef CONFIG_TCP_AUTHOPT
case TCP_AUTHOPT: {
struct tcp_authopt info;
int err;
if (get_user(len, optlen))
return -EFAULT;
lock_sock(sk);
err = tcp_get_authopt_val(sk, &info);
release_sock(sk);
if (err)
return err;
len = min_t(unsigned int, len, sizeof(info));
if (put_user(len, optlen))
return -EFAULT;
if (copy_to_user(optval, &info, len))
return -EFAULT;
return 0;
}
+#endif
default: return -ENOPROTOOPT; } if (put_user(len, optlen))
diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c new file mode 100644 index 000000000000..d38e9c89c89d --- /dev/null +++ b/net/ipv4/tcp_authopt.c @@ -0,0 +1,317 @@ +// SPDX-License-Identifier: GPL-2.0-or-later
+#include <net/tcp_authopt.h> +#include <net/ipv6.h> +#include <net/tcp.h> +#include <linux/kref.h>
+/* This is enabled when first struct tcp_authopt_info is allocated and never released */ +DEFINE_STATIC_KEY_FALSE(tcp_authopt_needed_key); +EXPORT_SYMBOL(tcp_authopt_needed_key);
+static inline struct netns_tcp_authopt *sock_net_tcp_authopt(const struct sock *sk) +{
return &sock_net(sk)->tcp_authopt;
+}
+static void tcp_authopt_key_release_kref(struct kref *ref) +{
struct tcp_authopt_key_info *key = container_of(ref, struct tcp_authopt_key_info, ref);
kfree_rcu(key, rcu);
+}
+static void tcp_authopt_key_put(struct tcp_authopt_key_info *key) +{
if (key)
kref_put(&key->ref, tcp_authopt_key_release_kref);
+}
+static void tcp_authopt_key_del(struct netns_tcp_authopt *net,
struct tcp_authopt_key_info *key)
+{
lockdep_assert_held(&net->mutex);
hlist_del_rcu(&key->node);
key->flags |= TCP_AUTHOPT_KEY_DEL;
kref_put(&key->ref, tcp_authopt_key_release_kref);
+}
+/* Free info and keys.
- Don't touch tp->authopt_info, it might not even be assigned yet.
- */
+void tcp_authopt_free(struct sock *sk, struct tcp_authopt_info *info) +{
kfree_rcu(info, rcu);
+}
+/* Free everything and clear tcp_sock.authopt_info to NULL */ +void tcp_authopt_clear(struct sock *sk) +{
struct tcp_authopt_info *info;
info = rcu_dereference_protected(tcp_sk(sk)->authopt_info, lockdep_sock_is_held(sk));
if (info) {
tcp_authopt_free(sk, info);
tcp_sk(sk)->authopt_info = NULL;
RCU rules at deletion mandate that the pointer must be cleared before the call_rcu()/kfree_rcu() call.
It is possible that current MD5 code has an issue here, let's not copy/paste it.
}
+}
+/* checks that ipv4 or ipv6 addr matches. */ +static bool ipvx_addr_match(struct sockaddr_storage *a1,
struct sockaddr_storage *a2)
+{
if (a1->ss_family != a2->ss_family)
return false;
if (a1->ss_family == AF_INET &&
(((struct sockaddr_in *)a1)->sin_addr.s_addr !=
((struct sockaddr_in *)a2)->sin_addr.s_addr))
return false;
if (a1->ss_family == AF_INET6 &&
!ipv6_addr_equal(&((struct sockaddr_in6 *)a1)->sin6_addr,
&((struct sockaddr_in6 *)a2)->sin6_addr))
return false;
return true;
+}
Always surprising to see this kind of generic helper being added in a patch.
+static bool tcp_authopt_key_match_exact(struct tcp_authopt_key_info *info,
struct tcp_authopt_key *key)
+{
if (info->send_id != key->send_id)
return false;
if (info->recv_id != key->recv_id)
return false;
if ((info->flags & TCP_AUTHOPT_KEY_ADDR_BIND) != (key->flags & TCP_AUTHOPT_KEY_ADDR_BIND))
return false;
if (info->flags & TCP_AUTHOPT_KEY_ADDR_BIND)
if (!ipvx_addr_match(&info->addr, &key->addr))
return false;
return true;
+}
+static struct tcp_authopt_key_info *tcp_authopt_key_lookup_exact(const struct sock *sk,
struct netns_tcp_authopt *net,
struct tcp_authopt_key *ukey)
+{
struct tcp_authopt_key_info *key_info;
hlist_for_each_entry_rcu(key_info, &net->head, node, lockdep_is_held(&net->mutex))
if (tcp_authopt_key_match_exact(key_info, ukey))
return key_info;
return NULL;
+}
+static struct tcp_authopt_info *__tcp_authopt_info_get_or_create(struct sock *sk) +{
struct tcp_sock *tp = tcp_sk(sk);
struct tcp_authopt_info *info;
info = rcu_dereference_check(tp->authopt_info, lockdep_sock_is_held(sk));
if (info)
return info;
info = kzalloc(sizeof(*info), GFP_KERNEL);
if (!info)
return ERR_PTR(-ENOMEM);
/* Never released: */
static_branch_inc(&tcp_authopt_needed_key);
sk_gso_disable(sk);
rcu_assign_pointer(tp->authopt_info, info);
return info;
+}
+#define TCP_AUTHOPT_KNOWN_FLAGS ( \
TCP_AUTHOPT_FLAG_REJECT_UNEXPECTED)
+/* Like copy_from_sockptr except tolerate different optlen for compatibility reasons
- If the src is shorter then it's from an old userspace and the rest of dst is
- filled with zeros.
- If the dst is shorter then src is from a newer userspace and we only accept
- if the rest of the option is all zeros.
- This allows sockopts to grow as long as for new fields zeros has no effect.
- */
+static int _copy_from_sockptr_tolerant(u8 *dst,
unsigned int dstlen,
sockptr_t src,
unsigned int srclen)
+{
int err;
/* If userspace optlen is too long, accept only if the extra bytes are all zeros */
if (srclen > dstlen) {
if (sockptr_is_kernel(src))
return -EINVAL;
err = check_zeroed_user(src.user + dstlen, srclen - dstlen);
if (err < 0)
return err;
if (err == 0)
return -EINVAL;
}
err = copy_from_sockptr(dst, src, min(srclen, dstlen));
if (err)
return err;
if (srclen < dstlen)
memset(dst + srclen, 0, dstlen - srclen);
return err;
+}
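The length-tolerance rule described in the comment above can be modelled in plain userspace C. This is only a sketch of the semantics, with ordinary memory standing in for the sockptr and a hypothetical name `copy_tolerant`; it is not the kernel helper itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace model of the tolerant copy. Returns 0 or -EINVAL. */
int copy_tolerant(unsigned char *dst, size_t dstlen,
		  const unsigned char *src, size_t srclen)
{
	size_t n = srclen < dstlen ? srclen : dstlen;
	size_t i;

	assert(dst != NULL && src != NULL);

	/* src longer than dst: caller knows newer fields. Accept only if
	 * every byte we cannot represent is zero. */
	for (i = dstlen; i < srclen; i++)
		if (src[i] != 0)
			return -EINVAL;

	memcpy(dst, src, n);

	/* src shorter than dst: caller is older. Missing fields become zero,
	 * which must mean "no effect" for the scheme to work. */
	if (srclen < dstlen)
		memset(dst + srclen, 0, dstlen - srclen);

	return 0;
}
```

The kernel already has a helper with very similar semantics for extensible syscall structs, copy_struct_from_user(), although it takes a user pointer rather than a sockptr_t and reports non-zero trailing bytes as -E2BIG instead of -EINVAL.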
+int tcp_set_authopt(struct sock *sk, sockptr_t optval, unsigned int optlen) +{
struct tcp_authopt opt;
struct tcp_authopt_info *info;
int err;
sock_owned_by_me(sk);
err = _copy_from_sockptr_tolerant((u8 *)&opt, sizeof(opt), optval, optlen);
if (err)
return err;
if (opt.flags & ~TCP_AUTHOPT_KNOWN_FLAGS)
return -EINVAL;
info = __tcp_authopt_info_get_or_create(sk);
if (IS_ERR(info))
return PTR_ERR(info);
info->flags = opt.flags & TCP_AUTHOPT_KNOWN_FLAGS;
return 0;
+}
+int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *opt) +{
struct tcp_sock *tp = tcp_sk(sk);
struct tcp_authopt_info *info;
memset(opt, 0, sizeof(*opt));
sock_owned_by_me(sk);
info = rcu_dereference_check(tp->authopt_info, lockdep_sock_is_held(sk));
Probably not a big deal, but it seems the prior sock_owned_by_me() might be redundant.
if (!info)
return -ENOENT;
opt->flags = info->flags & TCP_AUTHOPT_KNOWN_FLAGS;
return 0;
+}
+#define TCP_AUTHOPT_KEY_KNOWN_FLAGS ( \
TCP_AUTHOPT_KEY_DEL | \
TCP_AUTHOPT_KEY_EXCLUDE_OPTS | \
TCP_AUTHOPT_KEY_ADDR_BIND)
+int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen) +{
struct tcp_authopt_key opt;
struct tcp_authopt_info *info;
struct tcp_authopt_key_info *key_info, *old_key_info;
struct netns_tcp_authopt *net = sock_net_tcp_authopt(sk);
int err;
sock_owned_by_me(sk);
if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
return -EPERM;
err = _copy_from_sockptr_tolerant((u8 *)&opt, sizeof(opt), optval, optlen);
if (err)
return err;
if (opt.flags & ~TCP_AUTHOPT_KEY_KNOWN_FLAGS)
return -EINVAL;
if (opt.keylen > TCP_AUTHOPT_MAXKEYLEN)
return -EINVAL;
/* Delete is a special case: */
if (opt.flags & TCP_AUTHOPT_KEY_DEL) {
mutex_lock(&net->mutex);
key_info = tcp_authopt_key_lookup_exact(sk, net, &opt);
if (key_info) {
tcp_authopt_key_del(net, key_info);
err = 0;
} else {
err = -ENOENT;
}
mutex_unlock(&net->mutex);
return err;
}
/* check key family */
if (opt.flags & TCP_AUTHOPT_KEY_ADDR_BIND) {
if (sk->sk_family != opt.addr.ss_family)
return -EINVAL;
}
/* Initialize tcp_authopt_info if not already set */
info = __tcp_authopt_info_get_or_create(sk);
if (IS_ERR(info))
return PTR_ERR(info);
key_info = kmalloc(sizeof(*key_info), GFP_KERNEL | __GFP_ZERO);
kzalloc() ?
if (!key_info)
return -ENOMEM;
mutex_lock(&net->mutex);
kref_init(&key_info->ref);
/* If an old key exists with exact ID then remove and replace.
* RCU-protected readers might observe both and pick any.
*/
old_key_info = tcp_authopt_key_lookup_exact(sk, net, &opt);
if (old_key_info)
tcp_authopt_key_del(net, old_key_info);
key_info->flags = opt.flags & TCP_AUTHOPT_KEY_KNOWN_FLAGS;
key_info->send_id = opt.send_id;
key_info->recv_id = opt.recv_id;
key_info->alg_id = opt.alg;
key_info->keylen = opt.keylen;
memcpy(key_info->key, opt.key, opt.keylen);
memcpy(&key_info->addr, &opt.addr, sizeof(key_info->addr));
hlist_add_head_rcu(&key_info->node, &net->head);
mutex_unlock(&net->mutex);
return 0;
+}
+static int tcp_authopt_init_net(struct net *full_net)
Hmmm... our convention is to use "struct net *net"
+{
struct netns_tcp_authopt *net = &full_net->tcp_authopt;
Here, you should use a different name ...
mutex_init(&net->mutex);
INIT_HLIST_HEAD(&net->head);
return 0;
+}
+static void tcp_authopt_exit_net(struct net *full_net) +{
struct netns_tcp_authopt *net = &full_net->tcp_authopt;
struct tcp_authopt_key_info *key;
struct hlist_node *n;
Same remark here. Please reserve @net for a "struct net" pointer.
mutex_lock(&net->mutex);
hlist_for_each_entry_safe(key, n, &net->head, node) {
hlist_del_rcu(&key->node);
tcp_authopt_key_put(key);
}
mutex_unlock(&net->mutex);
+}
+static struct pernet_operations net_ops = {
.init = tcp_authopt_init_net,
.exit = tcp_authopt_exit_net,
+};
+static int __init tcp_authopt_init(void) +{
return register_pernet_subsys(&net_ops);
+} +late_initcall(tcp_authopt_init); diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 01b31f5c7aba..f6d1dba31ca4 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -60,10 +60,11 @@
#include <net/net_namespace.h> #include <net/icmp.h> #include <net/inet_hashtables.h> #include <net/tcp.h> +#include <net/tcp_authopt.h> #include <net/transp_v6.h> #include <net/ipv6.h> #include <net/inet_common.h> #include <net/timewait_sock.h> #include <net/xfrm.h> @@ -2267,10 +2268,11 @@ void tcp_v4_destroy_sock(struct sock *sk) tcp_clear_md5_list(sk); kfree_rcu(rcu_dereference_protected(tp->md5sig_info, 1), rcu); tp->md5sig_info = NULL; } #endif
tcp_authopt_clear(sk);
Do we really own the socket lock at this point ?
/* Clean up a referenced TCP bind bucket. */ if (inet_csk(sk)->icsk_bind_hash) inet_put_port(sk);
-- 2.25.1
On 9/7/22 01:57, Eric Dumazet wrote:
On Mon, Sep 5, 2022 at 12:06 AM Leonard Crestez cdleonard@gmail.com wrote:
This commit adds support to add and remove keys but does not use them further.
Similar to TCP-MD5, a single pointer to a struct tcp_authopt_info is added to struct tcp_sock; this avoids increasing memory usage for sockets that do not use the feature. The data structures related to tcp_authopt are initialized on setsockopt and only freed on socket close.
Thanks Leonard.
Small points from my side, please find them attached.
...
+/* Free info and keys.
- Don't touch tp->authopt_info, it might not even be assigned yet.
- */
+void tcp_authopt_free(struct sock *sk, struct tcp_authopt_info *info) +{
kfree_rcu(info, rcu);
+}
+/* Free everything and clear tcp_sock.authopt_info to NULL */ +void tcp_authopt_clear(struct sock *sk) +{
struct tcp_authopt_info *info;
info = rcu_dereference_protected(tcp_sk(sk)->authopt_info, lockdep_sock_is_held(sk));
if (info) {
tcp_authopt_free(sk, info);
tcp_sk(sk)->authopt_info = NULL;
RCU rules at deletion mandate that the pointer must be cleared before the call_rcu()/kfree_rcu() call.
It is possible that current MD5 code has an issue here, let's not copy/paste it.
OK. Is there a need for some special form of assignment or is current plain form enough?
}
+}
+/* checks that ipv4 or ipv6 addr matches. */ +static bool ipvx_addr_match(struct sockaddr_storage *a1,
struct sockaddr_storage *a2)
+{
if (a1->ss_family != a2->ss_family)
return false;
if (a1->ss_family == AF_INET &&
(((struct sockaddr_in *)a1)->sin_addr.s_addr !=
((struct sockaddr_in *)a2)->sin_addr.s_addr))
return false;
if (a1->ss_family == AF_INET6 &&
!ipv6_addr_equal(&((struct sockaddr_in6 *)a1)->sin6_addr,
&((struct sockaddr_in6 *)a2)->sin6_addr))
return false;
return true;
+}
Always surprising to see this kind of generic helper being added in a patch.
I remember looking for an equivalent and not finding it. Many places have distinct code paths for ipv4 and ipv6 and my use of "sockaddr_storage" as ipv4/ipv6 union is uncommon.
It also wastes some memory.
+int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *opt) +{
struct tcp_sock *tp = tcp_sk(sk);
struct tcp_authopt_info *info;
memset(opt, 0, sizeof(*opt));
sock_owned_by_me(sk);
info = rcu_dereference_check(tp->authopt_info, lockdep_sock_is_held(sk));
Probably not a big deal, but it seems the prior sock_owned_by_me() might be redundant.
The sock_owned_by_me call checks lockdep_sock_is_held
The rcu_dereference_check call checks lockdep_sock_is_held || rcu_read_lock_held()
This is a getsockopt so caller ensures socket locking but rcu_read_lock_held() == 0.
The sock_owned_by_me is indeed redundant because it seems very unlikely that the sockopt calling conditions will change. It was mostly there to clarify things for myself, probably because I had trouble with locking warnings at one time. I guess these calls can be removed.
+int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen) +{
struct tcp_authopt_key opt;
struct tcp_authopt_info *info;
struct tcp_authopt_key_info *key_info, *old_key_info;
struct netns_tcp_authopt *net = sock_net_tcp_authopt(sk);
int err;
sock_owned_by_me(sk);
if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
return -EPERM;
err = _copy_from_sockptr_tolerant((u8 *)&opt, sizeof(opt), optval, optlen);
if (err)
return err;
if (opt.flags & ~TCP_AUTHOPT_KEY_KNOWN_FLAGS)
return -EINVAL;
if (opt.keylen > TCP_AUTHOPT_MAXKEYLEN)
return -EINVAL;
/* Delete is a special case: */
if (opt.flags & TCP_AUTHOPT_KEY_DEL) {
mutex_lock(&net->mutex);
key_info = tcp_authopt_key_lookup_exact(sk, net, &opt);
if (key_info) {
tcp_authopt_key_del(net, key_info);
err = 0;
} else {
err = -ENOENT;
}
mutex_unlock(&net->mutex);
return err;
}
/* check key family */
if (opt.flags & TCP_AUTHOPT_KEY_ADDR_BIND) {
if (sk->sk_family != opt.addr.ss_family)
return -EINVAL;
}
/* Initialize tcp_authopt_info if not already set */
info = __tcp_authopt_info_get_or_create(sk);
if (IS_ERR(info))
return PTR_ERR(info);
key_info = kmalloc(sizeof(*key_info), GFP_KERNEL | __GFP_ZERO);
kzalloc() ?
Yes
+static int tcp_authopt_init_net(struct net *full_net)
Hmmm... our convention is to use "struct net *net"
+{
struct netns_tcp_authopt *net = &full_net->tcp_authopt;
Here, you should use a different name ...
OK, will replace with net_ao
@@ -2267,10 +2268,11 @@ void tcp_v4_destroy_sock(struct sock *sk) tcp_clear_md5_list(sk); kfree_rcu(rcu_dereference_protected(tp->md5sig_info, 1), rcu); tp->md5sig_info = NULL; } #endif
tcp_authopt_clear(sk);
Do we really own the socket lock at this point ?
Not sure how I would tell but there is a lockdep_sock_is_held check inside tcp_authopt_clear. I also added sock_owned_by_me and there were no warnings.
On Wed, Sep 7, 2022 at 9:19 AM Leonard Crestez cdleonard@gmail.com wrote:
On 9/7/22 01:57, Eric Dumazet wrote:
On Mon, Sep 5, 2022 at 12:06 AM Leonard Crestez cdleonard@gmail.com wrote:
This commit adds support to add and remove keys but does not use them further.
Similar to TCP-MD5, a single pointer to a struct tcp_authopt_info is added to struct tcp_sock; this avoids increasing memory usage for sockets that do not use the feature. The data structures related to tcp_authopt are initialized on setsockopt and only freed on socket close.
Thanks Leonard.
Small points from my side, please find them attached.
...
+/* Free info and keys.
- Don't touch tp->authopt_info, it might not even be assigned yet.
- */
+void tcp_authopt_free(struct sock *sk, struct tcp_authopt_info *info) +{
kfree_rcu(info, rcu);
+}
+/* Free everything and clear tcp_sock.authopt_info to NULL */ +void tcp_authopt_clear(struct sock *sk) +{
struct tcp_authopt_info *info;
info = rcu_dereference_protected(tcp_sk(sk)->authopt_info, lockdep_sock_is_held(sk));
if (info) {
tcp_authopt_free(sk, info);
tcp_sk(sk)->authopt_info = NULL;
RCU rules at deletion mandate that the pointer must be cleared before the call_rcu()/kfree_rcu() call.
It is possible that current MD5 code has an issue here, let's not copy/paste it.
OK. Is there a need for some special form of assignment or is current plain form enough?
It is the right way (when clearing the pointer), no need for another form.
}
+}
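The ordering asked for here is: first clear tp->authopt_info, then hand the old object to kfree_rcu(). The toy model below illustrates only that ordering in userspace. `model_kfree_rcu`, `struct owner` and `owner_clear` are hypothetical stand-ins, and the model frees immediately where real RCU would wait for a grace period.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct info {
	int flags;
};

/* Stand-in for tcp_sock: holds the published pointer readers would follow. */
struct owner {
	struct info *info;
};

/* Counts how often an object was queued for freeing while still reachable. */
int queued_while_published;

/* Stand-in for kfree_rcu(): notes whether the object is still published
 * at queue time. A real grace period would delay the free. */
void model_kfree_rcu(const struct owner *o, struct info *p)
{
	if (o->info == p)
		queued_while_published++;
	free(p);
}

/* Corrected order: unpublish first, queue the free second. */
void owner_clear(struct owner *o)
{
	struct info *p;

	assert(o != NULL);
	p = o->info;
	if (!p)
		return;

	o->info = NULL;		/* 1. no new reader can find the object */
	model_kfree_rcu(o, p);	/* 2. existing readers drain, then it is freed */
}
```

With the order used in the patch (free queued first, pointer cleared afterwards) the counter in this model would be incremented, which is the situation the review comment warns about.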
+/* checks that ipv4 or ipv6 addr matches. */ +static bool ipvx_addr_match(struct sockaddr_storage *a1,
struct sockaddr_storage *a2)
+{
if (a1->ss_family != a2->ss_family)
return false;
if (a1->ss_family == AF_INET &&
(((struct sockaddr_in *)a1)->sin_addr.s_addr !=
((struct sockaddr_in *)a2)->sin_addr.s_addr))
return false;
if (a1->ss_family == AF_INET6 &&
!ipv6_addr_equal(&((struct sockaddr_in6 *)a1)->sin6_addr,
&((struct sockaddr_in6 *)a2)->sin6_addr))
return false;
return true;
+}
Always surprising to see this kind of generic helper being added in a patch.
I remember looking for an equivalent and not finding it. Many places have distinct code paths for ipv4 and ipv6 and my use of "sockaddr_storage" as ipv4/ipv6 union is uncommon.
inetpeer_addr_cmp() might do it (and we could also fix a bug in it, it seems, at least for the __tcp_get_metrics() usage) :/
It also wastes some memory.
+int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *opt) +{
struct tcp_sock *tp = tcp_sk(sk);
struct tcp_authopt_info *info;
memset(opt, 0, sizeof(*opt));
sock_owned_by_me(sk);
info = rcu_dereference_check(tp->authopt_info, lockdep_sock_is_held(sk));
Probably not a big deal, but it seems the prior sock_owned_by_me() might be redundant.
The sock_owned_by_me call checks lockdep_sock_is_held
The rcu_dereference_check call checks lockdep_sock_is_held || rcu_read_lock_held()
Then if you own the socket lock, no need for rcu_dereference_check()
It could be instead an rcu_dereference_protected(). This is stronger, because if your thread no longer owns the socket lock, but is inside rcu_read_lock(), we would still get a proper lockdep splat.
This is a getsockopt, so the caller ensures socket locking, but rcu_read_lock_held() == 0.
The sock_owned_by_me() call is indeed redundant because it seems very unlikely that the sockopt calling conditions will change. It was mostly there to clarify things for myself, probably because I ran into locking warnings at one point. I guess these calls can be removed.
+int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen) +{
struct tcp_authopt_key opt;
struct tcp_authopt_info *info;
struct tcp_authopt_key_info *key_info, *old_key_info;
struct netns_tcp_authopt *net = sock_net_tcp_authopt(sk);
int err;
sock_owned_by_me(sk);
if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
return -EPERM;
err = _copy_from_sockptr_tolerant((u8 *)&opt, sizeof(opt), optval, optlen);
if (err)
return err;
if (opt.flags & ~TCP_AUTHOPT_KEY_KNOWN_FLAGS)
return -EINVAL;
if (opt.keylen > TCP_AUTHOPT_MAXKEYLEN)
return -EINVAL;
/* Delete is a special case: */
if (opt.flags & TCP_AUTHOPT_KEY_DEL) {
mutex_lock(&net->mutex);
key_info = tcp_authopt_key_lookup_exact(sk, net, &opt);
if (key_info) {
tcp_authopt_key_del(net, key_info);
err = 0;
} else {
err = -ENOENT;
}
mutex_unlock(&net->mutex);
return err;
}
/* check key family */
if (opt.flags & TCP_AUTHOPT_KEY_ADDR_BIND) {
if (sk->sk_family != opt.addr.ss_family)
return -EINVAL;
}
/* Initialize tcp_authopt_info if not already set */
info = __tcp_authopt_info_get_or_create(sk);
if (IS_ERR(info))
return PTR_ERR(info);
key_info = kmalloc(sizeof(*key_info), GFP_KERNEL | __GFP_ZERO);
kzalloc() ?
Yes
+static int tcp_authopt_init_net(struct net *full_net)
Hmmm... our convention is to use "struct net *net"
+{
struct netns_tcp_authopt *net = &full_net->tcp_authopt;
Here, you should use a different name ...
OK, will replace with net_ao
@@ -2267,10 +2268,11 @@ void tcp_v4_destroy_sock(struct sock *sk) tcp_clear_md5_list(sk); kfree_rcu(rcu_dereference_protected(tp->md5sig_info, 1), rcu); tp->md5sig_info = NULL; } #endif
tcp_authopt_clear(sk);
Do we really own the socket lock at this point ?
Not sure how I would tell but there is a lockdep_sock_is_held check inside tcp_authopt_clear. I also added sock_owned_by_me and there were no warnings.
Ok then :)
On 9/7/22 19:28, Eric Dumazet wrote:
On Wed, Sep 7, 2022 at 9:19 AM Leonard Crestez cdleonard@gmail.com wrote:
On 9/7/22 01:57, Eric Dumazet wrote:
On Mon, Sep 5, 2022 at 12:06 AM Leonard Crestez cdleonard@gmail.com wrote:
This commit adds support to add and remove keys but does not use them further.
Similar to TCP-MD5, a single pointer to a struct tcp_authopt_info is added to struct tcp_sock; this avoids increasing memory usage. The data structures related to tcp_authopt are initialized on setsockopt and only freed on socket close.
Thanks Leonard.
Small points from my side, please find them attached.
...
+/* Free info and keys.
+ * Don't touch tp->authopt_info, it might not even be assigned yet.
+ */
+void tcp_authopt_free(struct sock *sk, struct tcp_authopt_info *info) +{
kfree_rcu(info, rcu);
+}
+/* Free everything and clear tcp_sock.authopt_info to NULL */ +void tcp_authopt_clear(struct sock *sk) +{
struct tcp_authopt_info *info;
info = rcu_dereference_protected(tcp_sk(sk)->authopt_info, lockdep_sock_is_held(sk));
if (info) {
tcp_authopt_free(sk, info);
tcp_sk(sk)->authopt_info = NULL;
RCU rules at deletion mandate that the pointer must be cleared before the call_rcu()/kfree_rcu() call.
It is possible that current MD5 code has an issue here, let's not copy/paste it.
OK. Is there a need for some special form of assignment or is current plain form enough?
It is the right way (when clearing the pointer), no need for another form.
OK
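The ordering discussed above can be illustrated outside the kernel. What follows is a minimal userspace sketch, not kernel code: `deferred_free()` and `run_grace_period()` are hypothetical stand-ins for kfree_rcu() and the end of a grace period, and `struct info` stands in for struct tcp_authopt_info. It only shows the order of the two steps: the pointer is cleared first, then the free is queued.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for struct tcp_authopt_info (hypothetical, for illustration). */
struct info {
	int flags;
	struct info *next_deferred; /* link in the deferred-free list */
};

/* Stand-in for tp->authopt_info: the pointer readers would dereference. */
static struct info *published;

/* Objects whose free is queued until the modelled grace period ends. */
static struct info *deferred_list;

/* Hypothetical model of kfree_rcu(): queue the object, free it later. */
static void deferred_free(struct info *obj)
{
	obj->next_deferred = deferred_list;
	deferred_list = obj;
}

/* Hypothetical model of a grace period ending: free everything queued.
 * Returns the number of objects freed.
 */
static int run_grace_period(void)
{
	int freed = 0;

	while (deferred_list) {
		struct info *obj = deferred_list;

		deferred_list = obj->next_deferred;
		free(obj);
		freed++;
	}
	return freed;
}

/* Deletion in the order RCU requires: unpublish first, then queue the free. */
static void info_clear(void)
{
	struct info *old = published;

	if (!old)
		return;
	published = NULL;   /* step 1: no new reader can find the object */
	deferred_free(old); /* step 2: freed once existing readers are done */
}
```

With the two steps swapped, a reader arriving between them could still load the pointer to an object whose free is already queued, which is the situation the review comment warns about.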
+/* checks that ipv4 or ipv6 addr matches. */ +static bool ipvx_addr_match(struct sockaddr_storage *a1,
struct sockaddr_storage *a2)
+{
if (a1->ss_family != a2->ss_family)
return false;
if (a1->ss_family == AF_INET &&
(((struct sockaddr_in *)a1)->sin_addr.s_addr !=
((struct sockaddr_in *)a2)->sin_addr.s_addr))
return false;
if (a1->ss_family == AF_INET6 &&
!ipv6_addr_equal(&((struct sockaddr_in6 *)a1)->sin6_addr,
&((struct sockaddr_in6 *)a2)->sin6_addr))
return false;
return true;
+}
Always surprising to see this kind of generic helper being added in a patch.
I remember looking for an equivalent and not finding it. Many places have distinct code paths for ipv4 and ipv6 and my use of "sockaddr_storage" as ipv4/ipv6 union is uncommon.
inetpeer_addr_cmp() might do it (and we could also fix a bug in it, it seems, at least for the __tcp_get_metrics() usage) :/
That uses a different `struct inetpeer_addr` which also has some extra "vif" fields for ipv4 that I don't know about.
Everybody seems to be rolling their own ipv4/v6 union, other examples are `struct tcp_md5_addr` and `xfrm_address_t`. struct sockaddr_storage is "more standard" but also larger so maybe that's why others don't use it.
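For comparison, the same check can be written in userspace against the POSIX socket headers. This is only a sketch under assumptions: `addr_match()` and `make_addr()` are hypothetical names, and memcmp() stands in for the kernel's ipv6_addr_equal(). Like the helper in the patch, it ignores ports and treats unknown families as matching when the family field is equal.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Userspace analogue of ipvx_addr_match(): compare only the address part. */
static bool addr_match(const struct sockaddr_storage *a1,
		       const struct sockaddr_storage *a2)
{
	if (a1->ss_family != a2->ss_family)
		return false;
	if (a1->ss_family == AF_INET) {
		const struct sockaddr_in *s1 = (const struct sockaddr_in *)a1;
		const struct sockaddr_in *s2 = (const struct sockaddr_in *)a2;

		return s1->sin_addr.s_addr == s2->sin_addr.s_addr;
	}
	if (a1->ss_family == AF_INET6) {
		const struct sockaddr_in6 *s1 = (const struct sockaddr_in6 *)a1;
		const struct sockaddr_in6 *s2 = (const struct sockaddr_in6 *)a2;

		return memcmp(&s1->sin6_addr, &s2->sin6_addr,
			      sizeof(struct in6_addr)) == 0;
	}
	/* Same (unknown) family: mirrors the patch, which returns true here. */
	return true;
}

/* Hypothetical helper: fill a sockaddr_storage from a textual address.
 * Returns false if the family is unsupported or the text does not parse.
 */
static bool make_addr(struct sockaddr_storage *ss, int family,
		      const char *text, unsigned short port)
{
	memset(ss, 0, sizeof(*ss));
	ss->ss_family = (sa_family_t)family;
	if (family == AF_INET) {
		struct sockaddr_in *sin = (struct sockaddr_in *)ss;

		sin->sin_port = htons(port);
		return inet_pton(AF_INET, text, &sin->sin_addr) == 1;
	}
	if (family == AF_INET6) {
		struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)ss;

		sin6->sin6_port = htons(port);
		return inet_pton(AF_INET6, text, &sin6->sin6_addr) == 1;
	}
	return false;
}
```

The size trade-off mentioned above is visible here too: struct sockaddr_storage is typically 128 bytes on Linux, while the IPv6 address it carries is only 16 bytes.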
+int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *opt) +{
struct tcp_sock *tp = tcp_sk(sk);
struct tcp_authopt_info *info;
memset(opt, 0, sizeof(*opt));
sock_owned_by_me(sk);
info = rcu_dereference_check(tp->authopt_info, lockdep_sock_is_held(sk));
Probably not a big deal, but it seems the prior sock_owned_by_me() might be redundant.
The sock_owned_by_me call checks lockdep_sock_is_held
The rcu_dereference_check call checks lockdep_sock_is_held || rcu_read_lock_held()
Then if you own the socket lock, no need for rcu_dereference_check()
It could be instead an rcu_dereference_protected(). This is stronger, because if your thread no longer owns the socket lock, but is inside rcu_read_lock(), we would still get a proper lockdep splat.
Ok, I think there are several places where rcu_dereference_check is incorrectly used instead of rcu_dereference_protected.
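The difference between the two accessors comes down to the condition that lockdep evaluates. The toy model below is not the kernel API: `struct ctx`, `check_ok()` and `protected_ok()` are hypothetical names that only encode the two conditions quoted above, so that the three possible calling contexts can be compared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of what lockdep would know about the caller. */
struct ctx {
	bool sock_lock_held;     /* models lockdep_sock_is_held(sk) */
	bool rcu_read_lock_held; /* models rcu_read_lock_held() */
};

/* Condition accepted by rcu_dereference_check(p, lockdep_sock_is_held(sk)):
 * the caller-supplied condition OR an RCU read-side critical section.
 */
static bool check_ok(const struct ctx *c)
{
	return c->sock_lock_held || c->rcu_read_lock_held;
}

/* Condition accepted by rcu_dereference_protected(p, lockdep_sock_is_held(sk)):
 * only the update-side lock counts.
 */
static bool protected_ok(const struct ctx *c)
{
	return c->sock_lock_held;
}
```

A caller that holds only rcu_read_lock() passes the first condition but fails the second. That is the case where rcu_dereference_protected() still produces a lockdep splat, which is why it is the stronger check for sockopt paths.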
On Mon, 2022-09-05 at 10:05 +0300, Leonard Crestez wrote: [...]
diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c new file mode 100644 index 000000000000..d38e9c89c89d --- /dev/null +++ b/net/ipv4/tcp_authopt.c @@ -0,0 +1,317 @@ +// SPDX-License-Identifier: GPL-2.0-or-later
+#include <net/tcp_authopt.h> +#include <net/ipv6.h> +#include <net/tcp.h> +#include <linux/kref.h>
+/* This is enabled when first struct tcp_authopt_info is allocated and never released */ +DEFINE_STATIC_KEY_FALSE(tcp_authopt_needed_key); +EXPORT_SYMBOL(tcp_authopt_needed_key);
+static inline struct netns_tcp_authopt *sock_net_tcp_authopt(const struct sock *sk)
+{
+	return &sock_net(sk)->tcp_authopt;
+}
Please have a look at PW report for this series, there are a bunch of issues to be addressed, e.g. above 'static inline' should be just 'static'
+static void tcp_authopt_key_release_kref(struct kref *ref)
+{
+	struct tcp_authopt_key_info *key = container_of(ref, struct tcp_authopt_key_info, ref);
+	kfree_rcu(key, rcu);
+}
+
+static void tcp_authopt_key_put(struct tcp_authopt_key_info *key)
+{
+	if (key)
+		kref_put(&key->ref, tcp_authopt_key_release_kref);
+}
+
+static void tcp_authopt_key_del(struct netns_tcp_authopt *net,
+				struct tcp_authopt_key_info *key)
+{
+	lockdep_assert_held(&net->mutex);
+	hlist_del_rcu(&key->node);
+	key->flags |= TCP_AUTHOPT_KEY_DEL;
+	kref_put(&key->ref, tcp_authopt_key_release_kref);
+}
+
+/* Free info and keys.
+ * Don't touch tp->authopt_info, it might not even be assigned yet.
+ */
+void tcp_authopt_free(struct sock *sk, struct tcp_authopt_info *info)
This needs to be 'static'.
I'm sorry to bring up the next topic this late (if it was already discussed, I missed it): is it possible to split this series into smaller chunks?
Cheers,
Paolo
On 9/8/22 09:35, Paolo Abeni wrote:
On Mon, 2022-09-05 at 10:05 +0300, Leonard Crestez wrote: [...]
diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c new file mode 100644 index 000000000000..d38e9c89c89d --- /dev/null +++ b/net/ipv4/tcp_authopt.c @@ -0,0 +1,317 @@ +// SPDX-License-Identifier: GPL-2.0-or-later
+#include <net/tcp_authopt.h> +#include <net/ipv6.h> +#include <net/tcp.h> +#include <linux/kref.h>
+/* This is enabled when first struct tcp_authopt_info is allocated and never released */ +DEFINE_STATIC_KEY_FALSE(tcp_authopt_needed_key); +EXPORT_SYMBOL(tcp_authopt_needed_key);
+static inline struct netns_tcp_authopt *sock_net_tcp_authopt(const struct sock *sk)
+{
+	return &sock_net(sk)->tcp_authopt;
+}
Please have a look at PW report for this series, there are a bunch of issues to be addressed, e.g. above 'static inline' should be just 'static'
What is a "PW report"? I can't find any info about this.
+static void tcp_authopt_key_release_kref(struct kref *ref)
+{
+	struct tcp_authopt_key_info *key = container_of(ref, struct tcp_authopt_key_info, ref);
+	kfree_rcu(key, rcu);
+}
+
+static void tcp_authopt_key_put(struct tcp_authopt_key_info *key)
+{
+	if (key)
+		kref_put(&key->ref, tcp_authopt_key_release_kref);
+}
+
+static void tcp_authopt_key_del(struct netns_tcp_authopt *net,
+				struct tcp_authopt_key_info *key)
+{
+	lockdep_assert_held(&net->mutex);
+	hlist_del_rcu(&key->node);
+	key->flags |= TCP_AUTHOPT_KEY_DEL;
+	kref_put(&key->ref, tcp_authopt_key_release_kref);
+}
+
+/* Free info and keys.
+ * Don't touch tp->authopt_info, it might not even be assigned yet.
+ */
+void tcp_authopt_free(struct sock *sk, struct tcp_authopt_info *info)
This needs to be 'static'.
Tried this and it's later called from tcp_twsk_destructor.
I'm sorry to bring up the next topic this late (if it was already discussed, I missed it): is it possible to split this series into smaller chunks?
It's already 26 patches and 3675 added lines; fewer than 150 lines per patch seems reasonable?
The split is already somewhat artificial: for example, there are patches that "add crypto" without actually using it, because otherwise they would be too large.
Some features could be dropped for later in order to make this smaller, for example TCP_REPAIR doesn't have many usecases. Features like prefixlen, vrf binding and ipv4-mapped-ipv6 were explicitly requested by maintainers so I included them as separate patches in the main series.
-- Regards, Leonard
On 9/8/22 4:47 AM, Leonard Crestez wrote:
On 9/8/22 09:35, Paolo Abeni wrote:
On Mon, 2022-09-05 at 10:05 +0300, Leonard Crestez wrote: [...]
diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c new file mode 100644 index 000000000000..d38e9c89c89d --- /dev/null +++ b/net/ipv4/tcp_authopt.c @@ -0,0 +1,317 @@ +// SPDX-License-Identifier: GPL-2.0-or-later
+#include <net/tcp_authopt.h> +#include <net/ipv6.h> +#include <net/tcp.h> +#include <linux/kref.h>
+/* This is enabled when first struct tcp_authopt_info is allocated and never released */ +DEFINE_STATIC_KEY_FALSE(tcp_authopt_needed_key); +EXPORT_SYMBOL(tcp_authopt_needed_key);
+static inline struct netns_tcp_authopt *sock_net_tcp_authopt(const struct sock *sk) +{ + return &sock_net(sk)->tcp_authopt; +}
Please have a look at PW report for this series, there are a bunch of issues to be addressed, e.g. above 'static inline' should be just 'static'
What is a "PW report"? I can't find any info about this.
patchworks: https://patchwork.kernel.org/project/netdevbpf/list/
This set: https://patchwork.kernel.org/project/netdevbpf/list/?series=&submitter=1...
I'm sorry to bring up the next topic this late (if it was already discussed, I missed it): is it possible to split this series into smaller chunks?
It's already 26 patches and 3675 added lines; fewer than 150 lines per patch seems reasonable?
The split is already somewhat artificial: for example, there are patches that "add crypto" without actually using it, because otherwise they would be too large.
Some features could be dropped for later in order to make this smaller, for example TCP_REPAIR doesn't have many usecases. Features like prefixlen, vrf binding and ipv4-mapped-ipv6 were explicitly requested by maintainers so I included them as separate patches in the main series.
The tests could be dropped from the first set along with TCP_REPAIR and /proc/net/tcp_authopt patch. That would get it down to 21 patches. From there the refactor patches could be sent first in a separate PR that would get it down to 19. Those 19 are the core feature split into small patches; they should come in together IMHO.
The .rst documentation contains a brief description of the user interface and includes kernel-doc generated from uapi header.
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 Documentation/networking/index.rst       |  1 +
 Documentation/networking/tcp_authopt.rst | 51 ++++++++++++++++++++++++
 2 files changed, 52 insertions(+)
 create mode 100644 Documentation/networking/tcp_authopt.rst
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index bacadd09e570..b134037c94ec 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -102,10 +102,11 @@ Contents:
    strparser
    switchdev
    sysfs-tagging
    tc-actions-env-rules
    tcp-thin
+   tcp_authopt
    team
    timestamping
    tipc
    tproxy
    tuntap
diff --git a/Documentation/networking/tcp_authopt.rst b/Documentation/networking/tcp_authopt.rst
new file mode 100644
index 000000000000..72adb7a891ce
--- /dev/null
+++ b/Documentation/networking/tcp_authopt.rst
@@ -0,0 +1,51 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================
+TCP Authentication Option
+=========================
+
+The TCP Authentication option specified by RFC5925 replaces the TCP MD5
+Signature option. It is similar in goals but not compatible in either wire
+format or ABI.
+
+Interface
+=========
+
+Individual keys can be added or removed through a TCP socket by using the
+TCP_AUTHOPT_KEY setsockopt and a struct tcp_authopt_key. There is no
+support for reading back keys and updates always replace the old key. These
+structures represent "Master Key Tuples (MKTs)" as described by the RFC.
+
+Per-socket options can be set or read using the TCP_AUTHOPT sockopt and a struct
+tcp_authopt. This is optional: doing setsockopt TCP_AUTHOPT_KEY is sufficient to
+enable the feature.
+
+Configuration associated with TCP Authentication is global for each network
+namespace; this means that all sockets for which TCP_AUTHOPT is enabled will
+be affected by the same set of keys.
+
+Manipulating keys requires ``CAP_NET_ADMIN``.
+
+Key binding
+-----------
+
+Keys can be bound to remote addresses in a way that is somewhat similar to
+``TCP_MD5SIG``. By default a key matches all connections but matching criteria can
+be specified as fields inside struct tcp_authopt_key together with matching
+flags in tcp_authopt_key.flags. The set of these "matching criteria" can
+expand over time by increasing the size of `struct tcp_authopt_key` and adding
+new flags.
+
+ * Address binding is optional, by default keys match all addresses
+ * Local address is ignored, matching is done by remote address
+ * Ports are ignored
+
+RFC5925 requires that key ids do not overlap when tcp identifiers (addr/port)
+overlap. This is not enforced by Linux; configuring ambiguous keys will result
+in packet drops and lost connections.
+
+ABI Reference
+=============
+
+.. kernel-doc:: include/uapi/linux/tcp.h
+   :identifiers: tcp_authopt tcp_authopt_flag tcp_authopt_key tcp_authopt_key_flag tcp_authopt_alg
The crypto_shash API is used in order to compute packet signatures. The API comes with several unfortunate limitations:
1) Allocating a crypto_shash can sleep and must be done in user context.
2) Packet signatures must be computed in softirq context
3) Packet signatures use dynamic "traffic keys" which require exclusive access to crypto_shash for crypto_setkey.
The solution is to allocate one crypto_shash for each possible cpu for each algorithm at setsockopt time. The per-cpu tfm is then borrowed from softirq context, signatures are computed and the tfm is returned.
The pool for each algorithm is allocated on first use.
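The allocate-early, borrow-later pattern described above can be modelled in plain C. This is a simplified sketch, not the patch's implementation: `NR_CPUS_MODEL`, `struct pool_entry`, `alg_require()`, `alg_get_pool()` and `alg_put_pool()` are hypothetical names, the CPU is passed in explicitly, and a `borrowed` flag stands in for the exclusivity that the patch obtains with local_bh_disable().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_MODEL 4 /* hypothetical CPU count for this model */

/* Stand-in for one per-CPU crypto transform plus request. */
struct pool_entry {
	bool allocated;
	bool borrowed;
};

/* Stand-in for struct tcp_authopt_alg_imp. */
struct alg {
	bool init_done;
	struct pool_entry pool[NR_CPUS_MODEL];
};

/* Models tcp_authopt_alg_require(): allocate one entry per CPU on first use.
 * In the kernel this runs at setsockopt time, where sleeping is allowed.
 */
static int alg_require(struct alg *alg)
{
	if (alg->init_done)
		return 0;
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		alg->pool[cpu].allocated = true;
	alg->init_done = true;
	return 0;
}

/* Models borrowing the current CPU's entry. It never allocates, so it is
 * usable from a context that must not sleep. Returns NULL if the pool was
 * never initialized, the CPU is out of range, or the entry is in use.
 */
static struct pool_entry *alg_get_pool(struct alg *alg, int cpu)
{
	struct pool_entry *p;

	if (!alg->init_done || cpu < 0 || cpu >= NR_CPUS_MODEL)
		return NULL;
	p = &alg->pool[cpu];
	if (p->borrowed)
		return NULL;
	p->borrowed = true;
	return p;
}

/* Models returning the entry after the signature has been computed. */
static void alg_put_pool(struct pool_entry *p)
{
	p->borrowed = false;
}
```

The point of the design is that all sleeping work happens in `alg_require()`, while the get/put pair only hands out memory that already exists.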
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 include/net/tcp_authopt.h |  16 +++
 net/ipv4/tcp_authopt.c    | 199 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 214 insertions(+), 1 deletion(-)
diff --git a/include/net/tcp_authopt.h b/include/net/tcp_authopt.h index bc2cff82830d..ed9995c8d486 100644 --- a/include/net/tcp_authopt.h +++ b/include/net/tcp_authopt.h @@ -4,10 +4,24 @@
#include <uapi/linux/tcp.h> #include <net/netns/tcp_authopt.h> #include <linux/tcp.h>
+/* According to RFC5925 the length of the authentication option varies based on + * the signature algorithm. Linux only implements the algorithms defined in + * RFC5926 which have a constant length of 16. + * + * This is used in stack allocation of tcp option buffers for output. It is + * shorter than the length of the MD5 option. + * + * Input packets can have authentication options of different lengths but they + * will always be flagged as invalid (since no such algorithms are supported). + */ +#define TCPOLEN_AUTHOPT_OUTPUT 16 + +struct tcp_authopt_alg_imp; + /** * struct tcp_authopt_key_info - Representation of a Master Key Tuple as per RFC5925 * * Key structure lifetime is protected by RCU so send/recv code needs to hold a * single rcu_read_lock until they're done with the key. @@ -33,10 +47,12 @@ struct tcp_authopt_key_info { u8 keylen; /** @key: Same as &tcp_authopt_key.key */ u8 key[TCP_AUTHOPT_MAXKEYLEN]; /** @addr: Same as &tcp_authopt_key.addr */ struct sockaddr_storage addr; + /** @alg: Algorithm implementation matching alg_id */ + struct tcp_authopt_alg_imp *alg; };
/** * struct tcp_authopt_info - Per-socket information regarding tcp_authopt * diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c index d38e9c89c89d..005fac36760b 100644 --- a/net/ipv4/tcp_authopt.c +++ b/net/ipv4/tcp_authopt.c @@ -2,15 +2,201 @@
#include <net/tcp_authopt.h> #include <net/ipv6.h> #include <net/tcp.h> #include <linux/kref.h> +#include <crypto/hash.h>
/* This is enabled when first struct tcp_authopt_info is allocated and never released */ DEFINE_STATIC_KEY_FALSE(tcp_authopt_needed_key); EXPORT_SYMBOL(tcp_authopt_needed_key);
+/* All current algorithms have a mac length of 12 but crypto API digestsize can be larger */ +#define TCP_AUTHOPT_MAXMACBUF 20 +#define TCP_AUTHOPT_MAX_TRAFFIC_KEY_LEN 20 +#define TCP_AUTHOPT_MACLEN 12 + +struct tcp_authopt_alg_pool { + struct crypto_ahash *tfm; + struct ahash_request *req; +}; + +/* Constant data with per-algorithm information from RFC5926 + * The "KDF" and "MAC" happen to be the same for both algorithms. + */ +struct tcp_authopt_alg_imp { + /* Name of algorithm in crypto-api */ + const char *alg_name; + /* One of the TCP_AUTHOPT_ALG_* constants from uapi */ + u8 alg_id; + /* Length of traffic key */ + u8 traffic_key_len; + + /* shared crypto_ahash */ + struct mutex init_mutex; + bool init_done; + struct tcp_authopt_alg_pool __percpu *pool; +}; + +static struct tcp_authopt_alg_imp tcp_authopt_alg_list[] = { + { + .alg_id = TCP_AUTHOPT_ALG_HMAC_SHA_1_96, + .alg_name = "hmac(sha1)", + .traffic_key_len = 20, + .init_mutex = __MUTEX_INITIALIZER(tcp_authopt_alg_list[0].init_mutex), + }, + { + .alg_id = TCP_AUTHOPT_ALG_AES_128_CMAC_96, + .alg_name = "cmac(aes)", + .traffic_key_len = 16, + .init_mutex = __MUTEX_INITIALIZER(tcp_authopt_alg_list[1].init_mutex), + }, +}; + +/* get a pointer to the tcp_authopt_alg instance or NULL if id invalid */ +static inline struct tcp_authopt_alg_imp *tcp_authopt_alg_get(int alg_num) +{ + if (alg_num <= 0 || alg_num > 2) + return NULL; + return &tcp_authopt_alg_list[alg_num - 1]; +} + +static int tcp_authopt_alg_pool_init(struct tcp_authopt_alg_imp *alg, + struct tcp_authopt_alg_pool *pool) +{ + pool->tfm = crypto_alloc_ahash(alg->alg_name, 0, CRYPTO_ALG_ASYNC); + if (IS_ERR(pool->tfm)) + return PTR_ERR(pool->tfm); + + pool->req = ahash_request_alloc(pool->tfm, GFP_ATOMIC); + if (IS_ERR(pool->req)) + return PTR_ERR(pool->req); + ahash_request_set_callback(pool->req, 0, NULL, NULL); + + return 0; +} + +static void tcp_authopt_alg_pool_free(struct tcp_authopt_alg_pool *pool) +{ + if (!IS_ERR_OR_NULL(pool->req)) + 
ahash_request_free(pool->req); + pool->req = NULL; + if (!IS_ERR_OR_NULL(pool->tfm)) + crypto_free_ahash(pool->tfm); + pool->tfm = NULL; +} + +static void __tcp_authopt_alg_free(struct tcp_authopt_alg_imp *alg) +{ + int cpu; + struct tcp_authopt_alg_pool *pool; + + if (!alg->pool) + return; + for_each_possible_cpu(cpu) { + pool = per_cpu_ptr(alg->pool, cpu); + tcp_authopt_alg_pool_free(pool); + } + free_percpu(alg->pool); + alg->pool = NULL; +} + +static int __tcp_authopt_alg_init(struct tcp_authopt_alg_imp *alg) +{ + struct tcp_authopt_alg_pool *pool; + int cpu; + int err; + + BUILD_BUG_ON(TCP_AUTHOPT_MAXMACBUF < TCPOLEN_AUTHOPT_OUTPUT); + if (WARN_ON_ONCE(alg->traffic_key_len > TCP_AUTHOPT_MAX_TRAFFIC_KEY_LEN)) + return -ENOBUFS; + + alg->pool = alloc_percpu(struct tcp_authopt_alg_pool); + if (!alg->pool) + return -ENOMEM; + for_each_possible_cpu(cpu) { + pool = per_cpu_ptr(alg->pool, cpu); + err = tcp_authopt_alg_pool_init(alg, pool); + if (err) + goto out_err; + + pool = per_cpu_ptr(alg->pool, cpu); + /* sanity checks: */ + if (WARN_ON_ONCE(crypto_ahash_digestsize(pool->tfm) != alg->traffic_key_len)) { + err = -EINVAL; + goto out_err; + } + if (WARN_ON_ONCE(crypto_ahash_digestsize(pool->tfm) > TCP_AUTHOPT_MAXMACBUF)) { + err = -EINVAL; + goto out_err; + } + } + return 0; + +out_err: + pr_info("Failed to initialize %s\n", alg->alg_name); + __tcp_authopt_alg_free(alg); + return err; +} + +static int tcp_authopt_alg_require(struct tcp_authopt_alg_imp *alg) +{ + int err = 0; + + mutex_lock(&alg->init_mutex); + if (alg->init_done) + goto out; + err = __tcp_authopt_alg_init(alg); + if (err) + goto out; + pr_info("initialized tcp-ao algorithm %s", alg->alg_name); + alg->init_done = true; + +out: + mutex_unlock(&alg->init_mutex); + return err; +} + +static struct tcp_authopt_alg_pool *tcp_authopt_alg_get_pool(struct tcp_authopt_alg_imp *alg) +{ + local_bh_disable(); + return this_cpu_ptr(alg->pool); +} + +static void tcp_authopt_alg_put_pool(struct tcp_authopt_alg_imp 
*alg, + struct tcp_authopt_alg_pool *pool) +{ + WARN_ON(pool != this_cpu_ptr(alg->pool)); + local_bh_enable(); +} + +__always_unused +static struct tcp_authopt_alg_pool *tcp_authopt_get_kdf_pool(struct tcp_authopt_key_info *key) +{ + return tcp_authopt_alg_get_pool(key->alg); +} + +__always_unused +static void tcp_authopt_put_kdf_pool(struct tcp_authopt_key_info *key, + struct tcp_authopt_alg_pool *pool) +{ + return tcp_authopt_alg_put_pool(key->alg, pool); +} + +__always_unused +static struct tcp_authopt_alg_pool *tcp_authopt_get_mac_pool(struct tcp_authopt_key_info *key) +{ + return tcp_authopt_alg_get_pool(key->alg); +} + +__always_unused +static void tcp_authopt_put_mac_pool(struct tcp_authopt_key_info *key, + struct tcp_authopt_alg_pool *pool) +{ + return tcp_authopt_alg_put_pool(key->alg, pool); +} + static inline struct netns_tcp_authopt *sock_net_tcp_authopt(const struct sock *sk) { return &sock_net(sk)->tcp_authopt; }
@@ -53,11 +239,10 @@ void tcp_authopt_clear(struct sock *sk) if (info) { tcp_authopt_free(sk, info); tcp_sk(sk)->authopt_info = NULL; } } - /* checks that ipv4 or ipv6 addr matches. */ static bool ipvx_addr_match(struct sockaddr_storage *a1, struct sockaddr_storage *a2) { if (a1->ss_family != a2->ss_family) @@ -212,10 +397,11 @@ int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen) { struct tcp_authopt_key opt; struct tcp_authopt_info *info; struct tcp_authopt_key_info *key_info, *old_key_info; struct netns_tcp_authopt *net = sock_net_tcp_authopt(sk); + struct tcp_authopt_alg_imp *alg; int err;
sock_owned_by_me(sk); if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) return -EPERM; @@ -253,10 +439,20 @@ int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen) /* Initialize tcp_authopt_info if not already set */ info = __tcp_authopt_info_get_or_create(sk); if (IS_ERR(info)) return PTR_ERR(info);
+ /* check the algorithm */ + alg = tcp_authopt_alg_get(opt.alg); + if (!alg) + return -EINVAL; + if (WARN_ON_ONCE(alg->alg_id != opt.alg)) + return -EINVAL; + err = tcp_authopt_alg_require(alg); + if (err) + return err; + key_info = kmalloc(sizeof(*key_info), GFP_KERNEL | __GFP_ZERO); if (!key_info) return -ENOMEM; mutex_lock(&net->mutex); kref_init(&key_info->ref); @@ -268,10 +464,11 @@ int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen) tcp_authopt_key_del(net, old_key_info); key_info->flags = opt.flags & TCP_AUTHOPT_KEY_KNOWN_FLAGS; key_info->send_id = opt.send_id; key_info->recv_id = opt.recv_id; key_info->alg_id = opt.alg; + key_info->alg = alg; key_info->keylen = opt.keylen; memcpy(key_info->key, opt.key, opt.keylen); memcpy(&key_info->addr, &opt.addr, sizeof(key_info->addr)); hlist_add_head_rcu(&key_info->node, &net->head); mutex_unlock(&net->mutex);
tcp_md5_hash_skb_data() feeds all SKB data into an ahash. This behavior is identical for TCP-MD5 and TCP-AO, so rename the function to tcp_sig_hash_skb_data() and make it take the ahash_request directly.
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 include/net/tcp.h   |  2 +-
 net/ipv4/tcp.c      | 17 ++++++++++++-----
 net/ipv4/tcp_ipv4.c |  2 +-
 net/ipv6/tcp_ipv6.c |  2 +-
 4 files changed, 15 insertions(+), 8 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h index 9955a88faf9b..fbe18b5bf576 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1718,11 +1718,11 @@ struct tcp_md5sig_pool *tcp_get_md5sig_pool(void); static inline void tcp_put_md5sig_pool(void) { local_bh_enable(); }
-int tcp_md5_hash_skb_data(struct tcp_md5sig_pool *, const struct sk_buff *, +int tcp_sig_hash_skb_data(struct ahash_request *, const struct sk_buff *, unsigned int header_len); int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct tcp_md5sig_key *key);
/* From tcp_fastopen.c */ diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 6a0357cf05b5..9c362f357fbb 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -4511,16 +4511,19 @@ struct tcp_md5sig_pool *tcp_get_md5sig_pool(void) local_bh_enable(); return NULL; } EXPORT_SYMBOL(tcp_get_md5sig_pool);
-int tcp_md5_hash_skb_data(struct tcp_md5sig_pool *hp, +#endif /* CONFIG_TCP_MD5SIG */ + +#if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AUTHOPT) + +int tcp_sig_hash_skb_data(struct ahash_request *req, const struct sk_buff *skb, unsigned int header_len) { struct scatterlist sg; const struct tcphdr *tp = tcp_hdr(skb); - struct ahash_request *req = hp->md5_req; unsigned int i; const unsigned int head_data_len = skb_headlen(skb) > header_len ? skb_headlen(skb) - header_len : 0; const struct skb_shared_info *shi = skb_shinfo(skb); struct sk_buff *frag_iter; @@ -4543,16 +4546,20 @@ int tcp_md5_hash_skb_data(struct tcp_md5sig_pool *hp, if (crypto_ahash_update(req)) return 1; }
skb_walk_frags(skb, frag_iter) - if (tcp_md5_hash_skb_data(hp, frag_iter, 0)) + if (tcp_sig_hash_skb_data(req, frag_iter, 0)) return 1;
return 0; } -EXPORT_SYMBOL(tcp_md5_hash_skb_data); +EXPORT_SYMBOL(tcp_sig_hash_skb_data); + +#endif /* defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AUTHOPT) */ + +#ifdef CONFIG_TCP_MD5SIG
int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct tcp_md5sig_key *key) { u8 keylen = READ_ONCE(key->keylen); /* paired with WRITE_ONCE() in tcp_md5_do_add */ struct scatterlist sg; @@ -4639,11 +4646,11 @@ tcp_inbound_md5_hash(const struct sock *sk, const struct sk_buff *skb, } return SKB_NOT_DROPPED_YET; } EXPORT_SYMBOL(tcp_inbound_md5_hash);
-#endif +#endif /* CONFIG_TCP_MD5SIG */
void tcp_done(struct sock *sk) { struct request_sock *req;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index f6d1dba31ca4..8debbd2c2f4b 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1407,11 +1407,11 @@ int tcp_v4_md5_hash_skb(char *md5_hash, const struct tcp_md5sig_key *key, if (crypto_ahash_init(req)) goto clear_hash;
if (tcp_v4_md5_hash_headers(hp, daddr, saddr, th, skb->len)) goto clear_hash; - if (tcp_md5_hash_skb_data(hp, skb, th->doff << 2)) + if (tcp_sig_hash_skb_data(hp->md5_req, skb, th->doff << 2)) goto clear_hash; if (tcp_md5_hash_key(hp, key)) goto clear_hash; ahash_request_set_crypt(req, NULL, md5_hash, 0); if (crypto_ahash_final(req)) diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 35013497e407..2e6333769ea5 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -768,11 +768,11 @@ static int tcp_v6_md5_hash_skb(char *md5_hash, if (crypto_ahash_init(req)) goto clear_hash;
if (tcp_v6_md5_hash_headers(hp, daddr, saddr, th, skb->len)) goto clear_hash; - if (tcp_md5_hash_skb_data(hp, skb, th->doff << 2)) + if (tcp_sig_hash_skb_data(hp->md5_req, skb, th->doff << 2)) goto clear_hash; if (tcp_md5_hash_key(hp, key)) goto clear_hash; ahash_request_set_crypt(req, NULL, md5_hash, 0); if (crypto_ahash_final(req))
Computing tcp authopt packet signatures is a two step process:
 * traffic key is computed based on tcp 4-tuple, initial sequence numbers and the secret key.
 * packet mac is computed based on traffic key and content of individual packets.
The traffic key could be cached for established sockets but it is not.
A single code path exists for ipv4/ipv6 and input/output. This keeps the code short but makes it slightly slower because of the many conditionals.
On output we read the remote IP address from socket members; we can't use the skb network header because it is computed after the TCP options.
On input we read the remote IP address from the skb network headers; we can't use the socket binding members because those are not available for SYN.
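The input to the first of the two steps can be sketched as a byte layout. The example below is an illustration based on my reading of RFC 5925 section 5.2 and RFC 5926 section 3.1.1, not code from this series: `build_kdf_input_v4()` and the `put_*` helpers are hypothetical names, only IPv4 is covered, and the actual keyed hash (HMAC-SHA-1 or AES-128-CMAC with the master key) is left out. It only assembles the bytes that would be fed to that hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append len bytes to buf at *off and advance the offset. */
static void put_bytes(uint8_t *buf, size_t *off, const void *src, size_t len)
{
	memcpy(buf + *off, src, len);
	*off += len;
}

/* Append a 16-bit value in network byte order. */
static void put_be16(uint8_t *buf, size_t *off, uint16_t v)
{
	uint8_t b[2] = { (uint8_t)(v >> 8), (uint8_t)(v & 0xff) };

	put_bytes(buf, off, b, sizeof(b));
}

/* Append a 32-bit value in network byte order. */
static void put_be32(uint8_t *buf, size_t *off, uint32_t v)
{
	uint8_t b[4] = {
		(uint8_t)(v >> 24), (uint8_t)(v >> 16),
		(uint8_t)(v >> 8), (uint8_t)(v & 0xff),
	};

	put_bytes(buf, off, b, sizeof(b));
}

/* Assemble the IPv4 KDF input: i || Label || Context || Output_Length.
 * Context is: source IP, destination IP, source port, destination port,
 * source ISN, destination ISN. buf must hold at least 29 bytes.
 * Returns the number of bytes written.
 */
static size_t build_kdf_input_v4(uint8_t *buf,
				 const uint8_t saddr[4], const uint8_t daddr[4],
				 uint16_t sport, uint16_t dport,
				 uint32_t sisn, uint32_t disn,
				 uint16_t output_bits)
{
	size_t off = 0;

	put_bytes(buf, &off, "\x01TCP-AO", 7); /* counter i = 1, then label */
	put_bytes(buf, &off, saddr, 4);
	put_bytes(buf, &off, daddr, 4);
	put_be16(buf, &off, sport);
	put_be16(buf, &off, dport);
	put_be32(buf, &off, sisn);
	put_be32(buf, &off, disn);
	put_be16(buf, &off, output_bits); /* e.g. 160 for HMAC-SHA-1 */
	return off;
}
```

For a SYN the destination ISN is not known yet, so the caller would pass 0 for `disn`, matching the special case in the quoted tcp_authopt_get_isn().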
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 include/net/tcp_authopt.h |   9 +
 net/ipv4/tcp_authopt.c    | 460 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 465 insertions(+), 4 deletions(-)
diff --git a/include/net/tcp_authopt.h b/include/net/tcp_authopt.h index ed9995c8d486..e303ef53e1a3 100644 --- a/include/net/tcp_authopt.h +++ b/include/net/tcp_authopt.h @@ -68,10 +68,19 @@ struct tcp_authopt_info { u32 src_isn; /** @dst_isn: Remote Initial Sequence Number */ u32 dst_isn; };
+/* TCP authopt as found in header */ +struct tcphdr_authopt { + u8 num; + u8 len; + u8 keyid; + u8 rnextkeyid; + u8 mac[0]; +}; + #ifdef CONFIG_TCP_AUTHOPT DECLARE_STATIC_KEY_FALSE(tcp_authopt_needed_key); #define tcp_authopt_needed (static_branch_unlikely(&tcp_authopt_needed_key)) void tcp_authopt_clear(struct sock *sk); int tcp_set_authopt(struct sock *sk, sockptr_t optval, unsigned int optlen); diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c index 005fac36760b..440d329b52f4 100644 --- a/net/ipv4/tcp_authopt.c +++ b/net/ipv4/tcp_authopt.c @@ -167,30 +167,26 @@ static void tcp_authopt_alg_put_pool(struct tcp_authopt_alg_imp *alg, { WARN_ON(pool != this_cpu_ptr(alg->pool)); local_bh_enable(); }
-__always_unused static struct tcp_authopt_alg_pool *tcp_authopt_get_kdf_pool(struct tcp_authopt_key_info *key) { return tcp_authopt_alg_get_pool(key->alg); }
-__always_unused static void tcp_authopt_put_kdf_pool(struct tcp_authopt_key_info *key, struct tcp_authopt_alg_pool *pool) { return tcp_authopt_alg_put_pool(key->alg, pool); }
-__always_unused static struct tcp_authopt_alg_pool *tcp_authopt_get_mac_pool(struct tcp_authopt_key_info *key) { return tcp_authopt_alg_get_pool(key->alg); }
-__always_unused static void tcp_authopt_put_mac_pool(struct tcp_authopt_key_info *key, struct tcp_authopt_alg_pool *pool) { return tcp_authopt_alg_put_pool(key->alg, pool); } @@ -474,10 +470,466 @@ int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen) mutex_unlock(&net->mutex);
return 0; }
+static int tcp_authopt_get_isn(struct sock *sk, + struct tcp_authopt_info *info, + struct sk_buff *skb, + int input, + __be32 *sisn, + __be32 *disn) +{ + struct tcphdr *th = tcp_hdr(skb); + + /* Special cases for SYN and SYN/ACK */ + if (th->syn && !th->ack) { + *sisn = th->seq; + *disn = 0; + return 0; + } + if (th->syn && th->ack) { + *sisn = th->seq; + *disn = htonl(ntohl(th->ack_seq) - 1); + return 0; + } + + if (sk->sk_state == TCP_NEW_SYN_RECV) { + struct tcp_request_sock *rsk = (struct tcp_request_sock *)sk; + + if (WARN_ONCE(!input, "Caller passed wrong socket")) + return -EINVAL; + *sisn = htonl(rsk->rcv_isn); + *disn = htonl(rsk->snt_isn); + return 0; + } else if (sk->sk_state == TCP_LISTEN) { + /* Signature computation for non-syn packet on a listen + * socket is not possible because we lack the initial + * sequence numbers. + * + * Input segments that are not matched by any request, + * established or timewait socket will get here. These + * are not normally sent by peers. + * + * Their signature might be valid but we don't have + * enough state to determine that. TCP-MD5 can attempt + * to validate and reply with a signed RST because it + * doesn't care about ISNs. + * + * Reporting an error from signature code causes the + * packet to be discarded which is good. 
+ */ + if (WARN_ONCE(!input, "Caller passed wrong socket")) + return -EINVAL; + *sisn = 0; + *disn = 0; + return 0; + } + if (WARN_ONCE(!info, "caller did not pass tcp_authopt_info\n")) + return -EINVAL; + /* Initial sequence numbers for ESTABLISHED connections from info */ + if (input) { + *sisn = htonl(info->dst_isn); + *disn = htonl(info->src_isn); + } else { + *sisn = htonl(info->src_isn); + *disn = htonl(info->dst_isn); + } + return 0; +} + +/* Feed one buffer into ahash + * The buffer is assumed to be DMA-able + */ +static int crypto_ahash_buf(struct ahash_request *req, u8 *buf, uint len) +{ + struct scatterlist sg; + + sg_init_one(&sg, buf, len); + ahash_request_set_crypt(req, &sg, NULL, len); + + return crypto_ahash_update(req); +} + +/* feed traffic key into ahash */ +static int tcp_authopt_ahash_traffic_key(struct tcp_authopt_alg_pool *pool, + struct sock *sk, + struct sk_buff *skb, + struct tcp_authopt_info *info, + bool input, + bool ipv6) +{ + struct tcphdr *th = tcp_hdr(skb); + int err; + __be32 sisn, disn; + __be16 digestbits = htons(crypto_ahash_digestsize(pool->tfm) * 8); + /* For ahash const data buffers don't work so ensure header is on stack */ + char traffic_key_context_header[7] = "\x01TCP-AO"; + + // RFC5926 section 3.1.1.1 + err = crypto_ahash_buf(pool->req, traffic_key_context_header, 7); + if (err) + return err; + + /* Addresses from packet on input and from sk_common on output + * This is because on output MAC is computed before prepending IP header + */ + if (input) { + if (ipv6) + err = crypto_ahash_buf(pool->req, (u8 *)&ipv6_hdr(skb)->saddr, 32); + else + err = crypto_ahash_buf(pool->req, (u8 *)&ip_hdr(skb)->saddr, 8); + if (err) + return err; + } else { + if (ipv6) { +#if IS_ENABLED(CONFIG_IPV6) + err = crypto_ahash_buf(pool->req, (u8 *)&sk->sk_v6_rcv_saddr, 16); + if (err) + return err; + err = crypto_ahash_buf(pool->req, (u8 *)&sk->sk_v6_daddr, 16); + if (err) + return err; +#else + return -EINVAL; +#endif + } else { + err = 
crypto_ahash_buf(pool->req, (u8 *)&sk->sk_rcv_saddr, 4); + if (err) + return err; + err = crypto_ahash_buf(pool->req, (u8 *)&sk->sk_daddr, 4); + if (err) + return err; + } + } + + /* TCP ports from header */ + err = crypto_ahash_buf(pool->req, (u8 *)&th->source, 4); + if (err) + return err; + err = tcp_authopt_get_isn(sk, info, skb, input, &sisn, &disn); + if (err) + return err; + err = crypto_ahash_buf(pool->req, (u8 *)&sisn, 4); + if (err) + return err; + err = crypto_ahash_buf(pool->req, (u8 *)&disn, 4); + if (err) + return err; + err = crypto_ahash_buf(pool->req, (u8 *)&digestbits, 2); + if (err) + return err; + + return 0; +} + +/* Convert a variable-length key to a 16-byte fixed-length key for AES-CMAC + * This is described in RFC5926 section 3.1.1.2 + */ +static int aes_setkey_derived(struct crypto_ahash *tfm, struct ahash_request *req, + u8 *key, size_t keylen) +{ + static const u8 zeros[16] = {0}; + struct scatterlist sg; + u8 derived_key[16]; + int err; + + if (WARN_ON_ONCE(crypto_ahash_digestsize(tfm) != sizeof(derived_key))) + return -EINVAL; + err = crypto_ahash_setkey(tfm, zeros, sizeof(zeros)); + if (err) + return err; + err = crypto_ahash_init(req); + if (err) + return err; + sg_init_one(&sg, key, keylen); + ahash_request_set_crypt(req, &sg, derived_key, keylen); + err = crypto_ahash_digest(req); + if (err) + return err; + return crypto_ahash_setkey(tfm, derived_key, sizeof(derived_key)); +} + +static int tcp_authopt_setkey(struct tcp_authopt_alg_pool *pool, struct tcp_authopt_key_info *key) +{ + if (key->alg_id == TCP_AUTHOPT_ALG_AES_128_CMAC_96 && key->keylen != 16) + return aes_setkey_derived(pool->tfm, pool->req, key->key, key->keylen); + else + return crypto_ahash_setkey(pool->tfm, key->key, key->keylen); +} + +static int tcp_authopt_get_traffic_key(struct sock *sk, + struct sk_buff *skb, + struct tcp_authopt_key_info *key, + struct tcp_authopt_info *info, + bool input, + bool ipv6, + u8 *traffic_key) +{ + struct tcp_authopt_alg_pool *pool; + 
int err; + + pool = tcp_authopt_get_kdf_pool(key); + if (IS_ERR(pool)) + return PTR_ERR(pool); + + err = tcp_authopt_setkey(pool, key); + if (err) + goto out; + err = crypto_ahash_init(pool->req); + if (err) + goto out; + + err = tcp_authopt_ahash_traffic_key(pool, sk, skb, info, input, ipv6); + if (err) + goto out; + + ahash_request_set_crypt(pool->req, NULL, traffic_key, 0); + err = crypto_ahash_final(pool->req); + if (err) + goto out; + +out: + tcp_authopt_put_kdf_pool(key, pool); + return err; +} + +static int crypto_ahash_buf_zero(struct ahash_request *req, int len) +{ + u8 zeros[TCP_AUTHOPT_MACLEN] = {0}; + int buflen, err; + + /* In practice this is always called with len exactly 12. + * Even on input we drop unusual signature sizes early. + */ + while (len) { + buflen = min_t(int, len, sizeof(zeros)); + err = crypto_ahash_buf(req, zeros, buflen); + if (err) + return err; + len -= buflen; + } + + return 0; +} + +static int tcp_authopt_hash_tcp4_pseudoheader(struct tcp_authopt_alg_pool *pool, + __be32 saddr, + __be32 daddr, + int nbytes) +{ + struct tcp4_pseudohdr phdr = { + .saddr = saddr, + .daddr = daddr, + .pad = 0, + .protocol = IPPROTO_TCP, + .len = htons(nbytes) + }; + return crypto_ahash_buf(pool->req, (u8 *)&phdr, sizeof(phdr)); +} + +#if IS_ENABLED(CONFIG_IPV6) +static int tcp_authopt_hash_tcp6_pseudoheader(struct tcp_authopt_alg_pool *pool, + struct in6_addr *saddr, + struct in6_addr *daddr, + u32 plen) +{ + int err; + __be32 buf[2]; + + buf[0] = htonl(plen); + buf[1] = htonl(IPPROTO_TCP); + + err = crypto_ahash_buf(pool->req, (u8 *)saddr, sizeof(*saddr)); + if (err) + return err; + err = crypto_ahash_buf(pool->req, (u8 *)daddr, sizeof(*daddr)); + if (err) + return err; + return crypto_ahash_buf(pool->req, (u8 *)&buf, sizeof(buf)); +} +#endif + +/** Hash tcphdr options. 
+ * + * If include_options is false then only the TCPOPT_AUTHOPT option itself is hashed + * Pointer to AO inside TH is passed by the caller + */ +static int tcp_authopt_hash_opts(struct tcp_authopt_alg_pool *pool, + struct tcphdr *th, + struct tcphdr_authopt *aoptr, + bool include_options) +{ + int err; + /* start of options */ + u8 *tcp_opts = (u8 *)(th + 1); + /* start of AO option */ + u8 *aobuf = (u8 *)aoptr; + u8 aolen = aoptr->len; + + if (WARN_ONCE(aoptr->num != TCPOPT_AUTHOPT, "Bad aoptr\n")) + return -EINVAL; + + if (include_options) { + /* end of options */ + u8 *tcp_data = ((u8 *)th) + th->doff * 4; + + err = crypto_ahash_buf(pool->req, tcp_opts, aobuf - tcp_opts + 4); + if (err) + return err; + err = crypto_ahash_buf_zero(pool->req, aolen - 4); + if (err) + return err; + err = crypto_ahash_buf(pool->req, aobuf + aolen, tcp_data - (aobuf + aolen)); + if (err) + return err; + } else { + err = crypto_ahash_buf(pool->req, aobuf, 4); + if (err) + return err; + err = crypto_ahash_buf_zero(pool->req, aolen - 4); + if (err) + return err; + } + + return 0; +} + +static int tcp_authopt_hash_packet(struct tcp_authopt_alg_pool *pool, + struct sock *sk, + struct sk_buff *skb, + struct tcphdr_authopt *aoptr, + struct tcp_authopt_info *info, + bool input, + bool ipv6, + bool include_options, + u8 *macbuf) +{ + struct tcphdr *th = tcp_hdr(skb); + int err; + + /* NOTE: SNE unimplemented */ + __be32 sne = 0; + + err = crypto_ahash_init(pool->req); + if (err) + return err; + + err = crypto_ahash_buf(pool->req, (u8 *)&sne, 4); + if (err) + return err; + + if (ipv6) { +#if IS_ENABLED(CONFIG_IPV6) + struct in6_addr *saddr; + struct in6_addr *daddr; + + if (input) { + saddr = &ipv6_hdr(skb)->saddr; + daddr = &ipv6_hdr(skb)->daddr; + } else { + saddr = &sk->sk_v6_rcv_saddr; + daddr = &sk->sk_v6_daddr; + } + err = tcp_authopt_hash_tcp6_pseudoheader(pool, saddr, daddr, skb->len); + if (err) + return err; +#else + return -EINVAL; +#endif + } else { + __be32 saddr; + __be32 daddr; + 
+ if (input) { + saddr = ip_hdr(skb)->saddr; + daddr = ip_hdr(skb)->daddr; + } else { + saddr = sk->sk_rcv_saddr; + daddr = sk->sk_daddr; + } + err = tcp_authopt_hash_tcp4_pseudoheader(pool, saddr, daddr, skb->len); + if (err) + return err; + } + + // TCP header with checksum set to zero + { + struct tcphdr hashed_th = *th; + + hashed_th.check = 0; + err = crypto_ahash_buf(pool->req, (u8 *)&hashed_th, sizeof(hashed_th)); + if (err) + return err; + } + + // TCP options + err = tcp_authopt_hash_opts(pool, th, aoptr, include_options); + if (err) + return err; + + // Rest of SKB->data + err = tcp_sig_hash_skb_data(pool->req, skb, th->doff << 2); + if (err) + return err; + + ahash_request_set_crypt(pool->req, NULL, macbuf, 0); + return crypto_ahash_final(pool->req); +} + +/* __tcp_authopt_calc_mac - Compute packet MAC using key + * + * The macbuf output buffer must be large enough to fit the digestsize of the + * underlying transform before truncation. + * This means TCP_AUTHOPT_MAXMACBUF, not TCP_AUTHOPT_MACLEN + */ +__always_unused +static int __tcp_authopt_calc_mac(struct sock *sk, + struct sk_buff *skb, + struct tcphdr_authopt *aoptr, + struct tcp_authopt_key_info *key, + struct tcp_authopt_info *info, + bool input, + char *macbuf) +{ + struct tcp_authopt_alg_pool *mac_pool; + u8 traffic_key[TCP_AUTHOPT_MAX_TRAFFIC_KEY_LEN]; + int err; + bool ipv6 = (sk->sk_family != AF_INET); + + if (sk->sk_family != AF_INET && sk->sk_family != AF_INET6) + return -EINVAL; + + err = tcp_authopt_get_traffic_key(sk, skb, key, info, input, ipv6, traffic_key); + if (err) + return err; + + mac_pool = tcp_authopt_get_mac_pool(key); + if (IS_ERR(mac_pool)) + return PTR_ERR(mac_pool); + err = crypto_ahash_setkey(mac_pool->tfm, traffic_key, key->alg->traffic_key_len); + if (err) + goto out; + err = crypto_ahash_init(mac_pool->req); + if (err) + goto out; + + err = tcp_authopt_hash_packet(mac_pool, + sk, + skb, + aoptr, + info, + input, + ipv6, + !(key->flags & 
TCP_AUTHOPT_KEY_EXCLUDE_OPTS), + macbuf); + +out: + tcp_authopt_put_mac_pool(key, mac_pool); + return err; +} + static int tcp_authopt_init_net(struct net *full_net) { struct netns_tcp_authopt *net = &full_net->tcp_authopt;
mutex_init(&net->mutex);
The TCP-MD5 and TCP-AO signature options must be handled together, so replace the old tcp_inbound_md5_hash with tcp_inbound_sig_hash, which handles both options.
As a side effect of this change Linux will start dropping packets where both MD5 and AO are present instead of ignoring the so-far unrecognized AO option. This is a direct requirement of RFC5925 section 2.2.
This difference can be detected remotely without ever establishing a connection and can be used to fingerprint the Linux version. This seems acceptable.
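The combined parsing rule can be sketched in userspace C as below. It mirrors the loop in tcp_parse_sig_options but works on a plain byte buffer holding only the TCP options. The function name, the enum and the OPT_* macros are hypothetical names for this example; 19 and 29 are the option kinds assigned to TCP-MD5 and TCP-AO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OPT_EOL		0
#define OPT_NOP		1
#define OPT_MD5SIG	19	/* TCP-MD5 option kind */
#define OPTLEN_MD5SIG	18	/* 2 byte header + 16 byte digest */
#define OPT_AUTHOPT	29	/* TCP-AO option kind */

enum sig_verdict {
	SIG_NOT_DROPPED = 0,
	SIG_DROP_BOTH_AO_MD5 = 1,
};

/* Scan a TCP option area of 'length' bytes.
 *
 * md5ptr: set to the content of the MD5 option (16-byte hash) or NULL
 * aoptr:  set to the start of the AO option (kind byte) or NULL
 */
enum sig_verdict parse_sig_options(const uint8_t *opts, int length,
				   const uint8_t **md5ptr,
				   const uint8_t **aoptr)
{
	const uint8_t *ptr = opts;

	assert(opts && md5ptr && aoptr);
	*md5ptr = NULL;
	*aoptr = NULL;

	/* Neither option fits in fewer than 4 bytes */
	while (length >= 4) {
		int opcode = *ptr++;
		int opsize;

		if (opcode == OPT_EOL)
			break;
		if (opcode == OPT_NOP) {
			length--;
			continue;
		}
		opsize = *ptr++;
		if (opsize < 2 || opsize > length)
			break;
		if (opcode == OPT_MD5SIG && opsize == OPTLEN_MD5SIG)
			*md5ptr = ptr;
		if (opcode == OPT_AUTHOPT)
			*aoptr = ptr - 2;
		ptr += opsize - 2;
		length -= opsize;
	}

	/* RFC5925 2.2: segments carrying both options are discarded */
	if (*md5ptr && *aoptr)
		return SIG_DROP_BOTH_AO_MD5;
	return SIG_NOT_DROPPED;
}
```

Note that the scan does not stop at the first signature option found; it has to see the whole option area before it can decide that both are present.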
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 include/net/dropreason.h |  4 ++++
 include/net/tcp.h        | 52 +++++++++++++++++++++++++++++-----------
 net/ipv4/tcp.c           | 39 ++++++++++++++++++++++++++----
 net/ipv4/tcp_input.c     | 39 ++++++++++++++++++++++--------
 net/ipv4/tcp_ipv4.c      | 12 +++++-----
 net/ipv6/tcp_ipv6.c      |  8 +++----
 6 files changed, 115 insertions(+), 39 deletions(-)
diff --git a/include/net/dropreason.h b/include/net/dropreason.h index fae9b40e54fa..c5397c24296c 100644 --- a/include/net/dropreason.h +++ b/include/net/dropreason.h @@ -229,10 +229,14 @@ enum skb_drop_reason { /** * @SKB_DROP_REASON_PKT_TOO_BIG: packet size is too big (maybe exceed the * MTU) */ SKB_DROP_REASON_PKT_TOO_BIG, + /** + * @SKB_DROP_REASON_TCP_BOTHAOMD5: Both AO and MD5 found in packet. + */ + SKB_DROP_REASON_TCP_BOTHAOMD5, /** * @SKB_DROP_REASON_MAX: the maximum of drop reason, which shouldn't be * used as a real 'reason' */ SKB_DROP_REASON_MAX, diff --git a/include/net/tcp.h b/include/net/tcp.h index fbe18b5bf576..96e7e406e324 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -424,11 +424,33 @@ int tcp_mmap(struct file *file, struct socket *sock, struct vm_area_struct *vma); #endif void tcp_parse_options(const struct net *net, const struct sk_buff *skb, struct tcp_options_received *opt_rx, int estab, struct tcp_fastopen_cookie *foc); -const u8 *tcp_parse_md5sig_option(const struct tcphdr *th); +#if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AUTHOPT) +enum skb_drop_reason tcp_parse_sig_options(const struct tcphdr *th, + const u8 **md5ptr, + const u8 **aoptr); +#else +static inline enum skb_drop_reason tcp_parse_sig_options(const struct tcphdr *th, + const u8 **md5ptr, + const u8 **aoptr) +{ + *aoptr = NULL; + *md5ptr = NULL; + return 0; +} +#endif +static inline const u8 *tcp_parse_md5sig_option(const struct tcphdr *th) +{ + const u8 *md5, *ao; + int ret; + + ret = tcp_parse_sig_options(th, &md5, &ao); + + return (md5 && !ao && !ret) ? md5 : NULL; +}
/* * BPF SKB-less helpers */ u16 tcp_v4_get_syncookie(struct sock *sk, struct iphdr *iph, @@ -1685,32 +1707,19 @@ tcp_md5_do_lookup(const struct sock *sk, int l3index, if (!static_branch_unlikely(&tcp_md5_needed)) return NULL; return __tcp_md5_do_lookup(sk, l3index, addr, family); }
-enum skb_drop_reason -tcp_inbound_md5_hash(const struct sock *sk, const struct sk_buff *skb, - const void *saddr, const void *daddr, - int family, int dif, int sdif); - - #define tcp_twsk_md5_key(twsk) ((twsk)->tw_md5_key) #else static inline struct tcp_md5sig_key * tcp_md5_do_lookup(const struct sock *sk, int l3index, const union tcp_md5_addr *addr, int family) { return NULL; }
-static inline enum skb_drop_reason -tcp_inbound_md5_hash(const struct sock *sk, const struct sk_buff *skb, - const void *saddr, const void *daddr, - int family, int dif, int sdif) -{ - return SKB_NOT_DROPPED_YET; -} #define tcp_twsk_md5_key(twsk) NULL #endif
bool tcp_alloc_md5sig_pool(void);
@@ -1723,10 +1732,25 @@ static inline void tcp_put_md5sig_pool(void) int tcp_sig_hash_skb_data(struct ahash_request *, const struct sk_buff *, unsigned int header_len); int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct tcp_md5sig_key *key);
+#if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AUTHOPT) +enum skb_drop_reason +tcp_inbound_sig_hash(const struct sock *sk, const struct sk_buff *skb, + const void *saddr, const void *daddr, + int family, int dif, int sdif); +#else +static inline enum skb_drop_reason +tcp_inbound_sig_hash(const struct sock *sk, const struct sk_buff *skb, + const void *saddr, const void *daddr, + int family, int dif, int sdif) +{ + return SKB_NOT_DROPPED_YET; +} +#endif + /* From tcp_fastopen.c */ void tcp_fastopen_cache_get(struct sock *sk, u16 *mss, struct tcp_fastopen_cookie *cookie); void tcp_fastopen_cache_set(struct sock *sk, u16 mss, struct tcp_fastopen_cookie *cookie, bool syn_lost, diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 9c362f357fbb..d159f1b66930 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -4571,24 +4571,24 @@ int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct tcp_md5sig_key *ke return data_race(crypto_ahash_update(hp->md5_req)); } EXPORT_SYMBOL(tcp_md5_hash_key);
/* Called with rcu_read_lock() */ -enum skb_drop_reason +static enum skb_drop_reason tcp_inbound_md5_hash(const struct sock *sk, const struct sk_buff *skb, const void *saddr, const void *daddr, - int family, int dif, int sdif) + int family, int dif, int sdif, + const u8 *hash_location) { /* * This gets called for each TCP segment that arrives * so we want to be efficient. * We have 3 drop cases: * o No MD5 hash and one expected. * o MD5 hash and we're not expecting one. * o MD5 hash and its wrong. */ - const __u8 *hash_location = NULL; struct tcp_md5sig_key *hash_expected; const struct tcphdr *th = tcp_hdr(skb); struct tcp_sock *tp = tcp_sk(sk); int genhash, l3index; u8 newhash[16]; @@ -4597,11 +4597,10 @@ tcp_inbound_md5_hash(const struct sock *sk, const struct sk_buff *skb, * in an L3 domain and dif is set to the l3mdev */ l3index = sdif ? dif : 0;
hash_expected = tcp_md5_do_lookup(sk, l3index, saddr, family); - hash_location = tcp_parse_md5sig_option(th);
/* We've parsed the options - do we have a hash? */ if (!hash_expected && !hash_location) return SKB_NOT_DROPPED_YET;
@@ -4644,14 +4643,44 @@ tcp_inbound_md5_hash(const struct sock *sk, const struct sk_buff *skb, } return SKB_DROP_REASON_TCP_MD5FAILURE; } return SKB_NOT_DROPPED_YET; } -EXPORT_SYMBOL(tcp_inbound_md5_hash);
#endif /* CONFIG_TCP_MD5SIG */
+#if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AUTHOPT) + +enum skb_drop_reason +tcp_inbound_sig_hash(const struct sock *sk, const struct sk_buff *skb, + const void *saddr, const void *daddr, + int family, int dif, int sdif) +{ + /* FIXME: Restore reqsk handling */ + const u8 *md5, *ao; + enum skb_drop_reason ret; + const struct sock *parent_sk; + + if (sk->sk_state == TCP_NEW_SYN_RECV) + parent_sk = inet_reqsk(sk)->rsk_listener; + else + parent_sk = sk; + + ret = tcp_parse_sig_options(tcp_hdr(skb), &md5, &ao); + if (ret) + return ret; + +#ifdef CONFIG_TCP_MD5SIG + return tcp_inbound_md5_hash(parent_sk, skb, saddr, daddr, family, dif, sdif, md5); +#else + return SKB_NOT_DROPPED_YET; +#endif +} +EXPORT_SYMBOL(tcp_inbound_sig_hash); + +#endif /* defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AUTHOPT) */ + void tcp_done(struct sock *sk) { struct request_sock *req;
/* We might be called with a new socket, after diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index b85a9f755da4..a6b43fb954b7 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4187,43 +4187,62 @@ static bool tcp_fast_parse_options(const struct net *net, tp->rx_opt.rcv_tsecr -= tp->tsoffset;
return true; }
-#ifdef CONFIG_TCP_MD5SIG +#if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AUTHOPT) /* - * Parse MD5 Signature option + * Parse MD5 and AO options + * + * md5ptr: pointer to content of MD5 option (16-byte hash) + * aoptr: pointer to start of AO option (variable length) */ -const u8 *tcp_parse_md5sig_option(const struct tcphdr *th) +enum skb_drop_reason tcp_parse_sig_options(const struct tcphdr *th, + const u8 **md5ptr, + const u8 **aoptr) { int length = (th->doff << 2) - sizeof(*th); const u8 *ptr = (const u8 *)(th + 1);
+ *md5ptr = NULL; + *aoptr = NULL; + /* If not enough data remaining, we can short cut */ - while (length >= TCPOLEN_MD5SIG) { + while (length >= 4) { int opcode = *ptr++; int opsize;
switch (opcode) { case TCPOPT_EOL: - return NULL; + goto out; case TCPOPT_NOP: length--; continue; default: opsize = *ptr++; if (opsize < 2 || opsize > length) - return NULL; - if (opcode == TCPOPT_MD5SIG) - return opsize == TCPOLEN_MD5SIG ? ptr : NULL; + goto out; + if (opcode == TCPOPT_MD5SIG && opsize == TCPOLEN_MD5SIG) + *md5ptr = ptr; + if (opcode == TCPOPT_AUTHOPT) + *aoptr = ptr - 2; } ptr += opsize - 2; length -= opsize; } - return NULL; + +out: + /* RFC5925 2.2: An endpoint MUST NOT use TCP-AO for the same connection + * in which TCP MD5 is used. When both options appear, TCP MUST silently + * discard the segment. + */ + if (*md5ptr && *aoptr) + return SKB_DROP_REASON_TCP_BOTHAOMD5; + + return SKB_NOT_DROPPED_YET; } -EXPORT_SYMBOL(tcp_parse_md5sig_option); +EXPORT_SYMBOL(tcp_parse_sig_options); #endif
/* Sorry, PAWS as specified is broken wrt. pure-ACKs -DaveM * * It is not fatal. If this ACK does _not_ change critical state (seqs, window) diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 8debbd2c2f4b..05939e696dd6 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1983,17 +1983,17 @@ int tcp_v4_rcv(struct sk_buff *skb) if (sk->sk_state == TCP_NEW_SYN_RECV) { struct request_sock *req = inet_reqsk(sk); bool req_stolen = false; struct sock *nsk;
- sk = req->rsk_listener; - if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb)) + if (!xfrm4_policy_check(req->rsk_listener, XFRM_POLICY_IN, skb)) drop_reason = SKB_DROP_REASON_XFRM_POLICY; else - drop_reason = tcp_inbound_md5_hash(sk, skb, - &iph->saddr, &iph->daddr, - AF_INET, dif, sdif); + drop_reason = tcp_inbound_sig_hash(sk, skb, + &iph->saddr, &iph->daddr, + AF_INET, dif, sdif); + sk = req->rsk_listener; if (unlikely(drop_reason)) { sk_drops_add(sk, skb); reqsk_put(req); goto discard_it; } @@ -2065,11 +2065,11 @@ int tcp_v4_rcv(struct sk_buff *skb) if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb)) { drop_reason = SKB_DROP_REASON_XFRM_POLICY; goto discard_and_relse; }
- drop_reason = tcp_inbound_md5_hash(sk, skb, &iph->saddr, + drop_reason = tcp_inbound_sig_hash(sk, skb, &iph->saddr, &iph->daddr, AF_INET, dif, sdif); if (drop_reason) goto discard_and_relse;
nf_reset_ct(skb); diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 2e6333769ea5..8969aee822d5 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1651,14 +1651,14 @@ INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb) if (sk->sk_state == TCP_NEW_SYN_RECV) { struct request_sock *req = inet_reqsk(sk); bool req_stolen = false; struct sock *nsk;
- sk = req->rsk_listener; - drop_reason = tcp_inbound_md5_hash(sk, skb, + drop_reason = tcp_inbound_sig_hash(sk, skb, &hdr->saddr, &hdr->daddr, AF_INET6, dif, sdif); + sk = req->rsk_listener; if (drop_reason) { sk_drops_add(sk, skb); reqsk_put(req); goto discard_it; } @@ -1726,12 +1726,12 @@ INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb) if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb)) { drop_reason = SKB_DROP_REASON_XFRM_POLICY; goto discard_and_relse; }
- drop_reason = tcp_inbound_md5_hash(sk, skb, &hdr->saddr, &hdr->daddr, - AF_INET6, dif, sdif); + drop_reason = tcp_inbound_sig_hash(sk, skb, &hdr->saddr, + &hdr->daddr, AF_INET6, dif, sdif); if (drop_reason) goto discard_and_relse;
if (tcp_filter(sk, skb)) { drop_reason = SKB_DROP_REASON_SOCKET_FILTER;
The tcp_authopt feature exposes a minimal interface to the rest of the TCP stack. Only a few functions are exposed and, if the feature is disabled, they return neutral values, avoiding ifdefs in the rest of the code. This approach is different from MD5.
Add calls into tcp_authopt from the send, receive, accept and close code.
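The neutral-stub pattern can be sketched generically as follows. Every name here (DEMO_CONFIG_AUTH, struct demo_conn, demo_auth_inbound_check, demo_receive) is hypothetical and chosen for this example only; the real declarations are in include/net/tcp_authopt.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical connection state; stands in for struct sock */
struct demo_conn {
	int signature_ok;	/* stand-in for the result of a real MAC check */
};

/* Define DEMO_CONFIG_AUTH at build time to compile the feature in */
#ifdef DEMO_CONFIG_AUTH
static inline int demo_auth_inbound_check(const struct demo_conn *c)
{
	return c->signature_ok ? 0 : -1;
}
#else
/* Feature compiled out: neutral stub, nothing is ever rejected */
static inline int demo_auth_inbound_check(const struct demo_conn *c)
{
	(void)c;
	return 0;
}
#endif

/* The caller contains no #ifdef and is identical in both configurations */
int demo_receive(const struct demo_conn *c)
{
	int err;

	assert(c != NULL);
	err = demo_auth_inbound_check(c);
	if (err)
		return err;
	return 0;
}
```

With the feature disabled the compiler can inline the stub and remove the check entirely, so callers pay nothing for the unconditional call.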
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 include/net/dropreason.h  |  12 ++
 include/net/tcp_authopt.h |  85 +++++++++++
 include/uapi/linux/snmp.h |   1 +
 net/ipv4/proc.c           |   1 +
 net/ipv4/tcp.c            |  17 +++
 net/ipv4/tcp_authopt.c    | 298 +++++++++++++++++++++++++++++++++++++-
 net/ipv4/tcp_input.c      |   3 +
 net/ipv4/tcp_minisocks.c  |  12 ++
 net/ipv4/tcp_output.c     |  85 ++++++++++-
 9 files changed, 511 insertions(+), 3 deletions(-)
diff --git a/include/net/dropreason.h b/include/net/dropreason.h index c5397c24296c..d5dd92affde8 100644 --- a/include/net/dropreason.h +++ b/include/net/dropreason.h @@ -233,10 +233,22 @@ enum skb_drop_reason { SKB_DROP_REASON_PKT_TOO_BIG, /** * @SKB_DROP_REASON_TCP_BOTHAOMD5: Both AO and MD5 found in packet. */ SKB_DROP_REASON_TCP_BOTHAOMD5, + /** + * @SKB_DROP_REASON_TCP_AONOTFOUND: No AO signature and one expected. + */ + SKB_DROP_REASON_TCP_AONOTFOUND, + /** + * @SKB_DROP_REASON_TCP_AOUNEXPECTED: AO hash and we're not expecting + */ + SKB_DROP_REASON_TCP_AOUNEXPECTED, + /** + * @SKB_DROP_REASON_TCP_AOFAILURE: AO hash incorrect + */ + SKB_DROP_REASON_TCP_AOFAILURE, /** * @SKB_DROP_REASON_MAX: the maximum of drop reason, which shouldn't be * used as a real 'reason' */ SKB_DROP_REASON_MAX, diff --git a/include/net/tcp_authopt.h b/include/net/tcp_authopt.h index e303ef53e1a3..7ad34a6987ec 100644 --- a/include/net/tcp_authopt.h +++ b/include/net/tcp_authopt.h @@ -80,16 +80,101 @@ struct tcphdr_authopt { };
#ifdef CONFIG_TCP_AUTHOPT DECLARE_STATIC_KEY_FALSE(tcp_authopt_needed_key); #define tcp_authopt_needed (static_branch_unlikely(&tcp_authopt_needed_key)) +void tcp_authopt_free(struct sock *sk, struct tcp_authopt_info *info); void tcp_authopt_clear(struct sock *sk); int tcp_set_authopt(struct sock *sk, sockptr_t optval, unsigned int optlen); int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *key); int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen); +struct tcp_authopt_key_info *__tcp_authopt_select_key( + const struct sock *sk, + struct tcp_authopt_info *info, + const struct sock *addr_sk, + u8 *rnextkeyid); +static inline struct tcp_authopt_key_info *tcp_authopt_select_key( + const struct sock *sk, + const struct sock *addr_sk, + struct tcp_authopt_info **info, + u8 *rnextkeyid) +{ + if (tcp_authopt_needed) { + *info = rcu_dereference(tcp_sk(sk)->authopt_info); + + if (*info) + return __tcp_authopt_select_key(sk, *info, addr_sk, rnextkeyid); + } + return NULL; +} +int tcp_authopt_hash( + char *hash_location, + struct tcp_authopt_key_info *key, + struct tcp_authopt_info *info, + struct sock *sk, struct sk_buff *skb); +int __tcp_authopt_openreq(struct sock *newsk, const struct sock *oldsk, struct request_sock *req); +static inline int tcp_authopt_openreq( + struct sock *newsk, + const struct sock *oldsk, + struct request_sock *req) +{ + if (!rcu_dereference(tcp_sk(oldsk)->authopt_info)) + return 0; + else + return __tcp_authopt_openreq(newsk, oldsk, req); +} +void __tcp_authopt_finish_connect(struct sock *sk, struct sk_buff *skb, + struct tcp_authopt_info *info); +static inline void tcp_authopt_finish_connect(struct sock *sk, struct sk_buff *skb) +{ + struct tcp_authopt_info *info; + + if (skb && tcp_authopt_needed) { + info = rcu_dereference_protected(tcp_sk(sk)->authopt_info, + lockdep_sock_is_held(sk)); + + if (info) + __tcp_authopt_finish_connect(sk, skb, info); + } +} +static inline void tcp_authopt_time_wait( + struct 
tcp_timewait_sock *tcptw, + struct tcp_sock *tp) +{ + if (tcp_authopt_needed) { + /* Transfer ownership of authopt_info to the twsk + * This requires no other users of the origin sock. + */ + tcptw->tw_authopt_info = rcu_dereference_protected( + tp->authopt_info, + lockdep_sock_is_held((struct sock *)tp)); + rcu_assign_pointer(tp->authopt_info, NULL); + } else { + tcptw->tw_authopt_info = NULL; + } +} +int __tcp_authopt_inbound_check( + struct sock *sk, + struct sk_buff *skb, + struct tcp_authopt_info *info, + const u8 *opt); #else static inline void tcp_authopt_clear(struct sock *sk) { } +static inline int tcp_authopt_openreq(struct sock *newsk, + const struct sock *oldsk, + struct request_sock *req) +{ + return 0; +} +static inline void tcp_authopt_finish_connect(struct sock *sk, struct sk_buff *skb) +{ +} +static inline void tcp_authopt_time_wait( + struct tcp_timewait_sock *tcptw, + struct tcp_sock *tp) +{ +} #endif
#endif /* _LINUX_TCP_AUTHOPT_H */ diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h index 4d7470036a8b..ae2738a7992b 100644 --- a/include/uapi/linux/snmp.h +++ b/include/uapi/linux/snmp.h @@ -290,10 +290,11 @@ enum LINUX_MIB_TCPDUPLICATEDATAREHASH, /* TCPDuplicateDataRehash */ LINUX_MIB_TCPDSACKRECVSEGS, /* TCPDSACKRecvSegs */ LINUX_MIB_TCPDSACKIGNOREDDUBIOUS, /* TCPDSACKIgnoredDubious */ LINUX_MIB_TCPMIGRATEREQSUCCESS, /* TCPMigrateReqSuccess */ LINUX_MIB_TCPMIGRATEREQFAILURE, /* TCPMigrateReqFailure */ + LINUX_MIB_TCPAUTHOPTFAILURE, /* TCPAuthOptFailure */ __LINUX_MIB_MAX };
/* linux Xfrm mib definitions */ enum diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c index 0088a4c64d77..e48c7245c571 100644 --- a/net/ipv4/proc.c +++ b/net/ipv4/proc.c @@ -295,10 +295,11 @@ static const struct snmp_mib snmp4_net_list[] = { SNMP_MIB_ITEM("TcpDuplicateDataRehash", LINUX_MIB_TCPDUPLICATEDATAREHASH), SNMP_MIB_ITEM("TCPDSACKRecvSegs", LINUX_MIB_TCPDSACKRECVSEGS), SNMP_MIB_ITEM("TCPDSACKIgnoredDubious", LINUX_MIB_TCPDSACKIGNOREDDUBIOUS), SNMP_MIB_ITEM("TCPMigrateReqSuccess", LINUX_MIB_TCPMIGRATEREQSUCCESS), SNMP_MIB_ITEM("TCPMigrateReqFailure", LINUX_MIB_TCPMIGRATEREQFAILURE), + SNMP_MIB_ITEM("TCPAuthOptFailure", LINUX_MIB_TCPAUTHOPTFAILURE), SNMP_MIB_SENTINEL };
static void icmpmsg_put_line(struct seq_file *seq, unsigned long *vals, unsigned short *type, int count) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index d159f1b66930..dd31e78bd22d 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -4667,10 +4667,27 @@ tcp_inbound_sig_hash(const struct sock *sk, const struct sk_buff *skb,
ret = tcp_parse_sig_options(tcp_hdr(skb), &md5, &ao); if (ret) return ret;
+#if defined(CONFIG_TCP_AUTHOPT) + if (tcp_authopt_needed) { + struct tcp_authopt_info *info = rcu_dereference(tcp_sk(parent_sk)->authopt_info); + int aoret; + + if (info) { + aoret = __tcp_authopt_inbound_check((struct sock *)sk, + (struct sk_buff *)skb, + info, ao); + /* Don't do MD5 lookup if AO found */ + if (aoret == 1) + return SKB_NOT_DROPPED_YET; + if (aoret < 0) + return -aoret; + } + } +#endif #ifdef CONFIG_TCP_MD5SIG return tcp_inbound_md5_hash(parent_sk, skb, saddr, daddr, family, dif, sdif, md5); #else return SKB_NOT_DROPPED_YET; #endif diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c index 440d329b52f4..4f7cbe1e17f3 100644 --- a/net/ipv4/tcp_authopt.c +++ b/net/ipv4/tcp_authopt.c @@ -268,10 +268,57 @@ static bool tcp_authopt_key_match_exact(struct tcp_authopt_key_info *info, return false;
return true; }
+static bool tcp_authopt_key_match_skb_addr(struct tcp_authopt_key_info *key, + struct sk_buff *skb) +{ + u16 keyaf = key->addr.ss_family; + struct iphdr *iph = (struct iphdr *)skb_network_header(skb); + + if (keyaf == AF_INET && iph->version == 4) { + struct sockaddr_in *key_addr = (struct sockaddr_in *)&key->addr; + + return iph->saddr == key_addr->sin_addr.s_addr; + } else if (keyaf == AF_INET6 && iph->version == 6) { + struct ipv6hdr *ip6h = (struct ipv6hdr *)skb_network_header(skb); + struct sockaddr_in6 *key_addr = (struct sockaddr_in6 *)&key->addr; + + return ipv6_addr_equal(&ip6h->saddr, &key_addr->sin6_addr); + } + + /* This actually happens with ipv6-mapped-ipv4-addresses + * IPv6 listen sockets will be asked to validate ipv4 packets. + */ + return false; +} + +static bool tcp_authopt_key_match_sk_addr(struct tcp_authopt_key_info *key, + const struct sock *addr_sk) +{ + u16 keyaf = key->addr.ss_family; + + /* This probably can't happen even with ipv4-mapped-ipv6 */ + if (keyaf != addr_sk->sk_family) + return false; + + if (keyaf == AF_INET) { + struct sockaddr_in *key_addr = (struct sockaddr_in *)&key->addr; + + return addr_sk->sk_daddr == key_addr->sin_addr.s_addr; +#if IS_ENABLED(CONFIG_IPV6) + } else if (keyaf == AF_INET6) { + struct sockaddr_in6 *key_addr = (struct sockaddr_in6 *)&key->addr; + + return ipv6_addr_equal(&addr_sk->sk_v6_daddr, &key_addr->sin6_addr); +#endif + } + + return false; +} + static struct tcp_authopt_key_info *tcp_authopt_key_lookup_exact(const struct sock *sk, struct netns_tcp_authopt *net, struct tcp_authopt_key *ukey) { struct tcp_authopt_key_info *key_info; @@ -281,10 +328,59 @@ static struct tcp_authopt_key_info *tcp_authopt_key_lookup_exact(const struct so return key_info;
return NULL; }
+/** + * tcp_authopt_lookup_send - lookup key for sending + * + * @net: Per-namespace information containing keys + * @addr_sk: Socket used for destination address lookup + * + * If anykey is false then authentication is not required for peer. + * + * If anykey is true but no key was found then all our keys must be expired and sending should fail. + */ +static struct tcp_authopt_key_info *tcp_authopt_lookup_send(struct netns_tcp_authopt *net, + const struct sock *addr_sk) +{ + struct tcp_authopt_key_info *result = NULL; + struct tcp_authopt_key_info *key; + + hlist_for_each_entry_rcu(key, &net->head, node, 0) { + if (key->flags & TCP_AUTHOPT_KEY_ADDR_BIND) + if (!tcp_authopt_key_match_sk_addr(key, addr_sk)) + continue; + if (result && net_ratelimit()) + pr_warn("ambiguous tcp authentication keys configured for send\n"); + result = key; + } + + return result; +} + +/** + * __tcp_authopt_select_key - select key for sending + * + * @sk: socket + * @info: socket's tcp_authopt_info + * @addr_sk: socket used for address lookup. Same as sk except for synack case + * @rnextkeyid: value of rnextkeyid caller should write in packet + * + * Result is protected by RCU and can't be stored, it may only be passed to + * tcp_authopt_hash and only under a single rcu_read_lock. + */ +struct tcp_authopt_key_info *__tcp_authopt_select_key(const struct sock *sk, + struct tcp_authopt_info *info, + const struct sock *addr_sk, + u8 *rnextkeyid) +{ + struct netns_tcp_authopt *net = sock_net_tcp_authopt(sk); + + return tcp_authopt_lookup_send(net, addr_sk); +} + static struct tcp_authopt_info *__tcp_authopt_info_get_or_create(struct sock *sk) { struct tcp_sock *tp = tcp_sk(sk); struct tcp_authopt_info *info;
@@ -548,10 +644,45 @@ static int crypto_ahash_buf(struct ahash_request *req, u8 *buf, uint len) ahash_request_set_crypt(req, &sg, NULL, len);
return crypto_ahash_update(req); }
+/** Called to create accepted sockets. + * + * Need to copy authopt info from listen socket. + */ +int __tcp_authopt_openreq(struct sock *newsk, const struct sock *oldsk, struct request_sock *req) +{ + struct tcp_authopt_info *old_info; + struct tcp_authopt_info *new_info; + + old_info = rcu_dereference(tcp_sk(oldsk)->authopt_info); + if (!old_info) + return 0; + + /* Clear value copies from oldsk: */ + rcu_assign_pointer(tcp_sk(newsk)->authopt_info, NULL); + + new_info = kzalloc(sizeof(*new_info), GFP_ATOMIC); + if (!new_info) + return -ENOMEM; + + new_info->src_isn = tcp_rsk(req)->snt_isn; + new_info->dst_isn = tcp_rsk(req)->rcv_isn; + sk_gso_disable(newsk); + rcu_assign_pointer(tcp_sk(newsk)->authopt_info, new_info); + + return 0; +} + +void __tcp_authopt_finish_connect(struct sock *sk, struct sk_buff *skb, + struct tcp_authopt_info *info) +{ + info->src_isn = ntohl(tcp_hdr(skb)->ack_seq) - 1; + info->dst_isn = ntohl(tcp_hdr(skb)->seq); +} + /* feed traffic key into ahash */ static int tcp_authopt_ahash_traffic_key(struct tcp_authopt_alg_pool *pool, struct sock *sk, struct sk_buff *skb, struct tcp_authopt_info *info, @@ -880,11 +1011,10 @@ static int tcp_authopt_hash_packet(struct tcp_authopt_alg_pool *pool, * * The macbuf output buffer must be large enough to fit the digestsize of the * underlying transform before truncation. * This means TCP_AUTHOPT_MAXMACBUF, not TCP_AUTHOPT_MACLEN */ -__always_unused static int __tcp_authopt_calc_mac(struct sock *sk, struct sk_buff *skb, struct tcphdr_authopt *aoptr, struct tcp_authopt_key_info *key, struct tcp_authopt_info *info, @@ -926,10 +1056,176 @@ static int __tcp_authopt_calc_mac(struct sock *sk, out: tcp_authopt_put_mac_pool(key, mac_pool); return err; }
+/* tcp_authopt_hash - fill in the mac + * + * The key must come from tcp_authopt_select_key. + */ +int tcp_authopt_hash(char *hash_location, + struct tcp_authopt_key_info *key, + struct tcp_authopt_info *info, + struct sock *sk, + struct sk_buff *skb) +{ + /* MAC inside option is truncated to 12 bytes but crypto API needs output + * buffer to be large enough so we use a buffer on the stack. + */ + u8 macbuf[TCP_AUTHOPT_MAXMACBUF]; + int err; + struct tcphdr_authopt *aoptr = (struct tcphdr_authopt *)(hash_location - 4); + + err = __tcp_authopt_calc_mac(sk, skb, aoptr, key, info, false, macbuf); + if (err) + goto fail; + memcpy(hash_location, macbuf, TCP_AUTHOPT_MACLEN); + + return 0; + +fail: + /* If mac calculation fails and caller doesn't handle the error + * try to make it obvious inside the packet. + */ + memset(hash_location, 0, TCP_AUTHOPT_MACLEN); + return err; +} + +/** + * tcp_authopt_lookup_recv - lookup key for receive + * + * @sk: Receive socket + * @skb: Packet, used to compare addr and iface + * @net: Per-namespace information containing keys + * @recv_id: Optional recv_id. If >= 0 then only return keys that match + * @anykey: Set to true if any keys are present for the peer + * + * If anykey is false then authentication is not expected from peer. + * + * If anykey is true then a valid key is required. 
+ */ +static struct tcp_authopt_key_info *tcp_authopt_lookup_recv(struct sock *sk, + struct sk_buff *skb, + struct netns_tcp_authopt *net, + int recv_id, + bool *anykey) +{ + struct tcp_authopt_key_info *result = NULL; + struct tcp_authopt_key_info *key; + + *anykey = false; + /* multiple matches will cause occasional failures */ + hlist_for_each_entry_rcu(key, &net->head, node, 0) { + if (key->flags & TCP_AUTHOPT_KEY_ADDR_BIND && + !tcp_authopt_key_match_skb_addr(key, skb)) + continue; + *anykey = true; + if (recv_id >= 0 && key->recv_id != recv_id) + continue; + if (!result) + result = key; + else if (result) + net_warn_ratelimited("ambiguous tcp authentication keys configured for recv\n"); + } + + return result; +} + +/* Show a rate-limited message for authentication fail */ +static void print_tcpao_notice(const char *msg, struct sk_buff *skb) +{ + struct iphdr *iph = (struct iphdr *)skb_network_header(skb); + struct tcphdr *th = (struct tcphdr *)skb_transport_header(skb); + + if (iph->version == 4) { + net_info_ratelimited("%s (%pI4, %d)->(%pI4, %d)\n", msg, + &iph->saddr, ntohs(th->source), + &iph->daddr, ntohs(th->dest)); + } else if (iph->version == 6) { + struct ipv6hdr *ip6h = (struct ipv6hdr *)skb_network_header(skb); + + net_info_ratelimited("%s (%pI6, %d)->(%pI6, %d)\n", msg, + &ip6h->saddr, ntohs(th->source), + &ip6h->daddr, ntohs(th->dest)); + } else { + WARN_ONCE(1, "%s unknown IP version\n", msg); + } +} + +/** + * __tcp_authopt_inbound_check - Check inbound TCP authentication option + * + * @sk: Receive socket. 
For the SYN_RECV state this must be the request_sock, not the listener + * @skb: Input Packet + * @info: TCP authentication option information + * @_opt: Pointer to TCP authentication option inside the skb + * + * Return: + * 0: Nothing found or expected + * 1: Found and verified + * <0: Error, negative skb_drop_reason + */ +int __tcp_authopt_inbound_check(struct sock *sk, struct sk_buff *skb, + struct tcp_authopt_info *info, const u8 *_opt) +{ + struct netns_tcp_authopt *net = sock_net_tcp_authopt(sk); + struct tcphdr_authopt *opt = (struct tcphdr_authopt *)_opt; + struct tcp_authopt_key_info *key; + bool anykey; + u8 macbuf[TCP_AUTHOPT_MAXMACBUF]; + int err; + + key = tcp_authopt_lookup_recv(sk, skb, net, opt ? opt->keyid : -1, &anykey); + + /* nothing found or expected */ + if (!opt && !anykey) + return 0; + if (!opt && anykey) { + NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPAUTHOPTFAILURE); + print_tcpao_notice("TCP Authentication Missing", skb); + return -SKB_DROP_REASON_TCP_AONOTFOUND; + } + if (opt && !anykey) { + /* RFC5925 Section 7.3: + * A TCP-AO implementation MUST allow for configuration of the behavior + * of segments with TCP-AO but that do not match an MKT. The initial + * default of this configuration SHOULD be to silently accept such + * connections. 
+ */ + if (info->flags & TCP_AUTHOPT_FLAG_REJECT_UNEXPECTED) { + NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPAUTHOPTFAILURE); + print_tcpao_notice("TCP Authentication Unexpected: Rejected", skb); + return -SKB_DROP_REASON_TCP_AOUNEXPECTED; + } + print_tcpao_notice("TCP Authentication Unexpected: Accepted", skb); + return 0; + } + if (opt && !key) { + /* Keys are configured for peer but with different keyid than packet */ + NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPAUTHOPTFAILURE); + print_tcpao_notice("TCP Authentication Failed", skb); + return -SKB_DROP_REASON_TCP_AOFAILURE; + } + + /* bad inbound key len */ + if (opt->len != TCPOLEN_AUTHOPT_OUTPUT) + return -SKB_DROP_REASON_TCP_AOFAILURE; + + err = __tcp_authopt_calc_mac(sk, skb, opt, key, info, true, macbuf); + if (err) + return -SKB_DROP_REASON_TCP_AOFAILURE; + + if (memcmp(macbuf, opt->mac, TCP_AUTHOPT_MACLEN)) { + NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPAUTHOPTFAILURE); + print_tcpao_notice("TCP Authentication Failed", skb); + return -SKB_DROP_REASON_TCP_AOFAILURE; + } + + return 1; +} +EXPORT_SYMBOL(__tcp_authopt_inbound_check); + static int tcp_authopt_init_net(struct net *full_net) { struct netns_tcp_authopt *net = &full_net->tcp_authopt;
mutex_init(&net->mutex); diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index a6b43fb954b7..9f065469562d 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -70,10 +70,11 @@ #include <linux/sysctl.h> #include <linux/kernel.h> #include <linux/prefetch.h> #include <net/dst.h> #include <net/tcp.h> +#include <net/tcp_authopt.h> #include <net/inet_common.h> #include <linux/ipsec.h> #include <asm/unaligned.h> #include <linux/errqueue.h> #include <trace/events/tcp.h> @@ -6060,10 +6061,12 @@ void tcp_finish_connect(struct sock *sk, struct sk_buff *skb) struct inet_connection_sock *icsk = inet_csk(sk);
tcp_set_state(sk, TCP_ESTABLISHED); icsk->icsk_ack.lrcvtime = tcp_jiffies32;
+ tcp_authopt_finish_connect(sk, skb); + if (skb) { icsk->icsk_af_ops->sk_rx_dst_set(sk, skb); security_inet_conn_established(sk, skb); sk_mark_napi_id(sk, skb); } diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index cb95d88497ae..64357bf5ede2 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -18,10 +18,11 @@ * Arnt Gulbrandsen, agulbra@nvg.unit.no * Jorge Cwik, jorge@laser.satlink.net */
#include <net/tcp.h> +#include <net/tcp_authopt.h> #include <net/xfrm.h> #include <net/busy_poll.h>
static bool tcp_in_window(u32 seq, u32 end_seq, u32 s_win, u32 e_win) { @@ -300,10 +301,11 @@ void tcp_time_wait(struct sock *sk, int state, int timeo) BUG_ON(tcptw->tw_md5_key && !tcp_alloc_md5sig_pool()); } } } while (0); #endif + tcp_authopt_time_wait(tcptw, tcp_sk(sk));
/* Get the TIME_WAIT timeout firing. */ if (timeo < rto) timeo = rto;
@@ -342,10 +344,19 @@ void tcp_twsk_destructor(struct sock *sk)
if (twsk->tw_md5_key) kfree_rcu(twsk->tw_md5_key, rcu); } #endif +#ifdef CONFIG_TCP_AUTHOPT + if (tcp_authopt_needed) { + struct tcp_timewait_sock *twsk = tcp_twsk(sk); + + /* twsk only contains sock_common so pass NULL as sk. */ + if (twsk->tw_authopt_info) + tcp_authopt_free(NULL, twsk->tw_authopt_info); + } +#endif } EXPORT_SYMBOL_GPL(tcp_twsk_destructor);
/* Warning : This function is called without sk_listener being locked. * Be sure to read socket fields once, as their value could change under us. @@ -532,10 +543,11 @@ struct sock *tcp_create_openreq_child(const struct sock *sk, #ifdef CONFIG_TCP_MD5SIG newtp->md5sig_info = NULL; /*XXX*/ if (treq->af_specific->req_md5_lookup(sk, req_to_sk(req))) newtp->tcp_header_len += TCPOLEN_MD5SIG_ALIGNED; #endif + tcp_authopt_openreq(newsk, sk, req); if (skb->len >= TCP_MSS_DEFAULT + newtp->tcp_header_len) newicsk->icsk_ack.last_seg_size = skb->len - newtp->tcp_header_len; newtp->rx_opt.mss_clamp = req->mss; tcp_ecn_openreq_child(newtp, req); newtp->fastopen_req = NULL; diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 290019de766d..da683f7951eb 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -37,10 +37,11 @@
#define pr_fmt(fmt) "TCP: " fmt
#include <net/tcp.h> #include <net/mptcp.h> +#include <net/tcp_authopt.h>
#include <linux/compiler.h> #include <linux/gfp.h> #include <linux/module.h> #include <linux/static_key.h> @@ -408,10 +409,11 @@ static inline bool tcp_urg_mode(const struct tcp_sock *tp)
#define OPTION_SACK_ADVERTISE BIT(0) #define OPTION_TS BIT(1) #define OPTION_MD5 BIT(2) #define OPTION_WSCALE BIT(3) +#define OPTION_AUTHOPT BIT(4) #define OPTION_FAST_OPEN_COOKIE BIT(8) #define OPTION_SMC BIT(9) #define OPTION_MPTCP BIT(10)
static void smc_options_write(__be32 *ptr, u16 *options) @@ -432,16 +434,22 @@ static void smc_options_write(__be32 *ptr, u16 *options) struct tcp_out_options { u16 options; /* bit field of OPTION_* */ u16 mss; /* 0 to disable */ u8 ws; /* window scale, 0 to disable */ u8 num_sack_blocks; /* number of SACK blocks to include */ - u8 hash_size; /* bytes in hash_location */ u8 bpf_opt_len; /* length of BPF hdr option */ +#ifdef CONFIG_TCP_AUTHOPT + u8 authopt_rnextkeyid; /* rnextkey */ +#endif __u8 *hash_location; /* temporary pointer, overloaded */ __u32 tsval, tsecr; /* need to include OPTION_TS */ struct tcp_fastopen_cookie *fastopen_cookie; /* Fast open cookie */ struct mptcp_out_options mptcp; +#ifdef CONFIG_TCP_AUTHOPT + struct tcp_authopt_info *authopt_info; + struct tcp_authopt_key_info *authopt_key; +#endif };
static void mptcp_options_write(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp, struct tcp_out_options *opts) @@ -616,10 +624,25 @@ static void tcp_options_write(struct tcphdr *th, struct tcp_sock *tp, /* overload cookie hash location */ opts->hash_location = (__u8 *)ptr; ptr += 4; }
+#ifdef CONFIG_TCP_AUTHOPT + if (unlikely(OPTION_AUTHOPT & options)) { + struct tcp_authopt_key_info *key = opts->authopt_key; + + WARN_ON(!key); + *ptr = htonl((TCPOPT_AUTHOPT << 24) | + (TCPOLEN_AUTHOPT_OUTPUT << 16) | + (key->send_id << 8) | + opts->authopt_rnextkeyid); + /* overload cookie hash location */ + opts->hash_location = (__u8 *)(ptr + 1); + ptr += TCPOLEN_AUTHOPT_OUTPUT / 4; + } +#endif + if (unlikely(opts->mss)) { *ptr++ = htonl((TCPOPT_MSS << 24) | (TCPOLEN_MSS << 16) | opts->mss); } @@ -751,10 +774,28 @@ static void mptcp_set_option_cond(const struct request_sock *req, } } } }
+static int tcp_authopt_init_options(const struct sock *sk, + const struct sock *addr_sk, + struct tcp_out_options *opts) +{ +#ifdef CONFIG_TCP_AUTHOPT + struct tcp_authopt_key_info *key; + + key = tcp_authopt_select_key(sk, addr_sk, &opts->authopt_info, &opts->authopt_rnextkeyid); + if (key) { + opts->options |= OPTION_AUTHOPT; + opts->authopt_key = key; + return TCPOLEN_AUTHOPT_OUTPUT; + } +#endif + + return 0; +} + /* Compute TCP options for SYN packets. This is not the final * network wire format yet. */ static unsigned int tcp_syn_options(struct sock *sk, struct sk_buff *skb, struct tcp_out_options *opts, @@ -763,12 +804,15 @@ static unsigned int tcp_syn_options(struct sock *sk, struct sk_buff *skb, struct tcp_sock *tp = tcp_sk(sk); unsigned int remaining = MAX_TCP_OPTION_SPACE; struct tcp_fastopen_request *fastopen = tp->fastopen_req;
*md5 = NULL; + + remaining -= tcp_authopt_init_options(sk, sk, opts); #ifdef CONFIG_TCP_MD5SIG if (static_branch_unlikely(&tcp_md5_needed) && + !(opts->options & OPTION_AUTHOPT) && rcu_access_pointer(tp->md5sig_info)) { *md5 = tp->af_specific->md5_lookup(sk, sk); if (*md5) { opts->options |= OPTION_MD5; remaining -= TCPOLEN_MD5SIG_ALIGNED; @@ -847,12 +891,13 @@ static unsigned int tcp_synack_options(const struct sock *sk, struct sk_buff *syn_skb) { struct inet_request_sock *ireq = inet_rsk(req); unsigned int remaining = MAX_TCP_OPTION_SPACE;
+ remaining -= tcp_authopt_init_options(sk, req_to_sk(req), opts); #ifdef CONFIG_TCP_MD5SIG - if (md5) { + if (md5 && !(opts->options & OPTION_AUTHOPT)) { opts->options |= OPTION_MD5; remaining -= TCPOLEN_MD5SIG_ALIGNED;
/* We can't fit any SACK blocks in a packet with MD5 + TS * options. There was discussion about disabling SACK @@ -918,13 +963,15 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb unsigned int size = 0; unsigned int eff_sacks;
opts->options = 0;
+ size += tcp_authopt_init_options(sk, sk, opts); *md5 = NULL; #ifdef CONFIG_TCP_MD5SIG if (static_branch_unlikely(&tcp_md5_needed) && + !(opts->options & OPTION_AUTHOPT) && rcu_access_pointer(tp->md5sig_info)) { *md5 = tp->af_specific->md5_lookup(sk, sk); if (*md5) { opts->options |= OPTION_MD5; size += TCPOLEN_MD5SIG_ALIGNED; @@ -1274,10 +1321,14 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
inet = inet_sk(sk); tcb = TCP_SKB_CB(skb); memset(&opts, 0, sizeof(opts));
+#ifdef CONFIG_TCP_AUTHOPT + /* for tcp_authopt_init_options inside tcp_syn_options or tcp_established_options */ + rcu_read_lock(); +#endif if (unlikely(tcb->tcp_flags & TCPHDR_SYN)) { tcp_options_size = tcp_syn_options(sk, skb, &opts, &md5); } else { tcp_options_size = tcp_established_options(sk, skb, &opts, &md5); @@ -1362,10 +1413,17 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, sk_gso_disable(sk); tp->af_specific->calc_md5_hash(opts.hash_location, md5, sk, skb); } #endif +#ifdef CONFIG_TCP_AUTHOPT + if (opts.authopt_key) { + sk_gso_disable(sk); + tcp_authopt_hash(opts.hash_location, opts.authopt_key, opts.authopt_info, sk, skb); + } + rcu_read_unlock(); +#endif
/* BPF prog is the last one writing header option */ bpf_skops_write_hdr_opt(sk, skb, NULL, NULL, 0, &opts);
INDIRECT_CALL_INET(icsk->icsk_af_ops->send_check, @@ -1832,12 +1890,21 @@ unsigned int tcp_current_mss(struct sock *sk) u32 mtu = dst_mtu(dst); if (mtu != inet_csk(sk)->icsk_pmtu_cookie) mss_now = tcp_sync_mss(sk, mtu); }
+#ifdef CONFIG_TCP_AUTHOPT + /* Even if the result is not used rcu_read_lock is required when scanning for + * tcp authentication keys. Otherwise lockdep will complain. + */ + rcu_read_lock(); +#endif header_len = tcp_established_options(sk, NULL, &opts, &md5) + sizeof(struct tcphdr); +#ifdef CONFIG_TCP_AUTHOPT + rcu_read_unlock(); +#endif /* The mss_cache is sized based on tp->tcp_header_len, which assumes * some common options. If this is an odd packet (because we have SACK * blocks etc) then our calculated header_len will be different, and * we have to adjust mss_now correspondingly */ if (header_len != tp->tcp_header_len) { @@ -3573,10 +3640,14 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst, }
#ifdef CONFIG_TCP_MD5SIG rcu_read_lock(); md5 = tcp_rsk(req)->af_specific->req_md5_lookup(sk, req_to_sk(req)); +#endif +#ifdef CONFIG_TCP_AUTHOPT + /* for tcp_authopt_init_options inside tcp_synack_options */ + rcu_read_lock(); #endif skb_set_hash(skb, tcp_rsk(req)->txhash, PKT_HASH_TYPE_L4); /* bpf program will be interested in the tcp_flags */ TCP_SKB_CB(skb)->tcp_flags = TCPHDR_SYN | TCPHDR_ACK; tcp_header_size = tcp_synack_options(sk, req, mss, skb, &opts, md5, @@ -3610,10 +3681,20 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst, if (md5) tcp_rsk(req)->af_specific->calc_md5_hash(opts.hash_location, md5, req_to_sk(req), skb); rcu_read_unlock(); #endif +#ifdef CONFIG_TCP_AUTHOPT + /* If signature fails we do nothing */ + if (opts.authopt_key) + tcp_authopt_hash(opts.hash_location, + opts.authopt_key, + opts.authopt_info, + req_to_sk(req), + skb); + rcu_read_unlock(); +#endif
bpf_skops_write_hdr_opt((struct sock *)sk, skb, req, syn_skb, synack_type, &opts);
skb_set_delivery_time(skb, now, true);
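The receive-side decisions in __tcp_authopt_inbound_check above are easier to follow as a table, so here is a minimal userspace sketch of them. It is not kernel code: the enum, struct and function names are hypothetical, the drop reasons are stand-ins for the SKB_DROP_REASON_TCP_AO* values, and the key lookup and MAC computation are reduced to booleans supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the return values of the kernel function. */
enum ao_verdict {
    AO_NOT_EXPECTED = 0,     /* nothing found or expected, MD5 lookup may follow */
    AO_VERIFIED = 1,         /* option present and MAC matched */
    AO_DROP_NOTFOUND = -1,   /* keys configured for peer but packet is unsigned */
    AO_DROP_UNEXPECTED = -2, /* signed packet, no keys, reject flag set */
    AO_DROP_FAILURE = -3,    /* keyid mismatch, bad option length or bad MAC */
};

/* Inputs that the kernel derives from the skb, the key list and the socket. */
struct ao_packet_state {
    bool has_option;        /* TCP-AO option present in the TCP header */
    bool anykey;            /* at least one key matches the peer address */
    bool key_found;         /* a key with the packet's keyid was found */
    bool len_ok;            /* option length equals the expected length */
    bool mac_ok;            /* computed MAC equals the MAC in the packet */
    bool reject_unexpected; /* models TCP_AUTHOPT_FLAG_REJECT_UNEXPECTED */
};

enum ao_verdict ao_inbound_verdict(const struct ao_packet_state *s)
{
    /* The lookup only returns a key after it has set anykey. */
    assert(!s->key_found || s->anykey);

    if (!s->has_option && !s->anykey)
        return AO_NOT_EXPECTED;
    if (!s->has_option)
        return AO_DROP_NOTFOUND;
    if (!s->anykey)
        return s->reject_unexpected ? AO_DROP_UNEXPECTED : AO_NOT_EXPECTED;
    if (!s->key_found)
        return AO_DROP_FAILURE;
    if (!s->len_ok || !s->mac_ok)
        return AO_DROP_FAILURE;
    return AO_VERIFIED;
}
```

Note that an unsigned packet is only accepted when no key at all is configured for the peer, and a signed packet without any matching key is accepted by default, following the RFC5925 Section 7.3 text quoted in the patch.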
This is mainly intended to protect against local privilege escalations through a rarely used feature so it is deliberately not namespaced.
Enforcement is only at the setsockopt level; this should be enough to ensure that the tcp_authopt_needed static key never turns on.
No effort is made to handle disabling when the feature is already in use.
Signed-off-by: Leonard Crestez cdleonard@gmail.com
---
 Documentation/networking/ip-sysctl.rst |  6 ++++
 include/net/tcp_authopt.h              |  1 +
 net/ipv4/sysctl_net_ipv4.c             | 39 ++++++++++++++++++++++++++
 net/ipv4/tcp_authopt.c                 | 25 +++++++++++++++++
 4 files changed, 71 insertions(+)
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst index a759872a2883..41be0e69d767 100644 --- a/Documentation/networking/ip-sysctl.rst +++ b/Documentation/networking/ip-sysctl.rst @@ -1038,10 +1038,16 @@ tcp_challenge_ack_limit - INTEGER Note that this per netns rate limit can allow some side channel attacks and probably should not be enabled. TCP stack implements per TCP socket limits anyway. Default: INT_MAX (unlimited)
+tcp_authopt - BOOLEAN + Enable the TCP Authentication Option (RFC5925), a replacement for TCP + MD5 Signatures (RFC2385). + + Default: 0 + UDP variables =============
udp_l3mdev_accept - BOOLEAN Enabling this option allows a "global" bound socket to work diff --git a/include/net/tcp_authopt.h b/include/net/tcp_authopt.h index 7ad34a6987ec..1f5020b790dd 100644 --- a/include/net/tcp_authopt.h +++ b/include/net/tcp_authopt.h @@ -80,10 +80,11 @@ struct tcphdr_authopt { };
#ifdef CONFIG_TCP_AUTHOPT DECLARE_STATIC_KEY_FALSE(tcp_authopt_needed_key); #define tcp_authopt_needed (static_branch_unlikely(&tcp_authopt_needed_key)) +extern int sysctl_tcp_authopt; void tcp_authopt_free(struct sock *sk, struct tcp_authopt_info *info); void tcp_authopt_clear(struct sock *sk); int tcp_set_authopt(struct sock *sk, sockptr_t optval, unsigned int optlen); int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *key); int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen); diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index 5490c285668b..908a3ef15b47 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -17,10 +17,11 @@ #include <net/udp.h> #include <net/cipso_ipv4.h> #include <net/ping.h> #include <net/protocol.h> #include <net/netevent.h> +#include <net/tcp_authopt.h>
static int tcp_retr1_max = 255; static int ip_local_port_range_min[] = { 1, 1 }; static int ip_local_port_range_max[] = { 65535, 65535 }; static int tcp_adv_win_scale_min = -31; @@ -413,10 +414,37 @@ static int proc_fib_multipath_hash_fields(struct ctl_table *table, int write,
return ret; } #endif
+#ifdef CONFIG_TCP_AUTHOPT +static int proc_tcp_authopt(struct ctl_table *ctl, + int write, void *buffer, size_t *lenp, + loff_t *ppos) +{ + int val = sysctl_tcp_authopt; + struct ctl_table tmp = { + .data = &val, + .mode = ctl->mode, + .maxlen = sizeof(val), + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }; + int err; + + err = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos); + if (err) + return err; + if (sysctl_tcp_authopt && !val) { + net_warn_ratelimited("Enabling TCP Authentication Option is permanent\n"); + return -EINVAL; + } + sysctl_tcp_authopt = val; + return 0; +} +#endif + static struct ctl_table ipv4_table[] = { { .procname = "tcp_max_orphans", .data = &sysctl_tcp_max_orphans, .maxlen = sizeof(int), @@ -524,10 +552,21 @@ static struct ctl_table ipv4_table[] = { .mode = 0644, .proc_handler = proc_douintvec_minmax, .extra1 = &sysctl_fib_sync_mem_min, .extra2 = &sysctl_fib_sync_mem_max, }, +#ifdef CONFIG_TCP_AUTHOPT + { + .procname = "tcp_authopt", + .data = &sysctl_tcp_authopt, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_tcp_authopt, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, +#endif { } };
static struct ctl_table ipv4_net_table[] = { /* tcp_max_tw_buckets must be first in this table. */ diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c index 4f7cbe1e17f3..9d02da8d6964 100644 --- a/net/ipv4/tcp_authopt.c +++ b/net/ipv4/tcp_authopt.c @@ -4,10 +4,15 @@ #include <net/ipv6.h> #include <net/tcp.h> #include <linux/kref.h> #include <crypto/hash.h>
+/* This is mainly intended to protect against local privilege escalations through + * a rarely used feature so it is deliberately not namespaced. + */ +int sysctl_tcp_authopt; + /* This is enabled when first struct tcp_authopt_info is allocated and never released */ DEFINE_STATIC_KEY_FALSE(tcp_authopt_needed_key); EXPORT_SYMBOL(tcp_authopt_needed_key);
/* All current algorithms have a mac length of 12 but crypto API digestsize can be larger */ @@ -437,17 +442,30 @@ static int _copy_from_sockptr_tolerant(u8 *dst, memset(dst + srclen, 0, dstlen - srclen);
return err; }
+static int check_sysctl_tcp_authopt(void) +{ + if (!sysctl_tcp_authopt) { + net_warn_ratelimited("TCP Authentication Option disabled by sysctl.\n"); + return -EPERM; + } + + return 0; +} + int tcp_set_authopt(struct sock *sk, sockptr_t optval, unsigned int optlen) { struct tcp_authopt opt; struct tcp_authopt_info *info; int err;
sock_owned_by_me(sk); + err = check_sysctl_tcp_authopt(); + if (err) + return err;
err = _copy_from_sockptr_tolerant((u8 *)&opt, sizeof(opt), optval, optlen); if (err) return err;
@@ -465,13 +483,17 @@ int tcp_set_authopt(struct sock *sk, sockptr_t optval, unsigned int optlen)
int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *opt) { struct tcp_sock *tp = tcp_sk(sk); struct tcp_authopt_info *info; + int err;
memset(opt, 0, sizeof(*opt)); sock_owned_by_me(sk); + err = check_sysctl_tcp_authopt(); + if (err) + return err;
info = rcu_dereference_check(tp->authopt_info, lockdep_sock_is_held(sk)); if (!info) return -ENOENT;
@@ -493,10 +515,13 @@ int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen) struct netns_tcp_authopt *net = sock_net_tcp_authopt(sk); struct tcp_authopt_alg_imp *alg; int err;
sock_owned_by_me(sk); + err = check_sysctl_tcp_authopt(); + if (err) + return err; if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) return -EPERM;
err = _copy_from_sockptr_tolerant((u8 *)&opt, sizeof(opt), optval, optlen); if (err)
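As a rough illustration of the semantics this patch is aiming for, the following userspace sketch models the sysctl as a one-way switch plus a gate checked by the setsockopt paths. The names model_sysctl_write and model_setsockopt_gate are hypothetical and only mirror proc_tcp_authopt and check_sysctl_tcp_authopt; the sketch is single-threaded and ignores the locking questions entirely.

```c
#include <assert.h>
#include <errno.h>

/* Stands in for the global sysctl_tcp_authopt; starts disabled. */
static int model_tcp_authopt_enabled;

/* Models a write to the sysctl: only 0 or 1 is accepted, and once the
 * value is 1 an attempt to write 0 is refused.
 */
int model_sysctl_write(int val)
{
    /* Invariant: the stored value is always 0 or 1. */
    assert(model_tcp_authopt_enabled == 0 || model_tcp_authopt_enabled == 1);

    if (val < 0 || val > 1)
        return -EINVAL; /* range check, as proc_dointvec_minmax would do */
    if (model_tcp_authopt_enabled && !val)
        return -EINVAL; /* enabling is permanent */
    model_tcp_authopt_enabled = val;
    return 0;
}

/* Models the check done at the top of each setsockopt/getsockopt handler. */
int model_setsockopt_gate(void)
{
    return model_tcp_authopt_enabled ? 0 : -EPERM;
}
```

Because the gate sits in front of every entry point that can allocate a tcp_authopt_info, a system that never enables the sysctl never reaches the code that turns on the static key.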
On Mon, Sep 5, 2022 at 12:06 AM Leonard Crestez cdleonard@gmail.com wrote:
This is mainly intended to protect against local privilege escalations through a rarely used feature so it is deliberately not namespaced.
Enforcement is only at the setsockopt level, this should be enough to ensure that the tcp_authopt_needed static key never turns on.
No effort is made to handle disabling when the feature is already in use.
Signed-off-by: Leonard Crestez cdleonard@gmail.com
Documentation/networking/ip-sysctl.rst | 6 ++++ include/net/tcp_authopt.h | 1 + net/ipv4/sysctl_net_ipv4.c | 39 ++++++++++++++++++++++++++ net/ipv4/tcp_authopt.c | 25 +++++++++++++++++ 4 files changed, 71 insertions(+)
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst index a759872a2883..41be0e69d767 100644 --- a/Documentation/networking/ip-sysctl.rst +++ b/Documentation/networking/ip-sysctl.rst @@ -1038,10 +1038,16 @@ tcp_challenge_ack_limit - INTEGER Note that this per netns rate limit can allow some side channel attacks and probably should not be enabled. TCP stack implements per TCP socket limits anyway. Default: INT_MAX (unlimited)
> +tcp_authopt - BOOLEAN
> +	Enable the TCP Authentication Option (RFC5925), a replacement for TCP
> +	MD5 Signatures (RFC2385).
> +
> +	Default: 0
UDP variables
udp_l3mdev_accept - BOOLEAN Enabling this option allows a "global" bound socket to work diff --git a/include/net/tcp_authopt.h b/include/net/tcp_authopt.h index 7ad34a6987ec..1f5020b790dd 100644 --- a/include/net/tcp_authopt.h +++ b/include/net/tcp_authopt.h @@ -80,10 +80,11 @@ struct tcphdr_authopt { };
#ifdef CONFIG_TCP_AUTHOPT DECLARE_STATIC_KEY_FALSE(tcp_authopt_needed_key); #define tcp_authopt_needed (static_branch_unlikely(&tcp_authopt_needed_key)) +extern int sysctl_tcp_authopt; void tcp_authopt_free(struct sock *sk, struct tcp_authopt_info *info); void tcp_authopt_clear(struct sock *sk); int tcp_set_authopt(struct sock *sk, sockptr_t optval, unsigned int optlen); int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *key); int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen); diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index 5490c285668b..908a3ef15b47 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -17,10 +17,11 @@ #include <net/udp.h> #include <net/cipso_ipv4.h> #include <net/ping.h> #include <net/protocol.h> #include <net/netevent.h> +#include <net/tcp_authopt.h>
static int tcp_retr1_max = 255; static int ip_local_port_range_min[] = { 1, 1 }; static int ip_local_port_range_max[] = { 65535, 65535 }; static int tcp_adv_win_scale_min = -31; @@ -413,10 +414,37 @@ static int proc_fib_multipath_hash_fields(struct ctl_table *table, int write,
return ret;
} #endif
> +#ifdef CONFIG_TCP_AUTHOPT
> +static int proc_tcp_authopt(struct ctl_table *ctl,
> +			    int write, void *buffer, size_t *lenp,
> +			    loff_t *ppos)
> +{
> +	int val = sysctl_tcp_authopt;

val = READ_ONCE(sysctl_tcp_authopt);

> +	struct ctl_table tmp = {
> +		.data = &val,
> +		.mode = ctl->mode,
> +		.maxlen = sizeof(val),
> +		.extra1 = SYSCTL_ZERO,
> +		.extra2 = SYSCTL_ONE,
> +	};
> +	int err;
> +
> +	err = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
> +	if (err)
> +		return err;
> +	if (sysctl_tcp_authopt && !val) {

READ_ONCE(sysctl_tcp_authopt)

Note that this test would still be racy, because another cpu might change sysctl_tcp_authopt right after the read.

> +		net_warn_ratelimited("Enabling TCP Authentication Option is permanent\n");
> +		return -EINVAL;
> +	}
> +	sysctl_tcp_authopt = val;

WRITE_ONCE(sysctl_tcp_authopt, val), or even better:

if (val) cmpxchg(&sysctl_tcp_authopt, 0, val);

> +	return 0;
> +}
> +#endif
static struct ctl_table ipv4_table[] = { { .procname = "tcp_max_orphans", .data = &sysctl_tcp_max_orphans, .maxlen = sizeof(int), @@ -524,10 +552,21 @@ static struct ctl_table ipv4_table[] = { .mode = 0644, .proc_handler = proc_douintvec_minmax, .extra1 = &sysctl_fib_sync_mem_min, .extra2 = &sysctl_fib_sync_mem_max, }, +#ifdef CONFIG_TCP_AUTHOPT
{
.procname = "tcp_authopt",
.data = &sysctl_tcp_authopt,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_tcp_authopt,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
},
+#endif { } };
static struct ctl_table ipv4_net_table[] = { /* tcp_max_tw_buckets must be first in this table. */ diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c index 4f7cbe1e17f3..9d02da8d6964 100644 --- a/net/ipv4/tcp_authopt.c +++ b/net/ipv4/tcp_authopt.c @@ -4,10 +4,15 @@ #include <net/ipv6.h> #include <net/tcp.h> #include <linux/kref.h> #include <crypto/hash.h>
> +/* This is mainly intended to protect against local privilege escalations through
> + * a rarely used feature so it is deliberately not namespaced.
> + */
> +int sysctl_tcp_authopt;
/* This is enabled when first struct tcp_authopt_info is allocated and never released */ DEFINE_STATIC_KEY_FALSE(tcp_authopt_needed_key); EXPORT_SYMBOL(tcp_authopt_needed_key);
/* All current algorithms have a mac length of 12 but crypto API digestsize can be larger */ @@ -437,17 +442,30 @@ static int _copy_from_sockptr_tolerant(u8 *dst, memset(dst + srclen, 0, dstlen - srclen);
return err;
}
> +static int check_sysctl_tcp_authopt(void)
> +{
> +	if (!sysctl_tcp_authopt) {

READ_ONCE(...)

> +		net_warn_ratelimited("TCP Authentication Option disabled by sysctl.\n");
> +		return -EPERM;
> +	}
> +
> +	return 0;
> +}
int tcp_set_authopt(struct sock *sk, sockptr_t optval, unsigned int optlen) { struct tcp_authopt opt; struct tcp_authopt_info *info; int err;
sock_owned_by_me(sk);
err = check_sysctl_tcp_authopt();
if (err)
return err; err = _copy_from_sockptr_tolerant((u8 *)&opt, sizeof(opt), optval, optlen); if (err) return err;
@@ -465,13 +483,17 @@ int tcp_set_authopt(struct sock *sk, sockptr_t optval, unsigned int optlen)
int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *opt) { struct tcp_sock *tp = tcp_sk(sk); struct tcp_authopt_info *info;
int err; memset(opt, 0, sizeof(*opt)); sock_owned_by_me(sk);
err = check_sysctl_tcp_authopt();
if (err)
return err; info = rcu_dereference_check(tp->authopt_info, lockdep_sock_is_held(sk)); if (!info) return -ENOENT;
@@ -493,10 +515,13 @@ int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen) struct netns_tcp_authopt *net = sock_net_tcp_authopt(sk); struct tcp_authopt_alg_imp *alg; int err;
sock_owned_by_me(sk);
err = check_sysctl_tcp_authopt();
if (err)
return err; if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) return -EPERM; err = _copy_from_sockptr_tolerant((u8 *)&opt, sizeof(opt), optval, optlen); if (err)
-- 2.25.1
On 9/7/22 02:11, Eric Dumazet wrote:
On Mon, Sep 5, 2022 at 12:06 AM Leonard Crestez cdleonard@gmail.com wrote:
This is mainly intended to protect against local privilege escalations through a rarely used feature so it is deliberately not namespaced.
Enforcement is only at the setsockopt level, this should be enough to ensure that the tcp_authopt_needed static key never turns on.
No effort is made to handle disabling when the feature is already in use.
Signed-off-by: Leonard Crestez cdleonard@gmail.com
 Documentation/networking/ip-sysctl.rst |  6 ++++
 include/net/tcp_authopt.h              |  1 +
 net/ipv4/sysctl_net_ipv4.c             | 39 ++++++++++++++++++++++++++
 net/ipv4/tcp_authopt.c                 | 25 +++++++++++++++++
 4 files changed, 71 insertions(+)
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst index a759872a2883..41be0e69d767 100644 --- a/Documentation/networking/ip-sysctl.rst +++ b/Documentation/networking/ip-sysctl.rst @@ -1038,10 +1038,16 @@ tcp_challenge_ack_limit - INTEGER Note that this per netns rate limit can allow some side channel attacks and probably should not be enabled. TCP stack implements per TCP socket limits anyway. Default: INT_MAX (unlimited)
+tcp_authopt - BOOLEAN
Enable the TCP Authentication Option (RFC5925), a replacement for TCP
MD5 Signatures (RFC2385).
Default: 0
...
+#ifdef CONFIG_TCP_AUTHOPT +static int proc_tcp_authopt(struct ctl_table *ctl,
int write, void *buffer, size_t *lenp,
loff_t *ppos)
+{
int val = sysctl_tcp_authopt;
val = READ_ONCE(sysctl_tcp_authopt);
struct ctl_table tmp = {
.data = &val,
.mode = ctl->mode,
.maxlen = sizeof(val),
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
};
int err;
err = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
if (err)
return err;
if (sysctl_tcp_authopt && !val) {
READ_ONCE(sysctl_tcp_authopt)
Note that this test would still be racy, because another cpu might change sysctl_tcp_authopt right after the read.
What meaningful races are possible here? This is a variable that changes from 0 to 1 at most once.
In theory if two processes attempt to assign "non-zero" at the same time then one will "win" and the other will get an error but races between userspace writing different values are possible for any sysctl. The solution seems to be "write sysctls from a single place".
All the checks are in sockopts - in theory if the sysctl is written on one CPU then a sockopt can still fail on another CPU until caches are flushed. Is this what you're worried about?
In theory doing READ_ONCE might incur a slight penalty on sockopt but not noticeable.
net_warn_ratelimited("Enabling TCP Authentication Option is permanent\n");
return -EINVAL;
}
sysctl_tcp_authopt = val;
WRITE_ONCE(sysctl_tcp_authopt, val), or even better:
if (val) cmpxchg(&sysctl_tcp_authopt, 0, val);
return 0;
+} +#endif
This would be useful if we did any sort of initialization here but we don't. Crypto is initialized somewhere completely different.
On Wed, Sep 7, 2022 at 9:53 AM Leonard Crestez cdleonard@gmail.com wrote:
On 9/7/22 02:11, Eric Dumazet wrote:
On Mon, Sep 5, 2022 at 12:06 AM Leonard Crestez cdleonard@gmail.com wrote:
+#ifdef CONFIG_TCP_AUTHOPT +static int proc_tcp_authopt(struct ctl_table *ctl,
int write, void *buffer, size_t *lenp,
loff_t *ppos)
+{
int val = sysctl_tcp_authopt;
val = READ_ONCE(sysctl_tcp_authopt);
struct ctl_table tmp = {
.data = &val,
.mode = ctl->mode,
.maxlen = sizeof(val),
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
};
int err;
err = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
if (err)
return err;
if (sysctl_tcp_authopt && !val) {
READ_ONCE(sysctl_tcp_authopt)
Note that this test would still be racy, because another cpu might change sysctl_tcp_authopt right after the read.
What meaningful races are possible here? This is a variable that changes from 0 to 1 at most once.
Two cpus can issue writes of 0 and 1 values at the same time.
Depending on scheduling writing the 0 can 'win' the race and overwrite the value back to 0.
This is in clear violation of the claim you are making (that the sysctl can only go once from 0 to 1)
In theory if two processes attempt to assign "non-zero" at the same time then one will "win" and the other will get an error but races between userspace writing different values are possible for any sysctl. The solution seems to be "write sysctls from a single place".
All the checks are in sockopts - in theory if the sysctl is written on one CPU then a sockopt can still fail on another CPU until caches are flushed. Is this what you're worried about?
In theory doing READ_ONCE might incur a slight penalty on sockopt but not noticeable.
Not at all. There is _no_ penalty using READ_ONCE(). Unless it is done in a loop and this prevents some compiler optimization.
Please use WRITE_ONCE() and READ_ONCE() for all sysctl values used in TCP stack (and elsewhere)
See all the silly patches we had recently.
net_warn_ratelimited("Enabling TCP Authentication Option is permanent\n");
return -EINVAL;
}
sysctl_tcp_authopt = val;
WRITE_ONCE(sysctl_tcp_authopt, val), or even better:
if (val) cmpxchg(&sysctl_tcp_authopt, 0, val);
return 0;
+} +#endif
This would be useful if we did any sort of initialization here but we don't. Crypto is initialized somewhere completely different.
On 9/7/22 20:04, Eric Dumazet wrote:
On Wed, Sep 7, 2022 at 9:53 AM Leonard Crestez cdleonard@gmail.com wrote:
On 9/7/22 02:11, Eric Dumazet wrote:
On Mon, Sep 5, 2022 at 12:06 AM Leonard Crestez cdleonard@gmail.com wrote:
+#ifdef CONFIG_TCP_AUTHOPT +static int proc_tcp_authopt(struct ctl_table *ctl,
int write, void *buffer, size_t *lenp,
loff_t *ppos)
+{
int val = sysctl_tcp_authopt;
val = READ_ONCE(sysctl_tcp_authopt);
struct ctl_table tmp = {
.data = &val,
.mode = ctl->mode,
.maxlen = sizeof(val),
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
};
int err;
err = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
if (err)
return err;
if (sysctl_tcp_authopt && !val) {
READ_ONCE(sysctl_tcp_authopt)
Note that this test would still be racy, because another cpu might change sysctl_tcp_authopt right after the read.
What meaningful races are possible here? This is a variable that changes from 0 to 1 at most once.
Two cpus can issue writes of 0 and 1 values at the same time.
Depending on scheduling writing the 0 can 'win' the race and overwrite the value back to 0.
This is in clear violation of the claim you are making (that the sysctl can only go once from 0 to 1)
Not clear why anyone would attempt to write 0, maybe to ensure that it's still disabled?
But you're right that userspace CAN do that and the kernel CAN misbehave in this scenario so it would be better to just make the changes you suggested.
In theory if two processes attempt to assign "non-zero" at the same time then one will "win" and the other will get an error but races between userspace writing different values are possible for any sysctl. The solution seems to be "write sysctls from a single place".
All the checks are in sockopts - in theory if the sysctl is written on one CPU then a sockopt can still fail on another CPU until caches are flushed. Is this what you're worried about?
In theory doing READ_ONCE might incur a slight penalty on sockopt but not noticeable.
Not at all. There is _no_ penalty using READ_ONCE(). Unless it is done in a loop and this prevents some compiler optimization.
Please use WRITE_ONCE() and READ_ONCE() for all sysctl values used in TCP stack (and elsewhere)
See all the silly patches we had recently.
OK
On Tue, Sep 06, 2022 at 04:11:58PM -0700, Eric Dumazet wrote:
WRITE_ONCE(sysctl_tcp_authopt, val), or even better:
if (val) cmpxchg(&sysctl_tcp_authopt, 0, val);
What's the point of the cmpxchg? Since you're simply trying to prevent sysctl_tcp_authopt from going back to zero, then the if clause by itself is enough:
if (val) WRITE_ONCE(sysctl_tcp_authopt, val);
Cheers,
On Wed, Sep 7, 2022 at 3:50 PM Herbert Xu herbert@gondor.apana.org.au wrote:
On Tue, Sep 06, 2022 at 04:11:58PM -0700, Eric Dumazet wrote:
WRITE_ONCE(sysctl_tcp_authopt, val), or even better:
if (val) cmpxchg(&sysctl_tcp_authopt, 0, val);
What's the point of the cmpxchg? Since you're simply trying to prevent sysctl_tcp_authopt from going back to zero, then the if clause by itself is enough:
if (val) WRITE_ONCE(sysctl_tcp_authopt, val);
Ack.
Original patch was doing something racy, I have not thought about the most efficient way to deal with it.
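As a rough userspace illustration of the behaviour agreed on above (only ever store a non-zero value, so a racing write of 0 has nothing to overwrite the flag with), here is a minimal sketch using C11 atomics. The names latch_set and latch_demo are invented for this example, and relaxed atomics only approximate READ_ONCE()/WRITE_ONCE(); this is not the kernel handler.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/*
 * Hypothetical stand-in for the store step of the sysctl handler.
 * The flag can go from 0 to 1 but never back.
 */
static int latch_set(atomic_int *flag, int val)
{
	if (val) {
		/* Only non-zero values are ever stored. */
		atomic_store_explicit(flag, 1, memory_order_relaxed);
		return 0;
	}
	/* Writing 0 is harmless while disabled, an error once enabled. */
	if (atomic_load_explicit(flag, memory_order_relaxed))
		return -EINVAL;
	return 0;
}

/* Walks through the sequence discussed in the thread. */
static void latch_demo(void)
{
	atomic_int flag = 0;

	assert(latch_set(&flag, 0) == 0);       /* still disabled */
	assert(latch_set(&flag, 1) == 0);       /* enable */
	assert(latch_set(&flag, 0) == -EINVAL); /* cannot disable */
	assert(atomic_load(&flag) == 1);
}
```

Because the zero path never stores anything, the ordering of two concurrent writers no longer matters for the final value of the flag.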
Add a compute_sne function which finds the value of SNE for a certain SEQ given an already known "recent" SNE/SEQ. This is implemented using the standard TCP before/after macros and will work for SEQ values that are within 2^31 of the SEQ for which we know the SNE.
For updating we advance the value for rcv_sne at the same time as rcv_nxt and for snd_sne at the same time as snd_nxt. We could track other values (for example snd_una) but this is good enough and works very easily for timewait sockets.
This implementation is different from RFC suggestions and doesn't require additional flags. It does pass tests from this draft: https://datatracker.ietf.org/doc/draft-touch-sne/
Signed-off-by: Leonard Crestez cdleonard@gmail.com
---
 include/net/tcp_authopt.h | 34 ++++++++++++++
 net/ipv4/tcp_authopt.c    | 98 ++++++++++++++++++++++++++++++++++++++-
 net/ipv4/tcp_input.c      |  1 +
 net/ipv4/tcp_output.c     |  1 +
 4 files changed, 132 insertions(+), 2 deletions(-)
diff --git a/include/net/tcp_authopt.h b/include/net/tcp_authopt.h index 1f5020b790dd..1fa1b968c80c 100644 --- a/include/net/tcp_authopt.h +++ b/include/net/tcp_authopt.h @@ -66,10 +66,14 @@ struct tcp_authopt_info { u32 flags; /** @src_isn: Local Initial Sequence Number */ u32 src_isn; /** @dst_isn: Remote Initial Sequence Number */ u32 dst_isn; + /** @rcv_sne: Recv-side Sequence Number Extension tracking tcp_sock.rcv_nxt */ + u32 rcv_sne; + /** @snd_sne: Send-side Sequence Number Extension tracking tcp_sock.snd_nxt */ + u32 snd_sne; };
/* TCP authopt as found in header */ struct tcphdr_authopt { u8 num; @@ -156,10 +160,34 @@ static inline void tcp_authopt_time_wait( int __tcp_authopt_inbound_check( struct sock *sk, struct sk_buff *skb, struct tcp_authopt_info *info, const u8 *opt); +void __tcp_authopt_update_rcv_sne(struct tcp_sock *tp, struct tcp_authopt_info *info, u32 seq); +static inline void tcp_authopt_update_rcv_sne(struct tcp_sock *tp, u32 seq) +{ + struct tcp_authopt_info *info; + + if (tcp_authopt_needed) { + info = rcu_dereference_protected(tp->authopt_info, + lockdep_sock_is_held((struct sock *)tp)); + if (info) + __tcp_authopt_update_rcv_sne(tp, info, seq); + } +} +void __tcp_authopt_update_snd_sne(struct tcp_sock *tp, struct tcp_authopt_info *info, u32 seq); +static inline void tcp_authopt_update_snd_sne(struct tcp_sock *tp, u32 seq) +{ + struct tcp_authopt_info *info; + + if (tcp_authopt_needed) { + info = rcu_dereference_protected(tp->authopt_info, + lockdep_sock_is_held((struct sock *)tp)); + if (info) + __tcp_authopt_update_snd_sne(tp, info, seq); + } +} #else static inline void tcp_authopt_clear(struct sock *sk) { } static inline int tcp_authopt_openreq(struct sock *newsk, @@ -174,8 +202,14 @@ static inline void tcp_authopt_finish_connect(struct sock *sk, struct sk_buff *s static inline void tcp_authopt_time_wait( struct tcp_timewait_sock *tcptw, struct tcp_sock *tp) { } +static inline void tcp_authopt_update_rcv_sne(struct tcp_sock *tp, u32 seq) +{ +} +static inline void tcp_authopt_update_snd_sne(struct tcp_sock *tp, u32 seq) +{ +} #endif
#endif /* _LINUX_TCP_AUTHOPT_H */ diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c index 9d02da8d6964..1c2039a48bf6 100644 --- a/net/ipv4/tcp_authopt.c +++ b/net/ipv4/tcp_authopt.c @@ -656,10 +656,97 @@ static int tcp_authopt_get_isn(struct sock *sk, *disn = htonl(info->dst_isn); } return 0; }
+/* compute_sne - Calculate Sequence Number Extension + * + * Give old upper/lower 32bit values and a new lower 32bit value determine the + * new value of the upper 32 bit. The new sequence number can be 2^31 before or + * after prev_seq but TCP window scaling should limit this further. + * + * For correct accounting the stored SNE value should be only updated together + * with the SEQ. + */ +static u32 compute_sne(u32 sne, u32 prev_seq, u32 seq) +{ + if (before(seq, prev_seq)) { + if (seq > prev_seq) + --sne; + } else { + if (seq < prev_seq) + ++sne; + } + + return sne; +} + +/* Update rcv_sne, must be called immediately before rcv_nxt update */ +void __tcp_authopt_update_rcv_sne(struct tcp_sock *tp, + struct tcp_authopt_info *info, u32 seq) +{ + info->rcv_sne = compute_sne(info->rcv_sne, tp->rcv_nxt, seq); +} + +/* Update snd_sne, must be called immediately before snd_nxt update */ +void __tcp_authopt_update_snd_sne(struct tcp_sock *tp, + struct tcp_authopt_info *info, u32 seq) +{ + info->snd_sne = compute_sne(info->snd_sne, tp->snd_nxt, seq); +} + +/* Compute SNE for a specific packet (by seq). */ +static int compute_packet_sne(struct sock *sk, struct tcp_authopt_info *info, + u32 seq, bool input, __be32 *sne) +{ + u32 rcv_nxt, snd_nxt; + + // For TCP_NEW_SYN_RECV we have no tcp_authopt_info but tcp_request_sock holds ISN. + if (sk->sk_state == TCP_NEW_SYN_RECV) { + struct tcp_request_sock *rsk = tcp_rsk((struct request_sock *)sk); + + if (input) + *sne = htonl(compute_sne(0, rsk->rcv_isn, seq)); + else + *sne = htonl(compute_sne(0, rsk->snt_isn, seq)); + return 0; + } + + /* TCP_LISTEN only receives SYN */ + if (sk->sk_state == TCP_LISTEN && input) + return 0; + + /* TCP_SYN_SENT only sends SYN and receives SYN/ACK + * For the input case rcv_nxt is initialized after the packet is + * validated so tcp_sk(sk)->rcv_nxt is not initialized. 
+ */ + if (sk->sk_state == TCP_SYN_SENT) + return 0; + + if (sk->sk_state == TCP_TIME_WAIT) { + rcv_nxt = tcp_twsk(sk)->tw_rcv_nxt; + snd_nxt = tcp_twsk(sk)->tw_snd_nxt; + } else { + if (WARN_ONCE(!sk_fullsock(sk), + "unexpected minisock sk=%p state=%d", sk, + sk->sk_state)) + return -EINVAL; + rcv_nxt = tcp_sk(sk)->rcv_nxt; + snd_nxt = tcp_sk(sk)->snd_nxt; + } + + if (WARN_ONCE(!info, "unexpected missing info for sk=%p sk_state=%d", sk, sk->sk_state)) + return -EINVAL; + + if (input) + *sne = htonl(compute_sne(info->rcv_sne, rcv_nxt, seq)); + else + *sne = htonl(compute_sne(info->snd_sne, snd_nxt, seq)); + + return 0; +} + /* Feed one buffer into ahash * The buffer is assumed to be DMA-able */ static int crypto_ahash_buf(struct ahash_request *req, u8 *buf, uint len) { @@ -691,10 +778,13 @@ int __tcp_authopt_openreq(struct sock *newsk, const struct sock *oldsk, struct r if (!new_info) return -ENOMEM;
new_info->src_isn = tcp_rsk(req)->snt_isn; new_info->dst_isn = tcp_rsk(req)->rcv_isn; + /* Caller is tcp_create_openreq_child and already initializes snd_nxt/rcv_nxt */ + new_info->snd_sne = compute_sne(0, new_info->src_isn, tcp_sk(newsk)->snd_nxt); + new_info->rcv_sne = compute_sne(0, new_info->dst_isn, tcp_sk(newsk)->rcv_nxt); sk_gso_disable(newsk); rcu_assign_pointer(tcp_sk(newsk)->authopt_info, new_info);
return 0; } @@ -702,10 +792,12 @@ int __tcp_authopt_openreq(struct sock *newsk, const struct sock *oldsk, struct r void __tcp_authopt_finish_connect(struct sock *sk, struct sk_buff *skb, struct tcp_authopt_info *info) { info->src_isn = ntohl(tcp_hdr(skb)->ack_seq) - 1; info->dst_isn = ntohl(tcp_hdr(skb)->seq); + info->snd_sne = compute_sne(0, info->src_isn, tcp_sk(sk)->snd_nxt); + info->rcv_sne = compute_sne(0, info->dst_isn, tcp_sk(sk)->rcv_nxt); }
/* feed traffic key into ahash */ static int tcp_authopt_ahash_traffic_key(struct tcp_authopt_alg_pool *pool, struct sock *sk, @@ -959,14 +1051,16 @@ static int tcp_authopt_hash_packet(struct tcp_authopt_alg_pool *pool, bool ipv6, bool include_options, u8 *macbuf) { struct tcphdr *th = tcp_hdr(skb); + __be32 sne = 0; int err;
- /* NOTE: SNE unimplemented */ - __be32 sne = 0; + err = compute_packet_sne(sk, info, ntohl(th->seq), input, &sne); + if (err) + return err;
err = crypto_ahash_init(pool->req); if (err) return err;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 9f065469562d..4da39c32b934 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3527,10 +3527,11 @@ static void tcp_snd_una_update(struct tcp_sock *tp, u32 ack) static void tcp_rcv_nxt_update(struct tcp_sock *tp, u32 seq) { u32 delta = seq - tp->rcv_nxt;
sock_owned_by_me((struct sock *)tp); + tcp_authopt_update_rcv_sne(tp, seq); tp->bytes_received += delta; WRITE_ONCE(tp->rcv_nxt, seq); }
/* Update our send window. diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index da683f7951eb..d48d5dc36916 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -67,10 +67,11 @@ static void tcp_event_new_data_sent(struct sock *sk, struct sk_buff *skb) { struct inet_connection_sock *icsk = inet_csk(sk); struct tcp_sock *tp = tcp_sk(sk); unsigned int prior_packets = tp->packets_out;
+ tcp_authopt_update_snd_sne(tp, TCP_SKB_CB(skb)->end_seq); WRITE_ONCE(tp->snd_nxt, TCP_SKB_CB(skb)->end_seq);
__skb_unlink(skb, &sk->sk_write_queue); tcp_rbtree_insert(&sk->tcp_rtx_queue, skb);
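The wraparound handling in compute_sne from the patch above can be tried out in userspace. The sketch below copies its logic; seq_before is a local rewrite of the kernel's before() macro, and extended_seq and compute_sne_demo are hypothetical helpers added only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace equivalent of before(): seq1 precedes seq2 modulo 2^32. */
static int seq_before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

/* Same decision table as compute_sne in the patch. */
static uint32_t compute_sne(uint32_t sne, uint32_t prev_seq, uint32_t seq)
{
	if (seq_before(seq, prev_seq)) {
		if (seq > prev_seq)
			--sne; /* older but numerically larger: before a wrap */
	} else {
		if (seq < prev_seq)
			++sne; /* newer but numerically smaller: past a wrap */
	}
	return sne;
}

/* Hypothetical helper: the 64-bit extended sequence number. */
static uint64_t extended_seq(uint32_t sne, uint32_t prev_seq, uint32_t seq)
{
	return ((uint64_t)compute_sne(sne, prev_seq, seq) << 32) | seq;
}

static void compute_sne_demo(void)
{
	/* No wrap between the known SEQ and the packet: SNE unchanged. */
	assert(compute_sne(5, 1000, 2000) == 5);
	assert(compute_sne(5, 2000, 1000) == 5);
	/* Packet just past the 2^32 boundary: SNE is incremented. */
	assert(compute_sne(0, 0xFFFFFFF0u, 0x10u) == 1);
	/* Old packet from before the boundary: SNE is decremented. */
	assert(compute_sne(1, 0x10u, 0xFFFFFFF0u) == 0);
}
```

The result is only meaningful while seq stays within 2^31 of prev_seq, which is the same limit stated in the commit message.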
This is a special code path for acks and resets outside of normal connection establishment and closing.
Signed-off-by: Leonard Crestez cdleonard@gmail.com
---
 net/ipv4/tcp_authopt.c |  2 ++
 net/ipv6/tcp_ipv6.c    | 60 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 62 insertions(+)
diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c index 1c2039a48bf6..bb74ab96b18f 100644 --- a/net/ipv4/tcp_authopt.c +++ b/net/ipv4/tcp_authopt.c @@ -381,10 +381,11 @@ struct tcp_authopt_key_info *__tcp_authopt_select_key(const struct sock *sk, { struct netns_tcp_authopt *net = sock_net_tcp_authopt(sk);
return tcp_authopt_lookup_send(net, addr_sk); } +EXPORT_SYMBOL(__tcp_authopt_select_key);
static struct tcp_authopt_info *__tcp_authopt_info_get_or_create(struct sock *sk) { struct tcp_sock *tp = tcp_sk(sk); struct tcp_authopt_info *info; @@ -1206,10 +1207,11 @@ int tcp_authopt_hash(char *hash_location, * try to make it obvious inside the packet. */ memset(hash_location, 0, TCP_AUTHOPT_MACLEN); return err; } +EXPORT_SYMBOL(tcp_authopt_hash);
/** * tcp_authopt_lookup_recv - lookup key for receive * * @sk: Receive socket diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 8969aee822d5..9e507fcad7cc 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -40,10 +40,11 @@ #include <linux/icmpv6.h> #include <linux/random.h> #include <linux/indirect_call_wrapper.h>
#include <net/tcp.h> +#include <net/tcp_authopt.h> #include <net/ndisc.h> #include <net/inet6_hashtables.h> #include <net/inet6_connection_sock.h> #include <net/ipv6.h> #include <net/transp_v6.h> @@ -853,10 +854,48 @@ const struct tcp_request_sock_ops tcp_request_sock_ipv6_ops = { .init_seq = tcp_v6_init_seq, .init_ts_off = tcp_v6_init_ts_off, .send_synack = tcp_v6_send_synack, };
+#ifdef CONFIG_TCP_AUTHOPT +static int tcp_v6_send_response_init_authopt(const struct sock *sk, + struct tcp_authopt_info **info, + struct tcp_authopt_key_info **key, + u8 *rnextkeyid) +{ + /* Key lookup before SKB allocation */ + if (!(tcp_authopt_needed && sk)) + return 0; + if (sk->sk_state == TCP_TIME_WAIT) + *info = tcp_twsk(sk)->tw_authopt_info; + else + *info = rcu_dereference(tcp_sk(sk)->authopt_info); + if (!*info) + return 0; + *key = __tcp_authopt_select_key(sk, *info, sk, rnextkeyid); + if (*key) + return TCPOLEN_AUTHOPT_OUTPUT; + return 0; +} + +static void tcp_v6_send_response_sign_authopt(const struct sock *sk, + struct tcp_authopt_info *info, + struct tcp_authopt_key_info *key, + struct sk_buff *skb, + struct tcphdr_authopt *ptr, + u8 rnextkeyid) +{ + if (!(tcp_authopt_needed && key)) + return; + ptr->num = TCPOPT_AUTHOPT; + ptr->len = TCPOLEN_AUTHOPT_OUTPUT; + ptr->keyid = key->send_id; + ptr->rnextkeyid = rnextkeyid; + tcp_authopt_hash(ptr->mac, key, info, (struct sock *)sk, skb); +} +#endif + static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32 seq, u32 ack, u32 win, u32 tsval, u32 tsecr, int oif, struct tcp_md5sig_key *key, int rst, u8 tclass, __be32 label, u32 priority, u32 txhash) { @@ -868,13 +907,30 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32 struct sock *ctl_sk = net->ipv6.tcp_sk; unsigned int tot_len = sizeof(struct tcphdr); __be32 mrst = 0, *topt; struct dst_entry *dst; __u32 mark = 0; +#ifdef CONFIG_TCP_AUTHOPT + struct tcp_authopt_info *aoinfo; + struct tcp_authopt_key_info *aokey; + u8 aornextkeyid; + int aolen; +#endif
if (tsecr) tot_len += TCPOLEN_TSTAMP_ALIGNED; +#ifdef CONFIG_TCP_AUTHOPT + /* Key lookup before SKB allocation */ + aolen = tcp_v6_send_response_init_authopt(sk, &aoinfo, &aokey, &aornextkeyid); + if (aolen) { + tot_len += aolen; +#ifdef CONFIG_TCP_MD5SIG + /* Don't use MD5 */ + key = NULL; +#endif + } +#endif #ifdef CONFIG_TCP_MD5SIG if (key) tot_len += TCPOLEN_MD5SIG_ALIGNED; #endif
@@ -926,10 +982,14 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32 tcp_v6_md5_hash_hdr((__u8 *)topt, key, &ipv6_hdr(skb)->saddr, &ipv6_hdr(skb)->daddr, t1); } #endif +#ifdef CONFIG_TCP_AUTHOPT + tcp_v6_send_response_sign_authopt(sk, aoinfo, aokey, buff, + (struct tcphdr_authopt *)topt, aornextkeyid); +#endif
memset(&fl6, 0, sizeof(fl6)); fl6.daddr = ipv6_hdr(skb)->saddr; fl6.saddr = ipv6_hdr(skb)->daddr; fl6.flowlabel = label;
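For readers unfamiliar with the option layout that tcp_v6_send_response_sign_authopt fills in above, here is a small userspace sketch of the header fields. Kind 29 is the IANA-assigned TCP-AO option kind; the total length of 16 (4 header bytes plus a 12-byte MAC) is an assumption about the value of TCPOLEN_AUTHOPT_OUTPUT, and struct ao_option and ao_option_init are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define AO_KIND   29              /* TCP-AO option kind (RFC 5925) */
#define AO_MACLEN 12              /* MAC length used by the series */
#define AO_OPTLEN (4 + AO_MACLEN) /* assumed TCPOLEN_AUTHOPT_OUTPUT */

/* Illustrative layout: kind, length, KeyID, RNextKeyID, then the MAC. */
struct ao_option {
	uint8_t kind;
	uint8_t len;
	uint8_t keyid;
	uint8_t rnextkeyid;
	uint8_t mac[AO_MACLEN];
};

_Static_assert(sizeof(struct ao_option) == AO_OPTLEN, "option is 16 bytes");

/* Fill the header and zero the MAC field before it is computed. */
static size_t ao_option_init(struct ao_option *opt, uint8_t send_id,
			     uint8_t rnextkeyid)
{
	opt->kind = AO_KIND;
	opt->len = AO_OPTLEN;
	opt->keyid = send_id;
	opt->rnextkeyid = rnextkeyid;
	memset(opt->mac, 0, sizeof(opt->mac));
	return sizeof(*opt);
}

static void ao_option_demo(void)
{
	struct ao_option opt;
	uint8_t raw[AO_OPTLEN];

	assert(ao_option_init(&opt, 3, 4) == AO_OPTLEN);
	memcpy(raw, &opt, sizeof(raw));
	assert(raw[0] == 29 && raw[1] == 16 && raw[2] == 3 && raw[3] == 4);
}
```

A 16-byte option is already a multiple of four, so no padding bytes are needed after it.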
This is required because tcp ipv4 sometimes sends replies without allocating a full skb that can be signed by tcp authopt.
Handle this with additional code in tcp authopt.
Signed-off-by: Leonard Crestez cdleonard@gmail.com
---
 include/net/tcp_authopt.h |   7 ++
 net/ipv4/tcp_authopt.c    | 144 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 151 insertions(+)
diff --git a/include/net/tcp_authopt.h b/include/net/tcp_authopt.h index 1fa1b968c80c..9bc0f58a78cb 100644 --- a/include/net/tcp_authopt.h +++ b/include/net/tcp_authopt.h @@ -114,10 +114,17 @@ static inline struct tcp_authopt_key_info *tcp_authopt_select_key( int tcp_authopt_hash( char *hash_location, struct tcp_authopt_key_info *key, struct tcp_authopt_info *info, struct sock *sk, struct sk_buff *skb); +int tcp_v4_authopt_hash_reply( + char *hash_location, + struct tcp_authopt_info *info, + struct tcp_authopt_key_info *key, + __be32 saddr, + __be32 daddr, + struct tcphdr *th); int __tcp_authopt_openreq(struct sock *newsk, const struct sock *oldsk, struct request_sock *req); static inline int tcp_authopt_openreq( struct sock *newsk, const struct sock *oldsk, struct request_sock *req) diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c index bb74ab96b18f..0260173cd546 100644 --- a/net/ipv4/tcp_authopt.c +++ b/net/ipv4/tcp_authopt.c @@ -940,10 +940,72 @@ static int tcp_authopt_get_traffic_key(struct sock *sk, out: tcp_authopt_put_kdf_pool(key, pool); return err; }
+struct tcp_v4_authopt_context_data { + __be32 saddr; + __be32 daddr; + __be16 sport; + __be16 dport; + __be32 sisn; + __be32 disn; + __be16 digestbits; +} __packed; + +static int tcp_v4_authopt_get_traffic_key_noskb(struct tcp_authopt_key_info *key, + __be32 saddr, + __be32 daddr, + __be16 sport, + __be16 dport, + __be32 sisn, + __be32 disn, + u8 *traffic_key) +{ + int err; + struct tcp_authopt_alg_pool *pool; + struct tcp_v4_authopt_context_data data; + + BUILD_BUG_ON(sizeof(data) != 22); + + pool = tcp_authopt_get_kdf_pool(key); + if (IS_ERR(pool)) + return PTR_ERR(pool); + + err = tcp_authopt_setkey(pool, key); + if (err) + goto out; + err = crypto_ahash_init(pool->req); + if (err) + goto out; + + // RFC5926 section 3.1.1.1 + // Separate to keep alignment semi-sane + err = crypto_ahash_buf(pool->req, "\x01TCP-AO", 7); + if (err) + return err; + data.saddr = saddr; + data.daddr = daddr; + data.sport = sport; + data.dport = dport; + data.sisn = sisn; + data.disn = disn; + data.digestbits = htons(crypto_ahash_digestsize(pool->tfm) * 8); + + err = crypto_ahash_buf(pool->req, (u8 *)&data, sizeof(data)); + if (err) + goto out; + ahash_request_set_crypt(pool->req, NULL, traffic_key, 0); + err = crypto_ahash_final(pool->req); + if (err) + goto out; + +out: + tcp_authopt_put_kdf_pool(key, pool); + return err; +} + static int crypto_ahash_buf_zero(struct ahash_request *req, int len) { u8 zeros[TCP_AUTHOPT_MACLEN] = {0}; int buflen, err;
@@ -1210,10 +1272,92 @@ int tcp_authopt_hash(char *hash_location, return err; } EXPORT_SYMBOL(tcp_authopt_hash);
/** + * tcp_v4_authopt_hash_reply - Hash tcp+ipv4 header without SKB + * + * @hash_location: output buffer + * @info: sending socket's tcp_authopt_info + * @key: signing key, from tcp_authopt_select_key. + * @saddr: source address + * @daddr: destination address + * @th: Pointer to TCP header and options + */ +int tcp_v4_authopt_hash_reply(char *hash_location, + struct tcp_authopt_info *info, + struct tcp_authopt_key_info *key, + __be32 saddr, + __be32 daddr, + struct tcphdr *th) +{ + struct tcp_authopt_alg_pool *pool; + u8 macbuf[TCP_AUTHOPT_MAXMACBUF]; + u8 traffic_key[TCP_AUTHOPT_MAX_TRAFFIC_KEY_LEN]; + __be32 sne = 0; + int err; + + /* Call special code path for computing traffic key without skb + * This can be called from tcp_v4_reqsk_send_ack so caching would be + * difficult here. + */ + err = tcp_v4_authopt_get_traffic_key_noskb(key, saddr, daddr, + th->source, th->dest, + htonl(info->src_isn), htonl(info->dst_isn), + traffic_key); + if (err) + goto out_err_traffic_key; + + /* Init mac shash */ + pool = tcp_authopt_get_mac_pool(key); + if (IS_ERR(pool)) + return PTR_ERR(pool); + err = crypto_ahash_setkey(pool->tfm, traffic_key, key->alg->traffic_key_len); + if (err) + goto out_err; + err = crypto_ahash_init(pool->req); + if (err) + return err; + + err = crypto_ahash_buf(pool->req, (u8 *)&sne, 4); + if (err) + return err; + + err = tcp_authopt_hash_tcp4_pseudoheader(pool, saddr, daddr, th->doff * 4); + if (err) + return err; + + // TCP header with checksum set to zero. Caller ensures this. 
+ if (WARN_ON_ONCE(th->check != 0)) + goto out_err; + err = crypto_ahash_buf(pool->req, (u8 *)th, sizeof(*th)); + if (err) + goto out_err; + + // TCP options + err = tcp_authopt_hash_opts(pool, th, (struct tcphdr_authopt *)(hash_location - 4), + !(key->flags & TCP_AUTHOPT_KEY_EXCLUDE_OPTS)); + if (err) + goto out_err; + + ahash_request_set_crypt(pool->req, NULL, macbuf, 0); + err = crypto_ahash_final(pool->req); + if (err) + goto out_err; + memcpy(hash_location, macbuf, TCP_AUTHOPT_MACLEN); + + tcp_authopt_put_mac_pool(key, pool); + return 0; + +out_err: + tcp_authopt_put_mac_pool(key, pool); +out_err_traffic_key: + memset(hash_location, 0, TCP_AUTHOPT_MACLEN); + return err; +} + +/* * tcp_authopt_lookup_recv - lookup key for receive * * @sk: Receive socket * @skb: Packet, used to compare addr and iface * @net: Per-namespace information containing keys
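The byte string that tcp_v4_authopt_get_traffic_key_noskb feeds into the KDF above (the 7-byte "\x01TCP-AO" prefix followed by the 22-byte packed context) can be reproduced in userspace to check the layout. This is only a sketch: ao_v4_kdf_input and struct ao_v4_context are hypothetical names, and unlike the kernel function this helper takes host-order values and converts them itself.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors struct tcp_v4_authopt_context_data; fields are big-endian. */
struct ao_v4_context {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t sport;
	uint16_t dport;
	uint32_t sisn;
	uint32_t disn;
	uint16_t digestbits;
} __attribute__((packed));

_Static_assert(sizeof(struct ao_v4_context) == 22, "context is 22 bytes");

#define AO_KDF_LABEL     "\x01TCP-AO" /* counter byte 1, then the label */
#define AO_KDF_LABEL_LEN 7
#define AO_KDF_INPUT_LEN (AO_KDF_LABEL_LEN + sizeof(struct ao_v4_context))

/* Build the KDF input from host-order values; returns its length. */
static size_t ao_v4_kdf_input(uint8_t *out, uint32_t saddr, uint32_t daddr,
			      uint16_t sport, uint16_t dport,
			      uint32_t sisn, uint32_t disn,
			      unsigned int digest_bytes)
{
	struct ao_v4_context ctx = {
		.saddr = htonl(saddr),
		.daddr = htonl(daddr),
		.sport = htons(sport),
		.dport = htons(dport),
		.sisn = htonl(sisn),
		.disn = htonl(disn),
		.digestbits = htons((uint16_t)(digest_bytes * 8)),
	};

	memcpy(out, AO_KDF_LABEL, AO_KDF_LABEL_LEN);
	memcpy(out + AO_KDF_LABEL_LEN, &ctx, sizeof(ctx));
	return AO_KDF_INPUT_LEN;
}

static void ao_v4_kdf_demo(void)
{
	uint8_t buf[AO_KDF_INPUT_LEN];

	/* 10.0.0.1:179 -> 10.0.0.2:50000 with a 20-byte digest */
	assert(ao_v4_kdf_input(buf, 0x0A000001, 0x0A000002, 179, 50000,
			       0x11223344, 0x55667788, 20) == 29);
	assert(buf[0] == 1 && memcmp(buf + 1, "TCP-AO", 6) == 0);
	assert(buf[27] == 0x00 && buf[28] == 0xA0); /* 160 bits */
}
```

Hashing the label separately from the context, as the patch does, yields the same digest as hashing this contiguous 29-byte buffer.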
The code in tcp_v4_send_ack and tcp_v4_send_reset does not allocate a full skb so special handling is required for tcp-authopt handling.
Signed-off-by: Leonard Crestez cdleonard@gmail.com
---
 net/ipv4/tcp_authopt.c |  3 +-
 net/ipv4/tcp_ipv4.c    | 84 ++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 83 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c index 0260173cd546..0672a3bf5686 100644 --- a/net/ipv4/tcp_authopt.c +++ b/net/ipv4/tcp_authopt.c @@ -962,10 +962,11 @@ static int tcp_v4_authopt_get_traffic_key_noskb(struct tcp_authopt_key_info *key u8 *traffic_key) { int err; struct tcp_authopt_alg_pool *pool; struct tcp_v4_authopt_context_data data; + char traffic_key_context_header[7] = "\x01TCP-AO";
BUILD_BUG_ON(sizeof(data) != 22);
pool = tcp_authopt_get_kdf_pool(key); if (IS_ERR(pool)) @@ -978,11 +979,11 @@ static int tcp_v4_authopt_get_traffic_key_noskb(struct tcp_authopt_key_info *key if (err) goto out;
// RFC5926 section 3.1.1.1 // Separate to keep alignment semi-sane - err = crypto_ahash_buf(pool->req, "\x01TCP-AO", 7); + err = crypto_ahash_buf(pool->req, traffic_key_context_header, 7); if (err) return err; data.saddr = saddr; data.daddr = daddr; data.sport = sport; diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 05939e696dd6..198912f3f533 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -664,10 +664,50 @@ void tcp_v4_send_check(struct sock *sk, struct sk_buff *skb)
__tcp_v4_send_check(skb, inet->inet_saddr, inet->inet_daddr); } EXPORT_SYMBOL(tcp_v4_send_check);
+#ifdef CONFIG_TCP_AUTHOPT +/** tcp_v4_authopt_handle_reply - Insert TCPOPT_AUTHOPT if required + * + * returns number of bytes (always aligned to 4) or zero + */ +static int tcp_v4_authopt_handle_reply(const struct sock *sk, + struct sk_buff *skb, + __be32 *optptr, + struct tcphdr *th) +{ + struct tcp_authopt_info *info; + struct tcp_authopt_key_info *key_info; + u8 rnextkeyid; + + if (sk->sk_state == TCP_TIME_WAIT) + info = tcp_twsk(sk)->tw_authopt_info; + else + info = rcu_dereference_check(tcp_sk(sk)->authopt_info, lockdep_sock_is_held(sk)); + if (!info) + return 0; + key_info = __tcp_authopt_select_key(sk, info, sk, &rnextkeyid); + if (!key_info) + return 0; + *optptr = htonl((TCPOPT_AUTHOPT << 24) | + (TCPOLEN_AUTHOPT_OUTPUT << 16) | + (key_info->send_id << 8) | + (rnextkeyid)); + /* must update doff before signature computation */ + th->doff += TCPOLEN_AUTHOPT_OUTPUT / 4; + tcp_v4_authopt_hash_reply((char *)(optptr + 1), + info, + key_info, + ip_hdr(skb)->daddr, + ip_hdr(skb)->saddr, + th); + + return TCPOLEN_AUTHOPT_OUTPUT; +} +#endif + /* * This routine will send an RST to the other tcp. * * Someone asks: why I NEVER use socket parameters (TOS, TTL etc.) * for reset. @@ -679,10 +719,12 @@ EXPORT_SYMBOL(tcp_v4_send_check); * Exception: precedence violation. We do not implement it in any case. */
#ifdef CONFIG_TCP_MD5SIG #define OPTION_BYTES TCPOLEN_MD5SIG_ALIGNED +#elif defined(CONFIG_TCP_AUTHOPT) +#define OPTION_BYTES TCPOLEN_AUTHOPT_OUTPUT #else #define OPTION_BYTES sizeof(__be32) #endif
static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb) @@ -732,12 +774,29 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb) memset(&arg, 0, sizeof(arg)); arg.iov[0].iov_base = (unsigned char *)&rep; arg.iov[0].iov_len = sizeof(rep.th);
net = sk ? sock_net(sk) : dev_net(skb_dst(skb)->dev); -#ifdef CONFIG_TCP_MD5SIG +#if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AUTHOPT) rcu_read_lock(); +#endif +#ifdef CONFIG_TCP_AUTHOPT + /* Unlike TCP-MD5 the signatures for TCP-AO depend on initial sequence + * numbers so we can only handle established and time-wait sockets. + */ + if (tcp_authopt_needed && sk && + sk->sk_state != TCP_NEW_SYN_RECV && + sk->sk_state != TCP_LISTEN) { + int tcp_authopt_ret = tcp_v4_authopt_handle_reply(sk, skb, rep.opt, &rep.th); + + if (tcp_authopt_ret) { + arg.iov[0].iov_len += tcp_authopt_ret; + goto skip_md5sig; + } + } +#endif +#ifdef CONFIG_TCP_MD5SIG hash_location = tcp_parse_md5sig_option(th); if (sk && sk_fullsock(sk)) { const union tcp_md5_addr *addr; int l3index;
@@ -775,11 +834,10 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb) addr = (union tcp_md5_addr *)&ip_hdr(skb)->saddr; key = tcp_md5_do_lookup(sk1, l3index, addr, AF_INET); if (!key) goto out;
- genhash = tcp_v4_md5_hash_skb(newhash, key, NULL, skb); if (genhash || memcmp(hash_location, newhash, 16) != 0) goto out;
} @@ -795,10 +853,13 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
tcp_v4_md5_hash_hdr((__u8 *) &rep.opt[1], key, ip_hdr(skb)->saddr, ip_hdr(skb)->daddr, &rep.th); } +#endif +#ifdef CONFIG_TCP_AUTHOPT +skip_md5sig: #endif /* Can't co-exist with TCPMD5, hence check rep.opt[0] */ if (rep.opt[0] == 0) { __be32 mrst = mptcp_reset_option(skb);
@@ -852,12 +913,14 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb) sock_net_set(ctl_sk, &init_net); __TCP_INC_STATS(net, TCP_MIB_OUTSEGS); __TCP_INC_STATS(net, TCP_MIB_OUTRSTS); local_bh_enable();
-#ifdef CONFIG_TCP_MD5SIG +#if defined(CONFIG_TCP_MD5SIG) out: +#endif +#if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AUTHOPT) rcu_read_unlock(); #endif }
/* The code following below sending ACKs in SYN-RECV and TIME-WAIT states @@ -874,10 +937,12 @@ static void tcp_v4_send_ack(const struct sock *sk, struct { struct tcphdr th; __be32 opt[(TCPOLEN_TSTAMP_ALIGNED >> 2) #ifdef CONFIG_TCP_MD5SIG + (TCPOLEN_MD5SIG_ALIGNED >> 2) +#elif defined(CONFIG_TCP_AUTHOPT) + + (TCPOLEN_AUTHOPT_OUTPUT >> 2) #endif ]; } rep; struct net *net = sock_net(sk); struct ip_reply_arg arg; @@ -905,10 +970,23 @@ static void tcp_v4_send_ack(const struct sock *sk, rep.th.seq = htonl(seq); rep.th.ack_seq = htonl(ack); rep.th.ack = 1; rep.th.window = htons(win);
+#ifdef CONFIG_TCP_AUTHOPT + if (tcp_authopt_needed) { + int aoret, offset = (tsecr) ? 3 : 0; + + aoret = tcp_v4_authopt_handle_reply(sk, skb, &rep.opt[offset], &rep.th); + if (aoret) { + arg.iov[0].iov_len += aoret; +#ifdef CONFIG_TCP_MD5SIG + key = NULL; +#endif + } + } +#endif #ifdef CONFIG_TCP_MD5SIG if (key) { int offset = (tsecr) ? 3 : 0;
rep.opt[offset++] = htonl((TCPOPT_NOP << 24) |
Add flags to allow marking individual keys as invalid for send or recv. Making keys asymmetric this way is not mentioned in RFC5925, but RFC8177 requires that keys inside a keychain have independent "accept" and "send" lifetimes.
Flag names are negative so that the default behavior is for keys to be valid for both send and recv.
Setting both NOSEND and NORECV for a certain peer address on a listen socket can be used to mean "TCP-AO is required from this peer but no keys are currently valid".
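As a rough userspace illustration of the flag semantics described above (not the kernel code itself), the sketch below copies the two flag values from this patch into a local enum and classifies a key. The `struct demo_flag_key` type and the `demo_*` helper names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values copied from this patch's include/uapi/linux/tcp.h hunk. */
enum {
	DEMO_KEY_NOSEND = (1 << 4),
	DEMO_KEY_NORECV = (1 << 5),
};

/* Hypothetical stand-in for a configured key; only the flags matter here. */
struct demo_flag_key {
	uint32_t flags;
};

/* Negative flag names: a key with no flags set is valid for send. */
static bool demo_key_can_send(const struct demo_flag_key *key)
{
	assert(key != NULL);
	return !(key->flags & DEMO_KEY_NOSEND);
}

/* Likewise, a key with no flags set is valid for recv. */
static bool demo_key_can_recv(const struct demo_flag_key *key)
{
	assert(key != NULL);
	return !(key->flags & DEMO_KEY_NORECV);
}

/*
 * Both flags set: the key still exists for the peer, so TCP-AO is
 * required, but nothing can currently be signed or validated with it.
 */
static bool demo_key_is_placeholder(const struct demo_flag_key *key)
{
	return !demo_key_can_send(key) && !demo_key_can_recv(key);
}
```

A key with `flags == 0` is usable in both directions, which is the default behavior the negative flag names are meant to preserve.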
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 include/uapi/linux/tcp.h | 4 ++++
 net/ipv4/tcp_authopt.c   | 9 ++++++++-
 2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h index 76d7be6b27f4..75107a7fd935 100644 --- a/include/uapi/linux/tcp.h +++ b/include/uapi/linux/tcp.h @@ -369,15 +369,19 @@ struct tcp_authopt { * enum tcp_authopt_key_flag - flags for `tcp_authopt.flags` * * @TCP_AUTHOPT_KEY_DEL: Delete the key and ignore non-id fields * @TCP_AUTHOPT_KEY_EXCLUDE_OPTS: Exclude TCP options from signature * @TCP_AUTHOPT_KEY_ADDR_BIND: Key only valid for `tcp_authopt.addr` + * @TCP_AUTHOPT_KEY_NOSEND: Key invalid for send (expired) + * @TCP_AUTHOPT_KEY_NORECV: Key invalid for recv (expired) */ enum tcp_authopt_key_flag { TCP_AUTHOPT_KEY_DEL = (1 << 0), TCP_AUTHOPT_KEY_EXCLUDE_OPTS = (1 << 1), TCP_AUTHOPT_KEY_ADDR_BIND = (1 << 2), + TCP_AUTHOPT_KEY_NOSEND = (1 << 4), + TCP_AUTHOPT_KEY_NORECV = (1 << 5), };
/** * enum tcp_authopt_alg - Algorithms for TCP Authentication Option */ diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c index 0672a3bf5686..4dc2fe541498 100644 --- a/net/ipv4/tcp_authopt.c +++ b/net/ipv4/tcp_authopt.c @@ -353,10 +353,12 @@ static struct tcp_authopt_key_info *tcp_authopt_lookup_send(struct netns_tcp_aut
hlist_for_each_entry_rcu(key, &net->head, node, 0) { if (key->flags & TCP_AUTHOPT_KEY_ADDR_BIND) if (!tcp_authopt_key_match_sk_addr(key, addr_sk)) continue; + if (key->flags & TCP_AUTHOPT_KEY_NOSEND) + continue; if (result && net_ratelimit()) pr_warn("ambiguous tcp authentication keys configured for send\n"); result = key; }
@@ -504,11 +506,13 @@ int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *opt) }
#define TCP_AUTHOPT_KEY_KNOWN_FLAGS ( \ TCP_AUTHOPT_KEY_DEL | \ TCP_AUTHOPT_KEY_EXCLUDE_OPTS | \ - TCP_AUTHOPT_KEY_ADDR_BIND) + TCP_AUTHOPT_KEY_ADDR_BIND | \ + TCP_AUTHOPT_KEY_NOSEND | \ + TCP_AUTHOPT_KEY_NORECV)
int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen) { struct tcp_authopt_key opt; struct tcp_authopt_info *info; @@ -1383,10 +1387,13 @@ static struct tcp_authopt_key_info *tcp_authopt_lookup_recv(struct sock *sk, hlist_for_each_entry_rcu(key, &net->head, node, 0) { if (key->flags & TCP_AUTHOPT_KEY_ADDR_BIND && !tcp_authopt_key_match_skb_addr(key, skb)) continue; *anykey = true; + // If only keys with norecv flag are present still consider that + if (key->flags & TCP_AUTHOPT_KEY_NORECV) + continue; if (recv_id >= 0 && key->recv_id != recv_id) continue; if (!result) result = key; else if (result)
This is a parallel feature to tcp_md5sig.tcpm_ifindex support and allows applications to serve multiple VRFs with a single socket.
The ifindex argument must be the ifindex of a VRF device and must match exactly; keys with ifindex == 0 (outside of VRF) will not match connections inside a VRF.
Keys without the TCP_AUTHOPT_KEY_IFINDEX flag ignore ifindex and match both inside and outside a VRF.
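The matching rule above can be sketched in plain C as follows. This is a simplified model rather than the kernel implementation: `struct demo_vrf_key` and `demo_key_match_l3index` are hypothetical names, and `conn_l3index` is assumed to be the ifindex of the VRF master for the connection, or 0 when the connection is outside any VRF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag value copied from this patch's include/uapi/linux/tcp.h hunk. */
enum {
	DEMO_KEY_IFINDEX = (1 << 3),
};

/* Hypothetical stand-in for the kernel's per-key state. */
struct demo_vrf_key {
	uint32_t flags;
	int l3index;	/* VRF master ifindex, 0 means "outside of VRF" */
};

/*
 * Without the IFINDEX flag the key ignores VRFs entirely.
 * With the flag the l3index must match exactly, so l3index == 0
 * only matches connections that are not inside any VRF.
 */
static bool demo_key_match_l3index(const struct demo_vrf_key *key,
				   int conn_l3index)
{
	assert(key != NULL);
	if (!(key->flags & DEMO_KEY_IFINDEX))
		return true;
	return key->l3index == conn_l3index;
}
```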
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 Documentation/networking/tcp_authopt.rst |  1 +
 include/net/tcp_authopt.h                |  2 +
 include/uapi/linux/tcp.h                 | 11 ++++
 net/ipv4/tcp_authopt.c                   | 69 ++++++++++++++++++++++--
 4 files changed, 79 insertions(+), 4 deletions(-)
diff --git a/Documentation/networking/tcp_authopt.rst b/Documentation/networking/tcp_authopt.rst index 72adb7a891ce..cbdea65e2b5d 100644 --- a/Documentation/networking/tcp_authopt.rst +++ b/Documentation/networking/tcp_authopt.rst @@ -37,10 +37,11 @@ expand over time by increasing the size of `struct tcp_authopt_key` and adding new flags.
* Address binding is optional, by default keys match all addresses * Local address is ignored, matching is done by remote address * Ports are ignored + * It is possible to match a specific VRF by l3index (default is to ignore)
RFC5925 requires that key ids do not overlap when tcp identifiers (addr/port) overlap. This is not enforced by linux, configuring ambiguous keys will result in packet drops and lost connections.
diff --git a/include/net/tcp_authopt.h b/include/net/tcp_authopt.h index 9bc0f58a78cb..e450f7c30043 100644 --- a/include/net/tcp_authopt.h +++ b/include/net/tcp_authopt.h @@ -45,10 +45,12 @@ struct tcp_authopt_key_info { u8 alg_id; /** @keylen: Same as &tcp_authopt_key.keylen */ u8 keylen; /** @key: Same as &tcp_authopt_key.key */ u8 key[TCP_AUTHOPT_MAXKEYLEN]; + /** @l3index: Same as &tcp_authopt_key.ifindex */ + int l3index; /** @addr: Same as &tcp_authopt_key.addr */ struct sockaddr_storage addr; /** @alg: Algorithm implementation matching alg_id */ struct tcp_authopt_alg_imp *alg; }; diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h index 75107a7fd935..28be52f4e411 100644 --- a/include/uapi/linux/tcp.h +++ b/include/uapi/linux/tcp.h @@ -369,17 +369,19 @@ struct tcp_authopt { * enum tcp_authopt_key_flag - flags for `tcp_authopt.flags` * * @TCP_AUTHOPT_KEY_DEL: Delete the key and ignore non-id fields * @TCP_AUTHOPT_KEY_EXCLUDE_OPTS: Exclude TCP options from signature * @TCP_AUTHOPT_KEY_ADDR_BIND: Key only valid for `tcp_authopt.addr` + * @TCP_AUTHOPT_KEY_IFINDEX: Key only valid for `tcp_authopt.ifindex` * @TCP_AUTHOPT_KEY_NOSEND: Key invalid for send (expired) * @TCP_AUTHOPT_KEY_NORECV: Key invalid for recv (expired) */ enum tcp_authopt_key_flag { TCP_AUTHOPT_KEY_DEL = (1 << 0), TCP_AUTHOPT_KEY_EXCLUDE_OPTS = (1 << 1), TCP_AUTHOPT_KEY_ADDR_BIND = (1 << 2), + TCP_AUTHOPT_KEY_IFINDEX = (1 << 3), TCP_AUTHOPT_KEY_NOSEND = (1 << 4), TCP_AUTHOPT_KEY_NORECV = (1 << 5), };
/** @@ -423,10 +425,19 @@ struct tcp_authopt_key { * @addr: Key is only valid for this address * * Ignored unless TCP_AUTHOPT_KEY_ADDR_BIND flag is set */ struct __kernel_sockaddr_storage addr; + /** + * @ifindex: ifindex of vrf (l3mdev_master) interface + * + * If the TCP_AUTHOPT_KEY_IFINDEX flag is set then key only applies for + * connections through this interface. Interface must be a vrf master. + * + * This is similar to `tcp_md5sig.tcpm_ifindex` + */ + int ifindex; };
/* setsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) */
#define TCP_RECEIVE_ZEROCOPY_FLAG_TLB_CLEAN_HINT 0x1 diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c index 4dc2fe541498..3704af8202eb 100644 --- a/net/ipv4/tcp_authopt.c +++ b/net/ipv4/tcp_authopt.c @@ -1,8 +1,9 @@ // SPDX-License-Identifier: GPL-2.0-or-later
#include <net/tcp_authopt.h> +#include <net/ip.h> #include <net/ipv6.h> #include <net/tcp.h> #include <linux/kref.h> #include <crypto/hash.h>
@@ -264,10 +265,14 @@ static bool tcp_authopt_key_match_exact(struct tcp_authopt_key_info *info, { if (info->send_id != key->send_id) return false; if (info->recv_id != key->recv_id) return false; + if ((info->flags & TCP_AUTHOPT_KEY_IFINDEX) != (key->flags & TCP_AUTHOPT_KEY_IFINDEX)) + return false; + if ((info->flags & TCP_AUTHOPT_KEY_IFINDEX) && info->l3index != key->ifindex) + return false; if ((info->flags & TCP_AUTHOPT_KEY_ADDR_BIND) != (key->flags & TCP_AUTHOPT_KEY_ADDR_BIND)) return false; if (info->flags & TCP_AUTHOPT_KEY_ADDR_BIND) if (!ipvx_addr_match(&info->addr, &key->addr)) return false; @@ -333,10 +338,24 @@ static struct tcp_authopt_key_info *tcp_authopt_key_lookup_exact(const struct so return key_info;
return NULL; }
+static bool better_key_match(struct tcp_authopt_key_info *old, struct tcp_authopt_key_info *new) +{ + if (!old) + return true; + + /* l3index always overrides non-l3index */ + if (old->l3index && new->l3index == 0) + return false; + if (old->l3index == 0 && new->l3index) + return true; + + return false; +} + /** * tcp_authopt_lookup_send - lookup key for sending * * @net: Per-namespace information containing keys * @addr_sk: Socket used for destination address lookup @@ -348,20 +367,29 @@ static struct tcp_authopt_key_info *tcp_authopt_key_lookup_exact(const struct so static struct tcp_authopt_key_info *tcp_authopt_lookup_send(struct netns_tcp_authopt *net, const struct sock *addr_sk) { struct tcp_authopt_key_info *result = NULL; struct tcp_authopt_key_info *key; + int l3index = -1;
hlist_for_each_entry_rcu(key, &net->head, node, 0) { if (key->flags & TCP_AUTHOPT_KEY_ADDR_BIND) if (!tcp_authopt_key_match_sk_addr(key, addr_sk)) continue; + if (key->flags & TCP_AUTHOPT_KEY_IFINDEX) { + if (l3index < 0) + l3index = l3mdev_master_ifindex_by_index(sock_net(addr_sk), + addr_sk->sk_bound_dev_if); + if (l3index != key->l3index) + continue; + } if (key->flags & TCP_AUTHOPT_KEY_NOSEND) continue; - if (result && net_ratelimit()) - pr_warn("ambiguous tcp authentication keys configured for send\n"); - result = key; + if (better_key_match(result, key)) + result = key; + else if (result) + net_warn_ratelimited("ambiguous tcp authentication keys configured for send\n"); }
return result; }
@@ -507,20 +535,22 @@ int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *opt)
#define TCP_AUTHOPT_KEY_KNOWN_FLAGS ( \ TCP_AUTHOPT_KEY_DEL | \ TCP_AUTHOPT_KEY_EXCLUDE_OPTS | \ TCP_AUTHOPT_KEY_ADDR_BIND | \ + TCP_AUTHOPT_KEY_IFINDEX | \ TCP_AUTHOPT_KEY_NOSEND | \ TCP_AUTHOPT_KEY_NORECV)
int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen) { struct tcp_authopt_key opt; struct tcp_authopt_info *info; struct tcp_authopt_key_info *key_info, *old_key_info; struct netns_tcp_authopt *net = sock_net_tcp_authopt(sk); struct tcp_authopt_alg_imp *alg; + int l3index = 0; int err;
sock_owned_by_me(sk); err = check_sysctl_tcp_authopt(); if (err) @@ -571,10 +601,24 @@ int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen) return -EINVAL; err = tcp_authopt_alg_require(alg); if (err) return err;
+ /* check ifindex is valid (zero is always valid) */ + if (opt.flags & TCP_AUTHOPT_KEY_IFINDEX && opt.ifindex) { + struct net_device *dev; + + rcu_read_lock(); + dev = dev_get_by_index_rcu(sock_net(sk), opt.ifindex); + if (dev && netif_is_l3_master(dev)) + l3index = dev->ifindex; + rcu_read_unlock(); + + if (!l3index) + return -EINVAL; + } + key_info = kmalloc(sizeof(*key_info), GFP_KERNEL | __GFP_ZERO); if (!key_info) return -ENOMEM; mutex_lock(&net->mutex); kref_init(&key_info->ref); @@ -590,10 +634,11 @@ int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen) key_info->alg_id = opt.alg; key_info->alg = alg; key_info->keylen = opt.keylen; memcpy(key_info->key, opt.key, opt.keylen); memcpy(&key_info->addr, &opt.addr, sizeof(key_info->addr)); + key_info->l3index = l3index; hlist_add_head_rcu(&key_info->node, &net->head); mutex_unlock(&net->mutex);
return 0; } @@ -1379,24 +1424,40 @@ static struct tcp_authopt_key_info *tcp_authopt_lookup_recv(struct sock *sk, int recv_id, bool *anykey) { struct tcp_authopt_key_info *result = NULL; struct tcp_authopt_key_info *key; + int l3index = -1;
*anykey = false; /* multiple matches will cause occasional failures */ hlist_for_each_entry_rcu(key, &net->head, node, 0) { if (key->flags & TCP_AUTHOPT_KEY_ADDR_BIND && !tcp_authopt_key_match_skb_addr(key, skb)) continue; + if (key->flags & TCP_AUTHOPT_KEY_IFINDEX) { + if (l3index < 0) { + if (skb->protocol == htons(ETH_P_IP)) { + l3index = inet_sdif(skb) ? inet_iif(skb) : 0; + } else if (skb->protocol == htons(ETH_P_IPV6)) { + l3index = inet6_sdif(skb) ? inet6_iif(skb) : 0; + } else { + WARN_ONCE(1, "unexpected skb->protocol=%x", skb->protocol); + continue; + } + } + + if (l3index != key->l3index) + continue; + } *anykey = true; // If only keys with norecv flag are present still consider that if (key->flags & TCP_AUTHOPT_KEY_NORECV) continue; if (recv_id >= 0 && key->recv_id != recv_id) continue; - if (!result) + if (better_key_match(result, key)) result = key; else if (result) net_warn_ratelimited("ambiguous tcp authentication keys configured for recv\n"); }
This allows making a key apply to an address prefix instead of just the full address. This is enabled through a dedicated flag; the default behavior is still a full address match.
This is equivalent to TCP_MD5SIG_FLAG_PREFIX from TCP_MD5SIG and has the same use-cases.
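For IPv4 the prefix comparison amounts to masking the peer address before comparing it with the key address. The sketch below is a userspace approximation: `demo_make_mask` is a hypothetical helper modeled on the kernel's `inet_make_mask()`, and the other `demo_*` names are likewise invented for this example. All addresses are in network byte order.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a network-order netmask for a prefix length in the range 0..32. */
static uint32_t demo_make_mask(int prefixlen)
{
	assert(prefixlen >= 0 && prefixlen <= 32);
	if (prefixlen == 0)
		return 0;
	return htonl(~UINT32_C(0) << (32 - prefixlen));
}

/* True if addr falls inside key_addr/prefixlen. */
static bool demo_ipv4_prefix_match(uint32_t addr, uint32_t key_addr,
				   int prefixlen)
{
	return (addr & demo_make_mask(prefixlen)) == key_addr;
}

/*
 * A key address is only accepted if no bits are set beyond the prefix,
 * mirroring the -EINVAL check in tcp_set_authopt_key().
 */
static bool demo_ipv4_is_prefix(uint32_t key_addr, int prefixlen)
{
	return (key_addr & ~demo_make_mask(prefixlen)) == 0;
}
```

With a prefix length of 32 this degenerates to the full address comparison that remains the default.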
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 Documentation/networking/tcp_authopt.rst |  1 +
 include/net/tcp_authopt.h                |  2 +
 include/uapi/linux/tcp.h                 | 10 ++++
 net/ipv4/tcp_authopt.c                   | 63 ++++++++++++++++++++++--
 4 files changed, 72 insertions(+), 4 deletions(-)
diff --git a/Documentation/networking/tcp_authopt.rst b/Documentation/networking/tcp_authopt.rst index cbdea65e2b5d..d0191d0c6c02 100644 --- a/Documentation/networking/tcp_authopt.rst +++ b/Documentation/networking/tcp_authopt.rst @@ -38,10 +38,11 @@ new flags.
* Address binding is optional, by default keys match all addresses * Local address is ignored, matching is done by remote address * Ports are ignored * It is possible to match a specific VRF by l3index (default is to ignore) + * It is possible to match with a fixed prefixlen (default is full address)
RFC5925 requires that key ids do not overlap when tcp identifiers (addr/port) overlap. This is not enforced by linux, configuring ambiguous keys will result in packet drops and lost connections.
diff --git a/include/net/tcp_authopt.h b/include/net/tcp_authopt.h index e450f7c30043..6260c3ef6864 100644 --- a/include/net/tcp_authopt.h +++ b/include/net/tcp_authopt.h @@ -47,10 +47,12 @@ struct tcp_authopt_key_info { u8 keylen; /** @key: Same as &tcp_authopt_key.key */ u8 key[TCP_AUTHOPT_MAXKEYLEN]; /** @l3index: Same as &tcp_authopt_key.ifindex */ int l3index; + /** @prefixlen: Length of addr match (default full) */ + int prefixlen; /** @addr: Same as &tcp_authopt_key.addr */ struct sockaddr_storage addr; /** @alg: Algorithm implementation matching alg_id */ struct tcp_authopt_alg_imp *alg; }; diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h index 28be52f4e411..274ddfefd6de 100644 --- a/include/uapi/linux/tcp.h +++ b/include/uapi/linux/tcp.h @@ -372,18 +372,21 @@ struct tcp_authopt { * @TCP_AUTHOPT_KEY_EXCLUDE_OPTS: Exclude TCP options from signature * @TCP_AUTHOPT_KEY_ADDR_BIND: Key only valid for `tcp_authopt.addr` * @TCP_AUTHOPT_KEY_IFINDEX: Key only valid for `tcp_authopt.ifindex` * @TCP_AUTHOPT_KEY_NOSEND: Key invalid for send (expired) * @TCP_AUTHOPT_KEY_NORECV: Key invalid for recv (expired) + * @TCP_AUTHOPT_KEY_PREFIXLEN: Valid value in `tcp_authopt.prefixlen`, otherwise + * match full address length */ enum tcp_authopt_key_flag { TCP_AUTHOPT_KEY_DEL = (1 << 0), TCP_AUTHOPT_KEY_EXCLUDE_OPTS = (1 << 1), TCP_AUTHOPT_KEY_ADDR_BIND = (1 << 2), TCP_AUTHOPT_KEY_IFINDEX = (1 << 3), TCP_AUTHOPT_KEY_NOSEND = (1 << 4), TCP_AUTHOPT_KEY_NORECV = (1 << 5), + TCP_AUTHOPT_KEY_PREFIXLEN = (1 << 6), };
/** * enum tcp_authopt_alg - Algorithms for TCP Authentication Option */ @@ -434,10 +437,17 @@ struct tcp_authopt_key { * connections through this interface. Interface must be a vrf master. * * This is similar to `tcp_md5sig.tcpm_ifindex` */ int ifindex; + /** + * @prefixlen: length of prefix to match + * + * Without the TCP_AUTHOPT_KEY_PREFIXLEN flag this is ignored and a full + * address match is performed. + */ + int prefixlen; };
/* setsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) */
#define TCP_RECEIVE_ZEROCOPY_FLAG_TLB_CLEAN_HINT 0x1 diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c index 3704af8202eb..daeecb64c89e 100644 --- a/net/ipv4/tcp_authopt.c +++ b/net/ipv4/tcp_authopt.c @@ -4,10 +4,11 @@ #include <net/ip.h> #include <net/ipv6.h> #include <net/tcp.h> #include <linux/kref.h> #include <crypto/hash.h> +#include <linux/inetdevice.h>
/* This is mainly intended to protect against local privilege escalations through * a rarely used feature so it is deliberately not namespaced. */ int sysctl_tcp_authopt; @@ -269,10 +270,14 @@ static bool tcp_authopt_key_match_exact(struct tcp_authopt_key_info *info, return false; if ((info->flags & TCP_AUTHOPT_KEY_IFINDEX) != (key->flags & TCP_AUTHOPT_KEY_IFINDEX)) return false; if ((info->flags & TCP_AUTHOPT_KEY_IFINDEX) && info->l3index != key->ifindex) return false; + if ((info->flags & TCP_AUTHOPT_KEY_PREFIXLEN) != (key->flags & TCP_AUTHOPT_KEY_PREFIXLEN)) + return false; + if ((info->flags & TCP_AUTHOPT_KEY_PREFIXLEN) && info->prefixlen != key->prefixlen) + return false; if ((info->flags & TCP_AUTHOPT_KEY_ADDR_BIND) != (key->flags & TCP_AUTHOPT_KEY_ADDR_BIND)) return false; if (info->flags & TCP_AUTHOPT_KEY_ADDR_BIND) if (!ipvx_addr_match(&info->addr, &key->addr)) return false; @@ -286,17 +291,20 @@ static bool tcp_authopt_key_match_skb_addr(struct tcp_authopt_key_info *key, u16 keyaf = key->addr.ss_family; struct iphdr *iph = (struct iphdr *)skb_network_header(skb);
if (keyaf == AF_INET && iph->version == 4) { struct sockaddr_in *key_addr = (struct sockaddr_in *)&key->addr; + __be32 mask = inet_make_mask(key->prefixlen);
- return iph->saddr == key_addr->sin_addr.s_addr; + return (iph->saddr & mask) == key_addr->sin_addr.s_addr; } else if (keyaf == AF_INET6 && iph->version == 6) { struct ipv6hdr *ip6h = (struct ipv6hdr *)skb_network_header(skb); struct sockaddr_in6 *key_addr = (struct sockaddr_in6 *)&key->addr;
- return ipv6_addr_equal(&ip6h->saddr, &key_addr->sin6_addr); + return ipv6_prefix_equal(&ip6h->saddr, + &key_addr->sin6_addr, + key->prefixlen); }
/* This actually happens with ipv6-mapped-ipv4-addresses * IPv6 listen sockets will be asked to validate ipv4 packets. */ @@ -312,17 +320,20 @@ static bool tcp_authopt_key_match_sk_addr(struct tcp_authopt_key_info *key, if (keyaf != addr_sk->sk_family) return false;
if (keyaf == AF_INET) { struct sockaddr_in *key_addr = (struct sockaddr_in *)&key->addr; + __be32 mask = inet_make_mask(key->prefixlen);
- return addr_sk->sk_daddr == key_addr->sin_addr.s_addr; + return (addr_sk->sk_daddr & mask) == key_addr->sin_addr.s_addr; #if IS_ENABLED(CONFIG_IPV6) } else if (keyaf == AF_INET6) { struct sockaddr_in6 *key_addr = (struct sockaddr_in6 *)&key->addr;
- return ipv6_addr_equal(&addr_sk->sk_v6_daddr, &key_addr->sin6_addr); + return ipv6_prefix_equal(&addr_sk->sk_v6_daddr, + &key_addr->sin6_addr, + key->prefixlen); #endif }
return false; } @@ -348,10 +359,16 @@ static bool better_key_match(struct tcp_authopt_key_info *old, struct tcp_authop /* l3index always overrides non-l3index */ if (old->l3index && new->l3index == 0) return false; if (old->l3index == 0 && new->l3index) return true; + /* Full address match overrides match by prefixlen */ + if (!(new->flags & TCP_AUTHOPT_KEY_PREFIXLEN) && (old->flags & TCP_AUTHOPT_KEY_PREFIXLEN)) + return false; + /* Longer prefixes are better matches */ + if (new->prefixlen > old->prefixlen) + return true;
return false; }
/** @@ -536,21 +553,32 @@ int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *opt) #define TCP_AUTHOPT_KEY_KNOWN_FLAGS ( \ TCP_AUTHOPT_KEY_DEL | \ TCP_AUTHOPT_KEY_EXCLUDE_OPTS | \ TCP_AUTHOPT_KEY_ADDR_BIND | \ TCP_AUTHOPT_KEY_IFINDEX | \ + TCP_AUTHOPT_KEY_PREFIXLEN | \ TCP_AUTHOPT_KEY_NOSEND | \ TCP_AUTHOPT_KEY_NORECV)
+static bool ipv6_addr_is_prefix(struct in6_addr *addr, int plen) +{ + struct in6_addr copy; + + ipv6_addr_prefix(&copy, addr, plen); + + return !memcmp(&copy, addr, sizeof(*addr)); +} + int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen) { struct tcp_authopt_key opt; struct tcp_authopt_info *info; struct tcp_authopt_key_info *key_info, *old_key_info; struct netns_tcp_authopt *net = sock_net_tcp_authopt(sk); struct tcp_authopt_alg_imp *alg; int l3index = 0; + int prefixlen; int err;
sock_owned_by_me(sk); err = check_sysctl_tcp_authopt(); if (err) @@ -586,10 +614,36 @@ int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen) if (opt.flags & TCP_AUTHOPT_KEY_ADDR_BIND) { if (sk->sk_family != opt.addr.ss_family) return -EINVAL; }
+ /* check prefixlen */ + if (opt.flags & TCP_AUTHOPT_KEY_PREFIXLEN) { + prefixlen = opt.prefixlen; + if (sk->sk_family == AF_INET) { + if (prefixlen < 0 || prefixlen > 32) + return -EINVAL; + if (((struct sockaddr_in *)&opt.addr)->sin_addr.s_addr & + ~inet_make_mask(prefixlen)) + return -EINVAL; + } + if (sk->sk_family == AF_INET6) { + if (prefixlen < 0 || prefixlen > 128) + return -EINVAL; + if (!ipv6_addr_is_prefix(&((struct sockaddr_in6 *)&opt.addr)->sin6_addr, + prefixlen)) + return -EINVAL; + } + } else { + if (sk->sk_family == AF_INET) + prefixlen = 32; + else if (sk->sk_family == AF_INET6) + prefixlen = 128; + else + return -EINVAL; + } + /* Initialize tcp_authopt_info if not already set */ info = __tcp_authopt_info_get_or_create(sk); if (IS_ERR(info)) return PTR_ERR(info);
@@ -635,10 +689,11 @@ int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen) key_info->alg = alg; key_info->keylen = opt.keylen; memcpy(key_info->key, opt.key, opt.keylen); memcpy(&key_info->addr, &opt.addr, sizeof(key_info->addr)); key_info->l3index = l3index; + key_info->prefixlen = prefixlen; hlist_add_head_rcu(&key_info->node, &net->head); mutex_unlock(&net->mutex);
return 0; }
These fields are modeled on RFC8177. This allows the kernel to handle key expiration internally instead of relying on userspace changing the NORECV/NOSEND flags on a timer.
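A minimal userspace model of the send-side lifetime check is sketched below, assuming times are expressed as seconds since the Unix epoch (wall-clock time). The flag values are copied from this patch; `struct demo_timed_key` and `demo_key_valid_for_send` are hypothetical names for this example. Each bound is only enforced when its flag is set, so a key without lifetime flags never expires on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values copied from this patch's include/uapi/linux/tcp.h hunk. */
enum {
	DEMO_KEY_NOSEND = (1 << 4),
	DEMO_KEY_SEND_LIFETIME_BEGIN = (1 << 7),
	DEMO_KEY_SEND_LIFETIME_END = (1 << 8),
};

/* Hypothetical stand-in for the send-related part of a key. */
struct demo_timed_key {
	uint32_t flags;
	uint64_t send_lifetime_begin;	/* seconds since the epoch */
	uint64_t send_lifetime_end;	/* seconds since the epoch */
};

/* now: current wall-clock time in seconds since the epoch. */
static bool demo_key_valid_for_send(const struct demo_timed_key *key,
				    uint64_t now)
{
	assert(key != NULL);
	/* The explicit NOSEND flag always wins over lifetimes. */
	if (key->flags & DEMO_KEY_NOSEND)
		return false;
	if ((key->flags & DEMO_KEY_SEND_LIFETIME_BEGIN) &&
	    now < key->send_lifetime_begin)
		return false;
	if ((key->flags & DEMO_KEY_SEND_LIFETIME_END) &&
	    now > key->send_lifetime_end)
		return false;
	return true;
}
```

In userspace the `now` argument could come from `time(NULL)`; the receive-side check would follow the same pattern with the NORECV flag and the recv lifetime fields.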
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 include/net/tcp_authopt.h |  9 +++++++++
 include/uapi/linux/tcp.h  | 21 ++++++++++++++++++++-
 net/ipv4/tcp_authopt.c    | 39 ++++++++++++++++++++++++++++++++++++---
 3 files changed, 65 insertions(+), 4 deletions(-)
diff --git a/include/net/tcp_authopt.h b/include/net/tcp_authopt.h index 6260c3ef6864..6ef893e75ee4 100644 --- a/include/net/tcp_authopt.h +++ b/include/net/tcp_authopt.h @@ -53,10 +53,19 @@ struct tcp_authopt_key_info { int prefixlen; /** @addr: Same as &tcp_authopt_key.addr */ struct sockaddr_storage addr; /** @alg: Algorithm implementation matching alg_id */ struct tcp_authopt_alg_imp *alg; + /** @alg: Algorithm implementation matching alg_id */ + /** @send_lifetime_begin: Beginning of send lifetime */ + u64 send_lifetime_begin; + /** @send_lifetime_end: End of send lifetime */ + u64 send_lifetime_end; + /** @recv_lifetime_begin: Beginning of recv lifetime */ + u64 recv_lifetime_begin; + /** @recv_lifetime_end: End of recv lifetime */ + u64 recv_lifetime_end; };
/** * struct tcp_authopt_info - Per-socket information regarding tcp_authopt * diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h index 274ddfefd6de..52e6293048f5 100644 --- a/include/uapi/linux/tcp.h +++ b/include/uapi/linux/tcp.h @@ -373,20 +373,28 @@ struct tcp_authopt { * @TCP_AUTHOPT_KEY_ADDR_BIND: Key only valid for `tcp_authopt.addr` * @TCP_AUTHOPT_KEY_IFINDEX: Key only valid for `tcp_authopt.ifindex` * @TCP_AUTHOPT_KEY_NOSEND: Key invalid for send (expired) * @TCP_AUTHOPT_KEY_NORECV: Key invalid for recv (expired) * @TCP_AUTHOPT_KEY_PREFIXLEN: Valid value in `tcp_authopt.prefixlen`, otherwise - * match full address length + * always match full address length + * @TCP_AUTHOPT_KEY_SEND_LIFETIME_BEGIN: Valid value in `tcp_authopt.send_lifetime_begin` + * @TCP_AUTHOPT_KEY_SEND_LIFETIME_END: Valid value in `tcp_authopt.send_lifetime_end` + * @TCP_AUTHOPT_KEY_RECV_LIFETIME_BEGIN: Valid value in `tcp_authopt.recv_lifetime_begin` + * @TCP_AUTHOPT_KEY_RECV_LIFETIME_END: Valid value in `tcp_authopt.recv_lifetime_end` */ enum tcp_authopt_key_flag { TCP_AUTHOPT_KEY_DEL = (1 << 0), TCP_AUTHOPT_KEY_EXCLUDE_OPTS = (1 << 1), TCP_AUTHOPT_KEY_ADDR_BIND = (1 << 2), TCP_AUTHOPT_KEY_IFINDEX = (1 << 3), TCP_AUTHOPT_KEY_NOSEND = (1 << 4), TCP_AUTHOPT_KEY_NORECV = (1 << 5), TCP_AUTHOPT_KEY_PREFIXLEN = (1 << 6), + TCP_AUTHOPT_KEY_SEND_LIFETIME_BEGIN = (1 << 7), + TCP_AUTHOPT_KEY_SEND_LIFETIME_END = (1 << 8), + TCP_AUTHOPT_KEY_RECV_LIFETIME_BEGIN = (1 << 9), + TCP_AUTHOPT_KEY_RECV_LIFETIME_END = (1 << 10), };
/** * enum tcp_authopt_alg - Algorithms for TCP Authentication Option */ @@ -408,10 +416,13 @@ enum tcp_authopt_alg { * - recv_id * - addr (iff TCP_AUTHOPT_KEY_ADDR_BIND) * * RFC5925 requires that key ids must not overlap for the same TCP connection. * This is not enforced by linux. + * + * Key validity times are optional. When specified they are interpreted as "wall + * time" and compared to CLOCK_REALTIME. */ struct tcp_authopt_key { /** @flags: Combination of &enum tcp_authopt_key_flag */ __u32 flags; /** @send_id: keyid value for send */ @@ -444,10 +455,18 @@ struct tcp_authopt_key { * * Without the TCP_AUTHOPT_KEY_PREFIXLEN flag this is ignored and a full * address match is performed. */ int prefixlen; + /** @send_lifetime_begin: Beginning of send lifetime */ + __u64 send_lifetime_begin; + /** @send_lifetime_end: End of send lifetime */ + __u64 send_lifetime_end; + /** @recv_lifetime_begin: Beginning of recv lifetime */ + __u64 recv_lifetime_begin; + /** @recv_lifetime_end: End of recv lifetime */ + __u64 recv_lifetime_end; };
/* setsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) */
#define TCP_RECEIVE_ZEROCOPY_FLAG_TLB_CLEAN_HINT 0x1 diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c index daeecb64c89e..2bb7b2356e50 100644 --- a/net/ipv4/tcp_authopt.c +++ b/net/ipv4/tcp_authopt.c @@ -242,10 +242,33 @@ void tcp_authopt_clear(struct sock *sk) if (info) { tcp_authopt_free(sk, info); tcp_sk(sk)->authopt_info = NULL; } } + +static bool key_valid_for_send(struct tcp_authopt_key_info *key, ktime_t now) +{ + if (key->flags & TCP_AUTHOPT_KEY_NOSEND) + return false; + if (key->flags & TCP_AUTHOPT_KEY_SEND_LIFETIME_BEGIN && now < key->send_lifetime_begin) + return false; + if (key->flags & TCP_AUTHOPT_KEY_SEND_LIFETIME_END && now > key->send_lifetime_end) + return false; + return true; +} + +static bool key_valid_for_recv(struct tcp_authopt_key_info *key, ktime_t now) +{ + if (key->flags & TCP_AUTHOPT_KEY_NORECV) + return false; + if (key->flags & TCP_AUTHOPT_KEY_RECV_LIFETIME_BEGIN && now < key->recv_lifetime_begin) + return false; + if (key->flags & TCP_AUTHOPT_KEY_RECV_LIFETIME_END && now > key->recv_lifetime_end) + return false; + return true; +} + /* checks that ipv4 or ipv6 addr matches. */ static bool ipvx_addr_match(struct sockaddr_storage *a1, struct sockaddr_storage *a2) { if (a1->ss_family != a2->ss_family) @@ -384,10 +407,11 @@ static bool better_key_match(struct tcp_authopt_key_info *old, struct tcp_authop static struct tcp_authopt_key_info *tcp_authopt_lookup_send(struct netns_tcp_authopt *net, const struct sock *addr_sk) { struct tcp_authopt_key_info *result = NULL; struct tcp_authopt_key_info *key; + time64_t now = ktime_get_real_seconds(); int l3index = -1;
hlist_for_each_entry_rcu(key, &net->head, node, 0) { if (key->flags & TCP_AUTHOPT_KEY_ADDR_BIND) if (!tcp_authopt_key_match_sk_addr(key, addr_sk)) @@ -397,11 +421,11 @@ static struct tcp_authopt_key_info *tcp_authopt_lookup_send(struct netns_tcp_aut l3index = l3mdev_master_ifindex_by_index(sock_net(addr_sk), addr_sk->sk_bound_dev_if); if (l3index != key->l3index) continue; } - if (key->flags & TCP_AUTHOPT_KEY_NOSEND) + if (!key_valid_for_send(key, now)) continue; if (better_key_match(result, key)) result = key; else if (result) net_warn_ratelimited("ambiguous tcp authentication keys configured for send\n"); @@ -555,11 +579,15 @@ int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *opt) TCP_AUTHOPT_KEY_EXCLUDE_OPTS | \ TCP_AUTHOPT_KEY_ADDR_BIND | \ TCP_AUTHOPT_KEY_IFINDEX | \ TCP_AUTHOPT_KEY_PREFIXLEN | \ TCP_AUTHOPT_KEY_NOSEND | \ - TCP_AUTHOPT_KEY_NORECV) + TCP_AUTHOPT_KEY_NORECV | \ + TCP_AUTHOPT_KEY_SEND_LIFETIME_BEGIN | \ + TCP_AUTHOPT_KEY_SEND_LIFETIME_END | \ + TCP_AUTHOPT_KEY_RECV_LIFETIME_BEGIN | \ + TCP_AUTHOPT_KEY_RECV_LIFETIME_END)
static bool ipv6_addr_is_prefix(struct in6_addr *addr, int plen) { struct in6_addr copy;
@@ -690,10 +718,14 @@ int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen) key_info->keylen = opt.keylen; memcpy(key_info->key, opt.key, opt.keylen); memcpy(&key_info->addr, &opt.addr, sizeof(key_info->addr)); key_info->l3index = l3index; key_info->prefixlen = prefixlen; + key_info->send_lifetime_begin = opt.send_lifetime_begin; + key_info->send_lifetime_end = opt.send_lifetime_end; + key_info->recv_lifetime_begin = opt.recv_lifetime_begin; + key_info->recv_lifetime_end = opt.recv_lifetime_end; hlist_add_head_rcu(&key_info->node, &net->head); mutex_unlock(&net->mutex);
return 0; } @@ -1480,10 +1512,11 @@ static struct tcp_authopt_key_info *tcp_authopt_lookup_recv(struct sock *sk, bool *anykey) { struct tcp_authopt_key_info *result = NULL; struct tcp_authopt_key_info *key; int l3index = -1; + time64_t now = ktime_get_real_seconds();
*anykey = false; /* multiple matches will cause occasional failures */ hlist_for_each_entry_rcu(key, &net->head, node, 0) { if (key->flags & TCP_AUTHOPT_KEY_ADDR_BIND && @@ -1504,11 +1537,11 @@ static struct tcp_authopt_key_info *tcp_authopt_lookup_recv(struct sock *sk, if (l3index != key->l3index) continue; } *anykey = true; // If only keys with norecv flag are present still consider that - if (key->flags & TCP_AUTHOPT_KEY_NORECV) + if (!key_valid_for_recv(key, now)) continue; if (recv_id >= 0 && key->recv_id != recv_id) continue; if (better_key_match(result, key)) result = key;
The RFC requires that TCP can report the keyid and rnextkeyid values being sent or received; implement this via getsockopt values.
The RFC also requires that the user can select the sending key and that the sending key is automatically switched based on rnextkeyid. These requirements can conflict, so we implement both and add a flag which specifies whether the user request or the peer request takes priority.
Also add an option to control rnextkeyid explicitly from userspace.
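The selection policy described above can be modelled in a few lines of userspace C. This is only a sketch of the documented behavior (preferred id first, then longest send validity, then lowest send_id), not the kernel code: `struct model_key` and both `model_*` functions are hypothetical names invented for illustration, and every key in the array is assumed to be already valid for sending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of a key; not the kernel structure. */
struct model_key {
	uint8_t send_id;
	bool has_send_end;	/* models TCP_AUTHOPT_KEY_SEND_LIFETIME_END */
	int64_t send_end;	/* expiry in seconds, only used if has_send_end */
};

/* Return true if cand should replace cur under the documented policy. */
static bool model_better_for_send(const struct model_key *cur,
				  const struct model_key *cand)
{
	assert(cand != NULL);
	if (cur == NULL)
		return true;
	/* A key without an end time never expires, so it wins. */
	if (cur->has_send_end != cand->has_send_end)
		return !cand->has_send_end;
	/* Both expire: prefer the later expiration. */
	if (cand->has_send_end && cand->send_end != cur->send_end)
		return cand->send_end > cur->send_end;
	/* Break remaining ties by lowest numeric send_id. */
	return cand->send_id < cur->send_id;
}

/*
 * Pick a send key. pref_send_id >= 0 models either the user request
 * (lock flag set) or the peer's rnextkeyid; -1 means no preference.
 */
static const struct model_key *model_pick_send_key(const struct model_key *keys,
						   size_t n, int pref_send_id)
{
	const struct model_key *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (pref_send_id >= 0 && keys[i].send_id == pref_send_id)
			return &keys[i];
		if (model_better_for_send(best, &keys[i]))
			best = &keys[i];
	}
	return best;
}
```

A caller would pass the user's send_keyid as pref_send_id when the lock flag is set and the last received rnextkeyid otherwise; if no key carries that id, the fallback ordering applies.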
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 Documentation/networking/tcp_authopt.rst |  32 +++++
 include/net/tcp_authopt.h                |  32 +++++
 include/uapi/linux/tcp.h                 |  31 +++++
 net/ipv4/tcp_authopt.c                   | 167 +++++++++++++++++++++--
 4 files changed, 254 insertions(+), 8 deletions(-)
diff --git a/Documentation/networking/tcp_authopt.rst b/Documentation/networking/tcp_authopt.rst index d0191d0c6c02..5631750cc3f7 100644 --- a/Documentation/networking/tcp_authopt.rst +++ b/Documentation/networking/tcp_authopt.rst @@ -44,10 +44,42 @@ new flags.
RFC5925 requires that key ids do not overlap when tcp identifiers (addr/port) overlap. This is not enforced by linux, configuring ambiguous keys will result in packet drops and lost connections.
+Key selection
+-------------
+
+On getsockopt(TCP_AUTHOPT) information is provided about keyid/rnextkeyid in
+the last sent packet and about the keyid/rnextkeyid in the last valid received
+packet.
+
+By default the sending keyid is selected to match the rnextkeyid value sent by
+the remote side, visible as recv_rnextkeyid in getsockopt. If that keyid is not
+available then the valid key with the longest send validity time is used, and
+otherwise ties are broken by preferring lowest numeric send_id.
+
+If the ``TCP_AUTHOPT_FLAG_LOCK_KEYID`` flag is set then the sending key is
+selected by the `tcp_authopt.send_keyid` field and recv_rnextkeyid is ignored.
+If no key with send_id == send_keyid is valid then the same default is used
+as for missing recv_rnextkeyid.
+
+The rnextkeyid value sent on the wire is the recv_id of the valid key with the
+longest recv validity time, and otherwise ties are broken by preferring lowest
+numeric recv_id.
+
+If the ``TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID`` flag is set in `tcp_authopt.flags`
+the value of `tcp_authopt.send_rnextkeyid` is sent instead.
+
+The default key selection behavior is designed to implement key rollover in a
+way that is compatible with existing vendors without needing userspace key
+management. It also tries to behave predictably in all scenarios, therefore it
+breaks ties by numeric IDs.
+
+A userspace daemon can use the "lock" flags to implement different key
+management and key rotation policies.
+
 ABI Reference
 =============
.. kernel-doc:: include/uapi/linux/tcp.h :identifiers: tcp_authopt tcp_authopt_flag tcp_authopt_key tcp_authopt_key_flag tcp_authopt_alg diff --git a/include/net/tcp_authopt.h b/include/net/tcp_authopt.h index 6ef893e75ee4..759b6d71fe86 100644 --- a/include/net/tcp_authopt.h +++ b/include/net/tcp_authopt.h @@ -83,10 +83,42 @@ struct tcp_authopt_info { u32 dst_isn; /** @rcv_sne: Recv-side Sequence Number Extension tracking tcp_sock.rcv_nxt */ u32 rcv_sne; /** @snd_sne: Send-side Sequence Number Extension tracking tcp_sock.snd_nxt */ u32 snd_sne; + + /** + * @send_keyid: keyid currently being sent + * + * This is controlled by userspace by userspace if + * TCP_AUTHOPT_FLAG_LOCK_KEYID, otherwise we try to match recv_rnextkeyid. + * + * This is the "currently effective" value from the last packet. + */ + u8 send_keyid; + /** + * @user_pref_send_keyid: Preferred keyid requested by userspace + */ + u8 user_pref_send_keyid; + /** + * @send_rnextkeyid: rnextkeyid currently being sent + * + * This is controlled by userspace if TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID is set + */ + u8 send_rnextkeyid; + /** + * @recv_keyid: last keyid received from remote + * + * This is reported to userspace but has no other special behavior attached. + */ + u8 recv_keyid; + /** + * @recv_rnextkeyid: last rnextkeyid received from remote + * + * Linux tries to honor this unless TCP_AUTHOPT_FLAG_LOCK_KEYID is set + */ + u8 recv_rnextkeyid; };
/* TCP authopt as found in header */ struct tcphdr_authopt { u8 num; diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h index 52e6293048f5..4c3b1aef9976 100644 --- a/include/uapi/linux/tcp.h +++ b/include/uapi/linux/tcp.h @@ -346,10 +346,24 @@ struct tcp_diag_md5sig {
/** * enum tcp_authopt_flag - flags for `tcp_authopt.flags` */ enum tcp_authopt_flag { + /** + * @TCP_AUTHOPT_FLAG_LOCK_KEYID: keyid controlled by sockopt + * + * If this is set `tcp_authopt.send_keyid` is used to determined sending + * key. Otherwise a key with send_id == recv_rnextkeyid is preferred. + */ + TCP_AUTHOPT_FLAG_LOCK_KEYID = (1 << 0), + /** + * @TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID: Override rnextkeyid from userspace + * + * If this is set then `tcp_authopt.send_rnextkeyid` is sent on outbound + * packets. Other the recv_id of the current sending key is sent. + */ + TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID = (1 << 1), /** * @TCP_AUTHOPT_FLAG_REJECT_UNEXPECTED: * Configure behavior of segments with TCP-AO coming from hosts for which no * key is configured. The default recommended by RFC is to silently accept * such connections. @@ -361,10 +375,27 @@ enum tcp_authopt_flag { * struct tcp_authopt - Per-socket options related to TCP Authentication Option */ struct tcp_authopt { /** @flags: Combination of &enum tcp_authopt_flag */ __u32 flags; + /** + * @send_keyid: `tcp_authopt_key.send_id` of preferred send key + * + * This is only used if `TCP_AUTHOPT_FLAG_LOCK_KEYID` is set. + */ + __u8 send_keyid; + /** + * @send_rnextkeyid: The rnextkeyid to send in packets + * + * This is controlled by the user iff TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID is + * set. Otherwise rnextkeyid is the recv_id of the current key. + */ + __u8 send_rnextkeyid; + /** @recv_keyid: A recently-received keyid value. Only for getsockopt. */ + __u8 recv_keyid; + /** @recv_rnextkeyid: A recently-received rnextkeyid value. Only for getsockopt. */ + __u8 recv_rnextkeyid; };
/** * enum tcp_authopt_key_flag - flags for `tcp_authopt.flags` * diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c index 2bb7b2356e50..6db06e1edcc7 100644 --- a/net/ipv4/tcp_authopt.c +++ b/net/ipv4/tcp_authopt.c @@ -392,24 +392,85 @@ static bool better_key_match(struct tcp_authopt_key_info *old, struct tcp_authop return true;
return false; }
+static int better_key_match_for_send(struct tcp_authopt_key_info *old, + struct tcp_authopt_key_info *new) +{ + if (better_key_match(old, new)) + return 1; + + /* For keys with expiration dates prefer the one with longest lifetime */ + if ((new->flags & TCP_AUTHOPT_KEY_SEND_LIFETIME_END) != 0 && + (old->flags & TCP_AUTHOPT_KEY_SEND_LIFETIME_END) == 0) + return -1; + if ((new->flags & TCP_AUTHOPT_KEY_SEND_LIFETIME_END) == 0 && + (old->flags & TCP_AUTHOPT_KEY_SEND_LIFETIME_END) != 0) + return 1; + if (old->flags & TCP_AUTHOPT_KEY_SEND_LIFETIME_END && + new->flags & TCP_AUTHOPT_KEY_SEND_LIFETIME_END) { + if (new->send_lifetime_end > old->send_lifetime_end) + return 1; + if (new->send_lifetime_end < old->send_lifetime_end) + return -1; + } + + if (new->send_id != old->send_id) + return !!(old->send_id - new->send_id); + + return 0; +} + +static int better_rnextkey(struct tcp_authopt_key_info *old, struct tcp_authopt_key_info *new) +{ + if (better_key_match(old, new)) + return 1; + + /* For keys with expiration dates prefer the one with longest lifetime */ + if ((new->flags & TCP_AUTHOPT_KEY_RECV_LIFETIME_END) != 0 && + (old->flags & TCP_AUTHOPT_KEY_RECV_LIFETIME_END) == 0) + return -1; + if ((new->flags & TCP_AUTHOPT_KEY_RECV_LIFETIME_END) == 0 && + (old->flags & TCP_AUTHOPT_KEY_RECV_LIFETIME_END) != 0) + return 1; + if (old->flags & TCP_AUTHOPT_KEY_RECV_LIFETIME_END && + new->flags & TCP_AUTHOPT_KEY_RECV_LIFETIME_END) { + if (new->recv_lifetime_end > old->recv_lifetime_end) + return 1; + if (new->recv_lifetime_end < old->recv_lifetime_end) + return -1; + } + + /* Break ties by numeric ID */ + if (new->recv_id != old->recv_id) + return !!(old->recv_id - new->recv_id); + + return 0; +} + /** * tcp_authopt_lookup_send - lookup key for sending * * @net: Per-namespace information containing keys * @addr_sk: Socket used for destination address lookup + * @pref_send_id: Preferred send_id. 
If >= 0 then prefer keys that match + * @rnextkeyid: Output pointer to preferred rnextkeyid + * @anykey: Set to true if any keys are present for the peer * * If anykey is false then authentication is not required for peer. * * If anykey is true but no key was found then all our keys must be expired and sending should fail. */ static struct tcp_authopt_key_info *tcp_authopt_lookup_send(struct netns_tcp_authopt *net, - const struct sock *addr_sk) + const struct sock *addr_sk, + int pref_send_id, + u8 *rnextkeyid, + bool *anykey) { struct tcp_authopt_key_info *result = NULL; + struct tcp_authopt_key_info *rnext_result = NULL; struct tcp_authopt_key_info *key; time64_t now = ktime_get_real_seconds(); int l3index = -1;
hlist_for_each_entry_rcu(key, &net->head, node, 0) { @@ -421,16 +482,35 @@ static struct tcp_authopt_key_info *tcp_authopt_lookup_send(struct netns_tcp_aut l3index = l3mdev_master_ifindex_by_index(sock_net(addr_sk), addr_sk->sk_bound_dev_if); if (l3index != key->l3index) continue; } - if (!key_valid_for_send(key, now)) - continue; - if (better_key_match(result, key)) - result = key; - else if (result) - net_warn_ratelimited("ambiguous tcp authentication keys configured for send\n"); + if (anykey) + *anykey = true; + + if (rnextkeyid && + key_valid_for_recv(key, now) && + better_rnextkey(rnext_result, key) > 0) + rnext_result = key; + + if (key_valid_for_send(key, now)) { + if (pref_send_id >= 0 && result && + key->send_id != pref_send_id && + result->send_id == pref_send_id) + continue; + if (better_key_match_for_send(result, key) > 0) + result = key; + else if (result) + net_warn_ratelimited("ambiguous tcp authentication keys configured for send\n"); + } + } + + if (rnextkeyid) { + if (rnext_result) + *rnextkeyid = rnext_result->recv_id; + else + *rnextkeyid = 0; }
return result; }
@@ -442,19 +522,59 @@ static struct tcp_authopt_key_info *tcp_authopt_lookup_send(struct netns_tcp_aut * @addr_sk: socket used for address lookup. Same as sk except for synack case * @rnextkeyid: value of rnextkeyid caller should write in packet * * Result is protected by RCU and can't be stored, it may only be passed to * tcp_authopt_hash and only under a single rcu_read_lock. + * + * Returns NULL if no key is required or ERR_PTR(-ENOKEY) if key is required but + * none is currently valid. */ struct tcp_authopt_key_info *__tcp_authopt_select_key(const struct sock *sk, struct tcp_authopt_info *info, const struct sock *addr_sk, u8 *rnextkeyid) { + struct tcp_authopt_key_info *key; struct netns_tcp_authopt *net = sock_net_tcp_authopt(sk); + bool anykey = false; + int pref_send_id; + + /* Listen sockets don't refer to any specific connection so we don't try + * to keep using the same key and ignore any received keyids. + */ + if (sk->sk_state == TCP_LISTEN) { + if (info->flags & TCP_AUTHOPT_FLAG_LOCK_KEYID) + pref_send_id = info->user_pref_send_keyid; + else + pref_send_id = -1; + key = tcp_authopt_lookup_send(net, addr_sk, pref_send_id, rnextkeyid, &anykey); + + return key; + } + + /* Try to keep the same sending key unless user or peer requires a different key + * User request (via TCP_AUTHOPT_FLAG_LOCK_KEYID) always overrides peer request. + */ + if (info->flags & TCP_AUTHOPT_FLAG_LOCK_KEYID) + pref_send_id = info->user_pref_send_keyid; + else + pref_send_id = info->recv_rnextkeyid;
- return tcp_authopt_lookup_send(net, addr_sk); + key = tcp_authopt_lookup_send(net, addr_sk, pref_send_id, rnextkeyid, &anykey); + + if (!key) + return NULL; + + info->send_keyid = key->send_id; + if (rnextkeyid) { + if (info->flags & TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID) + *rnextkeyid = info->send_rnextkeyid; + else + info->send_rnextkeyid = *rnextkeyid; + } + + return key; } EXPORT_SYMBOL(__tcp_authopt_select_key);
static struct tcp_authopt_info *__tcp_authopt_info_get_or_create(struct sock *sk) { @@ -476,10 +596,12 @@ static struct tcp_authopt_info *__tcp_authopt_info_get_or_create(struct sock *sk
return info; }
#define TCP_AUTHOPT_KNOWN_FLAGS ( \ + TCP_AUTHOPT_FLAG_LOCK_KEYID | \ + TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID | \ TCP_AUTHOPT_FLAG_REJECT_UNEXPECTED)
/* Like copy_from_sockptr except tolerate different optlen for compatibility reasons * * If the src is shorter then it's from an old userspace and the rest of dst is @@ -547,10 +669,14 @@ int tcp_set_authopt(struct sock *sk, sockptr_t optval, unsigned int optlen) info = __tcp_authopt_info_get_or_create(sk); if (IS_ERR(info)) return PTR_ERR(info);
info->flags = opt.flags & TCP_AUTHOPT_KNOWN_FLAGS; + if (opt.flags & TCP_AUTHOPT_FLAG_LOCK_KEYID) + info->user_pref_send_keyid = opt.send_keyid; + if (opt.flags & TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID) + info->send_rnextkeyid = opt.send_rnextkeyid;
return 0; }
int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *opt) @@ -568,10 +694,18 @@ int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *opt) info = rcu_dereference_check(tp->authopt_info, lockdep_sock_is_held(sk)); if (!info) return -ENOENT;
opt->flags = info->flags & TCP_AUTHOPT_KNOWN_FLAGS; + /* These keyids might be undefined, for example before connect. + * Reporting zero is not strictly correct because there are no reserved + * values. + */ + opt->send_keyid = info->send_keyid; + opt->send_rnextkeyid = info->send_rnextkeyid; + opt->recv_keyid = info->recv_keyid; + opt->recv_rnextkeyid = info->recv_rnextkeyid;
return 0; }
#define TCP_AUTHOPT_KEY_KNOWN_FLAGS ( \ @@ -1571,10 +1705,25 @@ static void print_tcpao_notice(const char *msg, struct sk_buff *skb) } else { WARN_ONCE(1, "%s unknown IP version\n", msg); } }
+static void save_inbound_key_info( + struct tcp_authopt_info *info, + struct tcphdr_authopt *opt) +{ + /* Doing this for all valid packets will results in keyids temporarily + * flipping back and forth if packets are reordered or retransmitted + * but keys should eventually stabilize. + * + * This is connection-specific so don't store for listen sockets. + * + */ + info->recv_keyid = opt->keyid; + info->recv_rnextkeyid = opt->rnextkeyid; +} + /** * __tcp_authopt_inbound_check - Check inbound TCP authentication option * * @sk: Receive socket. For the SYN_RECV state this must be the request_sock, not the listener * @skb: Input Packet @@ -1617,10 +1766,11 @@ int __tcp_authopt_inbound_check(struct sock *sk, struct sk_buff *skb, NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPAUTHOPTFAILURE); print_tcpao_notice("TCP Authentication Unexpected: Rejected", skb); return -SKB_DROP_REASON_TCP_AOUNEXPECTED; } print_tcpao_notice("TCP Authentication Unexpected: Accepted", skb); + save_inbound_key_info(info, opt); return 0; } if (opt && !key) { /* Keys are configured for peer but with different keyid than packet */ NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPAUTHOPTFAILURE); @@ -1640,10 +1790,11 @@ int __tcp_authopt_inbound_check(struct sock *sk, struct sk_buff *skb, NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPAUTHOPTFAILURE); print_tcpao_notice("TCP Authentication Failed", skb); return -SKB_DROP_REASON_TCP_AOFAILURE; }
+ save_inbound_key_info(info, opt); return 1; } EXPORT_SYMBOL(__tcp_authopt_inbound_check);
static int tcp_authopt_init_net(struct net *full_net)
Keys that are added with v4mapped ipv6 addresses will now be used for ipv4 packets. This outward behavior is similar to how MD5 support currently works.
The implementation is different - v4mapped keys are still stored with ipv6 addresses.
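As an illustration of the matching idea, here is a userspace sketch that checks an IPv4 peer against a key stored as a v4mapped IPv6 address. It is not the kernel implementation: the `model_*` names are hypothetical, and the sketch assumes the prefix length counts IPv4 bits (0..32), which may not be how the series interprets `prefixlen` for such keys.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: does an IPv4 peer match a key whose address is
 * stored in v4mapped form (::ffff:a.b.c.d)?
 * Assumption: v4_prefixlen counts IPv4 bits, 0..32.
 */
static bool model_v4mapped_key_matches(const struct in6_addr *key_addr,
				       unsigned int v4_prefixlen,
				       struct in_addr peer)
{
	uint32_t key_v4, mask;

	assert(v4_prefixlen <= 32);
	if (!IN6_IS_ADDR_V4MAPPED(key_addr))
		return false;
	/* The last four bytes of a v4mapped address hold the IPv4 address. */
	memcpy(&key_v4, &key_addr->s6_addr[12], sizeof(key_v4));
	/* Build a network-order mask; avoid the undefined shift by 32. */
	mask = v4_prefixlen ?
		htonl((uint32_t)(UINT32_MAX << (32 - v4_prefixlen))) : 0;
	/* Compare the peer against the key, both under the mask. */
	return (peer.s_addr & mask) == (key_v4 & mask);
}

/* Convenience wrapper taking textual addresses. */
static bool model_match_str(const char *key6, unsigned int plen,
			    const char *peer4)
{
	struct in6_addr k;
	struct in_addr p;

	if (inet_pton(AF_INET6, key6, &k) != 1 ||
	    inet_pton(AF_INET, peer4, &p) != 1)
		return false;
	return model_v4mapped_key_matches(&k, plen, p);
}
```

For example, a key configured as ::ffff:10.10.2.2 with a 31-bit prefix would accept peers 10.10.2.2 and 10.10.2.3 in this model, while a plain IPv6 key address never matches an IPv4 peer.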
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 net/ipv4/tcp_authopt.c | 35 +++++++++++++++++++++++++----------
 1 file changed, 25 insertions(+), 10 deletions(-)
diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c index 6db06e1edcc7..28c10a916fb3 100644 --- a/net/ipv4/tcp_authopt.c +++ b/net/ipv4/tcp_authopt.c @@ -324,27 +324,30 @@ static bool tcp_authopt_key_match_skb_addr(struct tcp_authopt_key_info *key, struct sockaddr_in6 *key_addr = (struct sockaddr_in6 *)&key->addr;
return ipv6_prefix_equal(&ip6h->saddr, &key_addr->sin6_addr, key->prefixlen); + } else if (keyaf == AF_INET6 && iph->version == 4) { + struct sockaddr_in6 *key_addr = (struct sockaddr_in6 *)&key->addr; + + /* handle ipv6-mapped-ipv4-addresses */ + if (ipv6_addr_v4mapped(&key_addr->sin6_addr)) { + __be32 mask = inet_make_mask(key->prefixlen); + __be32 ipv4 = key_addr->sin6_addr.s6_addr32[3]; + + return (ipv4 & mask) == ipv4; + } }
- /* This actually happens with ipv6-mapped-ipv4-addresses - * IPv6 listen sockets will be asked to validate ipv4 packets. - */ return false; }
static bool tcp_authopt_key_match_sk_addr(struct tcp_authopt_key_info *key, const struct sock *addr_sk) { u16 keyaf = key->addr.ss_family;
- /* This probably can't happen even with ipv4-mapped-ipv6 */ - if (keyaf != addr_sk->sk_family) - return false; - if (keyaf == AF_INET) { struct sockaddr_in *key_addr = (struct sockaddr_in *)&key->addr; __be32 mask = inet_make_mask(key->prefixlen);
return (addr_sk->sk_daddr & mask) == key_addr->sin_addr.s_addr; @@ -353,10 +356,16 @@ static bool tcp_authopt_key_match_sk_addr(struct tcp_authopt_key_info *key, struct sockaddr_in6 *key_addr = (struct sockaddr_in6 *)&key->addr;
return ipv6_prefix_equal(&addr_sk->sk_v6_daddr, &key_addr->sin6_addr, key->prefixlen); + } else if (keyaf == AF_INET6 && addr_sk->sk_family == AF_INET) { + struct sockaddr_in6 *key_addr = (struct sockaddr_in6 *)&key->addr; + __be32 mask = inet_make_mask(key->prefixlen); + __be32 ipv4 = key_addr->sin6_addr.s6_addr32[3]; + + return (addr_sk->sk_daddr & mask) == ipv4; #endif }
return false; } @@ -1475,14 +1484,20 @@ static int __tcp_authopt_calc_mac(struct sock *sk, char *macbuf) { struct tcp_authopt_alg_pool *mac_pool; u8 traffic_key[TCP_AUTHOPT_MAX_TRAFFIC_KEY_LEN]; int err; - bool ipv6 = (sk->sk_family != AF_INET); + bool ipv6;
- if (sk->sk_family != AF_INET && sk->sk_family != AF_INET6) - return -EINVAL; +#if IS_ENABLED(CONFIG_IPV6) + if (input) + ipv6 = (skb->protocol == htons(ETH_P_IPV6)); + else + ipv6 = (sk->sk_family == AF_INET6) && !ipv6_addr_v4mapped(&sk->sk_v6_daddr); +#else + ipv6 = false; +#endif
err = tcp_authopt_get_traffic_key(sk, skb, key, info, input, ipv6, traffic_key); if (err) return err;
This provides a very brief summary of all keys for debugging purposes.
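For tools that want to consume the table, a row can be parsed with a single sscanf call. The sketch below assumes the six tab-separated columns documented by this patch (flags, send_id, recv_id, alg, addr, l3index); `struct model_proc_row` and `model_parse_proc_row` are hypothetical names, and trailing columns added in the future are simply ignored.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one table row. */
struct model_proc_row {
	unsigned int flags;
	int send_id;
	int recv_id;
	int alg;
	char addr[64];	/* "*" or an address with optional /prefixlen */
	int l3index;
};

/*
 * Parse one data row. Returns 0 on success and -1 if the line does not
 * start with the six expected columns (the header line fails this way).
 */
static int model_parse_proc_row(const char *line, struct model_proc_row *row)
{
	int n;

	assert(line != NULL && row != NULL);
	/* %x accepts the leading "0x"; %63[^\t] reads up to the next tab. */
	n = sscanf(line, "%x\t%d\t%d\t%d\t%63[^\t]\t%d",
		   &row->flags, &row->send_id, &row->recv_id, &row->alg,
		   row->addr, &row->l3index);
	return n == 6 ? 0 : -1;
}
```

A reader would typically fgets() each line of the file and skip any line for which the parser returns -1.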
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 Documentation/networking/tcp_authopt.rst |  10 +++
 net/ipv4/tcp_authopt.c                   | 102 ++++++++++++++++++++++-
 2 files changed, 111 insertions(+), 1 deletion(-)
diff --git a/Documentation/networking/tcp_authopt.rst b/Documentation/networking/tcp_authopt.rst index 5631750cc3f7..2bceefe6fe1d 100644 --- a/Documentation/networking/tcp_authopt.rst +++ b/Documentation/networking/tcp_authopt.rst @@ -76,10 +76,20 @@ management. It also tries to behave predictably in all scenarios therefore it breaks ties by numeric IDs.
A userspace daemon can use the "lock" flags to implement different key management and key rotation policies.
+Proc interface +-------------- + +The ``/proc/net/tcp_authopt`` file contains a tab-separated table of keys. The +first line contains column names. The number of columns might increase in the +future if more matching criteria are added. Here is an example of the table:: + + flags send_id recv_id alg addr l3index + 0x44 0 0 1 10.10.2.2/31 0 + ABI Reference =============
.. kernel-doc:: include/uapi/linux/tcp.h :identifiers: tcp_authopt tcp_authopt_flag tcp_authopt_key tcp_authopt_key_flag tcp_authopt_alg diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c index 28c10a916fb3..ba16b8c50565 100644 --- a/net/ipv4/tcp_authopt.c +++ b/net/ipv4/tcp_authopt.c @@ -5,10 +5,11 @@ #include <net/ipv6.h> #include <net/tcp.h> #include <linux/kref.h> #include <crypto/hash.h> #include <linux/inetdevice.h> +#include <linux/proc_fs.h>
/* This is mainly intended to protect against local privilege escalations through * a rarely used feature so it is deliberately not namespaced. */ int sysctl_tcp_authopt; @@ -1810,26 +1811,125 @@ int __tcp_authopt_inbound_check(struct sock *sk, struct sk_buff *skb, save_inbound_key_info(info, opt); return 1; } EXPORT_SYMBOL(__tcp_authopt_inbound_check);
+#ifdef CONFIG_PROC_FS +struct tcp_authopt_iter_state { + struct seq_net_private p; +}; + +static struct tcp_authopt_key_info *tcp_authopt_get_key_index(struct netns_tcp_authopt *net, + int index) +{ + struct tcp_authopt_key_info *key; + + hlist_for_each_entry(key, &net->head, node) { + if (--index < 0) + return key; + } + + return NULL; +} + +static void *tcp_authopt_seq_start(struct seq_file *seq, loff_t *pos) + __acquires(RCU) +{ + struct netns_tcp_authopt *net = &seq_file_net(seq)->tcp_authopt; + + rcu_read_lock(); + if (*pos == 0) + return SEQ_START_TOKEN; + else + return tcp_authopt_get_key_index(net, *pos - 1); +} + +static void tcp_authopt_seq_stop(struct seq_file *seq, void *v) + __releases(RCU) +{ + rcu_read_unlock(); +} + +static void *tcp_authopt_seq_next(struct seq_file *seq, void *v, loff_t *pos) +{ + struct netns_tcp_authopt *net = &seq_file_net(seq)->tcp_authopt; + void *ret; + + ret = tcp_authopt_get_key_index(net, *pos); + ++*pos; + + return ret; +} + +static int tcp_authopt_seq_show(struct seq_file *seq, void *v) +{ + struct tcp_authopt_key_info *key = v; + + /* FIXME: Document somewhere */ + /* Key is deliberately inaccessible */ + if (v == SEQ_START_TOKEN) { + seq_puts(seq, "flags\tsend_id\trecv_id\talg\taddr\tl3index\n"); + return 0; + } + + seq_printf(seq, "0x%x\t%d\t%d\t%d", + key->flags, key->send_id, key->recv_id, (int)key->alg_id); + if (key->flags & TCP_AUTHOPT_KEY_ADDR_BIND) { + if (key->addr.ss_family == AF_INET6) + seq_printf(seq, "\t%pI6", &((struct sockaddr_in6 *)&key->addr)->sin6_addr); + else + seq_printf(seq, "\t%pI4", &((struct sockaddr_in *)&key->addr)->sin_addr); + if (key->flags & TCP_AUTHOPT_KEY_PREFIXLEN) + seq_printf(seq, "/%d", key->prefixlen); + } else { + seq_puts(seq, "\t*"); + } + seq_printf(seq, "\t%d", key->l3index); + seq_puts(seq, "\n"); + + return 0; +} + +static const struct seq_operations tcp_authopt_seq_ops = { + .start = tcp_authopt_seq_start, + .next = tcp_authopt_seq_next, + .stop = tcp_authopt_seq_stop, + 
.show = tcp_authopt_seq_show, +}; +#endif /* CONFIG_PROC_FS */ + +static int __net_init tcp_authopt_proc_init_net(struct net *net) +{ + if (!proc_create_net("tcp_authopt", 0400, net->proc_net, + &tcp_authopt_seq_ops, + sizeof(struct tcp_authopt_iter_state))) + return -ENOMEM; + return 0; +} + +static void __net_exit tcp_authopt_proc_exit_net(struct net *net) +{ + remove_proc_entry("tcp_authopt", net->proc_net); +} + static int tcp_authopt_init_net(struct net *full_net) { struct netns_tcp_authopt *net = &full_net->tcp_authopt;
mutex_init(&net->mutex); INIT_HLIST_HEAD(&net->head);
- return 0; + return tcp_authopt_proc_init_net(full_net); }
static void tcp_authopt_exit_net(struct net *full_net) { struct netns_tcp_authopt *net = &full_net->tcp_authopt; struct tcp_authopt_key_info *key; struct hlist_node *n;
+ tcp_authopt_proc_exit_net(full_net); mutex_lock(&net->mutex);
hlist_for_each_entry_safe(key, n, &net->head, node) { hlist_del_rcu(&key->node); tcp_authopt_key_put(key);
If this is not treated specially then, when all keys are removed or expired, TCP will start sending unsigned packets, which is undesirable. Instead try to report an error on key selection and propagate it to userspace.
The error is assigned to sk_err and propagated as soon as possible. In theory we could try to make the error "soft" and even let the connection continue if userspace adds a new key but the advantages are unclear.
Since userspace is responsible for managing keys it can also avoid sending unsigned packets by always closing the socket before removing the last active key.
The specific error reported is ENOKEY.
This requires changes inside the TCP option write code to support aborting the actual packet send; until this point this did not happen in any scenario.
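The three possible outcomes of key selection (no key required, key found, key required but none valid) can be sketched in userspace as follows. The `model_*` helpers are simplified stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() and are assumptions of this sketch, as is the `model_transmit_decision` function; ENOKEY comes from the Linux errno.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel ERR_PTR helpers. */
#define MODEL_MAX_ERRNO 4095

static inline void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static inline bool model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

struct model_key {
	int send_id;
};

/* NULL: no key required. Error pointer: keys exist but none is valid. */
static struct model_key *model_select_key(struct model_key *valid_key,
					  bool anykey)
{
	if (!valid_key && anykey)
		return model_err_ptr(-ENOKEY);
	return valid_key;
}

enum model_action { MODEL_SEND_PLAIN, MODEL_SEND_SIGNED, MODEL_ABORT };

/* Decide what the transmit path does; on abort store the error. */
static enum model_action model_transmit_decision(struct model_key *valid_key,
						 bool anykey, int *sk_err)
{
	struct model_key *key = model_select_key(valid_key, anykey);

	assert(sk_err != NULL);
	if (model_is_err(key)) {
		*sk_err = (int)-model_ptr_err(key);
		return MODEL_ABORT;
	}
	return key ? MODEL_SEND_SIGNED : MODEL_SEND_PLAIN;
}
```

The point of the tri-state return is that "no key" must not be conflated with "no valid key": only the former may result in an unsigned packet.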
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 net/ipv4/tcp_authopt.c |  4 ++++
 net/ipv4/tcp_output.c  | 20 ++++++++++++++++++++
 2 files changed, 24 insertions(+)
diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c index ba16b8c50565..2a1ddae69b27 100644 --- a/net/ipv4/tcp_authopt.c +++ b/net/ipv4/tcp_authopt.c @@ -555,10 +555,12 @@ struct tcp_authopt_key_info *__tcp_authopt_select_key(const struct sock *sk, if (info->flags & TCP_AUTHOPT_FLAG_LOCK_KEYID) pref_send_id = info->user_pref_send_keyid; else pref_send_id = -1; key = tcp_authopt_lookup_send(net, addr_sk, pref_send_id, rnextkeyid, &anykey); + if (!key && anykey) + return ERR_PTR(-ENOKEY);
return key; }
/* Try to keep the same sending key unless user or peer requires a different key @@ -569,10 +571,12 @@ struct tcp_authopt_key_info *__tcp_authopt_select_key(const struct sock *sk, else pref_send_id = info->recv_rnextkeyid;
key = tcp_authopt_lookup_send(net, addr_sk, pref_send_id, rnextkeyid, &anykey);
+ if (!key && anykey) + return ERR_PTR(-ENOKEY); if (!key) return NULL;
info->send_keyid = key->send_id; if (rnextkeyid) { diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index d48d5dc36916..e8a6fec22fbf 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -411,10 +411,11 @@ static inline bool tcp_urg_mode(const struct tcp_sock *tp) #define OPTION_SACK_ADVERTISE BIT(0) #define OPTION_TS BIT(1) #define OPTION_MD5 BIT(2) #define OPTION_WSCALE BIT(3) #define OPTION_AUTHOPT BIT(4) +#define OPTION_AUTHOPT_FAIL BIT(5) #define OPTION_FAST_OPEN_COOKIE BIT(8) #define OPTION_SMC BIT(9) #define OPTION_MPTCP BIT(10)
static void smc_options_write(__be32 *ptr, u16 *options) @@ -783,10 +784,14 @@ static int tcp_authopt_init_options(const struct sock *sk, { #ifdef CONFIG_TCP_AUTHOPT struct tcp_authopt_key_info *key;
key = tcp_authopt_select_key(sk, addr_sk, &opts->authopt_info, &opts->authopt_rnextkeyid); + if (IS_ERR(key)) { + opts->options |= OPTION_AUTHOPT_FAIL; + return TCPOLEN_AUTHOPT_OUTPUT; + } if (key) { opts->options |= OPTION_AUTHOPT; opts->authopt_key = key; return TCPOLEN_AUTHOPT_OUTPUT; } @@ -1342,10 +1347,18 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, * release the following packet. */ if (tcp_skb_pcount(skb) > 1) tcb->tcp_flags |= TCPHDR_PSH; } +#ifdef CONFIG_TCP_AUTHOPT + if (opts.options & OPTION_AUTHOPT_FAIL) { + rcu_read_unlock(); + sk->sk_err = ENOKEY; + sk_error_report(sk); + return -ENOKEY; + } +#endif tcp_header_size = tcp_options_size + sizeof(struct tcphdr);
/* if no packet is in qdisc/device queue, then allow XPS to select * another queue. We can be called from tcp_tsq_handler() * which holds one reference to sk. @@ -3652,10 +3665,17 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst, /* bpf program will be interested in the tcp_flags */ TCP_SKB_CB(skb)->tcp_flags = TCPHDR_SYN | TCPHDR_ACK; tcp_header_size = tcp_synack_options(sk, req, mss, skb, &opts, md5, foc, synack_type, syn_skb) + sizeof(*th); +#ifdef CONFIG_TCP_AUTHOPT + if (opts.options & OPTION_AUTHOPT_FAIL) { + rcu_read_unlock(); + kfree_skb(skb); + return NULL; + } +#endif
skb_push(skb, tcp_header_size); skb_reset_transport_header(skb);
th = (struct tcphdr *)skb->data;
According to the RFC we should use the key that the peer suggests via rnextkeyid.
This is currently done by storing recv_rnextkeyid in tcp_authopt_info but this does not work for the SYNACK case because the tcp_request_sock does not hold an info pointer for reasons of memory usage.
Handle this by storing recv_rnextkeyid inside tcp_request_sock. This doesn't increase the memory usage because there are unused bytes at the end.
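To show where rnextkeyid sits on the wire, here is a userspace sketch that walks a TCP options buffer and extracts keyid and rnextkeyid from a TCP-AO option. It follows the RFC 5925 layout (kind 29, length, KeyID, RNextKeyID, MAC); the `model_*` and `MODEL_*` names are hypothetical and the function is not the kernel parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_TCPOPT_EOL 0
#define MODEL_TCPOPT_NOP 1
#define MODEL_TCPOPT_AO  29	/* TCP-AO option kind from RFC 5925 */

/*
 * Walk the options area of a TCP header and report the ids from the
 * first TCP-AO option. Returns false if no well-formed option is found.
 */
static bool model_find_ao_ids(const uint8_t *opts, size_t len,
			      uint8_t *keyid, uint8_t *rnextkeyid)
{
	size_t i = 0;

	assert(opts != NULL && keyid != NULL && rnextkeyid != NULL);
	while (i < len) {
		uint8_t kind = opts[i];
		uint8_t opsize;

		if (kind == MODEL_TCPOPT_EOL)
			return false;
		if (kind == MODEL_TCPOPT_NOP) {
			i++;	/* single-byte padding option */
			continue;
		}
		if (i + 1 >= len)
			return false;	/* truncated before the length byte */
		opsize = opts[i + 1];
		if (opsize < 2 || i + opsize > len)
			return false;	/* malformed or truncated option */
		if (kind == MODEL_TCPOPT_AO) {
			if (opsize < 4)
				return false;	/* too short for both ids */
			*keyid = opts[i + 2];
			*rnextkeyid = opts[i + 3];
			return true;
		}
		i += opsize;
	}
	return false;
}
```

In the patch the kernel parser has already consumed the kind and length bytes, which is why it reads rnextkeyid as ptr[1] while this sketch reads it at offset 3 from the start of the option.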
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 include/linux/tcp.h    |  6 ++++++
 net/ipv4/tcp_authopt.c | 14 +++++++++++---
 net/ipv4/tcp_input.c   | 12 ++++++++++++
 3 files changed, 29 insertions(+), 3 deletions(-)
diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 551942883f06..6a4ff0ed55c6 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -125,10 +125,13 @@ struct tcp_options_received { u8 saw_unknown:1, /* Received unknown option */ unused:7; u8 num_sacks; /* Number of SACK blocks */ u16 user_mss; /* mss requested by user in ioctl */ u16 mss_clamp; /* Maximal mss, negotiated at connection setup */ +#if IS_ENABLED(CONFIG_TCP_AUTHOPT) + u8 rnextkeyid; +#endif };
static inline void tcp_clear_options(struct tcp_options_received *rx_opt) { rx_opt->tstamp_ok = rx_opt->sack_ok = 0; @@ -163,10 +166,13 @@ struct tcp_request_sock { u32 rcv_nxt; /* the ack # by SYNACK. For * FastOpen it's the seq# * after data-in-SYN. */ u8 syn_tos; +#if IS_ENABLED(CONFIG_TCP_AUTHOPT) + u8 recv_rnextkeyid; +#endif };
static inline struct tcp_request_sock *tcp_rsk(const struct request_sock *req) { return (struct tcp_request_sock *)req; diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c index 2a1ddae69b27..a141439d9ebe 100644 --- a/net/ipv4/tcp_authopt.c +++ b/net/ipv4/tcp_authopt.c @@ -547,21 +547,29 @@ struct tcp_authopt_key_info *__tcp_authopt_select_key(const struct sock *sk, struct netns_tcp_authopt *net = sock_net_tcp_authopt(sk); bool anykey = false; int pref_send_id;
/* Listen sockets don't refer to any specific connection so we don't try - * to keep using the same key and ignore any received keyids. + * to keep using the same key. + * The rnextkeyid is stored in tcp_request_sock */ if (sk->sk_state == TCP_LISTEN) { + struct tcp_request_sock *rsk; + + if (WARN_ONCE(addr_sk->sk_state != TCP_NEW_SYN_RECV, "bad socket state")) + return NULL; + rsk = tcp_rsk((struct request_sock *)addr_sk); + /* Forcing a specific send_keyid on a listen socket forces it for + * all clients so is unlikely to be useful. + */ if (info->flags & TCP_AUTHOPT_FLAG_LOCK_KEYID) pref_send_id = info->user_pref_send_keyid; else - pref_send_id = -1; + pref_send_id = rsk->recv_rnextkeyid; key = tcp_authopt_lookup_send(net, addr_sk, pref_send_id, rnextkeyid, &anykey); if (!key && anykey) return ERR_PTR(-ENOKEY); - return key; }
/* Try to keep the same sending key unless user or peer requires a different key * User request (via TCP_AUTHOPT_FLAG_LOCK_KEYID) always overrides peer request. diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 4da39c32b934..6f477b110896 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4108,10 +4108,18 @@ void tcp_parse_options(const struct net *net, /* * The MD5 Hash has already been * checked (see tcp_v{4,6}_do_rcv()). */ break; +#endif +#ifdef CONFIG_TCP_AUTHOPT + case TCPOPT_AUTHOPT: + /* Hash has already been checked. + * We parse rnextkeyid here so we can match it on synack + */ + opt_rx->rnextkeyid = ptr[1]; + break; #endif case TCPOPT_FASTOPEN: tcp_parse_fastopen_option( opsize - TCPOLEN_FASTOPEN_BASE, ptr, th->syn, foc, false); @@ -6964,10 +6972,14 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops, tcp_clear_options(&tmp_opt);
if (IS_ENABLED(CONFIG_SMC) && want_cookie) tmp_opt.smc_ok = 0;
+#if IS_ENABLED(CONFIG_TCP_AUTHOPT) + tcp_rsk(req)->recv_rnextkeyid = tmp_opt.rnextkeyid; +#endif + tmp_opt.tstamp_ok = tmp_opt.saw_tstamp; tcp_openreq_init(req, &tmp_opt, skb, sk); inet_rsk(req)->no_srccheck = inet_sk(sk)->transparent;
/* Note: tcp_v6_init_req() might override ir_iif for link locals */
This can be used to determine whether the TCP Authentication Option is actually active on the current connection.
TCP Authentication can be enabled but inactive on a socket if keys are only configured for destinations other than the peer.
A listen socket with authentication enabled will return other sockets with authentication enabled on accept(). If no key is configured for the peer of an accepted socket then authentication will be inactive.
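The split between user-controlled flags and the read-only ACTIVE flag can be sketched as two pure functions. This is a minimal illustration and not the kernel code: the sketch_* names are hypothetical, the bit positions of the two LOCK_* flags are assumed, and only the REJECT_UNEXPECTED and ACTIVE values are taken from the uapi change in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* REJECT_UNEXPECTED and ACTIVE match the uapi diff in this patch.
 * The two LOCK_* bit positions are ASSUMED for illustration only.
 */
enum {
	SKETCH_FLAG_LOCK_KEYID        = 1 << 0, /* assumed value */
	SKETCH_FLAG_LOCK_RNEXTKEYID   = 1 << 1, /* assumed value */
	SKETCH_FLAG_REJECT_UNEXPECTED = 1 << 2,
	SKETCH_FLAG_ACTIVE            = 1 << 3,
};

/* Flags that userspace is allowed to set. ACTIVE is deliberately excluded. */
#define SKETCH_USER_FLAGS (SKETCH_FLAG_LOCK_KEYID | \
			   SKETCH_FLAG_LOCK_RNEXTKEYID | \
			   SKETCH_FLAG_REJECT_UNEXPECTED)

/* "set" path: keep only user-controlled bits, so a caller passing
 * ACTIVE cannot turn it on.
 */
static uint32_t sketch_store_flags(uint32_t requested)
{
	return requested & SKETCH_USER_FLAGS;
}

/* "get" path: report the stored user flags and add ACTIVE only when
 * some key is configured for the peer of this socket.
 */
static uint32_t sketch_report_flags(uint32_t stored, bool peer_has_key)
{
	uint32_t out = stored & SKETCH_USER_FLAGS;

	if (peer_has_key)
		out |= SKETCH_FLAG_ACTIVE;
	return out;
}
```

With this shape, an application would read the flags back after connect() or accept() and treat a missing ACTIVE bit as "no key matched this peer".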
Signed-off-by: Leonard Crestez cdleonard@gmail.com --- include/uapi/linux/tcp.h | 13 +++++++++++++ net/ipv4/tcp_authopt.c | 22 +++++++++++++++++++--- 2 files changed, 32 insertions(+), 3 deletions(-)
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h index 4c3b1aef9976..ff8b53f4209d 100644 --- a/include/uapi/linux/tcp.h +++ b/include/uapi/linux/tcp.h @@ -367,10 +367,23 @@ enum tcp_authopt_flag { * Configure behavior of segments with TCP-AO coming from hosts for which no * key is configured. The default recommended by RFC is to silently accept * such connections. */ TCP_AUTHOPT_FLAG_REJECT_UNEXPECTED = (1 << 2), + /** + * @TCP_AUTHOPT_FLAG_ACTIVE: If authentication is active for a specific socket. + * + * TCP Authentication can be enabled but inactive on a socket if keys are + * only configured for destinations other than the peer. + * + * A listen socket with authentication enabled will return other sockets + * with authentication enabled on accept(). If no key is configured for the + * peer of an accepted socket then authentication will be inactive. + * + * This flag is readonly and the value is determined at connection establishment time. + */ + TCP_AUTHOPT_FLAG_ACTIVE = (1 << 3), };
/** * struct tcp_authopt - Per-socket options related to TCP Authentication Option */ diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c index a141439d9ebe..b4158b430b79 100644 --- a/net/ipv4/tcp_authopt.c +++ b/net/ipv4/tcp_authopt.c @@ -617,15 +617,23 @@ static struct tcp_authopt_info *__tcp_authopt_info_get_or_create(struct sock *sk rcu_assign_pointer(tp->authopt_info, info);
return info; }
-#define TCP_AUTHOPT_KNOWN_FLAGS ( \ +/* Flags fully controlled by user: */ +#define TCP_AUTHOPT_USER_FLAGS ( \ TCP_AUTHOPT_FLAG_LOCK_KEYID | \ TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID | \ TCP_AUTHOPT_FLAG_REJECT_UNEXPECTED)
+/* All known flags */ +#define TCP_AUTHOPT_KNOWN_FLAGS ( \ + TCP_AUTHOPT_FLAG_LOCK_KEYID | \ + TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID | \ + TCP_AUTHOPT_FLAG_REJECT_UNEXPECTED | \ + TCP_AUTHOPT_FLAG_ACTIVE) + /* Like copy_from_sockptr except tolerate different optlen for compatibility reasons * * If the src is shorter then it's from an old userspace and the rest of dst is * filled with zeros. * @@ -690,11 +698,11 @@ int tcp_set_authopt(struct sock *sk, sockptr_t optval, unsigned int optlen)
info = __tcp_authopt_info_get_or_create(sk); if (IS_ERR(info)) return PTR_ERR(info);
- info->flags = opt.flags & TCP_AUTHOPT_KNOWN_FLAGS; + info->flags = opt.flags & TCP_AUTHOPT_USER_FLAGS; if (opt.flags & TCP_AUTHOPT_FLAG_LOCK_KEYID) info->user_pref_send_keyid = opt.send_keyid; if (opt.flags & TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID) info->send_rnextkeyid = opt.send_rnextkeyid;
@@ -703,10 +711,11 @@ int tcp_set_authopt(struct sock *sk, sockptr_t optval, unsigned int optlen)
int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *opt) { struct tcp_sock *tp = tcp_sk(sk); struct tcp_authopt_info *info; + bool anykey = false; int err;
memset(opt, 0, sizeof(*opt)); sock_owned_by_me(sk); err = check_sysctl_tcp_authopt(); @@ -715,11 +724,18 @@ int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *opt)
info = rcu_dereference_check(tp->authopt_info, lockdep_sock_is_held(sk)); if (!info) return -ENOENT;
- opt->flags = info->flags & TCP_AUTHOPT_KNOWN_FLAGS; + opt->flags = info->flags & TCP_AUTHOPT_USER_FLAGS; + + rcu_read_lock(); + tcp_authopt_lookup_send(sock_net_tcp_authopt(sk), sk, -1, NULL, &anykey); + if (anykey) + opt->flags |= TCP_AUTHOPT_FLAG_ACTIVE; + rcu_read_unlock(); + /* These keyids might be undefined, for example before connect. * Reporting zero is not strictly correct because there are no reserved * values. */ opt->send_keyid = info->send_keyid;
In order to support TCP_REPAIR for connections using the RFC5925 Authentication Option, add a sockopt to get/set ISN and SNE values.
The TCP_REPAIR_AUTHOPT sockopt is only allowed when the socket is already in "repair" mode. This behavior is shared with other sockopts relevant to TCP_REPAIR.
The setsockopt additionally requires the TCP_ESTABLISHED state because it relies on snd_nxt, which is only initialized after connect().
For SNE restoration we provide a full 64-bit sequence number on "get" and handle any recent 64-bit sequence number on "set", where recent means "within ~2GB of the current window".
Linux tracks snd_sne and rcv_sne as the extension of snd_nxt and rcv_nxt but this is an implementation detail and snd_nxt doesn't even seem to be one of the values that can be read by userspace. Handling SNE with 64-bit values means userspace doesn't need to worry about matching snd_nxt.
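The 64-bit handling described above can be illustrated with a small standalone sketch. It is not the compute_sne() helper used by this patch: the sketch_* names are hypothetical, and the code assumes the two 32-bit sequence numbers are less than 2^31 apart and that the compiler uses two's complement for the unsigned-to-signed conversion.

```c
#include <stdint.h>

/* Join an SNE (upper 32 bits) and a sequence number (lower 32 bits)
 * into one 64-bit extended sequence number.
 */
static uint64_t sketch_seq64(uint32_t sne, uint32_t seq)
{
	return ((uint64_t)sne << 32) | seq;
}

/* Given a recent (sne, seq) snapshot, return the SNE that belongs to
 * cur_seq. ASSUMPTION: cur_seq is within 2^31 of seq in either
 * direction, which is the "recent" requirement from the text above.
 */
static uint32_t sketch_sne_for(uint32_t sne, uint32_t seq, uint32_t cur_seq)
{
	/* Wrap-aware signed distance between the two 32-bit values. */
	int32_t delta = (int32_t)(cur_seq - seq);
	/* Moving by delta in 64-bit space carries into or borrows from
	 * the upper half when the lower 32 bits wrap.
	 */
	uint64_t cur64 = sketch_seq64(sne, seq) + (uint64_t)(int64_t)delta;

	return (uint32_t)(cur64 >> 32);
}
```

Because the snapshot carries its own sequence number, a restore tool in this model never has to know snd_nxt or rcv_nxt itself: it hands back the pair it read earlier and the SNE for the current position is derived from the distance.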
Signed-off-by: Leonard Crestez cdleonard@gmail.com --- include/net/tcp_authopt.h | 2 ++ include/uapi/linux/tcp.h | 19 +++++++++++ net/ipv4/tcp.c | 23 ++++++++++++++ net/ipv4/tcp_authopt.c | 66 +++++++++++++++++++++++++++++++++++++++ 4 files changed, 110 insertions(+)
diff --git a/include/net/tcp_authopt.h b/include/net/tcp_authopt.h index 759b6d71fe86..5a8cea32a5f3 100644 --- a/include/net/tcp_authopt.h +++ b/include/net/tcp_authopt.h @@ -236,10 +236,12 @@ static inline void tcp_authopt_update_snd_sne(struct tcp_sock *tp, u32 seq) lockdep_sock_is_held((struct sock *)tp)); if (info) __tcp_authopt_update_snd_sne(tp, info, seq); } } +int tcp_get_authopt_repair_val(struct sock *sk, struct tcp_authopt_repair *opt); +int tcp_set_authopt_repair(struct sock *sk, sockptr_t optval, unsigned int optlen); #else static inline void tcp_authopt_clear(struct sock *sk) { } static inline int tcp_authopt_openreq(struct sock *newsk, diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h index ff8b53f4209d..d6911d9c2b4e 100644 --- a/include/uapi/linux/tcp.h +++ b/include/uapi/linux/tcp.h @@ -128,10 +128,11 @@ enum { #define TCP_CM_INQ TCP_INQ
#define TCP_TX_DELAY 37 /* delay outgoing packets by XX usec */ #define TCP_AUTHOPT 38 /* TCP Authentication Option (RFC5925) */ #define TCP_AUTHOPT_KEY 39 /* TCP Authentication Option Key (RFC5925) */ +#define TCP_REPAIR_AUTHOPT 40
#define TCP_REPAIR_ON 1 #define TCP_REPAIR_OFF 0 #define TCP_REPAIR_OFF_NO_WP -1 /* Turn off without window probes */ @@ -509,10 +510,28 @@ struct tcp_authopt_key { __u64 recv_lifetime_begin; /** @recv_lifetime_end: End of recv lifetime */ __u64 recv_lifetime_end; };
+/** + * struct tcp_authopt_repair - TCP_REPAIR information related to Authentication Option + * @src_isn: Local Initial Sequence Number + * @dst_isn: Remote Initial Sequence Number + * @snd_sne: Sequence Number Extension for Send (upper 32 bits of snd_seq) + * @rcv_sne: Sequence Number Extension for Recv (upper 32 bits of rcv_seq) + * @snd_seq: Recent Send Sequence Number (lower 32 bits of snd_sne) + * @rcv_seq: Recent Recv Sequence Number (lower 32 bits of rcv_sne) + */ +struct tcp_authopt_repair { + __u32 src_isn; + __u32 dst_isn; + __u32 snd_sne; + __u32 rcv_sne; + __u32 snd_seq; + __u32 rcv_seq; +}; + /* setsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) */
#define TCP_RECEIVE_ZEROCOPY_FLAG_TLB_CLEAN_HINT 0x1 struct tcp_zerocopy_receive { __u64 address; /* in: address of mapping */ diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index dd31e78bd22d..1e0dcfae23f5 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3712,10 +3712,13 @@ static int do_tcp_setsockopt(struct sock *sk, int level, int optname, err = tcp_set_authopt(sk, optval, optlen); break; case TCP_AUTHOPT_KEY: err = tcp_set_authopt_key(sk, optval, optlen); break; + case TCP_REPAIR_AUTHOPT: + err = tcp_set_authopt_repair(sk, optval, optlen); + break; #endif case TCP_USER_TIMEOUT: /* Cap the max time in ms TCP will retry or probe the window * before giving up and aborting (ETIMEDOUT) a connection. */ @@ -4384,10 +4387,30 @@ static int do_tcp_getsockopt(struct sock *sk, int level, return -EFAULT; if (copy_to_user(optval, &info, len)) return -EFAULT; return 0; } + case TCP_REPAIR_AUTHOPT: { + struct tcp_authopt_repair val; + int err; + + if (get_user(len, optlen)) + return -EFAULT; + + lock_sock(sk); + err = tcp_get_authopt_repair_val(sk, &val); + release_sock(sk); + + if (err) + return err; + len = min_t(unsigned int, len, sizeof(val)); + if (put_user(len, optlen)) + return -EFAULT; + if (copy_to_user(optval, &val, len)) + return -EFAULT; + return 0; + } #endif
default: return -ENOPROTOOPT; } diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c index b4158b430b79..1c571a977224 100644 --- a/net/ipv4/tcp_authopt.c +++ b/net/ipv4/tcp_authopt.c @@ -1839,10 +1839,76 @@ int __tcp_authopt_inbound_check(struct sock *sk, struct sk_buff *skb, save_inbound_key_info(info, opt); return 1; } EXPORT_SYMBOL(__tcp_authopt_inbound_check);
+int tcp_get_authopt_repair_val(struct sock *sk, struct tcp_authopt_repair *opt) +{ + struct tcp_sock *tp = tcp_sk(sk); + struct tcp_authopt_info *info; + int err; + + memset(opt, 0, sizeof(*opt)); + sock_owned_by_me(sk); + err = check_sysctl_tcp_authopt(); + if (err) + return err; + if (!tp->repair) + return -EPERM; + + info = rcu_dereference_check(tp->authopt_info, lockdep_sock_is_held(sk)); + if (!info) + return -ENOENT; + + opt->dst_isn = info->dst_isn; + opt->src_isn = info->src_isn; + opt->rcv_sne = info->rcv_sne; + opt->snd_sne = info->snd_sne; + opt->rcv_seq = tp->rcv_nxt; + opt->snd_seq = tp->snd_nxt; + + return 0; +} + +int tcp_set_authopt_repair(struct sock *sk, sockptr_t optval, unsigned int optlen) +{ + struct tcp_sock *tp = tcp_sk(sk); + struct tcp_authopt_info *info; + struct tcp_authopt_repair val; + int err; + + sock_owned_by_me(sk); + err = check_sysctl_tcp_authopt(); + if (err) + return err; + + if (optlen != sizeof(val)) + return -EFAULT; + if (copy_from_sockptr(&val, optval, sizeof(val))) + return -EFAULT; + + /* tcp_authopt repair relies on fields that are only initialized after + * tcp_connect. Doing this setsockopt before connect() can't be correct + * so return an error. + */ + if (sk->sk_state != TCP_ESTABLISHED) + return -EPERM; + + info = rcu_dereference_check(tp->authopt_info, lockdep_sock_is_held(sk)); + if (!info) + return -ENOENT; + if (!tp->repair) + return -EPERM; + + info->dst_isn = val.dst_isn; + info->src_isn = val.src_isn; + info->rcv_sne = compute_sne(val.rcv_sne, val.rcv_seq, tp->rcv_nxt); + info->snd_sne = compute_sne(val.snd_sne, val.snd_seq, tp->snd_nxt); + + return 0; +} + #ifdef CONFIG_PROC_FS struct tcp_authopt_iter_state { struct seq_net_private p; };
Rename the MD5-specific prefix fields in nettest to generic key address names. This is in preparation for reusing the same option for TCP-AO.
Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: Leonard Crestez cdleonard@gmail.com --- tools/testing/selftests/net/nettest.c | 50 +++++++++++++-------------- 1 file changed, 25 insertions(+), 25 deletions(-)
diff --git a/tools/testing/selftests/net/nettest.c b/tools/testing/selftests/net/nettest.c index 7900fa98eccb..30585050e00a 100644 --- a/tools/testing/selftests/net/nettest.c +++ b/tools/testing/selftests/net/nettest.c @@ -94,17 +94,17 @@ struct sock_args { const char *clientns; const char *serverns;
const char *password; const char *client_pw; - /* prefix for MD5 password */ - const char *md5_prefix_str; + /* prefix for MD5/AO*/ + const char *key_addr_prefix_str; union { struct sockaddr_in v4; struct sockaddr_in6 v6; - } md5_prefix; - unsigned int prefix_len; + } key_addr; + unsigned int key_addr_prefix_len; /* 0: default, -1: force off, +1: force on */ int bind_key_ifindex;
/* expected addresses and device index for connection */ const char *expected_dev; @@ -267,16 +267,16 @@ static int tcp_md5sig(int sd, void *addr, socklen_t alen, struct sock_args *args int rc;
md5sig.tcpm_keylen = keylen; memcpy(md5sig.tcpm_key, args->password, keylen);
- if (args->prefix_len) { + if (args->key_addr_prefix_len) { opt = TCP_MD5SIG_EXT; md5sig.tcpm_flags |= TCP_MD5SIG_FLAG_PREFIX;
- md5sig.tcpm_prefixlen = args->prefix_len; - addr = &args->md5_prefix; + md5sig.tcpm_prefixlen = args->key_addr_prefix_len; + addr = &args->key_addr; } memcpy(&md5sig.tcpm_addr, addr, alen);
if ((args->ifindex && args->bind_key_ifindex >= 0) || args->bind_key_ifindex >= 1) { opt = TCP_MD5SIG_EXT; @@ -312,17 +312,17 @@ static int tcp_md5_remote(int sd, struct sock_args *args) int alen;
switch (args->version) { case AF_INET: sin.sin_port = htons(args->port); - sin.sin_addr = args->md5_prefix.v4.sin_addr; + sin.sin_addr = args->key_addr.v4.sin_addr; addr = &sin; alen = sizeof(sin); break; case AF_INET6: sin6.sin6_port = htons(args->port); - sin6.sin6_addr = args->md5_prefix.v6.sin6_addr; + sin6.sin6_addr = args->key_addr.v6.sin6_addr; addr = &sin6; alen = sizeof(sin6); break; default: log_error("unknown address family\n"); @@ -708,11 +708,11 @@ enum addr_type { ADDR_TYPE_LOCAL, ADDR_TYPE_REMOTE, ADDR_TYPE_MCAST, ADDR_TYPE_EXPECTED_LOCAL, ADDR_TYPE_EXPECTED_REMOTE, - ADDR_TYPE_MD5_PREFIX, + ADDR_TYPE_KEY_PREFIX, };
static int convert_addr(struct sock_args *args, const char *_str, enum addr_type atype) { @@ -748,32 +748,32 @@ static int convert_addr(struct sock_args *args, const char *_str, break; case ADDR_TYPE_EXPECTED_REMOTE: desc = "expected remote"; addr = &args->expected_raddr; break; - case ADDR_TYPE_MD5_PREFIX: - desc = "md5 prefix"; + case ADDR_TYPE_KEY_PREFIX: + desc = "key addr prefix"; if (family == AF_INET) { - args->md5_prefix.v4.sin_family = AF_INET; - addr = &args->md5_prefix.v4.sin_addr; + args->key_addr.v4.sin_family = AF_INET; + addr = &args->key_addr.v4.sin_addr; } else if (family == AF_INET6) { - args->md5_prefix.v6.sin6_family = AF_INET6; - addr = &args->md5_prefix.v6.sin6_addr; + args->key_addr.v6.sin6_family = AF_INET6; + addr = &args->key_addr.v6.sin6_addr; } else return 1;
sep = strchr(str, '/'); if (sep) { *sep = '\0'; sep++; if (str_to_uint(sep, 1, pfx_len_max, - &args->prefix_len) != 0) { - fprintf(stderr, "Invalid port\n"); + &args->key_addr_prefix_len) != 0) { + fprintf(stderr, "Invalid prefix\n"); return 1; } } else { - args->prefix_len = 0; + args->key_addr_prefix_len = 0; } break; default: log_error("unknown address type\n"); exit(1); @@ -838,13 +838,13 @@ static int validate_addresses(struct sock_args *args)
if (args->remote_addr_str && convert_addr(args, args->remote_addr_str, ADDR_TYPE_REMOTE) < 0) return 1;
- if (args->md5_prefix_str && - convert_addr(args, args->md5_prefix_str, - ADDR_TYPE_MD5_PREFIX) < 0) + if (args->key_addr_prefix_str && + convert_addr(args, args->key_addr_prefix_str, + ADDR_TYPE_KEY_PREFIX) < 0) return 1;
if (args->expected_laddr_str && convert_addr(args, args->expected_laddr_str, ADDR_TYPE_EXPECTED_LOCAL)) @@ -2029,11 +2029,11 @@ int main(int argc, char *argv[]) break; case 'X': args.client_pw = optarg; break; case 'm': - args.md5_prefix_str = optarg; + args.key_addr_prefix_str = optarg; break; case 'S': args.use_setsockopt = 1; break; case 'f': @@ -2091,17 +2091,17 @@ int main(int argc, char *argv[]) return 1; } }
if (args.password && - ((!args.has_remote_ip && !args.md5_prefix_str) || + ((!args.has_remote_ip && !args.key_addr_prefix_str) || args.type != SOCK_STREAM)) { log_error("MD5 passwords apply to TCP only and require a remote ip for the password\n"); return 1; }
- if (args.md5_prefix_str && !args.password) { + if (args.key_addr_prefix_str && !args.password) { log_error("Prefix range for MD5 protection specified without a password\n"); return 1; }
if (iter == 0) {
Add support for configuring TCP Authentication Option. Only a single key is supported with default options.
Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: Leonard Crestez cdleonard@gmail.com --- tools/testing/selftests/net/nettest.c | 156 ++++++++++++++++++++++++-- 1 file changed, 145 insertions(+), 11 deletions(-)
diff --git a/tools/testing/selftests/net/nettest.c b/tools/testing/selftests/net/nettest.c index 30585050e00a..c5faabf6ba34 100644 --- a/tools/testing/selftests/net/nettest.c +++ b/tools/testing/selftests/net/nettest.c @@ -27,10 +27,11 @@ #include <string.h> #include <unistd.h> #include <time.h> #include <errno.h> #include <getopt.h> +#include <stdbool.h>
#include <linux/xfrm.h> #include <linux/ipsec.h> #include <linux/pfkeyv2.h>
@@ -104,10 +105,12 @@ struct sock_args { } key_addr; unsigned int key_addr_prefix_len; /* 0: default, -1: force off, +1: force on */ int bind_key_ifindex;
+ const char *authopt_password; + /* expected addresses and device index for connection */ const char *expected_dev; const char *expected_server_dev; int expected_ifindex;
@@ -257,10 +260,75 @@ static int switch_ns(const char *ns) close(fd);
return ret; }
+/* Fill key identification fields: address and ifindex */ +static void tcp_authopt_key_fill_id(struct tcp_authopt_key *key, struct sock_args *args) +{ + if (args->key_addr_prefix_str) { + key->flags |= TCP_AUTHOPT_KEY_ADDR_BIND; + switch (args->version) { + case AF_INET: + memcpy(&key->addr, &args->key_addr.v4, sizeof(args->key_addr.v4)); + break; + case AF_INET6: + memcpy(&key->addr, &args->key_addr.v6, sizeof(args->key_addr.v6)); + break; + default: + log_error("unknown address family\n"); + exit(1); + } + if (args->key_addr_prefix_len) { + key->flags |= TCP_AUTHOPT_KEY_PREFIXLEN; + key->prefixlen = args->key_addr_prefix_len; + } + } + + if ((args->ifindex && args->bind_key_ifindex >= 0) || args->bind_key_ifindex >= 1) { + key->flags |= TCP_AUTHOPT_KEY_IFINDEX; + key->ifindex = args->ifindex; + log_msg("TCP_AUTHOPT_KEY_IFINDEX set ifindex=%d\n", key->ifindex); + } else { + log_msg("TCP_AUTHOPT_KEY_IFINDEX off\n", key->ifindex); + } +} + +static int tcp_del_authopt(int sd, struct sock_args *args) +{ + struct tcp_authopt_key key; + int rc; + + memset(&key, 0, sizeof(key)); + key.flags |= TCP_AUTHOPT_KEY_DEL; + tcp_authopt_key_fill_id(&key, args); + + rc = setsockopt(sd, IPPROTO_TCP, TCP_AUTHOPT_KEY, &key, sizeof(key)); + if (rc < 0) + log_err_errno("setsockopt(TCP_AUTHOPT_KEY) del fail"); + + return rc; +} + +static int tcp_set_authopt(int sd, struct sock_args *args) +{ + struct tcp_authopt_key key; + int rc; + + memset(&key, 0, sizeof(key)); + strcpy((char *)key.key, args->authopt_password); + key.keylen = strlen(args->authopt_password); + key.alg = TCP_AUTHOPT_ALG_HMAC_SHA_1_96; + tcp_authopt_key_fill_id(&key, args); + + rc = setsockopt(sd, IPPROTO_TCP, TCP_AUTHOPT_KEY, &key, sizeof(key)); + if (rc < 0) + log_err_errno("setsockopt(TCP_AUTHOPT_KEY) add fail"); + + return rc; +} + static int tcp_md5sig(int sd, void *addr, socklen_t alen, struct sock_args *args) { int keylen = strlen(args->password); struct tcp_md5sig md5sig = {}; int opt = TCP_MD5SIG; @@ -1549,10 
+1617,15 @@ static int do_server(struct sock_args *args, int ipc_fd) if (args->password && tcp_md5_remote(lsd, args)) { close(lsd); goto err_exit; }
+ if (args->authopt_password && tcp_set_authopt(lsd, args)) { + close(lsd); + goto err_exit; + } + ipc_write(ipc_fd, 1); while (1) { log_msg("waiting for client connection.\n"); FD_ZERO(&rfds); FD_SET(lsd, &rfds); @@ -1671,10 +1744,13 @@ static int connectsock(void *addr, socklen_t alen, struct sock_args *args) goto out;
if (args->password && tcp_md5sig(sd, addr, alen, args)) goto err;
+ if (args->authopt_password && tcp_set_authopt(sd, args)) + goto err; + if (args->bind_test_only) goto out;
if (connect(sd, addr, alen) < 0) { if (errno != EINPROGRESS) { @@ -1860,11 +1936,11 @@ static int ipc_parent(int cpid, int fd, struct sock_args *args)
wait(&status); return client_status; }
-#define GETOPT_STR "sr:l:c:p:t:g:P:DRn:M:X:m:d:I:BN:O:SUCi6xL:0:1:2:3:Fbqf" +#define GETOPT_STR "sr:l:c:p:t:g:P:DRn:M:X:m:A:d:I:BN:O:SUCi6xL:0:1:2:3:Fbqf" #define OPT_FORCE_BIND_KEY_IFINDEX 1001 #define OPT_NO_BIND_KEY_IFINDEX 1002
static struct option long_opts[] = { {"force-bind-key-ifindex", 0, 0, OPT_FORCE_BIND_KEY_IFINDEX}, @@ -1906,14 +1982,15 @@ static void print_usage(char *prog) " -L len send random message of given length\n" " -n num number of times to send message\n" "\n" " -M password use MD5 sum protection\n" " -X password MD5 password for client mode\n" - " -m prefix/len prefix and length to use for MD5 key\n" - " --no-bind-key-ifindex: Force TCP_MD5SIG_FLAG_IFINDEX off\n" - " --force-bind-key-ifindex: Force TCP_MD5SIG_FLAG_IFINDEX on\n" + " -m prefix/len prefix and length to use for MD5/AO key\n" + " --no-bind-key-ifindex: Force disable binding key to ifindex\n" + " --force-bind-key-ifindex: Force enable binding key to ifindex\n" " (default: only if -I is passed)\n" + " -A password use RFC5925 TCP Authentication Option with password\n" "\n" " -g grp multicast group (e.g., 239.1.1.1)\n" " -i interactive mode (default is echo and terminate)\n" "\n" " -0 addr Expected local address\n" @@ -1924,17 +2001,64 @@ static void print_usage(char *prog) " -b Bind test only.\n" " -q Be quiet. Run test without printing anything.\n" , prog, DEFAULT_PORT); }
-int main(int argc, char *argv[]) +/* Needs explicit cleanup because keys are global per-namespace */ +void cleanup_tcp_authopt(struct sock_args *args) +{ + int fd; + + if (!args->authopt_password) + return; + + fd = socket(AF_INET, SOCK_STREAM, 0); + if (fd < 0) { + log_err_errno("Failed to create socket"); + return; + } + tcp_del_authopt(fd, args); + close(fd); +} + +static bool cleanup_done; +static struct sock_args args = { + .version = AF_INET, + .type = SOCK_STREAM, + .port = DEFAULT_PORT, +}; + +void cleanup(void) +{ + if (cleanup_done) + return; + cleanup_done = true; + cleanup_tcp_authopt(&args); +} + +void signal_handler(int num) +{ + cleanup(); +} + +void atexit_handler(void) +{ + cleanup(); +} + +/* Explicit cleanup is required for TCP-AO because keys are global. */ +static void register_cleanup(void) { - struct sock_args args = { - .version = AF_INET, - .type = SOCK_STREAM, - .port = DEFAULT_PORT, + struct sigaction sa = { + .sa_handler = signal_handler, }; + sigaction(SIGINT, &sa, NULL); + atexit(atexit_handler); +} + +int main(int argc, char *argv[]) +{ struct protoent *pe; int both_mode = 0; unsigned int tmp; int forever = 0; int fd[2]; @@ -2031,10 +2155,13 @@ int main(int argc, char *argv[]) args.client_pw = optarg; break; case 'm': args.key_addr_prefix_str = optarg; break; + case 'A': + args.authopt_password = optarg; + break; case 'S': args.use_setsockopt = 1; break; case 'f': args.use_freebind = 1; @@ -2097,12 +2224,17 @@ int main(int argc, char *argv[]) args.type != SOCK_STREAM)) { log_error("MD5 passwords apply to TCP only and require a remote ip for the password\n"); return 1; }
- if (args.key_addr_prefix_str && !args.password) { - log_error("Prefix range for MD5 protection specified without a password\n"); + if (args.key_addr_prefix_str && !args.password && !args.authopt_password) { + log_error("Prefix range for authentication requires -M or -A\n"); + return 1; + } + + if (args.key_addr_prefix_len && args.authopt_password) { + log_error("TCP-AO does not support prefix match, only full address\n"); return 1; }
if (iter == 0) { fprintf(stderr, "Invalid number of messages to send\n"); @@ -2125,10 +2257,12 @@ int main(int argc, char *argv[]) fprintf(stderr, "Local (server mode) or remote IP (client IP) required\n"); return 1; }
+ register_cleanup(); + if (interactive) { prog_timeout = 0; msg = NULL; }
Tests are mostly copied from tcp_md5 with minor changes.
They cover VRF support, but only by binding multiple servers, not by binding multiple keys to different interfaces.
Also add a -t tcp_authopt selector to run only these tests.
Signed-off-by: Leonard Crestez cdleonard@gmail.com --- tools/testing/selftests/net/fcnal-test.sh | 329 +++++++++++++++++++++- 1 file changed, 327 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/fcnal-test.sh b/tools/testing/selftests/net/fcnal-test.sh index 31c3b6ebd388..ae0cccd6fcca 100755 --- a/tools/testing/selftests/net/fcnal-test.sh +++ b/tools/testing/selftests/net/fcnal-test.sh @@ -830,10 +830,330 @@ ipv4_ping() }
################################################################################ # IPv4 TCP
+# +# TCP Authentication Option Tests +# + +# try to enable tcp_authopt sysctl +enable_tcp_authopt() +{ + if [[ -e /proc/sys/net/ipv4/tcp_authopt ]]; then + sysctl -w net.ipv4.tcp_authopt=1 + fi +} + +# check if tcp_authopt is compiled with a client-side bind test +has_tcp_authopt() +{ + run_cmd_nsb nettest -b -A ${MD5_PW} -r ${NSA_IP} +} + +# Verify /proc/net/tcp_authopt is empty in all namespaces +check_tcp_authopt_key_leak() +{ + local ns cnt + + for ns in $NSA $NSB $NSC; do + if ! ip netns list | grep -q $ns; then + continue + fi + cnt=$(ip netns exec "$ns" cat /proc/net/tcp_authopt | wc -l) + if [[ $cnt != 1 ]]; then + echo "FAIL: leaked tcp_authopt keys in netns $ns" + ip netns exec $ns cat /proc/net/tcp_authopt + return 1 + fi + done +} + +log_check_tcp_authopt_key_leak() +{ + check_tcp_authopt_key_leak + log_test $? 0 "TCP-AO: Key leak check" +} + +ipv4_tcp_authopt_novrf() +{ + enable_tcp_authopt + if ! has_tcp_authopt; then + echo "TCP-AO appears to be missing, skip" + return 0 + fi + + log_start + run_cmd nettest -s -A ${MD5_PW} -m ${NSB_IP} & + sleep 1 + run_cmd_nsb nettest -r ${NSA_IP} -A ${MD5_PW} + log_test $? 0 "AO: Single address config" + + log_start + run_cmd nettest -s & + sleep 1 + run_cmd_nsb nettest -r ${NSA_IP} -A ${MD5_PW} + log_test $? 2 "AO: Server no config, client uses password" + + log_start + run_cmd nettest -s -A ${MD5_PW} -m ${NSB_IP} & + sleep 1 + run_cmd_nsb nettest -r ${NSA_IP} -A ${MD5_WRONG_PW} + log_test $? 2 "AO: Client uses wrong password" + log_check_tcp_authopt_key_leak + + log_start + run_cmd nettest -s -A ${MD5_PW} -m ${NSB_LO_IP} & + sleep 1 + run_cmd_nsb nettest -r ${NSA_IP} -A ${MD5_PW} + log_test $? 2 "AO: Client address does not match address configured on server" + log_check_tcp_authopt_key_leak + + # client in prefix + log_start + run_cmd nettest -s -A ${MD5_PW} -m ${NS_NET} & + sleep 1 + run_cmd_nsb nettest -r ${NSA_IP} -A ${MD5_PW} + log_test $? 
0 "AO: Prefix config" + + # client in prefix, wrong password + log_start + show_hint "Should timeout since client uses wrong password" + run_cmd nettest -s -A ${MD5_PW} -m ${NS_NET} & + sleep 1 + run_cmd_nsb nettest -r ${NSA_IP} -A ${MD5_WRONG_PW} + log_test $? 2 "AO: Prefix config, client uses wrong password" + log_check_tcp_authopt_key_leak + + # client outside of prefix + log_start + show_hint "Should timeout due to MD5 mismatch" + run_cmd nettest -s -A ${MD5_PW} -m ${NS_NET} & + sleep 1 + run_cmd_nsb nettest -c ${NSB_LO_IP} -r ${NSA_IP} -A ${MD5_PW} + log_test $? 2 "AO: Prefix config, client address not in configured prefix" + log_check_tcp_authopt_key_leak +} + +ipv6_tcp_authopt_novrf() +{ + enable_tcp_authopt + if ! has_tcp_authopt; then + echo "TCP-AO appears to be missing, skip" + return 0 + fi + + log_start + run_cmd nettest -6 -s -A ${MD5_PW} & + sleep 1 + run_cmd_nsb nettest -6 -r ${NSA_IP6} -A ${MD5_PW} + log_test $? 0 "AO: Simple correct config" + + log_start + run_cmd nettest -6 -s + sleep 1 + run_cmd_nsb nettest -6 -r ${NSA_IP6} -A ${MD5_PW} + log_test $? 2 "AO: Server no config, client uses password" + + log_start + run_cmd nettest -6 -s -A ${MD5_PW} -m ${NSB_IP6} & + sleep 1 + run_cmd_nsb nettest -6 -r ${NSA_IP6} -A ${MD5_WRONG_PW} + log_test $? 2 "AO: Client uses wrong password" + + log_start + run_cmd nettest -6 -s -A ${MD5_PW} -m ${NSB_LO_IP6} & + sleep 1 + run_cmd_nsb nettest -6 -r ${NSA_IP6} -A ${MD5_PW} + log_test $? 2 "AO: Client address does not match address configured on server" +} + +ipv4_tcp_authopt_vrf() +{ + enable_tcp_authopt + if ! has_tcp_authopt; then + echo "TCP-AO appears to be missing, skip" + return 0 + fi + + log_start + run_cmd nettest -s -I ${VRF} -A ${MD5_PW} & + sleep 1 + run_cmd_nsb nettest -r ${NSA_IP} -A ${MD5_PW} + log_test $? 
0 "AO: VRF: Simple config" + + # + # duplicate config between default VRF and a VRF + # + + log_start + run_cmd nettest -s -I ${VRF} -A ${MD5_PW} -m ${NSB_IP} & + run_cmd nettest -s -A ${MD5_WRONG_PW} -m ${NSB_IP} & + sleep 1 + run_cmd_nsb nettest -r ${NSA_IP} -A ${MD5_PW} + log_test $? 0 "AO: VRF: Servers in default VRF and VRF, client in VRF" + + log_start + run_cmd nettest -s -I ${VRF} -A ${MD5_PW} -m ${NSB_IP} & + run_cmd nettest -s -A ${MD5_WRONG_PW} -m ${NSB_IP} & + sleep 1 + run_cmd_nsc nettest -r ${NSA_IP} -A ${MD5_WRONG_PW} + log_test $? 0 "AO: VRF: Servers in default VRF and VRF, client in default VRF" + + log_start + show_hint "Should timeout since client in default VRF uses VRF password" + run_cmd nettest -s -I ${VRF} -A ${MD5_PW} -m ${NSB_IP} & + run_cmd nettest -s -A ${MD5_WRONG_PW} -m ${NSB_IP} & + sleep 1 + run_cmd_nsc nettest -r ${NSA_IP} -A ${MD5_PW} + log_test $? 2 "AO: VRF: Servers in default VRF and VRF, conn in default VRF with VRF pw" + + log_start + show_hint "Should timeout since client in VRF uses default VRF password" + run_cmd nettest -s -I ${VRF} -A ${MD5_PW} -m ${NSB_IP} & + run_cmd nettest -s -A ${MD5_WRONG_PW} -m ${NSB_IP} & + sleep 1 + run_cmd_nsb nettest -r ${NSA_IP} -A ${MD5_WRONG_PW} + log_test $? 2 "AO: VRF: Servers in default VRF and VRF, conn in VRF with default VRF pw" + + test_ipv4_tcp_authopt_vrf__global_server__bind_ifindex0 +} + +test_ipv4_tcp_authopt_vrf__global_server__bind_ifindex0() +{ + # This particular test needs tcp_l3mdev_accept=1 for Global server to accept VRF connections + local old_tcp_l3mdev_accept + old_tcp_l3mdev_accept=$(get_sysctl net.ipv4.tcp_l3mdev_accept) + set_sysctl net.ipv4.tcp_l3mdev_accept=1 + + log_start + run_cmd nettest -s -A ${MD5_PW} --force-bind-key-ifindex & + sleep 1 + run_cmd_nsb nettest -r ${NSA_IP} -A ${MD5_PW} + log_test $? 
2 "AO: VRF: Global server, Key bound to ifindex=0 rejects VRF connection"
+
+	log_start
+	run_cmd nettest -s -A ${MD5_PW} --force-bind-key-ifindex &
+	sleep 1
+	run_cmd_nsc nettest -r ${NSA_IP} -A ${MD5_PW}
+	log_test $? 0 "AO: VRF: Global server, key bound to ifindex=0 accepts non-VRF connection"
+	log_start
+
+	run_cmd nettest -s -A ${MD5_PW} --no-bind-key-ifindex &
+	sleep 1
+	run_cmd_nsb nettest -r ${NSA_IP} -A ${MD5_PW}
+	log_test $? 0 "AO: VRF: Global server, key not bound to ifindex accepts VRF connection"
+
+	log_start
+	run_cmd nettest -s -A ${MD5_PW} --no-bind-key-ifindex &
+	sleep 1
+	run_cmd_nsc nettest -r ${NSA_IP} -A ${MD5_PW}
+	log_test $? 0 "AO: VRF: Global server, key not bound to ifindex accepts non-VRF connection"
+
+	# restore value
+	set_sysctl net.ipv4.tcp_l3mdev_accept="$old_tcp_l3mdev_accept"
+}
+
+ipv6_tcp_authopt_vrf()
+{
+	enable_tcp_authopt
+	if ! has_tcp_authopt; then
+		echo "TCP-AO appears to be missing, skip"
+		return 0
+	fi
+
+	log_start
+	run_cmd nettest -6 -s -I ${VRF} -A ${MD5_PW} &
+	sleep 1
+	run_cmd_nsb nettest -6 -r ${NSA_IP6} -A ${MD5_PW}
+	log_test $? 0 "AO: VRF: Simple config"
+
+	#
+	# duplicate config between default VRF and a VRF
+	#
+
+	log_start
+	run_cmd nettest -6 -s -I ${VRF} -A ${MD5_PW} -m ${NSB_IP6} &
+	run_cmd nettest -6 -s -A ${MD5_WRONG_PW} -m ${NSB_IP6} &
+	sleep 1
+	run_cmd_nsb nettest -6 -r ${NSA_IP6} -A ${MD5_PW}
+	log_test $? 0 "AO: VRF: Servers in default VRF and VRF, client in VRF"
+
+	log_start
+	run_cmd nettest -6 -s -I ${VRF} -A ${MD5_PW} -m ${NSB_IP6} &
+	run_cmd nettest -6 -s -A ${MD5_WRONG_PW} -m ${NSB_IP6} &
+	sleep 1
+	run_cmd_nsc nettest -6 -r ${NSA_IP6} -A ${MD5_WRONG_PW}
+	log_test $? 0 "AO: VRF: Servers in default VRF and VRF, client in default VRF"
+
+	log_start
+	show_hint "Should timeout since client in default VRF uses VRF password"
+	run_cmd nettest -6 -s -I ${VRF} -A ${MD5_PW} -m ${NSB_IP6} &
+	run_cmd nettest -6 -s -A ${MD5_WRONG_PW} -m ${NSB_IP6} &
+	sleep 1
+	run_cmd_nsc nettest -6 -r ${NSA_IP6} -A ${MD5_PW}
+	log_test $? 2 "AO: VRF: Servers in default VRF and VRF, conn in default VRF with VRF pw"
+
+	log_start
+	show_hint "Should timeout since client in VRF uses default VRF password"
+	run_cmd nettest -6 -s -I ${VRF} -A ${MD5_PW} -m ${NSB_IP6} &
+	run_cmd nettest -6 -s -A ${MD5_WRONG_PW} -m ${NSB_IP6} &
+	sleep 1
+	run_cmd_nsb nettest -6 -r ${NSA_IP6} -A ${MD5_WRONG_PW}
+	log_test $? 2 "AO: VRF: Servers in default VRF and VRF, conn in VRF with default VRF pw"
+
+	log_start
+	run_cmd nettest -6 -s -I ${VRF} -A ${MD5_PW} -m ${NS_NET6} &
+	run_cmd nettest -6 -s -A ${MD5_WRONG_PW} -m ${NS_NET6} &
+	sleep 1
+	run_cmd_nsb nettest -6 -r ${NSA_IP6} -A ${MD5_PW}
+	log_test $? 0 "AO: VRF: Prefix config in default VRF and VRF, conn in VRF"
+
+	log_start
+	run_cmd nettest -6 -s -I ${VRF} -A ${MD5_PW} -m ${NS_NET6} &
+	run_cmd nettest -6 -s -A ${MD5_WRONG_PW} -m ${NS_NET6} &
+	sleep 1
+	run_cmd_nsc nettest -6 -r ${NSA_IP6} -A ${MD5_WRONG_PW}
+	log_test $? 0 "AO: VRF: Prefix config in default VRF and VRF, conn in default VRF"
+
+	log_start
+	show_hint "Should timeout since client in default VRF uses VRF password"
+	run_cmd nettest -6 -s -I ${VRF} -A ${MD5_PW} -m ${NS_NET6} &
+	run_cmd nettest -6 -s -A ${MD5_WRONG_PW} -m ${NS_NET6} &
+	sleep 1
+	run_cmd_nsc nettest -6 -r ${NSA_IP6} -A ${MD5_PW}
+	log_test $? 2 "AO: VRF: Prefix config in def VRF and VRF, conn in def VRF with VRF pw"
+
+	log_start
+	show_hint "Should timeout since client in VRF uses default VRF password"
+	run_cmd nettest -6 -s -I ${VRF} -A ${MD5_PW} -m ${NS_NET6} &
+	run_cmd nettest -6 -s -A ${MD5_WRONG_PW} -m ${NS_NET6} &
+	sleep 1
+	run_cmd_nsb nettest -6 -r ${NSA_IP6} -A ${MD5_WRONG_PW}
+	log_test $? 2 "AO: VRF: Prefix config in def VRF and VRF, conn in VRF with def VRF pw"
+}
+
+only_tcp_authopt()
+{
+	log_section "TCP Authentication Option"
+
+	setup
+	set_sysctl net.ipv4.tcp_l3mdev_accept=0
+	log_subsection "TCP-AO IPv4 no VRF"
+	ipv4_tcp_authopt_novrf
+	log_subsection "TCP-AO IPv6 no VRF"
+	ipv6_tcp_authopt_novrf
+
+	setup "yes"
+	setup_vrf_dup
+	set_sysctl net.ipv4.tcp_l3mdev_accept=0
+	log_subsection "TCP-AO IPv4 VRF"
+	ipv4_tcp_authopt_vrf
+	log_subsection "TCP-AO IPv6 VRF"
+	ipv6_tcp_authopt_vrf
+}
+
 #
 # MD5 tests without VRF
 #
 ipv4_tcp_md5_novrf()
 {
@@ -1215,10 +1535,11 @@ ipv4_tcp_novrf()
 	show_hint "Should fail 'Connection refused'"
 	run_cmd nettest -d ${NSA_DEV} -r ${a}
 	log_test_addr ${a} $? 1 "No server, device client, local conn"

 	ipv4_tcp_md5_novrf
+	ipv4_tcp_authopt_novrf
 }

 ipv4_tcp_vrf()
 {
 	local a
@@ -1267,13 +1588,14 @@ ipv4_tcp_vrf()
 	run_cmd nettest -s &
 	sleep 1
 	run_cmd nettest -r ${a} -d ${NSA_DEV}
 	log_test_addr ${a} $? 1 "Global server, local connection"

-	# run MD5 tests
+	# run MD5+AO tests
 	setup_vrf_dup
 	ipv4_tcp_md5
+	ipv4_tcp_authopt_vrf
 	cleanup_vrf_dup

 	#
 	# enable VRF global server
 	#
@@ -2771,10 +3093,11 @@ ipv6_tcp_novrf()
 		run_cmd nettest -6 -d ${NSA_DEV} -r ${a}
 		log_test_addr ${a} $? 1 "No server, device client, local conn"
 	done

 	ipv6_tcp_md5_novrf
+	ipv6_tcp_authopt_novrf
 }

 ipv6_tcp_vrf()
 {
 	local a
@@ -2839,13 +3162,14 @@ ipv6_tcp_vrf()
 	run_cmd nettest -6 -s &
 	sleep 1
 	run_cmd nettest -6 -r ${a} -d ${NSA_DEV}
 	log_test_addr ${a} $? 1 "Global server, local connection"

-	# run MD5 tests
+	# run MD5+AO tests
 	setup_vrf_dup
 	ipv6_tcp_md5
+	ipv6_tcp_authopt_vrf
 	cleanup_vrf_dup

 	#
 	# enable VRF global server
 	#
@@ -4221,10 +4545,11 @@ do
 	ipv6_bind|bind6) ipv6_addr_bind;;
 	ipv6_runtime)    ipv6_runtime;;
 	ipv6_netfilter)  ipv6_netfilter;;

 	use_cases)       use_cases;;
+	tcp_authopt)     only_tcp_authopt;;

 	# setup namespaces and config, but do not run any tests
 	setup)		 setup; exit 0;;
 	vrf_setup)	 setup "yes"; exit 0;;
 	esac
Hi Leonard,
On Mon, Sep 5, 2022 at 12:06 AM Leonard Crestez <cdleonard@gmail.com> wrote:
This is similar to TCP-MD5 in functionality but it's sufficiently different that packet formats and interfaces are incompatible. Compared to TCP-MD5 more algorithms are supported and multiple keys can be used on the same connection but there is still no negotiation mechanism.
...
A completely unrelated series that implements the same features was posted recently: https://lore.kernel.org/netdev/20220818170005.747015-1-dima@arista.com/
The biggest difference is that this series puts TCP-AO key on a global instead of per-socket list and that it attempts to make kernel-mode key selection decisions instead of very strictly requiring userspace to make all decisions.
This is a departure from how md5 is implemented and the interface that BGP developers are used to. The reason you switched your implementation to a global database was to fix a minor race between key addition/deletion and connections being accepted on a listening socket. This race can be easily solved with a getsockopt() in user space. Thus it doesn’t justify the complexity that a global key database brings to the implementation. I have a few issues with that design that I would like to point out.
- Currently, a setsockopt on a given socket that adds a key will add it to the global database. That opens up the door for buggy/malicious apps to install bogus keys and mess up the connections of other apps. Also, it seems unusual for a setsockopt to affect all sockets in a namespace. This requires all user space apps to play nicely together.
- Having the keys be per-socket takes advantage of the existing socket lock, simplifying synchronization and avoiding extra locks in the TCP stack.
- Caching of traffic keys becomes much easier with per-socket keys. Once a connection is established it will typically have one or two keys on its list with traffic keys cached. In your current implementation, a linked list of potentially thousands of keys has to be linearly searched for each packet and the traffic key has to be calculated before doing the actual hashing of the packet. We believe a linear search with the extra hashing to calculate the traffic keys will be detrimental to the performance of real world deployments.
- Using a global database might have a benefit if the goal is to have user space apps use tcp-ao transparently without any modifications. This would require key matching on the local and remote ports. But again, do we expect any apps other than BGP/LDP using tcp-ao? If not, why the extra complexity in the kernel?
I believe my approach greatly simplifies userspace implementation. The biggest difference in this iteration of the patch series is adding per-key lifetime values based on RFC8177 in order to implement kernel-mode key rollover.
We believe that key rotation should be done in user-space. One reason is that different vendors might have slightly different behaviors during key rotation and having the logic be in user-space is more flexible for fixing issues. It’s not fun having to patch the kernel every time an interop issue is discovered.
Older versions still required userspace to tweak the NOSEND/NORECV flags and always pick rnextkeyid explicitly, but now no active "key management" should be required on an established socket: just set the correct flags and expiration dates and the kernel can perform key rollover itself. You can see a (simple) test of that behavior here:
...
Best,
Salam
Hello,
I need to inform you that I have given up on pushing this series upstream. This work might be continued by Yonatan Linik <ylinik@drivenets.com>, who works for the same employer (I will stop working there).
The main reason for abandoning this is that engaging in zero-sum competition in upstream is the last thing I ever want to do.
-- Bye, Leonard
On 9/5/22 10:05, Leonard Crestez wrote:
This is similar to TCP-MD5 in functionality but it's sufficiently different that packet formats and interfaces are incompatible. Compared to TCP-MD5 more algorithms are supported and multiple keys can be used on the same connection but there is still no negotiation mechanism.
The expected use case is protecting long-duration BGP/LDP connections between routers using pre-shared keys. The goal of this series is to allow routers using the Linux TCP stack to interoperate with vendors such as Cisco and Juniper. A fully-featured userspace implementation using this patchset exists but it is not open.
A completely unrelated series that implements the same features was posted recently: https://lore.kernel.org/netdev/20220818170005.747015-1-dima@arista.com/
The biggest difference is that this series puts TCP-AO key on a global instead of per-socket list and that it attempts to make kernel-mode key selection decisions instead of very strictly requiring userspace to make all decisions.
I believe my approach greatly simplifies userspace implementation. The biggest difference in this iteration of the patch series is adding per-key lifetime values based on RFC8177 in order to implement kernel-mode key rollover.
Older versions still required userspace to tweak the NOSEND/NORECV flags and always pick rnextkeyid explicitly, but now no active "key management" should be required on an established socket: just set the correct flags and expiration dates and the kernel can perform key rollover itself. You can see a (simple) test of that behavior here:
https://github.com/cdleonard/tcp-authopt-test/blob/main/tcp_authopt_test/tes...
The main implementation of this behavior is patch 17.
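To illustrate the kind of decision that lifetime-based rollover implies, here is a minimal userspace sketch in C. It is not code from the patchset: the struct, its field names and the tie-breaking policy (prefer the valid key whose send lifetime started most recently) are assumptions made only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key record; field names are illustrative, not the uapi struct. */
struct example_key {
	uint8_t send_id;
	uint64_t send_start;	/* first second the key may be used for sending */
	uint64_t send_end;	/* first second it may no longer be used; 0 = never */
};

/*
 * Return the key to sign with at time 'now', or NULL if none is valid.
 * Assumed policy: among valid keys prefer the one whose send lifetime
 * started most recently, so a newer key takes over once it becomes valid.
 */
static const struct example_key *
example_select_send_key(const struct example_key *keys, size_t n, uint64_t now)
{
	const struct example_key *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct example_key *k = &keys[i];

		if (now < k->send_start)
			continue;	/* not yet valid */
		if (k->send_end != 0 && now >= k->send_end)
			continue;	/* expired */
		if (best == NULL || k->send_start > best->send_start)
			best = k;
	}
	return best;
}
```

With overlapping lifetimes, no userspace action is needed at the switchover point in this model: the result simply changes once the newer key's start time passes. A NULL result corresponds to the "no valid keys" case, which the changelog says is reported as a socket error rather than sending unsigned traffic.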
Very old versions of this series had per-socket keys, but that approach was prone to an issue where a key change made on a listen socket between "synack" and "accept" did not affect the new socket.
My solution was to make keys global; the Arista solution is to require userspace to query the key list on accepted sockets and update them. This offloads responsibility for an ABI race to userspace. It can be made to work.
Here are some known flaws and limitations:
- Crypto API is used with buffers on the stack and inside struct sock; this might not work on all arches. I'm currently only testing x64 VMs.
- Interaction with FASTOPEN not tested.
- Traffic key is not cached (reducing performance).
- All lookups examine all keys, ignoring optimization opportunities.
- Overlapping MKTs can be configured despite what RFC5925 says. This is considered "misconfiguration by userspace" and it would make sense for the kernel to be more aggressive here.
Some testing support is included in nettest and fcnal-test.sh, similar to the current level of tcp-md5 testing.
A more elaborate test suite using pytest and scapy is available out of tree: https://github.com/cdleonard/tcp-authopt-test There is an automatic system that runs that test suite in vagrant in gitlab-ci: https://gitlab.com/cdleonard/vagrantcpao That test suite fully covers the ABI of this patchset.
Changes for frr (obsolete): https://github.com/FRRouting/frr/pull/9442 That PR was made early for ABI feedback, it has many issues.
Changes for yabgp (obsolete): https://github.com/cdleonard/yabgp/commits/tcp_authopt This was used for interoperability testing with cisco. Would need updates for global keys to avoid leaks.
Changes since PATCH v7:
- Add lifetime fields to struct tcp_authopt_key
- Fix not checking MD5 after unexpected AO.
Link to v7: https://lore.kernel.org/netdev/cover.1660852705.git.cdleonard@gmail.com/
Changes since PATCH v6:
- Squash "remove unused noops" patch (forgot to do this before v5 send).
- Make TCP_REPAIR_AUTHOPT fail if (!tp->repair)
- Add {snd,rcv}_seq to struct tcp_repair_authopt next to {snd,rcv}_sne.
The fact that internally snd_sne is maintained as a 64-bit extension of snd_nxt is a problem for a TCP_REPAIR implementation in userspace, which might not have access to snd_nxt during live traffic. By exposing a full 64-bit “recent sequence number” to userspace it's possible to ignore which exact SEQ number the SNE value is an extension of.
- Fix ipv6_addr_is_prefix helper; it was incorrect and dependent on uninitialized stack memory. This was caught by the test suite after many rebases.
- Implement ipv4-mapped-ipv6 support, requested by Eric Dumazet
Link: https://lore.kernel.org/netdev/cover.1658815925.git.cdleonard@gmail.com/
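The v6 note about exposing a 64-bit "recent sequence number" can be illustrated with a small sketch of how a 32-bit SEQ might be extended against such a value. This is only an assumption about one workable approach, written as plain userspace C; the function names are hypothetical and it is not the patchset's implementation.

```c
#include <stdint.h>

/*
 * Hypothetical helper: extend a 32-bit sequence number to 64 bits using a
 * recent 64-bit reference. The result is the 64-bit value whose low 32 bits
 * equal 'seq' and which lies within 2^31 of 'recent'.
 */
static uint64_t example_extend_seq(uint64_t recent, uint32_t seq)
{
	/* Signed distance from the low half of 'recent' to 'seq'. */
	int32_t delta = (int32_t)(seq - (uint32_t)recent);

	/* Adding a negative delta relies on unsigned wraparound. */
	return recent + (uint64_t)(int64_t)delta;
}

/* The Sequence Number Extension is the upper half of the 64-bit value. */
static uint32_t example_sne(uint64_t seq64)
{
	return (uint32_t)(seq64 >> 32);
}
```

Because only the distance to the reference matters, the caller does not need to know which exact SEQ the stored SNE was derived from: a segment slightly past a wrap yields SNE + 1, and a late segment from just before the wrap yields SNE - 1.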
Changes since PATCH v5:
- Rebased on recent net-next, including recent changes refactoring md5
- Use skb_drop_reason
- Fix using sock_kmalloc for key alloc but regular kfree for free. Use kmalloc because keys are global
- Fix mentioning non-existent copy_from_sockopt in doc for _copy_from_sockptr_tolerant
- If no valid keys are available for a destination then report a socket error instead of sending unsigned traffic
- Remove several noop implementations which are always called from ifdef
- Fix build issues in all scenarios, including -Werror at every point.
- Split "tcp: Refactor tcp_inbound_md5_hash into tcp_inbound_sig_hash" into a separate commit.
- Add TCP_AUTHOPT_FLAG_ACTIVE to distinguish between "keys configured for socket" and "connection authenticated". A listen socket with authentication enabled will return other sockets with authentication enabled on accept() but if no key is configured for the peer then authentication will be inactive.
- Add support for new TCP_REPAIR_AUTHOPT sockopts which load/save the AO-specific information.
Link: https://lore.kernel.org/netdev/cover.1643026076.git.cdleonard@gmail.com/
Changes since PATCH v4:
- Move the traffic_key context_bytes header to stack. If it's a constant string then ahash can fail unexpectedly.
- Fix allowing unsigned traffic if all keys are marked norecv.
- Fix crashing in __tcp_authopt_alg_init on failure.
- Try to respect the rnextkeyid from SYN on SYNACK (new patch)
- Fix incorrect check for TCP_AUTHOPT_KEY_DEL in __tcp_authopt_select_key
- Improve docs on __tcp_authopt_select_key
- Fix build with CONFIG_PROC_FS=n (kernel build robot)
- Fix build with CONFIG_IPV6=n (kernel build robot)
Link: https://lore.kernel.org/netdev/cover.1640273966.git.cdleonard@gmail.com/
Changes since PATCH v3:
- Made keys global (per-netns rather than per-sock).
- Add /proc/net/tcp_authopt with a table of keys (not sockets).
- Fix part of the shash/ahash conversion having slipped from patch 3 to patch 5
- Fix tcp_parse_sig_options assigning NULL incorrectly when both MD5 and AO are disabled (kernel build robot)
- Fix sparse endianness warnings in prefix match (kernel build robot)
- Fix several incorrect RCU annotations reported by sparse (kernel build robot)
Link: https://lore.kernel.org/netdev/cover.1638962992.git.cdleonard@gmail.com/
Changes since PATCH v2:
- Protect tcp_authopt_alg_get/put_tfm with local_bh_disable instead of preempt_disable. The latter caused signature corruption when a send path executing with BH enabled was interrupted by recv.
- Fix accepted keyids not configured locally as "unexpected". If any key is configured that matches the peer then traffic MUST be signed.
- Fix issues related to sne rollover during handshake itself. (Francesco)
- Implement and test prefixlen (David)
- Replace shash with ahash and reuse some of the MD5 code (Dmitry)
- Parse md5+ao options only once in the same function (Dmitry)
- Pass tcp_authopt_info into inbound check path; this avoids a second rcu dereference for the same packet.
- Pass tcp_request_socket into inbound check path instead of just listen socket. This is required for SNE rollover during handshake and clarifies ISN handling.
- Do not allow disabling via sysctl after enabling once; this is difficult to support well (David)
- Verbose check for sysctl_tcp_authopt (Dmitry)
- Use netif_index_is_l3_master (David)
- Cleanup ipvx_addr_match (David)
- Add a #define tcp_authopt_needed to wrap static key usage because it looks nicer.
- Replace rcu_read_lock with rcu_dereference_protected in SNE updates (Eric)
- Remove test suite
Link: https://lore.kernel.org/netdev/cover.1635784253.git.cdleonard@gmail.com/
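For readers unfamiliar with what prefixlen matching on key addresses involves, here is a minimal userspace sketch in C. It is an illustration under assumptions, not the helper from the series: the function name is hypothetical and addresses are treated as plain 16-byte arrays in network byte order.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: return true if the first 'prefixlen' bits of two
 * 128-bit addresses are equal. Whole bytes are compared with memcmp and
 * the remaining 1..7 bits with a mask, so no byte beyond the prefix is
 * ever inspected.
 */
static bool example_prefix_match(const uint8_t a[16], const uint8_t b[16],
				 unsigned int prefixlen)
{
	unsigned int full = prefixlen / 8;	/* number of whole bytes */
	unsigned int rem = prefixlen % 8;	/* leftover bits */
	uint8_t mask;

	if (prefixlen > 128)
		return false;
	if (memcmp(a, b, full) != 0)
		return false;
	if (rem == 0)
		return true;	/* also covers prefixlen == 128 */

	mask = (uint8_t)(0xff << (8 - rem));	/* top 'rem' bits set */
	return ((a[full] ^ b[full]) & mask) == 0;
}
```

The early return for rem == 0 matters: with a 128-bit prefix, indexing a[full] would read one byte past the address.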
Changes since PATCH v1:
- Implement Sequence Number Extension
- Implement l3index for vrf: TCP_AUTHOPT_KEY_IFINDEX as equivalent of TCP_MD5SIG_FLAG_IFINDEX
- Expand TCP-AO tests in fcnal-test.sh to near-parity with md5.
- Show addr/port on failure similar to md5
- Remove tox dependency from test suite (create venv directly)
- Switch default pytest output format to TAP (kselftest standard)
- Fix _copy_from_sockptr_tolerant stack corruption on short sockopts. This was covered in test but error was invisible without STACKPROTECTOR=y
- Fix sysctl_tcp_authopt check in tcp_get_authopt_val before memset. This was harmless because error code is checked in getsockopt anyway.
- Fix dropping md5 packets on all sockets with AO enabled
- Fix checking (key->recv_id & TCP_AUTHOPT_KEY_ADDR_BIND) instead of key->flags in tcp_authopt_key_match_exact
- Fix PATCH 1/19 not compiling due to missing "int err" declaration
- Add ratelimited message for AO and MD5 both present
- Export all symbols required by CONFIG_IPV6=m (again)
- Fix compilation with CONFIG_TCP_AUTHOPT=y CONFIG_TCP_MD5SIG=n
- Fix checkpatch issues
- Pass -rrequirements.txt to tox to avoid dependency variation.
Link: https://lore.kernel.org/netdev/cover.1632240523.git.cdleonard@gmail.com/
Changes since RFCv3:
- Implement TCP_AUTHOPT handling for timewait and reset replies. Write tests to execute these paths by injecting packets with scapy
- Handle combining md5 and authopt: if both are configured use authopt.
- Fix locking issues around send_key, introduced in one of the later patches.
- Handle IPv4-mapped-IPv6 addresses: it used to be that an ipv4 SYN sent to an ipv6 socket with TCP-AO triggered WARN
- Implement un-namespaced sysctl; this feature is disabled by default
- Allocate new key before removing any old one in setsockopt (Dmitry)
- Remove tcp_authopt_key_info.local_id because it's no longer used (Dmitry)
- Propagate errors from TCP_AUTHOPT getsockopt (Dmitry)
- Fix no-longer-correct TCP_AUTHOPT_KEY_DEL docs (Dmitry)
- Simplify crypto allocation (Eric)
- Use kzalloc instead of __GFP_ZERO (Eric)
- Add static_key_false tcp_authopt_needed (Eric)
- Clear authopt_info copied from oldsk in __tcp_authopt_openreq (Eric)
- Replace memcmp in ipv4 and ipv6 addr comparisons (Eric)
- Export symbols for CONFIG_IPV6=m (kernel test robot)
- Mark more functions static (kernel test robot)
- Fix build with CONFIG_PROVE_RCU_LIST=y (kernel test robot)
Link: https://lore.kernel.org/netdev/cover.1629840814.git.cdleonard@gmail.com/
Changes since RFCv2:
- Removed local_id from ABI and match on send_id/recv_id/addr
- Add all relevant out-of-tree tests to tools/testing/selftests
- Return an error instead of ignoring unknown flags, hopefully this makes it easier to extend.
- Check sk_family before __tcp_authopt_info_get_or_create in tcp_set_authopt_key
- Use sock_owned_by_me instead of WARN_ON(!lockdep_sock_is_held(sk))
- Fix some intermediate build failures reported by kbuild robot
- Improve documentation
Link: https://lore.kernel.org/netdev/cover.1628544649.git.cdleonard@gmail.com/
Changes since RFC:
- Split into per-topic commits for ease of review. The intermediate commits compile with a few "unused function" warnings and don't do anything useful by themselves.
- Add ABI documentation including kernel-doc on uapi
- Fix lockdep warnings from crypto by creating pools with one shash for each cpu
- Accept short options to setsockopt by padding with zeros; this approach allows increasing the size of the structs in the future.
- Support for aes-128-cmac-96
- Support for binding addresses to keys in a way similar to old tcp_md5
- Add support for retrieving received keyid/rnextkeyid and controlling the keyid/rnextkeyid being sent.
Link: https://lore.kernel.org/netdev/01383a8751e97ef826ef2adf93bfde3a08195a43.1626...
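The "accept short options by padding with zeros" item can be pictured with the following userspace sketch in C. It is a hedged illustration only: the struct and function names are hypothetical, and truncating an over-long input is an assumption of this example rather than a statement about what the kernel helper does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical option struct standing in for a sockopt argument. */
struct example_opt {
	uint32_t flags;
	uint32_t send_id;
	uint32_t recv_id;	/* imagine this field was added in a later ABI revision */
};

/*
 * Copy at most dstlen bytes from src and zero-fill whatever the caller did
 * not provide. An older binary passing a shorter struct therefore behaves
 * as if the newer fields were zero.
 */
static void example_copy_tolerant(void *dst, size_t dstlen,
				  const void *src, size_t srclen)
{
	size_t n = srclen < dstlen ? srclen : dstlen;

	memset(dst, 0, dstlen);	/* fields not supplied read as zero */
	memcpy(dst, src, n);	/* never writes past dst, never reads past src */
}
```

Bounding the copy by both lengths is the important part; copying dstlen bytes from a shorter source, or srclen bytes into a smaller destination, is the kind of mistake that corrupts the stack.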
Leonard Crestez (26):
  tcp: authopt: Initial support and key management
  docs: Add user documentation for tcp_authopt
  tcp: authopt: Add crypto initialization
  tcp: Refactor tcp_sig_hash_skb_data for AO
  tcp: authopt: Compute packet signatures
  tcp: Refactor tcp_inbound_md5_hash into tcp_inbound_sig_hash
  tcp: authopt: Hook into tcp core
  tcp: authopt: Disable via sysctl by default
  tcp: authopt: Implement Sequence Number Extension
  tcp: ipv6: Add AO signing for tcp_v6_send_response
  tcp: authopt: Add support for signing skb-less replies
  tcp: ipv4: Add AO signing for skb-less replies
  tcp: authopt: Add NOSEND/NORECV flags
  tcp: authopt: Add initial l3index support
  tcp: authopt: Add prefixlen support
  tcp: authopt: Add send/recv lifetime support
  tcp: authopt: Add key selection controls
  tcp: authopt: Add v4mapped ipv6 address support
  tcp: authopt: Add /proc/net/tcp_authopt listing all keys
  tcp: authopt: If no keys are valid for send report an error
  tcp: authopt: Try to respect rnextkeyid from SYN on SYNACK
  tcp: authopt: Initial support for TCP_AUTHOPT_FLAG_ACTIVE
  tcp: authopt: Initial implementation of TCP_REPAIR_AUTHOPT
  selftests: nettest: Rename md5_prefix to key_addr_prefix
  selftests: nettest: Initial tcp_authopt support
  selftests: net/fcnal: Initial tcp_authopt support
 Documentation/networking/index.rst        |    1 +
 Documentation/networking/ip-sysctl.rst    |    6 +
 Documentation/networking/tcp_authopt.rst  |   95 +
 include/linux/tcp.h                       |   15 +
 include/net/dropreason.h                  |   16 +
 include/net/net_namespace.h               |    4 +
 include/net/netns/tcp_authopt.h           |   12 +
 include/net/tcp.h                         |   55 +-
 include/net/tcp_authopt.h                 |  269 +++
 include/uapi/linux/snmp.h                 |    1 +
 include/uapi/linux/tcp.h                  |  188 ++
 net/ipv4/Kconfig                          |   14 +
 net/ipv4/Makefile                         |    1 +
 net/ipv4/proc.c                           |    1 +
 net/ipv4/sysctl_net_ipv4.c                |   39 +
 net/ipv4/tcp.c                            |  126 +-
 net/ipv4/tcp_authopt.c                    | 2044 +++++++++++++++++++++
 net/ipv4/tcp_input.c                      |   55 +-
 net/ipv4/tcp_ipv4.c                       |  100 +-
 net/ipv4/tcp_minisocks.c                  |   12 +
 net/ipv4/tcp_output.c                     |  106 +-
 net/ipv6/tcp_ipv6.c                       |   70 +-
 tools/testing/selftests/net/fcnal-test.sh |  329 +++-
 tools/testing/selftests/net/nettest.c     |  204 +-
 24 files changed, 3675 insertions(+), 88 deletions(-)
 create mode 100644 Documentation/networking/tcp_authopt.rst
 create mode 100644 include/net/netns/tcp_authopt.h
 create mode 100644 include/net/tcp_authopt.h
 create mode 100644 net/ipv4/tcp_authopt.c

--
2.25.1