On Fri, Aug 05, 2022 at 05:11:48PM -0700, Adel Abouchaev wrote:
Adding Documentation/networking/quic.rst file to describe kernel QUIC code.
Better say "Add documentation for kernel QUIC code".
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 03b215bddde8..656fa1dac26b 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -90,6 +90,7 @@ Contents:
    plip
    ppp_generic
    proc_net_tcp
+   quic
    radiotap-headers
    rds
    regulatory
diff --git a/Documentation/networking/quic.rst b/Documentation/networking/quic.rst
new file mode 100644
index 000000000000..416099b80e60
--- /dev/null
+++ b/Documentation/networking/quic.rst
@@ -0,0 +1,186 @@
+.. _kernel_quic:
+
+===========
+KERNEL QUIC
+===========
+
+Overview
+========
+QUIC is a secure general-purpose transport protocol that creates a stateful
+interaction between a client and a server. QUIC provides end-to-end integrity
+and confidentiality. Refer to RFC 9000 for more information on QUIC.
+The kernel Tx side offload covers the encryption of the application streams
+in the kernel rather than in the application. These packets are 1-RTT packets
+in a QUIC connection. Encryption of all other packets is still done by the
+QUIC library in user space.
+User Interface
+==============
+
+Creating a QUIC connection
+--------------------------
+A QUIC connection originates and terminates in the application, using one of
+many available QUIC libraries. The code instantiates a QUIC client and a QUIC
+server in some form and configures them to use certain addresses and ports for
+the source and destination. The client and server negotiate the set of keys to
+protect the communication during different phases of the connection, maintain
+the connection and perform congestion control.
+Requesting to add QUIC Tx kernel encryption to the connection
+-------------------------------------------------------------
+
+Each flow that should be encrypted by the kernel needs to be registered with
+the kernel using the socket API. A setsockopt() call on the socket creates an
+association between the QUIC connection ID of the flow and the encryption
+parameters for the crypto operations:
+.. code-block:: c
+
+    struct quic_connection_info conn_info;
+    char conn_id[5] = {0x01, 0x02, 0x03, 0x04, 0x05};
+    const size_t conn_id_len = sizeof(conn_id);
+    char conn_key[16] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+                         0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f};
+    char conn_iv[12] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+                        0x08, 0x09, 0x0a, 0x0b};
+    char conn_hdr_key[16] = {0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
+                             0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f
+                            };
+    conn_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+    memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+    conn_info.key.conn_id_length = 5;
+    memcpy(&conn_info.key.conn_id[QUIC_MAX_CONNECTION_ID_SIZE
+                                  - conn_id_len],
+           &conn_id, conn_id_len);
+    memcpy(&conn_info.payload_key, conn_key, sizeof(conn_key));
+    memcpy(&conn_info.payload_iv, conn_iv, sizeof(conn_iv));
+    memcpy(&conn_info.header_key, conn_hdr_key, sizeof(conn_hdr_key));
+    setsockopt(fd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION, &conn_info,
+               sizeof(conn_info));
+Requesting to remove QUIC Tx kernel crypto offload control messages
+-------------------------------------------------------------------
+
+All flows are removed when the socket is closed. To explicitly remove the
+offload for a connection during the lifetime of the socket, the process is
+similar to adding the flow. Only the connection ID and its length need to be
+supplied to remove the connection from the offload:
+.. code-block:: c
+
+    memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+    conn_info.key.conn_id_length = 5;
+    memcpy(&conn_info.key.conn_id[QUIC_MAX_CONNECTION_ID_SIZE
+                                  - conn_id_len],
+           &conn_id, conn_id_len);
+    setsockopt(fd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION, &conn_info,
+               sizeof(conn_info));
+Sending QUIC application data
+-----------------------------
+
+For QUIC Tx encryption offload, the application should use the sendmsg()
+socket call and provide ancillary data with the connection ID length, the next
+packet number and the offload flags, so that the kernel can perform the
+encryption and, if requested, GSO.
+.. code-block:: c
+
+    size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
+    uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)];
+    struct quic_tx_ancillary_data *anc_data;
+    size_t quic_data_len = 4500;
+    struct cmsghdr *cmsg_hdr;
+    char quic_data[9000];
+    struct iovec iov[2];
+    int send_len = 9000;
+    struct msghdr msg;
+    int err;
+    iov[0].iov_base = quic_data;
+    iov[0].iov_len = quic_data_len;
+    iov[1].iov_base = quic_data + 4500;
+    iov[1].iov_len = quic_data_len;
+    if (client.addr.sin_family == AF_INET) {
+        msg.msg_name = &client.addr;
+        msg.msg_namelen = sizeof(client.addr);
+    } else {
+        msg.msg_name = &client.addr6;
+        msg.msg_namelen = sizeof(client.addr6);
+    }
+    msg.msg_iov = iov;
+    msg.msg_iovlen = 2;
+    msg.msg_control = cmsg_buf;
+    msg.msg_controllen = sizeof(cmsg_buf);
+    cmsg_hdr = CMSG_FIRSTHDR(&msg);
+    cmsg_hdr->cmsg_level = IPPROTO_UDP;
+    cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+    cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
+    /* CMSG_DATA() yields unsigned char *, so cast to the payload type */
+    anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+    anc_data->flags = 0;
+    anc_data->next_pkt_num = 0x0d65c9;
+    anc_data->conn_id_length = conn_id_len;
+    err = sendmsg(self->sfd, &msg, 0);
+The QUIC Tx offload in the kernel reads the data from userspace, encrypts it
+and copies it to the ciphertext within the same operation.
+Sending QUIC application data with GSO
+--------------------------------------
+When GSO is in use, the kernel uses the GSO fragment size as the target size
+for the ciphertext. The packets from user space should align on a boundary of
+the GSO fragment size minus the size of the tag for the chosen cipher. For a
+GSO fragment size of 1200 bytes and a tag size of 16 bytes, the plaintext
+packets should follow each other every 1184 bytes. After the encryption, the
+rest of the UDP and IP stacks follow the defined GSO fragment size, which
+includes the trailing tag bytes.
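For illustration only (not part of the patch): a minimal sketch of the alignment arithmetic described above, assuming a 16-byte tag as in the text. The `example_*` names are hypothetical helpers and do not appear in the patch or in any kernel header.

```c
#include <stddef.h>

/* Hypothetical constant: tag size of 16 bytes, as used in the text above. */
#define EXAMPLE_TAG_LEN 16

/* Plaintext bytes per GSO fragment: the fragment size minus the tag. */
static inline size_t example_plain_stride(size_t gso_frag_size, size_t tag_len)
{
    return gso_frag_size > tag_len ? gso_frag_size - tag_len : 0;
}

/* Offset in the user buffer at which the n-th plaintext packet starts. */
static inline size_t example_plain_offset(size_t gso_frag_size, size_t tag_len,
                                          size_t n)
{
    return n * example_plain_stride(gso_frag_size, tag_len);
}

/* Size of one packet on the wire once the trailing tag has been appended. */
static inline size_t example_cipher_len(size_t plain_len, size_t tag_len)
{
    return plain_len + tag_len;
}
```

With a fragment size of 1200, `example_plain_stride(1200, EXAMPLE_TAG_LEN)` gives the 1184-byte spacing from the text, and adding the tag back restores the 1200-byte fragment.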
+To set up GSO fragmentation:
+
+.. code-block:: c
+
+    setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &frag_size,
+               sizeof(frag_size));
+If the GSO fragment size is provided in ancillary data within the sendmsg()
+call, the value in ancillary data takes precedence over the segment size
+provided in setsockopt() when splitting the payload into packets. This is
+consistent with the UDP stack behavior.
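For illustration only (not part of the patch): a minimal sketch of passing the segment size per call through the existing UDP GSO control message (`SOL_UDP` / `UDP_SEGMENT` with a 16-bit payload). The helper name `example_set_gso_cmsg` is hypothetical. The sketch fills only the first control message; how it would be combined with the `UDP_QUIC_ENCRYPT` message from the patch is not shown and is left as an assumption.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/udp.h>

/* Fallbacks for older libc headers; the values match the Linux UAPI. */
#ifndef SOL_UDP
#define SOL_UDP 17
#endif
#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103
#endif

/*
 * Hypothetical helper: write a UDP_SEGMENT control message into the first
 * cmsg slot of msg. The caller must point msg->msg_control at a buffer of
 * at least CMSG_SPACE(sizeof(uint16_t)) bytes and set msg->msg_controllen
 * to the size of that buffer. Returns 0 on success, -1 if it does not fit.
 */
static inline int example_set_gso_cmsg(struct msghdr *msg, uint16_t frag_size)
{
    struct cmsghdr *cm;

    if (!msg->msg_control ||
        msg->msg_controllen < CMSG_SPACE(sizeof(frag_size)))
        return -1;

    cm = CMSG_FIRSTHDR(msg);
    if (!cm)
        return -1;

    cm->cmsg_level = SOL_UDP;
    cm->cmsg_type = UDP_SEGMENT;
    cm->cmsg_len = CMSG_LEN(sizeof(frag_size));
    /* Copy rather than cast, to avoid unaligned access to the payload. */
    memcpy(CMSG_DATA(cm), &frag_size, sizeof(frag_size));
    return 0;
}
```

A caller would invoke this on the `struct msghdr` before sendmsg(); the value set this way then overrides the one configured with setsockopt() for that call only.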
+Integrating with userspace QUIC libraries
+-----------------------------------------
+
+Integration with userspace QUIC libraries depends on the implementation of the
+QUIC protocol. For the MVFST library, the control plane is integrated into the
+handshake callbacks to properly configure the flows into the socket, and the
+data plane is integrated into the methods that perform encryption and send
+the packets to the batch scheduler for transmission to the socket.
+The MVFST library can be found at https://github.com/facebookincubator/mvfst.
+Statistics
+==========
+
+QUIC Tx offload to the kernel exposes the following counters in
+``/proc/net/quic_stat``:
+- ``QuicCurrTxSw`` -
+  number of currently active kernel offloaded QUIC connections
+- ``QuicTxSw`` -
+  cumulative total number of offloaded QUIC connections
+- ``QuicTxSwError`` -
+  cumulative total number of errors during QUIC Tx offload to kernel
The documentation looks OK (no new warnings).
Thanks.