This series extends the XDP Tx metadata framework so that user applications can pass a per-packet 64-bit launch time directly to the kernel driver to request launch time hardware offload. The XDP Tx metadata framework itself does not perform any clock conversion or packet reordering.
Please note that the role of Tx metadata is just to pass the launch time, not to enable the offload feature. Users will need to enable the launch time hardware offload feature of the device by using the respective command, such as the tc-etf command.
Although some devices use the tc-etf command to enable their launch time hardware offload feature, xsk packets do not go through the etf qdisc. Therefore, in my opinion, the launch time should always be based on the PTP Hardware Clock (PHC). For that reason, I did not include a clock ID to indicate the clock source.
To simplify the test steps, I modified the xdp_hw_metadata bpf selftest tool so that it sets the launch time from the offset provided by the user plus the value of the Receive Hardware Timestamp, which is taken from the PHC. This removes the need to discipline the system clock with the PHC and then use clock_gettime() to get the time.
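The computation described above can be sketched as follows. This is a minimal illustration rather than the selftest code itself: the helper name launch_time_from_rx() is hypothetical, both values are assumed to be nanoseconds against the same PHC, and returning 0 for a zero delta is a choice made only for this sketch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: derive a launch time from the hardware RX timestamp
 * of the packet being answered. Both values are nanoseconds against the PHC,
 * so no system clock conversion is involved.
 * A delta of 0 means "no launch time request"; the sketch returns 0 then.
 */
static uint64_t launch_time_from_rx(uint64_t hw_rx_timestamp_ns,
				    uint64_t delta_ns)
{
	if (!delta_ns)
		return 0;

	return hw_rx_timestamp_ns + delta_ns;
}
```

With the RX timestamp from the stmmac test log in this series and a 1 s delta, the helper yields the launch time printed in that log.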
Please note that AF_XDP lacks a feedback mechanism to inform the application if the requested launch time is invalid. Users are therefore expected to be familiar with the launch time horizon of the device they use and must not request a launch time beyond that horizon. Otherwise, the driver might interpret the launch time incorrectly and react wrongly. For stmmac and igc, where modulo computation is used, a launch time larger than the horizon will cause the device to transmit the packet earlier than the requested launch time.
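The wrap-around can be illustrated with a small user-space sketch. It assumes the stmmac layout described in the documentation added by this series (upper 8 bits in seconds, lower 24 bits in 256 ns steps, 128 s horizon); the function names are hypothetical, and the real descriptor programming in the driver may differ in detail.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC	1000000000ULL
#define HORIZON_NS	(128ULL * NSEC_PER_SEC)	/* stmmac horizon per this series */

/*
 * Pack a PHC time the way the stmmac documentation in this series describes
 * the descriptor field: top 8 bits are seconds, low 24 bits count 256 ns steps.
 * Seconds are taken modulo 256, so the value rolls over every 256 s.
 */
static uint32_t pack_launch_time(uint64_t ns)
{
	uint32_t sec = (uint32_t)((ns / NSEC_PER_SEC) & 0xff);
	uint32_t sub = (uint32_t)((ns % NSEC_PER_SEC) >> 8);

	return (sec << 24) | sub;
}

/*
 * Hypothetical user-space guard: accept only launch times that are not in
 * the past and lie inside the horizon. Rejecting past times is a
 * conservative choice for this sketch, not a rule from the series.
 */
static bool launch_time_in_horizon(uint64_t phc_now_ns, uint64_t launch_time_ns)
{
	return launch_time_ns >= phc_now_ns &&
	       launch_time_ns - phc_now_ns < HORIZON_NS;
}
```

Because only 8 bits of seconds survive the packing, a launch time 256 s later produces the same descriptor value as the current time, which is why the application has to perform a check like launch_time_in_horizon() before queuing the packet.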
Although there is currently no feedback mechanism for the launch time request, users can still check whether the requested launch time took effect by requesting the Transmit Completion Hardware Timestamp.
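One way an application could perform that check is sketched below. The helper name and the tolerance parameter are hypothetical; the sketch assumes that the launch time and the completion timestamp are both nanoseconds against the same PHC.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: the packet left no earlier than the requested launch
 * time and no later than launch time + tolerance. tolerance_ns is an
 * application-chosen bound, not a value defined by this series.
 */
static bool launch_time_honored(uint64_t launch_time_ns,
				uint64_t tx_complete_ns,
				uint64_t tolerance_ns)
{
	if (tx_complete_ns < launch_time_ns)
		return false;	/* sent early, e.g. launch time beyond the horizon */

	return tx_complete_ns - launch_time_ns <= tolerance_ns;
}
```

A suitable tolerance depends on the device: the test logs in this series show a delta of 16.963 us on stmmac and 0.016 us on igc.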
Changes since v1:
- renamed to use Earliest TxTime First (Willem)
- renamed to use txtime (Willem)
Changes since v2:
- renamed to use launch time (Jesper & Willem)
- changed the default launch time in xdp_hw_metadata apps from 1s to 0.1s because some NICs do not support such a large future time.
Changes since v3:
- added XDP launch time support to the igc driver (Jesper & Florian)
- added per-driver launch time limitation on xsk-tx-metadata.rst (Jesper)
- added explanation on FIFO behavior on xsk-tx-metadata.rst (Jakub)
- added step to enable launch time in the commit message (Jesper & Willem)
- explicitly documented the type of launch_time and which clock source it is against (Willem)
Changes since v4:
- change netdev feature name from tx-launch-time to tx-launch-time-fifo to explicitly state the FIFO behaviour (Stanislav)
- improve the looping of xdp_hw_metadata app to wait for packet tx completion to be more readable by using clock_gettime() (Stanislav)
- add launch time setup steps into xdp_hw_metadata app (Stanislav)
Changes since v5:
- fix selftest build errors by using asprintf() and realloc() instead of managing the buffer sizes manually (Daniel, Stanislav)
v1: https://patchwork.kernel.org/project/netdevbpf/cover/20231130162028.852006-1...
v2: https://patchwork.kernel.org/project/netdevbpf/cover/20231201062421.1074768-...
v3: https://patchwork.kernel.org/project/netdevbpf/cover/20231203165129.1740512-...
v4: https://patchwork.kernel.org/project/netdevbpf/cover/20250106135506.9687-1-y...
v5: https://patchwork.kernel.org/project/netdevbpf/cover/20250114152718.120588-1...
Song Yoong Siang (4):
  xsk: Add launch time hardware offload support to XDP Tx metadata
  selftests/bpf: Add launch time request to xdp_hw_metadata
  net: stmmac: Add launch time support to XDP ZC
  igc: Add launch time support to XDP ZC
 Documentation/netlink/specs/netdev.yaml | 4 +
 Documentation/networking/xsk-tx-metadata.rst | 62 +++++++
 drivers/net/ethernet/intel/igc/igc_main.c | 78 +++++---
 drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 +
 .../net/ethernet/stmicro/stmmac/stmmac_main.c | 13 ++
 include/net/xdp_sock.h | 10 ++
 include/net/xdp_sock_drv.h | 1 +
 include/uapi/linux/if_xdp.h | 10 ++
 include/uapi/linux/netdev.h | 3 +
 net/core/netdev-genl.c | 2 +
 net/xdp/xsk.c | 3 +
 tools/include/uapi/linux/if_xdp.h | 10 ++
 tools/include/uapi/linux/netdev.h | 3 +
 tools/testing/selftests/bpf/xdp_hw_metadata.c | 168 +++++++++++++++++-
 14 files changed, 342 insertions(+), 27 deletions(-)
Extend the XDP Tx metadata framework so that users can request launch time hardware offload, where the Ethernet device schedules the packet for transmission at a pre-determined time called the launch time. The value of the launch time is communicated from user space to the Ethernet driver via the launch_time field of struct xsk_tx_metadata.
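From the application side, the request could look roughly like the sketch below. To stay self-contained it uses a simplified stand-in, struct demo_tx_metadata, instead of the real struct xsk_tx_metadata from <linux/if_xdp.h> (the real structure wraps request and completion in a union); the DEMO_ names and the helper are hypothetical, while the flag value (1 << 2) is the one defined by this patch.

```c
#include <stdint.h>

/* Value taken from this patch; normally provided by <linux/if_xdp.h>. */
#define DEMO_TXMD_FLAGS_LAUNCH_TIME	(1 << 2)

/* Simplified stand-in for the request side of struct xsk_tx_metadata. */
struct demo_tx_metadata {
	uint64_t flags;
	struct {
		uint16_t csum_start;
		uint16_t csum_offset;
		uint64_t launch_time;	/* nanoseconds against the PTP HW clock */
	} request;
};

/* Hypothetical helper: flag the frame and store the requested launch time. */
static void demo_request_launch_time(struct demo_tx_metadata *meta,
				     uint64_t launch_time_ns)
{
	meta->flags |= DEMO_TXMD_FLAGS_LAUNCH_TIME;
	meta->request.launch_time = launch_time_ns;
}
```

As with the other Tx offloads, the descriptor of the packet also has to set XDP_TX_METADATA in its options for the metadata to be consumed.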
Suggested-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
---
 Documentation/netlink/specs/netdev.yaml | 4 ++
 Documentation/networking/xsk-tx-metadata.rst | 62 ++++++++++++++++++++
 include/net/xdp_sock.h | 10 ++++
 include/net/xdp_sock_drv.h | 1 +
 include/uapi/linux/if_xdp.h | 10 ++++
 include/uapi/linux/netdev.h | 3 +
 net/core/netdev-genl.c | 2 +
 net/xdp/xsk.c | 3 +
 tools/include/uapi/linux/if_xdp.h | 10 ++++
 tools/include/uapi/linux/netdev.h | 3 +
 10 files changed, 108 insertions(+)
diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml index cbb544bd6c84..901b5afb3df0 100644 --- a/Documentation/netlink/specs/netdev.yaml +++ b/Documentation/netlink/specs/netdev.yaml @@ -70,6 +70,10 @@ definitions: name: tx-checksum doc: L3 checksum HW offload is supported by the driver. + - + name: tx-launch-time-fifo + doc: + Launch time HW offload is supported by the driver. - name: queue-type type: enum diff --git a/Documentation/networking/xsk-tx-metadata.rst b/Documentation/networking/xsk-tx-metadata.rst index e76b0cfc32f7..df53a10ccac3 100644 --- a/Documentation/networking/xsk-tx-metadata.rst +++ b/Documentation/networking/xsk-tx-metadata.rst @@ -50,6 +50,10 @@ The flags field enables the particular offload: checksum. ``csum_start`` specifies byte offset of where the checksumming should start and ``csum_offset`` specifies byte offset where the device should store the computed checksum. +- ``XDP_TXMD_FLAGS_LAUNCH_TIME``: requests the device to schedule the + packet for transmission at a pre-determined time called launch time. The + value of launch time is indicated by ``launch_time`` field of + ``union xsk_tx_metadata``.
Besides the flags above, in order to trigger the offloads, the first packet's ``struct xdp_desc`` descriptor should set ``XDP_TX_METADATA`` @@ -65,6 +69,63 @@ In this case, when running in ``XDK_COPY`` mode, the TX checksum is calculated on the CPU. Do not enable this option in production because it will negatively affect performance.
+Launch Time +=========== + +The value of the requested launch time should be based on the device's PTP +Hardware Clock (PHC) to ensure accuracy. AF_XDP takes a different data path +compared to the ETF queuing discipline, which organizes packets and delays +their transmission. Instead, AF_XDP immediately hands off the packets to +the device driver without rearranging their order or holding them prior to +transmission. Since the driver maintains FIFO behavior and does not perform +packet reordering, a packet with a launch time request will block other +packets in the same Tx Queue until it is sent. Therefore, it is recommended +to allocate separate queue for scheduling traffic that is intended for +future transmission. + +In scenarios where the launch time offload feature is disabled, the device +driver is expected to disregard the launch time request. For correct +interpretation and meaningful operation, the launch time should never be +set to a value larger than the farthest programmable time in the future +(the horizon). Different devices have different hardware limitations on the +launch time offload feature. + +stmmac driver +------------- + +For stmmac, TSO and launch time (TBS) features are mutually exclusive for +each individual Tx Queue. By default, the driver configures Tx Queue 0 to +support TSO and the rest of the Tx Queues to support TBS. The launch time +hardware offload feature can be enabled or disabled by using the tc-etf +command to call the driver's ndo_setup_tc() callback. + +The value of the launch time that is programmed in the Enhanced Normal +Transmit Descriptors is a 32-bit value, where the most significant 8 bits +represent the time in seconds and the remaining 24 bits represent the time +in 256 ns increments. The programmed launch time is compared against the +PTP time (bits[39:8]) and rolls over after 256 seconds. Therefore, the +horizon of the launch time for dwmac4 and dwxlgmac2 is 128 seconds in the +future. 
+ +igc driver +---------- + +For igc, all four Tx Queues support the launch time feature. The launch +time hardware offload feature can be enabled or disabled by using the +tc-etf command to call the driver's ndo_setup_tc() callback. When entering +TSN mode, the igc driver will reset the device and create a default Qbv +schedule with a 1-second cycle time, with all Tx Queues open at all times. + +The value of the launch time that is programmed in the Advanced Transmit +Context Descriptor is a relative offset to the starting time of the Qbv +transmission window of the queue. The Frst flag of the descriptor can be +set to schedule the packet for the next Qbv cycle. Therefore, the horizon +of the launch time for i225 and i226 is the ending time of the next cycle +of the Qbv transmission window of the queue. For example, when the Qbv +cycle time is set to 1 second, the horizon of the launch time ranges +from 1 second to 2 seconds, depending on where the Qbv cycle is currently +running. + Querying Device Capabilities ============================
@@ -74,6 +135,7 @@ Refer to ``xsk-flags`` features bitmask in
- ``tx-timestamp``: device supports ``XDP_TXMD_FLAGS_TIMESTAMP`` - ``tx-checksum``: device supports ``XDP_TXMD_FLAGS_CHECKSUM`` +- ``tx-launch-time-fifo``: device supports ``XDP_TXMD_FLAGS_LAUNCH_TIME``
See ``tools/net/ynl/samples/netdev.c`` on how to query this information.
diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index bfe625b55d55..a58ae7589d12 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -110,11 +110,16 @@ struct xdp_sock { * indicates position where checksumming should start. * csum_offset indicates position where checksum should be stored. * + * void (*tmo_request_launch_time)(u64 launch_time, void *priv) + * Called when AF_XDP frame requested launch time HW offload support. + * launch_time indicates the PTP time at which the device can schedule the + * packet for transmission. */ struct xsk_tx_metadata_ops { void (*tmo_request_timestamp)(void *priv); u64 (*tmo_fill_timestamp)(void *priv); void (*tmo_request_checksum)(u16 csum_start, u16 csum_offset, void *priv); + void (*tmo_request_launch_time)(u64 launch_time, void *priv); };
#ifdef CONFIG_XDP_SOCKETS @@ -162,6 +167,11 @@ static inline void xsk_tx_metadata_request(const struct xsk_tx_metadata *meta, if (!meta) return;
+ if (ops->tmo_request_launch_time) + if (meta->flags & XDP_TXMD_FLAGS_LAUNCH_TIME) + ops->tmo_request_launch_time(meta->request.launch_time, + priv); + if (ops->tmo_request_timestamp) if (meta->flags & XDP_TXMD_FLAGS_TIMESTAMP) ops->tmo_request_timestamp(priv); diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h index 40085afd9160..78af371bc002 100644 --- a/include/net/xdp_sock_drv.h +++ b/include/net/xdp_sock_drv.h @@ -198,6 +198,7 @@ static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr) #define XDP_TXMD_FLAGS_VALID ( \ XDP_TXMD_FLAGS_TIMESTAMP | \ XDP_TXMD_FLAGS_CHECKSUM | \ + XDP_TXMD_FLAGS_LAUNCH_TIME | \ 0)
static inline bool xsk_buff_valid_tx_metadata(struct xsk_tx_metadata *meta) diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h index 42ec5ddaab8d..42869770776e 100644 --- a/include/uapi/linux/if_xdp.h +++ b/include/uapi/linux/if_xdp.h @@ -127,6 +127,12 @@ struct xdp_options { */ #define XDP_TXMD_FLAGS_CHECKSUM (1 << 1)
+/* Request launch time hardware offload. The device will schedule the packet for + * transmission at a pre-determined time called launch time. The value of + * launch time is communicated via launch_time field of struct xsk_tx_metadata. + */ +#define XDP_TXMD_FLAGS_LAUNCH_TIME (1 << 2) + /* AF_XDP offloads request. 'request' union member is consumed by the driver * when the packet is being transmitted. 'completion' union member is * filled by the driver when the transmit completion arrives. @@ -142,6 +148,10 @@ struct xsk_tx_metadata { __u16 csum_start; /* Offset from csum_start where checksum should be stored. */ __u16 csum_offset; + + /* XDP_TXMD_FLAGS_LAUNCH_TIME */ + /* Launch time in nanosecond against the PTP HW Clock */ + __u64 launch_time; } request;
struct { diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h index e4be227d3ad6..5ab85f4af009 100644 --- a/include/uapi/linux/netdev.h +++ b/include/uapi/linux/netdev.h @@ -59,10 +59,13 @@ enum netdev_xdp_rx_metadata { * by the driver. * @NETDEV_XSK_FLAGS_TX_CHECKSUM: L3 checksum HW offload is supported by the * driver. + * @NETDEV_XSK_FLAGS_LAUNCH_TIME: Launch Time HW offload is supported by the + * driver. */ enum netdev_xsk_flags { NETDEV_XSK_FLAGS_TX_TIMESTAMP = 1, NETDEV_XSK_FLAGS_TX_CHECKSUM = 2, + NETDEV_XSK_FLAGS_LAUNCH_TIME = 4, };
enum netdev_queue_type { diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c index 9527dd46e4dc..e2515cf9190f 100644 --- a/net/core/netdev-genl.c +++ b/net/core/netdev-genl.c @@ -52,6 +52,8 @@ XDP_METADATA_KFUNC_xxx xsk_features |= NETDEV_XSK_FLAGS_TX_TIMESTAMP; if (netdev->xsk_tx_metadata_ops->tmo_request_checksum) xsk_features |= NETDEV_XSK_FLAGS_TX_CHECKSUM; + if (netdev->xsk_tx_metadata_ops->tmo_request_launch_time) + xsk_features |= NETDEV_XSK_FLAGS_LAUNCH_TIME; }
if (nla_put_u32(rsp, NETDEV_A_DEV_IFINDEX, netdev->ifindex) || diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 3fa70286c846..8feaa0e86f07 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -743,6 +743,9 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, goto free_err; } } + + if (meta->flags & XDP_TXMD_FLAGS_LAUNCH_TIME) + skb->skb_mstamp_ns = meta->request.launch_time; } }
diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h index 2f082b01ff22..67719f8966c2 100644 --- a/tools/include/uapi/linux/if_xdp.h +++ b/tools/include/uapi/linux/if_xdp.h @@ -127,6 +127,12 @@ struct xdp_options { */ #define XDP_TXMD_FLAGS_CHECKSUM (1 << 1)
+/* Request launch time hardware offload. The device will schedule the packet for + * transmission at a pre-determined time called launch time. The value of + * launch time is communicated via launch_time field of struct xsk_tx_metadata. + */ +#define XDP_TXMD_FLAGS_LAUNCH_TIME (1 << 2) + /* AF_XDP offloads request. 'request' union member is consumed by the driver * when the packet is being transmitted. 'completion' union member is * filled by the driver when the transmit completion arrives. @@ -142,6 +148,10 @@ struct xsk_tx_metadata { __u16 csum_start; /* Offset from csum_start where checksum should be stored. */ __u16 csum_offset; + + /* XDP_TXMD_FLAGS_LAUNCH_TIME */ + /* Launch time in nanosecond against the PTP HW Clock */ + __u64 launch_time; } request;
struct { diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h index e4be227d3ad6..5ab85f4af009 100644 --- a/tools/include/uapi/linux/netdev.h +++ b/tools/include/uapi/linux/netdev.h @@ -59,10 +59,13 @@ enum netdev_xdp_rx_metadata { * by the driver. * @NETDEV_XSK_FLAGS_TX_CHECKSUM: L3 checksum HW offload is supported by the * driver. + * @NETDEV_XSK_FLAGS_LAUNCH_TIME: Launch Time HW offload is supported by the + * driver. */ enum netdev_xsk_flags { NETDEV_XSK_FLAGS_TX_TIMESTAMP = 1, NETDEV_XSK_FLAGS_TX_CHECKSUM = 2, + NETDEV_XSK_FLAGS_LAUNCH_TIME = 4, };
enum netdev_queue_type {
Add launch time hardware offload request to xdp_hw_metadata. Users can configure the delta of launch time relative to HW RX-time using the "-l" argument. By default, the delta is set to 0 ns, which means the launch time is disabled. By setting the delta to a non-zero value, the launch time hardware offload feature will be enabled and requested. Additionally, users can configure the Tx Queue to be enabled with the launch time hardware offload using the "-L" argument. By default, Tx Queue 0 will be used.
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
---
 tools/testing/selftests/bpf/xdp_hw_metadata.c | 168 +++++++++++++++++-
 1 file changed, 163 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c index 06266aad2f99..706eecabf278 100644 --- a/tools/testing/selftests/bpf/xdp_hw_metadata.c +++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c @@ -13,6 +13,7 @@ * - UDP 9091 packets trigger TX reply * - TX HW timestamp is requested and reported back upon completion * - TX checksum is requested + * - TX launch time HW offload is requested for transmission */
#include <test_progs.h> @@ -37,6 +38,15 @@ #include <time.h> #include <unistd.h> #include <libgen.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <sys/ioctl.h> +#include <linux/pkt_sched.h> +#include <linux/pkt_cls.h> +#include <linux/ethtool.h> +#include <sys/socket.h> +#include <arpa/inet.h>
#include "xdp_metadata.h"
@@ -64,6 +74,18 @@ int rxq; bool skip_tx; __u64 last_hw_rx_timestamp; __u64 last_xdp_rx_timestamp; +__u64 last_launch_time; +__u64 launch_time_delta_to_hw_rx_timestamp; +int launch_time_queue; + +#define run_command(cmd, ...) \ +({ \ + char command[1024]; \ + memset(command, 0, sizeof(command)); \ + snprintf(command, sizeof(command), cmd, ##__VA_ARGS__); \ + fprintf(stderr, "Running: %s\n", command); \ + system(command); \ +})
void test__fail(void) { /* for network_helpers.c */ }
@@ -298,6 +320,12 @@ static bool complete_tx(struct xsk *xsk, clockid_t clock_id) if (meta->completion.tx_timestamp) { __u64 ref_tstamp = gettime(clock_id);
+ if (launch_time_delta_to_hw_rx_timestamp) { + print_tstamp_delta("HW Launch-time", + "HW TX-complete-time", + last_launch_time, + meta->completion.tx_timestamp); + } print_tstamp_delta("HW TX-complete-time", "User TX-complete-time", meta->completion.tx_timestamp, ref_tstamp); print_tstamp_delta("XDP RX-time", "User TX-complete-time", @@ -395,6 +423,17 @@ static void ping_pong(struct xsk *xsk, void *rx_packet, clockid_t clock_id) xsk, ntohs(udph->check), ntohs(want_csum), meta->request.csum_start, meta->request.csum_offset);
+ /* Set the value of launch time */ + if (launch_time_delta_to_hw_rx_timestamp) { + meta->flags |= XDP_TXMD_FLAGS_LAUNCH_TIME; + meta->request.launch_time = last_hw_rx_timestamp + + launch_time_delta_to_hw_rx_timestamp; + last_launch_time = meta->request.launch_time; + print_tstamp_delta("HW RX-time", "HW Launch-time", + last_hw_rx_timestamp, + meta->request.launch_time); + } + memcpy(data, rx_packet, len); /* don't share umem chunk for simplicity */ tx_desc->options |= XDP_TX_METADATA; tx_desc->len = len; @@ -407,6 +446,7 @@ static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd, clockid_t const struct xdp_desc *rx_desc; struct pollfd fds[rxq + 1]; __u64 comp_addr; + __u64 deadline; __u64 addr; __u32 idx = 0; int ret; @@ -477,9 +517,15 @@ static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd, clockid_t if (ret) printf("kick_tx ret=%d\n", ret);
- for (int j = 0; j < 500; j++) { + /* wait 1 second + cover launch time */ + deadline = gettime(clock_id) + + NANOSEC_PER_SEC + + launch_time_delta_to_hw_rx_timestamp; + while (true) { if (complete_tx(xsk, clock_id)) break; + if (gettime(clock_id) >= deadline) + break; usleep(10); } } @@ -607,6 +653,10 @@ static void print_usage(void) " -h Display this help and exit\n\n" " -m Enable multi-buffer XDP for larger MTU\n" " -r Don't generate AF_XDP reply (rx metadata only)\n" + " -l Delta of launch time relative to HW RX-time in ns\n" + " default: 0 ns (launch time request is disabled)\n" + " -L Tx Queue to be enabled with launch time offload\n" + " default: 0 (Tx Queue 0)\n" "Generate test packets on the other machine with:\n" " echo -n xdp | nc -u -q1 <dst_ip> 9091\n";
@@ -617,7 +667,7 @@ static void read_args(int argc, char *argv[]) { int opt;
- while ((opt = getopt(argc, argv, "chmr")) != -1) { + while ((opt = getopt(argc, argv, "chmrl:L:")) != -1) { switch (opt) { case 'c': bind_flags &= ~XDP_USE_NEED_WAKEUP; @@ -633,6 +683,12 @@ static void read_args(int argc, char *argv[]) case 'r': skip_tx = true; break; + case 'l': + launch_time_delta_to_hw_rx_timestamp = atoll(optarg); + break; + case 'L': + launch_time_queue = atoll(optarg); + break; case '?': if (isprint(optopt)) fprintf(stderr, "Unknown option: -%c\n", optopt); @@ -656,23 +712,118 @@ static void read_args(int argc, char *argv[]) error(-1, errno, "Invalid interface name"); }
+void clean_existing_configurations(void) +{ + /* Check and delete root qdisc if exists */ + if (run_command("sudo tc qdisc show dev %s | grep -q 'qdisc mqprio 8001:'", ifname) == 0) + run_command("sudo tc qdisc del dev %s root", ifname); + + /* Check and delete ingress qdisc if exists */ + if (run_command("sudo tc qdisc show dev %s | grep -q 'qdisc ingress ffff:'", ifname) == 0) + run_command("sudo tc qdisc del dev %s ingress", ifname); + + /* Check and delete ethtool filters if any exist */ + if (run_command("sudo ethtool -n %s | grep -q 'Filter:'", ifname) == 0) { + run_command("sudo ethtool -n %s | grep 'Filter:' | awk '{print $2}' | xargs -n1 sudo ethtool -N %s delete >&2", + ifname, ifname); + } +} + +#define MAX_TC 16 + int main(int argc, char *argv[]) { clockid_t clock_id = CLOCK_TAI; + struct bpf_program *prog; int server_fd = -1; + size_t map_len = 0; + size_t que_len = 0; + char *buf = NULL; + char *map = NULL; + char *que = NULL; + char *tmp = NULL; + int tc = 0; int ret; int i;
- struct bpf_program *prog; - read_args(argc, argv);
rxq = rxq_num(ifname); - printf("rxq: %d\n", rxq);
+ if (launch_time_queue >= rxq || launch_time_queue < 0) + error(1, 0, "Invalid launch_time_queue."); + + clean_existing_configurations(); + sleep(1); + + /* Enable tx and rx hardware timestamping */ hwtstamp_enable(ifname);
+ /* Prepare priority to traffic class map for tc-mqprio */ + for (i = 0; i < MAX_TC; i++) { + if (i < rxq) + tc = i; + + if (asprintf(&buf, "%d ", tc) == -1) { + printf("Failed to malloc buf for tc map.\n"); + goto free_mem; + } + + map_len += strlen(buf); + tmp = realloc(map, map_len + 1); + if (!tmp) { + printf("Failed to realloc tc map.\n"); + goto free_mem; + } + map = tmp; + strcat(map, buf); + free(buf); + buf = NULL; + } + + /* Prepare traffic class to hardware queue map for tc-mqprio */ + for (i = 0; i <= tc; i++) { + if (asprintf(&buf, "1@%d ", i) == -1) { + printf("Failed to malloc buf for tc queues.\n"); + goto free_mem; + } + + que_len += strlen(buf); + tmp = realloc(que, que_len + 1); + if (!tmp) { + printf("Failed to realloc tc queues.\n"); + goto free_mem; + } + que = tmp; + strcat(que, buf); + free(buf); + buf = NULL; + } + + /* Add mqprio qdisc */ + run_command("sudo tc qdisc add dev %s handle 8001: parent root mqprio num_tc %d map %squeues %shw 0", + ifname, tc + 1, map, que); + + /* To test launch time, send UDP packet with VLAN priority 1 to port 9091 */ + if (launch_time_delta_to_hw_rx_timestamp) { + /* Enable launch time hardware offload on launch_time_queue */ + run_command("sudo tc qdisc replace dev %s parent 8001:%d etf offload clockid CLOCK_TAI delta 500000", + ifname, launch_time_queue + 1); + sleep(1); + + /* Route incoming packet with VLAN priority 1 into launch_time_queue */ + if (run_command("sudo ethtool -N %s flow-type ether vlan 0x2000 vlan-mask 0x1FFF action %d", + ifname, launch_time_queue)) { + run_command("sudo tc qdisc add dev %s ingress", ifname); + run_command("sudo tc filter add dev %s parent ffff: protocol 802.1Q flower vlan_prio 1 hw_tc %d", + ifname, launch_time_queue); + } + + /* Enable VLAN tag stripping offload */ + run_command("sudo ethtool -K %s rxvlan on", ifname); + } + rx_xsk = malloc(sizeof(struct xsk) * rxq); if (!rx_xsk) error(1, ENOMEM, "malloc"); @@ -732,4 +883,11 @@ int main(int argc, char *argv[]) 
cleanup(); if (ret) error(1, -ret, "verify_metadata"); + + clean_existing_configurations(); + +free_mem: + free(buf); + free(map); + free(que); }
On 01/16, Song Yoong Siang wrote:
> Add launch time hardware offload request to xdp_hw_metadata. Users can configure the delta of launch time relative to HW RX-time using the "-l" argument. By default, the delta is set to 0 ns, which means the launch time is disabled. By setting the delta to a non-zero value, the launch time hardware offload feature will be enabled and requested. Additionally, users can configure the Tx Queue to be enabled with the launch time hardware offload using the "-L" argument. By default, Tx Queue 0 will be used.
>
> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>

Forgot to add:

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Enable launch time (Time-Based Scheduling) support for XDP zero copy via the XDP Tx metadata framework.
This patch was tested with tools/testing/selftests/bpf/xdp_hw_metadata on an Intel Tiger Lake platform. Below are the test steps and result.
Test Steps:
1. At DUT, start xdp_hw_metadata selftest application:
   $ sudo ./xdp_hw_metadata enp0s30f4 -l 1000000000 -L 1
2. At Link Partner, send a UDP packet with VLAN priority 1 to port 9091 of DUT.
When the launch time is set to 1s in the future, the delta between the launch time and the transmit hardware timestamp is 16.963us, as shown in the result below:
0x55b5864717a8: rx_desc[4]->addr=88100 addr=88100 comp_addr=88100 EoP
No rx_hash, err=-95
HW RX-time: 1734579065767717328 (sec:1734579065.7677) delta to User RX-time sec:0.0004 (375.624 usec)
XDP RX-time: 1734579065768004454 (sec:1734579065.7680) delta to User RX-time sec:0.0001 (88.498 usec)
No rx_vlan_tci or rx_vlan_proto, err=-95
0x55b5864717a8: ping-pong with csum=5619 (want 0000) csum_start=34 csum_offset=6
HW RX-time: 1734579065767717328 (sec:1734579065.7677) delta to HW Launch-time sec:1.0000 (1000000.000 usec)
0x55b5864717a8: complete tx idx=4 addr=4018
HW Launch-time: 1734579066767717328 (sec:1734579066.7677) delta to HW TX-complete-time sec:0.0000 (16.963 usec)
HW TX-complete-time: 1734579066767734291 (sec:1734579066.7677) delta to User TX-complete-time sec:0.0001 (130.408 usec)
XDP RX-time: 1734579065768004454 (sec:1734579065.7680) delta to User TX-complete-time sec:0.9999 (999860.245 usec)
HW RX-time: 1734579065767717328 (sec:1734579065.7677) delta to HW TX-complete-time sec:1.0000 (1000016.963 usec)
0x55b5864717a8: complete rx idx=132 addr=88100
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 ++
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 13 +++++++++++++
 2 files changed, 15 insertions(+)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h index 1d86439b8a14..c80462d42989 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h @@ -106,6 +106,8 @@ struct stmmac_metadata_request { struct stmmac_priv *priv; struct dma_desc *tx_desc; bool *set_ic; + struct dma_edesc *edesc; + int tbs; };
struct stmmac_xsk_tx_complete { diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index c81ea8cdfe6e..3a083e3684ed 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -2445,9 +2445,20 @@ static u64 stmmac_xsk_fill_timestamp(void *_priv) return 0; }
+static void stmmac_xsk_request_launch_time(u64 launch_time, void *_priv) +{ + struct stmmac_metadata_request *meta_req = _priv; + struct timespec64 ts = ns_to_timespec64(launch_time); + + if (meta_req->tbs & STMMAC_TBS_EN) + stmmac_set_desc_tbs(meta_req->priv, meta_req->edesc, ts.tv_sec, + ts.tv_nsec); +} + static const struct xsk_tx_metadata_ops stmmac_xsk_tx_metadata_ops = { .tmo_request_timestamp = stmmac_xsk_request_timestamp, .tmo_fill_timestamp = stmmac_xsk_fill_timestamp, + .tmo_request_launch_time = stmmac_xsk_request_launch_time, };
static bool stmmac_xdp_xmit_zc(struct stmmac_priv *priv, u32 queue, u32 budget) @@ -2531,6 +2542,8 @@ static bool stmmac_xdp_xmit_zc(struct stmmac_priv *priv, u32 queue, u32 budget) meta_req.priv = priv; meta_req.tx_desc = tx_desc; meta_req.set_ic = &set_ic; + meta_req.tbs = tx_q->tbs; + meta_req.edesc = &tx_q->dma_entx[entry]; xsk_tx_metadata_request(meta, &stmmac_xsk_tx_metadata_ops, &meta_req); if (set_ic) {
Enable Launch Time Control (LTC) support for XDP zero copy via the XDP Tx metadata framework.
This patch was tested with tools/testing/selftests/bpf/xdp_hw_metadata on an Intel I225-LM Ethernet controller. Below are the test steps and result.
Test Steps:
1. At DUT, start xdp_hw_metadata selftest application:
   $ sudo ./xdp_hw_metadata enp2s0 -l 1000000000 -L 1
2. At Link Partner, send a UDP packet with VLAN priority 1 to port 9091 of DUT.
When the launch time is set to 1s in the future, the delta between the launch time and the transmit hardware timestamp is 0.016us, as shown in the result below:
0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP
rx_hash: 0xE343384 with RSS type:0x1
HW RX-time: 1734578015467548904 (sec:1734578015.4675) delta to User RX-time sec:0.0002 (183.103 usec)
XDP RX-time: 1734578015467651698 (sec:1734578015.4677) delta to User RX-time sec:0.0001 (80.309 usec)
No rx_vlan_tci or rx_vlan_proto, err=-95
0x562ff5dc8880: ping-pong with csum=561c (want c7dd) csum_start=34 csum_offset=6
HW RX-time: 1734578015467548904 (sec:1734578015.4675) delta to HW Launch-time sec:1.0000 (1000000.000 usec)
0x562ff5dc8880: complete tx idx=4 addr=4018
HW Launch-time: 1734578016467548904 (sec:1734578016.4675) delta to HW TX-complete-time sec:0.0000 (0.016 usec)
HW TX-complete-time: 1734578016467548920 (sec:1734578016.4675) delta to User TX-complete-time sec:0.0000 (32.546 usec)
XDP RX-time: 1734578015467651698 (sec:1734578015.4677) delta to User TX-complete-time sec:0.9999 (999929.768 usec)
HW RX-time: 1734578015467548904 (sec:1734578015.4675) delta to HW TX-complete-time sec:1.0000 (1000000.016 usec)
0x562ff5dc8880: complete rx idx=132 addr=84110
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
---
 drivers/net/ethernet/intel/igc/igc_main.c | 78 ++++++++++++++++-------
 1 file changed, 56 insertions(+), 22 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 27872bdea9bd..6857f5f5b4b2 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -1566,6 +1566,26 @@ static bool igc_request_tx_tstamp(struct igc_adapter *adapter, struct sk_buff *s
 	return false;
 }
 
+static void igc_insert_empty_packet(struct igc_ring *tx_ring)
+{
+	struct igc_tx_buffer *empty_info;
+	struct sk_buff *empty;
+	void *data;
+
+	empty_info = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
+	empty = alloc_skb(IGC_EMPTY_FRAME_SIZE, GFP_ATOMIC);
+	if (!empty)
+		return;
+
+	data = skb_put(empty, IGC_EMPTY_FRAME_SIZE);
+	memset(data, 0, IGC_EMPTY_FRAME_SIZE);
+
+	igc_tx_ctxtdesc(tx_ring, 0, false, 0, 0, 0);
+
+	if (igc_init_tx_empty_descriptor(tx_ring, empty, empty_info) < 0)
+		dev_kfree_skb_any(empty);
+}
+
 static netdev_tx_t igc_xmit_frame_ring(struct sk_buff *skb,
 				       struct igc_ring *tx_ring)
 {
@@ -1603,26 +1623,8 @@ static netdev_tx_t igc_xmit_frame_ring(struct sk_buff *skb,
 	skb->tstamp = ktime_set(0, 0);
 	launch_time = igc_tx_launchtime(tx_ring, txtime, &first_flag, &insert_empty);
 
-	if (insert_empty) {
-		struct igc_tx_buffer *empty_info;
-		struct sk_buff *empty;
-		void *data;
-
-		empty_info = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
-		empty = alloc_skb(IGC_EMPTY_FRAME_SIZE, GFP_ATOMIC);
-		if (!empty)
-			goto done;
-
-		data = skb_put(empty, IGC_EMPTY_FRAME_SIZE);
-		memset(data, 0, IGC_EMPTY_FRAME_SIZE);
-
-		igc_tx_ctxtdesc(tx_ring, 0, false, 0, 0, 0);
-
-		if (igc_init_tx_empty_descriptor(tx_ring,
-						 empty,
-						 empty_info) < 0)
-			dev_kfree_skb_any(empty);
-	}
+	if (insert_empty)
+		igc_insert_empty_packet(tx_ring);
 
 done:
 	/* record the location of the first descriptor for this packet */
@@ -2955,9 +2957,33 @@ static u64 igc_xsk_fill_timestamp(void *_priv)
 	return *(u64 *)_priv;
 }
 
+static void igc_xsk_request_launch_time(u64 launch_time, void *_priv)
+{
+	struct igc_metadata_request *meta_req = _priv;
+	struct igc_ring *tx_ring = meta_req->tx_ring;
+	__le32 launch_time_offset;
+	bool insert_empty = false;
+	bool first_flag = false;
+
+	if (!tx_ring->launchtime_enable)
+		return;
+
+	launch_time_offset = igc_tx_launchtime(tx_ring,
+					       ns_to_ktime(launch_time),
+					       &first_flag, &insert_empty);
+	if (insert_empty) {
+		igc_insert_empty_packet(tx_ring);
+		meta_req->tx_buffer =
+			&tx_ring->tx_buffer_info[tx_ring->next_to_use];
+	}
+
+	igc_tx_ctxtdesc(tx_ring, launch_time_offset, first_flag, 0, 0, 0);
+}
+
 const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops = {
 	.tmo_request_timestamp		= igc_xsk_request_timestamp,
 	.tmo_fill_timestamp		= igc_xsk_fill_timestamp,
+	.tmo_request_launch_time	= igc_xsk_request_launch_time,
 };
 
 static void igc_xdp_xmit_zc(struct igc_ring *ring)
@@ -2980,7 +3006,7 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
 	ntu = ring->next_to_use;
 	budget = igc_desc_unused(ring);
 
-	while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) {
+	while (xsk_tx_peek_desc(pool, &xdp_desc) && budget >= 4) {
 		struct igc_metadata_request meta_req;
 		struct xsk_tx_metadata *meta = NULL;
 		struct igc_tx_buffer *bi;
@@ -3004,6 +3030,12 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
 		xsk_tx_metadata_request(meta, &igc_xsk_tx_metadata_ops,
 					&meta_req);
 
+		/* xsk_tx_metadata_request() may have updated next_to_use */
+		ntu = ring->next_to_use;
+
+		/* xsk_tx_metadata_request() may have updated Tx buffer info */
+		bi = meta_req.tx_buffer;
+
 		tx_desc = IGC_TX_DESC(ring, ntu);
 		tx_desc->read.cmd_type_len = cpu_to_le32(meta_req.cmd_type);
 		tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
@@ -3021,9 +3053,11 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
 		ntu++;
 		if (ntu == ring->count)
 			ntu = 0;
+
+		ring->next_to_use = ntu;
+		budget = igc_desc_unused(ring);
 	}
 
-	ring->next_to_use = ntu;
 	if (tx_desc) {
 		igc_flush_tx_descriptors(ring);
 		xsk_tx_release(pool);
Hi Siang.
On 16/1/2025 11:53 pm, Song Yoong Siang wrote:
> Enable Launch Time Control (LTC) support to XDP zero copy via XDP Tx metadata framework.
>
> This patch is tested with tools/testing/selftests/bpf/xdp_hw_metadata on Intel I225-LM Ethernet controller. Below are the test steps and result.
>
> Test Steps:
>
> At DUT, start xdp_hw_metadata selftest application:
> $ sudo ./xdp_hw_metadata enp2s0 -l 1000000000 -L 1
>
> At Link Partner, send an UDP packet with VLAN priority 1 to port 9091 of DUT.
>
> When launch time is set to 1s in the future, the delta between launch time and transmit hardware timestamp is equal to 0.016us, as shown in result below:
>
> 0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP
> rx_hash: 0xE343384 with RSS type:0x1
> HW RX-time: 1734578015467548904 (sec:1734578015.4675) delta to User RX-time sec:0.0002 (183.103 usec)
> XDP RX-time: 1734578015467651698 (sec:1734578015.4677) delta to User RX-time sec:0.0001 (80.309 usec)
> No rx_vlan_tci or rx_vlan_proto, err=-95
> 0x562ff5dc8880: ping-pong with csum=561c (want c7dd) csum_start=34 csum_offset=6
> HW RX-time: 1734578015467548904 (sec:1734578015.4675) delta to HW Launch-time sec:1.0000 (1000000.000 usec)
> 0x562ff5dc8880: complete tx idx=4 addr=4018
> HW Launch-time: 1734578016467548904 (sec:1734578016.4675) delta to HW TX-complete-time sec:0.0000 (0.016 usec)
> HW TX-complete-time: 1734578016467548920 (sec:1734578016.4675) delta to User TX-complete-time sec:0.0000 (32.546 usec)
> XDP RX-time: 1734578015467651698 (sec:1734578015.4677) delta to User TX-complete-time sec:0.9999 (999929.768 usec)
> HW RX-time: 1734578015467548904 (sec:1734578015.4675) delta to HW TX-complete-time sec:1.0000 (1000000.016 usec)
> 0x562ff5dc8880: complete rx idx=132 addr=84110

To be cautious, could we perform a stress test by sending a higher number of packets with launch time? For example, we could send 200 packets, each configured with a launch time, and verify that the driver continues to function correctly afterward.

> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
> ---
>  drivers/net/ethernet/intel/igc/igc_main.c | 78 ++++++++++++++++-------
>  1 file changed, 56 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> index 27872bdea9bd..6857f5f5b4b2 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -1566,6 +1566,26 @@ static bool igc_request_tx_tstamp(struct igc_adapter *adapter, struct sk_buff *s
>  	return false;
>  }
>
> +static void igc_insert_empty_packet(struct igc_ring *tx_ring)
> +{
> +	struct igc_tx_buffer *empty_info;
> +	struct sk_buff *empty;
> +	void *data;
> +
> +	empty_info = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
> +	empty = alloc_skb(IGC_EMPTY_FRAME_SIZE, GFP_ATOMIC);
> +	if (!empty)
> +		return;
> +
> +	data = skb_put(empty, IGC_EMPTY_FRAME_SIZE);
> +	memset(data, 0, IGC_EMPTY_FRAME_SIZE);
> +
> +	igc_tx_ctxtdesc(tx_ring, 0, false, 0, 0, 0);
> +
> +	if (igc_init_tx_empty_descriptor(tx_ring, empty, empty_info) < 0)
> +		dev_kfree_skb_any(empty);
> +}

The function igc_insert_empty_packet() appears to wrap existing code to enhance reusability, with no new changes related to enabling launch-time XDP ZC functionality. If so, could we split this into a separate commit? This would make it clearer for the reader to distinguish between the refactoring changes and the new changes related to enabling launch-time XDP ZC support.

>  static netdev_tx_t igc_xmit_frame_ring(struct sk_buff *skb,
>  				       struct igc_ring *tx_ring)
>  {
> @@ -1603,26 +1623,8 @@ static netdev_tx_t igc_xmit_frame_ring(struct sk_buff *skb,
>  	skb->tstamp = ktime_set(0, 0);
>  	launch_time = igc_tx_launchtime(tx_ring, txtime, &first_flag, &insert_empty);
>
> -	if (insert_empty) {
> -		struct igc_tx_buffer *empty_info;
> -		struct sk_buff *empty;
> -		void *data;
> -
> -		empty_info = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
> -		empty = alloc_skb(IGC_EMPTY_FRAME_SIZE, GFP_ATOMIC);
> -		if (!empty)
> -			goto done;
> -
> -		data = skb_put(empty, IGC_EMPTY_FRAME_SIZE);
> -		memset(data, 0, IGC_EMPTY_FRAME_SIZE);
> -
> -		igc_tx_ctxtdesc(tx_ring, 0, false, 0, 0, 0);
> -
> -		if (igc_init_tx_empty_descriptor(tx_ring,
> -						 empty,
> -						 empty_info) < 0)
> -			dev_kfree_skb_any(empty);
> -	}
> +	if (insert_empty)
> +		igc_insert_empty_packet(tx_ring);
>
>  done:
>  	/* record the location of the first descriptor for this packet */
> @@ -2955,9 +2957,33 @@ static u64 igc_xsk_fill_timestamp(void *_priv)
>  	return *(u64 *)_priv;
>  }
>
> +static void igc_xsk_request_launch_time(u64 launch_time, void *_priv)
> +{
> +	struct igc_metadata_request *meta_req = _priv;
> +	struct igc_ring *tx_ring = meta_req->tx_ring;
> +	__le32 launch_time_offset;
> +	bool insert_empty = false;
> +	bool first_flag = false;
> +
> +	if (!tx_ring->launchtime_enable)
> +		return;
> +
> +	launch_time_offset = igc_tx_launchtime(tx_ring,
> +					       ns_to_ktime(launch_time),
> +					       &first_flag, &insert_empty);
> +	if (insert_empty) {
> +		igc_insert_empty_packet(tx_ring);
> +		meta_req->tx_buffer =
> +			&tx_ring->tx_buffer_info[tx_ring->next_to_use];
> +	}
> +
> +	igc_tx_ctxtdesc(tx_ring, launch_time_offset, first_flag, 0, 0, 0);
> +}
> +
>  const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops = {
>  	.tmo_request_timestamp		= igc_xsk_request_timestamp,
>  	.tmo_fill_timestamp		= igc_xsk_fill_timestamp,
> +	.tmo_request_launch_time	= igc_xsk_request_launch_time,
>  };
>
>  static void igc_xdp_xmit_zc(struct igc_ring *ring)
> @@ -2980,7 +3006,7 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
>  	ntu = ring->next_to_use;
>  	budget = igc_desc_unused(ring);
>
> -	while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) {
> +	while (xsk_tx_peek_desc(pool, &xdp_desc) && budget >= 4) {

Could we add some explanation on what & why the value "4" is used?
On 20/1/2025 2:25 pm, Abdul Rahim, Faizal wrote:
> To be cautious, could we perform a stress test by sending a higher number of packets with launch time? For example, we could send 200 packets, each configured with a launch time, and verify that the driver continues to function correctly afterward.
I agree on this point. Could you perform the same stress test on the STMMAC driver as well?
On Monday, January 20, 2025 3:25 PM, Choong Yong Liang <yong.liang.choong@linux.intel.com> wrote:
> On 20/1/2025 2:25 pm, Abdul Rahim, Faizal wrote:
>> To be cautious, could we perform a stress test by sending a higher number of packets with launch time? For example, we could send 200 packets, each configured with a launch time, and verify that the driver continues to function correctly afterward.
>
> I agree on this point. Could you perform the same stress test on the STMMAC driver as well?
Hi Yong Liang,
Sure. I will perform the same tests on stmmac and share the results.
Thanks & Regards Siang
On Monday, January 20, 2025 2:26 PM, Abdul Rahim, Faizal <faizal.abdul.rahim@linux.intel.com> wrote:
> Hi Siang.
>
> On 16/1/2025 11:53 pm, Song Yoong Siang wrote:
>> Enable Launch Time Control (LTC) support to XDP zero copy via XDP Tx metadata framework.
>>
>> This patch is tested with tools/testing/selftests/bpf/xdp_hw_metadata on Intel I225-LM Ethernet controller. Below are the test steps and result.
>>
>> Test Steps:
>>
>> At DUT, start xdp_hw_metadata selftest application:
>> $ sudo ./xdp_hw_metadata enp2s0 -l 1000000000 -L 1
>>
>> At Link Partner, send an UDP packet with VLAN priority 1 to port 9091 of DUT.
>>
>> When launch time is set to 1s in the future, the delta between launch time and transmit hardware timestamp is equal to 0.016us, as shown in result below:
>>
>> 0x562ff5dc8880: rx_desc[4]->addr=84110 addr=84110 comp_addr=84110 EoP
>> rx_hash: 0xE343384 with RSS type:0x1
>> HW RX-time: 1734578015467548904 (sec:1734578015.4675) delta to User RX-time sec:0.0002 (183.103 usec)
>> XDP RX-time: 1734578015467651698 (sec:1734578015.4677) delta to User RX-time sec:0.0001 (80.309 usec)
>> No rx_vlan_tci or rx_vlan_proto, err=-95
>> 0x562ff5dc8880: ping-pong with csum=561c (want c7dd) csum_start=34 csum_offset=6
>> HW RX-time: 1734578015467548904 (sec:1734578015.4675) delta to HW Launch-time sec:1.0000 (1000000.000 usec)
>> 0x562ff5dc8880: complete tx idx=4 addr=4018
>> HW Launch-time: 1734578016467548904 (sec:1734578016.4675) delta to HW TX-complete-time sec:0.0000 (0.016 usec)
>> HW TX-complete-time: 1734578016467548920 (sec:1734578016.4675) delta to User TX-complete-time sec:0.0000 (32.546 usec)
>> XDP RX-time: 1734578015467651698 (sec:1734578015.4677) delta to User TX-complete-time sec:0.9999 (999929.768 usec)
>> HW RX-time: 1734578015467548904 (sec:1734578015.4675) delta to HW TX-complete-time sec:1.0000 (1000000.016 usec)
>> 0x562ff5dc8880: complete rx idx=132 addr=84110
>
> To be cautious, could we perform a stress test by sending a higher number of packets with launch time? For example, we could send 200 packets, each configured with a launch time, and verify that the driver continues to function correctly afterward.

Hi Faizal,

Thanks for your review comments. Sure, I can send packets continuously at a short interval and share the results in the commit message.

>> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
>> ---
>>  drivers/net/ethernet/intel/igc/igc_main.c | 78 ++++++++++++++++-------
>>  1 file changed, 56 insertions(+), 22 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
>> index 27872bdea9bd..6857f5f5b4b2 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_main.c
>> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
>> @@ -1566,6 +1566,26 @@ static bool igc_request_tx_tstamp(struct igc_adapter *adapter, struct sk_buff *s
>>  	return false;
>>  }
>>
>> +static void igc_insert_empty_packet(struct igc_ring *tx_ring)
>> +{
>> +	struct igc_tx_buffer *empty_info;
>> +	struct sk_buff *empty;
>> +	void *data;
>> +
>> +	empty_info = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
>> +	empty = alloc_skb(IGC_EMPTY_FRAME_SIZE, GFP_ATOMIC);
>> +	if (!empty)
>> +		return;
>> +
>> +	data = skb_put(empty, IGC_EMPTY_FRAME_SIZE);
>> +	memset(data, 0, IGC_EMPTY_FRAME_SIZE);
>> +
>> +	igc_tx_ctxtdesc(tx_ring, 0, false, 0, 0, 0);
>> +
>> +	if (igc_init_tx_empty_descriptor(tx_ring, empty, empty_info) < 0)
>> +		dev_kfree_skb_any(empty);
>> +}
>
> The function igc_insert_empty_packet() appears to wrap existing code to enhance reusability, with no new changes related to enabling launch-time XDP ZC functionality. If so, could we split this into a separate commit? This would make it clearer for the reader to distinguish between the refactoring changes and the new changes related to enabling launch-time XDP ZC support.

I am OK with splitting the patch into two. I will do it in the next version.

>>  static netdev_tx_t igc_xmit_frame_ring(struct sk_buff *skb,
>>  				       struct igc_ring *tx_ring)
>>  {
>> @@ -1603,26 +1623,8 @@ static netdev_tx_t igc_xmit_frame_ring(struct sk_buff *skb,
>>  	skb->tstamp = ktime_set(0, 0);
>>  	launch_time = igc_tx_launchtime(tx_ring, txtime, &first_flag, &insert_empty);
>>
>> -	if (insert_empty) {
>> -		struct igc_tx_buffer *empty_info;
>> -		struct sk_buff *empty;
>> -		void *data;
>> -
>> -		empty_info = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
>> -		empty = alloc_skb(IGC_EMPTY_FRAME_SIZE, GFP_ATOMIC);
>> -		if (!empty)
>> -			goto done;
>> -
>> -		data = skb_put(empty, IGC_EMPTY_FRAME_SIZE);
>> -		memset(data, 0, IGC_EMPTY_FRAME_SIZE);
>> -
>> -		igc_tx_ctxtdesc(tx_ring, 0, false, 0, 0, 0);
>> -
>> -		if (igc_init_tx_empty_descriptor(tx_ring,
>> -						 empty,
>> -						 empty_info) < 0)
>> -			dev_kfree_skb_any(empty);
>> -	}
>> +	if (insert_empty)
>> +		igc_insert_empty_packet(tx_ring);
>>
>>  done:
>>  	/* record the location of the first descriptor for this packet */
>> @@ -2955,9 +2957,33 @@ static u64 igc_xsk_fill_timestamp(void *_priv)
>>  	return *(u64 *)_priv;
>>  }
>>
>> +static void igc_xsk_request_launch_time(u64 launch_time, void *_priv)
>> +{
>> +	struct igc_metadata_request *meta_req = _priv;
>> +	struct igc_ring *tx_ring = meta_req->tx_ring;
>> +	__le32 launch_time_offset;
>> +	bool insert_empty = false;
>> +	bool first_flag = false;
>> +
>> +	if (!tx_ring->launchtime_enable)
>> +		return;
>> +
>> +	launch_time_offset = igc_tx_launchtime(tx_ring,
>> +					       ns_to_ktime(launch_time),
>> +					       &first_flag, &insert_empty);
>> +	if (insert_empty) {
>> +		igc_insert_empty_packet(tx_ring);
>> +		meta_req->tx_buffer =
>> +			&tx_ring->tx_buffer_info[tx_ring->next_to_use];
>> +	}
>> +
>> +	igc_tx_ctxtdesc(tx_ring, launch_time_offset, first_flag, 0, 0, 0);
>> +}
>> +
>>  const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops = {
>>  	.tmo_request_timestamp		= igc_xsk_request_timestamp,
>>  	.tmo_fill_timestamp		= igc_xsk_fill_timestamp,
>> +	.tmo_request_launch_time	= igc_xsk_request_launch_time,
>>  };
>>
>>  static void igc_xdp_xmit_zc(struct igc_ring *ring)
>> @@ -2980,7 +3006,7 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
>>  	ntu = ring->next_to_use;
>>  	budget = igc_desc_unused(ring);
>>
>> -	while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) {
>> +	while (xsk_tx_peek_desc(pool, &xdp_desc) && budget >= 4) {
>
> Could we add some explanation on what & why the value "4" is used?

This is because a packet with launch time needs 2 descriptors, and the same goes for the empty packet. Thus, 4 descriptors are needed in total. I will add a detailed explanation.

Thanks & Regards
Siang
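The descriptor accounting described in the reply above can be sketched as a small stand-alone model. This is not the igc driver code: the `toy_*` names and the ring structure are hypothetical, and the only facts taken from the thread are that a launch-time packet needs 2 descriptors (context plus data) and that an inserted empty packet needs another 2. The free-slot formula assumes the common ring convention of keeping one slot unused.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_DESC_PER_LAUNCH_PKT 2u /* context + data descriptor */
#define TOY_DESC_PER_EMPTY_PKT  2u /* context + data descriptor */
#define TOY_WORST_CASE_DESC (TOY_DESC_PER_LAUNCH_PKT + TOY_DESC_PER_EMPTY_PKT)

/* Hypothetical, simplified Tx ring: only the indices matter here. */
struct toy_ring {
	uint16_t count;         /* total descriptors in the ring */
	uint16_t next_to_use;   /* producer index */
	uint16_t next_to_clean; /* consumer index */
};

/* Free descriptors, keeping one slot unused so that a full ring can be
 * told apart from an empty one (assumed convention).
 */
static uint16_t toy_desc_unused(const struct toy_ring *r)
{
	uint16_t ntc = r->next_to_clean;
	uint16_t ntu = r->next_to_use;

	return (uint16_t)(((ntc > ntu) ? 0 : r->count) + ntc - ntu - 1);
}

/* Only start another frame if the worst case (empty packet followed by
 * the launch-time packet) still fits in the ring.
 */
static bool toy_can_send(const struct toy_ring *r)
{
	return toy_desc_unused(r) >= TOY_WORST_CASE_DESC;
}

/* Advance the producer index by n descriptors, wrapping at the ring end. */
static void toy_consume(struct toy_ring *r, uint16_t n)
{
	r->next_to_use = (uint16_t)((r->next_to_use + n) % r->count);
}
```

In this model an 8-entry ring starts with 7 usable descriptors; after one worst-case frame consumes 4, only 3 remain and the loop condition stops before the next frame could run out of descriptors halfway through.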
Hi Siang,
I tested this patch series on 6.13 with Intel I226-LM (rev 04).
I also applied patch "selftests/bpf: Actuate tx_metadata_len in xdp_hw_metadata" [1] and "selftests/bpf: Enable Tx hwtstamp in xdp_hw_metadata" [2] so that TX timestamps work.
HW RX-timestamp was small (0.5956 instead of 1737373125.5956):
HW RX-time: 595572448 (sec:0.5956) delta to User RX-time sec:1737373124.9873 (1737373124987318.750 usec)
XDP RX-time: 1737373125582798388 (sec:1737373125.5828) delta to User RX-time sec:0.0001 (92.733 usec)
Igc's raw HW RX-timestamp in front of frame data was overwritten by BPF program on line 90 in tools/testing/selftests/bpf: meta->hint_valid = 0;
"HW timestamp has been copied into local variable" comment is outdated on line 2813 in drivers/net/ethernet/intel/igc/igc_main.c after commit 069b142f5819 igc: Add support for PTP .getcyclesx64() [3].
Workaround is to add unused data to xdp_meta struct:
--- a/tools/testing/selftests/bpf/xdp_metadata.h
+++ b/tools/testing/selftests/bpf/xdp_metadata.h
@@ -49,4 +49,5 @@ struct xdp_meta {
 		__s32 rx_vlan_tag_err;
 	};
 	enum xdp_meta_field hint_valid;
+	__u8 avoid_IGC_TS_HDR_LEN[16];
 };
But Launch time still does not work:
HW Launch-time: 1737374407515922696 (sec:1737374407.5159) delta to HW TX-complete-time sec:-0.9999 (-999923.649 usec)
Command "sudo ethtool -X enp1s0 start 1 equal 1" was in v4 [4] but is not in v6. Was that intentional? After executing it Launch time feature works:
HW Launch-time: 1737374618088557111 (sec:1737374618.0886) delta to HW TX-complete-time sec:0.0000 (0.012 usec)
Thank you for XDP launch time support!
[1] https://lore.kernel.org/linux-kernel/20241205044258.3155799-1-yoong.siang.so...
[2] https://lore.kernel.org/linux-kernel/20241205051936.3156307-1-yoong.siang.so...
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
[4] https://lore.kernel.org/linux-kernel/20250106135724.9749-1-yoong.siang.song@...

Best regards,
Zdenek Bouska

--
Siemens, s.r.o
Foundational Technologies
-----Original Message-----
From: Song Yoong Siang <yoong.siang.song@intel.com>
Sent: Thursday, January 16, 2025 4:54 PM
To: David S . Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; Simon Horman <horms@kernel.org>; Willem de Bruijn <willemb@google.com>; Bezdeka, Florian (FT RPD CED OES-DE) <florian.bezdeka@siemens.com>; Donald Hunter <donald.hunter@gmail.com>; Jonathan Corbet <corbet@lwn.net>; Bjorn Topel <bjorn@kernel.org>; Magnus Karlsson <magnus.karlsson@intel.com>; Maciej Fijalkowski <maciej.fijalkowski@intel.com>; Jonathan Lemon <jonathan.lemon@gmail.com>; Andrew Lunn <andrew+netdev@lunn.ch>; Alexei Starovoitov <ast@kernel.org>; Daniel Borkmann <daniel@iogearbox.net>; Jesper Dangaard Brouer <hawk@kernel.org>; John Fastabend <john.fastabend@gmail.com>; Joe Damato <jdamato@fastly.com>; Stanislav Fomichev <sdf@fomichev.me>; Xuan Zhuo <xuanzhuo@linux.alibaba.com>; Mina Almasry <almasrymina@google.com>; Daniel Jurgens <danielj@nvidia.com>; Song Yoong Siang <yoong.siang.song@intel.com>; Andrii Nakryiko <andrii@kernel.org>; Eduard Zingerman <eddyz87@gmail.com>; Mykola Lysenko <mykolal@fb.com>; Martin KaFai Lau <martin.lau@linux.dev>; Song Liu <song@kernel.org>; Yonghong Song <yonghong.song@linux.dev>; KP Singh <kpsingh@kernel.org>; Hao Luo <haoluo@google.com>; Jiri Olsa <jolsa@kernel.org>; Shuah Khan <shuah@kernel.org>; Alexandre Torgue <alexandre.torgue@foss.st.com>; Jose Abreu <joabreu@synopsys.com>; Maxime Coquelin <mcoquelin.stm32@gmail.com>; Tony Nguyen <anthony.l.nguyen@intel.com>; Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; linux-doc@vger.kernel.org; bpf@vger.kernel.org; linux-kselftest@vger.kernel.org; linux-stm32@st-md-mailman.stormreply.com; linux-arm-kernel@lists.infradead.org; intel-wired-lan@lists.osuosl.org; xdp-hints@xdp-project.net
Subject: [PATCH bpf-next v6 4/4] igc: Add launch time support to XDP ZC
On Thursday, January 23, 2025 11:40 PM, Bouska, Zdenek <zdenek.bouska@siemens.com> wrote:
> Hi Siang,
>
> I tested this patch series on 6.13 with Intel I226-LM (rev 04).
>
> I also applied patch "selftests/bpf: Actuate tx_metadata_len in xdp_hw_metadata" [1] and "selftests/bpf: Enable Tx hwtstamp in xdp_hw_metadata" [2] so that TX timestamps work.
>
> HW RX-timestamp was small (0.5956 instead of 1737373125.5956):
>
> HW RX-time: 595572448 (sec:0.5956) delta to User RX-time sec:1737373124.9873 (1737373124987318.750 usec)
> XDP RX-time: 1737373125582798388 (sec:1737373125.5828) delta to User RX-time sec:0.0001 (92.733 usec)
>
> Igc's raw HW RX-timestamp in front of frame data was overwritten by BPF program on line 90 in tools/testing/selftests/bpf: meta->hint_valid = 0;
>
> "HW timestamp has been copied into local variable" comment is outdated on line 2813 in drivers/net/ethernet/intel/igc/igc_main.c after commit 069b142f5819 igc: Add support for PTP .getcyclesx64() [3].
>
> Workaround is to add unused data to xdp_meta struct:
>
> --- a/tools/testing/selftests/bpf/xdp_metadata.h
> +++ b/tools/testing/selftests/bpf/xdp_metadata.h
> @@ -49,4 +49,5 @@ struct xdp_meta {
>  		__s32 rx_vlan_tag_err;
>  	};
>  	enum xdp_meta_field hint_valid;
> +	__u8 avoid_IGC_TS_HDR_LEN[16];
>  };

Hi Zdenek Bouska,

Thanks for your help in testing this patch set. You are right, there is an issue with the Rx HW timestamp. I will submit a bug-fix patch once the solution is finalized, but the fix will not be part of this launch time patch set. Until then, you can continue to use your workaround.

> But Launch time still does not work:
>
> HW Launch-time: 1737374407515922696 (sec:1737374407.5159) delta to HW TX-complete-time sec:-0.9999 (-999923.649 usec)
>
> Command "sudo ethtool -X enp1s0 start 1 equal 1" was in v4 [4] but is not in v6. Was that intentional? After executing it Launch time feature works:

That ethtool command uses RSS to route the incoming packet to the queue that has launch time enabled. However, not every device supports RSS, so I moved to a more generic method, VLAN priority, to route the incoming packet. Therefore, you need to send a UDP packet with VLAN priority 1 to port 9091 of the DUT.

Below is an example Python script that generates the VLAN-tagged UDP packet. You can give it a quick try.

# Send one VLAN-tagged (priority 1) UDP packet to port 9091 of the DUT.
from scapy.all import Ether, Dot1Q, IP, UDP, sendp

packet = (Ether(src="44:ab:bc:bb:21:44", dst="22:ab:bc:bb:12:34") /
          Dot1Q(vlan=100, prio=1) /
          IP(src="169.254.1.2", dst="169.254.1.1") /
          UDP(dport=9091))
sendp(packet, iface="enp1s0")

Thanks & Regards
Siang

> HW Launch-time: 1737374618088557111 (sec:1737374618.0886) delta to HW TX-complete-time sec:0.0000 (0.012 usec)
>
> Thank you for XDP launch time support!
>
> [1] https://lore.kernel.org/linux-kernel/20241205044258.3155799-1-yoong.siang.song@intel.com/
> [2] https://lore.kernel.org/linux-kernel/20241205051936.3156307-1-yoong.siang.song@intel.com/
> [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... b142f58196bd9f47b35e493255741e2c663c7
> [4] https://lore.kernel.org/linux-kernel/20250106135724.9749-1-yoong.siang.song@intel.com/
>
> Best regards,
> Zdenek Bouska
>
> --
> Siemens, s.r.o
> Foundational Technologies
Hi all,

On Thu, 2025-01-23 at 16:41 +0000, Song, Yoong Siang wrote:
> On Thursday, January 23, 2025 11:40 PM, Bouska, Zdenek <zdenek.bouska@siemens.com> wrote:
>> Hi Siang,
>>
>> I tested this patch series on 6.13 with Intel I226-LM (rev 04).
>>
>> I also applied patch "selftests/bpf: Actuate tx_metadata_len in xdp_hw_metadata" [1] and "selftests/bpf: Enable Tx hwtstamp in xdp_hw_metadata" [2] so that TX timestamps work.
>>
>> HW RX-timestamp was small (0.5956 instead of 1737373125.5956):
>>
>> HW RX-time: 595572448 (sec:0.5956) delta to User RX-time sec:1737373124.9873 (1737373124987318.750 usec)
>> XDP RX-time: 1737373125582798388 (sec:1737373125.5828) delta to User RX-time sec:0.0001 (92.733 usec)
>>
>> Igc's raw HW RX-timestamp in front of frame data was overwritten by BPF program on line 90 in tools/testing/selftests/bpf: meta->hint_valid = 0;
>>
>> "HW timestamp has been copied into local variable" comment is outdated on line 2813 in drivers/net/ethernet/intel/igc/igc_main.c after commit 069b142f5819 igc: Add support for PTP .getcyclesx64() [3].
>>
>> Workaround is to add unused data to xdp_meta struct:
>>
>> --- a/tools/testing/selftests/bpf/xdp_metadata.h
>> +++ b/tools/testing/selftests/bpf/xdp_metadata.h
>> @@ -49,4 +49,5 @@ struct xdp_meta {
>>  		__s32 rx_vlan_tag_err;
>>  	};
>>  	enum xdp_meta_field hint_valid;
>> +	__u8 avoid_IGC_TS_HDR_LEN[16];
>>  };
>
> Hi Zdenek Bouska,
>
> Thanks for your help on testing this patch set. You are right, there is some issue with the Rx hw timestamp, I will submit the bug fix patch when the solution is finalized, but the fix will not be part of this launch time patch set. Until then, you can continue to use your WA.

I think there is no simple fix for that. It needs some discussion around the "expectations" for the headroom / meta data area in front of the actual packet data.

To be able to write generic BPF programs (generic in the sense of "works with all drivers"), the headroom is expected to be available for use inside the BPF program.

I think that is true for most drivers / devices, but at least igc is different in this regard. Some devices, igc among them, deliver the RX timestamp in front of the actual data, while other devices deliver the meta information as part of the RX descriptor.

For igc we get:

+----------+-----------------+-----+------+
| headroom | custom metadata |RX TS| data |
+----------+-----------------+-----+------+
           ^                       ^
           |                       |
 xdp_buff->data_meta         xdp_buff->data

The only information the application gets is a pointer to the start of the data section. To find the beginning of the meta data area, the application has to go backward from that pointer.

That is exactly how it is currently implemented in the selftest.

Problem: By writing into the calculated meta data area, the BPF program might destroy meta information that the driver has already delivered there. At least for igc this is a problem.

I hope that was clear...

Best regards,
Florian
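The overlap described in the message above can be illustrated with plain offset arithmetic. This is a user-space sketch, not BPF or driver code: the 16-byte timestamp length is taken from the `avoid_IGC_TS_HDR_LEN[16]` workaround quoted in this thread, while the helper names, the struct sizes, and the assumption that the timestamp sits directly in front of the packet data are illustrative only.

```c
#include <stddef.h>

/* Assumed length of the raw RX timestamp that sits directly in front of
 * the packet data (per the 16-byte padding used in the workaround).
 */
#define TOY_RX_TS_LEN 16u

/* Offset at which a metadata struct of meta_len bytes starts when the
 * application walks backward from the start of the packet data.
 */
static size_t toy_meta_start(size_t data_off, size_t meta_len)
{
	return data_off - meta_len;
}

/* Number of timestamp bytes that live metadata fields overlap.
 * tail_pad is unused padding at the end of the struct (tail_pad <=
 * meta_len is assumed); padding is never written, so it protects
 * whatever lies underneath it.
 */
static size_t toy_ts_bytes_clobbered(size_t meta_len, size_t tail_pad)
{
	size_t live_len = meta_len - tail_pad;
	size_t overlap;

	if (tail_pad >= TOY_RX_TS_LEN)
		return 0;

	overlap = TOY_RX_TS_LEN - tail_pad;
	return overlap < live_len ? overlap : live_len;
}
```

Under these assumptions, a 40-byte struct with no tail padding overlaps all 16 timestamp bytes, whereas appending 16 bytes of unused padding moves every live field in front of the timestamp. That is why the workaround helps, and also why it is specific to one driver's layout.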
>> But Launch time still does not work:
>>
>> HW Launch-time: 1737374407515922696 (sec:1737374407.5159) delta to HW TX-complete-time sec:-0.9999 (-999923.649 usec)
>>
>> Command "sudo ethtool -X enp1s0 start 1 equal 1" was in v4 [4] but is not in v6. Was that intentional? After executing it Launch time feature works:
>
> This ethtool command is to use RSS method to route the incoming packet to the queue which has launch time enabled. However, not every device support RSS. So I move to use a more generic method, which is vlan priority method, to route the incoming packet. Therefore, you need to send an UDP packet with VLAN priority 1 to port 9091 of DUT.
>
> Below is example of my python script to generate the vlan UDP packet. You can have a quick try on it.
>
> from scapy.all import *
> from scapy.all import Ether, Dot1Q, IP, UDP
> packet = Ether(src="44:ab:bc:bb:21:44", dst="22:ab:bc:bb:12:34") / Dot1Q(vlan=100, prio=1) / IP(src="169.254.1.2", dst="169.254.1.1") / UDP(dport=9091)
> sendp(packet, iface="enp1s0")
>
> Thanks & Regards
> Siang
>
>> HW Launch-time: 1737374618088557111 (sec:1737374618.0886) delta to HW TX-complete-time sec:0.0000 (0.012 usec)
>>
>> Thank you for XDP launch time support!
>>
>> [1] https://lore.kernel.org/linux-kernel/20241205044258.3155799-1-yoong.siang.song@intel.com/
>> [2] https://lore.kernel.org/linux-kernel/20241205051936.3156307-1-yoong.siang.song@intel.com/
>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... b142f58196bd9f47b35e493255741e2c663c7
>> [4] https://lore.kernel.org/linux-kernel/20250106135724.9749-1-yoong.siang.song@intel.com/
>>
>> Best regards,
>> Zdenek Bouska
>>
>> --
>> Siemens, s.r.o
>> Foundational Technologies
On 01/23, Florian Bezdeka wrote:
Hi all,
On Thu, 2025-01-23 at 16:41 +0000, Song, Yoong Siang wrote:
On Thursday, January 23, 2025 11:40 PM, Bouska, Zdenek zdenek.bouska@siemens.com wrote:
Hi Siang,
I tested this patch series on 6.13 with Intel I226-LM (rev 04).
I also applied patch "selftests/bpf: Actuate tx_metadata_len in xdp_hw_metadata" [1] and "selftests/bpf: Enable Tx hwtstamp in xdp_hw_metadata" [2] so that TX timestamps work.
HW RX-timestamp was small (0.5956 instead of 1737373125.5956):
HW RX-time: 595572448 (sec:0.5956) delta to User RX-time sec:1737373124.9873 (1737373124987318.750 usec)
XDP RX-time: 1737373125582798388 (sec:1737373125.5828) delta to User RX-time sec:0.0001 (92.733 usec)
The raw igc HW RX timestamp in front of the frame data was overwritten by the BPF program on line 90 in tools/testing/selftests/bpf: meta->hint_valid = 0;
The "HW timestamp has been copied into local variable" comment on line 2813 in drivers/net/ethernet/intel/igc/igc_main.c is outdated after commit 069b142f5819 igc: Add support for PTP .getcyclesx64() [3].
The workaround is to add unused data to the xdp_meta struct:
--- a/tools/testing/selftests/bpf/xdp_metadata.h
+++ b/tools/testing/selftests/bpf/xdp_metadata.h
@@ -49,4 +49,5 @@ struct xdp_meta {
 		__s32 rx_vlan_tag_err;
 	};
 	enum xdp_meta_field hint_valid;
+	__u8 avoid_IGC_TS_HDR_LEN[16];
 };
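Why the padding helps can be shown with a toy model of the RX buffer. This is a minimal sketch under stated assumptions: the driver's 16 bytes sit directly in front of the frame data, the program's metadata struct ends exactly at the start of the frame data, and hint_valid is the last real field. The headroom size, the byte pattern and the function name ts_header_survives are made up for illustration:

```python
import struct

HEADROOM = 256      # assumed headroom in front of the frame data
TS_HDR_LEN = 16     # size of the padding array in the workaround above

def ts_header_survives(pad_len: int) -> bool:
    """Return True if the bytes in front of the frame survive the metadata write."""
    buf = bytearray(HEADROOM + 64)
    data = HEADROOM                              # offset where frame data starts

    # Hardware/driver side: raw timestamp header right in front of the frame.
    ts_hdr = bytes(range(1, TS_HDR_LEN + 1))     # arbitrary non-zero pattern
    buf[data - TS_HDR_LEN:data] = ts_hdr

    # BPF program side: the struct ends at `data`; hint_valid (4 bytes) is the
    # last real field and any padding follows it.
    hint_valid_off = data - pad_len - 4
    struct.pack_into("<I", buf, hint_valid_off, 0)   # meta->hint_valid = 0

    return bytes(buf[data - TS_HDR_LEN:data]) == ts_hdr

print(ts_header_survives(pad_len=0))             # False: the write clobbers the header
print(ts_header_survives(pad_len=TS_HDR_LEN))    # True: the padding covers the header
```

The padding only moves the program's writes out of the way; it does not change the fact that both sides use the same bytes.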
Hi Zdenek Bouska,
Thanks for your help with testing this patch set. You are right, there is an issue with the Rx HW timestamp. I will submit a bug fix patch when the solution is finalized, but the fix will not be part of this launch time patch set. Until then, you can continue to use your workaround.
I think there is no simple fix for that. It needs some discussion about the "expectations" for the headroom / metadata area in front of the actual packet data.
By 'simple' you mean without some new UAPI to signal the size of that 'reserved area' by the driver? I don't see any other easy way out as well :-/
Stanislav Fomichev stfomichev@gmail.com writes:
By 'simple' you mean without some new UAPI to signal the size of that 'reserved area' by the driver? I don't see any other easy way out as well :-/
Yeah, I don't think we can impose UAPI restrictions on the metadata area at this point. I guess the best we can do is to educate users that they should call the timestamp kfunc before they modify the metadata?
-Toke
On Fri, 24 Jan 2025 12:45:42 +0100 Toke Høiland-Jørgensen wrote:
Yeah, I don't think we can impose UAPI restrictions on the metadata area at this point. I guess the best we can do is to educate users that they should call the timestamp kfunc before they modify the metadata?
I may be misunderstanding the discussion, but I think the answer is that the driver must be fixed. The metadata-in-prepend problem also exists for the simple adjust-head use case, so it has existed since the early days of BPF. The driver should copy out (or parse) the metadata before it invokes the XDP prog. The nfp driver does that.
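The copy-out approach can be sketched with a toy RX path. This is only a model, not driver code: it assumes the timestamp is a little-endian 64-bit value at the start of a 16-byte header in front of the frame, and the names driver_rx, clobbering_prog and make_buf, as well as the timestamp value, are hypothetical:

```python
import struct

HEADROOM = 256
TS_HDR_LEN = 16

def make_buf(ts_ns: int):
    """Build a buffer with a timestamp header directly in front of the frame data."""
    buf = bytearray(HEADROOM + 64)
    data = HEADROOM
    struct.pack_into("<Q", buf, data - TS_HDR_LEN, ts_ns)
    return buf, data

def clobbering_prog(buf: bytearray, data: int) -> None:
    """Stand-in for an XDP program that uses the area in front of the frame."""
    buf[data - TS_HDR_LEN:data] = bytes(TS_HDR_LEN)

def driver_rx(buf: bytearray, data: int, xdp_prog, copy_out: bool) -> int:
    """Return the RX timestamp the driver would report after running the program."""
    saved = None
    if copy_out:
        # Copy the timestamp out before the program gets to touch the buffer.
        (saved,) = struct.unpack_from("<Q", buf, data - TS_HDR_LEN)
    xdp_prog(buf, data)
    if saved is not None:
        return saved
    (ts,) = struct.unpack_from("<Q", buf, data - TS_HDR_LEN)
    return ts

TS = 123_456_789   # arbitrary example value
print(driver_rx(*make_buf(TS), clobbering_prog, copy_out=False))  # 0
print(driver_rx(*make_buf(TS), clobbering_prog, copy_out=True))   # 123456789
```

In this model the copy happens once per received packet, whether or not the program touches the area.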
On Mon, 2025-01-27 at 10:04 -0800, Jakub Kicinski wrote:
I may be misunderstanding the discussion, but I think the answer is that the driver must be fixed. The metadata-in-prepend problem also exists for the simple adjust-head use case, so it has existed since the early days of BPF. The driver should copy out (or parse) the metadata before it invokes the XDP prog. The nfp driver does that.
That would have to happen for each packet, without affecting ZC performance. How can that be achieved?
So we have at least two drivers with that problem, igc + nfp.
My main point: enabling and implementing ZC (zero copy) mode on the one hand, but then starting to copy the metadata for each packet on the other, doesn't sound reasonable.