The YAMA_SCOPE_CAPABILITY and YAMA_SCOPE_NO_ATTACH macros are never
referenced in ptrace_test.c, so remove them.
Signed-off-by: Ba Jing <bajing(a)cmss.chinamobile.com>
---
tools/testing/selftests/landlock/ptrace_test.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/tools/testing/selftests/landlock/ptrace_test.c b/tools/testing/selftests/landlock/ptrace_test.c
index a19db4d0b3bd..8f31b673ff2d 100644
--- a/tools/testing/selftests/landlock/ptrace_test.c
+++ b/tools/testing/selftests/landlock/ptrace_test.c
@@ -22,8 +22,6 @@
/* Copied from security/yama/yama_lsm.c */
#define YAMA_SCOPE_DISABLED 0
#define YAMA_SCOPE_RELATIONAL 1
-#define YAMA_SCOPE_CAPABILITY 2
-#define YAMA_SCOPE_NO_ATTACH 3
static void create_domain(struct __test_metadata *const _metadata)
{
--
2.33.0
This series depends on: "[PATCH v2 0/3] tun: Unify vnet implementation
and fill full vnet header"
https://lore.kernel.org/r/20250109-tun-v2-0-388d7d5a287a@daynix.com
virtio-net has two uses for hashes: one is RSS and the other is hash
reporting. Conventionally the hash calculation was done by the VMM.
However, computing the hash after the queue was chosen defeats the
purpose of RSS.
Another approach is to use an eBPF steering program. This approach has
another downside: it cannot report the calculated hash due to the
restrictive nature of eBPF.
Introduce the code to compute hashes in the kernel in order to overcome
these challenges.
An alternative solution is to extend the eBPF steering program so that it
can report the hash to userspace, but that relies on context rewrites,
which are in feature freeze. We could adopt kfuncs, but they will not be
UAPIs. We opt for ioctl to align with other relevant UAPIs (KVM and
vhost_net).
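For readers unfamiliar with the hash in question, here is a minimal
userspace sketch of a Toeplitz hash, the function that RSS is based on.
It is not the code from this series: the name toeplitz_hash() and its
signature are made up for illustration, and it assumes the key is at
least 4 bytes long and that key bytes past the end read as zero.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative Toeplitz hash (hypothetical helper, not the kernel code).
 * For every set bit of the input, taken most significant bit first, XOR
 * the current 32-bit window of the key into the result, then slide the
 * window one bit to the left along the key.
 * Requires key_len >= 4; key bytes beyond key_len are treated as zero.
 */
uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                       const uint8_t *data, size_t len)
{
    uint32_t hash = 0;
    /* The window starts as the first four key bytes, big endian. */
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | (uint32_t)key[3];

    for (size_t i = 0; i < len; i++) {
        /* Next key byte to shift into the window, bit by bit. */
        uint8_t next = (i + 4 < key_len) ? key[i + 4] : 0;

        for (int bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            window = (window << 1) | ((uint32_t)(next >> bit) & 1u);
        }
    }
    return hash;
}
```

For RSS, the input would typically be the concatenated addresses and
ports of the flow, and the resulting hash selects the receive queue
through an indirection table.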
The QEMU patches that use this new feature were submitted as an RFC and
are available at:
https://patchew.org/QEMU/20240915-hash-v3-0-79cb08d28647@daynix.com/
This work was presented at LPC 2024:
https://lpc.events/event/18/contributions/1963/
V1 -> V2:
Changed to introduce a new BPF program type.
Signed-off-by: Akihiko Odaki <akihiko.odaki(a)daynix.com>
---
Changes in v6:
- Extracted changes to fill vnet header holes into another series.
- Squashed patches "skbuff: Introduce SKB_EXT_TUN_VNET_HASH", "tun:
Introduce virtio-net hash reporting feature", and "tun: Introduce
virtio-net RSS" into patch "tun: Introduce virtio-net hash feature".
- Dropped the RFC tag.
- Link to v5: https://lore.kernel.org/r/20241008-rss-v5-0-f3cf68df005d@daynix.com
Changes in v5:
- Fixed a compilation error with CONFIG_TUN_VNET_CROSS_LE.
- Optimized the calculation of the hash value according to:
https://git.dpdk.org/dpdk/commit/?id=3fb1ea032bd6ff8317af5dac9af901f1f324ca…
- Added patch "tun: Unify vnet implementation".
- Dropped patch "tap: Pad virtio header with zero".
- Added patch "selftest: tun: Test vnet ioctls without device".
- Reworked selftests to skip for older kernels.
- Documented the case when the underlying device is deleted and packets
have queue_mapping set by TC.
- Reordered test harness arguments.
- Added code to handle fragmented packets.
- Link to v4: https://lore.kernel.org/r/20240924-rss-v4-0-84e932ec0e6c@daynix.com
Changes in v4:
- Moved tun_vnet_hash_ext to if_tun.h.
- Renamed virtio_net_toeplitz() to virtio_net_toeplitz_calc().
- Replaced htons() with cpu_to_be16().
- Changed virtio_net_hash_rss() to return void.
- Reordered variable declarations in virtio_net_hash_rss().
- Removed virtio_net_hdr_v1_hash_from_skb().
- Updated messages of "tap: Pad virtio header with zero" and
"tun: Pad virtio header with zero".
- Fixed vnet_hash allocation size.
- Ensured to free vnet_hash when destructing tun_struct.
- Link to v3: https://lore.kernel.org/r/20240915-rss-v3-0-c630015db082@daynix.com
Changes in v3:
- Reverted back to add ioctl.
- Split patch "tun: Introduce virtio-net hashing feature" into
"tun: Introduce virtio-net hash reporting feature" and
"tun: Introduce virtio-net RSS".
- Changed to reuse hash values computed for automq instead of performing
RSS hashing when hash reporting is requested but RSS is not.
- Extracted relevant data from struct tun_struct to keep it minimal.
- Added kernel-doc.
- Changed to allow calling TUNGETVNETHASHCAP before TUNSETIFF.
- Initialized num_buffers with 1.
- Added a test case for unclassified packets.
- Fixed error handling in tests.
- Changed tests to verify that the queue index will not overflow.
- Rebased.
- Link to v2: https://lore.kernel.org/r/20231015141644.260646-1-akihiko.odaki@daynix.com
---
Akihiko Odaki (6):
virtio_net: Add functions for hashing
net: flow_dissector: Export flow_keys_dissector_symmetric
tun: Introduce virtio-net hash feature
selftest: tun: Test vnet ioctls without device
selftest: tun: Add tests for virtio-net hashing
vhost/net: Support VIRTIO_NET_F_HASH_REPORT
Documentation/networking/tuntap.rst | 7 +
drivers/net/Kconfig | 1 +
drivers/net/tap.c | 50 ++-
drivers/net/tun.c | 93 ++++--
drivers/net/tun_vnet.c | 167 +++++++++-
drivers/net/tun_vnet.h | 33 +-
drivers/vhost/net.c | 16 +-
include/linux/if_tap.h | 2 +
include/linux/skbuff.h | 3 +
include/linux/virtio_net.h | 188 +++++++++++
include/net/flow_dissector.h | 1 +
include/uapi/linux/if_tun.h | 75 +++++
net/core/flow_dissector.c | 3 +-
net/core/skbuff.c | 4 +
tools/testing/selftests/net/Makefile | 2 +-
tools/testing/selftests/net/tun.c | 630 ++++++++++++++++++++++++++++++++++-
16 files changed, 1224 insertions(+), 51 deletions(-)
---
base-commit: 9b2ffa6148b1e4468d08f7e0e7e371c43cac9ffe
change-id: 20240403-rss-e737d89efa77
prerequisite-change-id: 20241230-tun-66e10a49b0c7:v2
prerequisite-patch-id: 057e888c371f2ce750064b7c40c2cc6abbdf6819
prerequisite-patch-id: 22d53dd3443a2c72496bffb90f19d429972550a3
prerequisite-patch-id: 1520f0c1f7b11559d0898bea556f745f6b8914ac
Best regards,
--
Akihiko Odaki <akihiko.odaki(a)daynix.com>
PTRACE_SET_SYSCALL_INFO is a generic ptrace API that complements
PTRACE_GET_SYSCALL_INFO by letting the ptracer modify details of
system calls the tracee is blocked in.
This API allows ptracers to obtain and modify system call details
in a straightforward and architecture-agnostic way.
The current implementation supports changing only those bits of system
call information that are used by strace, namely, syscall number, syscall
arguments, and syscall return value.
Support of changing additional details returned by PTRACE_GET_SYSCALL_INFO,
such as instruction pointer and stack pointer, could be added later
if needed, by re-using struct ptrace_syscall_info.reserved to specify
the additional details that should be set. Currently, the reserved
field of struct ptrace_syscall_info must be initialized with zeroes;
arch, instruction_pointer, and stack_pointer fields are ignored.
PTRACE_SET_SYSCALL_INFO currently supports only PTRACE_SYSCALL_INFO_ENTRY,
PTRACE_SYSCALL_INFO_EXIT, and PTRACE_SYSCALL_INFO_SECCOMP operations.
Other operations could be added later if needed.
Ideally, PTRACE_SET_SYSCALL_INFO should have been introduced along with
PTRACE_GET_SYSCALL_INFO, but it didn't happen. The last straw that
convinced me to implement PTRACE_SET_SYSCALL_INFO was the apparent
failure to provide an API for changing the first system call argument on
the riscv architecture [1].
ptrace(2) man page:
long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);
...
PTRACE_SET_SYSCALL_INFO
Modify information about the system call that caused the stop.
The "data" argument is a pointer to struct ptrace_syscall_info
that specifies the system call information to be set.
The "addr" argument should be set to sizeof(struct ptrace_syscall_info).
[1] https://lore.kernel.org/all/59505464-c84a-403d-972f-d4b2055eeaac@gmail.com/
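The acceptance rules described above can be modelled in a few lines of
userspace C. This is only a sketch under stated assumptions, not the
kernel implementation: struct demo_syscall_info, the DEMO_OP_* names and
demo_set_request_ok() are hypothetical stand-ins, and the layout of the
reserved field is assumed for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PTRACE_SYSCALL_INFO_* operations. */
enum demo_op {
    DEMO_OP_NONE = 0,
    DEMO_OP_ENTRY = 1,
    DEMO_OP_EXIT = 2,
    DEMO_OP_SECCOMP = 3,
};

/* Hypothetical, simplified model of struct ptrace_syscall_info. */
struct demo_syscall_info {
    uint8_t op;
    uint8_t reserved[3];          /* must be all zeroes on set */
    uint32_t arch;                /* ignored on set */
    uint64_t instruction_pointer; /* ignored on set */
    uint64_t stack_pointer;       /* ignored on set */
    uint64_t nr;
    uint64_t args[6];
};

/*
 * Return true if a set request would be acceptable under the rules in
 * the cover letter: reserved bytes are zero and the operation is one of
 * ENTRY, EXIT or SECCOMP. arch, instruction_pointer and stack_pointer
 * are deliberately not inspected.
 */
bool demo_set_request_ok(const struct demo_syscall_info *info)
{
    for (size_t i = 0; i < sizeof(info->reserved); i++) {
        if (info->reserved[i] != 0)
            return false;
    }

    switch (info->op) {
    case DEMO_OP_ENTRY:
    case DEMO_OP_EXIT:
    case DEMO_OP_SECCOMP:
        return true;
    default:
        return false;
    }
}
```

A tracer would normally start from the structure returned by
PTRACE_GET_SYSCALL_INFO, change only the fields it cares about, and pass
the result back, which keeps the reserved bytes zero.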
Dmitry V. Levin (6):
Revert "arch: remove unused function syscall_set_arguments()"
syscall.h: add syscall_set_arguments() on remaining
HAVE_ARCH_TRACEHOOK arches
syscall.h: introduce syscall_set_nr()
ptrace_get_syscall_info: factor out ptrace_get_syscall_info_op
ptrace: introduce PTRACE_SET_SYSCALL_INFO request
selftests/ptrace: add a test case for PTRACE_SET_SYSCALL_INFO
arch/arc/include/asm/syscall.h | 20 +
arch/arm/include/asm/syscall.h | 25 +
arch/arm64/include/asm/syscall.h | 20 +
arch/csky/include/asm/syscall.h | 13 +
arch/hexagon/include/asm/syscall.h | 14 +
arch/loongarch/include/asm/syscall.h | 15 +
arch/m68k/include/asm/syscall.h | 7 +
arch/microblaze/include/asm/syscall.h | 7 +
arch/mips/include/asm/syscall.h | 53 +++
arch/nios2/include/asm/syscall.h | 16 +
arch/openrisc/include/asm/syscall.h | 13 +
arch/parisc/include/asm/syscall.h | 19 +
arch/powerpc/include/asm/syscall.h | 15 +
arch/riscv/include/asm/syscall.h | 16 +
arch/s390/include/asm/syscall.h | 19 +
arch/sh/include/asm/syscall_32.h | 19 +
arch/sparc/include/asm/syscall.h | 17 +
arch/um/include/asm/syscall-generic.h | 19 +
arch/x86/include/asm/syscall.h | 43 ++
arch/xtensa/include/asm/syscall.h | 18 +
include/asm-generic/syscall.h | 30 ++
include/linux/ptrace.h | 3 +
include/uapi/linux/ptrace.h | 3 +-
kernel/ptrace.c | 154 ++++++-
tools/testing/selftests/ptrace/Makefile | 2 +-
.../selftests/ptrace/set_syscall_info.c | 436 ++++++++++++++++++
26 files changed, 994 insertions(+), 22 deletions(-)
create mode 100644 tools/testing/selftests/ptrace/set_syscall_info.c
--
ldv
Implement comprehensive testing for netconsole userdata entry handling,
demonstrating correct behavior when creating maximum entries and
preventing unauthorized overflow.
Refactor existing test infrastructure to support modular, reusable
helper functions that validate strict entry limit enforcement.
Also, add a warning if update_userdata() sees more than
MAX_USERDATA_ITEMS entries. This shouldn't happen and it is a bug that
shouldn't be silently ignored.
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Changes in v3:
- Added the new shell helpers files in the TEST_INCLUDES (Jakub)
- Link to v2: https://lore.kernel.org/r/20250103-netcons_overflow_test-v2-0-a49f9be64c21@…
Changes in v2:
- Add the new script (netcons_overflow.sh) in
tools/testing/selftests/drivers/net/Makefile as suggested by Simon
Horman
- Link to v1: https://lore.kernel.org/r/20241204-netcons_overflow_test-v1-0-a85a8d0ace21@…
---
Breno Leitao (4):
netconsole: Warn if MAX_USERDATA_ITEMS limit is exceeded
netconsole: selftest: Split the helpers from the selftest
netconsole: selftest: Delete all userdata keys
netconsole: selftest: verify userdata entry limit
MAINTAINERS | 3 +-
drivers/net/netconsole.c | 2 +-
tools/testing/selftests/drivers/net/Makefile | 2 +
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 225 +++++++++++++++++++++
.../testing/selftests/drivers/net/netcons_basic.sh | 218 +-------------------
.../selftests/drivers/net/netcons_overflow.sh | 67 ++++++
6 files changed, 298 insertions(+), 219 deletions(-)
---
base-commit: 7bf1659bad4e9413cdba132ef9cbd0caa9cabcc4
change-id: 20241204-netcons_overflow_test-eaf735d1f743
Best regards,
--
Breno Leitao <leitao(a)debian.org>
This patch allows progs to elide a null check on statically known map
lookup keys. In other words, if the verifier can statically prove that
the lookup will be in-bounds, allow the prog to drop the null check.
This is useful for two reasons:
1. Large numbers of nullness checks (especially when they cannot fail)
unnecessarily push progs towards BPF_COMPLEXITY_LIMIT_JMP_SEQ.
2. It forms a tighter contract between programmer and verifier.
For (1), bpftrace is starting to make heavier use of percpu scratch
maps. As a result, for user scripts with a large number of unrolled loops,
we are starting to hit jump complexity verification errors. These
percpu lookups cannot fail anyways, as we only use static key values.
Eliding nullness probably results in less work for verifier as well.
For (2), percpu scratch maps are often used as a larger stack, as the
current stack is limited to 512 bytes. In these situations, it is
desirable for the programmer to express: "this lookup should never fail,
and if it does, it means I messed up the code". By omitting the null
check, the programmer can "ask" the verifier to double check the logic.
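The decision described above can be pictured with a small userspace
model. This is a sketch only, not verifier code: the demo_* names are
hypothetical, and limiting the check to array-style maps is an
assumption made here because an in-bounds index can only be proven
statically for maps that are indexed by position.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified map kinds for this illustration. */
enum demo_map_type {
    DEMO_MAP_ARRAY,
    DEMO_MAP_PERCPU_ARRAY,
    DEMO_MAP_HASH,
};

struct demo_map {
    enum demo_map_type type;
    uint32_t max_entries;
};

/*
 * Return true if a lookup can be treated as never returning NULL.
 * key_known says whether the key was proven to be a constant, and key
 * holds that constant. Only array-style maps qualify (an assumption of
 * this sketch), and the key must be within [0, max_entries).
 */
bool demo_can_elide_null_check(const struct demo_map *map,
                               bool key_known, int64_t key)
{
    if (map->type != DEMO_MAP_ARRAY &&
        map->type != DEMO_MAP_PERCPU_ARRAY)
        return false;
    if (!key_known || key < 0)
        return false;
    return (uint64_t)key < map->max_entries;
}
```

With such a rule, a prog that omits the null check for a constant key
that is out of bounds is rejected, which is the "double check" from
point (2).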
=== Changelog ===
Changes in v6:
* Use is_spilled_scalar_reg() helper and remove unnecessary comment
* Add back deleted selftest with different helper to dirty dst buffer
* Check size of spill is exactly key_size and update selftests
* Read slot_type from correct offset into the spi
* Rewrite selftests in C where possible
* Mark constant map keys as precise
Changes in v5:
* Dropped all acks
* Use s64 instead of long for const_map_key
* Ensure stack slot contains spilled reg before accessing spilled_ptr
* Ensure spilled reg is a scalar before accessing tnum const value
* Fix verifier selftest for 32-bit write to write at 8 byte alignment
to ensure spill is tracked
* Introduce more precise tracking of helper stack accesses
* Do constant map key extraction as part of helper argument processing
and then remove duplicated stack checks
* Use ret_flag instead of regs[BPF_REG_0].type
* Handle STACK_ZERO
* Fix bug in bpf_load_hdr_opt() arg annotation
Changes in v4:
* Only allow for CAP_BPF
* Add test for stack growing upwards
* Improve comment about stack growing upwards
Changes in v3:
* Check if stack is (erroneously) growing upwards
* Mention in commit message why existing tests needed change
Changes in v2:
* Added a check for when R2 is not a ptr to stack
* Added a check for when stack is uninitialized (no stack slot yet)
* Updated existing tests to account for null elision
* Added test case for when R2 can be both const and non-const
Daniel Xu (5):
bpf: verifier: Add missing newline on verbose() call
bpf: tcp: Mark bpf_load_hdr_opt() arg2 as read-write
bpf: verifier: Refactor helper access type tracking
bpf: verifier: Support eliding map lookup nullness
bpf: selftests: verifier: Add nullness elision tests
kernel/bpf/verifier.c | 139 +++++++++++----
net/core/filter.c | 2 +-
.../testing/selftests/bpf/progs/dynptr_fail.c | 6 +-
tools/testing/selftests/bpf/progs/iters.c | 14 +-
.../selftests/bpf/progs/map_kptr_fail.c | 2 +-
.../selftests/bpf/progs/test_global_func10.c | 2 +-
.../selftests/bpf/progs/uninit_stack.c | 5 +-
.../bpf/progs/verifier_array_access.c | 168 ++++++++++++++++++
.../bpf/progs/verifier_basic_stack.c | 2 +-
.../selftests/bpf/progs/verifier_const_or.c | 4 +-
.../progs/verifier_helper_access_var_len.c | 12 +-
.../selftests/bpf/progs/verifier_int_ptr.c | 2 +-
.../selftests/bpf/progs/verifier_map_in_map.c | 2 +-
.../selftests/bpf/progs/verifier_mtu.c | 2 +-
.../selftests/bpf/progs/verifier_raw_stack.c | 4 +-
.../selftests/bpf/progs/verifier_unpriv.c | 2 +-
.../selftests/bpf/progs/verifier_var_off.c | 8 +-
tools/testing/selftests/bpf/verifier/calls.c | 2 +-
.../testing/selftests/bpf/verifier/map_kptr.c | 2 +-
19 files changed, 311 insertions(+), 69 deletions(-)
--
2.47.1
Reverse the order in which the PML log is read to align more closely with
the hardware. It should not affect regular users of dirty logging, but it
fixes a unit-test-specific assumption in the dirty-ring mode of
dirty_log_test.
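To illustrate what reading "in the same order as it was written" means,
here is a minimal userspace sketch. It assumes the usual PML layout as I
understand it from the Intel SDM: a 512-entry buffer that the hardware
fills from index 511 downwards, decrementing the PML index after each
write. The names DEMO_PML_ENTRIES and demo_pml_read_in_write_order() are
hypothetical and do not come from the patches.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PML_ENTRIES 512

/*
 * Copy the logged GPAs from pml_buf into out, oldest first.
 * pml_idx is the current PML index, i.e. the slot the hardware would
 * write next. A value outside [0, 511] means the index underflowed and
 * the whole buffer is valid. out must have room for 512 entries.
 * Returns the number of entries copied.
 */
size_t demo_pml_read_in_write_order(const uint64_t *pml_buf,
                                    uint16_t pml_idx, uint64_t *out)
{
    int next_free = (pml_idx >= DEMO_PML_ENTRIES) ? -1 : (int)pml_idx;
    size_t n = 0;

    /* The hardware wrote slot 511 first, so walk downwards from there. */
    for (int i = DEMO_PML_ENTRIES - 1; i > next_free; i--)
        out[n++] = pml_buf[i];

    return n;
}
```

Walking upwards from the index would yield the same set of pages but
newest first, which is the ordering difference the series is about.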
Best regards,
Maxim Levitsky
Maxim Levitsky (2):
KVM: VMX: refactor PML terminology
KVM: VMX: read the PML log in the same order as it was written
arch/x86/kvm/vmx/main.c | 2 +-
arch/x86/kvm/vmx/nested.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 32 ++++++++++++++++++++------------
arch/x86/kvm/vmx/vmx.h | 5 ++++-
4 files changed, 26 insertions(+), 15 deletions(-)
--
2.26.3
Extend the XDP Tx metadata framework so that a user can request launch
time hardware offload, where the Ethernet device will schedule the packet
for transmission at a pre-determined time called launch time. The value of
launch time is communicated from user space to the Ethernet driver via the
launch_time field of struct xsk_tx_metadata.
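From the user space side, requesting the offload amounts to setting one
flag and one field in the Tx metadata. The sketch below uses local
stand-ins that mirror the UAPI additions in this patch, since installed
headers may not have them yet. The demo_* names are hypothetical, and
how the application obtains the current PHC time is left out.

```c
#include <stdint.h>

/* Mirrors XDP_TXMD_FLAGS_LAUNCH_TIME (1 << 2) as added by this patch. */
#define DEMO_TXMD_FLAGS_LAUNCH_TIME (1u << 2)

/* Simplified local mirror of struct xsk_tx_metadata. */
struct demo_tx_metadata {
    uint64_t flags;
    union {
        struct {
            uint16_t csum_start;
            uint16_t csum_offset;
            uint64_t launch_time; /* nanoseconds, PHC time base */
        } request;
        struct {
            uint64_t tx_timestamp;
        } completion;
    };
};

/*
 * Ask for transmission delay_ns after phc_now_ns. Both values are in
 * nanoseconds on the device's PTP hardware clock. Other flags already
 * set in meta are preserved.
 */
void demo_request_launch_time(struct demo_tx_metadata *meta,
                              uint64_t phc_now_ns, uint64_t delay_ns)
{
    meta->flags |= DEMO_TXMD_FLAGS_LAUNCH_TIME;
    meta->request.launch_time = phc_now_ns + delay_ns;
}
```

The application is still responsible for keeping the requested time
within the horizon of the device, as described in the documentation
added by this patch.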
Suggested-by: Stanislav Fomichev <sdf(a)google.com>
Signed-off-by: Song Yoong Siang <yoong.siang.song(a)intel.com>
---
Documentation/netlink/specs/netdev.yaml | 4 ++
Documentation/networking/xsk-tx-metadata.rst | 64 ++++++++++++++++++++
include/net/xdp_sock.h | 10 +++
include/net/xdp_sock_drv.h | 1 +
include/uapi/linux/if_xdp.h | 10 +++
include/uapi/linux/netdev.h | 3 +
net/core/netdev-genl.c | 2 +
net/xdp/xsk.c | 3 +
tools/include/uapi/linux/if_xdp.h | 10 +++
tools/include/uapi/linux/netdev.h | 3 +
10 files changed, 110 insertions(+)
diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index cbb544bd6c84..e59c8a14f7d1 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -70,6 +70,10 @@ definitions:
name: tx-checksum
doc:
L3 checksum HW offload is supported by the driver.
+ -
+ name: tx-launch-time
+ doc:
+ Launch time HW offload is supported by the driver.
-
name: queue-type
type: enum
diff --git a/Documentation/networking/xsk-tx-metadata.rst b/Documentation/networking/xsk-tx-metadata.rst
index e76b0cfc32f7..3cec089747ce 100644
--- a/Documentation/networking/xsk-tx-metadata.rst
+++ b/Documentation/networking/xsk-tx-metadata.rst
@@ -50,6 +50,10 @@ The flags field enables the particular offload:
checksum. ``csum_start`` specifies byte offset of where the checksumming
should start and ``csum_offset`` specifies byte offset where the
device should store the computed checksum.
+- ``XDP_TXMD_FLAGS_LAUNCH_TIME``: requests the device to schedule the
+ packet for transmission at a pre-determined time called launch time. The
+ value of launch time is indicated by ``launch_time`` field of
+ ``union xsk_tx_metadata``.
Besides the flags above, in order to trigger the offloads, the first
packet's ``struct xdp_desc`` descriptor should set ``XDP_TX_METADATA``
@@ -65,6 +69,65 @@ In this case, when running in ``XDK_COPY`` mode, the TX checksum
is calculated on the CPU. Do not enable this option in production because
it will negatively affect performance.
+Launch Time
+===========
+
+The value of the requested launch time should be based on the device's PTP
+Hardware Clock (PHC) to ensure accuracy. AF_XDP takes a different data path
+compared to the ETF queuing discipline, which organizes packets and delays
+their transmission. Instead, AF_XDP immediately hands off the packets to
+the device driver without rearranging their order or holding them prior to
+transmission. In scenarios where the launch time offload feature is
+disabled, the device driver is expected to disregard the launch time
+request. For correct interpretation and meaningful operation, the launch
+time should never be set to a value larger than the farthest programmable
+time in the future (the horizon). Different devices have different hardware
+limitations on the launch time offload feature.
+
+stmmac driver
+-------------
+
+For stmmac, TSO and launch time (TBS) features are mutually exclusive for
+each individual Tx Queue. By default, the driver configures Tx Queue 0 to
+support TSO and the rest of the Tx Queues to support TBS. The launch time
+hardware offload feature can be enabled or disabled by using the tc-etf
+command to call the driver's ndo_setup_tc() callback.
+
+The value of the launch time that is programmed in the Enhanced Normal
+Transmit Descriptors is a 32-bit value, where the most significant 8 bits
+represent the time in seconds and the remaining 24 bits represent the time
+in 256 ns increments. The programmed launch time is compared against the
+PTP time (bits[39:8]) and rolls over after 256 seconds. Therefore, the
+horizon of the launch time for dwmac4 and dwxlgmac2 is 128 seconds in the
+future.
+
+The stmmac driver maintains FIFO behavior and does not perform packet
+reordering. This means that a packet with a launch time request will block
+other packets in the same Tx Queue until it is transmitted.
+
+igc driver
+----------
+
+For igc, all four Tx Queues support the launch time feature. The launch
+time hardware offload feature can be enabled or disabled by using the
+tc-etf command to call the driver's ndo_setup_tc() callback. When entering
+TSN mode, the igc driver will reset the device and create a default Qbv
+schedule with a 1-second cycle time, with all Tx Queues open at all times.
+
+The value of the launch time that is programmed in the Advanced Transmit
+Context Descriptor is a relative offset to the starting time of the Qbv
+transmission window of the queue. The Frst flag of the descriptor can be
+set to schedule the packet for the next Qbv cycle. Therefore, the horizon
+of the launch time for i225 and i226 is the ending time of the next cycle
+of the Qbv transmission window of the queue. For example, when the Qbv
+cycle time is set to 1 second, the horizon of the launch time ranges
+from 1 second to 2 seconds, depending on where the Qbv cycle is currently
+running.
+
+The igc driver maintains FIFO behavior and does not perform packet
+reordering. This means that a packet with a launch time request will block
+other packets in the same Tx Queue until it is transmitted.
+
Querying Device Capabilities
============================
@@ -74,6 +137,7 @@ Refer to ``xsk-flags`` features bitmask in
- ``tx-timestamp``: device supports ``XDP_TXMD_FLAGS_TIMESTAMP``
- ``tx-checksum``: device supports ``XDP_TXMD_FLAGS_CHECKSUM``
+- ``tx-launch-time``: device supports ``XDP_TXMD_FLAGS_LAUNCH_TIME``
See ``tools/net/ynl/samples/netdev.c`` on how to query this information.
diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index bfe625b55d55..a58ae7589d12 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -110,11 +110,16 @@ struct xdp_sock {
* indicates position where checksumming should start.
* csum_offset indicates position where checksum should be stored.
*
+ * void (*tmo_request_launch_time)(u64 launch_time, void *priv)
+ * Called when AF_XDP frame requested launch time HW offload support.
+ * launch_time indicates the PTP time at which the device can schedule the
+ * packet for transmission.
*/
struct xsk_tx_metadata_ops {
void (*tmo_request_timestamp)(void *priv);
u64 (*tmo_fill_timestamp)(void *priv);
void (*tmo_request_checksum)(u16 csum_start, u16 csum_offset, void *priv);
+ void (*tmo_request_launch_time)(u64 launch_time, void *priv);
};
#ifdef CONFIG_XDP_SOCKETS
@@ -162,6 +167,11 @@ static inline void xsk_tx_metadata_request(const struct xsk_tx_metadata *meta,
if (!meta)
return;
+ if (ops->tmo_request_launch_time)
+ if (meta->flags & XDP_TXMD_FLAGS_LAUNCH_TIME)
+ ops->tmo_request_launch_time(meta->request.launch_time,
+ priv);
+
if (ops->tmo_request_timestamp)
if (meta->flags & XDP_TXMD_FLAGS_TIMESTAMP)
ops->tmo_request_timestamp(priv);
diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 40085afd9160..78af371bc002 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -198,6 +198,7 @@ static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr)
#define XDP_TXMD_FLAGS_VALID ( \
XDP_TXMD_FLAGS_TIMESTAMP | \
XDP_TXMD_FLAGS_CHECKSUM | \
+ XDP_TXMD_FLAGS_LAUNCH_TIME | \
0)
static inline bool xsk_buff_valid_tx_metadata(struct xsk_tx_metadata *meta)
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index 42ec5ddaab8d..42869770776e 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -127,6 +127,12 @@ struct xdp_options {
*/
#define XDP_TXMD_FLAGS_CHECKSUM (1 << 1)
+/* Request launch time hardware offload. The device will schedule the packet for
+ * transmission at a pre-determined time called launch time. The value of
+ * launch time is communicated via launch_time field of struct xsk_tx_metadata.
+ */
+#define XDP_TXMD_FLAGS_LAUNCH_TIME (1 << 2)
+
/* AF_XDP offloads request. 'request' union member is consumed by the driver
* when the packet is being transmitted. 'completion' union member is
* filled by the driver when the transmit completion arrives.
@@ -142,6 +148,10 @@ struct xsk_tx_metadata {
__u16 csum_start;
/* Offset from csum_start where checksum should be stored. */
__u16 csum_offset;
+
+ /* XDP_TXMD_FLAGS_LAUNCH_TIME */
+ /* Launch time in nanosecond against the PTP HW Clock */
+ __u64 launch_time;
} request;
struct {
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index e4be227d3ad6..5ab85f4af009 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -59,10 +59,13 @@ enum netdev_xdp_rx_metadata {
* by the driver.
* @NETDEV_XSK_FLAGS_TX_CHECKSUM: L3 checksum HW offload is supported by the
* driver.
+ * @NETDEV_XSK_FLAGS_LAUNCH_TIME: Launch Time HW offload is supported by the
+ * driver.
*/
enum netdev_xsk_flags {
NETDEV_XSK_FLAGS_TX_TIMESTAMP = 1,
NETDEV_XSK_FLAGS_TX_CHECKSUM = 2,
+ NETDEV_XSK_FLAGS_LAUNCH_TIME = 4,
};
enum netdev_queue_type {
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 9527dd46e4dc..e2515cf9190f 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -52,6 +52,8 @@ XDP_METADATA_KFUNC_xxx
xsk_features |= NETDEV_XSK_FLAGS_TX_TIMESTAMP;
if (netdev->xsk_tx_metadata_ops->tmo_request_checksum)
xsk_features |= NETDEV_XSK_FLAGS_TX_CHECKSUM;
+ if (netdev->xsk_tx_metadata_ops->tmo_request_launch_time)
+ xsk_features |= NETDEV_XSK_FLAGS_LAUNCH_TIME;
}
if (nla_put_u32(rsp, NETDEV_A_DEV_IFINDEX, netdev->ifindex) ||
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 3fa70286c846..8feaa0e86f07 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -743,6 +743,9 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
goto free_err;
}
}
+
+ if (meta->flags & XDP_TXMD_FLAGS_LAUNCH_TIME)
+ skb->skb_mstamp_ns = meta->request.launch_time;
}
}
diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
index 2f082b01ff22..67719f8966c2 100644
--- a/tools/include/uapi/linux/if_xdp.h
+++ b/tools/include/uapi/linux/if_xdp.h
@@ -127,6 +127,12 @@ struct xdp_options {
*/
#define XDP_TXMD_FLAGS_CHECKSUM (1 << 1)
+/* Request launch time hardware offload. The device will schedule the packet for
+ * transmission at a pre-determined time called launch time. The value of
+ * launch time is communicated via launch_time field of struct xsk_tx_metadata.
+ */
+#define XDP_TXMD_FLAGS_LAUNCH_TIME (1 << 2)
+
/* AF_XDP offloads request. 'request' union member is consumed by the driver
* when the packet is being transmitted. 'completion' union member is
* filled by the driver when the transmit completion arrives.
@@ -142,6 +148,10 @@ struct xsk_tx_metadata {
__u16 csum_start;
/* Offset from csum_start where checksum should be stored. */
__u16 csum_offset;
+
+ /* XDP_TXMD_FLAGS_LAUNCH_TIME */
+ /* Launch time in nanosecond against the PTP HW Clock */
+ __u64 launch_time;
} request;
struct {
diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h
index e4be227d3ad6..5ab85f4af009 100644
--- a/tools/include/uapi/linux/netdev.h
+++ b/tools/include/uapi/linux/netdev.h
@@ -59,10 +59,13 @@ enum netdev_xdp_rx_metadata {
* by the driver.
* @NETDEV_XSK_FLAGS_TX_CHECKSUM: L3 checksum HW offload is supported by the
* driver.
+ * @NETDEV_XSK_FLAGS_LAUNCH_TIME: Launch Time HW offload is supported by the
+ * driver.
*/
enum netdev_xsk_flags {
NETDEV_XSK_FLAGS_TX_TIMESTAMP = 1,
NETDEV_XSK_FLAGS_TX_CHECKSUM = 2,
+ NETDEV_XSK_FLAGS_LAUNCH_TIME = 4,
};
enum netdev_queue_type {
--
2.34.1