Greetings:
Welcome to v4. Small functional change, which makes the code cleaner
(see changelog) and tests pass on my machine with mlx5 and netdevsim.
This is an attempt to followup on something Jakub asked me about [1],
adding an xsk attribute to queues and more clearly documenting which
queues are linked to NAPIs...
After the RFC [2], Jakub suggested creating an empty nest for queues
which have a pool, so I've adjusted this version to work that way.
The nest can be extended in the future to express attributes about XSK
as needed. Queues which are not used for AF_XDP do not have the xsk
attribute present.
I've run the included test on:
- my mlx5 machine (via NETIF=)
- without setting NETIF
And the test seems to pass in both cases.
Thanks,
Joe
[1]: https://lore.kernel.org/netdev/20250113143109.60afa59a@kernel.org/
[2]: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/
v4:
- Add patch 1, as suggested by Jakub, which adds an empty nest helper.
- Use the helper in patch 2, which makes the code cleaner and prevents
a possible bug.
v3: https://lore.kernel.org/netdev/20250204191108.161046-1-jdamato@fastly.com/
- Change comment format in patch 2 to avoid kdoc warnings. No other
changes.
v2: https://lore.kernel.org/all/20250203185828.19334-1-jdamato@fastly.com/
- Switched from RFC to actual submission now that net-next is open
- Adjusted patch 1 to include an empty nest as suggested by Jakub
- Adjusted patch 2 to update the test based on changes to patch 1, and
to incorporate some Python feedback from Jakub :)
rfc: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/
Joe Damato (3):
netlink: Add nla_put_empty_nest helper
netdev-genl: Add an XSK attribute to queues
selftests: drv-net: Test queue xsk attribute
Documentation/netlink/specs/netdev.yaml | 13 ++-
include/net/netlink.h | 15 ++++
include/uapi/linux/netdev.h | 6 ++
net/core/netdev-genl.c | 12 +++
tools/include/uapi/linux/netdev.h | 6 ++
.../testing/selftests/drivers/net/.gitignore | 2 +
tools/testing/selftests/drivers/net/Makefile | 3 +
tools/testing/selftests/drivers/net/queues.py | 35 +++++++-
.../selftests/drivers/net/xdp_helper.c | 89 +++++++++++++++++++
9 files changed, 178 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/.gitignore
create mode 100644 tools/testing/selftests/drivers/net/xdp_helper.c
base-commit: f3eba8edd885db439f4bfaa2cf9d766bad1ae6c5
--
2.43.0
The kernel has recently added support for shadow stacks, currently
x86 only using their CET feature but both arm64 and RISC-V have
equivalent features (GCS and Zicfiss respectively), I am actively
working on GCS[1]. With shadow stacks the hardware maintains an
additional stack containing only the return addresses for branch
instructions which is not generally writeable by userspace and ensures
that any returns are to the recorded addresses. This provides some
protection against ROP attacks and making it easier to collect call
stacks. These shadow stacks are allocated in the address space of the
userspace process.
Our API for shadow stacks does not currently offer userspace any
flexiblity for managing the allocation of shadow stacks for newly
created threads, instead the kernel allocates a new shadow stack with
the same size as the normal stack whenever a thread is created with the
feature enabled. The stacks allocated in this way are freed by the
kernel when the thread exits or shadow stacks are disabled for the
thread. This lack of flexibility and control isn't ideal, in the vast
majority of cases the shadow stack will be over allocated and the
implicit allocation and deallocation is not consistent with other
interfaces. As far as I can tell the interface is done in this manner
mainly because the shadow stack patches were in development since before
clone3() was implemented.
Since clone3() is readily extensible let's add support for specifying a
shadow stack when creating a new thread or process, keeping the current
implicit allocation behaviour if one is not specified either with
clone3() or through the use of clone(). The user must provide a shadow
stack pointer, this must point to memory mapped for use as a shadow
stackby map_shadow_stack() with an architecture specified shadow stack
token at the top of the stack.
Please note that the x86 portions of this code are build tested only, I
don't appear to have a system that can run CET available to me.
[1] https://lore.kernel.org/linux-arm-kernel/20241001-arm64-gcs-v13-0-222b78d87…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v14:
- Rebase onto v6.14-rc1.
- Link to v13: https://lore.kernel.org/r/20241203-clone3-shadow-stack-v13-0-93b89a81a5ed@k…
Changes in v13:
- Rebase onto v6.13-rc1.
- Link to v12: https://lore.kernel.org/r/20241031-clone3-shadow-stack-v12-0-7183eb8bee17@k…
Changes in v12:
- Add the regular prctl() to the userspace API document since arm64
support is queued in -next.
- Link to v11: https://lore.kernel.org/r/20241005-clone3-shadow-stack-v11-0-2a6a2bd6d651@k…
Changes in v11:
- Rebase onto arm64 for-next/gcs, which is based on v6.12-rc1, and
integrate arm64 support.
- Rework the interface to specify a shadow stack pointer rather than a
base and size like we do for the regular stack.
- Link to v10: https://lore.kernel.org/r/20240821-clone3-shadow-stack-v10-0-06e8797b9445@k…
Changes in v10:
- Integrate fixes & improvements for the x86 implementation from Rick
Edgecombe.
- Require that the shadow stack be VM_WRITE.
- Require that the shadow stack base and size be sizeof(void *) aligned.
- Clean up trailing newline.
- Link to v9: https://lore.kernel.org/r/20240819-clone3-shadow-stack-v9-0-962d74f99464@ke…
Changes in v9:
- Pull token validation earlier and report problems with an error return
to parent rather than signal delivery to the child.
- Verify that the top of the supplied shadow stack is VM_SHADOW_STACK.
- Rework token validation to only do the page mapping once.
- Drop no longer needed support for testing for signals in selftest.
- Fix typo in comments.
- Link to v8: https://lore.kernel.org/r/20240808-clone3-shadow-stack-v8-0-0acf37caf14c@ke…
Changes in v8:
- Fix token verification with user specified shadow stack.
- Don't track user managed shadow stacks for child processes.
- Link to v7: https://lore.kernel.org/r/20240731-clone3-shadow-stack-v7-0-a9532eebfb1d@ke…
Changes in v7:
- Rebase onto v6.11-rc1.
- Typo fixes.
- Link to v6: https://lore.kernel.org/r/20240623-clone3-shadow-stack-v6-0-9ee7783b1fb9@ke…
Changes in v6:
- Rebase onto v6.10-rc3.
- Ensure we don't try to free the parent shadow stack in error paths of
x86 arch code.
- Spelling fixes in userspace API document.
- Additional cleanups and improvements to the clone3() tests to support
the shadow stack tests.
- Link to v5: https://lore.kernel.org/r/20240203-clone3-shadow-stack-v5-0-322c69598e4b@ke…
Changes in v5:
- Rebase onto v6.8-rc2.
- Rework ABI to have the user allocate the shadow stack memory with
map_shadow_stack() and a token.
- Force inlining of the x86 shadow stack enablement.
- Move shadow stack enablement out into a shared header for reuse by
other tests.
- Link to v4: https://lore.kernel.org/r/20231128-clone3-shadow-stack-v4-0-8b28ffe4f676@ke…
Changes in v4:
- Formatting changes.
- Use a define for minimum shadow stack size and move some basic
validation to fork.c.
- Link to v3: https://lore.kernel.org/r/20231120-clone3-shadow-stack-v3-0-a7b8ed3e2acc@ke…
Changes in v3:
- Rebase onto v6.7-rc2.
- Remove stale shadow_stack in internal kargs.
- If a shadow stack is specified unconditionally use it regardless of
CLONE_ parameters.
- Force enable shadow stacks in the selftest.
- Update changelogs for RISC-V feature rename.
- Link to v2: https://lore.kernel.org/r/20231114-clone3-shadow-stack-v2-0-b613f8681155@ke…
Changes in v2:
- Rebase onto v6.7-rc1.
- Remove ability to provide preallocated shadow stack, just specify the
desired size.
- Link to v1: https://lore.kernel.org/r/20231023-clone3-shadow-stack-v1-0-d867d0b5d4d0@ke…
---
Mark Brown (8):
arm64/gcs: Return a success value from gcs_alloc_thread_stack()
Documentation: userspace-api: Add shadow stack API documentation
selftests: Provide helper header for shadow stack testing
fork: Add shadow stack support to clone3()
selftests/clone3: Remove redundant flushes of output streams
selftests/clone3: Factor more of main loop into test_clone3()
selftests/clone3: Allow tests to flag if -E2BIG is a valid error code
selftests/clone3: Test shadow stack support
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/shadow_stack.rst | 44 +++++
arch/arm64/include/asm/gcs.h | 8 +-
arch/arm64/kernel/process.c | 8 +-
arch/arm64/mm/gcs.c | 62 +++++-
arch/x86/include/asm/shstk.h | 11 +-
arch/x86/kernel/process.c | 2 +-
arch/x86/kernel/shstk.c | 57 +++++-
include/asm-generic/cacheflush.h | 11 ++
include/linux/sched/task.h | 17 ++
include/uapi/linux/sched.h | 10 +-
kernel/fork.c | 96 +++++++--
tools/testing/selftests/clone3/clone3.c | 226 ++++++++++++++++++----
tools/testing/selftests/clone3/clone3_selftests.h | 65 ++++++-
tools/testing/selftests/ksft_shstk.h | 98 ++++++++++
15 files changed, 635 insertions(+), 81 deletions(-)
---
base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
change-id: 20231019-clone3-shadow-stack-15d40d2bf536
Best regards,
--
Mark Brown <broonie(a)kernel.org>
This series expands the XDP TX metadata framework to allow user
applications to pass per packet 64-bit launch time directly to the kernel
driver, requesting launch time hardware offload support. The XDP TX
metadata framework will not perform any clock conversion or packet
reordering.
Please note that the role of Tx metadata is just to pass the launch time,
not to enable the offload feature. Users will need to enable the launch
time hardware offload feature of the device by using the respective
command, such as the tc-etf command.
Although some devices use the tc-etf command to enable their launch time
hardware offload feature, xsk packets will not go through the etf qdisc.
Therefore, in my opinion, the launch time should always be based on the PTP
Hardware Clock (PHC). Thus, i did not include a clock ID to indicate the
clock source.
To simplify the test steps, I modified the xdp_hw_metadata bpf self-test
tool in such a way that it will set the launch time based on the offset
provided by the user and the value of the Receive Hardware Timestamp, which
is against the PHC. This will eliminate the need to discipline System Clock
with the PHC and then use clock_gettime() to get the time.
Please note that AF_XDP lacks a feedback mechanism to inform the
application if the requested launch time is invalid. So, users are expected
to familiar with the horizon of the launch time of the device they use and
not request a launch time that is beyond the horizon. Otherwise, the driver
might interpret the launch time incorrectly and react wrongly. For stmmac
and igc, where modulo computation is used, a launch time larger than the
horizon will cause the device to transmit the packet earlier that the
requested launch time.
Although there is no feedback mechanism for the launch time request
for now, user still can check whether the requested launch time is
working or not, by requesting the Transmit Completion Hardware Timestamp.
V9:
- Remove the igc_desc_unused() checking (Maciej)
- Ensure that skb allocation and DMA mapping work before proceeding to
fill in igc_tx_buffer info, context desc, and data desc (Maciej)
- Rate limit the error messages (Maciej)
- Update the comment to indicate that the 2 descriptors needed by the
empty frame are already taken into consideration (Maciej)
- Handle the case where the insertion of an empty frame fails and
explain the reason behind (Maciej)
- put self SOB tag as last tag (Maciej)
V8: https://lore.kernel.org/netdev/20250205024116.798862-1-yoong.siang.song@int…
- check the number of used descriptor in xsk_tx_metadata_request()
by using used_desc of struct igc_metadata_request, and then decreases
the budget with it (Maciej)
- submit another bug fix patch to set the buffer type for empty frame (Maciej):
https://lore.kernel.org/netdev/20250205023603.798819-1-yoong.siang.song@int…
V7: https://lore.kernel.org/netdev/20250204004907.789330-1-yoong.siang.song@int…
- split the refactoring code of igc empty packet insertion into a separate
commit (Faizal)
- add explanation on why the value "4" is used as igc transmit budget
(Faizal)
- perform a stress test by sending 1000 packets with 10ms interval and
launch time set to 500us in the future (Faizal & Yong Liang)
V6: https://lore.kernel.org/netdev/20250116155350.555374-1-yoong.siang.song@int…
- fix selftest build errors by using asprintf() and realloc(), instead of
managing the buffer sizes manually (Daniel, Stanislav)
V5: https://lore.kernel.org/netdev/20250114152718.120588-1-yoong.siang.song@int…
- change netdev feature name from tx-launch-time to tx-launch-time-fifo
to explicitly state the FIFO behaviour (Stanislav)
- improve the looping of xdp_hw_metadata app to wait for packet tx
completion to be more readable by using clock_gettime() (Stanislav)
- add launch time setup steps into xdp_hw_metadata app (Stanislav)
V4: https://lore.kernel.org/netdev/20250106135506.9687-1-yoong.siang.song@intel…
- added XDP launch time support to the igc driver (Jesper & Florian)
- added per-driver launch time limitation on xsk-tx-metadata.rst (Jesper)
- added explanation on FIFO behavior on xsk-tx-metadata.rst (Jakub)
- added step to enable launch time in the commit message (Jesper & Willem)
- explicitly documented the type of launch_time and which clock source
it is against (Willem)
V3: https://lore.kernel.org/netdev/20231203165129.1740512-1-yoong.siang.song@in…
- renamed to use launch time (Jesper & Willem)
- changed the default launch time in xdp_hw_metadata apps from 1s to 0.1s
because some NICs do not support such a large future time.
V2: https://lore.kernel.org/netdev/20231201062421.1074768-1-yoong.siang.song@in…
- renamed to use Earliest TxTime First (Willem)
- renamed to use txtime (Willem)
V1: https://lore.kernel.org/netdev/20231130162028.852006-1-yoong.siang.song@int…
Song Yoong Siang (5):
xsk: Add launch time hardware offload support to XDP Tx metadata
selftests/bpf: Add launch time request to xdp_hw_metadata
net: stmmac: Add launch time support to XDP ZC
igc: Refactor empty frame insertion for launch time support
igc: Add launch time support to XDP ZC
Documentation/netlink/specs/netdev.yaml | 4 +
Documentation/networking/xsk-tx-metadata.rst | 62 +++++++
drivers/net/ethernet/intel/igc/igc.h | 1 +
drivers/net/ethernet/intel/igc/igc_main.c | 141 +++++++++++----
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 +
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 13 ++
include/net/xdp_sock.h | 10 ++
include/net/xdp_sock_drv.h | 1 +
include/uapi/linux/if_xdp.h | 10 ++
include/uapi/linux/netdev.h | 3 +
net/core/netdev-genl.c | 2 +
net/xdp/xsk.c | 3 +
tools/include/uapi/linux/if_xdp.h | 10 ++
tools/include/uapi/linux/netdev.h | 3 +
tools/testing/selftests/bpf/xdp_hw_metadata.c | 168 +++++++++++++++++-
15 files changed, 394 insertions(+), 39 deletions(-)
--
2.34.1
When I implemented virtio's hash-related features to tun/tap [1],
I found tun/tap does not fill the entire region reserved for the virtio
header, leaving some uninitialized hole in the middle of the buffer
after read()/recvmesg().
This series fills the uninitialized hole. More concretely, the
num_buffers field will be initialized with 1, and the other fields will
be inialized with 0. Setting the num_buffers field to 1 is mandated by
virtio 1.0 [2].
The change to virtio header is preceded by another change that refactors
tun and tap to unify their virtio-related code.
[1]: https://lore.kernel.org/r/20241008-rss-v5-0-f3cf68df005d@daynix.com
[2]: https://lore.kernel.org/r/20241227084256-mutt-send-email-mst@kernel.org/
Signed-off-by: Akihiko Odaki <akihiko.odaki(a)daynix.com>
---
Changes in v5:
- s/vnet_hdr_len_sz/vnet_hdr_sz/ for patch "tun: Decouple vnet handling"
(Willem de Bruijn)
- Changed to inline vnet implementations to TUN and TAP.
- Dropped patch "tun: Avoid double-tracking iov_iter length changes" and
"tap: Avoid double-tracking iov_iter length changes".
- Link to v4: https://lore.kernel.org/r/20250120-tun-v4-0-ee81dda03d7f@daynix.com
Changes in v4:
- s/sz/vnet_hdr_len_sz/ for patch "tun: Decouple vnet handling"
(Willem de Bruijn)
- Reverted to add CONFIG_TUN_VNET.
- Link to v3: https://lore.kernel.org/r/20250116-tun-v3-0-c6b2871e97f7@daynix.com
Changes in v3:
- Dropped changes to fill the vnet header.
- Splitted patch "tun: Unify vnet implementation".
- Reverted spurious changes in patch "tun: Unify vnet implementation".
- Merged tun_vnet.c into TAP.
- Link to v2: https://lore.kernel.org/r/20250109-tun-v2-0-388d7d5a287a@daynix.com
Changes in v2:
- Fixed num_buffers endian.
- Link to v1: https://lore.kernel.org/r/20250108-tun-v1-0-67d784b34374@daynix.com
---
Akihiko Odaki (7):
tun: Refactor CONFIG_TUN_VNET_CROSS_LE
tun: Keep hdr_len in tun_get_user()
tun: Decouple vnet from tun_struct
tun: Decouple vnet handling
tun: Extract the vnet handling code
tap: Keep hdr_len in tap_get_user()
tap: Use tun's vnet-related code
MAINTAINERS | 2 +-
drivers/net/tap.c | 168 ++++++------------------------------------
drivers/net/tun.c | 193 ++++++-------------------------------------------
drivers/net/tun_vnet.h | 184 ++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 231 insertions(+), 316 deletions(-)
---
base-commit: a32e14f8aef69b42826cf0998b068a43d486a9e9
change-id: 20241230-tun-66e10a49b0c7
Best regards,
--
Akihiko Odaki <akihiko.odaki(a)daynix.com>
The current implementation of netconsole sends all log messages in
parallel, which can lead to an intermixed and interleaved output on the
receiving side. This makes it challenging to demultiplex the messages
and attribute them to their originating CPUs.
As a result, users and developers often struggle to effectively analyze
and debug the parallel log output received through netconsole.
Example of a message got from produciton hosts:
------------[ cut here ]------------
------------[ cut here ]------------
refcount_t: saturated; leaking memory.
WARNING: CPU: 2 PID: 1613668 at lib/refcount.c:22 refcount_warn_saturate+0x5e/0xe0
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 26 PID: 4139916 at lib/refcount.c:25 refcount_warn_saturate+0x7d/0xe0
Modules linked in: bpf_preload(E) vhost_net(E) tun(E) vhost(E)
This series of patches introduces a new feature to the netconsole
subsystem that allows the automatic population of the CPU number in the
userdata field for each log message. This enhancement provides several
benefits:
* Improved demultiplexing of parallel log output: When multiple CPUs are
sending messages concurrently, the added CPU number in the userdata
makes it easier to differentiate and attribute the messages to their
originating CPUs.
* Better visibility into message sources: The CPU number information
gives users and developers more insight into which specific CPU a
particular log message came from, which can be valuable for debugging
and analysis.
The changes in this series are as follows Patches::
Patch "consolidate send buffers into netconsole_target struct"
=================================================
Move the static buffers to netconsole target, from static declaration
in send_msg_no_fragmentation() and send_msg_fragmented().
Patch "netconsole: Rename userdata to extradata"
=================================================
Create the a concept of extradata, which encompasses the concept of
userdata and the upcoming sysdatao
Sysdata is a new concept being added, which is basically fields that are
populated by the kernel. At this time only the CPU#, but, there is a
desire to add current task name, kernel release version, etc.
Patch "netconsole: Helper to count number of used entries"
===========================================================
Create a simple helper to count number of entries in extradata. I am
separating this in a function since it will need to count userdata and
sysdata. For instance, when the user adds an extra userdata, we need to
check if there is space, counting the previous data entries (from
userdata and cpu data)
Patch "Introduce configfs helpers for sysdata features"
======================================================
Create the concept of sysdata feature in the netconsole target, and
create the configfs helpers to enable the bit in nt->sysdata
Patch "Include sysdata in extradata entry count"
================================================
Add the concept of sysdata when counting for available space in the
buffer. This will protect users from creating new userdata/sysdata if
there is no more space
Patch "netconsole: add support for sysdata and CPU population"
===============================================================
This is the core patch. Basically add a new option to enable automatic
CPU number population in the netconsole userdata Provides a new "cpu_nr"
sysfs attribute to control this feature
Patch "netconsole: selftest: test CPU number auto-population"
=============================================================
Expands the existing netconsole selftest to verify the CPU number
auto-population functionality Ensures the received netconsole messages
contain the expected "cpu=<CPU>" entry in the message. Test different
permutation with userdata
Patch "netconsole: docs: Add documentation for CPU number auto-population"
=============================================================================
Updates the netconsole documentation to explain the new CPU number
auto-population feature Provides instructions on how to enable and use
the feature
I believe these changes will be a valuable addition to the netconsole
subsystem, enhancing its usefulness for kernel developers and users.
PS: This patchset is on top of the patch that created
netcons_fragmented_msg selftest:
https://lore.kernel.org/all/20250203-netcons_frag_msgs-v1-1-5bc6bedf2ac0@de…
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Changes in v4:
- Fixed Kernel doc for netconsole_target (Simon)
- Fixed a typo in disable_sysdata_feature (Simon)
- Improved sysdata_cpu_nr_show() to return !! in a bit-wise operation
- Link to v3: https://lore.kernel.org/r/20250124-netcon_cpu-v3-0-12a0d286ba1d@debian.org
Changes in v3:
- Moved the buffer into netconsole_target, avoiding static functions in
the send path (Jakub).
- Fix a documentation error (Randy Dunlap)
- Created a function that handle all the extradata, consolidating it in
a single place (Jakub)
- Split the patch even more, trying to simplify the review.
- Link to v2: https://lore.kernel.org/r/20250115-netcon_cpu-v2-0-95971b44dc56@debian.org
Changes in v2:
- Create the concept of extradata and sysdata. This will make the design
easier to understand, and the code easier to read.
* Basically extradata encompasses userdata and the new sysdata.
Userdata originates from user, and sysdata originates in kernel.
- Improved the test to send from a very specific CPU, which can be
checked to be correct on the other side, as suggested by Jakub.
- Fixed a bug where CPU # was populated at the wrong place
- Link to v1: https://lore.kernel.org/r/20241113-netcon_cpu-v1-0-d187bf7c0321@debian.org
---
Breno Leitao (8):
netconsole: consolidate send buffers into netconsole_target struct
netconsole: Rename userdata to extradata
netconsole: Helper to count number of used entries
netconsole: Introduce configfs helpers for sysdata features
netconsole: Include sysdata in extradata entry count
netconsole: add support for sysdata and CPU population
netconsole: selftest: test for sysdata CPU
netconsole: docs: Add documentation for CPU number auto-population
Documentation/networking/netconsole.rst | 45 ++++
drivers/net/netconsole.c | 261 ++++++++++++++++-----
tools/testing/selftests/drivers/net/Makefile | 1 +
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 17 ++
.../selftests/drivers/net/netcons_sysdata.sh | 167 +++++++++++++
5 files changed, 427 insertions(+), 64 deletions(-)
---
base-commit: 145702ca2544bf82c6b244a49163f603fbe746f4
change-id: 20241108-netcon_cpu-ce3917e88f4b
Best regards,
--
Breno Leitao <leitao(a)debian.org>
Support for 32-bit s390 is very easy to implement and useful for
testing. For example I used to test some generic compat_ptr() logic,
which is only testable on 32-bit s390.
The series depends on my other series
"selftests/nolibc: test kernel configuration cleanups".
(It's not a hard dependency, only a minor diff conflict)
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Changes in v2:
- Rebase unto nolibc-next
- Use 96 bytes of stack frame size
- Pick up Ack from Willy
- Link to v1: https://lore.kernel.org/r/20250122-nolibc-s390-v1-0-8c765f00e871@weissschuh…
---
Thomas Weißschuh (2):
selftests/nolibc: rename s390 to s390x
tools/nolibc: add support for 32-bit s390
tools/include/nolibc/arch-s390.h | 5 +++++
tools/include/nolibc/arch.h | 2 +-
tools/testing/selftests/nolibc/Makefile | 10 ++++++++--
tools/testing/selftests/nolibc/run-tests.sh | 7 ++++++-
4 files changed, 20 insertions(+), 4 deletions(-)
---
base-commit: c1f4a7a84037249d086a4114c0c4332a260e9091
change-id: 20250122-nolibc-s390-e57141682c88
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
This series expands the XDP TX metadata framework to allow user
applications to pass per packet 64-bit launch time directly to the kernel
driver, requesting launch time hardware offload support. The XDP TX
metadata framework will not perform any clock conversion or packet
reordering.
Please note that the role of Tx metadata is just to pass the launch time,
not to enable the offload feature. Users will need to enable the launch
time hardware offload feature of the device by using the respective
command, such as the tc-etf command.
Although some devices use the tc-etf command to enable their launch time
hardware offload feature, xsk packets will not go through the etf qdisc.
Therefore, in my opinion, the launch time should always be based on the PTP
Hardware Clock (PHC). Thus, i did not include a clock ID to indicate the
clock source.
To simplify the test steps, I modified the xdp_hw_metadata bpf self-test
tool in such a way that it will set the launch time based on the offset
provided by the user and the value of the Receive Hardware Timestamp, which
is against the PHC. This will eliminate the need to discipline System Clock
with the PHC and then use clock_gettime() to get the time.
Please note that AF_XDP lacks a feedback mechanism to inform the
application if the requested launch time is invalid. So, users are expected
to familiar with the horizon of the launch time of the device they use and
not request a launch time that is beyond the horizon. Otherwise, the driver
might interpret the launch time incorrectly and react wrongly. For stmmac
and igc, where modulo computation is used, a launch time larger than the
horizon will cause the device to transmit the packet earlier that the
requested launch time.
Although there is no feedback mechanism for the launch time request
for now, user still can check whether the requested launch time is
working or not, by requesting the Transmit Completion Hardware Timestamp.
V8:
- check the number of used descriptor in xsk_tx_metadata_request()
by using used_desc of struct igc_metadata_request, and then decreases
the budget with it (Maciej)
V7: https://lore.kernel.org/netdev/20250204004907.789330-1-yoong.siang.song@int…
- split the refactoring code of igc empty packet insertion into a separate
commit (Faizal)
- add explanation on why the value "4" is used as igc transmit budget
(Faizal)
- perform a stress test by sending 1000 packets with 10ms interval and
launch time set to 500us in the future (Faizal & Yong Liang)
V6: https://lore.kernel.org/netdev/20250116155350.555374-1-yoong.siang.song@int…
- fix selftest build errors by using asprintf() and realloc(), instead of
managing the buffer sizes manually (Daniel, Stanislav)
V5: https://lore.kernel.org/netdev/20250114152718.120588-1-yoong.siang.song@int…
- change netdev feature name from tx-launch-time to tx-launch-time-fifo
to explicitly state the FIFO behaviour (Stanislav)
- improve the looping of xdp_hw_metadata app to wait for packet tx
completion to be more readable by using clock_gettime() (Stanislav)
- add launch time setup steps into xdp_hw_metadata app (Stanislav)
V4: https://lore.kernel.org/netdev/20250106135506.9687-1-yoong.siang.song@intel…
- added XDP launch time support to the igc driver (Jesper & Florian)
- added per-driver launch time limitation on xsk-tx-metadata.rst (Jesper)
- added explanation on FIFO behavior on xsk-tx-metadata.rst (Jakub)
- added step to enable launch time in the commit message (Jesper & Willem)
- explicitly documented the type of launch_time and which clock source
it is against (Willem)
V3: https://lore.kernel.org/netdev/20231203165129.1740512-1-yoong.siang.song@in…
- renamed to use launch time (Jesper & Willem)
- changed the default launch time in xdp_hw_metadata apps from 1s to 0.1s
because some NICs do not support such a large future time.
V2: https://lore.kernel.org/netdev/20231201062421.1074768-1-yoong.siang.song@in…
- renamed to use Earliest TxTime First (Willem)
- renamed to use txtime (Willem)
V1: https://lore.kernel.org/netdev/20231130162028.852006-1-yoong.siang.song@int…
Song Yoong Siang (5):
xsk: Add launch time hardware offload support to XDP Tx metadata
selftests/bpf: Add launch time request to xdp_hw_metadata
net: stmmac: Add launch time support to XDP ZC
igc: Refactor empty packet insertion into a reusable function
igc: Add launch time support to XDP ZC
Documentation/netlink/specs/netdev.yaml | 4 +
Documentation/networking/xsk-tx-metadata.rst | 62 +++++++
drivers/net/ethernet/intel/igc/igc.h | 1 +
drivers/net/ethernet/intel/igc/igc_main.c | 94 +++++++---
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 +
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 13 ++
include/net/xdp_sock.h | 10 ++
include/net/xdp_sock_drv.h | 1 +
include/uapi/linux/if_xdp.h | 10 ++
include/uapi/linux/netdev.h | 3 +
net/core/netdev-genl.c | 2 +
net/xdp/xsk.c | 3 +
tools/include/uapi/linux/if_xdp.h | 10 ++
tools/include/uapi/linux/netdev.h | 3 +
tools/testing/selftests/bpf/xdp_hw_metadata.c | 168 +++++++++++++++++-
15 files changed, 359 insertions(+), 27 deletions(-)
--
2.34.1