This patch adds a new robust_list() syscall. The current syscall
can't be expanded to cover the following use case, so a new one is
needed. This new syscall allows users to set multiple robust lists per
process and to have either 32bit or 64bit pointers in the list.
* Use case
FEX-Emu[1] is an application that runs x86 and x86-64 binaries on an
AArch64 Linux host. One of the tasks of FEX-Emu is to translate syscalls
from one platform to another. Existing set_robust_list() can't be easily
translated because of two limitations:
1) x86 apps can have 32bit pointers robust lists. For a x86-64 kernel
this is not a problem, because of the compat entry point. But there's
no such compat entry point for AArch64, so the kernel would do the
pointer arithmetic wrongly. Is also unviable to userspace to keep
track every addition/removal to the robust list and keep a 64bit
version of it somewhere else to feed the kernel. Thus, the new
interface has an option of telling the kernel if the list is filled
with 32bit or 64bit pointers.
2) Apps can set just one robust list (in theory, x86-64 can set two if
they also use the compat entry point). That means that when a x86 app
asks FEX-Emu to call set_robust_list(), FEX have two options: to
overwrite their own robust list pointer and make the app robust, or
to ignore the app robust list and keep the emulator robust. The new
interface allows for multiple robust lists per application, solving
this.
* Interface
This is the proposed interface:
long set_robust_list2(void *head, int index, unsigned int flags)
`head` is the head of the userspace struct robust_list_head, just as old
set_robust_list(). It needs to be a void pointer since it can point to a normal
robust_list_head or a compat_robust_list_head.
`flags` can be used for defining the list type:
enum robust_list_type {
ROBUST_LIST_32BIT,
ROBUST_LIST_64BIT,
};
`index` is the index in the internal robust_list's linked list (the naming
starts to get confusing, I reckon). If `index == -1`, that means that user wants
to set a new robust_list, and the kernel will append it in the end of the list,
assign a new index and return this index to the user. If `index >= 0`, that
means that user wants to re-set `*head` of an already existing list (similarly
to what happens when you call set_robust_list() twice with different `*head`).
If `index` is out of range, or it points to a non-existing robust_list, or if
the internal list is full, an error is returned.
* Implementation
The implementation re-uses most of the existing robust list interface as
possible. The new task_struct member `struct list_head robust_list2` is just a
linked list where new lists are appended as the user requests more lists, and by
futex_cleanup(), the kernel walks through the internal list feeding
exit_robust_list() with the robust_list's.
This implementation supports up to 10 lists (defined at ROBUST_LISTS_PER_TASK),
but it was an arbitrary number for this RFC. For the described use case above, 4
should be enough, I'm not sure which should be the limit.
It doesn't support list removal (should it support?). It doesn't have a proper
get_robust_list2() yet as well, but I can add it in a next revision. We could
also have a generic robust_list() syscall that can be used to set/get and be
controlled by flags.
The new interface has a `unsigned int flags` argument, making it
extensible for future use cases as well.
It refuses unaligned `head` addresses. It doesn't have a limit for elements in a
single list (like ROBUST_LIST_LIMIT), it destroys the list as it is parsed to be
safe against circular lists.
* Testing
This patcheset has a selftest patch that expands this one:
https://lore.kernel.org/lkml/20250212131123.37431-1-andrealmeid@igalia.com/
Also, FEX-Emu added support for this interface to validate it:
https://github.com/FEX-Emu/FEX/pull/3966
Feedback is very welcomed!
Thanks,
André
[1] https://github.com/FEX-Emu/FEX
Changelog:
- Fixed compilation issues when CONFIG_COMPAT or CONFIG_FUTEX are not
set
- Rebased on top of new futex work (private hash)
v4: https://lore.kernel.org/lkml/20250225183531.682556-1-andrealmeid@igalia.com/
- Refuse unaligned head pointers
- Ignore ROBUST_LIST_LIMIT for lists created with this interface and make it
robust against circular lists
- Fix a get_robust_list() syscall bug for getting the list from another thread
- Adapt selftest to use the new interface
v3: https://lore.kernel.org/lkml/20241217174958.477692-1-andrealmeid@igalia.com/
- Old syscall set_robust_list() adds new head to the internal linked list of
robust lists pointers, instead of having a field just for them. Remove
tsk->robust_list and use only tsk->robust_list2
v2: https://lore.kernel.org/lkml/20241101162147.284993-1-andrealmeid@igalia.com/
- Added a patch to properly deal with exit_robust_list() in 64bit vs 32bit
- Wired-up syscall for all archs
- Added more of the cover letter to the commit message
v1: https://lore.kernel.org/lkml/20241024145735.162090-1-andrealmeid@igalia.com/
---
André Almeida (7):
selftests/futex: Add ASSERT_ macros
selftests/futex: Create test for robust list
futex: Use explicit sizes for compat_exit_robust_list
futex: Create set_robust_list2
futex: Remove the limit of elements for sys_set_robust_list2 lists
futex: Wire up set_robust_list2 syscall
selftests: futex: Expand robust list test for the new interface
arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm/tools/syscall.tbl | 1 +
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
include/linux/compat.h | 12 +-
include/linux/futex.h | 30 +-
include/linux/sched.h | 5 +-
include/uapi/asm-generic/unistd.h | 2 +
include/uapi/linux/futex.h | 10 +
kernel/futex/core.c | 156 ++++-
kernel/futex/futex.h | 5 +
kernel/futex/syscalls.c | 85 ++-
kernel/sys_ni.c | 1 +
scripts/syscall.tbl | 1 +
.../testing/selftests/futex/functional/.gitignore | 1 +
tools/testing/selftests/futex/functional/Makefile | 3 +-
.../selftests/futex/functional/robust_list.c | 706 +++++++++++++++++++++
tools/testing/selftests/futex/include/logging.h | 38 ++
29 files changed, 1021 insertions(+), 49 deletions(-)
---
base-commit: a24cc6ce1933eade12aa2b9859de0fcd2dac2c06
change-id: 20250225-tonyk-robust_futex-60adeedac695
Best regards,
--
André Almeida <andrealmeid(a)igalia.com>
When compiling the RTC library functions test as a module, the module
has the non-descriptive name "lib_test.ko". Fix this by adding the
subsystem's name as a prefix.
Signed-off-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
---
drivers/rtc/Makefile | 2 +-
drivers/rtc/{lib_test.c => rtc_lib_test.c} | 0
2 files changed, 1 insertion(+), 1 deletion(-)
rename drivers/rtc/{lib_test.c => rtc_lib_test.c} (100%)
diff --git a/drivers/rtc/Makefile b/drivers/rtc/Makefile
index 4619aa2ac4697591..aa08d1c8a32ec23d 100644
--- a/drivers/rtc/Makefile
+++ b/drivers/rtc/Makefile
@@ -15,7 +15,7 @@ rtc-core-$(CONFIG_RTC_INTF_DEV) += dev.o
rtc-core-$(CONFIG_RTC_INTF_PROC) += proc.o
rtc-core-$(CONFIG_RTC_INTF_SYSFS) += sysfs.o
-obj-$(CONFIG_RTC_LIB_KUNIT_TEST) += lib_test.o
+obj-$(CONFIG_RTC_LIB_KUNIT_TEST) += rtc_lib_test.o
# Keep the list ordered.
diff --git a/drivers/rtc/lib_test.c b/drivers/rtc/rtc_lib_test.c
similarity index 100%
rename from drivers/rtc/lib_test.c
rename to drivers/rtc/rtc_lib_test.c
--
2.43.0
From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>
Hello,
Please find the v9 AccECN protocol patch series, which covers the core
functionality of Accurate ECN, AccECN negotiation, AccECN TCP options,
and AccECN failure handling. The Accurate ECN draft can be found in
https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-accurate-ecn-28
This patch series is part of the full AccECN patch series, which is available at
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/
Best Regards,
Chia-Yu
---
v9 (21-Jun-2025)
- Use tcp_data_ecn_check() to set TCP_ECN_SEE flag only for RFC3168 ECN (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments about setting TCP_ECN_SEEN flag for RFC3168 and Accruate ECN (Paolo Abeni <pabeni(a)redhat.com>)
- Restruct the code in the for loop of tcp_accecn_process_option() (Paolo Abeni <pabeni(a)redhat.com>)
- Remove ecn_bytes and add use_synack_ecn_bytes flag to identify whether syn_ack_bytes or received_ecn_bytes is used (Paolo Abeni <pabeni(a)redhat.com>)
- Replace leftover_bytes and leftover_size with leftover_highbyte and leftover_lowbyte and add comments in tcp_options_write() (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments and commit message about the 1st retx SYN still attempt AccECN negotiation (Paolo Abeni <pabeni(a)redhat.com>)
v8 (10-Jun-2025)
- Add new helper function tcp_ecn_received_counters_payload() in #6 (Paolo Abeni <pabeni(a)redhat.com>)
- Set opts->num_sack_blocks=0 to avoid potential undefined value in #8 (Paolo Abeni <pabeni(a)redhat.com>)
- Reset leftover_size to 2 once leftover_bytes is used in #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Add new helper function tcp_accecn_opt_demand_min() in #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Add new helper function tcp_accecn_saw_opt_fail_recv() in #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Update tcp_options_fit_accecn() to avoid using recursion in #14 (Paolo Abeni <pabeni(a)redhat.com>)
v7 (14-May-2025)
- Modify group sizes of tcp_sock_write_txrx and tcp_sock_write_rx in #3 based on pahole results (Paolo Abeni <pabeni(a)redhat.com>)
- Fix the issue in #4 and #5 where the RFC3168 ECN behavior in tcp_ecn_send() is changed (Paolo Abeni <pabeni(a)redhat.com>)
- Modify group size of tcp_sock_write_txrx in #4 and #6 based on pahole results (Paolo Abeni <pabeni(a)redhat.com>)
- Update commit message for #9 to explain the increase in tcp_sock_write_rx group size
- Modify group size of tcp_sock_write_tx in #10 based on pahole results
v6 (09-May-2025)
- Add #3 to utilize exisintg holes of tcp_sock_write_txrx group for later patches (#4, #9, #10) with new u8 members (Paolo Abeni <pabeni(a)redhat.com>)
- Add pahole outcomes before and after commit in #4, #5, #6, #9, #10, #15 (Paolo Abeni <pabeni(a)redhat.com>)
- Define new helper function tcp_send_ack_reflect_ect() for sending ACK with reflected ECT in #5 (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments for function tcp_ecn_rcv_synack() in #5 (Paolo Abeni <pabeni(a)redhat.com>)
- Add enum/define to be used by sysctl_tcp_ecn in #5, sysctl_tcp_ecn_option in #9, and sysctl_tcp_ecn_option_beacon in #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Move accecn_fail_mode and saw_accecn_opt in #5 and #11 to use exisintg holes of tcp_sock (Paolo Abeni <pabeni(a)redhat.com>)
- Change data type of new members of tcp_request_sock and move them to the end of struct in #5 and #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Move new members of tcp_info to the end of struct in #6 (Paolo Abeni <pabeni(a)redhat.com>)
- Merge previous #7 into #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Mask ecnfield with INET_ECN_MASK to remove WARN_ONCE in #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Reduce the indentation levels for reabability in #9 and #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Move delivered_ecn_bytes to the RX group in #9, accecn_opt_tstamp to the TX group in #10, pkts_acked_ewma to the RX group in #15 (Paolo Abeni <pabeni(a)redhat.com>)
- Add changes in Documentation/networking/net_cachelines/tcp_sock.rst for new tcp_sock members in #3, #5, #6, #9, #10, #15
v5 (22-Apr-2025)
- Further fix for 32-bit ARM alignment in tcp.c (Simon Horman <horms(a)kernel.org>)
v4 (18-Apr-2025)
- Fix 32-bit ARM assertion for alignment requirement (Simon Horman <horms(a)kernel.org>)
v3 (14-Apr-2025)
- Fix patch apply issue in v2 (Jakub Kicinski <kuba(a)kernel.org>)
v2 (18-Mar-2025)
- Add one missing patch from the previous AccECN protocol preparation patch series to this patch series.
---
Chia-Yu Chang (3):
tcp: reorganize tcp_sock_write_txrx group for variables later
tcp: accecn: AccECN option failure handling
tcp: accecn: try to fit AccECN option with SACK
Ilpo Järvinen (12):
tcp: reorganize SYN ECN code
tcp: fast path functions later
tcp: AccECN core
tcp: accecn: AccECN negotiation
tcp: accecn: add AccECN rx byte counters
tcp: accecn: AccECN needs to know delivered bytes
tcp: sack option handling improvements
tcp: accecn: AccECN option
tcp: accecn: AccECN option send control
tcp: accecn: AccECN option ceb/cep heuristic
tcp: accecn: AccECN ACE field multi-wrap heuristic
tcp: try to avoid safer when ACKs are thinned
.../networking/net_cachelines/tcp_sock.rst | 14 +
include/linux/tcp.h | 34 +-
include/net/netns/ipv4.h | 2 +
include/net/tcp.h | 225 ++++++-
include/uapi/linux/tcp.h | 7 +
net/ipv4/syncookies.c | 3 +
net/ipv4/sysctl_net_ipv4.c | 19 +
net/ipv4/tcp.c | 30 +-
net/ipv4/tcp_input.c | 622 +++++++++++++++++-
net/ipv4/tcp_ipv4.c | 7 +-
net/ipv4/tcp_minisocks.c | 91 ++-
net/ipv4/tcp_output.c | 325 ++++++++-
net/ipv6/syncookies.c | 1 +
net/ipv6/tcp_ipv6.c | 1 +
14 files changed, 1285 insertions(+), 96 deletions(-)
--
2.34.1
I am submitting a new selftest for the netpoll subsystem specifically
targeting the case where the RX is polling in the TX path, which is
a case that we don't have any test in the tree today. This is done when
netpoll_poll_dev() called, and this test creates a scenario when that is
probably.
The test does the following:
1) Configuring a single RX/TX queue to increase contention on the
interface.
2) Generating background traffic to saturate the network, mimicking
real-world congestion.
3) Sending netconsole messages to trigger netpoll polling and monitor
its behavior.
4) Using dynamic netconsole targets via configfs, with the ability to
delete and recreate targets during the test.
5) Running bpftrace in parallel to verify that netpoll_poll_dev() is
called when expected. If it is called, then the test passes,
otherwise the test is marked as skipped.
In order to achieve it, I stole Jakub's bpftrace helper from [1], and
did some small changes that I found useful to use the helper.
So, this patchset basically contains:
1) The code stolen from Jakub
2) Improvements on bpftrace() helper
3) The selftest itself
Link: https://lore.kernel.org/all/20250421222827.283737-22-kuba@kernel.org/ [1]
---
Changes in v3:
- Make pylint happy (Simon)
- Remove the unnecessary patch in bpftrace to raise an exception when it
fails. (Jakub)
- Improved the bpftrace code (Willem)
- Stop sending messages if bpftrace is not alive anymore.
- Link to v2: https://lore.kernel.org/r/20250625-netpoll_test-v2-0-47d27775222c@debian.org
Changes in v2:
- Stole Jakub's helper to run bpftrace
- Removed the DEBUG option and moved logs to logging
- Change the code to have a higher chance of calling netpoll_poll_dev().
In my current configuration, it is hitting multiple times during the
test.
- Save and restore TX/RX queue size (Jakub)
- Link to v1: https://lore.kernel.org/r/20250620-netpoll_test-v1-1-5068832f72fc@debian.org
---
Breno Leitao (2):
selftests: drv-net: Strip '@' prefix from bpftrace map keys
selftests: net: add netpoll basic functionality test
Jakub Kicinski (1):
selftests: drv-net: add helper/wrapper for bpftrace
tools/testing/selftests/drivers/net/Makefile | 1 +
.../selftests/drivers/net/lib/py/__init__.py | 3 +-
.../testing/selftests/drivers/net/netpoll_basic.py | 345 +++++++++++++++++++++
tools/testing/selftests/net/lib/py/utils.py | 35 +++
4 files changed, 383 insertions(+), 1 deletion(-)
---
base-commit: 8efa26fcbf8a7f783fd1ce7dd2a409e9b7758df0
change-id: 20250612-netpoll_test-a1324d2057c8
Best regards,
--
Breno Leitao <leitao(a)debian.org>
Fix a typo:
instaces -> instances
The typo has been identified using codespell, and the tool does not
report any additional issues in the selftests considered.
Signed-off-by: Andrea Mayer <andrea.mayer(a)uniroma2.it>
---
tools/testing/selftests/net/srv6_end_next_csid_l3vpn_test.sh | 2 +-
tools/testing/selftests/net/srv6_end_x_next_csid_l3vpn_test.sh | 2 +-
tools/testing/selftests/net/srv6_hencap_red_l3vpn_test.sh | 2 +-
tools/testing/selftests/net/srv6_hl2encap_red_l2vpn_test.sh | 2 +-
4 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/net/srv6_end_next_csid_l3vpn_test.sh b/tools/testing/selftests/net/srv6_end_next_csid_l3vpn_test.sh
index ba730655a7bf..4bc135e5c22c 100755
--- a/tools/testing/selftests/net/srv6_end_next_csid_l3vpn_test.sh
+++ b/tools/testing/selftests/net/srv6_end_next_csid_l3vpn_test.sh
@@ -594,7 +594,7 @@ setup_rt_local_sids()
dev "${DUMMY_DEVNAME}"
# all SIDs for VPNs start with a common locator. Routes and SRv6
- # Endpoint behavior instaces are grouped together in the 'localsid'
+ # Endpoint behavior instances are grouped together in the 'localsid'
# table.
ip -netns "${nsname}" -6 rule \
add to "${VPN_LOCATOR_SERVICE}::/16" \
diff --git a/tools/testing/selftests/net/srv6_end_x_next_csid_l3vpn_test.sh b/tools/testing/selftests/net/srv6_end_x_next_csid_l3vpn_test.sh
index bedf0ce885c2..34b781a2ae74 100755
--- a/tools/testing/selftests/net/srv6_end_x_next_csid_l3vpn_test.sh
+++ b/tools/testing/selftests/net/srv6_end_x_next_csid_l3vpn_test.sh
@@ -681,7 +681,7 @@ setup_rt_local_sids()
set_underlay_sids_reachability "${rt}" "${rt_neighs}"
# all SIDs for VPNs start with a common locator. Routes and SRv6
- # Endpoint behavior instaces are grouped together in the 'localsid'
+ # Endpoint behavior instances are grouped together in the 'localsid'
# table.
ip -netns "${nsname}" -6 rule \
add to "${VPN_LOCATOR_SERVICE}::/16" \
diff --git a/tools/testing/selftests/net/srv6_hencap_red_l3vpn_test.sh b/tools/testing/selftests/net/srv6_hencap_red_l3vpn_test.sh
index 3efce1718c5f..6a68c7eff1dc 100755
--- a/tools/testing/selftests/net/srv6_hencap_red_l3vpn_test.sh
+++ b/tools/testing/selftests/net/srv6_hencap_red_l3vpn_test.sh
@@ -395,7 +395,7 @@ setup_rt_local_sids()
dev "${VRF_DEVNAME}"
# all SIDs for VPNs start with a common locator. Routes and SRv6
- # Endpoint behavior instaces are grouped together in the 'localsid'
+ # Endpoint behavior instances are grouped together in the 'localsid'
# table.
ip -netns "${nsname}" -6 rule \
add to "${VPN_LOCATOR_SERVICE}::/16" \
diff --git a/tools/testing/selftests/net/srv6_hl2encap_red_l2vpn_test.sh b/tools/testing/selftests/net/srv6_hl2encap_red_l2vpn_test.sh
index cabc70538ffe..0979b5316fdf 100755
--- a/tools/testing/selftests/net/srv6_hl2encap_red_l2vpn_test.sh
+++ b/tools/testing/selftests/net/srv6_hl2encap_red_l2vpn_test.sh
@@ -343,7 +343,7 @@ setup_rt_local_sids()
encap seg6local action End dev "${DUMMY_DEVNAME}"
# all SIDs for VPNs start with a common locator. Routes and SRv6
- # Endpoint behaviors instaces are grouped together in the 'localsid'
+ # Endpoint behaviors instances are grouped together in the 'localsid'
# table.
ip -netns "${nsname}" -6 rule add \
to "${VPN_LOCATOR_SERVICE}::/16" \
--
2.20.1