For a while now we have supported file handles for pidfds. This has
proven to be very useful.
Extend the concept to cover namespaces as well. After this patchset it
is possible to encode and decode namespace file handles using the
commong name_to_handle_at() and open_by_handle_at() apis.
Namespaces file descriptors can already be derived from pidfds which
means they aren't subject to overmount protection bugs. IOW, it's
irrelevant if the caller would not have access to an appropriate
/proc/<pid>/ns/ directory as they could always just derive the namespace
based on a pidfd already.
It has the same advantage as pidfds. It's possible to reliably and for
the lifetime of the system refer to a namespace without pinning any
resources and to compare them.
Permission checking is kept simple. If the caller is located in the
namespace the file handle refers to they are able to open it otherwise
they must hold privilege over the owning namespace of the relevant
namespace.
Both the network namespace and the mount namespace already have an
associated cookie that isn't recycled and is fully exposed to userspace.
Move this into ns_common and use the same id space for all namespaces so
they can trivially and reliably be compared.
There's more coming based on the iterator infrastructure but the series
is large enough and focuses on file handles.
Extensive selftests included. I still have various other test-suites to
run but it holds up so far.
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
---
Christian Brauner (32):
pidfs: validate extensible ioctls
nsfs: validate extensible ioctls
block: use extensible_ioctl_valid()
ns: move to_ns_common() to ns_common.h
nsfs: add nsfs.h header
ns: uniformly initialize ns_common
mnt: use ns_common_init()
ipc: use ns_common_init()
cgroup: use ns_common_init()
pid: use ns_common_init()
time: use ns_common_init()
uts: use ns_common_init()
user: use ns_common_init()
net: use ns_common_init()
ns: remove ns_alloc_inum()
nstree: make iterator generic
mnt: support iterator
cgroup: support iterator
ipc: support iterator
net: support iterator
pid: support iterator
time: support iterator
userns: support iterator
uts: support iterator
ns: add to_<type>_ns() to respective headers
nsfs: add current_in_namespace()
nsfs: support file handles
nsfs: support exhaustive file handles
nsfs: add missing id retrieval support
tools: update nsfs.h uapi header
selftests/namespaces: add identifier selftests
selftests/namespaces: add file handle selftests
block/blk-integrity.c | 8 +-
fs/fhandle.c | 6 +
fs/internal.h | 1 +
fs/mount.h | 10 +-
fs/namespace.c | 156 +--
fs/nsfs.c | 266 +++-
fs/pidfs.c | 2 +-
include/linux/cgroup.h | 5 +
include/linux/exportfs.h | 6 +
include/linux/fs.h | 14 +
include/linux/ipc_namespace.h | 5 +
include/linux/ns_common.h | 29 +
include/linux/nsfs.h | 40 +
include/linux/nsproxy.h | 11 -
include/linux/nstree.h | 89 ++
include/linux/pid_namespace.h | 5 +
include/linux/proc_ns.h | 32 +-
include/linux/time_namespace.h | 9 +
include/linux/user_namespace.h | 5 +
include/linux/utsname.h | 5 +
include/net/net_namespace.h | 6 +
include/uapi/linux/fcntl.h | 1 +
include/uapi/linux/nsfs.h | 12 +-
init/main.c | 2 +
ipc/msgutil.c | 1 +
ipc/namespace.c | 12 +-
ipc/shm.c | 2 +
kernel/Makefile | 2 +-
kernel/cgroup/cgroup.c | 2 +
kernel/cgroup/namespace.c | 24 +-
kernel/nstree.c | 233 ++++
kernel/pid_namespace.c | 13 +-
kernel/time/namespace.c | 23 +-
kernel/user_namespace.c | 17 +-
kernel/utsname.c | 28 +-
net/core/net_namespace.c | 59 +-
tools/include/uapi/linux/nsfs.h | 23 +-
tools/testing/selftests/namespaces/.gitignore | 2 +
tools/testing/selftests/namespaces/Makefile | 7 +
tools/testing/selftests/namespaces/config | 7 +
.../selftests/namespaces/file_handle_test.c | 1410 ++++++++++++++++++++
tools/testing/selftests/namespaces/nsid_test.c | 986 ++++++++++++++
42 files changed, 3306 insertions(+), 270 deletions(-)
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250905-work-namespace-c68826dda0d4
This series improves the CPU cost of RX token management by adding an
attribute to NETDEV_CMD_BIND_RX that configures sockets using the
binding to avoid the xarray allocator and instead use a per-binding niov
array and a uref field in niov.
Improvement is ~13% cpu util per RX user thread.
Using kperf, the following results were observed:
Before:
Average RX worker idle %: 13.13, flows 4, test runs 11
After:
Average RX worker idle %: 26.32, flows 4, test runs 11
Two other approaches were tested, but with no improvement. Namely, 1)
using a hashmap for tokens and 2) keeping an xarray of atomic counters
but using RCU so that the hotpath could be mostly lockless. Neither of
these approaches proved better than the simple array in terms of CPU.
The attribute NETDEV_A_DMABUF_AUTORELEASE is added to toggle the
optimization. It is an optional attribute and defaults to 0 (i.e.,
optimization on).
To: David S. Miller <davem(a)davemloft.net>
To: Eric Dumazet <edumazet(a)google.com>
To: Jakub Kicinski <kuba(a)kernel.org>
To: Paolo Abeni <pabeni(a)redhat.com>
To: Simon Horman <horms(a)kernel.org>
To: Kuniyuki Iwashima <kuniyu(a)google.com>
To: Willem de Bruijn <willemb(a)google.com>
To: Neal Cardwell <ncardwell(a)google.com>
To: David Ahern <dsahern(a)kernel.org>
To: Mina Almasry <almasrymina(a)google.com>
To: Arnd Bergmann <arnd(a)arndb.de>
To: Jonathan Corbet <corbet(a)lwn.net>
To: Andrew Lunn <andrew+netdev(a)lunn.ch>
To: Shuah Khan <shuah(a)kernel.org>
Cc: Stanislav Fomichev <sdf(a)fomichev.me>
Cc: netdev(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-arch(a)vger.kernel.org
Cc: linux-doc(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Signed-off-by: Bobby Eshleman <bobbyeshleman(a)meta.com>
Changes in v7:
- use netlink instead of sockopt (Stan)
- restrict system to only one mode, dmabuf bindings can not co-exist
with different modes (Stan)
- use static branching to enforce single system-wide mode (Stan)
- Link to v6: https://lore.kernel.org/r/20251104-scratch-bobbyeshleman-devmem-tcp-token-u…
Changes in v6:
- renamed 'net: devmem: use niov array for token management' to refer to
optionality of new config
- added documentation and tests
- make autorelease flag per-socket sockopt instead of binding
field / sysctl
- many per-patch changes (see Changes sections per-patch)
- Link to v5: https://lore.kernel.org/r/20251023-scratch-bobbyeshleman-devmem-tcp-token-u…
Changes in v5:
- add sysctl to opt-out of performance benefit, back to old token release
- Link to v4: https://lore.kernel.org/all/20250926-scratch-bobbyeshleman-devmem-tcp-token…
Changes in v4:
- rebase to net-next
- Link to v3: https://lore.kernel.org/r/20250926-scratch-bobbyeshleman-devmem-tcp-token-u…
Changes in v3:
- make urefs per-binding instead of per-socket, reducing memory
footprint
- fallback to cleaning up references in dmabuf unbind if socket
leaked tokens
- drop ethtool patch
- Link to v2: https://lore.kernel.org/r/20250911-scratch-bobbyeshleman-devmem-tcp-token-u…
Changes in v2:
- net: ethtool: prevent user from breaking devmem single-binding rule
(Mina)
- pre-assign niovs in binding->vec for RX case (Mina)
- remove WARNs on invalid user input (Mina)
- remove extraneous binding ref get (Mina)
- remove WARN for changed binding (Mina)
- always use GFP_ZERO for binding->vec (Mina)
- fix length of alloc for urefs
- use atomic_set(, 0) to initialize sk_user_frags.urefs
- Link to v1: https://lore.kernel.org/r/20250902-scratch-bobbyeshleman-devmem-tcp-token-u…
---
Bobby Eshleman (5):
net: devmem: rename tx_vec to vec in dmabuf binding
net: devmem: refactor sock_devmem_dontneed for autorelease split
net: devmem: implement autorelease token management
net: devmem: document NETDEV_A_DMABUF_AUTORELEASE netlink attribute
selftests: drv-net: devmem: add autorelease tests
Documentation/netlink/specs/netdev.yaml | 12 +++
Documentation/networking/devmem.rst | 70 +++++++++++++
include/net/netmem.h | 1 +
include/net/sock.h | 7 +-
include/uapi/linux/netdev.h | 1 +
net/core/devmem.c | 121 ++++++++++++++++++----
net/core/devmem.h | 13 ++-
net/core/netdev-genl-gen.c | 5 +-
net/core/netdev-genl.c | 13 ++-
net/core/sock.c | 103 ++++++++++++++----
net/ipv4/tcp.c | 78 +++++++++++---
net/ipv4/tcp_ipv4.c | 13 ++-
net/ipv4/tcp_minisocks.c | 3 +-
tools/include/uapi/linux/netdev.h | 1 +
tools/testing/selftests/drivers/net/hw/devmem.py | 22 +++-
tools/testing/selftests/drivers/net/hw/ncdevmem.c | 19 ++--
16 files changed, 401 insertions(+), 81 deletions(-)
---
base-commit: 4c52142904b33b41c3ff7ee58670b4e3b3bf1120
change-id: 20250829-scratch-bobbyeshleman-devmem-tcp-token-upstream-292be174d503
Best regards,
--
Bobby Eshleman <bobbyeshleman(a)meta.com>
From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>
Hello,
Plesae find the v5 AccECN case handling patch series, which covers
several excpetional case handling of Accurate ECN spec (RFC9768),
adds new identifiers to be used by CC modules, adds ecn_delta into
rate_sample, and keeps the ACE counter for computation, etc.
This patch series is part of the full AccECN patch series, which is available at
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/
Best regards,
Chia-Yu
---
v6:
- Update comment in #3 to highlight RX path is only used for virtio-net (Paolo Abeni <pabeni(a)redhat.com>)
- Rename TCP_CONG_WANTS_ECT_1 to TCP_CONG_ECT_1_NEGOTIATION to distiguish from TCP_CONG_ECT_1_ESTABLISH (Paolo Abeni <pabeni(a)redhat.com>)
- Move TCP_CONG_ECT_1_ESTABLISH in #6 to latter patch series (Paolo Abeni <pabeni(a)redhat.com>)
- Add new synack_type instead of moving the increment of num_retran in #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Use new synack_type TCP_SYNACK_RETRANS and num_retrans for SYN/ACK retx fallbackk for AccECN in #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Do not cast const struct into non-const in #11, and set AccECN fail mode after tcp_rtx_synack() (Paolo Abeni <pabeni(a)redhat.com>)
v5:
- Move previous #11 in v4 in latter patch after discussion with RFC author.
- Add #3 to update the comments for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN. (Parav Pandit <parav(a)nvidia.com>)
- Add gro self-test for TCP CWR flag in #4. (Eric Dumazet <edumazet(a)google.com>)
- Add fixes: tag into #7 (Paolo Abeni <pabeni(a)redhat.com>)
- Update commit message of #8 and if condition check (Paolo Abeni <pabeni(a)redhat.com>)
- Add empty line between variable declarations and code in #13 (Paolo Abeni <pabeni(a)redhat.com>)
v4:
- Add previous #13 in v2 back after dicussion with the RFC author.
- Add TCP_ACCECN_OPTION_PERSIST to tcp_ecn_option sysctl to ignore AccECN fallback policy on sending AccECN option.
v3:
- Add additional min() check if pkts_acked_ewma is not initialized in #1. (Paolo Abeni <pabeni(a)redhat.com>)
- Change TCP_CONG_WANTS_ECT_1 into individual flag add helper function INET_ECN_xmit_wants_ect_1() in #3. (Paolo Abeni <pabeni(a)redhat.com>)
- Add empty line between variable declarations and code in #4. (Paolo Abeni <pabeni(a)redhat.com>)
- Update commit message to fix old AccECN commits in #5. (Paolo Abeni <pabeni(a)redhat.com>)
- Remove unnecessary brackets in #10. (Paolo Abeni <pabeni(a)redhat.com>)
- Move patch #3 in v2 to a later Prague patch serise and remove patch #13 in v2. (Paolo Abeni <pabeni(a)redhat.com>)
---
Chia-Yu Chang (12):
net: update commnets for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN
selftests/net: gro: add self-test for TCP CWR flag
tcp: ECT_1_NEGOTIATION and NEEDS_ACCECN identifiers
tcp: disable RFC3168 fallback identifier for CC modules
tcp: accecn: handle unexpected AccECN negotiation feedback
tcp: accecn: retransmit downgraded SYN in AccECN negotiation
tcp: add TCP_SYNACK_RETRANS synack_type
tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN
SYN/ACK
tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion
tcp: accecn: fallback outgoing half link to non-AccECN
tcp: accecn: detect loss ACK w/ AccECN option and add
TCP_ACCECN_OPTION_PERSIST
tcp: accecn: enable AccECN
Ilpo Järvinen (2):
tcp: try to avoid safer when ACKs are thinned
gro: flushing when CWR is set negatively affects AccECN
Documentation/networking/ip-sysctl.rst | 4 +-
.../networking/net_cachelines/tcp_sock.rst | 1 +
include/linux/skbuff.h | 14 ++-
include/linux/tcp.h | 4 +-
include/net/inet_ecn.h | 20 +++-
include/net/tcp.h | 32 ++++++-
include/net/tcp_ecn.h | 92 ++++++++++++++-----
net/ipv4/inet_connection_sock.c | 4 +
net/ipv4/sysctl_net_ipv4.c | 4 +-
net/ipv4/tcp.c | 2 +
net/ipv4/tcp_cong.c | 5 +-
net/ipv4/tcp_input.c | 37 +++++++-
net/ipv4/tcp_minisocks.c | 46 +++++++---
net/ipv4/tcp_offload.c | 3 +-
net/ipv4/tcp_output.c | 32 ++++---
net/ipv4/tcp_timer.c | 3 +
tools/testing/selftests/net/gro.c | 80 +++++++++++-----
17 files changed, 295 insertions(+), 88 deletions(-)
--
2.34.1
This patch set introduces the BPF_F_CPU and BPF_F_ALL_CPUS flags for
percpu maps, as the requirement of BPF_F_ALL_CPUS flag for percpu_array
maps was discussed in the thread of
"[PATCH bpf-next v3 0/4] bpf: Introduce global percpu data"[1].
The goal of BPF_F_ALL_CPUS flag is to reduce data caching overhead in light
skeletons by allowing a single value to be reused to update values across all
CPUs. This avoids the M:N problem where M cached values are used to update a
map on N CPUs kernel.
The BPF_F_CPU flag is accompanied by *flags*-embedded cpu info, which
specifies the target CPU for the operation:
* For lookup operations: the flag field alongside cpu info enable querying
a value on the specified CPU.
* For update operations: the flag field alongside cpu info enable
updating value for specified CPU.
Links:
[1] https://lore.kernel.org/bpf/20250526162146.24429-1-leon.hwang@linux.dev/
Changes:
v10 -> v11:
* Support the combination of BPF_EXIST and BPF_F_CPU/BPF_F_ALL_CPUS for
update operations.
* Fix unstable lru_percpu_hash map test using the combination of
BPF_EXIST and BPF_F_CPU/BPF_F_ALL_CPUS to avoid LRU eviction
(reported by Alexei).
v9 -> v10:
* Add tests to verify array and hash maps do not support BPF_F_CPU and
BPF_F_ALL_CPUS flags.
* Address comment from Andrii:
* Copy map value using copy_map_value_long for percpu_cgroup_storage
maps in a separate patch.
v8 -> v9:
* Change value type from u64 to u32 in selftests.
* Address comments from Andrii:
* Keep value_size unaligned and update everywhere for consistency when
cpu flags are specified.
* Update value by getting pointer for percpu hash and percpu
cgroup_storage maps.
v7 -> v8:
* Address comments from Andrii:
* Check BPF_F_LOCK when update percpu_array, percpu_hash and
lru_percpu_hash maps.
* Refactor flags check in __htab_map_lookup_and_delete_batch().
* Keep value_size unaligned and copy value using copy_map_value() in
__htab_map_lookup_and_delete_batch() when BPF_F_CPU is specified.
* Update warn message in libbpf's validate_map_op().
* Update comment of libbpf's bpf_map__lookup_elem().
v6 -> v7:
* Get correct value size for percpu_hash and lru_percpu_hash in
update_batch API.
* Set 'count' as 'max_entries' in test cases for lookup_batch API.
* Address comment from Alexei:
* Move cpu flags check into bpf_map_check_op_flags().
v5 -> v6:
* Move bpf_map_check_op_flags() from 'bpf.h' to 'syscall.c'.
* Address comments from Alexei:
* Drop the refactoring code of data copying logic for percpu maps.
* Drop bpf_map_check_op_flags() wrappers.
v4 -> v5:
* Address comments from Andrii:
* Refactor data copying logic for all percpu maps.
* Drop this_cpu_ptr() micro-optimization.
* Drop cpu check in libbpf's validate_map_op().
* Enhance bpf_map_check_op_flags() using *allowed flags* instead of
'extra_flags_mask'.
v3 -> v4:
* Address comments from Andrii:
* Remove unnecessary map_type check in bpf_map_value_size().
* Reduce code churn.
* Remove unnecessary do_delete check in
__htab_map_lookup_and_delete_batch().
* Introduce bpf_percpu_copy_to_user() and bpf_percpu_copy_from_user().
* Rename check_map_flags() to bpf_map_check_op_flags() with
extra_flags_mask.
* Add human-readable pr_warn() explanations in validate_map_op().
* Use flags in bpf_map__delete_elem() and
bpf_map__lookup_and_delete_elem().
* Drop "for alignment reasons".
v3 link: https://lore.kernel.org/bpf/20250821160817.70285-1-leon.hwang@linux.dev/
v2 -> v3:
* Address comments from Alexei:
* Use BPF_F_ALL_CPUS instead of BPF_ALL_CPUS magic.
* Introduce these two cpu flags for all percpu maps.
* Address comments from Jiri:
* Reduce some unnecessary u32 cast.
* Refactor more generic map flags check function.
* A code style issue.
v2 link: https://lore.kernel.org/bpf/20250805163017.17015-1-leon.hwang@linux.dev/
v1 -> v2:
* Address comments from Andrii:
* Embed cpu info as high 32 bits of *flags* totally.
* Use ERANGE instead of E2BIG.
* Few format issues.
Leon Hwang (8):
bpf: Introduce internal bpf_map_check_op_flags helper function
bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags
bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_array
maps
bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_hash
and lru_percpu_hash maps
bpf: Copy map value using copy_map_value_long for
percpu_cgroup_storage maps
bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for
percpu_cgroup_storage maps
libbpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu maps
selftests/bpf: Add cases to test BPF_F_CPU and BPF_F_ALL_CPUS flags
include/linux/bpf-cgroup.h | 4 +-
include/linux/bpf.h | 44 ++-
include/uapi/linux/bpf.h | 2 +
kernel/bpf/arraymap.c | 32 +-
kernel/bpf/hashtab.c | 96 +++--
kernel/bpf/local_storage.c | 27 +-
kernel/bpf/syscall.c | 68 ++--
tools/include/uapi/linux/bpf.h | 2 +
tools/lib/bpf/bpf.h | 8 +
tools/lib/bpf/libbpf.c | 26 +-
tools/lib/bpf/libbpf.h | 21 +-
.../selftests/bpf/prog_tests/percpu_alloc.c | 335 ++++++++++++++++++
.../selftests/bpf/progs/percpu_alloc_array.c | 32 ++
13 files changed, 590 insertions(+), 107 deletions(-)
--
2.51.2
The bench test "trig-kernel-count" can be used as a baseline comparison
for fentry and other benchmarks, and the calling to bpf_get_numa_node_id()
should be considered as composition of the baseline. So, let's call it in
trigger_count(). Meanwhile, rename trigger_count() to
trigger_kernel_count() to make it easier understand.
Signed-off-by: Menglong Dong <dongml2(a)chinatelecom.cn>
---
tools/testing/selftests/bpf/benchs/bench_trigger.c | 4 ++--
tools/testing/selftests/bpf/progs/trigger_bench.c | 6 ++++--
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/bpf/benchs/bench_trigger.c b/tools/testing/selftests/bpf/benchs/bench_trigger.c
index 1e2aff007c2a..34018fc3927f 100644
--- a/tools/testing/selftests/bpf/benchs/bench_trigger.c
+++ b/tools/testing/selftests/bpf/benchs/bench_trigger.c
@@ -180,10 +180,10 @@ static void trigger_kernel_count_setup(void)
{
setup_ctx();
bpf_program__set_autoload(ctx.skel->progs.trigger_driver, false);
- bpf_program__set_autoload(ctx.skel->progs.trigger_count, true);
+ bpf_program__set_autoload(ctx.skel->progs.trigger_kernel_count, true);
load_ctx();
/* override driver program */
- ctx.driver_prog_fd = bpf_program__fd(ctx.skel->progs.trigger_count);
+ ctx.driver_prog_fd = bpf_program__fd(ctx.skel->progs.trigger_kernel_count);
}
static void trigger_kprobe_setup(void)
diff --git a/tools/testing/selftests/bpf/progs/trigger_bench.c b/tools/testing/selftests/bpf/progs/trigger_bench.c
index 3d5f30c29ae3..2898b3749d07 100644
--- a/tools/testing/selftests/bpf/progs/trigger_bench.c
+++ b/tools/testing/selftests/bpf/progs/trigger_bench.c
@@ -42,12 +42,14 @@ int bench_trigger_uprobe_multi(void *ctx)
const volatile int batch_iters = 0;
SEC("?raw_tp")
-int trigger_count(void *ctx)
+int trigger_kernel_count(void *ctx)
{
int i;
- for (i = 0; i < batch_iters; i++)
+ for (i = 0; i < batch_iters; i++) {
inc_counter();
+ bpf_get_numa_node_id();
+ }
return 0;
}
--
2.51.2
This series adds support for tests that use multiple devices, and adds
one new test, vfio_pci_device_init_perf_test, which measures parallel
device initialization time to demonstrate the improvement from commit
e908f58b6beb ("vfio/pci: Separate SR-IOV VF dev_set").
This series also breaks apart the monolithic vfio_util.h and
vfio_pci_device.c into separate files, to account for all the new code.
This required quite a bit of code motion so the diffstat looks large.
The final layout is more granular and provides a better separation of
the IOMMU code from the device code.
Final layout:
C files:
- tools/testing/selftests/vfio/lib/libvfio.c
- tools/testing/selftests/vfio/lib/iommu.c
- tools/testing/selftests/vfio/lib/iova_allocator.c
- tools/testing/selftests/vfio/lib/vfio_pci_device.c
- tools/testing/selftests/vfio/lib/vfio_pci_driver.c
H files:
- tools/testing/selftests/vfio/lib/include/libvfio.h
- tools/testing/selftests/vfio/lib/include/libvfio/assert.h
- tools/testing/selftests/vfio/lib/include/libvfio/iommu.h
- tools/testing/selftests/vfio/lib/include/libvfio/iova_allocator.h
- tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
- tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_driver.h
Notably, vfio_util.h is now gone and replaced with libvfio.h.
This series is based on vfio/next plus Alex Mastro's series to add the
IOVA allocator [1]. It should apply cleanly to vfio/next once Alex's
series is merged by Linus into the next 6.18 rc and then merged into
vfio/next.
This series can be found on GitHub:
https://github.com/dmatlack/linux/tree/vfio/selftests/init_perf_test/v3
[1] https://lore.kernel.org/kvm/20251111-iova-ranges-v3-0-7960244642c5@fb.com/
Cc: Alex Mastro <amastro(a)fb.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Josh Hilke <jrhilke(a)google.com>
Cc: Raghavendra Rao Ananta <rananta(a)google.com>
Cc: Vipin Sharma <vipinsh(a)google.com>
v3:
- Replace literal with NSEC_PER_SEC (Alex Mastro)
- Fix Makefile accumulate vs. assignment (Alex Mastro)
v2: https://lore.kernel.org/kvm/20251112192232.442761-1-dmatlack@google.com/
v1: https://lore.kernel.org/kvm/20251008232531.1152035-1-dmatlack@google.com/
David Matlack (18):
vfio: selftests: Move run.sh into scripts directory
vfio: selftests: Split run.sh into separate scripts
vfio: selftests: Allow passing multiple BDFs on the command line
vfio: selftests: Rename struct vfio_iommu_mode to iommu_mode
vfio: selftests: Introduce struct iommu
vfio: selftests: Support multiple devices in the same
container/iommufd
vfio: selftests: Eliminate overly chatty logging
vfio: selftests: Prefix logs with device BDF where relevant
vfio: selftests: Upgrade driver logging to dev_err()
vfio: selftests: Rename struct vfio_dma_region to dma_region
vfio: selftests: Move IOMMU library code into iommu.c
vfio: selftests: Move IOVA allocator into iova_allocator.c
vfio: selftests: Stop passing device for IOMMU operations
vfio: selftests: Rename vfio_util.h to libvfio.h
vfio: selftests: Move vfio_selftests_*() helpers into libvfio.c
vfio: selftests: Split libvfio.h into separate header files
vfio: selftests: Eliminate INVALID_IOVA
vfio: selftests: Add vfio_pci_device_init_perf_test
tools/testing/selftests/vfio/Makefile | 10 +-
.../selftests/vfio/lib/drivers/dsa/dsa.c | 36 +-
.../selftests/vfio/lib/drivers/ioat/ioat.c | 18 +-
.../selftests/vfio/lib/include/libvfio.h | 26 +
.../vfio/lib/include/libvfio/assert.h | 54 ++
.../vfio/lib/include/libvfio/iommu.h | 76 +++
.../vfio/lib/include/libvfio/iova_allocator.h | 23 +
.../lib/include/libvfio/vfio_pci_device.h | 125 ++++
.../lib/include/libvfio/vfio_pci_driver.h | 97 +++
.../selftests/vfio/lib/include/vfio_util.h | 331 -----------
tools/testing/selftests/vfio/lib/iommu.c | 465 +++++++++++++++
.../selftests/vfio/lib/iova_allocator.c | 94 +++
tools/testing/selftests/vfio/lib/libvfio.c | 78 +++
tools/testing/selftests/vfio/lib/libvfio.mk | 5 +-
.../selftests/vfio/lib/vfio_pci_device.c | 555 +-----------------
.../selftests/vfio/lib/vfio_pci_driver.c | 16 +-
tools/testing/selftests/vfio/run.sh | 109 ----
.../testing/selftests/vfio/scripts/cleanup.sh | 41 ++
tools/testing/selftests/vfio/scripts/lib.sh | 42 ++
tools/testing/selftests/vfio/scripts/run.sh | 16 +
tools/testing/selftests/vfio/scripts/setup.sh | 48 ++
.../selftests/vfio/vfio_dma_mapping_test.c | 46 +-
.../selftests/vfio/vfio_iommufd_setup_test.c | 2 +-
.../vfio/vfio_pci_device_init_perf_test.c | 168 ++++++
.../selftests/vfio/vfio_pci_device_test.c | 12 +-
.../selftests/vfio/vfio_pci_driver_test.c | 51 +-
26 files changed, 1481 insertions(+), 1063 deletions(-)
create mode 100644 tools/testing/selftests/vfio/lib/include/libvfio.h
create mode 100644 tools/testing/selftests/vfio/lib/include/libvfio/assert.h
create mode 100644 tools/testing/selftests/vfio/lib/include/libvfio/iommu.h
create mode 100644 tools/testing/selftests/vfio/lib/include/libvfio/iova_allocator.h
create mode 100644 tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
create mode 100644 tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_driver.h
delete mode 100644 tools/testing/selftests/vfio/lib/include/vfio_util.h
create mode 100644 tools/testing/selftests/vfio/lib/iommu.c
create mode 100644 tools/testing/selftests/vfio/lib/iova_allocator.c
create mode 100644 tools/testing/selftests/vfio/lib/libvfio.c
delete mode 100755 tools/testing/selftests/vfio/run.sh
create mode 100755 tools/testing/selftests/vfio/scripts/cleanup.sh
create mode 100755 tools/testing/selftests/vfio/scripts/lib.sh
create mode 100755 tools/testing/selftests/vfio/scripts/run.sh
create mode 100755 tools/testing/selftests/vfio/scripts/setup.sh
create mode 100644 tools/testing/selftests/vfio/vfio_pci_device_init_perf_test.c
base-commit: fa804aa4ac1b091ef2ec2981f08a1c28aaeba8e7
prerequisite-patch-id: dcf23dcc1198960bda3102eefaa21df60b2e4c54
prerequisite-patch-id: e32e56d5bf7b6c7dd40d737aa3521560407e00f5
prerequisite-patch-id: 4f79a41bf10a4c025ba5f433551b46035aa15878
prerequisite-patch-id: f903a45f0c32319138cd93a007646ab89132b18c
--
2.52.0.rc2.455.g230fcf2819-goog
Hi Shuah,
Thanks for pointing that out. Apologies for missing the mailing lists earlier.
Resending this follow-up with the correct CC list and in plain text format.
Please let me know if there’s anything else I should improve in this patch.
I’m happy to resend it as v4 if needed.
Thanks,
Sameeksha
On Mon, 24 Nov 2025 at 23:59, Shuah Khan <skhan(a)linuxfoundation.org> wrote:
>
> On 11/21/25 23:21, Sameeksha Sankpal wrote:
> > Hi,
> > Just following up on this patch.
> > It’s been a few months, so I wanted to check if there is anything else I
> > should address or improve to move it forward.
>
> I see that you didn't cc any mailing list on this email? Please keep
> everybody in the loop when you send responses.
>
> >
> > Thanks,
> > Sameeksha Sankpal
> >
> > On Fri, 30 May 2025 at 04:25, Sameeksha Sankpal <sameekshasankpal(a)gmail.com>
> > wrote:
> >
> >> Rebase the error logging enhancement for get_proc_stat() against the
> >> upstream seccomp tree with proper indentation formatting.
> >>
> >> Suggested-by: Kees Cook <kees(a)kernel.org>
> >> Signed-off-by: Sameeksha Sankpal <sameekshasankpal(a)gmail.com>
> >> ---
> >> v1 -> v2:
> >> - Used TH_LOG instead of printf for error logging
> >> - Moved variable declaration to the top of the function
> >> - Applied review suggestion by Kees Cook
> >>
> >> v2 -> v3:
> >> - Rebased against upstream seccomp tree (was previously against v1)
> >> - Fixed indentation to use tabs instead of spaces
> >> - Used scripts/checkpatch.pl to check the patch for common errors
> >> - Removed the blank line beforeS S-o-b added in v2
> >>
> >> tools/testing/selftests/seccomp/seccomp_bpf.c | 5 +++++
> >> 1 file changed, 5 insertions(+)
> >>
> >> diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c
> >> b/tools/testing/selftests/seccomp/seccomp_bpf.c
> >> index 61acbd45ffaa..dbd7e705a2af 100644
> >> --- a/tools/testing/selftests/seccomp/seccomp_bpf.c
> >> +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
> >> @@ -4508,9 +4508,14 @@ static char get_proc_stat(struct __test_metadata
> >> *_metadata, pid_t pid)
> >> char proc_path[100] = {0};
> >> char status;
> >> char *line;
> >> + int rc;
> >>
> >> snprintf(proc_path, sizeof(proc_path), "/proc/%d/stat", pid);
> >> ASSERT_EQ(get_nth(_metadata, proc_path, 3, &line), 1);
> >> + rc = get_nth(_metadata, proc_path, 3, &line);
> >> + ASSERT_EQ(rc, 1) {
> >> + TH_LOG("user_notification_fifo: failed to read stat for
> >> PID %d (rc=%d)", pid, rc);
> >> + }
> >>
> >> status = *line;
> >> free(line);
> >> --
> >> 2.43.0
> >>
> >>
> >
> thanks,
> -- Shuah