This patch set aims to allow ublk server threads to better balance load
amongst themselves by decoupling server threads from ublk_queues/hctxs,
so that multiple threads can service I/Os that are issued from a single
CPU. This can improve performance for workloads in which ublk server CPU
is a bottleneck, and for which load is issued from CPUs which are not
balanced across ublk_queues/hctxs.
Performance
-----------
First create two ublk devices with:
ublkb0: ./kublk add -t null -q 2 --nthreads 2
ublkb1: ./kublk add -t null -q 2 --nthreads 2 --per_io_tasks
Then run load with:
taskset -c 1 fio/t/io_uring -r5 -p0 /dev/ublkb0: 1.90M IOPS
taskset -c 1 fio/t/io_uring -r5 -p0 /dev/ublkb1: 2.18M IOPS
Since ublkb1 has per-io-tasks, the second command is able to make use of
both ublk server worker threads and therefore has increased max
throughput.
Caveats:
- This testing was done on a system with 2 numa nodes, but the penalty
of having I/O cross a numa (or LLC) boundary in the per_io_tasks case
is quite high. So these numbers were obtained after moving all ublk
server threads and the application threads to CPUs on the same numa
node/LLC.
- One might expect the scaling to be linear - because ublkb1 can make
use of twice as many ublk server threads, it should be able to drive
twice the throughput. However this is not true (the improvement is
~15%), and needs further investigation.
Signed-off-by: Uday Shankar <ushankar(a)purestorage.com>
---
Changes in v7:
- Fix queue_rqs batch dispatch for per-io daemons
- Kick round-robin tag allocation changes to a followup
- Add explicit feature flag for per-task daemons (Ming Lei, Caleb Sander
Mateos)
- Move some variable assignments to avoid redundant computation (Caleb
Sander Mateos)
- Switch from storing pointers in ublk_io to computing based on address
with container_of in a couple places (Ming Lei)
- Link to v6: https://lore.kernel.org/r/20250507-ublk_task_per_io-v6-0-a2a298783c01@pures…
Changes in v6:
- Add a feature flag for this feature, called UBLK_F_RR_TAGS (Ming Lei)
- Add test for this feature (Ming Lei)
- Add documentation for this feature (Ming Lei)
- Link to v5: https://lore.kernel.org/r/20250416-ublk_task_per_io-v5-0-9261ad7bff20@pures…
Changes in v5:
- Set io->task before ublk_mark_io_ready (Caleb Sander Mateos)
- Set io->task atomically, read it atomically when needed
- Return 0 on success from command-specific helpers in
__ublk_ch_uring_cmd (Caleb Sander Mateos)
- Rename ublk_handle_need_get_data to ublk_get_data (Caleb Sander
Mateos)
- Link to v4: https://lore.kernel.org/r/20250415-ublk_task_per_io-v4-0-54210b91a46f@pures…
Changes in v4:
- Drop "ublk: properly serialize all FETCH_REQs" since Ming is taking it
in another set
- Prevent data races by marking data structures which should be
read-only in the I/O path as const (Ming Lei)
- Link to v3: https://lore.kernel.org/r/20250410-ublk_task_per_io-v3-0-b811e8f4554a@pures…
Changes in v3:
- Check for UBLK_IO_FLAG_ACTIVE on I/O again after taking lock to ensure
that two concurrent FETCH_REQs on the same I/O can't succeed (Caleb
Sander Mateos)
- Link to v2: https://lore.kernel.org/r/20250408-ublk_task_per_io-v2-0-b97877e6fd50@pures…
Changes in v2:
- Remove changes split into other patches
- To ease error handling/synchronization, associate each I/O (instead of
each queue) to the last task that issues a FETCH_REQ against it. Only
that task is allowed to operate on the I/O.
- Link to v1: https://lore.kernel.org/r/20241002224437.3088981-1-ushankar@purestorage.com
---
Uday Shankar (8):
ublk: have a per-io daemon instead of a per-queue daemon
selftests: ublk: kublk: plumb q_id in io_uring user_data
selftests: ublk: kublk: tie sqe allocation to io instead of queue
selftests: ublk: kublk: lift queue initialization out of thread
selftests: ublk: kublk: move per-thread data out of ublk_queue
selftests: ublk: kublk: decouple ublk_queues from ublk server threads
selftests: ublk: add test for per io daemons
Documentation: ublk: document UBLK_F_PER_IO_DAEMON
Documentation/block/ublk.rst | 35 ++-
drivers/block/ublk_drv.c | 108 +++----
include/uapi/linux/ublk_cmd.h | 9 +
tools/testing/selftests/ublk/Makefile | 1 +
tools/testing/selftests/ublk/fault_inject.c | 4 +-
tools/testing/selftests/ublk/file_backed.c | 20 +-
tools/testing/selftests/ublk/kublk.c | 345 ++++++++++++++-------
tools/testing/selftests/ublk/kublk.h | 73 +++--
tools/testing/selftests/ublk/null.c | 22 +-
tools/testing/selftests/ublk/stripe.c | 17 +-
tools/testing/selftests/ublk/test_generic_12.sh | 55 ++++
.../selftests/ublk/trace/count_ios_per_tid.bt | 11 +
12 files changed, 470 insertions(+), 230 deletions(-)
---
base-commit: 533c87e2ed742454957f14d7bef9f48d5a72e72d
change-id: 20250408-ublk_task_per_io-c693cf608d7a
Best regards,
--
Uday Shankar <ushankar(a)purestorage.com>
In some application like bpftrace [1], need to get the cwd from the pid.
This patch provides a new kfunc that can get the cwd of the process from
the pid.
[1] https://github.com/bpftrace/bpftrace/issues/3314
Rong Tao (2):
bpf: Add bpf_task_cwd_from_pid() kfunc
selftests/bpf: Add selftests for bpf_task_cwd_from_pid()
kernel/bpf/helpers.c | 45 ++++++++++++++++++
.../selftests/bpf/prog_tests/task_kfunc.c | 3 ++
.../selftests/bpf/progs/task_kfunc_common.h | 1 +
.../selftests/bpf/progs/task_kfunc_success.c | 47 +++++++++++++++++++
4 files changed, 96 insertions(+)
--
2.49.0
This commit adds a new kernel selftest to verify RTNLGRP_IPV4_MCADDR
and RTNLGRP_IPV6_MCADDR notifications. The test works by adding and
removing a dummy interface and then confirming that the system
correctly receives join and removal notifications for the 224.0.0.1
and ff02::1 multicast addresses.
The test relies on the iproute2 version to be 6.13+.
Tested by the following command:
$ vng -v --user root --cpus 16 -- \
make -C tools/testing/selftests TARGETS=net TEST_PROGS=rtnetlink.sh \
TEST_GEN_PROGS="" run_tests
Cc: Maciej Żenczykowski <maze(a)google.com>
Cc: Lorenzo Colitti <lorenzo(a)google.com>
Signed-off-by: Yuyang Huang <yuyanghuang(a)google.com>
---
tools/testing/selftests/net/rtnetlink.sh | 34 ++++++++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh
index 2e8243a65b50..9dbcaaeaf8cd 100755
--- a/tools/testing/selftests/net/rtnetlink.sh
+++ b/tools/testing/selftests/net/rtnetlink.sh
@@ -21,6 +21,7 @@ ALL_TESTS="
kci_test_vrf
kci_test_encap
kci_test_macsec
+ kci_test_mcast_addr_notification
kci_test_ipsec
kci_test_ipsec_offload
kci_test_fdb_get
@@ -1334,6 +1335,39 @@ kci_test_mngtmpaddr()
return $ret
}
+kci_test_mcast_addr_notification()
+{
+ local tmpfile
+ local monitor_pid
+ local match_result
+
+ tmpfile=$(mktemp)
+
+ ip monitor maddr > $tmpfile &
+ monitor_pid=$!
+ sleep 1
+
+ run_cmd ip link add name test-dummy1 type dummy
+ run_cmd ip link set test-dummy1 up
+ run_cmd ip link del dev test-dummy1
+ sleep 1
+
+ match_result=$(grep -cE "test-dummy1.*(224.0.0.1|ff02::1)" $tmpfile)
+
+ kill $monitor_pid
+ rm $tmpfile
+ # There should be 4 line matches as follows.
+ # 13: test-dummy1 inet6 mcast ff02::1 scope global
+ # 13: test-dummy1 inet mcast 224.0.0.1 scope global
+ # Deleted 13: test-dummy1 inet mcast 224.0.0.1 scope global
+ # Deleted 13: test-dummy1 inet6 mcast ff02::1 scope global
+ if [ $match_result -ne 4 ];then
+ end_test "FAIL: mcast addr notification"
+ return 1
+ fi
+ end_test "PASS: mcast addr notification"
+}
+
kci_test_rtnl()
{
local current_test
--
2.49.0.1204.g71687c7c1d-goog
Let's test some basic functionality using /dev/mem. These tests will
implicitly cover some PAT (Page Attribute Handling) handling on x86.
These tests will only run when /dev/mem access to the first two pages
in physical address space is possible and allowed; otherwise, the tests
are skipped.
On current x86-64 with PAT inside a VM, all tests pass:
TAP version 13
1..6
# Starting 6 tests from 1 test cases.
# RUN pfnmap.madvise_disallowed ...
# OK pfnmap.madvise_disallowed
ok 1 pfnmap.madvise_disallowed
# RUN pfnmap.munmap_split ...
# OK pfnmap.munmap_split
ok 2 pfnmap.munmap_split
# RUN pfnmap.mremap_fixed ...
# OK pfnmap.mremap_fixed
ok 3 pfnmap.mremap_fixed
# RUN pfnmap.mremap_shrink ...
# OK pfnmap.mremap_shrink
ok 4 pfnmap.mremap_shrink
# RUN pfnmap.mremap_expand ...
# OK pfnmap.mremap_expand
ok 5 pfnmap.mremap_expand
# RUN pfnmap.fork ...
# OK pfnmap.fork
ok 6 pfnmap.fork
# PASSED: 6 / 6 tests passed.
# Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0
However, we are able to trigger:
[ 27.888251] x86/PAT: pfnmap:1790 freeing invalid memtype [mem 0x00000000-0x00000fff]
There are probably more things worth testing in the future, such as
MAP_PRIVATE handling. But this set of tests is sufficient to cover most of
the things we will rework regarding PAT handling.
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Dev Jain <dev.jain(a)arm.com>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
---
Hopefully I didn't miss any review feedback.
v1 -> v2:
* Rewrite using kselftest_harness, which simplifies a lot of things
* Add to .gitignore and run_vmtests.sh
* Register signal handler on demand
* Use volatile trick to force a read (not factoring out FORCE_READ just yet)
* Drop mprotect() test case
* Add some more comments why we test certain things
* Use NULL for mmap() first parameter instead of 0
* Smaller fixes
---
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/pfnmap.c | 196 ++++++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
4 files changed, 202 insertions(+)
create mode 100644 tools/testing/selftests/mm/pfnmap.c
diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore
index 91db34941a143..824266982aa36 100644
--- a/tools/testing/selftests/mm/.gitignore
+++ b/tools/testing/selftests/mm/.gitignore
@@ -20,6 +20,7 @@ mremap_test
on-fault-limit
transhuge-stress
pagemap_ioctl
+pfnmap
*.tmp*
protection_keys
protection_keys_32
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index ad4d6043a60f0..ae6f994d3add7 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -84,6 +84,7 @@ TEST_GEN_FILES += mremap_test
TEST_GEN_FILES += mseal_test
TEST_GEN_FILES += on-fault-limit
TEST_GEN_FILES += pagemap_ioctl
+TEST_GEN_FILES += pfnmap
TEST_GEN_FILES += thuge-gen
TEST_GEN_FILES += transhuge-stress
TEST_GEN_FILES += uffd-stress
diff --git a/tools/testing/selftests/mm/pfnmap.c b/tools/testing/selftests/mm/pfnmap.c
new file mode 100644
index 0000000000000..8a9d19b6020c7
--- /dev/null
+++ b/tools/testing/selftests/mm/pfnmap.c
@@ -0,0 +1,196 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Basic VM_PFNMAP tests relying on mmap() of '/dev/mem'
+ *
+ * Copyright 2025, Red Hat, Inc.
+ *
+ * Author(s): David Hildenbrand <david(a)redhat.com>
+ */
+#define _GNU_SOURCE
+#include <stdlib.h>
+#include <string.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <signal.h>
+#include <setjmp.h>
+#include <linux/mman.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+
+#include "../kselftest_harness.h"
+#include "vm_util.h"
+
+static sigjmp_buf sigjmp_buf_env;
+
+static void signal_handler(int sig)
+{
+ siglongjmp(sigjmp_buf_env, -EFAULT);
+}
+
+static int test_read_access(char *addr, size_t size, size_t pagesize)
+{
+ size_t offs;
+ int ret;
+
+ if (signal(SIGSEGV, signal_handler) == SIG_ERR)
+ return -EINVAL;
+
+ ret = sigsetjmp(sigjmp_buf_env, 1);
+ if (!ret) {
+ for (offs = 0; offs < size; offs += pagesize)
+ /* Force a read that the compiler cannot optimize out. */
+ *((volatile char *)(addr + offs));
+ }
+ if (signal(SIGSEGV, signal_handler) == SIG_ERR)
+ return -EINVAL;
+
+ return ret;
+}
+
+FIXTURE(pfnmap)
+{
+ size_t pagesize;
+ int dev_mem_fd;
+ char *addr1;
+ size_t size1;
+ char *addr2;
+ size_t size2;
+};
+
+FIXTURE_SETUP(pfnmap)
+{
+ self->pagesize = getpagesize();
+
+ self->dev_mem_fd = open("/dev/mem", O_RDONLY);
+ if (self->dev_mem_fd < 0)
+ SKIP(return, "Cannot open '/dev/mem'\n");
+
+ /* We'll require the first two pages throughout our tests ... */
+ self->size1 = self->pagesize * 2;
+ self->addr1 = mmap(NULL, self->size1, PROT_READ, MAP_SHARED,
+ self->dev_mem_fd, 0);
+ if (self->addr1 == MAP_FAILED)
+ SKIP(return, "Cannot mmap '/dev/mem'\n");
+
+ /* ... and want to be able to read from them. */
+ if (test_read_access(self->addr1, self->size1, self->pagesize))
+ SKIP(return, "Cannot read-access mmap'ed '/dev/mem'\n");
+
+ self->size2 = 0;
+ self->addr2 = MAP_FAILED;
+}
+
+FIXTURE_TEARDOWN(pfnmap)
+{
+ if (self->addr2 != MAP_FAILED)
+ munmap(self->addr2, self->size2);
+ if (self->addr1 != MAP_FAILED)
+ munmap(self->addr1, self->size1);
+ if (self->dev_mem_fd >= 0)
+ close(self->dev_mem_fd);
+}
+
+TEST_F(pfnmap, madvise_disallowed)
+{
+ int advices[] = {
+ MADV_DONTNEED,
+ MADV_DONTNEED_LOCKED,
+ MADV_FREE,
+ MADV_WIPEONFORK,
+ MADV_COLD,
+ MADV_PAGEOUT,
+ MADV_POPULATE_READ,
+ MADV_POPULATE_WRITE,
+ };
+ int i;
+
+ /* All these advices must be rejected. */
+ for (i = 0; i < ARRAY_SIZE(advices); i++) {
+ EXPECT_LT(madvise(self->addr1, self->pagesize, advices[i]), 0);
+ EXPECT_EQ(errno, EINVAL);
+ }
+}
+
+TEST_F(pfnmap, munmap_split)
+{
+ /*
+ * Unmap the first page. This munmap() call is not really expected to
+ * fail, but we might be able to trigger other internal issues.
+ */
+ ASSERT_EQ(munmap(self->addr1, self->pagesize), 0);
+
+ /*
+ * Remap the first page while the second page is still mapped. This
+ * makes sure that any PAT tracking on x86 will allow for mmap()'ing
+ * a page again while some parts of the first mmap() are still
+ * around.
+ */
+ self->size2 = self->pagesize;
+ self->addr2 = mmap(NULL, self->pagesize, PROT_READ, MAP_SHARED,
+ self->dev_mem_fd, 0);
+ ASSERT_NE(self->addr2, MAP_FAILED);
+}
+
+TEST_F(pfnmap, mremap_fixed)
+{
+ char *ret;
+
+ /* Reserve a destination area. */
+ self->size2 = self->size1;
+ self->addr2 = mmap(NULL, self->size2, PROT_READ, MAP_ANON | MAP_PRIVATE,
+ -1, 0);
+ ASSERT_NE(self->addr2, MAP_FAILED);
+
+ /* mremap() over our destination. */
+ ret = mremap(self->addr1, self->size1, self->size2,
+ MREMAP_FIXED | MREMAP_MAYMOVE, self->addr2);
+ ASSERT_NE(ret, MAP_FAILED);
+}
+
+TEST_F(pfnmap, mremap_shrink)
+{
+ char *ret;
+
+ /* Shrinking is expected to work. */
+ ret = mremap(self->addr1, self->size1, self->size1 - self->pagesize, 0);
+ ASSERT_NE(ret, MAP_FAILED);
+}
+
+TEST_F(pfnmap, mremap_expand)
+{
+ /*
+ * Growing is not expected to work, and getting it right would
+ * be challenging. So this test primarily serves as an early warning
+ * that something that probably should never work suddenly works.
+ */
+ self->size2 = self->size1 + self->pagesize;
+ self->addr2 = mremap(self->addr1, self->size1, self->size2, MREMAP_MAYMOVE);
+ ASSERT_EQ(self->addr2, MAP_FAILED);
+}
+
+TEST_F(pfnmap, fork)
+{
+ pid_t pid;
+ int ret;
+
+ /* fork() a child and test if the child can access the pages. */
+ pid = fork();
+ ASSERT_GE(pid, 0);
+
+ if (!pid) {
+ EXPECT_EQ(test_read_access(self->addr1, self->size1,
+ self->pagesize), 0);
+ exit(0);
+ }
+
+ wait(&ret);
+ if (WIFEXITED(ret))
+ ret = WEXITSTATUS(ret);
+ else
+ ret = -EINVAL;
+ ASSERT_EQ(ret, 0);
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 188b125bf1f6b..dddd1dd8af145 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -63,6 +63,8 @@ separated by spaces:
test soft dirty page bit semantics
- pagemap
test pagemap_scan IOCTL
+- pfnmap
+ tests for VM_PFNMAP handling
- cow
test copy-on-write semantics
- thp
@@ -472,6 +474,8 @@ fi
CATEGORY="pagemap" run_test ./pagemap_ioctl
+CATEGORY="pfnmap" run_test ./pfnmap
+
# COW tests
CATEGORY="cow" run_test ./cow
--
2.49.0
Use `kernel::ffi::c_void` instead of `core::ffi::c_void` for consistency
and to centralize abstraction.
Since `kernel::ffi::c_void` is a straightforward re-export of
`core::ffi::c_void`, both are functionally equivalent. However, using
`kernel::ffi::c_void` improves consistency across the kernel's Rust code
and provides a unified reference point in case the definition ever needs
to change, even if such a change is unlikely.
Signed-off-by: Jesung Yang <y.j3ms.n(a)gmail.com>
Link: https://rust-for-linux.zulipchat.com/#narrow/channel/288089/topic/x/near/52…
---
So in sum, I believe it's reasonable to keep the diff unchanged... but
I'm happy to adjust if you'd prefer a different approach.
---
Changes in v2:
- Add "Link" tag to the related discussion on Zulip
- Reword the commit message to clarify `kernel::ffi::c_void` is a re-export
- Link to v1: https://lore.kernel.org/rust-for-linux/20250526162429.1114862-1-y.j3ms.n@gm…
---
rust/kernel/kunit.rs | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/rust/kernel/kunit.rs b/rust/kernel/kunit.rs
index 81833a687b75..bd6fc712dd79 100644
--- a/rust/kernel/kunit.rs
+++ b/rust/kernel/kunit.rs
@@ -6,7 +6,8 @@
//!
//! Reference: <https://docs.kernel.org/dev-tools/kunit/index.html>
-use core::{ffi::c_void, fmt};
+use core::fmt;
+use kernel::ffi::c_void;
/// Prints a KUnit error-level message.
///
base-commit: f4daa80d6be7d3c55ca72a8e560afc4e21f886aa
--
2.39.5
Use a match expression with slice patterns instead of length checks and
indexing. The result is more idiomatic, which is a better example for
future Rust code authors.
Signed-off-by: Tamir Duberstein <tamird(a)gmail.com>
---
scripts/rustdoc_test_gen.rs | 33 +++++++++++++++++----------------
1 file changed, 17 insertions(+), 16 deletions(-)
diff --git a/scripts/rustdoc_test_gen.rs b/scripts/rustdoc_test_gen.rs
index 1ca253594d38..a3dc251221e0 100644
--- a/scripts/rustdoc_test_gen.rs
+++ b/scripts/rustdoc_test_gen.rs
@@ -85,24 +85,25 @@ fn find_candidates(
}
}
- assert!(
- valid_paths.len() > 0,
- "No path candidates found for `{file}`. This is likely a bug in the build system, or some \
- files went away while compiling."
- );
-
- if valid_paths.len() > 1 {
- eprintln!("Several path candidates found:");
- for path in valid_paths {
- eprintln!(" {path:?}");
+ match valid_paths.as_slice() {
+ [] => panic!(
+ "No path candidates found for `{file}`. This is likely a bug in the build system, or \
+ some files went away while compiling."
+ ),
+ [valid_path] => {
+ valid_path.to_str().unwrap()
+ }
+ valid_paths => {
+ eprintln!("Several path candidates found:");
+ for path in valid_paths {
+ eprintln!(" {path:?}");
+ }
+ panic!(
+ "Several path candidates found for `{file}`, please resolve the ambiguity by \
+ renaming a file or folder."
+ );
}
- panic!(
- "Several path candidates found for `{file}`, please resolve the ambiguity by renaming \
- a file or folder."
- );
}
-
- valid_paths[0].to_str().unwrap()
}
fn main() {
---
base-commit: bfc3cd87559bc593bb32bb1482f9cae3308b6398
change-id: 20250527-idiomatic-match-slice-26a79d100e4d
Best regards,
--
Tamir Duberstein <tamird(a)gmail.com>