On ARM64, when running with --configs '36*SRCU-P', I noticed that only 1 instance
instead of 36 for starting.
Fix it by checking for Image files, instead of bzImage which ARM does
not seem to have. With this I see all 36 instances running at the same
time in the batch.
Signed-off-by: Joel Fernandes <joelagnelf(a)nvidia.com>
---
tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
index ad79784e552d..957800c9ffba 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
@@ -73,7 +73,7 @@ config_override_param "$config_dir/CFcommon.$(uname -m)" KcList \
cp $T/KcList $resdir/ConfigFragment
base_resdir=`echo $resdir | sed -e 's/\.[0-9]\+$//'`
-if test "$base_resdir" != "$resdir" && test -f $base_resdir/bzImage && test -f $base_resdir/vmlinux
+if test "$base_resdir" != "$resdir" && (test -f $base_resdir/bzImage || test -f $base_resdir/Image) && test -f $base_resdir/vmlinux
then
# Rerunning previous test, so use that test's kernel.
QEMU="`identify_qemu $base_resdir/vmlinux`"
--
2.43.0
I was writing a benchmark based on sockmap + TCP and discovered several
issues:
1. When EAGAIN occurs, the direction of skb is incorrect, causing data
loss when retry.
2. When sending partial data, the offset is not recorded, leading to
duplicate data being sent when retry.
3. An unexpected BUG_ON() judgment in skb_linearize is triggered.
4. The memory of psock->ingress_skb is not limited by the socket buffer
and memcg.
Issues 1, 2, and 3 are described in each patch's commit message.
Regarding issue 4, this patchset does not cover it as it is difficult to
handle in practice, and I am still working on it.
Here is a brief description of the issue:
When using sockmap to skb/stream redirect, if the receiving end does not
perform read operations, all data will be buffered in ingress_skb.
For example:
'''
// set memory limit to 50G
cgcreate -g memory:myGroup
cgset -r memory.max="5000M" myGroup
// start benchmark and disable consumer from reading
cgexec -g "memory:myGroup" ./bench sockmap -c 2 -p 1 -a --rx-verdict-ingress --delay-consumer=-1 -d 100
Iter 0 ( 29.179us): Send Speed 2668.548 MB/s (20360.406 calls/s), ... Rcv Speed 0.000 MB/s ( 0.000 calls/s)
Iter 1 ( -7.237us): Send Speed 2694.467 MB/s (20557.149 calls/s), ... Rcv Speed 0.000 MB/s ( 0.000 calls/s)
Iter 2 ( -1.918us): Send Speed 2693.404 MB/s (20548.039 calls/s), ... Rcv Speed 0.000 MB/s ( 0.000 calls/s)
Iter 3 ( -0.684us): Send Speed 2693.138 MB/s (20548.014 calls/s), ... Rcv Speed 0.000 MB/s ( 0.000 calls/s)
Iter 4 ( 7.879us): Send Speed 2698.620 MB/s (20588.838 calls/s), ... Rcv Speed 0.000 MB/s ( 0.000 calls/s)
Iter 5 ( -3.224us): Send Speed 2696.553 MB/s (20573.066 calls/s), ... Rcv Speed 0.000 MB/s ( 0.000 calls/s)
Iter 6 ( -5.409us): Send Speed 2699.705 MB/s (20597.111 calls/s), ... Rcv Speed 0.000 MB/s ( 0.000 calls/s)
Iter 7 ( -0.439us): Send Speed 2699.691 MB/s (20597.009 calls/s), ... Rcv Speed 0.000 MB/s ( 0.000 calls/s)
...
// memory usage are not limited
cat /proc/slabinfo | grep skb
skbuff_small_head 11824024 11824024 704 46 8 : tunables 0 0 0 : slabdata 257044 257044 0
skbuff_fclone_cache 11822080 11822080 512 32 4 : tunables 0 0 0 : slabdata 369440 369440 0
'''
Thus, a simple socket in a large file upload/download model can eat the
entire OS memory.
We must charge the skb memory to psock->sk, and if we do not want losing
skb, we need to feedback the error info to read_sock/read_skb when the
enqueue operation of psock->ingress_skb fails.
---
My another patch related to stability also requires maintainers to spare
some time from their busy schedules for review.
https://lore.kernel.org/bpf/20250317092257.68760-1-jiayuan.chen@linux.dev/T…
Jiayuan Chen (4):
bpf, sockmap: Fix data lost during EAGAIN retries
bpf, sockmap: fix duplicated data transmission
bpf, sockmap: Fix panic when calling skb_linearize
selftest/bpf/benchs: Add benchmark for sockmap usage
net/core/skmsg.c | 48 +-
tools/testing/selftests/bpf/Makefile | 2 +
tools/testing/selftests/bpf/bench.c | 4 +
.../selftests/bpf/benchs/bench_sockmap.c | 599 ++++++++++++++++++
.../selftests/bpf/progs/bench_sockmap_prog.c | 65 ++
5 files changed, 697 insertions(+), 21 deletions(-)
create mode 100644 tools/testing/selftests/bpf/benchs/bench_sockmap.c
create mode 100644 tools/testing/selftests/bpf/progs/bench_sockmap_prog.c
--
2.47.1
Syzkaller reported this issue [1].
The current sockmap has a dependency on sk_socket in both read and write
stages, but there is a possibility that sk->sk_socket is released during
the process, leading to panic situations. For a detailed reproduction,
please refer to the description in the v2:
https://lore.kernel.org/bpf/20250228055106.58071-1-jiayuan.chen@linux.dev/
The corresponding fix approaches are described in the commit messages of
each patch.
By the way, the current sockmap lacks statistical information, especially
global statistics, such as the number of successful or failed rx and tx
operations. These statistics cannot be obtained from the socket interface
itself.
These data will be of great help in troubleshooting issues and observing
sockmap behavior.
If the maintainer/reviewer does not object, I think we can provide these
statistical information in the future, either through proc/trace/bpftool.
[1] https://syzkaller.appspot.com/bug?extid=dd90a702f518e0eac072
---
v3 -> v4:
1. Rebase on -rc.
2. Incorporated valuable feedback from the v3 thread into the commit
message, making it easier to review.
https://lore.kernel.org/all/20250317092257.68760-3-jiayuan.chen@linux.dev/
v2 -> v3:
1. Michal Luczaj reported similar race issue under sockmap sending path.
2. Rcu lock is conflict with mutex_lock in unix socket read implementation.
https://lore.kernel.org/bpf/20250228055106.58071-1-jiayuan.chen@linux.dev/
v1 -> v2:
1. Add Fixes tag.
2. Extend selftest of edge case for TCP/UDP sockets.
3. Add Reviewed-by and Acked-by tag.
https://lore.kernel.org/bpf/20250226132242.52663-1-jiayuan.chen@linux.dev/T…
Jiayuan Chen (3):
bpf, sockmap: avoid using sk_socket after free when sending
bpf, sockmap: avoid using sk_socket after free when reading
selftests/bpf: Add edge case tests for sockmap
net/core/skmsg.c | 22 ++++++-
.../selftests/bpf/prog_tests/socket_helpers.h | 13 +++-
.../selftests/bpf/prog_tests/sockmap_basic.c | 60 +++++++++++++++++++
3 files changed, 91 insertions(+), 4 deletions(-)
--
2.47.1
We can reproduce the issue using the existing test program:
'./test_sockmap --ktls'
Or use the selftest I provided, which will cause a panic:
------------[ cut here ]------------
kernel BUG at lib/iov_iter.c:629!
PKRU: 55555554
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? iov_iter_revert+0x178/0x180
? iov_iter_revert+0x178/0x180
? do_error_trap+0x7d/0x110
? iov_iter_revert+0x178/0x180
? exc_invalid_op+0x50/0x70
? iov_iter_revert+0x178/0x180
? asm_exc_invalid_op+0x1a/0x20
? iov_iter_revert+0x178/0x180
? iov_iter_revert+0x5c/0x180
tls_sw_sendmsg_locked.isra.0+0x794/0x840
tls_sw_sendmsg+0x52/0x80
? inet_sendmsg+0x1f/0x70
__sys_sendto+0x1cd/0x200
? find_held_lock+0x2b/0x80
? syscall_trace_enter+0x140/0x270
? __lock_release.isra.0+0x5e/0x170
? find_held_lock+0x2b/0x80
? syscall_trace_enter+0x140/0x270
? lockdep_hardirqs_on_prepare+0xda/0x190
? ktime_get_coarse_real_ts64+0xc2/0xd0
__x64_sys_sendto+0x24/0x30
do_syscall_64+0x90/0x170
1. It looks like the issue started occurring after bpf being introduced to
ktls and later the addition of assertions to iov_iter has caused a panic.
If my fix tag is incorrect, please assist me in correcting the fix tag.
2. I make minimal changes for now, it's enough to make ktls work
correctly.
---
v1->v2: Added more content to the commit message
https://lore.kernel.org/all/20250123171552.57345-1-mrpre@163.com/#r
---
Jiayuan Chen (2):
bpf: fix ktls panic with sockmap
selftests/bpf: add ktls selftest
net/tls/tls_sw.c | 8 +-
.../selftests/bpf/prog_tests/sockmap_ktls.c | 174 +++++++++++++++++-
.../selftests/bpf/progs/test_sockmap_ktls.c | 26 +++
3 files changed, 205 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/test_sockmap_ktls.c
--
2.47.1
Hi Linus,
Please pull the following kselftest fixes update for Linux 6.15-rc2
Fixes tpm2, futex, and mincore tests. Creates a dedicated .gitignore
for tpm2
Details:
selftests: tpm2: test_smoke: use POSIX-conformant expression operator
selftests/futex: futex_waitv wouldblock test should fail
selftests: tpm2: create a dedicated .gitignore
selftests/mincore: Allow read-ahead pages to reach the end of the file
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 0af2f6be1b4281385b618cb86ad946eded089ac8:
Linux 6.15-rc1 (2025-04-06 13:11:33 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-fixes-6.15-rc2
for you to fetch changes up to 197c1eaa7ba633a482ed7588eea6fd4aa57e08d4:
selftests/mincore: Allow read-ahead pages to reach the end of the file (2025-04-08 17:08:50 -0600)
----------------------------------------------------------------
linux_kselftest-fixes-6.15-rc2
Fixes tpm2, futex, and mincore tests. Creates a dedicated .gitignore
for tpm2
Details:
selftests: tpm2: test_smoke: use POSIX-conformant expression operator
selftests/futex: futex_waitv wouldblock test should fail
selftests: tpm2: create a dedicated .gitignore
selftests/mincore: Allow read-ahead pages to reach the end of the file
----------------------------------------------------------------
Ahmed Salem (1):
selftests: tpm2: test_smoke: use POSIX-conformant expression operator
Edward Liaw (1):
selftests/futex: futex_waitv wouldblock test should fail
Khaled Elnaggar (1):
selftests: tpm2: create a dedicated .gitignore
Qiuxu Zhuo (1):
selftests/mincore: Allow read-ahead pages to reach the end of the file
tools/testing/selftests/.gitignore | 1 -
tools/testing/selftests/futex/functional/futex_wait_wouldblock.c | 2 +-
tools/testing/selftests/mincore/mincore_selftest.c | 3 ---
tools/testing/selftests/tpm2/.gitignore | 3 +++
tools/testing/selftests/tpm2/test_smoke.sh | 2 +-
5 files changed, 5 insertions(+), 6 deletions(-)
create mode 100644 tools/testing/selftests/tpm2/.gitignore
----------------------------------------------------------------
On Tue, 8 Apr 2025 20:18:26 +0200 Matthieu Baerts wrote:
> On 02/04/2025 19:23, Stanislav Fomichev wrote:
> > Recent change [0] resulted in a "BUG: using __this_cpu_read() in
> > preemptible" splat [1]. PREEMPT kernels have additional requirements
> > on what can and can not run with/without preemption enabled.
> > Expose those constrains in the debug kernels.
>
> Good idea to suggest this to find more bugs!
>
> I did some quick tests on my side with our CI, and the MPTCP selftests
> seem to take a bit more time, but without impacting the results.
> Hopefully, there will be no impact in slower/busy environments :)
What kind of slow down do you see? I think we get up to 50% more time
spent in the longer tests. Not sure how bad is too bad.. I'm leaning
towards applying this to net-next and we can see if people running
on linux-next complain?
Let me CC kselftests, patch in question:
https://lore.kernel.org/all/20250402172305.1775226-1-sdf@fomichev.me/
A fix [1] came in that fixed the notrace_filter side of the subops processing
of the function graph tracer. When I started testing that fix, I discovered
that the many more functions were being enabled than were being traced.
The function graph infrastructure uses ftrace to hook to functions. It has
a single ftrace_ops to manage all the users of function graph. Each
individual user (tracing, bpf, fprobes, etc) has its own ftrace_ops to
track the functions it will have its callback called from. These
ftrace_ops are "subops" to the main ftrace_ops of the function graph
infrastructure.
Each ftrace_ops has a filter_hash and a notrace_hash that is defined as:
Only trace functions that are in the filter_hash but not in the
notrace_hash.
If the filter_hash is empty, it means to trace all functions.
If the notrace_hash is empty, it means do not disable any function.
The function graph main ftrace_ops needs to be a superset containing all
the functions to be traced by all the subops it has. The algorithm to
perform this merge was incorrect. It was merging the filter_hashes
of all the subops and taking the intersect of all the notrace_hashes
of the subops. But by taking the intersect of all the notrace_hashes
it ignored how those notrace_hashes are dependent on the associated
filter_hashes of each individual subops.
Instead, modify the algorithm to be a bit simpler and correct.
First, when adding a new subops, do not add the notrace_hash if the
filter_hash is not empty. Instead, just add the functions that are in the
filter_hash of the subops but not in the notrace_hash of the subops into the
main ops filter_hash. There's no reason to add anything to the main ops
notrace_hash for this case.
The notrace_hash of the main ops should only be non empty iff all subops
filter_hashes are empty (meaning to trace all functions) and all subops
notrace_hashes have the same functions.
That is, the main ops notrace_hash is empty if any subops filter_hash is
non empty.
The main ops notrace_hash only has content in it if all subops
filter_hashes are empty, and the content are only functions that intersect
all the subops notrace_hashes. If any subops notrace_hash is empty, then
so is the main ops notrace_hash.
[1] https://lore.kernel.org/all/20250408160258.48563-1-andybnac@gmail.com/
Steven Rostedt (2):
ftrace: Fix accounting of subop hashes
tracing/selftest: Add test to better test subops filtering of function graph
----
kernel/trace/ftrace.c | 314 ++++++++++++---------
.../ftrace/test.d/ftrace/fgraph-multi-filter.tc | 177 ++++++++++++
2 files changed, 354 insertions(+), 137 deletions(-)
create mode 100644 tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi-filter.tc