From: "Paul E. McKenney" <paulmck(a)kernel.org>
Mixing different flavors of RCU readers is forbidden, for example, you
should not use srcu_read_lock() and srcu_read_lock_nmisafe() on the same
srcu_struct structure. There are checks for this, but these checks are
not tested on a regular basis. This commit therefore adds such tests
to srcu_lockdep.sh.
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf(a)nvidia.com>
---
.../selftests/rcutorture/bin/srcu_lockdep.sh | 31 +++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/tools/testing/selftests/rcutorture/bin/srcu_lockdep.sh b/tools/testing/selftests/rcutorture/bin/srcu_lockdep.sh
index b94f6d3445c6..208be7d09a61 100755
--- a/tools/testing/selftests/rcutorture/bin/srcu_lockdep.sh
+++ b/tools/testing/selftests/rcutorture/bin/srcu_lockdep.sh
@@ -79,6 +79,37 @@ do
done
done
+# Test lockdep-enabled testing of mixed SRCU readers.
+for val in 0x1 0xf
+do
+ err=
+ tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 5s --configs "SRCU-P" --kconfig "CONFIG_FORCE_NEED_SRCU_NMI_SAFE=y" --bootargs "rcutorture.reader_flavor=$val" --trust-make --datestamp "$ds/$val" > "$T/kvm.sh.out" 2>&1
+ ret=$?
+ mv "$T/kvm.sh.out" "$RCUTORTURE/res/$ds/$val"
+ if ! grep -q '^CONFIG_PROVE_LOCKING=y' .config
+ then
+ echo "rcu_torture_init_srcu_lockdep:Error: CONFIG_PROVE_LOCKING disabled in rcutorture SRCU-P scenario"
+ nerrs=$((nerrs+1))
+ err=1
+ fi
+ if test "$val" -eq 0xf && test "$ret" -eq 0
+ then
+ err=1
+ echo -n Unexpected success for > "$RCUTORTURE/res/$ds/$val/kvm.sh.err"
+ fi
+ if test "$val" -eq 0x1 && test "$ret" -ne 0
+ then
+ err=1
+ echo -n Unexpected failure for > "$RCUTORTURE/res/$ds/$val/kvm.sh.err"
+ fi
+ if test -n "$err"
+ then
+ grep "rcu_torture_init_srcu_lockdep: test_srcu_lockdep = " "$RCUTORTURE/res/$ds/$val/SRCU-P/console.log" | sed -e 's/^.*rcu_torture_init_srcu_lockdep://' >> "$RCUTORTURE/res/$ds/$val/kvm.sh.err"
+ cat "$RCUTORTURE/res/$ds/$val/kvm.sh.err"
+ nerrs=$((nerrs+1))
+ fi
+done
+
# Set up exit code.
if test "$nerrs" -ne 0
then
--
2.43.0
From: "Paul E. McKenney" <paulmck(a)kernel.org>
The srcu_lockdep.sh currently blindly trusts the rcutorture SRCU-P
scenario to build its kernel with lockdep enabled. Of course, this
dependency might not be obvious to someone rebalancing SRCU scenarios.
This commit therefore adds code to srcu_lockdep.sh that verifies that
the .config file has lockdep enabled.
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf(a)nvidia.com>
---
.../testing/selftests/rcutorture/bin/srcu_lockdep.sh | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/rcutorture/bin/srcu_lockdep.sh b/tools/testing/selftests/rcutorture/bin/srcu_lockdep.sh
index 2db12c5cad9c..b94f6d3445c6 100755
--- a/tools/testing/selftests/rcutorture/bin/srcu_lockdep.sh
+++ b/tools/testing/selftests/rcutorture/bin/srcu_lockdep.sh
@@ -39,8 +39,9 @@ do
shift
done
-err=
nerrs=0
+
+# Test lockdep's handling of deadlocks.
for d in 0 1
do
for t in 0 1 2
@@ -52,6 +53,12 @@ do
tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 5s --configs "SRCU-P" --kconfig "CONFIG_FORCE_NEED_SRCU_NMI_SAFE=y" --bootargs "rcutorture.test_srcu_lockdep=$val rcutorture.reader_flavor=0x2" --trust-make --datestamp "$ds/$val" > "$T/kvm.sh.out" 2>&1
ret=$?
mv "$T/kvm.sh.out" "$RCUTORTURE/res/$ds/$val"
+ if ! grep -q '^CONFIG_PROVE_LOCKING=y' .config
+ then
+ echo "rcu_torture_init_srcu_lockdep:Error: CONFIG_PROVE_LOCKING disabled in rcutorture SRCU-P scenario"
+ nerrs=$((nerrs+1))
+ err=1
+ fi
if test "$d" -ne 0 && test "$ret" -eq 0
then
err=1
@@ -71,6 +78,8 @@ do
done
done
done
+
+# Set up exit code.
if test "$nerrs" -ne 0
then
exit 1
--
2.43.0
From: Steven Rostedt <rostedt(a)goodmis.org>
Running the following commands was broken:
# cd /sys/kernel/tracing
# echo "filename.ustring ~ \"/proc*\"" > events/syscalls/sys_enter_openat/filter
# echo 1 > events/syscalls/sys_enter_openat/enable
# ls /proc/$$/maps
# cat trace
And would produce nothing when it should have produced something like:
ls-1192 [007] ..... 8169.828333: sys_openat(dfd: ffffffffffffff9c, filename: 7efc18359904, flags: 80000, mode: 0)
Add a test to check this case so that it will be caught if it breaks
again.
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
Shuah, I'm Cc'ing you on this for your information, but I'll take it
through my tree as it will be attached with the fix, as it will fail
without it.
.../test.d/filter/event-filter-function.tc | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
index 118247b8dd84..ab449a2cea8c 100644
--- a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
+++ b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
@@ -80,6 +80,25 @@ if [ $misscnt -gt 0 ]; then
exit_fail
fi
+# Check strings too
+if [ -f events/syscalls/sys_enter_openat/filter ]; then
+ echo "filename.ustring ~ \"*test.d*\"" > events/syscalls/sys_enter_openat/filter
+ echo 1 > events/syscalls/sys_enter_openat/enable
+ echo 1 > tracing_on
+ ls /bin/sh
+ nocnt=`grep openat trace | wc -l`
+ ls $TEST_DIR
+ echo 0 > tracing_on
+ hitcnt=`grep openat trace | wc -l`;
+ echo 0 > events/syscalls/sys_enter_openat/enable
+ if [ $nocnt -gt 0 ]; then
+ exit_fail
+ fi
+ if [ $hitcnt -eq 0 ]; then
+ exit_fail
+ fi
+fi
+
reset_events_filter
exit 0
--
2.47.2
This series is a follow-up to [1], which adds mTHP support to khugepaged.
mTHP khugepaged support is a "loose" dependency for the sysfs/sysctl
configs to make sense. Without it global="defer" and mTHP="inherit" case
is "undefined" behavior.
We've seen cases were customers switching from RHEL7 to RHEL8 see a
significant increase in the memory footprint for the same workloads.
Through our investigations we found that a large contributing factor to
the increase in RSS was an increase in THP usage.
For workloads like MySQL, or when using allocators like jemalloc, it is
often recommended to set /transparent_hugepages/enabled=never. This is
in part due to performance degradations and increased memory waste.
This series introduces enabled=defer, this setting acts as a middle
ground between always and madvise. If the mapping is MADV_HUGEPAGE, the
page fault handler will act normally, making a hugepage if possible. If
the allocation is not MADV_HUGEPAGE, then the page fault handler will
default to the base size allocation. The caveat is that khugepaged can
still operate on pages thats not MADV_HUGEPAGE.
This allows for three things... one, applications specifically designed to
use hugepages will get them, and two, applications that don't use
hugepages can still benefit from them without aggressively inserting
THPs at every possible chance. This curbs the memory waste, and defers
the use of hugepages to khugepaged. Khugepaged can then scan the memory
for eligible collapsing. Lastly there is the added benefit for those who
want THPs but experience higher latency PFs. Now you can get base page
performance at the PF handler and Hugepage performance for those mappings
after they collapse.
Admins may want to lower max_ptes_none, if not, khugepaged may
aggressively collapse single allocations into hugepages.
TESTING:
- Built for x86_64, aarch64, ppc64le, and s390x
- selftests mm
- In [1] I provided a script [2] that has multiple access patterns
- lots of general use.
- redis testing. This test was my original case for the defer mode. What I
was able to prove was that THP=always leads to increased max_latency
cases; hence why it is recommended to disable THPs for redis servers.
However with 'defer' we dont have the max_latency spikes and can still
get the system to utilize THPs. I further tested this with the mTHP
defer setting and found that redis (and probably other jmalloc users)
can utilize THPs via defer (+mTHP defer) without a large latency
penalty and some potential gains. I uploaded some mmtest results
here[3] which compares:
stock+thp=never
stock+(m)thp=always
khugepaged-mthp + defer (max_ptes_none=64)
The results show that (m)THPs can cause some throughput regression in
some cases, but also has gains in other cases. The mTHP+defer results
have more gains and less losses over the (m)THP=always case.
V4 Changes:
- Minor Documentation fixes
- rebased the dependent series [1] onto mm-unstable
commit 0e68b850b1d3 ("vmalloc: use atomic_long_add_return_relaxed()")
V3 Changes:
- moved some Documentation to the other series and merged the remaining
Documentation updates into one
V2 Changes:
- rebase changes ontop mTHP khugepaged support series
- Fix selftests parsing issue
- add mTHP defer option
- add mTHP defer Documentation
[1] - https://lore.kernel.org/lkml/20250417000238.74567-1-npache@redhat.com/
[2] - https://gitlab.com/npache/khugepaged_mthp_test
[3] - https://people.redhat.com/npache/mthp_khugepaged_defer/testoutput2/output.h…
Nico Pache (4):
mm: defer THP insertion to khugepaged
mm: document (m)THP defer usage
khugepaged: add defer option to mTHP options
selftests: mm: add defer to thp setting parser
Documentation/admin-guide/mm/transhuge.rst | 31 +++++++---
include/linux/huge_mm.h | 18 +++++-
mm/huge_memory.c | 69 +++++++++++++++++++---
mm/khugepaged.c | 10 ++--
tools/testing/selftests/mm/thp_settings.c | 1 +
tools/testing/selftests/mm/thp_settings.h | 1 +
6 files changed, 107 insertions(+), 23 deletions(-)
--
2.48.1
Greetings:
Welcome to v2.
This series fixes netdevsim to correctly set the NAPI ID on the skb.
This is helpful for writing tests around features that use
SO_INCOMING_NAPI_ID.
In addition to the netdevsim fix in patch 1, patches 2-4 do some self
test refactoring and add a test for NAPI IDs. The test itself (patch 4)
introduces a C helper because apparently python doesn't have
socket.SO_INCOMING_NAPI_ID.
Thanks,
Joe
v2:
- No longer an RFC
- Minor whitespace change in patch 1 (no functional change).
- Patches 2-4 new in v2
rfcv1: https://lore.kernel.org/netdev/20250329000030.39543-1-jdamato@fastly.com/
Joe Damato (4):
netdevsim: Mark NAPI ID on skb in nsim_rcv
selftests: drv-net: Factor out ksft C helpers
selftests: net: Allow custom net ns paths
selftests: drv-net: Test that NAPI ID is non-zero
drivers/net/netdevsim/netdev.c | 2 +
.../testing/selftests/drivers/net/.gitignore | 1 +
tools/testing/selftests/drivers/net/Makefile | 6 +-
tools/testing/selftests/drivers/net/ksft.h | 56 +++++++++++++
.../testing/selftests/drivers/net/napi_id.py | 24 ++++++
.../selftests/drivers/net/napi_id_helper.c | 83 +++++++++++++++++++
.../selftests/drivers/net/xdp_helper.c | 49 +----------
tools/testing/selftests/net/lib/py/netns.py | 4 +-
8 files changed, 175 insertions(+), 50 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/ksft.h
create mode 100755 tools/testing/selftests/drivers/net/napi_id.py
create mode 100644 tools/testing/selftests/drivers/net/napi_id_helper.c
base-commit: bbfc077d457272bcea4f14b3a28247ade99b196d
--
2.43.0
The commit df6f8c4d72ae ("selftests/pcie_bwctrl: Add
'set_pcie_speed.sh' to TEST_PROGS") added set_pcie_speed.sh into
TEST_PROGS but that script is a helper that is only being called by
set_pcie_cooling_state.sh, not a test case itself. When
set_pcie_speed.sh is in TEST_PROGS, selftest harness will execute also
it leading to bwctrl selftest errors:
# selftests: pcie_bwctrl: set_pcie_speed.sh
# cat: /cur_state: No such file or directory
not ok 2 selftests: pcie_bwctrl: set_pcie_speed.sh # exit=1
Place set_pcie_speed.sh into TEST_FILES instead to have it included
into installed test files but not execute it from the test harness.
Fixes: df6f8c4d72ae ("selftests/pcie_bwctrl: Add 'set_pcie_speed.sh' to TEST_PROGS")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
---
I'm sorry I didn't realize this while the fix was submitted, I'm not that
familiar with all the kselftest harness variables and the justification
given for the fix sounded valid enough to raise any alarm bells in my
mind that something would be off with the approach the fix patch used.
tools/testing/selftests/pcie_bwctrl/Makefile | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/pcie_bwctrl/Makefile b/tools/testing/selftests/pcie_bwctrl/Makefile
index 48ec048f47af..277f92f9d753 100644
--- a/tools/testing/selftests/pcie_bwctrl/Makefile
+++ b/tools/testing/selftests/pcie_bwctrl/Makefile
@@ -1,2 +1,3 @@
-TEST_PROGS = set_pcie_cooling_state.sh set_pcie_speed.sh
+TEST_PROGS = set_pcie_cooling_state.sh
+TEST_FILES = set_pcie_speed.sh
include ../lib.mk
base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
--
2.39.5
If we try to access argument which is pointer to const void, it's an
UNKNOWN type, verifier will fail to load.
Use is_void_or_int_ptr to check if type is void or int pointer.
And fix selftests.
---
KaFai Wan (2):
bpf: Allow access to const void pointer arguments in tracing programs
selftests/bpf: Add test to access const void pointer argument in
tracing program
kernel/bpf/btf.c | 6 +++---
net/bpf/test_run.c | 8 +++++++-
.../selftests/bpf/progs/verifier_btf_ctx_access.c | 12 ++++++++++++
3 files changed, 22 insertions(+), 4 deletions(-)
Changelog:
v1->v2: Addressed comments from jirka
- use btf_type_is_void to check if type is void
- merge is_void_ptr and is_int_ptr to is_void_or_int_ptr
- fix selftests
Some details in here:
https://lore.kernel.org/all/20250412170626.3638516-1-kafai.wan@hotmail.com/
--
2.43.0
This patch series was motivated by fixing a few bugs in the bonding
driver related to xfrm state migration on device failover.
struct xfrm_dev_offload has two net_device pointers: dev and real_dev.
The first one is the device the xfrm_state is offloaded on and the
second one is used by the bonding driver to manage the underlying device
xfrm_states are actually offloaded on. When bonding isn't used, the two
pointers are the same.
This causes confusion in drivers: Which device pointer should they use?
If they want to support bonding, they need to only use real_dev and
never look at dev.
Furthermore, real_dev is used without proper locking from multiple code
paths and changing it is dangerous. See commit [1] for example.
This patch series clears things out by removing all uses of real_dev
from outside the bonding driver.
Then, the bonding driver is refactored to fix a couple of long standing
races and the original bug which motivated this patch series.
[1] commit f8cde9805981 ("bonding: fix xfrm real_dev null pointer
dereference")
v2 -> v3:
Added a comment with locking expectations for real_dev.
Removed unnecessary bond variable from bond_ipsec_del_sa().
v1 -> v2:
Added missing kdoc for various functions.
Made bond_ipsec_del_sa() use xso.real_dev instead of curr_active_slave.
Cosmin Ratiu (6):
Cleaning up unnecessary uses of xso.real_dev:
net/mlx5: Avoid using xso.real_dev unnecessarily
xfrm: Use xdo.dev instead of xdo.real_dev
xfrm: Remove unneeded device check from validate_xmit_xfrm
Refactoring device operations to get an explicit device pointer:
xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free}
Fixing a bonding xfrm state migration bug:
bonding: Mark active offloaded xfrm_states
Fixing long standing races in bonding:
bonding: Fix multiple long standing offload races
Documentation/networking/xfrm_device.rst | 10 +-
drivers/net/bonding/bond_main.c | 119 +++++++++---------
.../net/ethernet/chelsio/cxgb4/cxgb4_main.c | 20 +--
.../inline_crypto/ch_ipsec/chcr_ipsec.c | 18 ++-
.../net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 41 +++---
drivers/net/ethernet/intel/ixgbevf/ipsec.c | 21 ++--
.../marvell/octeontx2/nic/cn10k_ipsec.c | 18 +--
.../mellanox/mlx5/core/en_accel/ipsec.c | 28 ++---
.../mellanox/mlx5/core/en_accel/ipsec.h | 1 +
.../net/ethernet/netronome/nfp/crypto/ipsec.c | 11 +-
drivers/net/netdevsim/ipsec.c | 15 ++-
include/linux/netdevice.h | 10 +-
include/net/xfrm.h | 11 ++
net/xfrm/xfrm_device.c | 13 +-
net/xfrm/xfrm_state.c | 16 +--
15 files changed, 185 insertions(+), 167 deletions(-)
--
2.45.0