Linux-kselftest-mirror - lists.linaro.org

[PATCH v2 0/1] Add KUnit tests for llist

by Artur Alves

Hi all,

This is part of a hackathon organized by LKCAMP[1], focused on writing
tests using KUnit. We reached out a while ago asking for advice on what
would be a useful contribution[2] and ended up choosing data structures
that did not yet have tests.

This patch adds tests for the llist data structure, defined in
include/linux/llist.h, and is inspired by the KUnit tests for the doubly
linked list in lib/list-test.c[3].

It is important to note that this patch depends on the patch referenced
in [4], as it utilizes the newly created lib/tests/ subdirectory.

[1] https://lkcamp.dev/about/
[2] https://lore.kernel.org/all/Zktnt7rjKryTh9-N@arch/
[3] https://elixir.bootlin.com/linux/latest/source/lib/list-test.c
[4] https://lore.kernel.org/all/20240720181025.work.002-kees@kernel.org/

---
Changes in v2:
- Add MODULE_DESCRIPTION()
- Move the tests from lib/llist_kunit.c to lib/tests/llist_kunit.c
- Change the license from "GPL v2" to "GPL"

Artur Alves (1):
  lib/llist_kunit.c: add KUnit tests for llist

 lib/Kconfig.debug       |  11 ++
 lib/tests/Makefile      |   1 +
 lib/tests/llist_kunit.c | 361 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 373 insertions(+)
 create mode 100644 lib/tests/llist_kunit.c

--
2.46.0
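For readers unfamiliar with llist, the following is a minimal user-space
sketch of the kind of behaviour such tests exercise. It is not taken
from the patch: every sketch_* name is hypothetical, and C11 atomics
stand in for the kernel's cmpxchg()/xchg(). It assumes the usual llist
semantics, namely that adding pushes at the head, that the add reports
whether the list was empty beforehand, and that "delete all" detaches
the whole chain at once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel's llist types. */
struct sketch_node {
	struct sketch_node *next;
};

struct sketch_head {
	_Atomic(struct sketch_node *) first;
};

void sketch_init(struct sketch_head *head)
{
	atomic_init(&head->first, NULL);
}

/* Push a node at the front; returns true if the list was empty before. */
bool sketch_add(struct sketch_node *node, struct sketch_head *head)
{
	struct sketch_node *first = atomic_load(&head->first);

	/* On failure the CAS reloads 'first', so the link is redone. */
	do {
		node->next = first;
	} while (!atomic_compare_exchange_weak(&head->first, &first, node));

	return first == NULL;
}

/* Detach the whole chain with a single atomic exchange and return it. */
struct sketch_node *sketch_del_all(struct sketch_head *head)
{
	return atomic_exchange(&head->first, NULL);
}

/* A check in the spirit of a KUnit case: LIFO order after three adds. */
void sketch_selfcheck(void)
{
	struct sketch_head head;
	struct sketch_node a, b, c;
	struct sketch_node *chain;
	bool was_empty;

	sketch_init(&head);

	was_empty = sketch_add(&a, &head);
	assert(was_empty);
	was_empty = sketch_add(&b, &head);
	assert(!was_empty);
	was_empty = sketch_add(&c, &head);
	assert(!was_empty);

	chain = sketch_del_all(&head);
	assert(chain == &c);
	assert(chain->next == &b);
	assert(chain->next->next == &a);
	assert(a.next == NULL);

	/* The list is empty again after everything was detached. */
	assert(sketch_del_all(&head) == NULL);
}
```

Calling sketch_selfcheck() from a small main() runs the checks. The
in-kernel tests would use KUNIT_EXPECT_* macros instead of assert().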

10 months


[PATCH v4] lib/math: Add int_pow test suite

by Luis Felipe Hernandez

Adds a test suite for the integer-based power function.

Signed-off-by: Luis Felipe Hernandez <luis.hernandez093(a)gmail.com>
---
Changes in v4:
- Address checkpatch warning and make kconfig description longer
- Use GPL-2.0-only for consistency
- Spelling fix fith -> fifth

Changes in v3:
- Fix compiler warning: explicitly define constant as unsigned int
- Add changes in patch revisions

Changes in v2:
- Address review feedback
- Add kconfig entry
- Use correct dir and file convention for KUnit
- Fix typo
- Remove unused static_stub header
- Refactor test suite to use parameterized test cases
- Add a close-to-max allowable value in the large_result test case
- Add test case with non-power of two exponent
- Fix module license
---
 lib/Kconfig.debug              |  9 ++++++
 lib/math/Makefile              |  1 +
 lib/math/tests/Makefile        |  3 ++
 lib/math/tests/int_pow_kunit.c | 52 ++++++++++++++++++++++++++++++++++
 4 files changed, 65 insertions(+)
 create mode 100644 lib/math/tests/Makefile
 create mode 100644 lib/math/tests/int_pow_kunit.c

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index a30c03a66172..0f98f73d4322 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -3051,3 +3051,12 @@ config RUST_KERNEL_DOCTESTS
 endmenu # "Rust"
 
 endmenu # Kernel hacking
+
+config INT_POW_TEST
+	tristate "Integer exponentiation (int_pow) test" if !KUNIT_ALL_TESTS
+	depends on KUNIT
+	default KUNIT_ALL_TESTS
+	help
+	  This option enables the KUnit test suite for the int_pow function,
+	  which performs integer exponentiation. The test suite is designed to
+	  verify that the implementation of int_pow correctly computes the power
+	  of a given base raised to a given exponent.
+
+	  Enabling this option will include tests that check various scenarios
+	  and edge cases to ensure the accuracy and reliability of the exponentiation
+	  function.
+
+	  If unsure, say N
diff --git a/lib/math/Makefile b/lib/math/Makefile
index 91fcdb0c9efe..3c1f92a7459d 100644
--- a/lib/math/Makefile
+++ b/lib/math/Makefile
@@ -5,5 +5,6 @@ obj-$(CONFIG_CORDIC) += cordic.o
 obj-$(CONFIG_PRIME_NUMBERS) += prime_numbers.o
 obj-$(CONFIG_RATIONAL) += rational.o
 
+obj-$(CONFIG_INT_POW_TEST) += tests/int_pow_kunit.o
 obj-$(CONFIG_TEST_DIV64) += test_div64.o
 obj-$(CONFIG_RATIONAL_KUNIT_TEST) += rational-test.o
diff --git a/lib/math/tests/Makefile b/lib/math/tests/Makefile
new file mode 100644
index 000000000000..6a169123320a
--- /dev/null
+++ b/lib/math/tests/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+obj-$(CONFIG_INT_POW_TEST) += int_pow_kunit.o
diff --git a/lib/math/tests/int_pow_kunit.c b/lib/math/tests/int_pow_kunit.c
new file mode 100644
index 000000000000..7b6a5ae70eb4
--- /dev/null
+++ b/lib/math/tests/int_pow_kunit.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <kunit/test.h>
+#include <linux/math.h>
+
+struct test_case_params {
+	u64 base;
+	unsigned int exponent;
+	u64 expected_result;
+	const char *name;
+};
+
+static const struct test_case_params params[] = {
+	{ 64, 0, 1, "Power of zero" },
+	{ 64, 1, 64, "Power of one"},
+	{ 0, 5, 0, "Base zero" },
+	{ 1, 64, 1, "Base one" },
+	{ 2, 2, 4, "Two squared"},
+	{ 2, 3, 8, "Two cubed"},
+	{ 5, 5, 3125, "Five raised to the fifth power" },
+	{ U64_MAX, 1, U64_MAX, "Max base" },
+	{ 2, 63, 9223372036854775808ULL, "Large result"},
+};
+
+static void get_desc(const struct test_case_params *tc, char *desc)
+{
+	strscpy(desc, tc->name, KUNIT_PARAM_DESC_SIZE);
+}
+
+KUNIT_ARRAY_PARAM(int_pow, params, get_desc);
+
+static void int_pow_test(struct kunit *test)
+{
+	const struct test_case_params *tc = (const struct test_case_params *)test->param_value;
+
+	KUNIT_EXPECT_EQ(test, tc->expected_result, int_pow(tc->base, tc->exponent));
+}
+
+static struct kunit_case math_int_pow_test_cases[] = {
+	KUNIT_CASE_PARAM(int_pow_test, int_pow_gen_params),
+	{}
+};
+
+static struct kunit_suite int_pow_test_suite = {
+	.name = "math-int_pow",
+	.test_cases = math_int_pow_test_cases,
+};
+
+kunit_test_suites(&int_pow_test_suite);
+
+MODULE_DESCRIPTION("math.int_pow KUnit test suite");
+MODULE_LICENSE("GPL");
--
2.46.0
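The table-driven approach of the suite can be tried outside the kernel.
The sketch below is a user-space approximation, not part of the patch:
sketch_int_pow() is a hypothetical stand-in that uses exponentiation by
squaring with wrapping 64-bit arithmetic (an assumption about how such a
helper is typically written), and the sketch_* names do not exist in the
kernel. The table repeats the parameter values from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for int_pow(): exponentiation by squaring. */
uint64_t sketch_int_pow(uint64_t base, unsigned int exp)
{
	uint64_t result = 1;

	while (exp) {
		if (exp & 1)
			result *= base;	/* fold in the current bit */
		exp >>= 1;
		base *= base;		/* wraps modulo 2^64, which is defined */
	}

	return result;
}

struct sketch_case {
	uint64_t base;
	unsigned int exponent;
	uint64_t expected;
	const char *name;
};

/* Same values as the params[] table in the patch. */
static const struct sketch_case sketch_cases[] = {
	{ 64, 0, 1, "Power of zero" },
	{ 64, 1, 64, "Power of one" },
	{ 0, 5, 0, "Base zero" },
	{ 1, 64, 1, "Base one" },
	{ 2, 2, 4, "Two squared" },
	{ 2, 3, 8, "Two cubed" },
	{ 5, 5, 3125, "Five raised to the fifth power" },
	{ UINT64_MAX, 1, UINT64_MAX, "Max base" },
	{ 2, 63, 9223372036854775808ULL, "Large result" },
};

/* Run every table entry and return the number of mismatches. */
size_t sketch_run_cases(void)
{
	size_t failures = 0;
	size_t i;

	for (i = 0; i < sizeof(sketch_cases) / sizeof(sketch_cases[0]); i++) {
		const struct sketch_case *tc = &sketch_cases[i];

		if (sketch_int_pow(tc->base, tc->exponent) != tc->expected)
			failures++;
	}

	return failures;
}
```

A caller can check assert(sketch_run_cases() == 0). In the real suite,
KUNIT_ARRAY_PARAM() generates the iterator over the table and each entry
is reported as its own parameterized test case.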

10 months


[PATCH net-next 0/3] lan743x: This series of patches are for lan743x driver testing

by Mohan Prasad J

This series of patches is for testing the lan743x network driver.
Testing comprises autonegotiation, speed, duplex and throughput checks.
Tools such as ethtool and iperf3 are used in the testing process.
The performance test is done for TCP streams at different speeds.

Signed-off-by: Mohan Prasad J <mohan.prasad(a)microchip.com>

Mohan Prasad J (3):
  selftests: lan743x: Add testfile for lan743x network driver
  selftests: lan743x: Add testcase to check speed and duplex state of
    lan743x
  selftests: lan743x: Add testcase to check throughput of lan743x

 MAINTAINERS                                   |   2 +
 tools/testing/selftests/Makefile              |   2 +-
 .../drivers/net/hw/microchip/lan743x/Makefile |   7 ++
 .../net/hw/microchip/lan743x/lan743x.py       | 117 ++++++++++++++++++
 .../hw/microchip/lan743x/lib/py/__init__.py   |  16 +++
 5 files changed, 143 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/drivers/net/hw/microchip/lan743x/Makefile
 create mode 100755 tools/testing/selftests/drivers/net/hw/microchip/lan743x/lan743x.py
 create mode 100644 tools/testing/selftests/drivers/net/hw/microchip/lan743x/lib/py/__init__.py

--
2.43.0

10 months


[PATCH 0/6] selftests/resctrl: Support diverse platforms with MBM and MBA tests

by Reinette Chatre

The resctrl selftests for Memory Bandwidth Allocation (MBA) and Memory
Bandwidth Monitoring (MBM) are failing on some (for example [1]) Emerald
Rapids systems. The test failures result from the following two
properties of these systems:

1) Emerald Rapids systems can have up to 320MB L3 cache. The resctrl MBA
   and MBM selftests measure memory traffic for which a hardcoded 250MB
   buffer has been sufficient so far. On platforms with L3 cache larger
   than the buffer, the buffer fits in the L3 cache and thus no/very
   little memory traffic is generated during the "memory bandwidth"
   tests.
2) Some platform features, for example RAS features or memory
   performance features that generate memory traffic may drive accesses
   that are counted differently by performance counters and MBM
   respectively, for instance generating "overhead" traffic which is not
   counted against any specific RMID. Until now these counting
   differences have always been "in the noise". On Emerald Rapids
   systems the maximum MBA throttling (10% memory bandwidth) throttles
   memory bandwidth to where memory accesses by these other platform
   features push the memory bandwidth difference between memory
   controller performance counters and resctrl (MBM) beyond the tests'
   hardcoded tolerance.

Make the tests more robust against platform variations:

1) Let the buffer used by memory bandwidth tests be guided by the size
   of the L3 cache.
2) Larger buffers require longer initialization time before the buffer
   can be used for measurement. Rework the tests to ensure that buffer
   initialization is complete before measurements start.
3) Do not compare performance counters and MBM measurements at low
   bandwidth. The value of "low" is hardcoded to 750MiB based on
   measurements on Emerald Rapids, Sapphire Rapids, and Ice Lake
   systems. This limit is not applicable to AMD systems since it only
   applies to the MBA and MBM tests that are isolated to Intel.

[1] https://ark.intel.com/content/www/us/en/ark/products/237261/intel-xeon-plat…

Reinette Chatre (6):
  selftests/resctrl: Fix sparse warnings
  selftests/resctrl: Ensure measurements skip initialization of default
    benchmark
  selftests/resctrl: Simplify benchmark parameter passing
  selftests/resctrl: Use cache size to determine "fill_buf" buffer size
  selftests/resctrl: Do not compare performance counters and resctrl at
    low bandwidth
  selftests/resctrl: Keep results from first test run

 tools/testing/selftests/resctrl/cmt_test.c    |  33 +--
 tools/testing/selftests/resctrl/fill_buf.c    |  19 +-
 tools/testing/selftests/resctrl/mba_test.c    |  26 +-
 tools/testing/selftests/resctrl/mbm_test.c    |  25 +-
 tools/testing/selftests/resctrl/resctrl.h     |  57 +++--
 .../testing/selftests/resctrl/resctrl_tests.c |  15 +-
 tools/testing/selftests/resctrl/resctrl_val.c | 223 +++++-------------
 7 files changed, 152 insertions(+), 246 deletions(-)

--
2.46.0
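The two robustness measures above (a cache-guided buffer size and no
comparison at low bandwidth) can be illustrated with a small sketch. It
is not the code from the series. The sketch_* names are hypothetical,
the multiplier of two is an assumption (the cover letter only says the
buffer is "guided by" the L3 size), the 250MB figure is treated as MiB,
and requiring both readings to reach the threshold is also an
assumption. Only the 250 and 750 values come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MIB		(1024ULL * 1024ULL)

/* Previous hardcoded buffer size from the cover letter, kept as a floor. */
#define SKETCH_MIN_SPAN		(250ULL * SKETCH_MIB)

/* Assumed multiplier so that the buffer cannot fit in the L3 cache. */
#define SKETCH_CACHE_FACTOR	2ULL

/* "Low" bandwidth limit from the cover letter, in MiB. */
#define SKETCH_LOW_BW_MIB	750ULL

/* Pick a buffer size that exceeds the L3 cache, never below the floor. */
uint64_t sketch_buf_span(uint64_t l3_bytes)
{
	uint64_t span = l3_bytes * SKETCH_CACHE_FACTOR;

	return span > SKETCH_MIN_SPAN ? span : SKETCH_MIN_SPAN;
}

/*
 * Decide whether performance counter and MBM readings should be compared.
 * Below the limit, traffic from other platform features can dominate the
 * difference, so the comparison is skipped.
 */
bool sketch_should_compare(uint64_t imc_bw_mib, uint64_t mbm_bw_mib)
{
	return imc_bw_mib >= SKETCH_LOW_BW_MIB &&
	       mbm_bw_mib >= SKETCH_LOW_BW_MIB;
}
```

With these assumptions a 320 MiB L3 cache yields a 640 MiB buffer, while
a 32 MiB cache keeps the 250 MiB floor.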

10 months


[PATCH v2 00/19] random: Resolve circular include dependency and include <linux/percpu.h>

by Uros Bizjak

There were several attempts to resolve circular include dependency
after the addition of percpu.h: 1c9df907da83 ("random: fix circular
include dependency on arm64 after addition of percpu.h"), c0842fbc1b18
("random32: move the pseudo-random 32-bit definitions to prandom.h") and
finally d9f29deb7fe8 ("prandom: Remove unused include") that completely
removes the inclusion of <linux/percpu.h>.

Due to legacy reasons, <linux/random.h> includes <linux/prandom.h>, but
with the commit entry remark:

--quote--
A further cleanup step would be to remove this from <linux/random.h>
entirely, and make people who use the prandom infrastructure include
just the new header file. That's a bit of a churn patch, but grepping
for "prandom_" and "next_pseudo_random32" "struct rnd_state" should
catch most users.

But it turns out that that nice cleanup step is fairly painful, because
a _lot_ of code currently seems to depend on the implicit include of
<linux/random.h>, which can currently come in a lot of ways, including
such fairly core headfers as <linux/net.h>.

So the "nice cleanup" part may or may never happen.
--/quote--

We would like to include <linux/percpu.h> in <linux/prandom.h>. In [1]
we would like to repurpose __percpu tag as a named address space
qualifier, where __percpu macro uses defines from <linux/percpu.h>.

The major roadblock to inclusion of <linux/percpu.h> is the above
mentioned legacy inclusion of <linux/prandom.h> in <linux/random.h> that
causes circular include dependency that prevents <linux/percpu.h>
inclusion.

This patch series is the "nice cleanup" part that:

a) Substitutes the inclusion of <linux/random.h> with the inclusion of
   <linux/prandom.h> where needed (patches 1 - 17).

b) Removes legacy inclusion of <linux/prandom.h> from <linux/random.h>
   (patch 18).

c) Includes <linux/percpu.h> in <linux/prandom.h> (patch 19).
The whole series was tested by compiling the kernel for x86_64 allconfig
and some popular architectures, namely arm64 defconfig, powerpc
defconfig and loongarch defconfig.

[1] https://lore.kernel.org/lkml/20240812115945.484051-4-ubizjak@gmail.com/

Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: x86(a)kernel.org
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Jani Nikula <jani.nikula(a)linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: Tvrtko Ursulin <tursulin(a)ursulin.net>
Cc: David Airlie <airlied(a)gmail.com>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Hans Verkuil <hverkuil(a)xs4all.nl>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Miquel Raynal <miquel.raynal(a)bootlin.com>
Cc: Richard Weinberger <richard(a)nod.at>
Cc: Vignesh Raghavendra <vigneshr(a)ti.com>
Cc: Eric Biggers <ebiggers(a)kernel.org>
Cc: "Theodore Y. Ts'o" <tytso(a)mit.edu>
Cc: Jaegeuk Kim <jaegeuk(a)kernel.org>
Cc: "Jason A. Donenfeld" <Jason(a)zx2c4.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Hannes Reinecke <hare(a)suse.de>
Cc: "James E.J. Bottomley" <James.Bottomley(a)HansenPartnership.com>
Cc: "Martin K. Petersen" <martin.petersen(a)oracle.com>
Cc: Alexei Starovoitov <ast(a)kernel.org>
Cc: Daniel Borkmann <daniel(a)iogearbox.net>
Cc: John Fastabend <john.fastabend(a)gmail.com>
Cc: Andrii Nakryiko <andrii(a)kernel.org>
Cc: Martin KaFai Lau <martin.lau(a)linux.dev>
Cc: Eduard Zingerman <eddyz87(a)gmail.com>
Cc: Song Liu <song(a)kernel.org>
Cc: Yonghong Song <yonghong.song(a)linux.dev>
Cc: KP Singh <kpsingh(a)kernel.org>
Cc: Stanislav Fomichev <sdf(a)fomichev.me>
Cc: Hao Luo <haoluo(a)google.com>
Cc: Jiri Olsa <jolsa(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Brendan Higgins <brendan.higgins(a)linux.dev>
Cc: David Gow <davidgow(a)google.com>
Cc: Rae Moar <rmoar(a)google.com>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Eric Dumazet <edumazet(a)google.com>
Cc: Jakub Kicinski <kuba(a)kernel.org>
Cc: Paolo Abeni <pabeni(a)redhat.com>
Cc: Jiri Pirko <jiri(a)resnulli.us>
Cc: Petr Mladek <pmladek(a)suse.com>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Cc: Rasmus Villemoes <linux(a)rasmusvillemoes.dk>
Cc: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Cc: Stephen Hemminger <stephen(a)networkplumber.org>
Cc: Jamal Hadi Salim <jhs(a)mojatatu.com>
Cc: Cong Wang <xiyou.wangcong(a)gmail.com>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
---
v2:
- Reword commit messages to mention the removal of legacy inclusion of
  <linux/prandom.h> from <linux/random.h>
- Add missing substitution in crypto/testmgr.c (reported by kernel test
  robot)
- Add Acked-by:.
Uros Bizjak (19):
  x86/kaslr: Include <linux/prandom.h> instead of <linux/random.h>
  crypto: testmgr: Include <linux/prandom.h> instead of <linux/random.h>
  drm/i915/selftests: Include <linux/prandom.h> instead of <linux/random.h>
  drm/lib: Include <linux/prandom.h> instead of <linux/random.h>
  media: vivid: Include <linux/prandom.h> in vivid-vid-cap.c
  mtd: tests: Include <linux/prandom.h> instead of <linux/random.h>
  fscrypt: Include <linux/once.h> in fs/crypto/keyring.c
  scsi: libfcoe: Include <linux/prandom.h> instead of <linux/random.h>
  bpf: Include <linux/prandom.h> instead of <linux/random.h>
  lib/interval_tree_test.c: Include <linux/prandom.h> instead of <linux/random.h>
  kunit: string-stream-test: Include <linux/prandom.h> instead of <linux/random.h>
  random32: Include <linux/prandom.h> instead of <linux/random.h>
  lib/rbtree-test: Include <linux/prandom.h> instead of <linux/random.h>
  bpf/tests: Include <linux/prandom.h> instead of <linux/random.h>
  lib/test_parman: Include <linux/prandom.h> instead of <linux/random.h>
  lib/test_scanf: Include <linux/prandom.h> instead of <linux/random.h>
  netem: Include <linux/prandom.h> in sch_netem.c
  random: Do not include <linux/prandom.h> in <linux/random.h>
  prandom: Include <linux/percpu.h> in <linux/prandom.h>

 arch/x86/mm/kaslr.c                              | 2 +-
 crypto/testmgr.c                                 | 2 +-
 drivers/gpu/drm/i915/selftests/i915_gem.c        | 2 +-
 drivers/gpu/drm/i915/selftests/i915_random.h     | 2 +-
 drivers/gpu/drm/i915/selftests/scatterlist.c     | 2 +-
 drivers/gpu/drm/lib/drm_random.h                 | 2 +-
 drivers/media/test-drivers/vivid/vivid-vid-cap.c | 1 +
 drivers/mtd/tests/oobtest.c                      | 2 +-
 drivers/mtd/tests/pagetest.c                     | 2 +-
 drivers/mtd/tests/subpagetest.c                  | 2 +-
 fs/crypto/keyring.c                              | 1 +
 include/linux/prandom.h                          | 1 +
 include/linux/random.h                           | 7 -------
 include/scsi/libfcoe.h                           | 2 +-
 kernel/bpf/core.c                                | 2 +-
 lib/interval_tree_test.c                         | 2 +-
 lib/kunit/string-stream-test.c                   | 1 +
 lib/random32.c                                   | 2 +-
 lib/rbtree_test.c                                | 2 +-
 lib/test_bpf.c                                   | 2 +-
 lib/test_parman.c                                | 2 +-
 lib/test_scanf.c                                 | 2 +-
 net/sched/sch_netem.c                            | 1 +
 23 files changed, 22 insertions(+), 24 deletions(-)

--
2.46.0

10 months


[PATCH net-next v24 00/13] Device Memory TCP

by Mina Almasry

v24: https://patchwork.kernel.org/project/netdevbpf/list/?series=884556&state=*
====

Changes:
- Fix failing ynl regen error.
- Error path fixes & extack error messages in dmabuf binding.

Full devmem TCP changes including the full GVE driver implementation are
here:

https://github.com/mina/linux/commits/tcpdevmem-v24/

v23: https://patchwork.kernel.org/project/netdevbpf/list/?series=882978&state=*
====

Fixing relatively minor issues called out in v22. (thanks again!) Mostly
code cleanups, extack error messages, and minor reworks. Nothing major
really changed, so the exact changes per commit are called out in the
commit messages.

Full devmem TCP changes including the full GVE driver implementation are
here:

https://github.com/mina/linux/commits/tcpdevmem-v23/

v22: https://patchwork.kernel.org/project/netdevbpf/list/?series=881158&state=*
====

v22 aims to resolve the pending issue pointed to in v21, which is the
interaction with xdp. In this series I rebase on top of the minor
refactor which refactors propagating xdp configuration to slave devices:

https://patchwork.kernel.org/project/netdevbpf/list/?series=881994&state=*

I then disable setting xdp on devices using memory providers, and
propagating xdp configuration to devices using memory providers.

Full devmem TCP changes including the full GVE driver implementation are
here:

https://github.com/mina/linux/commits/tcpdevmem-v22/

v21: https://patchwork.kernel.org/project/netdevbpf/list/?series=880735&state=*
====

v20 addressed some comments and resolved a test failure, but introduced
an unfortunate build error with a config edge case I wasn't testing. v21
simply resolves that error.
Major Changes:
- Resolve build error with CONFIG_PAGE_POOL=n && CONFIG_NET=y

Full devmem TCP changes including the full GVE driver implementation are
here:

https://github.com/mina/linux/commits/tcpdevmem-v21/

v20: https://patchwork.kernel.org/project/netdevbpf/list/?series=879373&state=*
====

v20 aims to resolve a couple of bug reports against v19, and addresses
some review comments around the page_pool_check_memory_provider
mechanism.

Major changes:
- Test edge cases such as header split disabled in selftest.
- Change `offset = 0` back to `offset = offset - start` to resolve issue
  found in RX path by Taehee (thanks!)
- Address a few comments around page_pool_check_memory_provider() from
  Pavel & Jakub.
- Removed some unnecessary includes across various patches in the
  series.
- Removed unnecessary EXPORT_SYMBOL(page_pool_mem_providers) (Jakub).
- Fix regression caused by incorrect dev_get_max_mp_channel check, along
  with rename (Jakub).

Full devmem TCP changes including the full GVE driver implementation are
here:

https://github.com/mina/linux/commits/tcpdevmem-v20/

v19: https://patchwork.kernel.org/project/netdevbpf/list/?series=876852&state=*
====

v18 got a thorough review (thanks!), and this iteration addresses the
feedback.

Major changes:
- Prevent deactivating mp bound queues.
- Prevent installing xdp on mp bound netdevs, or installing mps on xdp
  installed netdevs.
- Fix corner cases in netlink API vis-a-vis missing attributes.
- Iron out the unreadable netmem driver support story. To be honest, the
  conversation with Jakub & Pavel got a bit confusing for me. I've
  implemented an approach in this set that makes sense to me, and
  AFAICT, addresses the requirements. It may be good as-is, or it
  may be a conversation starter/continuer. To be honest IMO there
  are many ways to skin this cat and I don't see an extremely strong
  reason to go for one approach over another. Here is one approach you
  may like.
- Don't reset niov dma_addr on allocation & free.
- Add some tests to the selftest that catch some of the issues around
  missing netlink attributes or deactivating mp-bound queues.

Full devmem TCP changes including the full GVE driver implementation are
here:

https://github.com/mina/linux/commits/tcpdevmem-v19/

v18: https://patchwork.kernel.org/project/netdevbpf/list/?series=874848&state=*
====

v17 got minor feedback: (a) to beef up the description on patch 1 and (b)
to remove the leading underscores in the header definition.

I applied (a). (b) seems to be against current conventions so I did not
apply before further discussion.

Full devmem TCP changes including the full GVE driver implementation are
here:

https://github.com/mina/linux/commits/tcpdevmem-v17/

v17: https://patchwork.kernel.org/project/netdevbpf/list/?series=869900&state=*
====

v16 also got a very thorough review and some testing (thanks again!).
This version addresses all the concerns reported on v15, in terms of
feedback and issues reported.

Major changes:
- Use ASSERT_RTNL.
- Moved around some of the page_pool helpers definitions so I can hide
  some netmem helpers in private files as Jakub suggested.
- Don't make every net_iov hold a ref on the binding as Jakub suggested.
- Fix issue reported by Taehee where we access queues after they have
  been freed.

Full devmem TCP changes including the full GVE driver implementation are
here:

https://github.com/mina/linux/commits/tcpdevmem-v17/

v16: https://patchwork.kernel.org/project/netdevbpf/list/?series=866353&state=*
====

v15 got a thorough review and some testing, and this version addresses
almost all the feedback. Some more minor comments where the authors said
it could be done later, I left out.

Major changes:
- Addition of dma-buf introspection to page-pool-get and queue-get.
- Fixes to selftests suggested by Taehee.
- Fixes to documentation suggested by Donald.
- A couple of suggestions and fixes to TCP patches by Eric and David.
- Fixes to number assignments suggested by Arnd.
- Use rtnl_lock()ing to guard against queue reconfiguration while the
  page_pool initialization is happening. (Jakub).
- Fixes to a few warnings reproduced by Taehee.
- Fixes to dma-buf binding suggested by Taehee and Jakub.
- Fixes to netlink UAPI suggested by Jakub
- Applied a number of Reviewed-bys and Acked-bys (including ones I lost
  from v13+).

Full devmem TCP changes including the full GVE driver implementation are
here:

https://github.com/mina/linux/commits/tcpdevmem-v16/

One caveat: Taehee reproduced a KASAN warning and reported it here:

https://lore.kernel.org/netdev/CAMArcTUdCxOBYGF3vpbq=eBvqZfnc44KBaQTN7H-wqd…

I estimate the issue to be minor and easily fixable:

https://lore.kernel.org/netdev/CAHS8izNgaqC--GGE2xd85QB=utUnOHmioCsDd1TNxJW…

I hope to be able to follow up with a fix to net tree as net-next closes
imminently, but if this iteration doesn't make it in, I will repost with
a fix squashed after net-next reopens, no problem.

v15: https://patchwork.kernel.org/project/netdevbpf/list/?series=865481&state=*
====

No material changes in this version, only a fix to linking against
libynl.a from the last version. Per Jakub's instructions I've pulled one
of his patches into this series, and now use the new libynl.a correctly,
I hope.

As usual, the full devmem TCP changes including the full GVE driver
implementation are here:

https://github.com/mina/linux/commits/tcpdevmem-v15/

v14: https://patchwork.kernel.org/project/netdevbpf/list/?series=865135&archive=…
====

No material changes in this version. Only rebase and re-verification on
top of net-next. v13, I think, raced with commit ebad6d0334793
("net/ipv4: Use nested-BH locking for ipv4_tcp_sk.") being merged to
net-next that caused a patchwork failure to apply. This series should
apply cleanly on commit c4532232fa2a4 ("selftests: net: remove unneeded
IP_GRE config").
I did not wait the customary 24hr as Jakub said it's OK to repost as soon
as I build test the rebased version:

https://lore.kernel.org/netdev/20240625075926.146d769d@kernel.org/

v13: https://patchwork.kernel.org/project/netdevbpf/list/?series=861406&archive=…
====

Major changes:
--------------

This iteration addresses Pavel's review comments, applies his
reviewed-by's, and seeks to fix the patchwork build error (sorry!).

As usual, the full devmem TCP changes including the full GVE driver
implementation are here:

https://github.com/mina/linux/commits/tcpdevmem-v13/

v12: https://patchwork.kernel.org/project/netdevbpf/list/?series=859747&state=*
====

Major changes:
--------------

This iteration only addresses one minor comment from Pavel with regards
to the trace printing of netmem, and the patchwork build error
introduced in v11 because I missed doing an allmodconfig build, sorry.

Other than that v11, AFAICT, received no feedback. There is one
discussion about the specifics of plugging io uring memory through the
page pool, but it is not relevant to the content of this particular
patchset, AFAICT.

As usual, the full devmem TCP changes including the full GVE driver
implementation are here:

https://github.com/mina/linux/commits/tcpdevmem-v12/

v11: https://patchwork.kernel.org/project/netdevbpf/list/?series=857457&state=*
====

Major Changes:
--------------

v11 addresses feedback received in v10. The major change is the removal
of the memory provider ops as requested by Christoph. We still
accomplish the same thing, but utilizing direct function calls with if
statements rather than generic ops.

Additionally address sparse warnings, bugs and review comments from
folks that reviewed.

As usual, the full devmem TCP changes including the full GVE driver
implementation are here:

https://github.com/mina/linux/commits/tcpdevmem-v11/

Detailed changelog:
-------------------

- Fixes in netdev_rx_queue_restart() from Pavel & David.
- Remove commit e650e8c3a36f5 ("net: page_pool: create hooks for custom
  page providers") from the series to address Christoph's feedback and
  rebased other patches on the series on this change.
- Fixed build errors with CONFIG_DMA_SHARED_BUFFER &&
  !CONFIG_GENERIC_ALLOCATOR build.
- Fixed sparse warnings pointed out by Paolo.
- Drop unnecessary gro_pull_from_frag0 checks.
- Added Bagas reviewed-by to docs.

v10: https://patchwork.kernel.org/project/netdevbpf/list/?series=852422&state=*
====

Major Changes:
--------------

v9 was sent right before the merge window closed (sorry!). v10 is almost
a re-send of the series now that the merge window re-opened. Only
rebased to latest net-next and addressed some minor iterative comments
received on v9.

As usual, the full devmem TCP changes including the full GVE driver
implementation are here:

https://github.com/mina/linux/commits/tcpdevmem-v10/

Detailed changelog:
-------------------

- Fixed tokens leaking in DONTNEED setsockopt (Nikolay).
- Moved net_iov_dma_addr() to devmem.c and made it a devmem specific
  helper (David).
- Rename hook alloc_pages to alloc_netmems as alloc_pages is now
  preprocessor macro defined and causes a build error.

v9:
===

Major Changes:
--------------

GVE queue API has been merged. Submitting this version as non-RFC after
rebasing on top of the merged API, and dropped the out of tree queue API
I was carrying on github. Addressed the little feedback v8 has received.

Detailed changelog:
------------------
- Added new patch from David Wei to this series for
  netdev_rx_queue_restart()
- Fixed sparse error.
- Removed CONFIG_ checks in netmem_is_net_iov()
- Flipped skb->readable to skb->unreadable
- Minor fixes to selftests & docs.

RFC v8:
=======

Major Changes:
--------------

- Fixed build error generated by patch-by-patch build.
- Applied docs suggestions from Randy.
RFC v7:
=======

Major Changes:
--------------

This revision largely rebases on top of net-next and addresses the
feedback RFCv6 received from folks, namely Jakub, Yunsheng, Arnd, David,
& Pavel.

The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. An upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.

As usual the full devmem TCP changes including the full GVE driver
implementation are here:

https://github.com/mina/linux/commits/tcpdevmem-v7/

Detailed changelog:

- Use admin-perm in netlink API.
- Addressed feedback from Jakub with regards to netlink API
  implementation.
- Renamed devmem.c functions to something more appropriate for that
  file.
- Improve the performance seen through the page_pool benchmark.
- Fix the value definition of all the SO_DEVMEM_* uapi.
- Various fixes to documentation.

Perf - page-pool benchmark:
---------------------------

Improved performance of bench_page_pool_simple.ko tests compared to v6:

https://pastebin.com/raw/v5dYRg8L

net-next base: 8 cycle fast path.
RFC v6: 10 cycle fast path.
RFC v7: 9 cycle fast path.
RFC v7 with CONFIG_DMA_SHARED_BUFFER disabled: 8 cycle fast path,
                                               same as baseline.

Perf - Devmem TCP benchmark:
---------------------

Perf is about the same regardless of the changes in v7, namely the
removal of the static_branch_unlikely to improve the page_pool benchmark
performance:

189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.

RFC v6:
=======

Major Changes:
--------------

This revision largely rebases on top of net-next and addresses the little
feedback RFCv5 received.

The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing.
An upstreamable GVE implementation is in the works. Aside from that, in
my estimation all the patches are ready for review/merge. Please do take
a look.

As usual the full devmem TCP changes including the full GVE driver
implementation are here:

https://github.com/mina/linux/commits/tcpdevmem-v6/

This version also comes with some performance data recorded in the cover
letter (see below changelog).

Detailed changelog:

- Rebased on top of the merged netmem_ref changes.

- Converted skb->dmabuf to skb->readable (Pavel). Pavel's original
  suggestion was to remove the skb->dmabuf flag entirely, but when I
  looked into it closely, I found the issue that if we remove the flag
  we have to dereference the shinfo(skb) pointer to obtain the first
  frag to tell whether an skb is readable or not. This can cause a
  performance regression if it dirties the cache line when the
  shinfo(skb) was not really needed. Instead, I converted the
  skb->dmabuf flag into a generic skb->readable flag which can be
  re-used by io_uring 0-copy RX.

- Squashed a few locking optimizations from Eric Dumazet in the RX path
  and the DEVMEM_DONTNEED setsockopt.

- Expanded the tests a bit. Added validation for invalid scenarios and
  added some more coverage.

Perf - page-pool benchmark:
---------------------------

bench_page_pool_simple.ko tests with and without these changes:
https://pastebin.com/raw/ncHDwAbn

AFAIK the number that really matters in the perf tests is the
'tasklet_page_pool01_fast_path Per elem'. This one measures at about 8
cycles without the changes but there is some 1 cycle noise in some
results.

With the patches this regresses to 9 cycles, but there is occasional 1
cycle noise when running this test repeatedly.

Lastly I tried disabling the static_branch_unlikely() in the
netmem_is_net_iov() check. To my surprise disabling the
static_branch_unlikely() check reduces the fast path back to 8 cycles,
but the 1 cycle noise remains.
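For context on the netmem_is_net_iov() check discussed above, the RFC v5
notes in this letter describe netmem as referring to "LSB set pointers".
The sketch below illustrates that general low-bit tagging technique in
user space. It is not the kernel implementation: every sketch_* name and
both struct layouts are hypothetical, and it assumes that both object
types are aligned to at least two bytes so that bit 0 of a valid pointer
is always clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a regular page and a device memory chunk. */
struct sketch_page {
	int refcount;
};

struct sketch_net_iov {
	unsigned long dma_addr;
};

/* One reference type for both kinds of memory; bit 0 is the tag. */
typedef uintptr_t sketch_netmem_ref;

#define SKETCH_NET_IOV_BIT	((uintptr_t)1)

sketch_netmem_ref sketch_page_to_netmem(struct sketch_page *page)
{
	return (uintptr_t)page;			/* tag bit stays clear */
}

sketch_netmem_ref sketch_net_iov_to_netmem(struct sketch_net_iov *niov)
{
	return (uintptr_t)niov | SKETCH_NET_IOV_BIT;
}

/* The cheap fast-path test: a single bit check, no dereference. */
bool sketch_netmem_is_net_iov(sketch_netmem_ref ref)
{
	return (ref & SKETCH_NET_IOV_BIT) != 0;
}

/* Strip the tag before using the pointer. */
struct sketch_net_iov *sketch_netmem_to_net_iov(sketch_netmem_ref ref)
{
	return (struct sketch_net_iov *)(ref & ~SKETCH_NET_IOV_BIT);
}

/* Returns NULL for tagged references, which must not be read as pages. */
struct sketch_page *sketch_netmem_to_page(sketch_netmem_ref ref)
{
	if (sketch_netmem_is_net_iov(ref))
		return NULL;

	return (struct sketch_page *)ref;
}
```

The point of the tag is that code holding a reference can tell the two
kinds of memory apart without touching the memory itself.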
Perf - Devmem TCP benchmark: --------------------- 189/200gbps bi-directional throughput with RX devmem TCP and regular TCP TX i.e. ~95% line rate. Major changes in RFC v5: ======================== 1. Rebased on top of 'Abstract page from net stack' series and used the new netmem type to refer to LSB set pointers instead of re-using struct page. 2. Downgraded this series back to RFC and called it RFC v5. This is because this series is now dependent on 'Abstract page from net stack'[1] and the queue API. Both are removed from the series to reduce the patch # and those bits are fairly independent or pre-requisite work. 3. Reworked the page_pool devmem support to use netmem and for some more unified handling. 4. Reworked the reference counting of net_iov (renamed from page_pool_iov) to use pp_ref_count for refcounting. The full changes including the dependent series and GVE page pool support is here: https://github.com/mina/linux/commits/tcpdevmem-rfcv5/ [1] https://patchwork.kernel.org/project/netdevbpf/list/?series=810774 Major changes in v1: ==================== 1. Implemented MVP queue API ndos to remove the userspace-visible driver reset. 2. Fixed issues in the napi_pp_put_page() devmem frag unref path. 3. Removed RFC tag. Many smaller addressed comments across all the patches (patches have individual change log). Full tree including the rest of the GVE driver changes: https://github.com/mina/linux/commits/tcpdevmem-v1 Changes in RFC v3: ================== 1. Pulled in the memory-provider dependency from Jakub's RFC[1] to make the series reviewable and mergeable. 2. Implemented multi-rx-queue binding which was a todo in v2. 3. Fix to cmsg handling. The sticking point in RFC v2[2] was the device reset required to refill the device rx-queues after the dmabuf bind/unbind. 
The solution suggested as I understand is a subset of the per-queue management ops Jakub suggested or similar: https://lore.kernel.org/netdev/20230815171638.4c057dcd@kernel.org/ This is not addressed in this revision, because: 1. This point was discussed at netconf & netdev and there is openness to using the current approach of requiring a device reset. 2. Implementing individual queue resetting seems to be difficult for my test bed with GVE. My prototype to test this ran into issues with the rx-queues not coming back up properly if reset individually. At the moment I'm unsure if it's a mistake in the POC or a genuine issue in the virtualization stack behind GVE, which currently doesn't test individual rx-queue restart. 3. Our usecases are not bothered by requiring a device reset to refill the buffer queues, and we'd like to support NICs that run into this limitation with resetting individual queues. My thought is that drivers that have trouble with per-queue configs can use the support in this series, while drivers that support new netdev ops to reset individual queues can automatically reset the queue as part of the dma-buf bind/unbind. The same approach with device resets is presented again for consideration with other sticking points addressed. This proposal includes the rx devmem path only proposed for merge. For a snapshot of my entire tree which includes the GVE POC page pool support & device memory support: https://github.com/torvalds/linux/compare/master...mina:linux:tcpdevmem-v3 [1] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.… [2] https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4… Changes in RFC v2: ================== The sticking point in RFC v1[1] was the dma-buf pages approach we used to deliver the device memory to the TCP stack. 
RFC v2 is a proof-of-concept that attempts to resolve this by implementing scatterlist support in the networking stack, such that we can import the dma-buf scatterlist directly. This is the approach proposed at a high level here[2]. Detailed changes: 1. Replaced dma-buf pages approach with importing scatterlist into the page pool. 2. Replace the dma-buf pages centric API with a netlink API. 3. Removed the TX path implementation - there is no issue with implementing the TX path with scatterlist approach, but leaving out the TX path makes it easier to review. 4. Functionality is tested with this proposal, but I have not conducted perf testing yet. I'm not sure there are regressions, but I removed perf claims from the cover letter until they can be re-confirmed. 5. Added Signed-off-by: contributors to the implementation. 6. Fixed some bugs with the RX path since RFC v1. Any feedback welcome, but specifically the biggest pending questions needing feedback IMO are: 1. Feedback on the scatterlist-based approach in general. 2. Netlink API (Patch 1 & 2). 3. Approach to handle all the drivers that expect to receive pages from the page pool (Patch 6). [1] https://lore.kernel.org/netdev/dfe4bae7-13a0-3c5d-d671-f61b375cb0b4@gmail.c… [2] https://lore.kernel.org/netdev/CAHS8izPm6XRS54LdCDZVd0C75tA1zHSu6jLVO8nzTLX… ================== * TL;DR: Device memory TCP (devmem TCP) is a proposal for transferring data to and/or from device memory efficiently, without bouncing the data to a host memory buffer. * Problem: A large amount of data transfers have device memory as the source and/or destination. Accelerators drastically increased the volume of such transfers. Some examples include: - ML accelerators transferring large amounts of training data from storage into GPU/TPU memory. In some cases ML training setup time can be as long as 50% of TPU compute time, improving data transfer throughput & efficiency can help improving GPU/TPU utilization. 
- Distributed training, where ML accelerators, such as GPUs on different hosts, exchange data among them.

- Distributed raw block storage applications transfer large amounts of data with remote SSDs; much of this data does not require host processing.

Today, the majority of the Device-to-Device data transfers over the network are implemented as the following low level operations: Device-to-Host copy, Host-to-Host network transfer, and Host-to-Device copy.

The implementation is suboptimal, especially for bulk data transfers, and can put significant strains on system resources, such as host memory bandwidth, PCIe bandwidth, etc. One important reason behind the current state is the kernel's lack of semantics to express device to network transfers.

* Proposal:

In this patch series we attempt to optimize this use case by implementing socket APIs that enable the user to:

1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.

Packet _payloads_ go directly from the NIC to device memory for receive and from device memory to NIC for transmit. Packet _headers_ go to/from host memory and are processed by the TCP/IP stack normally. The NIC _must_ support header split to achieve this.

Advantages:

- Alleviate host memory bandwidth pressure, compared to existing network-transfer + device-copy semantics.

- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level of the PCIe tree, compared to the traditional path which sends data through the root complex.

* Patch overview:

** Part 1: netlink API

Gives the user the ability to bind a dma-buf to an RX queue.

** Part 2: scatterlist support

Currently the standard for device memory sharing is DMABUF, which doesn't generate struct pages. On the other hand, the networking stack (skbs, drivers, and page pool) operates on pages. We have 2 options:

1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to process scatterlist.
Approach #1 was attempted in RFC v1. RFC v2 implements approach #2.

** Part 3: page pool support

We piggyback on the page pool memory providers proposal:
https://github.com/kuba-moo/linux/tree/pp-providers

It allows the page pool to define a memory provider that provides the page allocation and freeing. It helps abstract most of the device memory TCP changes from the driver.

** Part 4: support for unreadable skb frags

Page pool iovs are not accessible by the host; we implement changes throughout the networking stack to correctly handle skbs with unreadable frags.

** Part 5: recvmsg() APIs

We define user APIs for the user to send and receive device memory.

Not included with this series is the GVE devmem TCP support, just to simplify the review. Code available here if desired:
https://github.com/mina/linux/tree/tcpdevmem

This series is built on top of net-next with Jakub's pp-providers changes cherry-picked.

* NIC dependencies:

1. (strict) Devmem TCP requires the NIC to support header split, i.e. the capability to split incoming packets into a header + payload and to put each into a separate buffer. Devmem TCP works by using device memory for the packet payload, and host memory for the packet headers.

2. (optional) Devmem TCP works better with flow steering support & RSS support, i.e. the NIC's ability to steer flows into certain rx queues. This allows the sysadmin to enable devmem TCP on a subset of the rx queues, and steer devmem TCP traffic onto these queues and non devmem TCP elsewhere.

The NIC I have access to with these properties is the GVE with DQO support running in Google Cloud, but any NIC that supports these features would suffice. I may be able to help reviewers bring up devmem TCP on their NICs.

* Testing:

The series includes a udmabuf kselftest that shows a simple use case of devmem TCP and validates the entire data path end to end without a dependency on a specific dmabuf provider.
** Test Setup Kernel: net-next with this series and memory provider API cherry-picked locally. Hardware: Google Cloud A3 VMs. NIC: GVE with header split & RSS & flow steering support. Cc: Pavel Begunkov <asml.silence(a)gmail.com> Cc: David Wei <dw(a)davidwei.uk> Cc: Jason Gunthorpe <jgg(a)ziepe.ca> Cc: Yunsheng Lin <linyunsheng(a)huawei.com> Cc: Shailend Chand <shailend(a)google.com> Cc: Harshitha Ramamurthy <hramamurthy(a)google.com> Cc: Shakeel Butt <shakeel.butt(a)linux.dev> Cc: Jeroen de Borst <jeroendb(a)google.com> Cc: Praveen Kaligineedi <pkaligineedi(a)google.com> Cc: Bagas Sanjaya <bagasdotme(a)gmail.com> Cc: Steven Rostedt <rostedt(a)goodmis.org> Cc: Christoph Hellwig <hch(a)infradead.org> Cc: Nikolay Aleksandrov <razor(a)blackwall.org> Cc: Taehee Yoo <ap420073(a)gmail.com> Cc: Donald Hunter <donald.hunter(a)gmail.com> Mina Almasry (13): netdev: add netdev_rx_queue_restart() net: netdev netlink api to bind dma-buf to a net device netdev: support binding dma-buf to netdevice netdev: netdevice devmem allocator page_pool: devmem support memory-provider: dmabuf devmem memory provider net: support non paged skb frags net: add support for skbs with unreadable frags tcp: RX path for devmem TCP net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags net: add devmem TCP documentation selftests: add ncdevmem, netcat for devmem TCP netdev: add dmabuf introspection Documentation/netlink/specs/netdev.yaml | 61 +++ Documentation/networking/devmem.rst | 269 +++++++++++ Documentation/networking/index.rst | 1 + arch/alpha/include/uapi/asm/socket.h | 6 + arch/mips/include/uapi/asm/socket.h | 6 + arch/parisc/include/uapi/asm/socket.h | 6 + arch/sparc/include/uapi/asm/socket.h | 6 + include/linux/netdevice.h | 2 + include/linux/skbuff.h | 61 ++- include/linux/skbuff_ref.h | 9 +- include/linux/socket.h | 1 + include/net/devmem.h | 136 ++++++ include/net/mp_dmabuf_devmem.h | 44 ++ include/net/netdev_rx_queue.h | 5 + include/net/netmem.h | 163 ++++++- 
include/net/page_pool/helpers.h | 39 +- include/net/page_pool/types.h | 22 +- include/net/sock.h | 2 + include/net/tcp.h | 5 +- include/trace/events/page_pool.h | 12 +- include/uapi/asm-generic/socket.h | 6 + include/uapi/linux/netdev.h | 13 + include/uapi/linux/uio.h | 17 + net/Kconfig | 5 + net/core/Makefile | 2 + net/core/datagram.c | 6 + net/core/dev.c | 28 +- net/core/devmem.c | 388 ++++++++++++++++ net/core/gro.c | 3 +- net/core/netdev-genl-gen.c | 23 + net/core/netdev-genl-gen.h | 6 + net/core/netdev-genl.c | 137 +++++- net/core/netdev_rx_queue.c | 81 ++++ net/core/netmem_priv.h | 31 ++ net/core/page_pool.c | 117 +++-- net/core/page_pool_priv.h | 46 ++ net/core/page_pool_user.c | 31 +- net/core/skbuff.c | 77 +++- net/core/sock.c | 68 +++ net/ethtool/common.c | 8 + net/ipv4/esp4.c | 3 +- net/ipv4/tcp.c | 261 ++++++++++- net/ipv4/tcp_input.c | 13 +- net/ipv4/tcp_ipv4.c | 16 + net/ipv4/tcp_minisocks.c | 2 + net/ipv4/tcp_output.c | 5 +- net/ipv6/esp6.c | 3 +- net/packet/af_packet.c | 4 +- net/xdp/xsk_buff_pool.c | 5 + tools/include/uapi/linux/netdev.h | 13 + tools/net/ynl/lib/.gitignore | 1 + tools/testing/selftests/net/.gitignore | 1 + tools/testing/selftests/net/Makefile | 9 + tools/testing/selftests/net/ncdevmem.c | 570 ++++++++++++++++++++++++ 54 files changed, 2731 insertions(+), 124 deletions(-) create mode 100644 Documentation/networking/devmem.rst create mode 100644 include/net/devmem.h create mode 100644 include/net/mp_dmabuf_devmem.h create mode 100644 net/core/devmem.c create mode 100644 net/core/netdev_rx_queue.c create mode 100644 net/core/netmem_priv.h create mode 100644 tools/testing/selftests/net/ncdevmem.c -- 2.46.0.469.g59c65b2a67-goog
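To make the header-split requirement from the NIC dependencies above a bit more concrete, here is a minimal host-side sketch of where a frame would be cut. It is only an illustration under stated assumptions: it handles an untagged Ethernet + IPv4 + TCP frame, ignores the IPv4 total-length field and any padding, and the names `struct split_result` and `header_split()` are hypothetical, not something defined by this series or by any driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch; not a kernel structure. */
struct split_result {
	size_t hdr_len;     /* bytes that would stay in host memory */
	size_t payload_len; /* bytes that would land in device memory */
};

/*
 * Compute where an untagged Ethernet + IPv4 + TCP frame splits into
 * headers and payload. Returns 0 on success, -1 if the frame is not
 * one this sketch understands.
 */
static int header_split(const uint8_t *frame, size_t len,
			struct split_result *out)
{
	const size_t ip_off = 14; /* Ethernet header without VLAN tag */
	size_t ihl, tcp_off, doff;

	assert(frame && out);

	if (len < ip_off + 20)
		return -1;
	/* EtherType must be IPv4 (0x0800). */
	if (frame[12] != 0x08 || frame[13] != 0x00)
		return -1;
	/* IPv4: version in the high nibble, header length (in 32-bit words) in the low one. */
	if ((frame[ip_off] >> 4) != 4)
		return -1;
	ihl = (size_t)(frame[ip_off] & 0x0f) * 4;
	/* Protocol field is at offset 9 of the IPv4 header; 6 means TCP. */
	if (ihl < 20 || frame[ip_off + 9] != 6)
		return -1;
	tcp_off = ip_off + ihl;
	if (len < tcp_off + 20)
		return -1;
	/* TCP data offset: high nibble of byte 12, in 32-bit words. */
	doff = (size_t)(frame[tcp_off + 12] >> 4) * 4;
	if (doff < 20 || len < tcp_off + doff)
		return -1;

	out->hdr_len = tcp_off + doff;
	out->payload_len = len - out->hdr_len;
	return 0;
}
```

With minimal headers (14 + 20 + 20 bytes) the cut lands at byte 54; everything before it is what the TCP/IP stack would process from host memory, and everything after it is what would be placed in the bound dma-buf.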

10 months


[PATCH net-next v5 0/2] net-timestamp: introduce a flag to filter out rx software and hardware report

by Jason Xing

From: Jason Xing <kernelxing(a)tencent.com>

When one socket sets SOF_TIMESTAMPING_RX_SOFTWARE, the whole system turns on the netstamp_needed_key static key. Other sockets that only set SOF_TIMESTAMPING_SOFTWARE are then affected and report rx timestamp information even though they never set the SOF_TIMESTAMPING_RX_SOFTWARE generation flag. How to solve it without breaking users? We introduce a new flag named SOF_TIMESTAMPING_OPT_RX_FILTER. Using it together with SOF_TIMESTAMPING_SOFTWARE can stop reporting the rx software timestamp.

Similarly, we also filter out the hardware case where one process enables the rx hardware generation flag, then another process only passing SOF_TIMESTAMPING_RAW_HARDWARE gets the timestamp. So we can set both SOF_TIMESTAMPING_RAW_HARDWARE and SOF_TIMESTAMPING_OPT_RX_FILTER to stop reporting the rx hardware timestamp after this patch is applied.

v5
Link: https://lore.kernel.org/all/20240905071738.3725-1-kerneljasonxing@gmail.com/
1. squash the hardware case patch into this one (Willem)
2. update corresponding commit message and doc (Willem)
3. remove the limitation in sock_set_timestamping() and restore the simplification branches. (Willem)
4. add missing type and another test in selftests

v4
Link: https://lore.kernel.org/all/20240830153751.86895-1-kerneljasonxing@gmail.co…
1. revise the doc and commit message (Willem)
2. add patch [2/4] to make the doc right (Willem)
3. add patch [3/4] to cover the hardware use (Willem)
4. add testcase for hardware use.
Note: the reason why I split into 4 patches is to try to make each commit clean, atomic, easy to review.

v3
Link: https://lore.kernel.org/all/20240828160145.68805-1-kerneljasonxing@gmail.co…
1. introduce a new flag to avoid application breakage, suggested by Willem.
2. add it into the selftests.

v2
Link: https://lore.kernel.org/all/20240825152440.93054-1-kerneljasonxing@gmail.co…
Discussed with Willem
1. update the documentation accordingly
2. add more comments in each patch
3. remove the previous test statements in __sock_recv_timestamp()

Jason Xing (2):
  net-timestamp: introduce SOF_TIMESTAMPING_OPT_RX_FILTER flag
  net-timestamp: add selftests for SOF_TIMESTAMPING_OPT_RX_FILTER

 Documentation/networking/timestamping.rst | 27 +++++++++++++++++++++++
 include/uapi/linux/net_tstamp.h           |  3 ++-
 net/ethtool/common.c                      |  1 +
 net/ipv4/tcp.c                            |  9 ++++++--
 net/socket.c                              | 10 +++++++--
 tools/testing/selftests/net/rxtimestamp.c | 18 +++++++++++++++
 6 files changed, 63 insertions(+), 5 deletions(-)

--
2.37.3
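The reporting rule described in the cover letter can be modelled in a few lines of plain C. This is only a sketch of the decision as the text above describes it, not the kernel code: the `EX_*` bits are stand-ins with made-up values (the real flags live in include/uapi/linux/net_tstamp.h), the function names are hypothetical, and it assumes a timestamp is already present on the packet because some other socket enabled generation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in bits for this sketch only; NOT the uapi values. */
enum {
	EX_RX_HARDWARE   = 1u << 0, /* models SOF_TIMESTAMPING_RX_HARDWARE */
	EX_RX_SOFTWARE   = 1u << 1, /* models SOF_TIMESTAMPING_RX_SOFTWARE */
	EX_SOFTWARE      = 1u << 2, /* models SOF_TIMESTAMPING_SOFTWARE */
	EX_RAW_HARDWARE  = 1u << 3, /* models SOF_TIMESTAMPING_RAW_HARDWARE */
	EX_OPT_RX_FILTER = 1u << 4, /* models SOF_TIMESTAMPING_OPT_RX_FILTER */
	EX_ALL           = (1u << 5) - 1,
};

/*
 * Would this socket be handed an rx software timestamp that is already
 * present on the packet? Without the filter, the reporting flag alone is
 * enough; with the filter, the socket must also have asked for generation.
 */
static bool report_rx_software(uint32_t tsflags)
{
	assert((tsflags & ~(uint32_t)EX_ALL) == 0);

	if (!(tsflags & EX_SOFTWARE))
		return false;
	return (tsflags & EX_RX_SOFTWARE) || !(tsflags & EX_OPT_RX_FILTER);
}

/* Same rule for the hardware case. */
static bool report_rx_hardware(uint32_t tsflags)
{
	assert((tsflags & ~(uint32_t)EX_ALL) == 0);

	if (!(tsflags & EX_RAW_HARDWARE))
		return false;
	return (tsflags & EX_RX_HARDWARE) || !(tsflags & EX_OPT_RX_FILTER);
}
```

The point of making the filter opt-in is visible in the first branch: a socket that never sets the new bit keeps today's behaviour, so existing applications are not affected.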

10 months


[crng-random:jd/arm64-vdso] [selftests] f68b079b1d: kernel-selftests.vDSO.vdso_standalone_test_x86.fail

by kernel test robot

Hello,

kernel test robot noticed "kernel-selftests.vDSO.vdso_standalone_test_x86.fail" on:

commit: f68b079b1d5ec46687a097347303b616927eb9ff ("selftests: vDSO: build tests with O2 optimization")
https://git.kernel.org/cgit/linux/kernel/git/crng/random.git jd/arm64-vdso

in testcase: kernel-selftests
version: kernel-selftests-x86_64-977d51cf-1_20240508
with following parameters:

	group: group-03

compiler: gcc-12
test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 32G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)

If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang(a)intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202409082121.553d4c89-oliver.sang@intel.com

# timeout set to 300
# selftests: vDSO: vdso_standalone_test_x86
# Segmentation fault
not ok 5 selftests: vDSO: vdso_standalone_test_x86 # exit=139

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240908/202409082121.553d4c89-oliv…

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
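As a side note on reading the log above: exit=139 lines up with the "Segmentation fault" message if one assumes the usual shell convention of reporting death by signal N as exit code 128 + N. A small sketch of that arithmetic follows; the helper name is hypothetical and the convention is an assumption about how the runner reports the status, not something stated in the report.

```c
#include <assert.h>
#include <signal.h>

/*
 * Decode a shell-style exit code. Returns the signal number if the code
 * follows the 128 + N convention, or 0 if it looks like a normal exit.
 */
static int signal_from_exit_code(int code)
{
	assert(code >= 0 && code <= 255);

	return code > 128 ? code - 128 : 0;
}
```

Here 139 - 128 = 11, which is SIGSEGV on Linux.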

10 months


[PATCH v2] selftests/futex: Create test for robust list

by André Almeida

Create a test for the robust list mechanism. Signed-off-by: André Almeida <andrealmeid(a)igalia.com> --- Changes from v1: - Change futex type from int to _Atomic(unsigned int) - Use old futex(FUTEX_WAIT) instead of the new sys_futex_wait() --- .../selftests/futex/functional/.gitignore | 1 + .../selftests/futex/functional/Makefile | 3 +- .../selftests/futex/functional/robust_list.c | 448 ++++++++++++++++++ 3 files changed, 451 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/futex/functional/robust_list.c diff --git a/tools/testing/selftests/futex/functional/.gitignore b/tools/testing/selftests/futex/functional/.gitignore index fbcbdb6963b3..4726e1be7497 100644 --- a/tools/testing/selftests/futex/functional/.gitignore +++ b/tools/testing/selftests/futex/functional/.gitignore @@ -9,3 +9,4 @@ futex_wait_wouldblock futex_wait futex_requeue futex_waitv +robust_list diff --git a/tools/testing/selftests/futex/functional/Makefile b/tools/testing/selftests/futex/functional/Makefile index f79f9bac7918..b8635a1ac7f6 100644 --- a/tools/testing/selftests/futex/functional/Makefile +++ b/tools/testing/selftests/futex/functional/Makefile @@ -17,7 +17,8 @@ TEST_GEN_PROGS := \ futex_wait_private_mapped_file \ futex_wait \ futex_requeue \ - futex_waitv + futex_waitv \ + robust_list TEST_PROGS := run.sh diff --git a/tools/testing/selftests/futex/functional/robust_list.c b/tools/testing/selftests/futex/functional/robust_list.c new file mode 100644 index 000000000000..9308eb189d48 --- /dev/null +++ b/tools/testing/selftests/futex/functional/robust_list.c @@ -0,0 +1,448 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2024 Igalia S.L. 
+ * + * Robust list test by André Almeida <andrealmeid(a)igalia.com> + * + * The robust list uAPI allows userspace to create "robust" locks, in the sense + * that if the lock holder thread dies, the remaining threads that are waiting + * for the lock won't block forever, waiting for a lock that will never be + * released. + * + * This is achieved by userspace setting a list where a thread can enter all the + * locks (futexes) that it is holding. The robust list is a linked list, and + * userspace registers the start of the list with the syscall set_robust_list(). + * If such thread eventually dies, the kernel will walk this list, waking up one + * thread waiting for each futex and marking the futex word with the flag + * FUTEX_OWNER_DIED. + * + * See also + * man set_robust_list + * Documentation/locking/robust-futex-ABI.rst + * Documentation/locking/robust-futexes.rst + */ + +#define _GNU_SOURCE + +#include "../../kselftest_harness.h" + +#include "futextest.h" + +#include <pthread.h> +#include <stdatomic.h> +#include <stddef.h> + +#define STACK_SIZE (1024 * 1024) + +#define FUTEX_TIMEOUT 3 + +static pthread_barrier_t barrier, barrier2; + +int set_robust_list(struct robust_list_head *head, size_t len) +{ + return syscall(SYS_set_robust_list, head, len); +} + +int get_robust_list(int pid, struct robust_list_head **head, size_t *len_ptr) +{ + return syscall(SYS_get_robust_list, pid, head, len_ptr); +} + +/* + * Basic lock struct, contains just the futex word and the robust list element + * Real implementations also have a *prev to easily walk in the list + */ +struct lock_struct { + _Atomic(unsigned int) futex; + struct robust_list list; +}; + +/* + * Helper function to spawn a child thread.
Returns -1 on error, pid on success + */ +static int create_child(int (*fn)(void *arg), void *arg) +{ + char *stack; + pid_t pid; + + stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0); + if (stack == MAP_FAILED) + return -1; + + stack += STACK_SIZE; + + pid = clone(fn, stack, CLONE_VM | SIGCHLD, arg); + + if (pid == -1) + return -1; + + return pid; +} + +/* + * Helper function to prepare and register a robust list + */ +static int set_list(struct robust_list_head *head) +{ + int ret; + + ret = set_robust_list(head, sizeof(struct robust_list_head)); + if (ret) + return ret; + + head->futex_offset = (size_t) offsetof(struct lock_struct, futex) - + (size_t) offsetof(struct lock_struct, list); + head->list.next = &head->list; + head->list_op_pending = NULL; + + return 0; +} + +/* + * A basic (and incomplete) mutex lock function with robustness + */ +static int mutex_lock(struct lock_struct *lock, struct robust_list_head *head, bool error_inject) +{ + _Atomic(unsigned int) *futex = &lock->futex; + int zero = 0, ret = -1; + pid_t tid = gettid(); + + /* + * Set list_op_pending before starting the lock, so the kernel can catch + * the case where the thread died during the lock operation + */ + head->list_op_pending = &lock->list; + + if (atomic_compare_exchange_strong(futex, &zero, tid)) { + /* + * We took the lock, insert it in the robust list + */ + struct robust_list *list = &head->list; + + /* Error injection to test list_op_pending */ + if (error_inject) + return 0; + + while (list->next != &head->list) + list = list->next; + + list->next = &lock->list; + lock->list.next = &head->list; + + ret = 0; + } else { + /* + * We didn't take the lock, wait until the owner wakes (or dies) + */ + struct timespec to; + + clock_gettime(CLOCK_MONOTONIC, &to); + to.tv_sec = to.tv_sec + FUTEX_TIMEOUT; + + tid = atomic_load(futex); + /* Kernel ignores futexes without the waiters flag */ + tid |= FUTEX_WAITERS; + 
atomic_store(futex, tid); + + ret = futex_wait((futex_t *) futex, tid, &to, 0); + + /* + * A real mutex_lock() implementation would loop here to finally + * take the lock. We don't care about that, so we stop here. + */ + } + + head->list_op_pending = NULL; + + return ret; +} + +/* + * This child thread will succeed taking the lock, and then will exit holding it + */ +static int child_fn_lock(void *arg) +{ + struct lock_struct *lock = (struct lock_struct *) arg; + struct robust_list_head head; + int ret; + + ret = set_list(&head); + if (ret) + ksft_test_result_fail("set_robust_list error\n"); + + ret = mutex_lock(lock, &head, false); + if (ret) + ksft_test_result_fail("mutex_lock error\n"); + + pthread_barrier_wait(&barrier); + + /* + * There's a race here: the parent thread needs to be inside + * futex_wait() before the child thread dies, otherwise it will miss the + * wakeup from handle_futex_death() that this child will emit. We wait a + * little bit just to make sure that this happens. + */ + sleep(1); + + return 0; +} + +/* + * Spawns a child thread that will set a robust list, take the lock, register it + * in the robust list and die. The parent thread will wait on this futex, and + * should be waken up when the child exits. 
+ */ +TEST(robustness) +{ + struct lock_struct lock = { .futex = 0 }; + struct robust_list_head head; + _Atomic(unsigned int) *futex = &lock.futex; + int ret; + + ret = set_list(&head); + ASSERT_EQ(ret, 0); + + /* + * Let's use a barrier to ensure that the child thread takes the lock + * before the parent + */ + ret = pthread_barrier_init(&barrier, NULL, 2); + ASSERT_EQ(ret, 0); + + ret = create_child(&child_fn_lock, &lock); + ASSERT_NE(ret, -1); + + pthread_barrier_wait(&barrier); + ret = mutex_lock(&lock, &head, false); + + /* + * futex_wait() should return 0 and the futex word should be marked with + * FUTEX_OWNER_DIED + */ + ASSERT_EQ(ret, 0) TH_LOG("futex wait returned %d", errno); + ASSERT_TRUE(*futex & FUTEX_OWNER_DIED); + + pthread_barrier_destroy(&barrier); +} + +/* + * The only valid value for len is sizeof(*head) + */ +TEST(set_robust_list_invalid_size) +{ + struct robust_list_head head; + size_t head_size = sizeof(struct robust_list_head); + int ret; + + ret = set_robust_list(&head, head_size); + ASSERT_EQ(ret, 0); + + ret = set_robust_list(&head, head_size * 2); + ASSERT_EQ(ret, -1); + ASSERT_EQ(errno, EINVAL); + + ret = set_robust_list(&head, head_size - 1); + ASSERT_EQ(ret, -1); + ASSERT_EQ(errno, EINVAL); + + ret = set_robust_list(&head, 0); + ASSERT_EQ(ret, -1); + ASSERT_EQ(errno, EINVAL); +} + +/* + * Test get_robust_list with pid = 0, getting the list of the running thread + */ +TEST(get_robust_list_self) +{ + struct robust_list_head head, head2, *get_head; + size_t head_size = sizeof(struct robust_list_head), len_ptr; + int ret; + + ret = set_robust_list(&head, head_size); + ASSERT_EQ(ret, 0); + + ret = get_robust_list(0, &get_head, &len_ptr); + ASSERT_EQ(ret, 0); + ASSERT_EQ(get_head, &head); + ASSERT_EQ(head_size, len_ptr); + + ret = set_robust_list(&head2, head_size); + ASSERT_EQ(ret, 0); + + ret = get_robust_list(0, &get_head, &len_ptr); + ASSERT_EQ(ret, 0); + ASSERT_EQ(get_head, &head2); + ASSERT_EQ(head_size, len_ptr); +} + +static int
child_list(void *arg) +{ + struct robust_list_head *head = (struct robust_list_head *) arg; + int ret; + + ret = set_robust_list(head, sizeof(struct robust_list_head)); + if (ret) + ksft_test_result_fail("set_robust_list error\n"); + + pthread_barrier_wait(&barrier); + pthread_barrier_wait(&barrier2); + + return 0; +} + +/* + * Test get_robust_list from another thread. We use two barriers here to ensure + * that: + * 1) the child thread set the list before we try to get it from the + * parent + * 2) the child thread still alive when we try to get the list from it + */ +TEST(get_robust_list_child) +{ + pid_t tid; + int ret; + struct robust_list_head head, *get_head; + size_t len_ptr; + + ret = pthread_barrier_init(&barrier, NULL, 2); + ret = pthread_barrier_init(&barrier2, NULL, 2); + ASSERT_EQ(ret, 0); + + tid = create_child(&child_list, &head); + ASSERT_NE(tid, -1); + + pthread_barrier_wait(&barrier); + + ret = get_robust_list(tid, &get_head, &len_ptr); + ASSERT_EQ(ret, 0); + ASSERT_EQ(&head, get_head); + + pthread_barrier_wait(&barrier2); + + pthread_barrier_destroy(&barrier); + pthread_barrier_destroy(&barrier2); +} + +static int child_fn_lock_with_error(void *arg) +{ + struct lock_struct *lock = (struct lock_struct *) arg; + struct robust_list_head head; + int ret; + + ret = set_list(&head); + if (ret) + ksft_test_result_fail("set_robust_list error\n"); + + ret = mutex_lock(lock, &head, true); + if (ret) + ksft_test_result_fail("mutex_lock error\n"); + + pthread_barrier_wait(&barrier); + + sleep(1); + + return 0; +} + +/* + * Same as robustness test, but inject an error where the mutex_lock() exits + * earlier, just after setting list_op_pending and taking the lock, to test the + * list_op_pending mechanism + */ +TEST(set_list_op_pending) +{ + struct lock_struct lock = { .futex = 0 }; + struct robust_list_head head; + _Atomic(unsigned int) *futex = &lock.futex; + int ret; + + ret = set_list(&head); + ASSERT_EQ(ret, 0); + + ret = pthread_barrier_init(&barrier, 
NULL, 2); + ASSERT_EQ(ret, 0); + + ret = create_child(&child_fn_lock_with_error, &lock); + ASSERT_NE(ret, -1); + + pthread_barrier_wait(&barrier); + ret = mutex_lock(&lock, &head, false); + + ASSERT_EQ(ret, 0) TH_LOG("futex wait returned %d", errno); + ASSERT_TRUE(*futex & FUTEX_OWNER_DIED); + + pthread_barrier_destroy(&barrier); +} + +#define CHILD_NR 10 + +static int child_lock_holder(void *arg) +{ + struct lock_struct *locks = (struct lock_struct *) arg; + struct robust_list_head head; + int i; + + set_list(&head); + + for (i = 0; i < CHILD_NR; i++) { + locks[i].futex = 0; + mutex_lock(&locks[i], &head, false); + } + + pthread_barrier_wait(&barrier); + pthread_barrier_wait(&barrier2); + + sleep(1); + return 0; +} + +static int child_wait_lock(void *arg) +{ + struct lock_struct *lock = (struct lock_struct *) arg; + struct robust_list_head head; + int ret; + + pthread_barrier_wait(&barrier2); + ret = mutex_lock(lock, &head, false); + + if (ret) + ksft_test_result_fail("mutex_lock error\n"); + + if (!(lock->futex & FUTEX_OWNER_DIED)) + ksft_test_result_fail("futex not marked with FUTEX_OWNER_DIED\n"); + + return 0; +} + +/* + * Test a robust list of more than one element. All the waiters should wake when + * the holder dies + */ +TEST(robust_list_multiple_elements) +{ + struct lock_struct locks[CHILD_NR]; + int i, ret; + + ret = pthread_barrier_init(&barrier, NULL, 2); + ASSERT_EQ(ret, 0); + ret = pthread_barrier_init(&barrier2, NULL, CHILD_NR + 1); + ASSERT_EQ(ret, 0); + + create_child(&child_lock_holder, &locks); + + /* Wait until the locker thread takes the lock */ + pthread_barrier_wait(&barrier); + + for (i = 0; i < CHILD_NR; i++) + create_child(&child_wait_lock, &locks[i]); + + /* Wait for all children to return */ + while (wait(NULL) > 0); + + pthread_barrier_destroy(&barrier); + pthread_barrier_destroy(&barrier2); +} + +TEST_HARNESS_MAIN -- 2.46.0
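Since the tests hinge on the futex word carrying FUTEX_OWNER_DIED after the holder exits, it may help to spell out how such a word is picked apart. The sketch below is standalone and not part of the patch: the `EX_` constants repeat the values from the uapi header linux/futex.h so that it builds without it, and the helper names are hypothetical. The flag has to be tested with a bitwise AND; OR-ing the flag into the word yields a non-zero value for any word and so cannot distinguish the two cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as FUTEX_WAITERS, FUTEX_OWNER_DIED and FUTEX_TID_MASK. */
#define EX_FUTEX_WAITERS    0x80000000u
#define EX_FUTEX_OWNER_DIED 0x40000000u
#define EX_FUTEX_TID_MASK   0x3fffffffu

/* Build a futex word for a lock held by `tid`, optionally with waiters. */
static uint32_t ex_make_word(uint32_t tid, bool waiters)
{
	assert((tid & ~EX_FUTEX_TID_MASK) == 0);

	return tid | (waiters ? EX_FUTEX_WAITERS : 0);
}

/* True only if the owner-died bit is actually set in the word. */
static bool ex_owner_died(uint32_t word)
{
	return (word & EX_FUTEX_OWNER_DIED) != 0;
}

/* Extract the owner TID, dropping the two flag bits. */
static uint32_t ex_owner_tid(uint32_t word)
{
	return word & EX_FUTEX_TID_MASK;
}
```

For a word built with `ex_make_word(1234, true)`, `ex_owner_died()` is false until the flag is set, whereas `(word | EX_FUTEX_OWNER_DIED)` is non-zero in both cases.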

10 months, 1 week


[PATCH v2] kselftest/cgroup: Add missing newline in test_zswap.c

by Mohammed Anees

Thank you for the review, I have added the changelog as requested.

Changelog:
- Added missing newline to the `ksft_print_msg` in `test_zswap_writeback` function.

Signed-off-by: Mohammed Anees <pvmohammedanees2003(a)gmail.com>
---
 tools/testing/selftests/cgroup/test_zswap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/cgroup/test_zswap.c b/tools/testing/selftests/cgroup/test_zswap.c
index 190096017..7c849d836 100644
--- a/tools/testing/selftests/cgroup/test_zswap.c
+++ b/tools/testing/selftests/cgroup/test_zswap.c
@@ -351,7 +351,7 @@ static int test_zswap_writeback(const char *root, bool wb)
 		goto out;
 
 	if (wb != !!zswpwb_after) {
-		ksft_print_msg("zswpwb_after is %ld while wb is %s",
+		ksft_print_msg("zswpwb_after is %ld while wb is %s\n",
 				zswpwb_after, wb ? "enabled" : "disabled");
 		goto out;
 	}
--
2.43.0

10 months, 1 week

