- Linux-kselftest-mirror - lists.linaro.org

[PATCH] kunit: Make default kunit_test timeout configurable via both a module parameter and a Kconfig option

by Marie Zhussupova

To accommodate varying hardware performance and use cases,
the default kunit test case timeout (currently 300 seconds)
is now configurable. Users can adjust the timeout by
either setting the 'timeout' module parameter or the
KUNIT_DEFAULT_TIMEOUT Kconfig option to their desired
timeout in seconds.

Signed-off-by: Marie Zhussupova <marievic(a)google.com>
---
 lib/kunit/Kconfig | 13 +++++++++++++
 lib/kunit/test.c  | 15 ++++++++-------
 2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/lib/kunit/Kconfig b/lib/kunit/Kconfig
index a97897edd964..c10ede4b1d22 100644
--- a/lib/kunit/Kconfig
+++ b/lib/kunit/Kconfig
@@ -93,4 +93,17 @@ config KUNIT_AUTORUN_ENABLED
 	  In most cases this should be left as Y. Only if additional opt-in
 	  behavior is needed should this be set to N.
 
+config KUNIT_DEFAULT_TIMEOUT
+	int "Default value of the timeout module parameter"
+	default 300
+	help
+	  Sets the default timeout, in seconds, for Kunit test cases. This value
+	  is further multiplied by a factor determined by the assigned speed
+	  setting: 1x for `DEFAULT`, 3x for `KUNIT_SPEED_SLOW`, and 12x for
+	  `KUNIT_SPEED_VERY_SLOW`. This allows slower tests on slower machines
+	  sufficient time to complete.
+
+	  If unsure, the default timeout of 300 seconds is suitable for most
+	  cases.
+
 endif # KUNIT
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index 002121675605..f3c6b11f12b8 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -69,6 +69,13 @@ static bool enable_param;
 module_param_named(enable, enable_param, bool, 0);
 MODULE_PARM_DESC(enable, "Enable KUnit tests");
 
+/*
+ * Configure the base timeout.
+ */
+static unsigned long kunit_base_timeout = CONFIG_KUNIT_DEFAULT_TIMEOUT;
+module_param_named(timeout, kunit_base_timeout, ulong, 0644);
+MODULE_PARM_DESC(timeout, "Set the base timeout for Kunit test cases");
+
 /*
  * KUnit statistic mode:
  * 0 - disabled
@@ -393,12 +400,6 @@ static int kunit_timeout_mult(enum kunit_speed speed)
 static unsigned long kunit_test_timeout(struct kunit_suite *suite, struct kunit_case *test_case)
 {
 	int mult = 1;
-	/*
-	 * TODO: Make the default (base) timeout configurable, so that users with
-	 * particularly slow or fast machines can successfully run tests, while
-	 * still taking advantage of the relative speed.
-	 */
-	unsigned long default_timeout = 300;
 
 	/*
 	 * The default test timeout is 300 seconds and will be adjusted by mult
@@ -409,7 +410,7 @@ static unsigned long kunit_test_timeout(struct kunit_suite *suite, struct kunit_
 		mult = kunit_timeout_mult(suite->attr.speed);
 	if (test_case->attr.speed != KUNIT_SPEED_UNSET)
 		mult = kunit_timeout_mult(test_case->attr.speed);
-	return mult * default_timeout * msecs_to_jiffies(MSEC_PER_SEC);
+	return mult * kunit_base_timeout * msecs_to_jiffies(MSEC_PER_SEC);
 }
 
-- 
2.50.0.rc2.761.g2dc52ea45b-goog
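
For readers who want to see what a given base timeout works out to, here is a minimal userspace sketch of the arithmetic described in the Kconfig help text above. It assumes only the 1x/3x/12x multipliers quoted there. The enum and function names are hypothetical, chosen for illustration, and are not the KUnit identifiers; the kernel code additionally converts the result to jiffies, which this sketch leaves out.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical speed classes, standing in for the KUnit speed attribute. */
enum speed { SPEED_NORMAL, SPEED_SLOW, SPEED_VERY_SLOW };

/* Multiplier per speed class: 1x, 3x and 12x, as quoted in the help text. */
static unsigned long timeout_mult(enum speed s)
{
	switch (s) {
	case SPEED_SLOW:
		return 3;
	case SPEED_VERY_SLOW:
		return 12;
	default:
		return 1;
	}
}

/* Effective timeout in seconds: base timeout scaled by the multiplier. */
static unsigned long effective_timeout_secs(unsigned long base_secs,
					    enum speed s)
{
	unsigned long mult = timeout_mult(s);

	/* Guard against overflow for very large base values. */
	assert(base_secs <= ULONG_MAX / mult);
	return mult * base_secs;
}
```

With the default base of 300 seconds this gives 300, 900 and 3600 seconds for the three classes; lowering the base to 60 seconds would give 60, 180 and 720 seconds.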

[PATCH net-next v5] page_pool: import Jesper's page_pool benchmark

by Mina Almasry

From: Jesper Dangaard Brouer <hawk(a)kernel.org> We frequently consult with Jesper's out-of-tree page_pool benchmark to evaluate page_pool changes. Import the benchmark into the upstream linux kernel tree so that (a) we're all running the same version, (b) pave the way for shared improvements, and (c) maybe one day integrate it with nipa, if possible. Import bench_page_pool_simple from commit 35b1716d0c30 ("Add page_bench06_walk_all"), from this repository: https://github.com/netoptimizer/prototype-kernel.git Changes done during upstreaming: - Fix checkpatch issues. - Remove the tasklet logic not needed. - Move under tools/testing - Create ksft for the benchmark. - Changed slightly how the benchmark gets build. Out of tree, time_bench is built as an independent .ko. Here it is included in bench_page_pool.ko Steps to run: ``` mkdir -p /tmp/run-pp-bench make -C ./tools/testing/selftests/net/bench make -C ./tools/testing/selftests/net/bench install INSTALL_PATH=/tmp/run-pp-bench rsync --delete -avz --progress /tmp/run-pp-bench mina@$SERVER:~/ ssh mina@$SERVER << EOF cd ~/run-pp-bench && sudo ./test_bench_page_pool.sh EOF ``` Note that by default, the Makefile will build the benchmark for the currently installed kernel in /lib/modules/$(shell uname -r)/build. 
To build against the current tree, do: make KDIR=$(pwd) -C ./tools/testing/selftests/net/bench Output (from Jesper): ``` sudo ./test_bench_page_pool.sh (benchmark dmesg logs snipped) Fast path results: no-softirq-page_pool01 Per elem: 23 cycles(tsc) 6.571 ns ptr_ring results: no-softirq-page_pool02 Per elem: 60 cycles(tsc) 16.862 ns slow path results: no-softirq-page_pool03 Per elem: 265 cycles(tsc) 73.739 ns ``` Output (from me): ``` sudo ./test_bench_page_pool.sh (benchmark dmesg logs snipped) Fast path results: no-softirq-page_pool01 Per elem: 11 cycles(tsc) 4.177 ns ptr_ring results: no-softirq-page_pool02 Per elem: 51 cycles(tsc) 19.117 ns slow path results: no-softirq-page_pool03 Per elem: 168 cycles(tsc) 62.469 ns ``` Results of course will vary based on hardware/kernel/configs, and some variance may be there from run to run due to some noise. Cc: Jesper Dangaard Brouer <hawk(a)kernel.org> Cc: Ilias Apalodimas <ilias.apalodimas(a)linaro.org> Cc: Jakub Kicinski <kuba(a)kernel.org> Cc: Toke Høiland-Jørgensen <toke(a)toke.dk> Signed-off-by: Mina Almasry <almasrymina(a)google.com> Acked-by: Ilias Apalodimas <ilias.apalodimas(a)linaro.org> Signed-off-by: Jesper Dangaard Brouer <hawk(a)kernel.org> --- v5: https://lore.kernel.org/netdev/20250615205914.835368-1-almasrymina@google.c… - Update results in the commit message - Update to build against current tree v4: https://lore.kernel.org/netdev/20250614100853.3f2372f2@kernel.org/ - Fix more checkpatch and coccicheck issues (Jakub) v3: - Non RFC - Collect Signed-off-by from Jesper and Acked-by Ilias. - Move test_bench_page_pool.sh to address nipa complaint. - Remove `static inline` in .c files to address nipa complaint. v2: - Move under tools/selftests (Jakub) - Create ksft for it. - Remove the tasklet logic no longer needed (Jesper + Toke) RFC discussion points: - Desirable to import it? - Can the benchmark be imported as-is for an initial version? Or needs lots of modifications? - Code location. 
I retained the location in Jesper's tree, but a path like net/core/bench/ may make more sense. --- tools/testing/selftests/net/bench/Makefile | 7 + .../selftests/net/bench/page_pool/Makefile | 17 + .../bench/page_pool/bench_page_pool_simple.c | 276 ++++++++++++ .../net/bench/page_pool/time_bench.c | 394 ++++++++++++++++++ .../net/bench/page_pool/time_bench.h | 238 +++++++++++ .../net/bench/test_bench_page_pool.sh | 32 ++ 6 files changed, 964 insertions(+) create mode 100644 tools/testing/selftests/net/bench/Makefile create mode 100644 tools/testing/selftests/net/bench/page_pool/Makefile create mode 100644 tools/testing/selftests/net/bench/page_pool/bench_page_pool_simple.c create mode 100644 tools/testing/selftests/net/bench/page_pool/time_bench.c create mode 100644 tools/testing/selftests/net/bench/page_pool/time_bench.h create mode 100755 tools/testing/selftests/net/bench/test_bench_page_pool.sh diff --git a/tools/testing/selftests/net/bench/Makefile b/tools/testing/selftests/net/bench/Makefile new file mode 100644 index 000000000000..2546c45e42f7 --- /dev/null +++ b/tools/testing/selftests/net/bench/Makefile @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: GPL-2.0 + +TEST_GEN_MODS_DIR := page_pool + +TEST_PROGS += test_bench_page_pool.sh + +include ../../lib.mk diff --git a/tools/testing/selftests/net/bench/page_pool/Makefile b/tools/testing/selftests/net/bench/page_pool/Makefile new file mode 100644 index 000000000000..0549a16ba275 --- /dev/null +++ b/tools/testing/selftests/net/bench/page_pool/Makefile @@ -0,0 +1,17 @@ +BENCH_PAGE_POOL_SIMPLE_TEST_DIR := $(realpath $(dir $(abspath $(lastword $(MAKEFILE_LIST))))) +KDIR ?= /lib/modules/$(shell uname -r)/build + +ifeq ($(V),1) +Q = +else +Q = @ +endif + +obj-m += bench_page_pool.o +bench_page_pool-y += bench_page_pool_simple.o time_bench.o + +all: + +$(Q)make -C $(KDIR) M=$(BENCH_PAGE_POOL_SIMPLE_TEST_DIR) modules + +clean: + +$(Q)make -C $(KDIR) M=$(BENCH_PAGE_POOL_SIMPLE_TEST_DIR) clean diff --git 
a/tools/testing/selftests/net/bench/page_pool/bench_page_pool_simple.c b/tools/testing/selftests/net/bench/page_pool/bench_page_pool_simple.c new file mode 100644 index 000000000000..f183d5e30dc6 --- /dev/null +++ b/tools/testing/selftests/net/bench/page_pool/bench_page_pool_simple.c @@ -0,0 +1,276 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Benchmark module for page_pool. + * + */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/module.h> +#include <linux/mutex.h> + +#include <linux/version.h> +#include <net/page_pool/helpers.h> + +#include <linux/interrupt.h> +#include <linux/limits.h> + +#include "time_bench.h" + +static int verbose = 1; +#define MY_POOL_SIZE 1024 + +static void _page_pool_put_page(struct page_pool *pool, struct page *page, + bool allow_direct) +{ + page_pool_put_page(pool, page, -1, allow_direct); +} + +/* Makes tests selectable. Useful for perf-record to analyze a single test. + * Hint: Bash shells support writing binary number like: $((2#101010) + * + * # modprobe bench_page_pool_simple run_flags=$((2#100)) + */ +static unsigned long run_flags = 0xFFFFFFFF; +module_param(run_flags, ulong, 0); +MODULE_PARM_DESC(run_flags, "Limit which bench test that runs"); + +/* Count the bit number from the enum */ +enum benchmark_bit { + bit_run_bench_baseline, + bit_run_bench_no_softirq01, + bit_run_bench_no_softirq02, + bit_run_bench_no_softirq03, +}; + +#define bit(b) (1 << (b)) +#define enabled(b) ((run_flags & (bit(b)))) + +/* notice time_bench is limited to U32_MAX nr loops */ +static unsigned long loops = 10000000; +module_param(loops, ulong, 0); +MODULE_PARM_DESC(loops, "Specify loops bench will run"); + +/* Timing at the nanosec level, we need to know the overhead + * introduced by the for loop itself + */ +static int time_bench_for_loop(struct time_bench_record *rec, void *data) +{ + uint64_t loops_cnt = 0; + int i; + + time_bench_start(rec); + /** Loop to measure **/ + for (i = 0; i < rec->loops; i++) { + loops_cnt++; + 
barrier(); /* avoid compiler to optimize this loop */ + } + time_bench_stop(rec, loops_cnt); + return loops_cnt; +} + +static int time_bench_atomic_inc(struct time_bench_record *rec, void *data) +{ + uint64_t loops_cnt = 0; + atomic_t cnt; + int i; + + atomic_set(&cnt, 0); + + time_bench_start(rec); + /** Loop to measure **/ + for (i = 0; i < rec->loops; i++) { + atomic_inc(&cnt); + barrier(); /* avoid compiler to optimize this loop */ + } + loops_cnt = atomic_read(&cnt); + time_bench_stop(rec, loops_cnt); + return loops_cnt; +} + +/* The ptr_ping in page_pool uses a spinlock. We need to know the minimum + * overhead of taking+releasing a spinlock, to know the cycles that can be saved + * by e.g. amortizing this via bulking. + */ +static int time_bench_lock(struct time_bench_record *rec, void *data) +{ + uint64_t loops_cnt = 0; + spinlock_t lock; + int i; + + spin_lock_init(&lock); + + time_bench_start(rec); + /** Loop to measure **/ + for (i = 0; i < rec->loops; i++) { + spin_lock(&lock); + loops_cnt++; + barrier(); /* avoid compiler to optimize this loop */ + spin_unlock(&lock); + } + time_bench_stop(rec, loops_cnt); + return loops_cnt; +} + +/* Helper for filling some page's into ptr_ring */ +static void pp_fill_ptr_ring(struct page_pool *pp, int elems) +{ + /* GFP_ATOMIC needed when under run softirq */ + gfp_t gfp_mask = GFP_ATOMIC; + struct page **array; + int i; + + array = kcalloc(elems, sizeof(struct page *), gfp_mask); + + for (i = 0; i < elems; i++) + array[i] = page_pool_alloc_pages(pp, gfp_mask); + for (i = 0; i < elems; i++) + _page_pool_put_page(pp, array[i], false); + + kfree(array); +} + +enum test_type { type_fast_path, type_ptr_ring, type_page_allocator }; + +/* Depends on compile optimizing this function */ +static int time_bench_page_pool(struct time_bench_record *rec, void *data, + enum test_type type, const char *func) +{ + uint64_t loops_cnt = 0; + gfp_t gfp_mask = GFP_ATOMIC; /* GFP_ATOMIC is not really needed */ + int i, err; + + struct 
page_pool *pp; + struct page *page; + + struct page_pool_params pp_params = { + .order = 0, + .flags = 0, + .pool_size = MY_POOL_SIZE, + .nid = NUMA_NO_NODE, + .dev = NULL, /* Only use for DMA mapping */ + .dma_dir = DMA_BIDIRECTIONAL, + }; + + pp = page_pool_create(&pp_params); + if (IS_ERR(pp)) { + err = PTR_ERR(pp); + pr_warn("%s: Error(%d) creating page_pool\n", func, err); + goto out; + } + pp_fill_ptr_ring(pp, 64); + + if (in_serving_softirq()) + pr_warn("%s(): in_serving_softirq fast-path\n", func); + else + pr_warn("%s(): Cannot use page_pool fast-path\n", func); + + time_bench_start(rec); + /** Loop to measure **/ + for (i = 0; i < rec->loops; i++) { + /* Common fast-path alloc that depend on in_serving_softirq() */ + page = page_pool_alloc_pages(pp, gfp_mask); + if (!page) + break; + loops_cnt++; + barrier(); /* avoid compiler to optimize this loop */ + + /* The benchmarks purpose it to test different return paths. + * Compiler should inline optimize other function calls out + */ + if (type == type_fast_path) { + /* Fast-path recycling e.g. 
XDP_DROP use-case */ + page_pool_recycle_direct(pp, page); + + } else if (type == type_ptr_ring) { + /* Normal return path */ + _page_pool_put_page(pp, page, false); + + } else if (type == type_page_allocator) { + /* Test if not pages are recycled, but instead + * returned back into systems page allocator + */ + get_page(page); /* cause no-recycling */ + _page_pool_put_page(pp, page, false); + put_page(page); + } else { + BUILD_BUG(); + } + } + time_bench_stop(rec, loops_cnt); +out: + page_pool_destroy(pp); + return loops_cnt; +} + +static int time_bench_page_pool01_fast_path(struct time_bench_record *rec, + void *data) +{ + return time_bench_page_pool(rec, data, type_fast_path, __func__); +} + +static int time_bench_page_pool02_ptr_ring(struct time_bench_record *rec, + void *data) +{ + return time_bench_page_pool(rec, data, type_ptr_ring, __func__); +} + +static int time_bench_page_pool03_slow(struct time_bench_record *rec, + void *data) +{ + return time_bench_page_pool(rec, data, type_page_allocator, __func__); +} + +static int run_benchmark_tests(void) +{ + uint32_t nr_loops = loops; + + /* Baseline tests */ + if (enabled(bit_run_bench_baseline)) { + time_bench_loop(nr_loops * 10, 0, "for_loop", NULL, + time_bench_for_loop); + time_bench_loop(nr_loops * 10, 0, "atomic_inc", NULL, + time_bench_atomic_inc); + time_bench_loop(nr_loops, 0, "lock", NULL, time_bench_lock); + } + + /* This test cannot activate correct code path, due to no-softirq ctx */ + if (enabled(bit_run_bench_no_softirq01)) + time_bench_loop(nr_loops, 0, "no-softirq-page_pool01", NULL, + time_bench_page_pool01_fast_path); + if (enabled(bit_run_bench_no_softirq02)) + time_bench_loop(nr_loops, 0, "no-softirq-page_pool02", NULL, + time_bench_page_pool02_ptr_ring); + if (enabled(bit_run_bench_no_softirq03)) + time_bench_loop(nr_loops, 0, "no-softirq-page_pool03", NULL, + time_bench_page_pool03_slow); + + return 0; +} + +static int __init bench_page_pool_simple_module_init(void) +{ + if (verbose) + 
pr_info("Loaded\n"); + + if (loops > U32_MAX) { + pr_err("Module param loops(%lu) exceeded U32_MAX(%u)\n", loops, + U32_MAX); + return -ECHRNG; + } + + run_benchmark_tests(); + + return 0; +} +module_init(bench_page_pool_simple_module_init); + +static void __exit bench_page_pool_simple_module_exit(void) +{ + if (verbose) + pr_info("Unloaded\n"); +} +module_exit(bench_page_pool_simple_module_exit); + +MODULE_DESCRIPTION("Benchmark of page_pool simple cases"); +MODULE_AUTHOR("Jesper Dangaard Brouer <netoptimizer(a)brouer.com>"); +MODULE_LICENSE("GPL"); diff --git a/tools/testing/selftests/net/bench/page_pool/time_bench.c b/tools/testing/selftests/net/bench/page_pool/time_bench.c new file mode 100644 index 000000000000..073bb36ec5f2 --- /dev/null +++ b/tools/testing/selftests/net/bench/page_pool/time_bench.c @@ -0,0 +1,394 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Benchmarking code execution time inside the kernel + * + * Copyright (C) 2014, Red Hat, Inc., Jesper Dangaard Brouer + */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/module.h> +#include <linux/time.h> + +#include <linux/perf_event.h> /* perf_event_create_kernel_counter() */ + +/* For concurrency testing */ +#include <linux/completion.h> +#include <linux/sched.h> +#include <linux/workqueue.h> +#include <linux/kthread.h> + +#include "time_bench.h" + +static int verbose = 1; + +/** TSC (Time-Stamp Counter) based ** + * See: linux/time_bench.h + * tsc_start_clock() and tsc_stop_clock() + */ + +/** Wall-clock based ** + */ + +/** PMU (Performance Monitor Unit) based ** + */ +#define PERF_FORMAT \ + (PERF_FORMAT_GROUP | PERF_FORMAT_ID | PERF_FORMAT_TOTAL_TIME_ENABLED | \ + PERF_FORMAT_TOTAL_TIME_RUNNING) + +struct raw_perf_event { + uint64_t config; /* event */ + uint64_t config1; /* umask */ + struct perf_event *save; + char *desc; +}; + +/* if HT is enable a maximum of 4 events (5 if one is instructions + * retired can be specified, if HT is disabled a maximum of 8 (9 if + * one is 
instructions retired) can be specified. + * + * From Table 19-1. Architectural Performance Events + * Architectures Software Developer’s Manual Volume 3: System Programming + * Guide + */ +struct raw_perf_event perf_events[] = { + { 0x3c, 0x00, NULL, "Unhalted CPU Cycles" }, + { 0xc0, 0x00, NULL, "Instruction Retired" } +}; + +#define NUM_EVTS (ARRAY_SIZE(perf_events)) + +/* WARNING: PMU config is currently broken! + */ +bool time_bench_PMU_config(bool enable) +{ + int i; + struct perf_event_attr perf_conf; + struct perf_event *perf_event; + int cpu; + + preempt_disable(); + cpu = smp_processor_id(); + pr_info("DEBUG: cpu:%d\n", cpu); + preempt_enable(); + + memset(&perf_conf, 0, sizeof(struct perf_event_attr)); + perf_conf.type = PERF_TYPE_RAW; + perf_conf.size = sizeof(struct perf_event_attr); + perf_conf.read_format = PERF_FORMAT; + perf_conf.pinned = 1; + perf_conf.exclude_user = 1; /* No userspace events */ + perf_conf.exclude_kernel = 0; /* Only kernel events */ + + for (i = 0; i < NUM_EVTS; i++) { + perf_conf.disabled = enable; + //perf_conf.disabled = (i == 0) ? 
1 : 0; + perf_conf.config = perf_events[i].config; + perf_conf.config1 = perf_events[i].config1; + if (verbose) + pr_info("%s() enable PMU counter: %s\n", + __func__, perf_events[i].desc); + perf_event = perf_event_create_kernel_counter(&perf_conf, cpu, + NULL /* task */, + NULL /* overflow_handler*/, + NULL /* context */); + if (perf_event) { + perf_events[i].save = perf_event; + pr_info("%s():DEBUG perf_event success\n", __func__); + + perf_event_enable(perf_event); + } else { + pr_info("%s():DEBUG perf_event is NULL\n", __func__); + } + } + + return true; +} + +/** Generic functions ** + */ + +/* Calculate stats, store results in record */ +bool time_bench_calc_stats(struct time_bench_record *rec) +{ +#define NANOSEC_PER_SEC 1000000000 /* 10^9 */ + uint64_t ns_per_call_tmp_rem = 0; + uint32_t ns_per_call_remainder = 0; + uint64_t pmc_ipc_tmp_rem = 0; + uint32_t pmc_ipc_remainder = 0; + uint32_t pmc_ipc_div = 0; + uint32_t invoked_cnt_precision = 0; + uint32_t invoked_cnt = 0; /* 32-bit due to div_u64_rem() */ + + if (rec->flags & TIME_BENCH_LOOP) { + if (rec->invoked_cnt < 1000) { + pr_err("ERR: need more(>1000) loops(%llu) for timing\n", + rec->invoked_cnt); + return false; + } + if (rec->invoked_cnt > ((1ULL << 32) - 1)) { + /* div_u64_rem() can only support div with 32bit*/ + pr_err("ERR: Invoke cnt(%llu) too big overflow 32bit\n", + rec->invoked_cnt); + return false; + } + invoked_cnt = (uint32_t)rec->invoked_cnt; + } + + /* TSC (Time-Stamp Counter) records */ + if (rec->flags & TIME_BENCH_TSC) { + rec->tsc_interval = rec->tsc_stop - rec->tsc_start; + if (rec->tsc_interval == 0) { + pr_err("ABORT: timing took ZERO TSC time\n"); + return false; + } + /* Calculate stats */ + if (rec->flags & TIME_BENCH_LOOP) + rec->tsc_cycles = rec->tsc_interval / invoked_cnt; + else + rec->tsc_cycles = rec->tsc_interval; + } + + /* Wall-clock time calc */ + if (rec->flags & TIME_BENCH_WALLCLOCK) { + rec->time_start = rec->ts_start.tv_nsec + + (NANOSEC_PER_SEC * 
rec->ts_start.tv_sec); + rec->time_stop = rec->ts_stop.tv_nsec + + (NANOSEC_PER_SEC * rec->ts_stop.tv_sec); + rec->time_interval = rec->time_stop - rec->time_start; + if (rec->time_interval == 0) { + pr_err("ABORT: timing took ZERO wallclock time\n"); + return false; + } + /* Calculate stats */ + /*** Division in kernel it tricky ***/ + /* Orig: time_sec = (time_interval / NANOSEC_PER_SEC); */ + /* remainder only correct because NANOSEC_PER_SEC is 10^9 */ + rec->time_sec = div_u64_rem(rec->time_interval, NANOSEC_PER_SEC, + &rec->time_sec_remainder); + //TODO: use existing struct timespec records instead of div? + + if (rec->flags & TIME_BENCH_LOOP) { + /*** Division in kernel it tricky ***/ + /* Orig: ns = ((double)time_interval / invoked_cnt); */ + /* First get quotient */ + rec->ns_per_call_quotient = + div_u64_rem(rec->time_interval, invoked_cnt, + &ns_per_call_remainder); + /* Now get decimals .xxx precision (incorrect roundup)*/ + ns_per_call_tmp_rem = ns_per_call_remainder; + invoked_cnt_precision = invoked_cnt / 1000; + if (invoked_cnt_precision > 0) { + rec->ns_per_call_decimal = + div_u64_rem(ns_per_call_tmp_rem, + invoked_cnt_precision, + &ns_per_call_remainder); + } + } + } + + /* Performance Monitor Unit (PMU) counters */ + if (rec->flags & TIME_BENCH_PMU) { + //FIXME: Overflow handling??? + rec->pmc_inst = rec->pmc_inst_stop - rec->pmc_inst_start; + rec->pmc_clk = rec->pmc_clk_stop - rec->pmc_clk_start; + + /* Calc Instruction Per Cycle (IPC) */ + /* First get quotient */ + rec->pmc_ipc_quotient = div_u64_rem(rec->pmc_inst, rec->pmc_clk, + &pmc_ipc_remainder); + /* Now get decimals .xxx precision (incorrect roundup)*/ + pmc_ipc_tmp_rem = pmc_ipc_remainder; + pmc_ipc_div = rec->pmc_clk / 1000; + if (pmc_ipc_div > 0) { + rec->pmc_ipc_decimal = div_u64_rem(pmc_ipc_tmp_rem, + pmc_ipc_div, + &pmc_ipc_remainder); + } + } + + return true; +} + +/* Generic function for invoking a loop function and calculating + * execution time stats. 
The function being called/timed is assumed + * to perform a tight loop, and update the timing record struct. + */ +bool time_bench_loop(uint32_t loops, int step, char *txt, void *data, + int (*func)(struct time_bench_record *record, void *data)) +{ + struct time_bench_record rec; + + /* Setup record */ + memset(&rec, 0, sizeof(rec)); /* zero func might not update all */ + rec.version_abi = 1; + rec.loops = loops; + rec.step = step; + rec.flags = (TIME_BENCH_LOOP | TIME_BENCH_TSC | TIME_BENCH_WALLCLOCK); + + /*** Loop function being timed ***/ + if (!func(&rec, data)) { + pr_err("ABORT: function being timed failed\n"); + return false; + } + + if (rec.invoked_cnt < loops) + pr_warn("WARNING: Invoke count(%llu) smaller than loops(%d)\n", + rec.invoked_cnt, loops); + + /* Calculate stats */ + time_bench_calc_stats(&rec); + + pr_info("Type:%s Per elem: %llu cycles(tsc) %llu.%03llu ns (step:%d) - (measurement period time:%llu.%09u sec time_interval:%llu) - (invoke count:%llu tsc_interval:%llu)\n", + txt, rec.tsc_cycles, rec.ns_per_call_quotient, + rec.ns_per_call_decimal, rec.step, rec.time_sec, + rec.time_sec_remainder, rec.time_interval, rec.invoked_cnt, + rec.tsc_interval); + if (rec.flags & TIME_BENCH_PMU) + pr_info("Type:%s PMU inst/clock%llu/%llu = %llu.%03llu IPC (inst per cycle)\n", + txt, rec.pmc_inst, rec.pmc_clk, rec.pmc_ipc_quotient, + rec.pmc_ipc_decimal); + return true; +} + +/* Function getting invoked by kthread */ +static int invoke_test_on_cpu_func(void *private) +{ + struct time_bench_cpu *cpu = private; + struct time_bench_sync *sync = cpu->sync; + cpumask_t newmask = CPU_MASK_NONE; + void *data = cpu->data; + + /* Restrict CPU */ + cpumask_set_cpu(cpu->rec.cpu, &newmask); + set_cpus_allowed_ptr(current, &newmask); + + /* Synchronize start of concurrency test */ + atomic_inc(&sync->nr_tests_running); + wait_for_completion(&sync->start_event); + + /* Start benchmark function */ + if (!cpu->bench_func(&cpu->rec, data)) { + pr_err("ERROR: function being 
timed failed on CPU:%d(%d)\n", + cpu->rec.cpu, smp_processor_id()); + } else { + if (verbose) + pr_info("SUCCESS: ran on CPU:%d(%d)\n", cpu->rec.cpu, + smp_processor_id()); + } + cpu->did_bench_run = true; + + /* End test */ + atomic_dec(&sync->nr_tests_running); + /* Wait for kthread_stop() telling us to stop */ + while (!kthread_should_stop()) { + set_current_state(TASK_INTERRUPTIBLE); + schedule(); + } + __set_current_state(TASK_RUNNING); + return 0; +} + +void time_bench_print_stats_cpumask(const char *desc, + struct time_bench_cpu *cpu_tasks, + const struct cpumask *mask) +{ + uint64_t average = 0; + int cpu; + int step = 0; + struct sum { + uint64_t tsc_cycles; + int records; + } sum = { 0 }; + + /* Get stats */ + for_each_cpu(cpu, mask) { + struct time_bench_cpu *c = &cpu_tasks[cpu]; + struct time_bench_record *rec = &c->rec; + + /* Calculate stats */ + time_bench_calc_stats(rec); + + pr_info("Type:%s CPU(%d) %llu cycles(tsc) %llu.%03llu ns (step:%d) - (measurement period time:%llu.%09u sec time_interval:%llu) - (invoke count:%llu tsc_interval:%llu)\n", + desc, cpu, rec->tsc_cycles, rec->ns_per_call_quotient, + rec->ns_per_call_decimal, rec->step, rec->time_sec, + rec->time_sec_remainder, rec->time_interval, + rec->invoked_cnt, rec->tsc_interval); + + /* Collect average */ + sum.records++; + sum.tsc_cycles += rec->tsc_cycles; + step = rec->step; + } + + if (sum.records) /* avoid div-by-zero */ + average = sum.tsc_cycles / sum.records; + pr_info("Sum Type:%s Average: %llu cycles(tsc) CPUs:%d step:%d\n", desc, + average, sum.records, step); +} + +void time_bench_run_concurrent(uint32_t loops, int step, void *data, + const struct cpumask *mask, /* Support masking outsome CPUs*/ + struct time_bench_sync *sync, + struct time_bench_cpu *cpu_tasks, + int (*func)(struct time_bench_record *record, void *data)) +{ + int cpu, running = 0; + + if (verbose) // DEBUG + pr_warn("%s() Started on CPU:%d\n", __func__, + smp_processor_id()); + + /* Reset sync conditions */ + 
atomic_set(&sync->nr_tests_running, 0); + init_completion(&sync->start_event); + + /* Spawn off jobs on all CPUs */ + for_each_cpu(cpu, mask) { + struct time_bench_cpu *c = &cpu_tasks[cpu]; + + running++; + c->sync = sync; /* Send sync variable along */ + c->data = data; /* Send opaque along */ + + /* Init benchmark record */ + memset(&c->rec, 0, sizeof(struct time_bench_record)); + c->rec.version_abi = 1; + c->rec.loops = loops; + c->rec.step = step; + c->rec.flags = (TIME_BENCH_LOOP | TIME_BENCH_TSC | + TIME_BENCH_WALLCLOCK); + c->rec.cpu = cpu; + c->bench_func = func; + c->task = kthread_run(invoke_test_on_cpu_func, c, + "time_bench%d", cpu); + if (IS_ERR(c->task)) { + pr_err("%s(): Failed to start test func\n", __func__); + return; /* Argh, what about cleanup?! */ + } + } + + /* Wait until all processes are running */ + while (atomic_read(&sync->nr_tests_running) < running) { + set_current_state(TASK_UNINTERRUPTIBLE); + schedule_timeout(10); + } + /* Kick off all CPU concurrently on completion event */ + complete_all(&sync->start_event); + + /* Wait for CPUs to finish */ + while (atomic_read(&sync->nr_tests_running)) { + set_current_state(TASK_UNINTERRUPTIBLE); + schedule_timeout(10); + } + + /* Stop the kthreads */ + for_each_cpu(cpu, mask) { + struct time_bench_cpu *c = &cpu_tasks[cpu]; + + kthread_stop(c->task); + } + + if (verbose) // DEBUG - happens often, finish on another CPU + pr_warn("%s() Finished on CPU:%d\n", __func__, + smp_processor_id()); +} diff --git a/tools/testing/selftests/net/bench/page_pool/time_bench.h b/tools/testing/selftests/net/bench/page_pool/time_bench.h new file mode 100644 index 000000000000..e113fcf341dc --- /dev/null +++ b/tools/testing/selftests/net/bench/page_pool/time_bench.h @@ -0,0 +1,238 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Benchmarking code execution time inside the kernel + * + * Copyright (C) 2014, Red Hat, Inc., Jesper Dangaard Brouer + * for licensing details see kernel-base/COPYING + */ +#ifndef 
_LINUX_TIME_BENCH_H +#define _LINUX_TIME_BENCH_H + +/* Main structure used for recording a benchmark run */ +struct time_bench_record { + uint32_t version_abi; + uint32_t loops; /* Requested loop invocations */ + uint32_t step; /* option for e.g. bulk invocations */ + + uint32_t flags; /* Measurements types enabled */ +#define TIME_BENCH_LOOP BIT(0) +#define TIME_BENCH_TSC BIT(1) +#define TIME_BENCH_WALLCLOCK BIT(2) +#define TIME_BENCH_PMU BIT(3) + + uint32_t cpu; /* Used when embedded in time_bench_cpu */ + + /* Records */ + uint64_t invoked_cnt; /* Returned actual invocations */ + uint64_t tsc_start; + uint64_t tsc_stop; + struct timespec64 ts_start; + struct timespec64 ts_stop; + /* PMU counters for instruction and cycles + * instructions counter including pipelined instructions + */ + uint64_t pmc_inst_start; + uint64_t pmc_inst_stop; + /* CPU unhalted clock counter */ + uint64_t pmc_clk_start; + uint64_t pmc_clk_stop; + + /* Result records */ + uint64_t tsc_interval; + uint64_t time_start, time_stop, time_interval; /* in nanosec */ + uint64_t pmc_inst, pmc_clk; + + /* Derived result records */ + uint64_t tsc_cycles; // +decimal? + uint64_t ns_per_call_quotient, ns_per_call_decimal; + uint64_t time_sec; + uint32_t time_sec_remainder; + uint64_t pmc_ipc_quotient, pmc_ipc_decimal; /* inst per cycle */ +}; + +/* For synchronizing parallel CPUs to run concurrently */ +struct time_bench_sync { + atomic_t nr_tests_running; + struct completion start_event; +}; + +/* Keep track of CPUs executing our bench function. 
+ * + * Embed a time_bench_record for storing info per cpu + */ +struct time_bench_cpu { + struct time_bench_record rec; + struct time_bench_sync *sync; /* back ptr */ + struct task_struct *task; + /* "data" opaque could have been placed in time_bench_sync, + * but to avoid any false sharing, place it per CPU + */ + void *data; + /* Support masking outsome CPUs, mark if it ran */ + bool did_bench_run; + /* int cpu; // note CPU stored in time_bench_record */ + int (*bench_func)(struct time_bench_record *record, void *data); +}; + +/* + * Below TSC assembler code is not compatible with other archs, and + * can also fail on guests if cpu-flags are not correct. + * + * The way TSC reading is used, many iterations, does not require as + * high accuracy as described below (in Intel Doc #324264). + * + * Considering changing to use get_cycles() (#include <asm/timex.h>). + */ + +/** TSC (Time-Stamp Counter) based ** + * Recommend reading, to understand details of reading TSC accurately: + * Intel Doc #324264, "How to Benchmark Code Execution Times on Intel" + * + * Consider getting exclusive ownership of CPU by using: + * unsigned long flags; + * preempt_disable(); + * raw_local_irq_save(flags); + * _your_code_ + * raw_local_irq_restore(flags); + * preempt_enable(); + * + * Clobbered registers: "%rax", "%rbx", "%rcx", "%rdx" + * RDTSC only change "%rax" and "%rdx" but + * CPUID clears the high 32-bits of all (rax/rbx/rcx/rdx) + */ +static __always_inline uint64_t tsc_start_clock(void) +{ + /* See: Intel Doc #324264 */ + unsigned int hi, lo; + + asm volatile("CPUID\n\t" + "RDTSC\n\t" + "mov %%edx, %0\n\t" + "mov %%eax, %1\n\t" + : "=r"(hi), "=r"(lo)::"%rax", "%rbx", "%rcx", "%rdx"); + //FIXME: on 32bit use clobbered %eax + %edx + return ((uint64_t)lo) | (((uint64_t)hi) << 32); +} + +static __always_inline uint64_t tsc_stop_clock(void) +{ + /* See: Intel Doc #324264 */ + unsigned int hi, lo; + + asm volatile("RDTSCP\n\t" + "mov %%edx, %0\n\t" + "mov %%eax, %1\n\t" + 
"CPUID\n\t" + : "=r"(hi), "=r"(lo)::"%rax", "%rbx", "%rcx", "%rdx"); + return ((uint64_t)lo) | (((uint64_t)hi) << 32); +} + +/** Wall-clock based ** + * + * use: getnstimeofday() + * getnstimeofday(&rec->ts_start); + * getnstimeofday(&rec->ts_stop); + * + * API changed see: Documentation/core-api/timekeeping.rst + * https://www.kernel.org/doc/html/latest/core-api/timekeeping.html#c.getnstim… + * + * We should instead use: ktime_get_real_ts64() is a direct + * replacement, but consider using monotonic time (ktime_get_ts64()) + * and/or a ktime_t based interface (ktime_get()/ktime_get_real()). + */ + +/** PMU (Performance Monitor Unit) based ** + * + * Needed for calculating: Instructions Per Cycle (IPC) + * - The IPC number tell how efficient the CPU pipelining were + */ +//lookup: perf_event_create_kernel_counter() + +bool time_bench_PMU_config(bool enable); + +/* Raw reading via rdpmc() using fixed counters + * + * From: https://github.com/andikleen/simple-pmu + */ +enum { + FIXED_SELECT = (1U << 30), /* == 0x40000000 */ + FIXED_INST_RETIRED_ANY = 0, + FIXED_CPU_CLK_UNHALTED_CORE = 1, + FIXED_CPU_CLK_UNHALTED_REF = 2, +}; + +static __always_inline unsigned int long long p_rdpmc(unsigned int in) +{ + unsigned int d, a; + + asm volatile("rdpmc" : "=d"(d), "=a"(a) : "c"(in) : "memory"); + return ((unsigned long long)d << 32) | a; +} + +/* These PMU counter needs to be enabled, but I don't have the + * configure code implemented. 
My current hack is running: + * sudo perf stat -e cycles:k -e instructions:k insmod lib/ring_queue_test.ko + */ +/* Reading all pipelined instruction */ +static __always_inline unsigned long long pmc_inst(void) +{ + return p_rdpmc(FIXED_SELECT | FIXED_INST_RETIRED_ANY); +} + +/* Reading CPU clock cycles */ +static __always_inline unsigned long long pmc_clk(void) +{ + return p_rdpmc(FIXED_SELECT | FIXED_CPU_CLK_UNHALTED_CORE); +} + +/* Raw reading via MSR rdmsr() is likely wrong + * FIXME: How can I know which raw MSR registers are conf for what? + */ +#define MSR_IA32_PCM0 0x400000C1 /* PERFCTR0 */ +#define MSR_IA32_PCM1 0x400000C2 /* PERFCTR1 */ +#define MSR_IA32_PCM2 0x400000C3 +static inline uint64_t msr_inst(unsigned long long *msr_result) +{ + return rdmsrq_safe(MSR_IA32_PCM0, msr_result); +} + +/** Generic functions ** + */ +bool time_bench_loop(uint32_t loops, int step, char *txt, void *data, + int (*func)(struct time_bench_record *rec, void *data)); +bool time_bench_calc_stats(struct time_bench_record *rec); + +void time_bench_run_concurrent(uint32_t loops, int step, void *data, + const struct cpumask *mask, /* Support masking outsome CPUs*/ + struct time_bench_sync *sync, struct time_bench_cpu *cpu_tasks, + int (*func)(struct time_bench_record *record, void *data)); +void time_bench_print_stats_cpumask(const char *desc, + struct time_bench_cpu *cpu_tasks, + const struct cpumask *mask); + +//FIXME: use rec->flags to select measurement, should be MACRO +static __always_inline void time_bench_start(struct time_bench_record *rec) +{ + //getnstimeofday(&rec->ts_start); + ktime_get_real_ts64(&rec->ts_start); + if (rec->flags & TIME_BENCH_PMU) { + rec->pmc_inst_start = pmc_inst(); + rec->pmc_clk_start = pmc_clk(); + } + rec->tsc_start = tsc_start_clock(); +} + +static __always_inline void time_bench_stop(struct time_bench_record *rec, + uint64_t invoked_cnt) +{ + rec->tsc_stop = tsc_stop_clock(); + if (rec->flags & TIME_BENCH_PMU) { + rec->pmc_inst_stop = 
pmc_inst(); + rec->pmc_clk_stop = pmc_clk(); + } + //getnstimeofday(&rec->ts_stop); + ktime_get_real_ts64(&rec->ts_stop); + rec->invoked_cnt = invoked_cnt; +} + +#endif /* _LINUX_TIME_BENCH_H */ diff --git a/tools/testing/selftests/net/bench/test_bench_page_pool.sh b/tools/testing/selftests/net/bench/test_bench_page_pool.sh new file mode 100755 index 000000000000..7b8b18cfedce --- /dev/null +++ b/tools/testing/selftests/net/bench/test_bench_page_pool.sh @@ -0,0 +1,32 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# + +set -e + +DRIVER="./page_pool/bench_page_pool.ko" +result="" + +function run_test() +{ + rmmod "bench_page_pool.ko" || true + insmod $DRIVER > /dev/null 2>&1 + result=$(dmesg | tail -10) + echo "$result" + + echo + echo "Fast path results:" + echo "${result}" | grep -o -E "no-softirq-page_pool01 Per elem: ([0-9]+) cycles\(tsc\) ([0-9]+\.[0-9]+) ns" + + echo + echo "ptr_ring results:" + echo "${result}" | grep -o -E "no-softirq-page_pool02 Per elem: ([0-9]+) cycles\(tsc\) ([0-9]+\.[0-9]+) ns" + + echo + echo "slow path results:" + echo "${result}" | grep -o -E "no-softirq-page_pool03 Per elem: ([0-9]+) cycles\(tsc\) ([0-9]+\.[0-9]+) ns" +} + +run_test + +exit 0 base-commit: afc783fa0aab9cc093fbb04871bfda406480cf8d -- 2.50.0.rc2.701.gf1e915cc24-goog
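The three grep calls in run_test() all extract the same two numbers (cycles and nanoseconds per element) from a dmesg line. As a rough sketch only, assuming a result line has the shape implied by those patterns, the same extraction could be done in Python. The helper name parse_bench_lines and the sample log lines are made up for illustration; they are not part of the patch and not output from a real run.

```python
import re

# Mirrors the grep -E expressions in test_bench_page_pool.sh.
# The exact dmesg line shape is an assumption inferred from those patterns.
RESULT_RE = re.compile(
    r"(no-softirq-page_pool0[123]) Per elem: ([0-9]+) cycles\(tsc\) ([0-9]+\.[0-9]+) ns"
)


def parse_bench_lines(log_text):
    """Return {test_name: (cycles, nanoseconds)} for every matching line."""
    results = {}
    for match in RESULT_RE.finditer(log_text):
        name, cycles, ns = match.groups()
        results[name] = (int(cycles), float(ns))
    return results


# Hypothetical sample text, not real benchmark output.
SAMPLE_LOG = """\
[ 1.0] time_bench: Type:no-softirq-page_pool01 Per elem: 11 cycles(tsc) 4.368 ns (step:0)
[ 1.1] time_bench: Type:no-softirq-page_pool02 Per elem: 527 cycles(tsc) 195.187 ns (step:0)
"""

if __name__ == "__main__":
    for name, (cycles, ns) in parse_bench_lines(SAMPLE_LOG).items():
        print(f"{name}: {cycles} cycles, {ns} ns")
```

Returning a dictionary keyed by test name makes it easy for a caller to notice a missing result (for example page_pool03 in the sample above) instead of silently printing nothing.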

1 week


[PATCH rc v2 0/4] Fix iommufd selftest FAIL and warnings with v6.16

by Nicolin Chen

A few selftest harness changes are being merged into v6.16, which exposed some bugs and vulnerabilities in the iommufd selftest code. Fix them properly. Note that the patch fixing the build warnings at mfd is not ideal, as it has possibly hit some corner case in gcc: https://lore.kernel.org/all/aEi8DV+ReF3v3Rlf@nvidia.com/ This is on github: https://github.com/nicolinc/iommufd/commits/iommufd_selftest_fixes-v6.16 Changelog: v2 * Add "Reviewed-by" from Jason * Only use kfree() in the teardown() * Add an mmap_buffer_size for readability v1 https://lore.kernel.org/all/cover.1750049883.git.nicolinc@nvidia.com/ Thanks Nicolin Nicolin Chen (4): iommufd/selftest: Fix iommufd_dirty_tracking with large hugepage sizes iommufd/selftest: Add missing close(mfd) in memfd_mmap() iommufd/selftest: Add asserts testing global mfd iommufd/selftest: Fix build warnings due to uninitialized mfd tools/testing/selftests/iommu/iommufd_utils.h | 9 ++++- tools/testing/selftests/iommu/iommufd.c | 40 ++++++++++++++----- 2 files changed, 36 insertions(+), 13 deletions(-) -- 2.43.0

1 week


[PATCH 0/2] rust: replace `allow(...)` with `expect(...)`

by Onur Özkan

Replaces various `#[allow(...)]` with `#[expect(...)]` as suggested in the kernel coding guidelines: [link] [link]: https://docs.kernel.org/rust/coding-guidelines.html#lints After switching to `#[expect(...)]`, I found some dead linting rules that are no longer needed which are removed in the second patch. Onur Özkan (1): replace `#[allow(...)]` with `#[expect(...)]` onur-ozkan (1): rust: drop unnecessary lints caught by `#[expect(...)]` drivers/gpu/nova-core/regs.rs | 2 +- rust/compiler_builtins.rs | 2 +- rust/kernel/alloc/allocator_test.rs | 2 +- rust/kernel/cpufreq.rs | 1 - rust/kernel/devres.rs | 2 +- rust/kernel/driver.rs | 2 +- rust/kernel/drm/ioctl.rs | 8 ++++---- rust/kernel/error.rs | 3 +-- rust/kernel/init.rs | 6 +++--- rust/kernel/kunit.rs | 2 +- rust/kernel/opp.rs | 4 ++-- rust/kernel/types.rs | 2 +- rust/macros/helpers.rs | 2 +- 13 files changed, 18 insertions(+), 20 deletions(-) -- 2.50.0

1 week


[PATCH 0/6] selftests/mm: Fix false positives and skip unsupported tests

by Aboorva Devarajan

This patch series fixes some of the false positives in generic mm selftests and skips tests that cannot run correctly due to missing features or system limitations. Please let us know if you have any feedback. Thanks, Aboorva Aboorva Devarajan (2): selftests/mm: Fix child process exit codes in KSM tests selftests/mm: Mark thuge-gen as skipped if shmmax is too small or no 1G pages Donet Tom (4): mm/selftests: Fix virtual_address_range test issues. selftest/mm: Fix ksm_funtional_test failures selftests/mm : fix test_prctl_fork_exec failure mm/selftests: Fix split_huge_page_test failure on systems with 64KB page size .../selftests/mm/ksm_functional_tests.c | 24 +++++++++++++------ .../selftests/mm/split_huge_page_test.c | 23 ++++++++++++++---- tools/testing/selftests/mm/thuge-gen.c | 11 +++++---- .../selftests/mm/virtual_address_range.c | 14 +++-------- 4 files changed, 45 insertions(+), 27 deletions(-) -- 2.43.5

1 week


[PATCH] selftests/mm: remove duplicate .gitignore entries

by Moon Hee Lee

Remove redundant entries in .gitignore confirmed by: $ sort tools/testing/selftests/mm/.gitignore | uniq -d hugetlb_dio pkey_sighandler_tests_32 pkey_sighandler_tests_64 These entries were originally added by [1], and later duplicated by [2]. [1] https://lore.kernel.org/all/20240924185911.117937-1-lorenzo.stoakes@oracle.… [2] https://lore.kernel.org/all/20241125064036.413536-1-lizhijian@fujitsu.com/ Signed-off-by: Moon Hee Lee <moonhee.lee.ca(a)gmail.com> --- tools/testing/selftests/mm/.gitignore | 3 --- 1 file changed, 3 deletions(-) diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore index 824266982aa3..f2dafa0b700b 100644 --- a/tools/testing/selftests/mm/.gitignore +++ b/tools/testing/selftests/mm/.gitignore @@ -38,9 +38,6 @@ map_fixed_noreplace write_to_hugetlbfs hmm-tests memfd_secret -hugetlb_dio -pkey_sighandler_tests_32 -pkey_sighandler_tests_64 soft-dirty split_huge_page_test ksm_tests -- 2.43.0
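The `sort ... | uniq -d` check quoted in the commit message can be reproduced without a shell. The following is a minimal sketch of the same idea in Python; the function name duplicate_entries and the sample list are illustrative assumptions, and Python's default string ordering may differ from a locale-aware sort(1).

```python
from collections import Counter


def duplicate_entries(lines):
    """Return entries that occur more than once, similar to `sort | uniq -d`."""
    counts = Counter(line.rstrip("\n") for line in lines)
    return sorted(entry for entry, n in counts.items() if n > 1)


# Hypothetical excerpt of a .gitignore with the three duplicated names
# mentioned in the commit message.
SAMPLE = [
    "hugetlb_dio",
    "memfd_secret",
    "pkey_sighandler_tests_32",
    "pkey_sighandler_tests_64",
    "hugetlb_dio",
    "pkey_sighandler_tests_32",
    "pkey_sighandler_tests_64",
    "soft-dirty",
]

if __name__ == "__main__":
    for entry in duplicate_entries(SAMPLE):
        print(entry)
```

To check a real file, pass an open file object, for example `duplicate_entries(open(path))`, where `path` is whichever .gitignore is being audited.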

1 week, 1 day


[PATCH 0/3] tools/nolibc: add support for SuperH

by Thomas Weißschuh

Add support for SuperH/"sh" to nolibc. Only sh4 is tested for now. This is only tested on QEMU so far. Additional testing would be very welcome. Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net> --- Thomas Weißschuh (3): selftests/nolibc: fix EXTRACONFIG variables ordering selftests/nolibc: use file driver for QEMU serial tools/nolibc: add support for SuperH tools/include/nolibc/arch-sh.h | 162 ++++++++++++++++++++++++++++ tools/include/nolibc/arch.h | 2 + tools/testing/selftests/nolibc/Makefile | 15 ++- tools/testing/selftests/nolibc/run-tests.sh | 3 +- 4 files changed, 177 insertions(+), 5 deletions(-) --- base-commit: 6275a61db2f0586b8a5d651dfc7b4aacf9d0b2d6 change-id: 20250528-nolibc-sh-8b4e3bb8efcb Best regards, -- Thomas Weißschuh <linux(a)weissschuh.net>

1 week, 1 day


[PATCH v5 0/7] use per-vma locks for /proc/pid/maps reads and PROCMAP_QUERY

by Suren Baghdasaryan

Reading /proc/pid/maps requires read-locking mmap_lock which prevents any other task from concurrently modifying the address space. This guarantees coherent reporting of virtual address ranges, however it can block important updates from happening. Oftentimes /proc/pid/maps readers are low priority monitoring tasks and them blocking high priority tasks results in priority inversion. Locking the entire address space is required to present a fully coherent picture of the address space, however even the current implementation does not strictly guarantee that by outputting vmas in page-size chunks and dropping mmap_lock in between each chunk. Address space modifications are possible while mmap_lock is dropped and userspace reading the content is expected to deal with possible concurrent address space modifications. Considering these relaxed rules, holding mmap_lock is not strictly needed as long as we can guarantee that a concurrently modified vma is reported either in its original form or after it was modified. This patchset switches from holding mmap_lock while reading /proc/pid/maps to taking per-vma locks as we walk the vma tree. This reduces the contention with tasks modifying the address space because they would have to contend for the same vma as opposed to the entire address space. The same is done for PROCMAP_QUERY ioctl which locks only the vma that fell into the requested range instead of the entire address space. The previous version of this patchset [1] tried to perform /proc/pid/maps reading under RCU, however its implementation is quite complex and the results are worse than the new version because it still relied on mmap_lock speculation which retries if any part of the address space gets modified. The new implementation is both simpler and results in less contention. Note that a similar approach would not work for /proc/pid/smaps reading as it also walks the page table and that's not RCU-safe. 
Paul McKenney designed a test [2] to measure mmap/munmap latencies while concurrently reading /proc/pid/maps. The test has a pair of processes scanning /proc/PID/maps, and another process unmapping and remapping 4K pages from a 128MB range of anonymous memory. At the end of each 10 second run, the latency of each mmap() or munmap() operation is measured, and for each run the maximum and mean latency is printed. The map/unmap process is started first, its PID is passed to the scanners, and then the map/unmap process waits until both scanners are running before starting its timed test. The scanners keep scanning until the specified /proc/PID/maps file disappears. This test registered a close to 10x improvement in update latencies: Before the change: ./run-proc-vs-map.sh --nsamples 100 --rawdata -- --busyduration 2 0.011 0.008 0.455 0.011 0.008 0.472 0.011 0.008 0.535 0.011 0.009 0.545 ... 0.011 0.014 2.875 0.011 0.014 2.913 0.011 0.014 3.007 0.011 0.015 3.018 After the change: ./run-proc-vs-map.sh --nsamples 100 --rawdata -- --busyduration 2 0.006 0.005 0.036 0.006 0.005 0.039 0.006 0.005 0.039 0.006 0.005 0.039 ... 0.006 0.006 0.403 0.006 0.006 0.474 0.006 0.006 0.479 0.006 0.006 0.498 The patchset also adds a number of tests to check for /proc/pid/maps data coherency. They are designed to detect any unexpected data tearing while performing some common address space modifications (vma split, resize and remap). Even before these changes, reading /proc/pid/maps might have inconsistent data because the file is read page-by-page with mmap_lock being dropped between the pages. An example of user-visible inconsistency can be that the same vma is printed twice: once before it was modified and then after the modifications. For example, if a vma was extended, it might be found and reported twice. What is not expected is to see a gap where there should have been a vma both before and after modification. 
This patchset increases the chances of such tearing, therefore it's even more important now to test for unexpected inconsistencies. In [3] Lorenzo identified the following possible vma merging/splitting scenarios: Merges with changes to existing vmas: 1. Merge both - mapping a vma over another one and between two vmas which can be merged after this replacement; 2. Merge left full - mapping a vma at the end of an existing one and completely over its right neighbor; 3. Merge left partial - mapping a vma at the end of an existing one and partially over its right neighbor; 4. Merge right full - mapping a vma before the start of an existing one and completely over its left neighbor; 5. Merge right partial - mapping a vma before the start of an existing one and partially over its left neighbor; Merges without changes to existing vmas: 6. Merge both - mapping a vma into a gap between two vmas which can be merged after the insertion; 7. Merge left - mapping a vma at the end of an existing one; 8. Merge right - mapping a vma before the start of an existing one; Splits 9. Split with new vma at the lower address; 10. 
Split with new vma at the higher address; If such merges or splits happen concurrently with the /proc/maps reading we might report a vma twice, once before the modification and once after it is modified: Case 1 might report overwritten and previous vma along with the final merged vma; Case 2 might report previous and the final merged vma; Case 3 might cause us to retry once we detect the temporary gap caused by shrinking of the right neighbor; Case 4 might report overwritten and the final merged vma; Case 5 might cause us to retry once we detect the temporary gap caused by shrinking of the left neighbor; Case 6 might report previous vma and the gap along with the final merged vma; Case 7 might report previous and the final merged vma; Case 8 might report the original gap and the final merged vma covering the gap; Case 9 might cause us to retry once we detect the temporary gap caused by shrinking of the original vma at the vma start; Case 10 might cause us to retry once we detect the temporary gap caused by shrinking of the original vma at the vma end; In all these cases the retry mechanism prevents us from reporting possible temporary gaps. 
Changes from v4 [4]: - refactored trylock_vma() and other locking parts into mmap_lock.c, per Lorenzo - renamed {lock|unlock}_content() into {lock|unlock}_vma_range(), per Lorenzo - added clarifying comments for sentinels, per Lorenzo - introduced is_sentinel_pos() helper function - fixed position reset logic when last_addr is a sentinel, per Lorenzo - added Acked-by to the last patch, per Andrii Nakryiko [1] https://lore.kernel.org/all/20250418174959.1431962-1-surenb@google.com/ [2] https://github.com/paulmckrcu/proc-mmap_sem-test [3] https://lore.kernel.org/all/e1863f40-39ab-4e5b-984a-c48765ffde1c@lucifer.lo… [4] https://lore.kernel.org/all/20250604231151.799834-1-surenb@google.com/ Suren Baghdasaryan (7): selftests/proc: add /proc/pid/maps tearing from vma split test selftests/proc: extend /proc/pid/maps tearing test to include vma resizing selftests/proc: extend /proc/pid/maps tearing test to include vma remapping selftests/proc: test PROCMAP_QUERY ioctl while vma is concurrently modified selftests/proc: add verbose more for tests to facilitate debugging mm/maps: read proc/pid/maps under per-vma lock mm/maps: execute PROCMAP_QUERY ioctl under per-vma locks fs/proc/internal.h | 5 + fs/proc/task_mmu.c | 179 ++++- include/linux/mmap_lock.h | 11 + mm/mmap_lock.c | 88 +++ tools/testing/selftests/proc/proc-pid-vm.c | 793 ++++++++++++++++++++- 5 files changed, 1053 insertions(+), 23 deletions(-) base-commit: 0b2a863368fb0cf674b40925c55dc8898c5a33af -- 2.50.0.714.g196bf9f422-goog
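The cover letter distinguishes two kinds of inconsistency a reader of /proc/pid/maps may see: a vma reported twice (tolerated) and a gap where a vma existed both before and after the modification (not expected). As an illustration only, and not the logic of the selftests added by this series, a userspace checker could classify the two cases roughly as follows. The function names and the sample text are hypothetical.

```python
def parse_maps(text):
    """Parse /proc/<pid>/maps-style text into a list of (start, end) integers."""
    ranges = []
    for line in text.splitlines():
        if not line.strip():
            continue
        span = line.split()[0]
        start, end = (int(part, 16) for part in span.split("-"))
        ranges.append((start, end))
    return ranges


def find_overlaps(ranges):
    """Adjacent entries that overlap, e.g. one vma reported before and after a resize."""
    return [(a, b) for a, b in zip(ranges, ranges[1:]) if b[0] < a[1]]


def find_gaps(ranges, lo, hi):
    """Holes inside [lo, hi) that no reported range covers."""
    gaps = []
    cursor = lo
    for start, end in sorted(ranges):
        if end <= cursor:
            continue
        if start >= hi:
            break
        if start > cursor:
            gaps.append((cursor, start))
        cursor = end
        if cursor >= hi:
            break
    if cursor < hi:
        gaps.append((cursor, hi))
    return gaps


# Hypothetical read: the first vma shows up twice (before and after being
# extended), and one page is missing before the last vma.
SAMPLE = """\
7f0000000000-7f0000001000 rw-p 00000000 00:00 0
7f0000000000-7f0000002000 rw-p 00000000 00:00 0
7f0000003000-7f0000004000 rw-p 00000000 00:00 0
"""

if __name__ == "__main__":
    vmas = parse_maps(SAMPLE)
    print("overlaps:", [(hex(a[0]), hex(b[0])) for a, b in find_overlaps(vmas)])
    print("gaps:", [(hex(s), hex(e)) for s, e in find_gaps(vmas, 0x7F0000000000, 0x7F0000004000)])
```

A test built this way would treat a non-empty overlap list as acceptable and fail only on a gap inside a range that is known to stay mapped throughout the run.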

1 week, 1 day


Re: [PATCH 1/6] mm/selftests: Fix virtual_address_range test issues.

by Donet Tom

On Tue, Jun 24, 2025 at 11:45:09AM +0530, Dev Jain wrote: > > On 23/06/25 11:02 pm, Donet Tom wrote: > > On Mon, Jun 23, 2025 at 10:23:02AM +0530, Dev Jain wrote: > > > On 21/06/25 11:25 pm, Donet Tom wrote: > > > > On Fri, Jun 20, 2025 at 08:15:25PM +0530, Dev Jain wrote: > > > > > On 19/06/25 1:53 pm, Donet Tom wrote: > > > > > > On Wed, Jun 18, 2025 at 08:13:54PM +0530, Dev Jain wrote: > > > > > > > On 18/06/25 8:05 pm, Lorenzo Stoakes wrote: > > > > > > > > On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote: > > > > > > > > > On 18/06/25 7:37 pm, Lorenzo Stoakes wrote: > > > > > > > > > > On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote: > > > > > > > > > > > On 18/06/25 5:27 pm, Lorenzo Stoakes wrote: > > > > > > > > > > > > On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote: > > > > > > > > > > > > Are you accounting for sys.max_map_count? If not, then you'll be hitting that > > > > > > > > > > > > first. > > > > > > > > > > > run_vmtests.sh will run the test in overcommit mode so that won't be an issue. > > > > > > > > > > Umm, what? You mean overcommit all mode, and that has no bearing on the max > > > > > > > > > > mapping count check. > > > > > > > > > > > > > > > > > > > > In do_mmap(): > > > > > > > > > > > > > > > > > > > > /* Too many mappings? */ > > > > > > > > > > if (mm->map_count > sysctl_max_map_count) > > > > > > > > > > return -ENOMEM; > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > As well as numerous other checks in mm/vma.c. > > > > > > > > > Ah sorry, didn't look at the code properly just assumed that overcommit_always meant overriding > > > > > > > > > this. > > > > > > > > No problem! It's hard to be aware of everything in mm :) > > > > > > > > > > > > > > > > > > I'm not sure why an overcommit toggle is even necessary when you could use > > > > > > > > > > MAP_NORESERVE or simply map PROT_NONE to avoid the OVERCOMMIT_GUESS limits? 
> > > > > > > > > > > > > > > > > > > > I'm pretty confused as to what this test is really achieving honestly. This > > > > > > > > > > isn't a useful way of asserting mmap() behaviour as far as I can tell. > > > > > > > > > Well, seems like a useful way to me at least : ) Not sure if you are in the mood > > > > > > > > > to discuss that but if you'd like me to explain from start to end what the test > > > > > > > > > is doing, I can do that : ) > > > > > > > > > > > > > > > > > I just don't have time right now, I guess I'll have to come back to it > > > > > > > > later... it's not the end of the world for it to be iffy in my view as long as > > > > > > > > it passes, but it might just not be of great value. > > > > > > > > > > > > > > > > Philosophically I'd rather we didn't assert internal implementation details like > > > > > > > > where we place mappings in userland memory. At no point do we promise to not > > > > > > > > leave larger gaps if we feel like it :) > > > > > > > You have a fair point. Anyhow a debate for another day. > > > > > > > > > > > > > > > I'm guessing, reading more, the _real_ test here is some mathematical assertion > > > > > > > > about layout from HIGH_ADDR_SHIFT -> end of address space when using hints. > > > > > > > > > > > > > > > > But again I'm not sure that achieves much and again also is asserting internal > > > > > > > > implementation details. > > > > > > > > > > > > > > > > Correct behaviour of this kind of thing probably better belongs to tests in the > > > > > > > > userland VMA testing I'd say. > > > > > > > > > > > > > > > > Sorry I don't mean to do down work you've done before, just giving an honest > > > > > > > > technical appraisal! > > > > > > > Nah, it will be rather hilarious to see it all go down the drain xD > > > > > > > > > > > > > > > Anyway don't let this block work to fix the test if it's failing. We can revisit > > > > > > > > this later. > > > > > > > Sure. 
@Aboorva and Donet, I still believe that the correct approach is to elide > > > > > > > the gap check at the crossing boundary. What do you think? > > > > > > > > > > > > > One problem I am seeing with this approach is that, since the hint address > > > > > > is generated randomly, the VMAs are also being created at randomly based on > > > > > > the hint address.So, for the VMAs created at high addresses, we cannot guarantee > > > > > > that the gaps between them will be aligned to MAP_CHUNK_SIZE. > > > > > > > > > > > > High address VMAs > > > > > > ----------------- > > > > > > 1000000000000-1000040000000 r--p 00000000 00:00 0 > > > > > > 2000000000000-2000040000000 r--p 00000000 00:00 0 > > > > > > 4000000000000-4000040000000 r--p 00000000 00:00 0 > > > > > > 8000000000000-8000040000000 r--p 00000000 00:00 0 > > > > > > e80009d260000-fffff9d260000 r--p 00000000 00:00 0 > > > > > > > > > > > > I have a different approach to solve this issue. > > > > > It is really weird that such a large amount of VA space > > > > > is left between the two VMAs yet mmap is failing. > > > > > > > > > > > > > > > > > > > > Can you please do the following: > > > > > set /proc/sys/vm/max_map_count to the highest value possible. > > > > > If running without run_vmtests.sh, set /proc/sys/vm/overcommit_memory to 1. > > > > > In validate_complete_va_space: > > > > > > > > > > if (start_addr >= HIGH_ADDR_MARK && found == false) { > > > > > found = true; > > > > > continue; > > > > > } > > > > Thanks Dev for the suggestion. I set max_map_count and set overcommit > > > > memory to 1, added this code change as well, and then tried. Still, the > > > > test is failing > > > > > > > > > where found is initialized to false. This will skip the check > > > > > for the boundary. > > > > > > > > > > After this can you tell whether the test is still failing. > > > > > > > > > > Also can you give me the complete output of proc/pid/maps > > > > > after putting a sleep at the end of the test. 
> > > > > > > > > on powerpc support DEFAULT_MAP_WINDOW is 128TB and with > > > > total address space size is 4PB With hint it can map upto > > > > 4PB. Since the hint addres is random in this test random hing VMAs > > > > are getting created. IIUC this is expected only. > > > > > > > > > > > > 10000000-10010000 r-xp 00000000 fd:05 134226638 /home/donet/linux/tools/testing/selftests/mm/virtual_address_range > > > > 10010000-10020000 r--p 00000000 fd:05 134226638 /home/donet/linux/tools/testing/selftests/mm/virtual_address_range > > > > 10020000-10030000 rw-p 00010000 fd:05 134226638 /home/donet/linux/tools/testing/selftests/mm/virtual_address_range > > > > 30000000-10030000000 r--p 00000000 00:00 0 [anon:virtual_address_range] > > > > 10030770000-100307a0000 rw-p 00000000 00:00 0 [heap] > > > > 1004f000000-7fff8f000000 r--p 00000000 00:00 0 [anon:virtual_address_range] > > > > 7fff8faf0000-7fff8fe00000 rw-p 00000000 00:00 0 > > > > 7fff8fe00000-7fff90030000 r-xp 00000000 fd:00 792355 /usr/lib64/libc.so.6 > > > > 7fff90030000-7fff90040000 r--p 00230000 fd:00 792355 /usr/lib64/libc.so.6 > > > > 7fff90040000-7fff90050000 rw-p 00240000 fd:00 792355 /usr/lib64/libc.so.6 > > > > 7fff90050000-7fff90130000 r-xp 00000000 fd:00 792358 /usr/lib64/libm.so.6 > > > > 7fff90130000-7fff90140000 r--p 000d0000 fd:00 792358 /usr/lib64/libm.so.6 > > > > 7fff90140000-7fff90150000 rw-p 000e0000 fd:00 792358 /usr/lib64/libm.so.6 > > > > 7fff90160000-7fff901a0000 r--p 00000000 00:00 0 [vvar] > > > > 7fff901a0000-7fff901b0000 r-xp 00000000 00:00 0 [vdso] > > > > 7fff901b0000-7fff90200000 r-xp 00000000 fd:00 792351 /usr/lib64/ld64.so.2 > > > > 7fff90200000-7fff90210000 r--p 00040000 fd:00 792351 /usr/lib64/ld64.so.2 > > > > 7fff90210000-7fff90220000 rw-p 00050000 fd:00 792351 /usr/lib64/ld64.so.2 > > > > 7fffc9770000-7fffc9880000 rw-p 00000000 00:00 0 [stack] > > > > 1000000000000-1000040000000 r--p 00000000 00:00 0 [anon:virtual_address_range] > > > > 2000000000000-2000040000000 r--p 
00000000 00:00 0 [anon:virtual_address_range] > > > > 4000000000000-4000040000000 r--p 00000000 00:00 0 [anon:virtual_address_range] > > > > 8000000000000-8000040000000 r--p 00000000 00:00 0 [anon:virtual_address_range] > > > > eb95410220000-fffff90220000 r--p 00000000 00:00 0 [anon:virtual_address_range] > > > > > > > > > > > > > > > > > > > > If I give the hint address serially from 128TB then the address > > > > space is contigous and gap is also MAP_SIZE, the test is passing. > > > > > > > > 10000000-10010000 r-xp 00000000 fd:05 134226638 /home/donet/linux/tools/testing/selftests/mm/virtual_address_range > > > > 10010000-10020000 r--p 00000000 fd:05 134226638 /home/donet/linux/tools/testing/selftests/mm/virtual_address_range > > > > 10020000-10030000 rw-p 00010000 fd:05 134226638 /home/donet/linux/tools/testing/selftests/mm/virtual_address_range > > > > 33000000-10033000000 r--p 00000000 00:00 0 [anon:virtual_address_range] > > > > 10033380000-100333b0000 rw-p 00000000 00:00 0 [heap] > > > > 1006f0f0000-10071000000 rw-p 00000000 00:00 0 > > > > 10071000000-7fffb1000000 r--p 00000000 00:00 0 [anon:virtual_address_range] > > > > 7fffb15d0000-7fffb1800000 r-xp 00000000 fd:00 792355 /usr/lib64/libc.so.6 > > > > 7fffb1800000-7fffb1810000 r--p 00230000 fd:00 792355 /usr/lib64/libc.so.6 > > > > 7fffb1810000-7fffb1820000 rw-p 00240000 fd:00 792355 /usr/lib64/libc.so.6 > > > > 7fffb1820000-7fffb1900000 r-xp 00000000 fd:00 792358 /usr/lib64/libm.so.6 > > > > 7fffb1900000-7fffb1910000 r--p 000d0000 fd:00 792358 /usr/lib64/libm.so.6 > > > > 7fffb1910000-7fffb1920000 rw-p 000e0000 fd:00 792358 /usr/lib64/libm.so.6 > > > > 7fffb1930000-7fffb1970000 r--p 00000000 00:00 0 [vvar] > > > > 7fffb1970000-7fffb1980000 r-xp 00000000 00:00 0 [vdso] > > > > 7fffb1980000-7fffb19d0000 r-xp 00000000 fd:00 792351 /usr/lib64/ld64.so.2 > > > > 7fffb19d0000-7fffb19e0000 r--p 00040000 fd:00 792351 /usr/lib64/ld64.so.2 > > > > 7fffb19e0000-7fffb19f0000 rw-p 00050000 fd:00 792351 
/usr/lib64/ld64.so.2 > > > > 7fffc5470000-7fffc5580000 rw-p 00000000 00:00 0 [stack] > > > > 800000000000-2aab000000000 r--p 00000000 00:00 0 [anon:virtual_address_range] > > > > > > > > > > > Thank you for this output. I can't wrap my head around why this behaviour changes > > > when you generate the hint sequentially. The mmap() syscall is supposed to do the > > > following (irrespective of high VA space or not) - if the allocation at the hint > > Yes, it is working as expected. On PowerPC, the DEFAULT_MAP_WINDOW is > > 128TB, and the system can map up to 4PB. > > > > In the test, the first mmap call maps memory up to 128TB without any > > hint, so the VMAs are created below the 128TB boundary. > > > > In the second mmap call, we provide a hint starting from 256TB, and > > the hint address is generated randomly above 256TB. The mappings are > > correctly created at these hint addresses. Since the hint addresses > > are random, the resulting VMAs are also created at random locations. > > > > So, what I tried is: mapping from 0 to 128TB without any hint, and > > then for the second mmap, instead of starting the hint from 256TB, I > > started from 128TB. Instead of using random hint addresses, I used > > sequential hint addresses from 128TB up to 512TB. With this change, > > the VMAs are created in order, and the test passes. > > > > 800000000000-2aab000000000 r--p 00000000 00:00 0 128TB to 512TB VMA > > > > I think we will see same behaviour on x86 with X86_FEATURE_LA57. > > > > I will send the updated patch in V2. > > Since you say it fails on both radix and hash, it means that the generic > code path is failing. I see that on my system, when I run the test with > LPA2 config, write() fails with errno set to -ENOMEM. Can you apply > the following diff and check whether the test fails still. Doing this > fixed it for arm64. 
> > diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c > > index b380e102b22f..3032902d01f2 100644 > > --- a/tools/testing/selftests/mm/virtual_address_range.c > > +++ b/tools/testing/selftests/mm/virtual_address_range.c > > @@ -173,10 +173,6 @@ static int validate_complete_va_space(void) > > */ > > hop = 0; > > while (start_addr + hop < end_addr) { > > - if (write(fd, (void *)(start_addr + hop), 1) != 1) > > - return 1; > > - lseek(fd, 0, SEEK_SET); > > - > > if (is_marked_vma(vma_name)) > > munmap((char *)(start_addr + hop), MAP_CHUNK_SIZE); > Even with this change, the test is still failing. In this case, we are allocating physical memory and writing into it, but our issue seems to be with the gap between VMAs, so I believe this might not be directly related. I will send the next revision where the test passes and no issues are observed. Just curious: with LPA2, is the second mmap() call successful? And are the VMAs being created at the hint address as expected? > > > > > addr succeeds, then all is well, otherwise, do a top-down search for a large > > > enough gap. I am not aware of the nuances in powerpc but I really am suspecting > > > a bug in powerpc mmap code. Can you try to do some tracing - which function > > > eventually fails to find the empty gap? > > > > > > Through my limited code tracing - we should end up in slice_find_area_topdown, > > > then we ask the generic code to find the gap using vm_unmapped_area. So I > > > suspect something is happening between this, probably slice_scan_available(). > > > > > > > > > From 0 to 128TB, we map memory directly without using any hint. For the range above > > > > > > 256TB up to 512TB, we perform the mapping using hint addresses. In the current test, > > > > > > we use random hint addresses, but I have modified it to generate hint addresses linearly > > > > > > starting from 128TB. 
> > > > > > > > > > > > With this change: > > > > > > > > > > > > The 0–128TB range is mapped without hints and verified accordingly. > > > > > > > > > > > > The 128TB–512TB range is mapped using linear hint addresses and then verified. > > > > > > > > > > > > Below are the VMAs obtained with this approach: > > > > > > > > > > > > 10000000-10010000 r-xp 00000000 fd:05 135019531 > > > > > > 10010000-10020000 r--p 00000000 fd:05 135019531 > > > > > > 10020000-10030000 rw-p 00010000 fd:05 135019531 > > > > > > 20000000-10020000000 r--p 00000000 00:00 0 > > > > > > 10020800000-10020830000 rw-p 00000000 00:00 0 > > > > > > 1004bcf0000-1004c000000 rw-p 00000000 00:00 0 > > > > > > 1004c000000-7fff8c000000 r--p 00000000 00:00 0 > > > > > > 7fff8c130000-7fff8c360000 r-xp 00000000 fd:00 792355 > > > > > > 7fff8c360000-7fff8c370000 r--p 00230000 fd:00 792355 > > > > > > 7fff8c370000-7fff8c380000 rw-p 00240000 fd:00 792355 > > > > > > 7fff8c380000-7fff8c460000 r-xp 00000000 fd:00 792358 > > > > > > 7fff8c460000-7fff8c470000 r--p 000d0000 fd:00 792358 > > > > > > 7fff8c470000-7fff8c480000 rw-p 000e0000 fd:00 792358 > > > > > > 7fff8c490000-7fff8c4d0000 r--p 00000000 00:00 0 > > > > > > 7fff8c4d0000-7fff8c4e0000 r-xp 00000000 00:00 0 > > > > > > 7fff8c4e0000-7fff8c530000 r-xp 00000000 fd:00 792351 > > > > > > 7fff8c530000-7fff8c540000 r--p 00040000 fd:00 792351 > > > > > > 7fff8c540000-7fff8c550000 rw-p 00050000 fd:00 792351 > > > > > > 7fff8d000000-7fffcd000000 r--p 00000000 00:00 0 > > > > > > 7fffe9c80000-7fffe9d90000 rw-p 00000000 00:00 0 > > > > > > 800000000000-2000000000000 r--p 00000000 00:00 0 -> High Address (128TB to 512TB) > > > > > > > > > > > > diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c > > > > > > index 4c4c35eac15e..0be008cba4b0 100644 > > > > > > --- a/tools/testing/selftests/mm/virtual_address_range.c > > > > > > +++ b/tools/testing/selftests/mm/virtual_address_range.c > > > > > > @@ 
-56,21 +56,21 @@ > > > > > > #ifdef __aarch64__ > > > > > > #define HIGH_ADDR_MARK ADDR_MARK_256TB > > > > > > -#define HIGH_ADDR_SHIFT 49 > > > > > > +#define HIGH_ADDR_SHIFT 48 > > > > > > #define NR_CHUNKS_LOW NR_CHUNKS_256TB > > > > > > #define NR_CHUNKS_HIGH NR_CHUNKS_3840TB > > > > > > #else > > > > > > #define HIGH_ADDR_MARK ADDR_MARK_128TB > > > > > > -#define HIGH_ADDR_SHIFT 48 > > > > > > +#define HIGH_ADDR_SHIFT 47 > > > > > > #define NR_CHUNKS_LOW NR_CHUNKS_128TB > > > > > > #define NR_CHUNKS_HIGH NR_CHUNKS_384TB > > > > > > #endif > > > > > > -static char *hint_addr(void) > > > > > > +static char *hint_addr(int hint) > > > > > > { > > > > > > - int bits = HIGH_ADDR_SHIFT + rand() % (63 - HIGH_ADDR_SHIFT); > > > > > > + unsigned long addr = ((1UL << HIGH_ADDR_SHIFT) + (hint * MAP_CHUNK_SIZE)); > > > > > > - return (char *) (1UL << bits); > > > > > > + return (char *) (addr); > > > > > > } > > > > > > static void validate_addr(char *ptr, int high_addr) > > > > > > @@ -217,7 +217,7 @@ int main(int argc, char *argv[]) > > > > > > } > > > > > > for (i = 0; i < NR_CHUNKS_HIGH; i++) { > > > > > > - hint = hint_addr(); > > > > > > + hint = hint_addr(i); > > > > > > hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ, > > > > > > MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); > > > > > > > > > > > > > > > > > > > > > > > > Can we fix it this way?
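The difference between the two hint_addr() variants in the quoted diff is easy to see by computing the addresses directly. The sketch below mirrors the two C expressions in Python; it is illustrative arithmetic only, and the 1 GiB MAP_CHUNK_SIZE is an assumption inferred from the 0x40000000-sized VMAs in the dumps above rather than taken from the test source.

```python
TB = 1 << 40
MAP_CHUNK_SIZE = 1 << 30  # assumed 1 GiB, inferred from the VMA sizes in the dumps


def random_hint_candidates(high_addr_shift):
    """Every value the original hint_addr() can return: 1 << bits, bits in [shift, 62]."""
    return [1 << bits for bits in range(high_addr_shift, 63)]


def sequential_hint(i, high_addr_shift):
    """Hint for chunk i in the proposed variant: base address plus i chunks."""
    return (1 << high_addr_shift) + i * MAP_CHUNK_SIZE


if __name__ == "__main__":
    # Original non-arm64 setting (HIGH_ADDR_SHIFT 48): sparse powers of two.
    print([hex(addr) for addr in random_hint_candidates(48)[:4]])
    # Proposed non-arm64 setting (HIGH_ADDR_SHIFT 47): contiguous chunks from 128TB.
    print([hex(sequential_hint(i, 47)) for i in range(3)])
```

With a shift of 48 the candidates are isolated powers of two, which matches the scattered 1 GiB mappings at 1000000000000, 2000000000000 and so on; with a shift of 47 and sequential hints, consecutive chunks abut, which is consistent with the single 800000000000-2000000000000 mapping reported after the change.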


[PATCH net 03/10] netlink: specs: ethtool: replace underscores with dashes in names

by Jakub Kicinski

We're trying to add a strict regexp for the name format in the spec.
Underscores will not be allowed, dashes should be used instead.
This makes no difference to C (codegen replaces special chars in names)
but gives more uniform naming in Python.

Fixes: 13e59344fb9d ("net: ethtool: add support for symmetric-xor RSS hash")
Fixes: 46fb3ba95b93 ("ethtool: Add an interface for flashing transceiver modules' firmware")
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
CC: andrew(a)lunn.ch
CC: donald.hunter(a)gmail.com
CC: shuah(a)kernel.org
CC: kory.maincent(a)bootlin.com
CC: sdf(a)fomichev.me
CC: gal(a)nvidia.com
CC: noren(a)nvidia.com
CC: ahmed.zaki(a)intel.com
CC: wojciech.drewek(a)intel.com
CC: petrm(a)nvidia.com
CC: danieller(a)nvidia.com
CC: linux-kselftest(a)vger.kernel.org
---
 Documentation/netlink/specs/ethtool.yaml                 | 6 +++---
 tools/testing/selftests/drivers/net/hw/rss_input_xfrm.py | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml
index 72a076b0e1b5..348c6ad548f5 100644
--- a/Documentation/netlink/specs/ethtool.yaml
+++ b/Documentation/netlink/specs/ethtool.yaml
@@ -48,7 +48,7 @@ c-version-name: ethtool-genl-version
         name: started
         doc: The firmware flashing process has started.
       -
-        name: in_progress
+        name: in-progress
         doc: The firmware flashing process is in progress.
       -
         name: completed
@@ -1422,7 +1422,7 @@ c-version-name: ethtool-genl-version
         name: hkey
         type: binary
       -
-        name: input_xfrm
+        name: input-xfrm
         type: u32
       -
         name: start-context
@@ -2238,7 +2238,7 @@ c-version-name: ethtool-genl-version
             - hfunc
             - indir
             - hkey
-            - input_xfrm
+            - input-xfrm
       dump:
         request:
           attributes:
diff --git a/tools/testing/selftests/drivers/net/hw/rss_input_xfrm.py b/tools/testing/selftests/drivers/net/hw/rss_input_xfrm.py
index f439c434ba36..648ff50bc1c3 100755
--- a/tools/testing/selftests/drivers/net/hw/rss_input_xfrm.py
+++ b/tools/testing/selftests/drivers/net/hw/rss_input_xfrm.py
@@ -38,7 +38,7 @@ from lib.py import rand_port
         raise KsftSkipEx("socket.SO_INCOMING_CPU was added in Python 3.11")

     input_xfrm = cfg.ethnl.rss_get(
-        {'header': {'dev-name': cfg.ifname}}).get('input_xfrm')
+        {'header': {'dev-name': cfg.ifname}}).get('input-xfrm')

     # Check for symmetric xor/or-xor
     if not input_xfrm or (input_xfrm != 1 and input_xfrm != 2):
--
2.49.0
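
To illustrate the naming rule described in the commit message, here is a minimal sketch of such a check. The pattern is an assumption (lowercase letters, digits and dashes only); the patch does not quote the actual regexp. The helpers `is_valid_name`, `to_dashes` and `to_c_identifier` are hypothetical names for illustration, not functions from the YNL tooling.

```python
import re

# ASSUMED pattern: lowercase letters, digits and dashes. The regexp that the
# spec schema will actually enforce is not shown in the patch.
NAME_RE = re.compile(r"[0-9a-z-]+")


def is_valid_name(name: str) -> bool:
    """Return True if the spec name contains only the assumed allowed characters."""
    return NAME_RE.fullmatch(name) is not None


def to_dashes(name: str) -> str:
    """Rename in the way the patch does by hand: underscores become dashes."""
    return name.replace("_", "-")


def to_c_identifier(name: str) -> str:
    """Hypothetical stand-in for codegen replacing special chars for C."""
    return name.replace("-", "_")


for old in ("in_progress", "input_xfrm"):
    new = to_dashes(old)
    # The old spelling fails the check, the new one passes, and the C-side
    # identifier comes out the same either way.
    print(old, is_valid_name(old), new, is_valid_name(new), to_c_identifier(new))
```

Under these assumptions the C identifier is unchanged by the rename, while a Python caller has to look up 'input-xfrm' instead of 'input_xfrm', as the selftest hunk does.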

