October 2024 - Linux-kselftest-mirror

kselftest/fixes kselftest-lib: 2 runs, 1 regressions (linux_kselftest-fixes-6.12-rc3)

by kernelci.org bot

Regressions Summary
-------------------

platform                     | arch  | lab         | compiler | defconfig           | regressions
-----------------------------+-------+-------------+----------+---------------------+------------
meson-gxl-s905x-libretech-cc | arm64 | lab-broonie | gcc-12   | defconfig+kselftest | 1

  Details:  https://kernelci.org/test/job/kselftest/branch/fixes/kernel/linux_kselftest…

  Test:     kselftest-lib
  Tree:     kselftest
  Branch:   fixes
  Describe: linux_kselftest-fixes-6.12-rc3
  URL:      https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git
  SHA:      4ee5ca9a29384fcf3f18232fdf8474166dea8dca

Test Regressions
----------------

platform                     | arch  | lab         | compiler | defconfig           | regressions
-----------------------------+-------+-------------+----------+---------------------+------------
meson-gxl-s905x-libretech-cc | arm64 | lab-broonie | gcc-12   | defconfig+kselftest | 1

  Details:     https://kernelci.org/test/plan/id/670d06ca62e90ff6e7c86855

  Results:     0 PASS, 1 FAIL, 0 SKIP
  Full config: defconfig+kselftest
  Compiler:    gcc-12 (aarch64-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0)
  Plain log:   https://storage.kernelci.org//kselftest/fixes/linux_kselftest-fixes-6.12-rc…
  HTML log:    https://storage.kernelci.org//kselftest/fixes/linux_kselftest-fixes-6.12-rc…
  Rootfs:      http://storage.kernelci.org/images/rootfs/debian/bookworm-kselftest/2024031…

  * kselftest-lib.login: https://kernelci.org/test/case/id/670d06ca62e90ff6e7c86856
        failing since 5 days (last pass: v6.12-rc1-5-g45a8897db67d4, first fail: linux_kselftest-fixes-6.12-rc2-4-g34d5b600172b)


kselftest/fixes build: 7 builds: 2 failed, 5 passed, 1 warning (linux_kselftest-fixes-6.12-rc3)

by kernelci.org bot

Full Build Summary: https://kernelci.org/build/kselftest/branch/fixes/kernel/linux_kselftest-fi…

Tree: kselftest
Branch: fixes
Git Describe: linux_kselftest-fixes-6.12-rc3
Git Commit: 4ee5ca9a29384fcf3f18232fdf8474166dea8dca
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git
Built: 4 unique architectures

Build Failures Detected:

arm64:
    defconfig+kselftest+arm64-chromebook: (clang-16) FAIL
    defconfig+kselftest+arm64-chromebook: (gcc-12) FAIL

Warnings Detected:

arm64:

arm:

i386:

x86_64:
    x86_64_defconfig+kselftest (clang-16): 1 warning

Warnings summary:

    1    vmlinux.o: warning: objtool: set_ftrace_ops_ro+0x23: relocation to !ENDBR: .text+0x14fd19

================================================================================

Detailed per-defconfig build reports:

--------------------------------------------------------------------------------
defconfig+kselftest (arm64, gcc-12) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
defconfig+kselftest+arm64-chromebook (arm64, gcc-12) — FAIL, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
defconfig+kselftest+arm64-chromebook (arm64, clang-16) — FAIL, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
i386_defconfig+kselftest (i386, gcc-12) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
multi_v7_defconfig+kselftest (arm, gcc-12) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
x86_64_defconfig+kselftest (x86_64, gcc-12) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
x86_64_defconfig+kselftest (x86_64, clang-16) — PASS, 0 errors, 1 warning, 0 section mismatches

Warnings:
    vmlinux.o: warning: objtool: set_ftrace_ops_ro+0x23: relocation to !ENDBR: .text+0x14fd19

---
For more info write to <info(a)kernelci.org>


[PATCH v1 0/1] update mseal.rst

by jeffxu＠chromium.org

From: Jeff Xu <jeffxu(a)chromium.org>

Pedro Falcato's optimization [1] for checking sealed VMAs, which replaces the can_modify_mm() function with an in-loop check, necessitates an update to the mseal.rst documentation to reflect this change.

Furthermore, the document has received offline comments regarding the code sample and suggestions for sentence clarification to enhance reader comprehension.

[1] https://lore.kernel.org/linux-mm/20240817-mseal-depessimize-v3-0-d8d2e037df…

Jeff Xu (1):
  mseal: update mseal.rst

 Documentation/userspace-api/mseal.rst | 290 ++++++++++++--------------
 1 file changed, 136 insertions(+), 154 deletions(-)

-- 
2.46.1.824.gd892dcdcdd-goog
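For readers unfamiliar with what sealing a VMA means in practice, here is a minimal user-space sketch. It is not the code sample from mseal.rst. It assumes the syscall number 462 when the libc headers do not define __NR_mseal (the number used on x86-64 and in the generic syscall table), and the helper name seal_and_check() is made up for illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_mseal
#define __NR_mseal 462	/* assumption: x86-64 / generic syscall table */
#endif

/*
 * Hypothetical helper, for illustration only.
 * Returns  1 if a page was sealed and mprotect() was then refused with EPERM,
 *          0 if mseal() is not usable here (old kernel, 32-bit, sandbox),
 *         -1 on any other outcome.
 */
static int seal_and_check(void)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;

	if (page <= 0)
		return -1;

	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* flags must currently be 0 */
	if (syscall(__NR_mseal, p, (size_t)page, 0UL) != 0) {
		munmap(p, (size_t)page);
		return 0;
	}

	/* The mapping is sealed: changing its protection must now fail. */
	if (mprotect(p, (size_t)page, PROT_READ) == -1 && errno == EPERM)
		return 1;

	return -1;
}
```

Once sealed, the page cannot be unmapped either, so the sketch deliberately leaves the sealed mapping in place until the process exits.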


[PATCH V12 00/14] perf/core: Add ability for an event to "pause" or "resume" AUX area tracing

by Adrian Hunter

Hi

Note for V12:
	There was a small conflict between the Intel PT changes in "KVM: x86: Fix Intel PT Host/Guest mode when host tracing" and the changes in this patch set, so I have put the patch sets together, along with outstanding fix "perf/x86/intel/pt: Fix buffer full but size is 0 case"

	Cover letter for KVM changes (patches 2 to 4):

	There is a long-standing problem whereby running Intel PT on host and guest in Host/Guest mode causes VM-Entry failure.

	The motivation for this patch set is to provide a fix for stable kernels prior to the advent of the "Mediated Passthrough vPMU" patch set:

		https://lore.kernel.org/kvm/20240801045907.4010984-1-mizhang@google.com/

	which would render a large part of the fix unnecessary but likely not be suitable for backport to stable due to its size and complexity.

	Ideally, this patch set would be applied before "Mediated Passthrough vPMU"

	Note that the fix does not conflict with "Mediated Passthrough vPMU", it is just that "Mediated Passthrough vPMU" will make the code to stop and restart Intel PT unnecessary.

Note for V11:
	Moving aux_paused into a union within struct hw_perf_event caused a regression because aux_paused was being written unconditionally even though it is valid only for AUX (e.g. Intel PT) PMUs. That is fixed in V11.

Hardware traces, such as instruction traces, can produce a vast amount of trace data, so being able to reduce tracing to more specific circumstances can be useful.

The ability to pause or resume tracing when another event happens can do that.

These patches add such a facility and show how it would work for Intel Processor Trace.

Maintainers of other AUX area tracing implementations are requested to consider if this is something they might employ and then whether or not the ABI would work for them. Note, thank you to James Clark (ARM) for evaluating the API for Coresight. Suzuki K Poulose (ARM) also responded positively to the RFC.

Changes to perf tools are now (since V4) fleshed out.
Please note, Intel® Architecture Instruction Set Extensions and Future Features Programming Reference March 2024 319433-052, currently:

	https://cdrdv2.intel.com/v1/dl/getContent/671368

introduces hardware pause / resume for Intel PT in a feature named Intel PT Trigger Tracing.

For that more fields in perf_event_attr will be necessary. The main differences are:
	- it can be applied not just to overflows, but optionally to every event
	- a packet is emitted into the trace, optionally with IP information
	- no PMI
	- works with PMC and DR (breakpoint) events only

Here are the proposed additions to perf_event_attr, please comment:

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 0c557f0a17b3..05dcc43f11bb 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -369,6 +369,22 @@ enum perf_event_read_format {
 	PERF_FORMAT_MAX = 1U << 5,		/* non-ABI */
 };

+enum {
+	PERF_AUX_ACTION_START_PAUSED		=   1U << 0,
+	PERF_AUX_ACTION_PAUSE			=   1U << 1,
+	PERF_AUX_ACTION_RESUME			=   1U << 2,
+	PERF_AUX_ACTION_EMIT			=   1U << 3,
+	PERF_AUX_ACTION_NR			= 0x1f << 4,
+	PERF_AUX_ACTION_NO_IP			=   1U << 9,
+	PERF_AUX_ACTION_PAUSE_ON_EVT		=   1U << 10,
+	PERF_AUX_ACTION_RESUME_ON_EVT		=   1U << 11,
+	PERF_AUX_ACTION_EMIT_ON_EVT		=   1U << 12,
+	PERF_AUX_ACTION_NR_ON_EVT		= 0x1f << 13,
+	PERF_AUX_ACTION_NO_IP_ON_EVT		=   1U << 18,
+	PERF_AUX_ACTION_MASK			= ~PERF_AUX_ACTION_START_PAUSED,
+	PERF_AUX_PAUSE_RESUME_MASK		= PERF_AUX_ACTION_PAUSE | PERF_AUX_ACTION_RESUME,
+};
+
 #define PERF_ATTR_SIZE_VER0	64	/* sizeof first published struct */
 #define PERF_ATTR_SIZE_VER1	72	/* add: config2 */
 #define PERF_ATTR_SIZE_VER2	80	/* add: branch_sample_type */
@@ -515,10 +531,19 @@ struct perf_event_attr {
 	union {
 		__u32	aux_action;
 		struct {
-			__u32	aux_start_paused :  1, /* start AUX area tracing paused */
-				aux_pause        :  1, /* on overflow, pause AUX area tracing */
-				aux_resume       :  1, /* on overflow, resume AUX area tracing */
-				__reserved_3     : 29;
+			__u32	aux_start_paused  :  1, /* start AUX area tracing paused */
+				aux_pause         :  1, /* on overflow, pause AUX area tracing */
+				aux_resume        :  1, /* on overflow, resume AUX area tracing */
+				aux_emit          :  1, /* generate AUX records instead of events */
+				aux_nr            :  5, /* AUX area tracing reference number */
+				aux_no_ip         :  1, /* suppress IP in AUX records */
+				/* Following apply to event occurrence not overflows */
+				aux_pause_on_evt  :  1, /* on event, pause AUX area tracing */
+				aux_resume_on_evt :  1, /* on event, resume AUX area tracing */
+				aux_emit_on_evt   :  1, /* generate AUX records instead of events */
+				aux_nr_on_evt     :  5, /* AUX area tracing reference number */
+				aux_no_ip_on_evt  :  1, /* suppress IP in AUX records */
+				__reserved_3      : 13;
 		};
 	};

Changes in V12:
	Add previously sent patch "perf/x86/intel/pt: Fix buffer full but size is 0 case"
	Add previously sent patch set "KVM: x86: Fix Intel PT Host/Guest mode when host tracing"
	Rebase on current tip plus patch set "KVM: x86: Fix Intel PT Host/Guest mode when host tracing"

Changes in V11:
    perf/core: Add aux_pause, aux_resume, aux_start_paused
	Make assignment to event->hw.aux_paused conditional on (pmu->capabilities & PERF_PMU_CAP_AUX_PAUSE).

    perf/x86/intel: Do not enable large PEBS for events with aux actions or aux sampling
	Remove definition of has_aux_action() because it has already been added as an inline function.

    perf/x86/intel/pt: Fix sampling synchronization
    perf tools: Enable evsel__is_aux_event() to work for ARM/ARM64
    perf tools: Enable evsel__is_aux_event() to work for S390_CPUMSF
	Dropped because they have already been applied

Changes in V10:
    perf/core: Add aux_pause, aux_resume, aux_start_paused
	Move aux_paused into a union within struct hw_perf_event.
	Additional comment wrt PERF_EF_PAUSE/PERF_EF_RESUME.
	Factor out has_aux_action() as an inline function.
	Use scoped_guard for irqsave.
	Move calls of perf_event_aux_pause() from __perf_event_output() to __perf_event_overflow().
Changes in V9:
    perf/x86/intel/pt: Fix sampling synchronization
	New patch

    perf/core: Add aux_pause, aux_resume, aux_start_paused
	Move aux_paused to struct hw_perf_event

    perf/x86/intel/pt: Add support for pause / resume
	Add more comments and barriers for resume_allowed and pause_allowed
	Always use WRITE_ONCE with resume_allowed

Changes in V8:
    perf tools: Parse aux-action
	Fix clang warning:
	     util/auxtrace.c:821:7: error: missing field 'aux_action' initializer [-Werror,-Wmissing-field-initializers]
	          821 |                 {NULL},
	              |                     ^

Changes in V7:
	Add Andi's Reviewed-by for patches 2-12
	Re-base

Changes in V6:
    perf/core: Add aux_pause, aux_resume, aux_start_paused
	Removed READ/WRITE_ONCE from __perf_event_aux_pause()
	Expanded comment about guarding against NMI

Changes in V5:
    perf/core: Add aux_pause, aux_resume, aux_start_paused
	Added James' Ack

    perf/x86/intel: Do not enable large PEBS for events with aux actions or aux sampling
	New patch

    perf tools
	Added Ian's Ack

Changes in V4:
    perf/core: Add aux_pause, aux_resume, aux_start_paused
	Rename aux_output_cfg -> aux_action
	Reorder aux_action bits from:
		aux_pause, aux_resume, aux_start_paused
	to:
		aux_start_paused, aux_pause, aux_resume
	Fix aux_action bits __u64 -> __u32

    coresight: Have a stab at support for pause / resume
	Dropped

    perf tools
	All new patches

Changes in RFC V3:
    coresight: Have a stab at support for pause / resume
	'mode' -> 'flags' so it at least compiles

Changes in RFC V2:
	Use ->stop() / ->start() instead of ->pause_resume()
	Move aux_start_paused bit into aux_output_cfg
	Tighten up when Intel PT pause / resume is allowed
	Add an example of how it might work for CoreSight

Adrian Hunter (14):
  perf/x86/intel/pt: Fix buffer full but size is 0 case
  KVM: x86: Fix Intel PT IA32_RTIT_CTL MSR validation
  KVM: x86: Fix Intel PT Host/Guest mode when host tracing also
  KVM: selftests: Add guest Intel PT test
  perf/core: Add aux_pause, aux_resume, aux_start_paused
  perf/x86/intel/pt: Add support for pause / resume
  perf/x86/intel: Do not enable large PEBS for events with aux actions or aux sampling
  perf tools: Add aux_start_paused, aux_pause and aux_resume
  perf tools: Add aux-action config term
  perf tools: Parse aux-action
  perf tools: Add missing_features for aux_start_paused, aux_pause, aux_resume
  perf intel-pt: Improve man page format
  perf intel-pt: Add documentation for pause / resume
  perf intel-pt: Add a test for pause / resume

 arch/x86/events/intel/core.c                       |   4 +-
 arch/x86/events/intel/pt.c                         | 209 +++++++-
 arch/x86/events/intel/pt.h                         |  16 +
 arch/x86/include/asm/intel_pt.h                    |   4 +
 arch/x86/kvm/vmx/vmx.c                             |  26 +-
 arch/x86/kvm/vmx/vmx.h                             |   1 -
 include/linux/perf_event.h                         |  28 +
 include/uapi/linux/perf_event.h                    |  11 +-
 kernel/events/core.c                               |  72 ++-
 kernel/events/internal.h                           |   1 +
 tools/include/uapi/linux/perf_event.h              |  11 +-
 tools/perf/Documentation/perf-intel-pt.txt         | 596 +++++++++++++--------
 tools/perf/Documentation/perf-record.txt           |   4 +
 tools/perf/builtin-record.c                        |   4 +-
 tools/perf/tests/shell/test_intel_pt.sh            |  28 +
 tools/perf/util/auxtrace.c                         |  67 ++-
 tools/perf/util/auxtrace.h                         |   6 +-
 tools/perf/util/evsel.c                            |  13 +-
 tools/perf/util/evsel.h                            |   1 +
 tools/perf/util/evsel_config.h                     |   1 +
 tools/perf/util/parse-events.c                     |  10 +
 tools/perf/util/parse-events.h                     |   1 +
 tools/perf/util/parse-events.l                     |   1 +
 tools/perf/util/perf_event_attr_fprintf.c          |   3 +
 tools/perf/util/pmu.c                              |   1 +
 tools/testing/selftests/kvm/Makefile               |   1 +
 .../selftests/kvm/include/x86_64/processor.h       |   1 +
 tools/testing/selftests/kvm/x86_64/intel_pt.c      | 381 +++++++++++++
 28 files changed, 1238 insertions(+), 264 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/intel_pt.c

Regards
Adrian
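To make the proposed aux_action layout in the cover letter above easier to follow, here is a small sketch of how the 5-bit reference number could be packed into and read back from the 32-bit word. The constants copy bit positions from the proposed PERF_AUX_ACTION_* enum but drop the PERF_ prefix, since this is a proposal and not a published header. AUX_ACTION_NR_SHIFT and both helper functions are hypothetical names introduced here, and the mapping of mask bits onto the aux_nr bitfield assumes the usual compiler layout where bitfields are allocated from bit 0 upwards.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions copied from the proposed enum in the cover letter. */
enum {
	AUX_ACTION_START_PAUSED	= 1U << 0,
	AUX_ACTION_PAUSE	= 1U << 1,
	AUX_ACTION_RESUME	= 1U << 2,
	AUX_ACTION_EMIT		= 1U << 3,
	AUX_ACTION_NR_SHIFT	= 4,		/* hypothetical helper constant */
	AUX_ACTION_NR		= 0x1f << 4,	/* 5-bit field, bits 4..8 */
	AUX_ACTION_NO_IP	= 1U << 9,
};

/* Store a reference number in bits 4..8 and leave every other bit alone. */
static uint32_t aux_action_set_nr(uint32_t action, unsigned int nr)
{
	assert(nr <= 0x1f);	/* must fit in the 5-bit aux_nr field */
	return (action & ~(uint32_t)AUX_ACTION_NR) |
	       ((uint32_t)nr << AUX_ACTION_NR_SHIFT);
}

/* Read the reference number back out of the action word. */
static unsigned int aux_action_get_nr(uint32_t action)
{
	return (action & AUX_ACTION_NR) >> AUX_ACTION_NR_SHIFT;
}
```

The same pattern would apply to the *_ON_EVT group, whose reference number sits at bits 13..17 (0x1f << 13) in the proposal.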


[PATCH net-next 0/3] Threads support in proc connector

by Anjali Kulkarni

Recently we committed a fix (commit a4c9a56e6a2c) to allow processes to receive notifications for non-zero exits via the process connector module.

However, when a thread calls pthread_exit(&exit_status), the kernel is not aware of the exit status passed to pthread_exit(). That status is delivered by the child thread to the parent process only if the parent is waiting in pthread_join(). Hence, for a thread exiting abnormally, the kernel cannot send notifications to any listening processes.

The exception to this is if the thread is sent a signal which it has not handled, and dies along with its process as a result, e.g. SIGSEGV or SIGKILL. In this case, the kernel is aware of the non-zero exit and sends a notification for it.

For our use case, we cannot have the parent wait in pthread_join(). One of the main reasons is that we do not want to track normal pthread_exit() calls, of which there could be a very large number. We only want to be notified of abnormal exits. Hence, threads are created with pthread_attr_t set to PTHREAD_CREATE_DETACHED.

To fix this problem, we add a new type PROC_CN_MCAST_NOTIFY to the proc connector API, which allows a thread to send its exit status to the kernel either when it needs to call pthread_exit() with a non-zero value to indicate some error, or from a signal handler before pthread_exit().
Anjali Kulkarni (3):
  connector/cn_proc: Add hash table for threads
  connector/cn_proc: Kunit tests for threads hash table
  connector/cn_proc: Selftest for threads

 drivers/connector/Makefile                    |   2 +-
 drivers/connector/cn_hash.c                   | 240 ++++++++++++++++++
 drivers/connector/cn_proc.c                   |  59 ++++-
 drivers/connector/connector.c                 |  96 ++++++-
 include/linux/connector.h                     |  47 ++++
 include/linux/sched.h                         |   2 +-
 include/uapi/linux/cn_proc.h                  |   4 +-
 lib/Kconfig.debug                             |  17 ++
 lib/Makefile                                  |   1 +
 lib/cn_hash_test.c                            | 167 ++++++++++++
 lib/cn_hash_test.h                            |  12 +
 tools/testing/selftests/connector/Makefile    |  23 +-
 .../testing/selftests/connector/proc_filter.c |   5 +
 tools/testing/selftests/connector/thread.c    |  90 +++++++
 .../selftests/connector/thread_filter.c       |  93 +++++++
 15 files changed, 848 insertions(+), 10 deletions(-)
 create mode 100644 drivers/connector/cn_hash.c
 create mode 100644 lib/cn_hash_test.c
 create mode 100644 lib/cn_hash_test.h
 create mode 100644 tools/testing/selftests/connector/thread.c
 create mode 100644 tools/testing/selftests/connector/thread_filter.c

-- 
2.46.0
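The detached-thread setup described in the cover letter can be sketched in user space as follows. This only illustrates the problem: nobody can pthread_join() a detached thread, so any status it wants to report has to travel through some side channel in user space, which the kernel never sees. It does not use PROC_CN_MCAST_NOTIFY, and all names here (published_status, worker, spawn_detached, run_detached_worker) are hypothetical.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical side channel; -1 means "nothing published yet". */
static atomic_int published_status = -1;

static void *worker(void *arg)
{
	/* This value stays in user space; the kernel is not told about it. */
	atomic_store(&published_status, *(int *)arg);
	return NULL;
}

/* Start fn(arg) as a detached thread; returns 0 or an error number. */
static int spawn_detached(void *(*fn)(void *), void *arg)
{
	pthread_attr_t attr;
	pthread_t tid;
	int err;

	err = pthread_attr_init(&attr);
	if (err)
		return err;

	err = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
	if (!err)
		err = pthread_create(&tid, &attr, fn, arg);

	pthread_attr_destroy(&attr);
	return err;
}

/*
 * Run a detached worker that reports a non-negative status and wait for it
 * by polling the side channel, since pthread_join() is not an option.
 */
static int run_detached_worker(int status)
{
	static int arg;

	arg = status;
	atomic_store(&published_status, -1);
	if (spawn_detached(worker, &arg) != 0)
		return -1;

	while (atomic_load(&published_status) == -1)
		sched_yield();

	return atomic_load(&published_status);
}
```

Polling is used only to keep the sketch short; the point is that a listener outside the process has no equivalent channel today.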


[PATCH kdevops] defconfig: add linux-modules-kpd defconfig symlink

by Luis Chamberlain

We now have two kdevops proofs of concept with kernel-patches-daemon [0], one for Linux kernel modules testing [1] and the other for radix tree testing (xarray, maple tree) [2]. These trees just contain the required .github/workflows/* files used to trigger a github self-hosted runner to run kdevops, since evaluation shows that using github hosted runners will just not work or scale for Linux kernel testing [3].

The way this works with KPD is that KPD has an app in the linux-kdevops organization which is in charge of taking patch series posted to your respective subsystem patchwork (you can have dedicated filters on a mailing list for only specific files if you don't have a dedicated mailing list). It creates a git tree branch using your configured KPD main development tree source, and pushes it out to a respective test tree under github for you. For example, in the case of development for Linux modules, it pushes out a branch with a delta onto the linux-modules-kpd tree [4], and in it, it will also merge the latest kdevops-ci-modules [1] work, which is where the github runner work gets developed.

For the radix tree we do not yet have a patchwork instance defined but we *could*, and the way it would work is that KPD would push out a branch into the linux-radix-tree-kpd [5] tree with the github actions defined in its respective kdevops-ci-radix-tree [2] tree.

What these PoCs show is that, given the way kdevops has designed selftests testing, we actually only need to differ in *one* single line of code on the github actions runner to test either of these two Linux kernel subsystems: the defconfig used.

To be able to *share* the *same* Linux kernel github actions runner code development between the Linux kernel module tests and the radix tree, all we need to do then is use the git tree onto which a delta was pushed as the source for the defconfig.
So all we have to do now is just add a symlink of the respective development test tree onto its corresponding defconfig.

Add the respective defconfig then for linux-modules-kpd by symlinking it to the seltests-kmod-cli defconfig. This will let us later share *one* github development action runner code for self-hosted runners for *all* Linux kernel selftests we define in *one* development tree which KPD could leverage.

Now that we have locked down the linux-kdevops github organization to only allow respective developers to be able to trigger pushes or PRs, this also allows us to add dedicated self-hosted runners per target test development repository so we can scale our testing as we need with security in mind.

The only thing left to do here now is to evaluate if we want an allow check for whose patches we want to enable automatic testing for through KPD.

[0] https://github.com/facebookincubator/kernel-patches-daemon
[1] https://github.com/linux-kdevops/kdevops-ci-modules
[2] https://github.com/linux-kdevops/kdevops-ci-radix-tree
[3] https://lore.kernel.org/kdevops/CAB=NE6VKWSkv1JZ_Z2LKq4o7+JBkKc6u8Wa1zxxBnG…
[4] https://github.com/linux-kdevops/linux-modules-kpd
[5] https://github.com/linux-kdevops/linux-radix-tree-kpd

Signed-off-by: Luis Chamberlain <mcgrof(a)kernel.org>
---
 defconfigs/linux-modules-kpd | 1 +
 1 file changed, 1 insertion(+)
 create mode 120000 defconfigs/linux-modules-kpd

diff --git a/defconfigs/linux-modules-kpd b/defconfigs/linux-modules-kpd
new file mode 120000
index 000000000000..e61fd7f687b0
--- /dev/null
+++ b/defconfigs/linux-modules-kpd
@@ -0,0 +1 @@
+seltests-kmod-cli
\ No newline at end of file
-- 
2.43.0


[PATCH net-next v21 11/14] mm: page_frag: add testing for the newly added prepare API

by Yunsheng Lin

Add testing for the newly added prepare API, for both the aligned and non-aligned variants. The probe API is also tested along with the prepare API.

CC: Alexander Duyck <alexander.duyck(a)gmail.com>
Signed-off-by: Yunsheng Lin <linyunsheng(a)huawei.com>
---
 .../selftests/mm/page_frag/page_frag_test.c   | 76 +++++++++++++++++--
 tools/testing/selftests/mm/run_vmtests.sh     |  4 +
 tools/testing/selftests/mm/test_page_frag.sh  | 27 +++++++
 3 files changed, 102 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/mm/page_frag/page_frag_test.c b/tools/testing/selftests/mm/page_frag/page_frag_test.c
index e806c1866e36..1e47e9ad66f0 100644
--- a/tools/testing/selftests/mm/page_frag/page_frag_test.c
+++ b/tools/testing/selftests/mm/page_frag/page_frag_test.c
@@ -32,6 +32,10 @@ static bool test_align;
 module_param(test_align, bool, 0);
 MODULE_PARM_DESC(test_align, "use align API for testing");

+static bool test_prepare;
+module_param(test_prepare, bool, 0);
+MODULE_PARM_DESC(test_prepare, "use prepare API for testing");
+
 static int test_alloc_len = 2048;
 module_param(test_alloc_len, int, 0);
 MODULE_PARM_DESC(test_alloc_len, "alloc len for testing");
@@ -74,6 +78,21 @@ static int page_frag_pop_thread(void *arg)
 	return 0;
 }

+static void frag_frag_test_commit(struct page_frag_cache *nc,
+				  struct page_frag *prepare_pfrag,
+				  struct page_frag *probe_pfrag,
+				  unsigned int used_sz)
+{
+	if (prepare_pfrag->page != probe_pfrag->page ||
+	    prepare_pfrag->offset != probe_pfrag->offset ||
+	    prepare_pfrag->size != probe_pfrag->size) {
+		force_exit = true;
+		WARN_ONCE(true, TEST_FAILED_PREFIX "wrong probed info\n");
+	}
+
+	page_frag_commit(nc, prepare_pfrag, used_sz);
+}
+
 static int page_frag_push_thread(void *arg)
 {
 	struct ptr_ring *ring = arg;
@@ -86,15 +105,61 @@ static int page_frag_push_thread(void *arg)
 		int ret;

 		if (test_align) {
-			va = page_frag_alloc_align(&test_nc, test_alloc_len,
-						   GFP_KERNEL, SMP_CACHE_BYTES);
+			if (test_prepare) {
+				struct page_frag prepare_frag, probe_frag;
+				void *probe_va;
+
+				va = page_frag_alloc_refill_prepare_align(&test_nc,
+									  test_alloc_len,
+									  &prepare_frag,
+									  GFP_KERNEL,
+									  SMP_CACHE_BYTES);
+
+				probe_va = __page_frag_alloc_refill_probe_align(&test_nc,
+										test_alloc_len,
+										&probe_frag,
+										-SMP_CACHE_BYTES);
+				if (va != probe_va) {
+					force_exit = true;
+					WARN_ONCE(true, TEST_FAILED_PREFIX "wrong va\n");
+				}
+
+				if (likely(va))
+					frag_frag_test_commit(&test_nc, &prepare_frag,
+							      &probe_frag, test_alloc_len);
+			} else {
+				va = page_frag_alloc_align(&test_nc,
+							   test_alloc_len,
+							   GFP_KERNEL,
+							   SMP_CACHE_BYTES);
+			}

 			if ((unsigned long)va & (SMP_CACHE_BYTES - 1)) {
 				force_exit = true;
 				WARN_ONCE(true, TEST_FAILED_PREFIX "unaligned va returned\n");
 			}
 		} else {
-			va = page_frag_alloc(&test_nc, test_alloc_len, GFP_KERNEL);
+			if (test_prepare) {
+				struct page_frag prepare_frag, probe_frag;
+				void *probe_va;
+
+				va = page_frag_alloc_refill_prepare(&test_nc, test_alloc_len,
+								    &prepare_frag, GFP_KERNEL);
+
+				probe_va = page_frag_alloc_refill_probe(&test_nc, test_alloc_len,
+									&probe_frag);
+
+				if (va != probe_va) {
+					force_exit = true;
+					WARN_ONCE(true, TEST_FAILED_PREFIX "wrong va\n");
+				}
+
+				if (likely(va))
+					frag_frag_test_commit(&test_nc, &prepare_frag,
+							      &probe_frag, test_alloc_len);
+			} else {
+				va = page_frag_alloc(&test_nc, test_alloc_len, GFP_KERNEL);
+			}
 		}

 		if (!va)
@@ -176,8 +241,9 @@ static int __init page_frag_test_init(void)
 	}

 	duration = (u64)ktime_us_delta(ktime_get(), start);
-	pr_info("%d of iterations for %s testing took: %lluus\n", nr_test,
-		test_align ? "aligned" : "non-aligned", duration);
+	pr_info("%d of iterations for %s %s API testing took: %lluus\n", nr_test,
+		test_align ? "aligned" : "non-aligned",
+		test_prepare ? "prepare" : "alloc", duration);

 out:
 	ptr_ring_cleanup(&ptr_ring, NULL);
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 2c5394584af4..f6ff9080a6f2 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -464,6 +464,10 @@ CATEGORY="page_frag" run_test ./test_page_frag.sh aligned

 CATEGORY="page_frag" run_test ./test_page_frag.sh nonaligned

+CATEGORY="page_frag" run_test ./test_page_frag.sh aligned_prepare
+
+CATEGORY="page_frag" run_test ./test_page_frag.sh nonaligned_prepare
+
 echo "SUMMARY: PASS=${count_pass} SKIP=${count_skip} FAIL=${count_fail}" | tap_prefix

 echo "1..${count_total}" | tap_output
diff --git a/tools/testing/selftests/mm/test_page_frag.sh b/tools/testing/selftests/mm/test_page_frag.sh
index f55b105084cf..1c757fd11844 100755
--- a/tools/testing/selftests/mm/test_page_frag.sh
+++ b/tools/testing/selftests/mm/test_page_frag.sh
@@ -43,6 +43,8 @@ check_test_failed_prefix() {
 SMOKE_PARAM="test_push_cpu=$TEST_CPU_0 test_pop_cpu=$TEST_CPU_1"
 NONALIGNED_PARAM="$SMOKE_PARAM test_alloc_len=75 nr_test=$NR_TEST"
 ALIGNED_PARAM="$NONALIGNED_PARAM test_align=1"
+NONALIGNED_PREPARE_PARAM="$NONALIGNED_PARAM test_prepare=1"
+ALIGNED_PREPARE_PARAM="$ALIGNED_PARAM test_prepare=1"

 check_test_requirements()
 {
@@ -77,6 +79,20 @@ run_aligned_check()
 	insmod $DRIVER $ALIGNED_PARAM > /dev/null 2>&1
 }

+run_nonaligned_prepare_check()
+{
+	echo "Run performance tests to evaluate how fast nonaligned prepare API is."
+
+	insmod $DRIVER $NONALIGNED_PREPARE_PARAM > /dev/null 2>&1
+}
+
+run_aligned_prepare_check()
+{
+	echo "Run performance tests to evaluate how fast aligned prepare API is."
+
+	insmod $DRIVER $ALIGNED_PREPARE_PARAM > /dev/null 2>&1
+}
+
 run_smoke_check()
 {
 	echo "Run smoke test."
@@ -87,6 +103,7 @@ run_smoke_check()
 usage()
 {
 	echo -n "Usage: $0 [ aligned ] | [ nonaligned ] | | [ smoke ] | "
+	echo "[ aligned_prepare ] | [ nonaligned_prepare ] | "
 	echo "manual parameters"
 	echo
 	echo "Valid tests and parameters:"
@@ -107,6 +124,12 @@ usage()
 	echo "# Performance testing for aligned alloc API"
 	echo "$0 aligned"
 	echo
+	echo "# Performance testing for nonaligned prepare API"
+	echo "$0 nonaligned_prepare"
+	echo
+	echo "# Performance testing for aligned prepare API"
+	echo "$0 aligned_prepare"
+	echo
 	exit 0
 }

@@ -158,6 +181,10 @@ function run_test()
 		run_nonaligned_check
 	elif [[ "$1" = "aligned" ]]; then
 		run_aligned_check
+	elif [[ "$1" = "nonaligned_prepare" ]]; then
+		run_nonaligned_prepare_check
+	elif [[ "$1" = "aligned_prepare" ]]; then
+		run_aligned_prepare_check
 	else
 		run_manual_check $@
 	fi
-- 
2.33.0
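The test above checks one invariant: a prepare call followed by a probe call must describe the same fragment, and only the commit consumes space. The toy user-space model below sketches that invariant. It is not the kernel implementation and says nothing about the real page_frag API beyond what the test exercises. Every name in it (toy_cache, toy_frag, toy_probe, toy_prepare, toy_commit) is invented for illustration, and the "refill" is simply modelled as rewinding to the start of a fixed buffer.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096

struct toy_cache {
	unsigned char page[TOY_PAGE_SIZE];
	size_t offset;			/* first free byte */
};

struct toy_frag {
	size_t offset;			/* where the fragment starts */
	size_t size;			/* bytes available from there */
};

/* Describe the free space without consuming it; never refills. */
static void *toy_probe(struct toy_cache *c, size_t len, struct toy_frag *f)
{
	if (len > TOY_PAGE_SIZE - c->offset)
		return NULL;

	f->offset = c->offset;
	f->size = TOY_PAGE_SIZE - c->offset;
	return c->page + c->offset;
}

/* Like toy_probe(), but "refills" first when len does not fit. */
static void *toy_prepare(struct toy_cache *c, size_t len, struct toy_frag *f)
{
	if (len > TOY_PAGE_SIZE)
		return NULL;

	if (len > TOY_PAGE_SIZE - c->offset)
		c->offset = 0;		/* toy stand-in for a fresh page */

	return toy_probe(c, len, f);
}

/* Consume the bytes actually used from a previously prepared fragment. */
static void toy_commit(struct toy_cache *c, const struct toy_frag *f,
		       size_t used)
{
	assert(f->offset == c->offset && used <= f->size);
	c->offset += used;
}
```

In this model, calling toy_prepare() and then toy_probe() with the same length yields the same pointer, offset and size, which mirrors the comparison done in frag_frag_test_commit() before the commit.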


[PATCH net-next v21 04/14] mm: page_frag: avoid caller accessing 'page_frag_cache' directly

by Yunsheng Lin

Use appropriate frag_page API instead of caller accessing 'page_frag_cache' directly.

CC: Alexander Duyck <alexander.duyck(a)gmail.com>
Signed-off-by: Yunsheng Lin <linyunsheng(a)huawei.com>
Reviewed-by: Alexander Duyck <alexanderduyck(a)fb.com>
Acked-by: Chuck Lever <chuck.lever(a)oracle.com>
---
 drivers/vhost/net.c                                   |  2 +-
 include/linux/page_frag_cache.h                       | 10 ++++++++++
 net/core/skbuff.c                                     |  6 +++---
 net/rxrpc/conn_object.c                               |  4 +---
 net/rxrpc/local_object.c                              |  4 +---
 net/sunrpc/svcsock.c                                  |  6 ++----
 tools/testing/selftests/mm/page_frag/page_frag_test.c |  2 +-
 7 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index f16279351db5..9ad37c012189 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1325,7 +1325,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 			      vqs[VHOST_NET_VQ_RX]);

 	f->private_data = n;
-	n->pf_cache.va = NULL;
+	page_frag_cache_init(&n->pf_cache);

 	return 0;
 }
diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h
index 67ac8626ed9b..0a52f7a179c8 100644
--- a/include/linux/page_frag_cache.h
+++ b/include/linux/page_frag_cache.h
@@ -7,6 +7,16 @@
 #include <linux/mm_types_task.h>
 #include <linux/types.h>

+static inline void page_frag_cache_init(struct page_frag_cache *nc)
+{
+	nc->va = NULL;
+}
+
+static inline bool page_frag_cache_is_pfmemalloc(struct page_frag_cache *nc)
+{
+	return !!nc->pfmemalloc;
+}
+
 void page_frag_cache_drain(struct page_frag_cache *nc);
 void __page_frag_cache_drain(struct page *page, unsigned int count);
 void *__page_frag_alloc_align(struct page_frag_cache *nc, unsigned int fragsz,
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 00afeb90c23a..6841e61a6bd0 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -753,14 +753,14 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len,
 	if (in_hardirq() || irqs_disabled()) {
 		nc = this_cpu_ptr(&netdev_alloc_cache);
 		data = page_frag_alloc(nc, len, gfp_mask);
-		pfmemalloc = nc->pfmemalloc;
+		pfmemalloc = page_frag_cache_is_pfmemalloc(nc);
 	} else {
 		local_bh_disable();
 		local_lock_nested_bh(&napi_alloc_cache.bh_lock);

 		nc = this_cpu_ptr(&napi_alloc_cache.page);
 		data = page_frag_alloc(nc, len, gfp_mask);
-		pfmemalloc = nc->pfmemalloc;
+		pfmemalloc = page_frag_cache_is_pfmemalloc(nc);

 		local_unlock_nested_bh(&napi_alloc_cache.bh_lock);
 		local_bh_enable();
@@ -850,7 +850,7 @@ struct sk_buff *napi_alloc_skb(struct napi_struct *napi, unsigned int len)
 		len = SKB_HEAD_ALIGN(len);

 		data = page_frag_alloc(&nc->page, len, gfp_mask);
-		pfmemalloc = nc->page.pfmemalloc;
+		pfmemalloc = page_frag_cache_is_pfmemalloc(&nc->page);
 	}
 	local_unlock_nested_bh(&napi_alloc_cache.bh_lock);

diff --git a/net/rxrpc/conn_object.c b/net/rxrpc/conn_object.c
index 1539d315afe7..694c4df7a1a3 100644
--- a/net/rxrpc/conn_object.c
+++ b/net/rxrpc/conn_object.c
@@ -337,9 +337,7 @@ static void rxrpc_clean_up_connection(struct work_struct *work)
 	 */
 	rxrpc_purge_queue(&conn->rx_queue);

-	if (conn->tx_data_alloc.va)
-		__page_frag_cache_drain(virt_to_page(conn->tx_data_alloc.va),
-					conn->tx_data_alloc.pagecnt_bias);
+	page_frag_cache_drain(&conn->tx_data_alloc);
 	call_rcu(&conn->rcu, rxrpc_rcu_free_connection);
 }

diff --git a/net/rxrpc/local_object.c b/net/rxrpc/local_object.c
index f9623ace2201..2792d2304605 100644
--- a/net/rxrpc/local_object.c
+++ b/net/rxrpc/local_object.c
@@ -452,9 +452,7 @@ void rxrpc_destroy_local(struct rxrpc_local *local)
 #endif
 	rxrpc_purge_queue(&local->rx_queue);
 	rxrpc_purge_client_connections(local);
-	if (local->tx_alloc.va)
-		__page_frag_cache_drain(virt_to_page(local->tx_alloc.va),
-					local->tx_alloc.pagecnt_bias);
+	page_frag_cache_drain(&local->tx_alloc);
 }

 /*
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 825ec5357691..b785425c3315 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1608,7 +1608,6 @@ static void svc_tcp_sock_detach(struct svc_xprt *xprt)
 static void svc_sock_free(struct svc_xprt *xprt)
 {
 	struct svc_sock *svsk = container_of(xprt, struct svc_sock, sk_xprt);
-	struct page_frag_cache *pfc = &svsk->sk_frag_cache;
 	struct socket *sock = svsk->sk_sock;

 	trace_svcsock_free(svsk, sock);
@@ -1618,8 +1617,7 @@ static void svc_sock_free(struct svc_xprt *xprt)
 		sockfd_put(sock);
 	else
 		sock_release(sock);
-	if (pfc->va)
-		__page_frag_cache_drain(virt_to_head_page(pfc->va),
-					pfc->pagecnt_bias);
+
+	page_frag_cache_drain(&svsk->sk_frag_cache);
 	kfree(svsk);
 }

diff --git a/tools/testing/selftests/mm/page_frag/page_frag_test.c b/tools/testing/selftests/mm/page_frag/page_frag_test.c
index 13c44133e009..e806c1866e36 100644
--- a/tools/testing/selftests/mm/page_frag/page_frag_test.c
+++ b/tools/testing/selftests/mm/page_frag/page_frag_test.c
@@ -126,7 +126,7 @@ static int __init page_frag_test_init(void)
 	u64 duration;
 	int ret;

-	test_nc.va = NULL;
+	page_frag_cache_init(&test_nc);
 	atomic_set(&nthreads, 2);
 	init_completion(&wait);

-- 
2.33.0


[PATCH net-next v21 02/14] mm: move the page fragment allocator from page_alloc into its own file

by Yunsheng Lin

Inspired by [1], move the page fragment allocator from page_alloc into its own C file and header file, as we are about to make more changes to it so that it can replace another page_frag implementation in sock.c.

As this patchset is going to replace 'struct page_frag' with 'struct page_frag_cache' in sched.h, including page_frag_cache.h in sched.h causes a compiler error due to the interdependence between mm_types.h and mm.h for asm-offsets.c, see [2]. Avoid the compiler error by moving 'struct page_frag_cache' to mm_types_task.h, as suggested by Alexander, see [3].

1. https://lore.kernel.org/all/20230411160902.4134381-3-dhowells@redhat.com/
2. https://lore.kernel.org/all/15623dac-9358-4597-b3ee-3694a5956920@gmail.com/
3. https://lore.kernel.org/all/CAKgT0UdH1yD=LSCXFJ=YM_aiA4OomD-2wXykO42bizaWMt…

CC: David Howells <dhowells(a)redhat.com>
CC: Alexander Duyck <alexander.duyck(a)gmail.com>
Signed-off-by: Yunsheng Lin <linyunsheng(a)huawei.com>
Acked-by: Andrew Morton <akpm(a)linux-foundation.org>
Reviewed-by: Alexander Duyck <alexanderduyck(a)fb.com>
---
 include/linux/gfp.h | 22 ---
 include/linux/mm_types.h | 18 ---
 include/linux/mm_types_task.h | 18 +++
 include/linux/page_frag_cache.h | 31 ++++
 include/linux/skbuff.h | 1 +
 mm/Makefile | 1 +
 mm/page_alloc.c | 136 ----------------
 mm/page_frag_cache.c | 145 ++++++++++++++++++
 .../selftests/mm/page_frag/page_frag_test.c | 2 +-
 9 files changed, 197 insertions(+), 177 deletions(-)
 create mode 100644 include/linux/page_frag_cache.h
 create mode 100644 mm/page_frag_cache.c

diff --git a/include/linux/gfp.h b/include/linux/gfp.h index a951de920e20..a0a6d25f883f 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -371,28 +371,6 @@ __meminit void *alloc_pages_exact_nid_noprof(int nid, size_t size, gfp_t gfp_mas extern void __free_pages(struct page *page, unsigned int order); extern void free_pages(unsigned long addr, unsigned int order); -struct page_frag_cache; -void page_frag_cache_drain(struct page_frag_cache *nc); -extern void 
__page_frag_cache_drain(struct page *page, unsigned int count); -void *__page_frag_alloc_align(struct page_frag_cache *nc, unsigned int fragsz, - gfp_t gfp_mask, unsigned int align_mask); - -static inline void *page_frag_alloc_align(struct page_frag_cache *nc, - unsigned int fragsz, gfp_t gfp_mask, - unsigned int align) -{ - WARN_ON_ONCE(!is_power_of_2(align)); - return __page_frag_alloc_align(nc, fragsz, gfp_mask, -align); -} - -static inline void *page_frag_alloc(struct page_frag_cache *nc, - unsigned int fragsz, gfp_t gfp_mask) -{ - return __page_frag_alloc_align(nc, fragsz, gfp_mask, ~0u); -} - -extern void page_frag_free(void *addr); - #define __free_page(page) __free_pages((page), 0) #define free_page(addr) free_pages((addr), 0) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 6e3bdf8e38bc..92314ef2d978 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -521,9 +521,6 @@ static_assert(sizeof(struct ptdesc) <= sizeof(struct page)); */ #define STRUCT_PAGE_MAX_SHIFT (order_base_2(sizeof(struct page))) -#define PAGE_FRAG_CACHE_MAX_SIZE __ALIGN_MASK(32768, ~PAGE_MASK) -#define PAGE_FRAG_CACHE_MAX_ORDER get_order(PAGE_FRAG_CACHE_MAX_SIZE) - /* * page_private can be used on tail pages. However, PagePrivate is only * checked by the VM on the head page. So page_private on the tail pages @@ -542,21 +539,6 @@ static inline void *folio_get_private(struct folio *folio) return folio->private; } -struct page_frag_cache { - void * va; -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - __u16 offset; - __u16 size; -#else - __u32 offset; -#endif - /* we maintain a pagecount bias, so that we dont dirty cache line - * containing page->_refcount every time we allocate a fragment. 
- */ - unsigned int pagecnt_bias; - bool pfmemalloc; -}; - typedef unsigned long vm_flags_t; /* diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h index bff5706b76e1..0ac6daebdd5c 100644 --- a/include/linux/mm_types_task.h +++ b/include/linux/mm_types_task.h @@ -8,6 +8,7 @@ * (These are defined separately to decouple sched.h from mm_types.h as much as possible.) */ +#include <linux/align.h> #include <linux/types.h> #include <asm/page.h> @@ -43,6 +44,23 @@ struct page_frag { #endif }; +#define PAGE_FRAG_CACHE_MAX_SIZE __ALIGN_MASK(32768, ~PAGE_MASK) +#define PAGE_FRAG_CACHE_MAX_ORDER get_order(PAGE_FRAG_CACHE_MAX_SIZE) +struct page_frag_cache { + void *va; +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) + __u16 offset; + __u16 size; +#else + __u32 offset; +#endif + /* we maintain a pagecount bias, so that we dont dirty cache line + * containing page->_refcount every time we allocate a fragment. + */ + unsigned int pagecnt_bias; + bool pfmemalloc; +}; + /* Track pages that require TLB flushes */ struct tlbflush_unmap_batch { #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h new file mode 100644 index 000000000000..67ac8626ed9b --- /dev/null +++ b/include/linux/page_frag_cache.h @@ -0,0 +1,31 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef _LINUX_PAGE_FRAG_CACHE_H +#define _LINUX_PAGE_FRAG_CACHE_H + +#include <linux/log2.h> +#include <linux/mm_types_task.h> +#include <linux/types.h> + +void page_frag_cache_drain(struct page_frag_cache *nc); +void __page_frag_cache_drain(struct page *page, unsigned int count); +void *__page_frag_alloc_align(struct page_frag_cache *nc, unsigned int fragsz, + gfp_t gfp_mask, unsigned int align_mask); + +static inline void *page_frag_alloc_align(struct page_frag_cache *nc, + unsigned int fragsz, gfp_t gfp_mask, + unsigned int align) +{ + WARN_ON_ONCE(!is_power_of_2(align)); + return __page_frag_alloc_align(nc, fragsz, gfp_mask, -align); 
+} + +static inline void *page_frag_alloc(struct page_frag_cache *nc, + unsigned int fragsz, gfp_t gfp_mask) +{ + return __page_frag_alloc_align(nc, fragsz, gfp_mask, ~0u); +} + +void page_frag_free(void *addr); + +#endif diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 39f1d16f3628..560e2b49f98b 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -31,6 +31,7 @@ #include <linux/in6.h> #include <linux/if_packet.h> #include <linux/llist.h> +#include <linux/page_frag_cache.h> #include <net/flow.h> #if IS_ENABLED(CONFIG_NF_CONNTRACK) #include <linux/netfilter/nf_conntrack_common.h> diff --git a/mm/Makefile b/mm/Makefile index d5639b036166..dba52bb0da8a 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -65,6 +65,7 @@ page-alloc-$(CONFIG_SHUFFLE_PAGE_ALLOCATOR) += shuffle.o memory-hotplug-$(CONFIG_MEMORY_HOTPLUG) += memory_hotplug.o obj-y += page-alloc.o +obj-y += page_frag_cache.o obj-y += init-mm.o obj-y += memblock.o obj-y += $(memory-hotplug-y) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 8afab64814dc..6ca2abce857b 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4836,142 +4836,6 @@ void free_pages(unsigned long addr, unsigned int order) EXPORT_SYMBOL(free_pages); -/* - * Page Fragment: - * An arbitrary-length arbitrary-offset area of memory which resides - * within a 0 or higher order page. Multiple fragments within that page - * are individually refcounted, in the page's reference counter. - * - * The page_frag functions below provide a simple allocation framework for - * page fragments. This is used by the network stack and network device - * drivers to provide a backing region of memory for use as either an - * sk_buff->head, or to be used in the "frags" portion of skb_shared_info. 
- */ -static struct page *__page_frag_cache_refill(struct page_frag_cache *nc, - gfp_t gfp_mask) -{ - struct page *page = NULL; - gfp_t gfp = gfp_mask; - -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP | - __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC; - page = alloc_pages_node(NUMA_NO_NODE, gfp_mask, - PAGE_FRAG_CACHE_MAX_ORDER); - nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE; -#endif - if (unlikely(!page)) - page = alloc_pages_node(NUMA_NO_NODE, gfp, 0); - - nc->va = page ? page_address(page) : NULL; - - return page; -} - -void page_frag_cache_drain(struct page_frag_cache *nc) -{ - if (!nc->va) - return; - - __page_frag_cache_drain(virt_to_head_page(nc->va), nc->pagecnt_bias); - nc->va = NULL; -} -EXPORT_SYMBOL(page_frag_cache_drain); - -void __page_frag_cache_drain(struct page *page, unsigned int count) -{ - VM_BUG_ON_PAGE(page_ref_count(page) == 0, page); - - if (page_ref_sub_and_test(page, count)) - free_unref_page(page, compound_order(page)); -} -EXPORT_SYMBOL(__page_frag_cache_drain); - -void *__page_frag_alloc_align(struct page_frag_cache *nc, - unsigned int fragsz, gfp_t gfp_mask, - unsigned int align_mask) -{ - unsigned int size = PAGE_SIZE; - struct page *page; - int offset; - - if (unlikely(!nc->va)) { -refill: - page = __page_frag_cache_refill(nc, gfp_mask); - if (!page) - return NULL; - -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - /* if size can vary use size else just use PAGE_SIZE */ - size = nc->size; -#endif - /* Even if we own the page, we do not use atomic_set(). - * This would break get_page_unless_zero() users. 
- */ - page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE); - - /* reset page count bias and offset to start of new frag */ - nc->pfmemalloc = page_is_pfmemalloc(page); - nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; - nc->offset = size; - } - - offset = nc->offset - fragsz; - if (unlikely(offset < 0)) { - page = virt_to_page(nc->va); - - if (!page_ref_sub_and_test(page, nc->pagecnt_bias)) - goto refill; - - if (unlikely(nc->pfmemalloc)) { - free_unref_page(page, compound_order(page)); - goto refill; - } - -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - /* if size can vary use size else just use PAGE_SIZE */ - size = nc->size; -#endif - /* OK, page count is 0, we can safely set it */ - set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1); - - /* reset page count bias and offset to start of new frag */ - nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; - offset = size - fragsz; - if (unlikely(offset < 0)) { - /* - * The caller is trying to allocate a fragment - * with fragsz > PAGE_SIZE but the cache isn't big - * enough to satisfy the request, this may - * happen in low memory conditions. - * We don't release the cache page because - * it could make memory pressure worse - * so we simply return NULL here. - */ - return NULL; - } - } - - nc->pagecnt_bias--; - offset &= align_mask; - nc->offset = offset; - - return nc->va + offset; -} -EXPORT_SYMBOL(__page_frag_alloc_align); - -/* - * Frees a page fragment allocated out of either a compound or order 0 page. 
- */ -void page_frag_free(void *addr) -{ - struct page *page = virt_to_head_page(addr); - - if (unlikely(put_page_testzero(page))) - free_unref_page(page, compound_order(page)); -} -EXPORT_SYMBOL(page_frag_free); - static void *make_alloc_exact(unsigned long addr, unsigned int order, size_t size) { diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c new file mode 100644 index 000000000000..609a485cd02a --- /dev/null +++ b/mm/page_frag_cache.c @@ -0,0 +1,145 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Page fragment allocator + * + * Page Fragment: + * An arbitrary-length arbitrary-offset area of memory which resides within a + * 0 or higher order page. Multiple fragments within that page are + * individually refcounted, in the page's reference counter. + * + * The page_frag functions provide a simple allocation framework for page + * fragments. This is used by the network stack and network device drivers to + * provide a backing region of memory for use as either an sk_buff->head, or to + * be used in the "frags" portion of skb_shared_info. + */ + +#include <linux/export.h> +#include <linux/gfp_types.h> +#include <linux/init.h> +#include <linux/mm.h> +#include <linux/page_frag_cache.h> +#include "internal.h" + +static struct page *__page_frag_cache_refill(struct page_frag_cache *nc, + gfp_t gfp_mask) +{ + struct page *page = NULL; + gfp_t gfp = gfp_mask; + +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) + gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP | + __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC; + page = alloc_pages_node(NUMA_NO_NODE, gfp_mask, + PAGE_FRAG_CACHE_MAX_ORDER); + nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE; +#endif + if (unlikely(!page)) + page = alloc_pages_node(NUMA_NO_NODE, gfp, 0); + + nc->va = page ? 
page_address(page) : NULL; + + return page; +} + +void page_frag_cache_drain(struct page_frag_cache *nc) +{ + if (!nc->va) + return; + + __page_frag_cache_drain(virt_to_head_page(nc->va), nc->pagecnt_bias); + nc->va = NULL; +} +EXPORT_SYMBOL(page_frag_cache_drain); + +void __page_frag_cache_drain(struct page *page, unsigned int count) +{ + VM_BUG_ON_PAGE(page_ref_count(page) == 0, page); + + if (page_ref_sub_and_test(page, count)) + free_unref_page(page, compound_order(page)); +} +EXPORT_SYMBOL(__page_frag_cache_drain); + +void *__page_frag_alloc_align(struct page_frag_cache *nc, + unsigned int fragsz, gfp_t gfp_mask, + unsigned int align_mask) +{ + unsigned int size = PAGE_SIZE; + struct page *page; + int offset; + + if (unlikely(!nc->va)) { +refill: + page = __page_frag_cache_refill(nc, gfp_mask); + if (!page) + return NULL; + +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) + /* if size can vary use size else just use PAGE_SIZE */ + size = nc->size; +#endif + /* Even if we own the page, we do not use atomic_set(). + * This would break get_page_unless_zero() users. 
+ */ + page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE); + + /* reset page count bias and offset to start of new frag */ + nc->pfmemalloc = page_is_pfmemalloc(page); + nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; + nc->offset = size; + } + + offset = nc->offset - fragsz; + if (unlikely(offset < 0)) { + page = virt_to_page(nc->va); + + if (!page_ref_sub_and_test(page, nc->pagecnt_bias)) + goto refill; + + if (unlikely(nc->pfmemalloc)) { + free_unref_page(page, compound_order(page)); + goto refill; + } + +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) + /* if size can vary use size else just use PAGE_SIZE */ + size = nc->size; +#endif + /* OK, page count is 0, we can safely set it */ + set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1); + + /* reset page count bias and offset to start of new frag */ + nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; + offset = size - fragsz; + if (unlikely(offset < 0)) { + /* + * The caller is trying to allocate a fragment + * with fragsz > PAGE_SIZE but the cache isn't big + * enough to satisfy the request, this may + * happen in low memory conditions. + * We don't release the cache page because + * it could make memory pressure worse + * so we simply return NULL here. + */ + return NULL; + } + } + + nc->pagecnt_bias--; + offset &= align_mask; + nc->offset = offset; + + return nc->va + offset; +} +EXPORT_SYMBOL(__page_frag_alloc_align); + +/* + * Frees a page fragment allocated out of either a compound or order 0 page. 
+ */ +void page_frag_free(void *addr) +{ + struct page *page = virt_to_head_page(addr); + + if (unlikely(put_page_testzero(page))) + free_unref_page(page, compound_order(page)); +} +EXPORT_SYMBOL(page_frag_free); diff --git a/tools/testing/selftests/mm/page_frag/page_frag_test.c b/tools/testing/selftests/mm/page_frag/page_frag_test.c index 912d97b99107..13c44133e009 100644 --- a/tools/testing/selftests/mm/page_frag/page_frag_test.c +++ b/tools/testing/selftests/mm/page_frag/page_frag_test.c @@ -6,12 +6,12 @@ * Copyright (C) 2024 Yunsheng Lin <linyunsheng(a)huawei.com> */ -#include <linux/mm.h> #include <linux/module.h> #include <linux/cpumask.h> #include <linux/completion.h> #include <linux/ptr_ring.h> #include <linux/kthread.h> +#include <linux/page_frag_cache.h> #define TEST_FAILED_PREFIX "page_frag_test failed: " -- 2.33.0
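The allocator moved by this patch carves fragments from the end of the cached page towards its start: the offset counts down by `fragsz` and is then rounded down with `align_mask`, where `-align` selects a power-of-two alignment and `~0u` means no alignment. The following is a user-space sketch of only that arithmetic. `frag_sketch`, `SKETCH_CACHE_SIZE`, and the `frag_sketch_*` helpers are hypothetical names, the sketch returns an offset instead of a pointer, and it returns -1 at the point where the kernel code would try to reuse or refill the page.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CACHE_SIZE 4096u /* hypothetical; stands in for one page */

struct frag_sketch {
	unsigned int offset; /* start of the most recent fragment; counts down */
	unsigned int size;
};

static void frag_sketch_reset(struct frag_sketch *fc)
{
	fc->size = SKETCH_CACHE_SIZE;
	fc->offset = fc->size; /* an empty cache starts at the end */
}

/* Returns the offset of the new fragment, or -1 if it does not fit. */
static int frag_sketch_alloc_mask(struct frag_sketch *fc,
				  unsigned int fragsz,
				  unsigned int align_mask)
{
	int offset = (int)fc->offset - (int)fragsz;

	if (offset < 0)
		return -1; /* the kernel would reuse or refill the page here */

	offset &= align_mask; /* round down to the requested alignment */
	fc->offset = offset;

	return offset;
}

static int frag_sketch_alloc_align(struct frag_sketch *fc,
				   unsigned int fragsz, unsigned int align)
{
	/* align must be a non-zero power of two */
	assert(align && !(align & (align - 1)));
	return frag_sketch_alloc_mask(fc, fragsz, -align);
}

static int frag_sketch_alloc(struct frag_sketch *fc, unsigned int fragsz)
{
	return frag_sketch_alloc_mask(fc, fragsz, ~0u);
}
```

Because the offset only moves downward, rounding down with the mask can never make a fragment overlap the one allocated before it; it can only leave a small unused gap between them.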


[PATCH net-next v21 01/14] mm: page_frag: add a test module for page_frag

by Yunsheng Lin

The test allocates fragments from a page_frag_cache instance and pushes them into a ptr_ring instance from a kthread bound to a specified CPU, while a second kthread, also bound to a specified CPU, pops each fragment from the ptr_ring and frees it.

CC: Alexander Duyck <alexander.duyck(a)gmail.com>
Signed-off-by: Yunsheng Lin <linyunsheng(a)huawei.com>
Reviewed-by: Alexander Duyck <alexanderduyck(a)fb.com>
---
 tools/testing/selftests/mm/Makefile | 3 +
 tools/testing/selftests/mm/page_frag/Makefile | 18 ++
 .../selftests/mm/page_frag/page_frag_test.c | 198 ++++++++++++++++++
 tools/testing/selftests/mm/run_vmtests.sh | 8 +
 tools/testing/selftests/mm/test_page_frag.sh | 175 ++++++++++++++++
 5 files changed, 402 insertions(+)
 create mode 100644 tools/testing/selftests/mm/page_frag/Makefile
 create mode 100644 tools/testing/selftests/mm/page_frag/page_frag_test.c
 create mode 100755 tools/testing/selftests/mm/test_page_frag.sh

diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile index 02e1204971b0..acec529baaca 100644 --- a/tools/testing/selftests/mm/Makefile +++ b/tools/testing/selftests/mm/Makefile @@ -36,6 +36,8 @@ MAKEFLAGS += --no-builtin-rules CFLAGS = -Wall -I $(top_srcdir) $(EXTRA_CFLAGS) $(KHDR_INCLUDES) $(TOOLS_INCLUDES) LDLIBS = -lrt -lpthread -lm +TEST_GEN_MODS_DIR := page_frag + TEST_GEN_FILES = cow TEST_GEN_FILES += compaction_test TEST_GEN_FILES += gup_longterm @@ -126,6 +128,7 @@ TEST_FILES += test_hmm.sh TEST_FILES += va_high_addr_switch.sh TEST_FILES += charge_reserved_hugetlb.sh TEST_FILES += hugetlb_reparenting_test.sh +TEST_FILES += test_page_frag.sh # required by charge_reserved_hugetlb.sh TEST_FILES += write_hugetlb_memory.sh diff --git a/tools/testing/selftests/mm/page_frag/Makefile b/tools/testing/selftests/mm/page_frag/Makefile new file mode 100644 index 000000000000..58dda74d50a3 --- /dev/null +++ b/tools/testing/selftests/mm/page_frag/Makefile @@ -0,0 +1,18 @@ +PAGE_FRAG_TEST_DIR := $(realpath 
$(dir $(abspath $(lastword $(MAKEFILE_LIST))))) +KDIR ?= $(abspath $(PAGE_FRAG_TEST_DIR)/../../../../..) + +ifeq ($(V),1) +Q = +else +Q = @ +endif + +MODULES = page_frag_test.ko + +obj-m += page_frag_test.o + +all: + +$(Q)make -C $(KDIR) M=$(PAGE_FRAG_TEST_DIR) modules + +clean: + +$(Q)make -C $(KDIR) M=$(PAGE_FRAG_TEST_DIR) clean diff --git a/tools/testing/selftests/mm/page_frag/page_frag_test.c b/tools/testing/selftests/mm/page_frag/page_frag_test.c new file mode 100644 index 000000000000..912d97b99107 --- /dev/null +++ b/tools/testing/selftests/mm/page_frag/page_frag_test.c @@ -0,0 +1,198 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * Test module for page_frag cache + * + * Copyright (C) 2024 Yunsheng Lin <linyunsheng(a)huawei.com> + */ + +#include <linux/mm.h> +#include <linux/module.h> +#include <linux/cpumask.h> +#include <linux/completion.h> +#include <linux/ptr_ring.h> +#include <linux/kthread.h> + +#define TEST_FAILED_PREFIX "page_frag_test failed: " + +static struct ptr_ring ptr_ring; +static int nr_objs = 512; +static atomic_t nthreads; +static struct completion wait; +static struct page_frag_cache test_nc; +static int test_popped; +static int test_pushed; +static bool force_exit; + +static int nr_test = 2000000; +module_param(nr_test, int, 0); +MODULE_PARM_DESC(nr_test, "number of iterations to test"); + +static bool test_align; +module_param(test_align, bool, 0); +MODULE_PARM_DESC(test_align, "use align API for testing"); + +static int test_alloc_len = 2048; +module_param(test_alloc_len, int, 0); +MODULE_PARM_DESC(test_alloc_len, "alloc len for testing"); + +static int test_push_cpu; +module_param(test_push_cpu, int, 0); +MODULE_PARM_DESC(test_push_cpu, "test cpu for pushing fragment"); + +static int test_pop_cpu; +module_param(test_pop_cpu, int, 0); +MODULE_PARM_DESC(test_pop_cpu, "test cpu for popping fragment"); + +static int page_frag_pop_thread(void *arg) +{ + struct ptr_ring *ring = arg; + + pr_info("page_frag pop test thread begins on cpu 
%d\n", + smp_processor_id()); + + while (test_popped < nr_test) { + void *obj = __ptr_ring_consume(ring); + + if (obj) { + test_popped++; + page_frag_free(obj); + } else { + if (force_exit) + break; + + cond_resched(); + } + } + + if (atomic_dec_and_test(&nthreads)) + complete(&wait); + + pr_info("page_frag pop test thread exits on cpu %d\n", + smp_processor_id()); + + return 0; +} + +static int page_frag_push_thread(void *arg) +{ + struct ptr_ring *ring = arg; + + pr_info("page_frag push test thread begins on cpu %d\n", + smp_processor_id()); + + while (test_pushed < nr_test && !force_exit) { + void *va; + int ret; + + if (test_align) { + va = page_frag_alloc_align(&test_nc, test_alloc_len, + GFP_KERNEL, SMP_CACHE_BYTES); + + if ((unsigned long)va & (SMP_CACHE_BYTES - 1)) { + force_exit = true; + WARN_ONCE(true, TEST_FAILED_PREFIX "unaligned va returned\n"); + } + } else { + va = page_frag_alloc(&test_nc, test_alloc_len, GFP_KERNEL); + } + + if (!va) + continue; + + ret = __ptr_ring_produce(ring, va); + if (ret) { + page_frag_free(va); + cond_resched(); + } else { + test_pushed++; + } + } + + pr_info("page_frag push test thread exits on cpu %d\n", + smp_processor_id()); + + if (atomic_dec_and_test(&nthreads)) + complete(&wait); + + return 0; +} + +static int __init page_frag_test_init(void) +{ + struct task_struct *tsk_push, *tsk_pop; + int last_pushed = 0, last_popped = 0; + ktime_t start; + u64 duration; + int ret; + + test_nc.va = NULL; + atomic_set(&nthreads, 2); + init_completion(&wait); + + if (test_alloc_len > PAGE_SIZE || test_alloc_len <= 0 || + !cpu_active(test_push_cpu) || !cpu_active(test_pop_cpu)) + return -EINVAL; + + ret = ptr_ring_init(&ptr_ring, nr_objs, GFP_KERNEL); + if (ret) + return ret; + + tsk_push = kthread_create_on_cpu(page_frag_push_thread, &ptr_ring, + test_push_cpu, "page_frag_push"); + if (IS_ERR(tsk_push)) + return PTR_ERR(tsk_push); + + tsk_pop = kthread_create_on_cpu(page_frag_pop_thread, &ptr_ring, + test_pop_cpu, 
"page_frag_pop"); + if (IS_ERR(tsk_pop)) { + kthread_stop(tsk_push); + return PTR_ERR(tsk_pop); + } + + start = ktime_get(); + wake_up_process(tsk_push); + wake_up_process(tsk_pop); + + pr_info("waiting for test to complete\n"); + + while (!wait_for_completion_timeout(&wait, msecs_to_jiffies(10000))) { + /* exit if there is no progress for push or pop size */ + if (last_pushed == test_pushed || last_popped == test_popped) { + WARN_ONCE(true, TEST_FAILED_PREFIX "no progress\n"); + force_exit = true; + continue; + } + + last_pushed = test_pushed; + last_popped = test_popped; + pr_info("page_frag_test progress: pushed = %d, popped = %d\n", + test_pushed, test_popped); + } + + if (force_exit) { + pr_err(TEST_FAILED_PREFIX "exit with error\n"); + goto out; + } + + duration = (u64)ktime_us_delta(ktime_get(), start); + pr_info("%d of iterations for %s testing took: %lluus\n", nr_test, + test_align ? "aligned" : "non-aligned", duration); + +out: + ptr_ring_cleanup(&ptr_ring, NULL); + page_frag_cache_drain(&test_nc); + + return -EAGAIN; +} + +static void __exit page_frag_test_exit(void) +{ +} + +module_init(page_frag_test_init); +module_exit(page_frag_test_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Yunsheng Lin <linyunsheng(a)huawei.com>"); +MODULE_DESCRIPTION("Test module for page_frag"); diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh index c5797ad1d37b..2c5394584af4 100755 --- a/tools/testing/selftests/mm/run_vmtests.sh +++ b/tools/testing/selftests/mm/run_vmtests.sh @@ -75,6 +75,8 @@ separated by spaces: read-only VMAs - mdwe test prctl(PR_SET_MDWE, ...) 
+- page_frag + test handling of page fragment allocation and freeing example: ./run_vmtests.sh -t "hmm mmap ksm" EOF @@ -456,6 +458,12 @@ CATEGORY="mkdirty" run_test ./mkdirty CATEGORY="mdwe" run_test ./mdwe_test +CATEGORY="page_frag" run_test ./test_page_frag.sh smoke + +CATEGORY="page_frag" run_test ./test_page_frag.sh aligned + +CATEGORY="page_frag" run_test ./test_page_frag.sh nonaligned + echo "SUMMARY: PASS=${count_pass} SKIP=${count_skip} FAIL=${count_fail}" | tap_prefix echo "1..${count_total}" | tap_output diff --git a/tools/testing/selftests/mm/test_page_frag.sh b/tools/testing/selftests/mm/test_page_frag.sh new file mode 100755 index 000000000000..f55b105084cf --- /dev/null +++ b/tools/testing/selftests/mm/test_page_frag.sh @@ -0,0 +1,175 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Copyright (C) 2024 Yunsheng Lin <linyunsheng(a)huawei.com> +# Copyright (C) 2018 Uladzislau Rezki (Sony) <urezki(a)gmail.com> +# +# This is a test script for the kernel test driver to test the +# correctness and performance of page_frag's implementation. +# Therefore it is just a kernel module loader. You can specify +# and pass different parameters in order to: +# a) analyse performance of page fragment allocations; +# b) stressing and stability check of page_frag subsystem. + +DRIVER="./page_frag/page_frag_test.ko" +CPU_LIST=$(grep -m 2 processor /proc/cpuinfo | cut -d ' ' -f 2) +TEST_CPU_0=$(echo $CPU_LIST | awk '{print $1}') + +if [ $(echo $CPU_LIST | wc -w) -gt 1 ]; then + TEST_CPU_1=$(echo $CPU_LIST | awk '{print $2}') + NR_TEST=100000000 +else + TEST_CPU_1=$TEST_CPU_0 + NR_TEST=1000000 +fi + +# 1 if fails +exitcode=1 + +# Kselftest framework requirement - SKIP code is 4. +ksft_skip=4 + +check_test_failed_prefix() { + if dmesg | grep -q 'page_frag_test failed:';then + echo "page_frag_test failed, please check dmesg" + exit $exitcode + fi +} + +# +# Static templates for testing of page_frag APIs. 
+# Also it is possible to pass any supported parameters manually. +# +SMOKE_PARAM="test_push_cpu=$TEST_CPU_0 test_pop_cpu=$TEST_CPU_1" +NONALIGNED_PARAM="$SMOKE_PARAM test_alloc_len=75 nr_test=$NR_TEST" +ALIGNED_PARAM="$NONALIGNED_PARAM test_align=1" + +check_test_requirements() +{ + uid=$(id -u) + if [ $uid -ne 0 ]; then + echo "$0: Must be run as root" + exit $ksft_skip + fi + + if ! which insmod > /dev/null 2>&1; then + echo "$0: You need insmod installed" + exit $ksft_skip + fi + + if [ ! -f $DRIVER ]; then + echo "$0: You need to compile page_frag_test module" + exit $ksft_skip + fi +} + +run_nonaligned_check() +{ + echo "Run performance tests to evaluate how fast nonaligned alloc API is." + + insmod $DRIVER $NONALIGNED_PARAM > /dev/null 2>&1 +} + +run_aligned_check() +{ + echo "Run performance tests to evaluate how fast aligned alloc API is." + + insmod $DRIVER $ALIGNED_PARAM > /dev/null 2>&1 +} + +run_smoke_check() +{ + echo "Run smoke test." + + insmod $DRIVER $SMOKE_PARAM > /dev/null 2>&1 +} + +usage() +{ + echo -n "Usage: $0 [ aligned ] | [ nonaligned ] | | [ smoke ] | " + echo "manual parameters" + echo + echo "Valid tests and parameters:" + echo + modinfo $DRIVER + echo + echo "Example usage:" + echo + echo "# Shows help message" + echo "$0" + echo + echo "# Smoke testing" + echo "$0 smoke" + echo + echo "# Performance testing for nonaligned alloc API" + echo "$0 nonaligned" + echo + echo "# Performance testing for aligned alloc API" + echo "$0 aligned" + echo + exit 0 +} + +function validate_passed_args() +{ + VALID_ARGS=`modinfo $DRIVER | awk '/parm:/ {print $2}' | sed 's/:.*//'` + + # + # Something has been passed, check it. 
+ # + for passed_arg in $@; do + key=${passed_arg//=*/} + valid=0 + + for valid_arg in $VALID_ARGS; do + if [[ $key = $valid_arg ]]; then + valid=1 + break + fi + done + + if [[ $valid -ne 1 ]]; then + echo "Error: key is not correct: ${key}" + exit $exitcode + fi + done +} + +function run_manual_check() +{ + # + # Validate passed parameters. If there is wrong one, + # the script exists and does not execute further. + # + validate_passed_args $@ + + echo "Run the test with following parameters: $@" + insmod $DRIVER $@ > /dev/null 2>&1 +} + +function run_test() +{ + if [ $# -eq 0 ]; then + usage + else + if [[ "$1" = "smoke" ]]; then + run_smoke_check + elif [[ "$1" = "nonaligned" ]]; then + run_nonaligned_check + elif [[ "$1" = "aligned" ]]; then + run_aligned_check + else + run_manual_check $@ + fi + fi + + check_test_failed_prefix + + echo "Done." + echo "Check the kernel ring buffer to see the summary." +} + +check_test_requirements +run_test $@ + +exit 0 -- 2.33.0
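The test module hands fragments from the push thread to the pop thread through a ptr_ring: a failed produce means the ring is full, so the fragment is freed and the thread yields, and a NULL consume means the ring is empty. The sketch below is a simplified, single-threaded model of that hand-off, assuming a ring in which a NULL slot marks an empty entry. `ring_sketch`, `RING_SLOTS`, and the helpers are hypothetical names, and the memory ordering and locking a real ptr_ring needs are omitted.

```c
#include <stddef.h>

#define RING_SLOTS 4 /* hypothetical; the test module uses nr_objs = 512 */

struct ring_sketch {
	void *slot[RING_SLOTS]; /* NULL marks an empty slot */
	unsigned int producer;  /* next slot to fill */
	unsigned int consumer;  /* next slot to drain */
};

/* Returns 0 on success, -1 if the ring is full or ptr is NULL. */
static int ring_sketch_produce(struct ring_sketch *r, void *ptr)
{
	if (!ptr || r->slot[r->producer])
		return -1;

	r->slot[r->producer] = ptr;
	r->producer = (r->producer + 1) % RING_SLOTS;

	return 0;
}

/* Returns the oldest queued pointer, or NULL if the ring is empty. */
static void *ring_sketch_consume(struct ring_sketch *r)
{
	void *ptr = r->slot[r->consumer];

	if (!ptr)
		return NULL;

	r->slot[r->consumer] = NULL;
	r->consumer = (r->consumer + 1) % RING_SLOTS;

	return ptr;
}
```

In the module's terms, the push loop would free the fragment and call cond_resched() when produce fails, and the pop loop would call page_frag_free() on every non-NULL pointer it consumes.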

