April 2023 - Linux-kselftest-mirror

[PATCH] selftests: allow runners to override the timeout

by Luis Chamberlain

The default timeout for selftests tests is 45 seconds. Although 13 settings
files, out of about 96 selftests, already use a timeout greater than this,
we want to avoid encouraging more tests to force a higher test timeout, as
selftests strives to run all tests quickly. Selftests also treats a timeout
as a non-fatal error. Only test runners which have control over a system
would know whether to treat a timeout as fatal or not.

To help with all this:

  o Enhance documentation to avoid future increases of insane timeouts
  o Add the option to allow overriding the default timeout with test
    runners with a command line option

Suggested-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Luis Chamberlain <mcgrof(a)kernel.org>
---
 Documentation/dev-tools/kselftest.rst       | 22 +++++++++++++++++++++
 tools/testing/selftests/kselftest/runner.sh | 11 ++++++++++-
 tools/testing/selftests/run_kselftest.sh    |  5 +++++
 3 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst
index 12b575b76b20..dd214af7b7ff 100644
--- a/Documentation/dev-tools/kselftest.rst
+++ b/Documentation/dev-tools/kselftest.rst
@@ -168,6 +168,28 @@ the `-t` option for specific single tests. Either can be used multiple times::
 
 For other features see the script usage output, seen with the `-h` option.
 
+Timeout for selftests
+=====================
+
+Selftests are designed to be quick and so a default timeout is used of 45
+seconds for each test. Tests can override the default timeout by adding
+a settings file in their directory and set a timeout variable there to the
+configured a desired upper timeout for the test. Only a few tests override
+the timeout with a value higher than 45 seconds, selftests strives to keep
+it that way. Timeouts in selftests are not considered fatal because the
+system under which a test runs may change and this can also modify the
+expected time it takes to run a test. If you have control over the systems
+which will run the tests you can configure a test runner on those systems to
+use a greater or lower timeout on the command line as with the `-o` or
+the `--override-timeout` argument. For example to use 165 seconds instead
+one would use:
+
+   $ ./run_kselftest.sh --override-timeout 165
+
+You can look at the TAP output to see if you ran into the timeout. Test
+runners which know a test must run under a specific time can then optionally
+treat these timeouts then as fatal.
+
 Packaging selftests
 ===================
 
diff --git a/tools/testing/selftests/kselftest/runner.sh b/tools/testing/selftests/kselftest/runner.sh
index 294619ade49f..1c952d1401d4 100644
--- a/tools/testing/selftests/kselftest/runner.sh
+++ b/tools/testing/selftests/kselftest/runner.sh
@@ -8,7 +8,8 @@ export logfile=/dev/stdout
 export per_test_logging=
 
 # Defaults for "settings" file fields:
-# "timeout" how many seconds to let each test run before failing.
+# "timeout" how many seconds to let each test run before running
+# over our soft timeout limit.
 export kselftest_default_timeout=45
 
 # There isn't a shell-agnostic way to find the path of a sourced file,
@@ -90,6 +91,14 @@ run_one()
 		done < "$settings"
 	fi
 
+	# Command line timeout overrides the settings file
+	if [ -n "$kselftest_override_timeout" ]; then
+		kselftest_timeout="$kselftest_override_timeout"
+		echo "# overriding timeout to $kselftest_timeout" >> "$logfile"
+	else
+		echo "# timeout set to $kselftest_timeout" >> "$logfile"
+	fi
+
 	TEST_HDR_MSG="selftests: $DIR: $BASENAME_TEST"
 	echo "# $TEST_HDR_MSG"
 	if [ ! -e "$TEST" ]; then
diff --git a/tools/testing/selftests/run_kselftest.sh b/tools/testing/selftests/run_kselftest.sh
index 97165a83df63..9a981b36bd7f 100755
--- a/tools/testing/selftests/run_kselftest.sh
+++ b/tools/testing/selftests/run_kselftest.sh
@@ -26,6 +26,7 @@ Usage: $0 [OPTIONS]
   -l | --list			List the available collection:test entries
   -d | --dry-run		Don't actually run any tests
   -h | --help			Show this usage info
+  -o | --override-timeout	Number of seconds after which we timeout
 EOF
 	exit $1
 }
@@ -33,6 +34,7 @@ EOF
 COLLECTIONS=""
 TESTS=""
 dryrun=""
+kselftest_override_timeout=""
 while true; do
 	case "$1" in
 		-s | --summary)
@@ -51,6 +53,9 @@ while true; do
 		-d | --dry-run)
 			dryrun="echo"
 			shift ;;
+		-o | --override-timeout)
+			kselftest_override_timeout="$2"
+			shift 2 ;;
 		-h | --help)
 			usage 0 ;;
 		"")
-- 
2.39.2
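The precedence introduced by the patch above (command-line override, then the per-test settings file, then the 45 second default) can be sketched in isolation as follows. This is a simplified illustration, not the code in runner.sh: the helper names resolve_timeout and count_tap_timeouts are hypothetical, the settings parser only understands a plain "timeout=N" line, and the "# TIMEOUT" marker on "not ok" lines is an assumption about how the runner reports a timed-out test in its TAP output.

```sh
#!/bin/sh
# Simplified sketch of the timeout precedence:
#   command-line override > "timeout=" in the settings file > default.
# resolve_timeout and count_tap_timeouts are hypothetical helper names.

kselftest_default_timeout=45

# Usage: resolve_timeout SETTINGS_FILE [OVERRIDE_SECONDS]
# Prints the effective timeout in seconds.
resolve_timeout()
{
	_settings="$1"
	_override="${2:-}"
	_timeout="$kselftest_default_timeout"

	# A per-test settings file may raise or lower the default.
	if [ -r "$_settings" ]; then
		while IFS='=' read -r _name _value; do
			if [ "$_name" = "timeout" ]; then
				_timeout="$_value"
			fi
		done < "$_settings"
	fi

	# A non-empty override from the command line wins over both.
	if [ -n "$_override" ]; then
		_timeout="$_override"
	fi

	echo "$_timeout"
}

# Reads TAP on stdin and prints how many results are marked as timeouts.
# Assumption: timed-out tests appear as "not ok ... # TIMEOUT ...".
count_tap_timeouts()
{
	grep -c '^not ok .*# TIMEOUT' || true
}
```

A runner that wants to treat timeouts as fatal could save the TAP output of `./run_kselftest.sh --override-timeout 165` to a file and fail its job when `count_tap_timeouts < file` prints anything other than 0.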

2 years, 6 months


[PATCH 0/2] drivers: base: Add tests showing devm handling inconsistencies

by Maxime Ripard

Hi,

This follows the discussion here:
https://lore.kernel.org/linux-kselftest/20230324123157.bbwvfq4gsxnlnfwb@hou…

This shows a couple of inconsistencies with regard to how device-managed
resources are cleaned up. Basically, devm resources will only be cleaned up
if the device is attached to a bus and bound to a driver. Failing any of
these cases, a call to device_unregister will not end up in the devm
resources being released.

We had to work around it in DRM to provide helpers to create a device for
kunit tests, but the current discussion around creating similar, generic,
helpers for kunit resumed interest in fixing this.

This can be tested using the command:
./tools/testing/kunit/kunit.py run --kunitconfig=drivers/base/test/

Let me know what you think,
Maxime

Signed-off-by: Maxime Ripard <maxime(a)cerno.tech>
---
Maxime Ripard (2):
      drivers: base: Add basic devm tests for root devices
      drivers: base: Add basic devm tests for platform devices

 drivers/base/test/.kunitconfig           |   2 +
 drivers/base/test/Kconfig                |   4 +
 drivers/base/test/Makefile               |   3 +
 drivers/base/test/platform-device-test.c | 278 +++++++++++++++++++++++++++++++
 drivers/base/test/root-device-test.c     | 120 +++++++++++++
 5 files changed, 407 insertions(+)
---
base-commit: a6faf7ea9fcb7267d06116d4188947f26e00e57e
change-id: 20230329-kunit-devm-inconsistencies-test-5e5a7d01e60d

Best regards,
-- 
Maxime Ripard <maxime(a)cerno.tech>

2 years, 6 months


[PATCH AUTOSEL 6.2 08/30] selftests/bpf: check that modifier resolves after pointer

by Sasha Levin

From: Lorenz Bauer <lorenz.bauer(a)isovalent.com>

[ Upstream commit dfdd608c3b365f0fd49d7e13911ebcde06b9865b ]

Add a regression test that ensures that a VAR pointing at a
modifier which follows a PTR (or STRUCT or ARRAY) is resolved
correctly by the datasec validator.

Signed-off-by: Lorenz Bauer <lmb(a)isovalent.com>
Link: https://lore.kernel.org/r/20230306112138.155352-3-lmb@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
 tools/testing/selftests/bpf/prog_tests/btf.c | 28 ++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/btf.c b/tools/testing/selftests/bpf/prog_tests/btf.c
index de1b5b9eb93a8..d8d1292e73b53 100644
--- a/tools/testing/selftests/bpf/prog_tests/btf.c
+++ b/tools/testing/selftests/bpf/prog_tests/btf.c
@@ -879,6 +879,34 @@ static struct btf_raw_test raw_tests[] = {
 	.btf_load_err = true,
 	.err_str = "Invalid elem",
 },
+{
+	.descr = "var after datasec, ptr followed by modifier",
+	.raw_types = {
+		/* .bss section */				/* [1] */
+		BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 2),
+			sizeof(void*)+4),
+		BTF_VAR_SECINFO_ENC(4, 0, sizeof(void*)),
+		BTF_VAR_SECINFO_ENC(6, sizeof(void*), 4),
+		/* int */					/* [2] */
+		BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+		/* int* */					/* [3] */
+		BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_PTR, 0, 0), 2),
+		BTF_VAR_ENC(NAME_TBD, 3, 0),			/* [4] */
+		/* const int */					/* [5] */
+		BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_CONST, 0, 0), 2),
+		BTF_VAR_ENC(NAME_TBD, 5, 0),			/* [6] */
+		BTF_END_RAW,
+	},
+	.str_sec = "\0a\0b\0c\0",
+	.str_sec_size = sizeof("\0a\0b\0c\0"),
+	.map_type = BPF_MAP_TYPE_ARRAY,
+	.map_name = ".bss",
+	.key_size = sizeof(int),
+	.value_size = sizeof(void*)+4,
+	.key_type_id = 0,
+	.value_type_id = 1,
+	.max_entries = 1,
+},
 /* Test member exceeds the size of struct.
  *
  * struct A {
-- 
2.39.2

2 years, 7 months


[PATCH v2 0/7] KVM: selftests: Add tests for pmu event filter

by Jinrong Liang

From: Jinrong Liang <cloudliang(a)tencent.com>

Hi,

This patch set adds some tests to ensure consistent PMU performance event
filter behavior. Specifically, the patches aim to improve KVM's PMU event
filter by strengthening the test coverage, adding documentation, and making
other small changes.

The first patch replaces int with uint32_t for nevents to ensure
consistency and readability in the code.

The second patch adds fixed_counter_bitmap to create_pmu_event_filter() so
that the same creator can be used to control the use of guest fixed
counters.

The third patch adds test cases for unsupported input values in the PMU
filter, including unsupported "action" values, unsupported "flags" values,
and unsupported "nevents" values. It also tests that setting non-existent
fixed counters in the fixed bitmap doesn't fail.

The fourth patch updates the documentation for the KVM_SET_PMU_EVENT_FILTER
ioctl to include a detailed description of how fixed performance events are
handled in the pmu filter.

The fifth patch adds tests to cover that pmu_event_filter works as expected
when applied to fixed performance counters, even if no fixed counter
exists.

The sixth patch adds a test to ensure that setting both generic and fixed
performance event filters does not affect the consistency of the fixed
performance filter behavior in KVM.

The seventh patch adds a test to verify the behavior of the pmu event
filter when an incomplete kvm_pmu_event_filter structure is used.

These changes help to ensure that KVM's PMU event filter functions as
expected in all supported use cases. These patches have been tested and
verified to function properly.

Thanks for your review and feedback.

Sincerely,
Jinrong Liang

Previous:
https://lore.kernel.org/kvm/20230414110056.19665-1-cloudliang@tencent.com

v2:
- Wrap the code from the documentation in a block of code; (Bagas Sanjaya)

Jinrong Liang (7):
  KVM: selftests: Replace int with uint32_t for nevents
  KVM: selftests: Apply create_pmu_event_filter() to fixed ctrs
  KVM: selftests: Test unavailable event filters are rejected
  KVM: x86/pmu: Add documentation for fixed ctr on PMU filter
  KVM: selftests: Check if pmu_event_filter meets expectations on fixed
    ctrs
  KVM: selftests: Check gp event filters without affecting fixed event
    filters
  KVM: selftests: Test pmu event filter with incompatible
    kvm_pmu_event_filter

 Documentation/virt/kvm/api.rst                |  21 ++
 .../kvm/x86_64/pmu_event_filter_test.c        | 239 ++++++++++++++++--
 2 files changed, 243 insertions(+), 17 deletions(-)

base-commit: a25497a280bbd7bbcc08c87ddb2b3909affc8402
-- 
2.31.1

2 years, 7 months


[PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory

by Chao Peng

This is the v7 of this series which tries to implement the fd-based KVM
guest private memory. The patches are based on latest kvm/queue branch
commit:

  b9b71f43683a (kvm/queue) KVM: x86/mmu: Buffer nested MMU
  split_desc_cache only by default capacity

Introduction
------------
In general this patch series introduces fd-based memslot which provides
guest memory through memory file descriptor fd[offset,size] instead of
hva/size. The fd can be created from a supported memory filesystem
like tmpfs/hugetlbfs etc. which we refer to as memory backing store. KVM
and the memory backing store exchange callbacks when such memslot
gets created. At runtime KVM will call into callbacks provided by the
backing store to get the pfn with the fd+offset. Memory backing store
will also call into KVM callbacks when userspace punches a hole on the fd
to notify KVM to unmap secondary MMU page table entries.

Compared to the existing hva-based memslot, this new type of memslot
allows guest memory to be unmapped from host userspace like QEMU and even
the kernel itself, therefore reducing the attack surface and preventing
bugs.

Based on this fd-based memslot, we can build guest private memory that
is going to be used in confidential computing environments such as Intel
TDX and AMD SEV. When supported, the memory backing store can provide
more enforcement on the fd and KVM can use a single memslot to hold both
the private and shared part of the guest memory.

mm extension
---------------------
Introduces new MFD_INACCESSIBLE flag for memfd_create(); a file created
with this flag cannot be accessed with read(), write() or mmap() etc. via
normal MMU operations. The file content can only be used with the newly
introduced memfile_notifier extension.

The memfile_notifier extension provides two sets of callbacks for KVM to
interact with the memory backing store:
  - memfile_notifier_ops: callbacks for memory backing store to notify
    KVM when memory gets invalidated.
  - backing store callbacks: callbacks for KVM to call into memory
    backing store to request memory pages for guest private memory.

The memfile_notifier extension also provides APIs for memory backing
store to register/unregister itself and to trigger the notifier when the
bookmarked memory gets invalidated.

The patchset also introduces a new memfd seal F_SEAL_AUTO_ALLOCATE to
prevent double allocation caused by unintentional guest when we only have
a single side of the shared/private memfds effective.

memslot extension
-----------------
Add the private fd and the fd offset to existing 'shared' memslot so
that both private/shared guest memory can live in one single memslot.
A page in the memslot is either private or shared. Whether a guest page
is private or shared is maintained through reusing existing SEV ioctls
KVM_MEMORY_ENCRYPT_{UN,}REG_REGION.

Test
----
To test the new functionality of this patch, the TDX patchset is needed.
Since the TDX patchset has not been merged, I did two kinds of tests:

  - Regression test on kvm/queue (this patchset)
    Most new code is not covered. Code is also in the repo below:
    https://github.com/chao-p/linux/tree/privmem-v7

  - New functional test on latest TDX code
    The patch is rebased to latest TDX code and the new functionality
    was tested. See the repos below:
    Linux: https://github.com/chao-p/linux/tree/privmem-v7-tdx
    QEMU: https://github.com/chao-p/qemu/tree/privmem-v7

An example QEMU command line for TDX test:
-object tdx-guest,id=tdx,debug=off,sept-ve-disable=off \
-machine confidential-guest-support=tdx \
-object memory-backend-memfd-private,id=ram1,size=${mem} \
-machine memory-backend=ram1

Changelog
----------
v7:
  - Move the private/shared info from backing store to KVM.
  - Introduce F_SEAL_AUTO_ALLOCATE to avoid double allocation.
  - Rework on the sync mechanism between zap/page fault paths.
  - Addressed other comments in v6.
v6:
  - Re-organized patch for both mm/KVM parts.
  - Added flags for memfile_notifier so its consumers can state their
    features and memory backing store can check against these flags.
  - Put a backing store reference in the memfile_notifier and move pfn_ops
    into backing store.
  - Only support boot time backing store register.
  - Overall KVM part improvement suggested by Sean and some others.
v5:
  - Removed userspace visible F_SEAL_INACCESSIBLE, instead using an
    in-kernel flag (SHM_F_INACCESSIBLE for shmem). Private fd can only
    be created by MFD_INACCESSIBLE.
  - Introduced new APIs for backing store to register itself to
    memfile_notifier instead of direct function call.
  - Added the accounting and restriction for MFD_INACCESSIBLE memory.
  - Added KVM API doc for new memslot extensions and man page for the new
    MFD_INACCESSIBLE flag.
  - Removed the overlap check for mapping the same file+offset into
    multiple gfns due to perf consideration, warned in document.
  - Addressed other comments in v4.
v4:
  - Decoupled the callbacks between KVM/mm from memfd and use new
    name 'memfile_notifier'.
  - Supported registering multiple memslots to the same backing store.
  - Added per-memslot pfn_ops instead of per-system.
  - Reworked the invalidation part.
  - Improved new KVM uAPIs (private memslot extension and memory
    error) per Sean's suggestions.
  - Addressed many other minor fixes for comments from v3.
v3:
  - Added locking protection when calling
    invalidate_page_range/fallocate callbacks.
  - Changed memslot structure to keep using useraddr for shared memory.
  - Re-organized F_SEAL_INACCESSIBLE and MEMFD_OPS.
  - Added MFD_INACCESSIBLE flag to force F_SEAL_INACCESSIBLE.
  - Commit message improvement.
  - Many small fixes for comments from the last version.

Links to previous discussions
-----------------------------
[1] Original design proposal:
https://lkml.kernel.org/kvm/20210824005248.200037-1-seanjc@google.com/
[2] Updated proposal and RFC patch v1:
https://lkml.kernel.org/linux-fsdevel/20211111141352.26311-1-chao.p.peng@li…
[3] Patch v5: https://lkml.org/lkml/2022/5/19/861

Chao Peng (12):
  mm: Add F_SEAL_AUTO_ALLOCATE seal to memfd
  selftests/memfd: Add tests for F_SEAL_AUTO_ALLOCATE
  mm: Introduce memfile_notifier
  mm/memfd: Introduce MFD_INACCESSIBLE flag
  KVM: Rename KVM_PRIVATE_MEM_SLOTS to KVM_INTERNAL_MEM_SLOTS
  KVM: Use gfn instead of hva for mmu_notifier_retry
  KVM: Rename mmu_notifier_*
  KVM: Extend the memslot to support fd-based private memory
  KVM: Add KVM_EXIT_MEMORY_FAULT exit
  KVM: Register/unregister the guest private memory regions
  KVM: Handle page fault for private memory
  KVM: Enable and expose KVM_MEM_PRIVATE

Kirill A. Shutemov (1):
  mm/shmem: Support memfile_notifier

 Documentation/virt/kvm/api.rst             |  77 +++++-
 arch/arm64/kvm/mmu.c                       |   8 +-
 arch/mips/include/asm/kvm_host.h           |   2 +-
 arch/mips/kvm/mmu.c                        |  10 +-
 arch/powerpc/include/asm/kvm_book3s_64.h   |   2 +-
 arch/powerpc/kvm/book3s_64_mmu_host.c      |   4 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c        |   4 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c     |   6 +-
 arch/powerpc/kvm/book3s_hv_nested.c        |   2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c        |   8 +-
 arch/powerpc/kvm/e500_mmu_host.c           |   4 +-
 arch/riscv/kvm/mmu.c                       |   4 +-
 arch/x86/include/asm/kvm_host.h            |   3 +-
 arch/x86/kvm/Kconfig                       |   3 +
 arch/x86/kvm/mmu.h                         |   2 -
 arch/x86/kvm/mmu/mmu.c                     |  74 +++++-
 arch/x86/kvm/mmu/mmu_internal.h            |  18 ++
 arch/x86/kvm/mmu/mmutrace.h                |   1 +
 arch/x86/kvm/mmu/paging_tmpl.h             |   4 +-
 arch/x86/kvm/x86.c                         |   2 +-
 include/linux/kvm_host.h                   | 105 +++++---
 include/linux/memfile_notifier.h           |  91 +++++++
 include/linux/shmem_fs.h                   |   2 +
 include/uapi/linux/fcntl.h                 |   1 +
 include/uapi/linux/kvm.h                   |  37 +++
 include/uapi/linux/memfd.h                 |   1 +
 mm/Kconfig                                 |   4 +
 mm/Makefile                                |   1 +
 mm/memfd.c                                 |  18 +-
 mm/memfile_notifier.c                      | 123 ++++++++++
 mm/shmem.c                                 | 125 +++++++++-
 tools/testing/selftests/memfd/memfd_test.c | 166 +++++++++++++
 virt/kvm/Kconfig                           |   3 +
 virt/kvm/kvm_main.c                        | 272 ++++++++++++++++++---
 virt/kvm/pfncache.c                        |  14 +-
 35 files changed, 1074 insertions(+), 127 deletions(-)
 create mode 100644 include/linux/memfile_notifier.h
 create mode 100644 mm/memfile_notifier.c

-- 
2.25.1

2 years, 7 months


[PATCH RESEND v15 0/5] Implement IOCTL to get and optionally clear info about PTEs

by Muhammad Usama Anjum

*Changes in v15*
- Build fix (Add missed build fix in RESEND)

*Changes in v14*
- Fix build error caused by #ifdef added at last minute in some configs

*Changes in v13*
- Rebase on top of next-20230414
- Give-up on using uffd_wp_range() and write new helpers, flush tlb only
  once

*Changes in v12*
- Update and other memory types to UFFD_FEATURE_WP_ASYNC
- Rebase on top of next-20230406
- Review updates

*Changes in v11*
- Rebase on top of next-20230307
- Base patches on UFFD_FEATURE_WP_UNPOPULATED
- Do a lot of cosmetic changes and review updates
- Remove ENGAGE_WP + !GET operation as it can be performed with
  UFFDIO_WRITEPROTECT

*Changes in v10*
- Add specific condition to return error if hugetlb is used with wp
  async
- Move changes in tools/include/uapi/linux/fs.h to separate patch
- Add documentation

*Changes in v9:*
- Correct fault resolution for userfaultfd wp async
- Fix build warnings and errors which were happening on some configs
- Simplify pagemap ioctl's code

*Changes in v8:*
- Update uffd async wp implementation
- Improve PAGEMAP_IOCTL implementation

*Changes in v7:*
- Add uffd wp async
- Update the IOCTL to use uffd under the hood instead of soft-dirty
  flags

*Motivation*
The real motivation for adding the PAGEMAP_SCAN IOCTL is to emulate the
Windows GetWriteWatch() syscall [1]. GetWriteWatch() retrieves the
addresses of the pages that are written to in a region of virtual memory.

This syscall is used in Windows applications and games etc. It is
currently being emulated in a pretty slow manner in userspace. Our purpose
is to enhance the kernel so that it can be emulated efficiently. Currently
some out of tree hack patches are being used to efficiently emulate it in
some kernels. We intend to replace those with these patches. So the whole
of gaming on Linux can effectively benefit from this. It means there would
be tons of users of this code.

The CRIU use case [2] was mentioned by Andrei and Danylo:
> Use cases for migrating sparse VMAs are binaries sanitized with ASAN,
> MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of
> shadow memory [4]. Being able to migrate such binaries allows to highly
> reduce the amount of work needed to identify and fix post-migration
> crashes, which happen constantly.

Andrei defines the following uses of this code:
* it is more granular and allows us to track changed pages more
  effectively. The current interface can clear dirty bits for the entire
  process only. In addition, reading info about pages is a separate
  operation. It means we must freeze the process to read information
  about all its pages, reset dirty bits, only then we can start dumping
  pages. The information about pages becomes more and more outdated,
  while we are processing pages. The new interface solves both these
  downsides. First, it allows us to read pte bits and clear the
  soft-dirty bit atomically. It means that CRIU will not need to freeze
  processes to pre-dump their memory. Second, it clears soft-dirty bits
  for a specified region of memory. It means CRIU will have actual info
  about pages to the moment of dumping them.
* The new interface has to be much faster because basic page filtering
  is happening in the kernel. With the old interface, we have to read
  pagemap for each page.

*Implementation Evolution (Short Summary)*
From the definition of GetWriteWatch(), we feel like the kernel's
soft-dirty feature can be used under the hood with some additions like:
* reset soft-dirty flag for only a specific region of memory instead of
  clearing the flag for the entire process
* get and clear soft-dirty flag for a specific region atomically

So we decided to use an ioctl on the pagemap file to read and/or reset the
soft-dirty flag. But using the soft-dirty flag, sometimes we get extra
pages which weren't even written. They had become soft-dirty because of
VMA merging and the VM_SOFTDIRTY flag. This breaks the definition of
GetWriteWatch(). We were able to bypass this shortcoming by ignoring
VM_SOFTDIRTY until David reported that mprotect etc messes up the
soft-dirty flag while ignoring VM_SOFTDIRTY [5]. This wasn't happening
until [6] got introduced. We discussed whether we could revert these
patches, but could not reach any conclusion. So at this point, I made a
couple of attempts to solve this whole VM_SOFTDIRTY issue by correcting
the soft-dirty implementation:
* [7] Correct the bug fixed wrongly back in 2014. It had potential to
  cause regression. We left it behind.
* [8] Keep a list of soft-dirty part of a VMA across splits and merges.
  The reply was not to increase the size of the VMA by 8 bytes.

At this point, we left soft-dirty, considering it too delicate, and
userfaultfd [9] seemed like the only way forward. From there onward, we
have been basing soft-dirty emulation on the userfaultfd wp feature where
the kernel resolves the faults itself when the WP_ASYNC feature is used.
It was straightforward to add the WP_ASYNC feature to userfaultfd. Now we
get only those pages reported as dirty or written-to which were really
written. (PS: another userfaultfd feature, WP_UNPOPULATED, is required to
avoid pre-faulting memory before write-protecting [9].)

All the different masks were added on the request of CRIU devs to make the
interface more generic and better.

[1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-…
[2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com
[3] https://github.com/google/sanitizers
[4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit
[5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com
[6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/
[7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com
[10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com

*Original cover letter from v8*
Hello,

Note: Soft-dirty pages and pages which have been written-to are synonyms.
As the kernel already has a soft-dirty feature, which we have given up on
using, we use the written-to terminology while using UFFD async WP under
the hood.

This IOCTL, PAGEMAP_SCAN on the pagemap file, can be used to get and/or
clear the info about page table entries. The following operations are
supported in this ioctl:
- Get the information if the pages have been written-to
  (PAGE_IS_WRITTEN), file mapped (PAGE_IS_FILE), present
  (PAGE_IS_PRESENT) or swapped (PAGE_IS_SWAPPED).
- Write-protect the pages (PAGEMAP_WP_ENGAGE) to start finding which
  pages have been written-to.
- Find pages which have been written-to and write protect the pages
  (atomic PAGE_IS_WRITTEN + PAGEMAP_WP_ENGAGE)

It is possible to find and clear soft-dirty pages entirely in userspace.
But it isn't efficient:
- The mprotect and SIGSEGV handler for bookkeeping
- The userfaultfd wp (synchronous) with the handler for bookkeeping

Some benchmarks can be seen here[1]. This series adds features that
weren't present earlier:
- There is no atomic get soft-dirty/Written-to status and clear present
  in the kernel.
- The pages which have been written-to can not be found in an accurate
  way. (Kernel's soft-dirty PTE bit + soft-dirty VMA bit shows more
  soft-dirty pages than there actually are.)

Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is enough for finding the soft-dirty bit
status and clearing the soft-dirty bit of all the pages of a process.
We have the use case where we need to track the soft-dirty PTE bit for
only specific pages on-demand. We need this tracking and clear mechanism
of a region of memory while the process is running to emulate the
GetWriteWatch() syscall of Windows.

*(Moved to using UFFD instead of soft-dirty feature to find pages which
have been written-to from v7 patch series)*:
Stop using the soft-dirty flags for finding which pages have been
written to. It is too delicate and wrong as it shows more soft-dirty
pages than the actual soft-dirty pages. There is no interest in
correcting it [2][3] as this is how the feature was written years ago.
Its behaviour shouldn't be changed now. Peter Xu has suggested using the
async version of the UFFD WP [4] as it is based inherently on the PTEs.

So in this patch series, I've added a new mode to the UFFD which is an
asynchronous version of the write protect. When this variant of the UFFD
WP is used, the page faults are resolved automatically by the kernel. The
pages which have been written-to can be found by reading the pagemap file
(!PM_UFFD_WP). This feature can be used successfully to find which pages
have been written to from the time the pages were write protected. This
works just like the soft-dirty flag without showing any extra pages which
aren't soft-dirty in reality.

The information related to pages if the page is file mapped, present and
swapped is required for the CRIU project [5][6]. The addition of the
required mask, any mask, excluded mask and return masks are also required
for the CRIU project [5].

The IOCTL returns the addresses of the pages which match the specific
masks. The page addresses are returned in struct page_region in a compact
form. The max_pages is needed to support a use case where the user only
wants to get a specific number of pages. So there is no need to find all
the pages of interest in the range when max_pages is specified. The IOCTL
returns when the maximum number of the pages are found. The max_pages is
optional. If max_pages is specified, it must be equal to or greater than
the vec_size. This restriction is needed to handle the worst case when one
page_region only contains info of one page and it cannot be compacted.
This is needed to emulate the Windows GetWriteWatch() syscall.

The patch series includes a detailed selftest which can be used as an
example for the uffd async wp test and PAGEMAP_IOCTL. It shows the
interface usages as well.

[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n
[5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/

Regards,
Muhammad Usama Anjum

Muhammad Usama Anjum (4):
  fs/proc/task_mmu: Implement IOCTL to get and optionally clear info
    about PTEs
  tools headers UAPI: Update linux/fs.h with the kernel sources
  mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL
  selftests: mm: add pagemap ioctl tests

Peter Xu (1):
  userfaultfd: UFFD_FEATURE_WP_ASYNC

 Documentation/admin-guide/mm/pagemap.rst     |   56 +
 Documentation/admin-guide/mm/userfaultfd.rst |   35 +
 fs/proc/task_mmu.c                           |  481 +++++++
 fs/userfaultfd.c                             |   26 +-
 include/linux/userfaultfd_k.h                |   21 +-
 include/uapi/linux/fs.h                      |   53 +
 include/uapi/linux/userfaultfd.h             |    9 +-
 mm/hugetlb.c                                 |   32 +-
 mm/memory.c                                  |   27 +-
 tools/include/uapi/linux/fs.h                |   53 +
 tools/testing/selftests/mm/.gitignore        |    1 +
 tools/testing/selftests/mm/Makefile          |    3 +-
 tools/testing/selftests/mm/config            |    1 +
 tools/testing/selftests/mm/pagemap_ioctl.c   | 1326 ++++++++++++++++++
 tools/testing/selftests/mm/run_vmtests.sh    |    4 +
 15 files changed, 2105 insertions(+), 23 deletions(-)
 create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c
 mode change 100644 => 100755 tools/testing/selftests/mm/run_vmtests.sh

-- 
2.39.2

2 years, 7 months


[RFC PATCH v2 00/11] Intel IA32_SPEC_CTRL Virtualization

by Chao Gao

Changes since RFC v1:
 * add two kselftests (patch 10-11)
 * set virtual MSRs also on APs [Pawan]
 * enable "virtualize IA32_SPEC_CTRL" for L2 to prevent L2 from changing
   some bits of IA32_SPEC_CTRL (patch 4)
 * other misc cleanup and cosmetic changes

RFC v1: https://lore.kernel.org/lkml/20221210160046.2608762-1-chen.zhang@intel.com/

This series introduces "virtualize IA32_SPEC_CTRL" support. Here are the
introduction and use cases of this new feature.

### Virtualize IA32_SPEC_CTRL

"Virtualize IA32_SPEC_CTRL" [1] is a new VMX feature on Intel CPUs. This
feature allows the VMM to lock some bits of the IA32_SPEC_CTRL MSR even
when the MSR is passed through to a guest.

### Use cases of "virtualize IA32_SPEC_CTRL" [2]

Software mitigations like Retpoline and the software BHB-clearing sequence
depend on CPU microarchitectures, and a guest cannot know exactly the
underlying microarchitecture. When a guest is migrated between processors
of different microarchitectures, software mitigations which work perfectly
on the previous microarchitecture may not be effective on the new one. To
fix the problem, some hardware mitigations should be used in conjunction
with software mitigations. Using virtual IA32_SPEC_CTRL, the VMM can
enforce hardware mitigations transparently to guests and avoid those
hardware mitigations being unintentionally disabled when a guest changes
the IA32_SPEC_CTRL MSR.

### Intention of this series

This series adds to KVM the capability of enforcing hardware mitigations
for guests transparently and efficiently (i.e., without intercepting
IA32_SPEC_CTRL MSR accesses). The capability can be used to solve the VM
migration issue in a pool consisting of processors of different
microarchitectures.

Specifically, below are two target scenarios of this series:

Scenario 1: If retpoline is used by a VM to mitigate IMBTI in CPL0, the
            VMM can set RRSBA_DIS_S on parts that enumerate RRSBA. Note
            that the VM is presented with a microarchitecture that
            doesn't enumerate RRSBA.

Scenario 2: If a VM uses the software BHB-clearing sequence on transitions
            into CPL0 to mitigate BHI, the VMM can use "virtualize
            IA32_SPEC_CTRL" to set BHI_DIS_S on new parts which don't
            enumerate BHI_NO.

Intel defines some virtual MSRs [2] for guests to report in-use software
mitigations. This allows guests to opt in to the VMM deploying hardware
mitigations for them if the guests are either running on, or later
migrated to, a system on which the in-use software mitigations are not
effective. The virtual MSRs interface is also added in this series.

### Organization of this series

1. Patch 1-3   Advertise RRSBA_CTRL and BHI_CTRL to guest
2. Patch 4     Add "virtualize IA32_SPEC_CTRL" support
3. Patch 5-9   Allow guests to report in-use software mitigations to KVM
               so that KVM can enable hardware mitigations for guests.
4. Patch 10-11 Add kselftest for virtual MSRs and IA32_SPEC_CTRL

[1]: https://cdrdv2.intel.com/v1/dl/getContent/671368 Ref. #319433-047 Chapter 12
[2]: https://www.intel.com/content/www/us/en/developer/articles/technical/softwa…

Chao Gao (3):
  KVM: VMX: Advertise MITI_ENUM_RETPOLINE_S_SUPPORT
  KVM: selftests: Add tests for virtual enumeration/mitigation MSRs
  KVM: selftests: Add tests for IA32_SPEC_CTRL MSR

Pawan Gupta (1):
  x86/bugs: Use Virtual MSRs to request hardware mitigations

Zhang Chen (7):
  x86/msr-index: Add bit definitions for BHI_DIS_S and BHI_NO
  KVM: x86: Advertise CPUID.7.2.EDX and RRSBA_CTRL support
  KVM: x86: Advertise BHI_CTRL support
  KVM: VMX: Add IA32_SPEC_CTRL virtualization support
  KVM: x86: Advertise ARCH_CAP_VIRTUAL_ENUM support
  KVM: VMX: Advertise MITIGATION_CTRL support
  KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT

 arch/x86/include/asm/msr-index.h              |  33 +++-
 arch/x86/include/asm/vmx.h                    |   5 +
 arch/x86/include/asm/vmxfeatures.h            |   2 +
 arch/x86/kernel/cpu/bugs.c                    |  25 +++
 arch/x86/kvm/cpuid.c                          |  22 ++-
 arch/x86/kvm/reverse_cpuid.h                  |   8 +
 arch/x86/kvm/svm/svm.c                        |   3 +
 arch/x86/kvm/vmx/capabilities.h               |   5 +
 arch/x86/kvm/vmx/nested.c                     |  13 ++
 arch/x86/kvm/vmx/vmcs.h                       |   2 +
 arch/x86/kvm/vmx/vmx.c                        | 112 ++++++++++-
 arch/x86/kvm/vmx/vmx.h                        |  43 ++++-
 arch/x86/kvm/x86.c                            |  19 +-
 tools/arch/x86/include/asm/msr-index.h        |  37 +++-
 tools/testing/selftests/kvm/Makefile          |   2 +
 .../selftests/kvm/include/x86_64/processor.h  |   5 +
 .../selftests/kvm/x86_64/spec_ctrl_msr_test.c | 178 ++++++++++++++++++
 .../kvm/x86_64/virtual_mitigation_msr_test.c  | 175 +++++++++++++++++
 18 files changed, 676 insertions(+), 13 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/spec_ctrl_msr_test.c
 create mode 100644 tools/testing/selftests/kvm/x86_64/virtual_mitigation_msr_test.c

base-commit: 400d2132288edbd6d500f45eab5d85526ca94e46
--
2.40.0
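
The bit-locking idea and the two scenarios in the cover letter above can be modelled in a few lines. The following is a rough sketch, not code from the series: the function names are hypothetical, the bit positions are assumed from the IA32_SPEC_CTRL layout in Linux's msr-index.h, and it models only the effective MSR value after a guest write. How the value is presented back to the guest on reads is not modelled.

```python
# Illustrative model only; this is neither KVM code nor the hardware algorithm.
# Bit positions are an assumption taken from the IA32_SPEC_CTRL layout.
RRSBA_DIS_S = 1 << 6
BHI_DIS_S = 1 << 10


def guest_write(current: int, value: int, locked_mask: int) -> int:
    """Effective MSR value after a guest write with some bits locked.

    Bits set in locked_mask keep their current (VMM-chosen) value;
    every other bit takes the value the guest wrote.
    """
    return (current & locked_mask) | (value & ~locked_mask)


def bits_to_enforce(guest_uses_retpoline: bool,
                    guest_uses_bhb_clear_seq: bool,
                    host_enumerates_rrsba: bool,
                    host_enumerates_bhi_no: bool) -> int:
    """Choose hardware mitigations following Scenario 1 and Scenario 2."""
    mask = 0
    # Scenario 1: retpoline in the guest, host part enumerates RRSBA.
    if guest_uses_retpoline and host_enumerates_rrsba:
        mask |= RRSBA_DIS_S
    # Scenario 2: BHB-clearing sequence in the guest, host lacks BHI_NO.
    if guest_uses_bhb_clear_seq and not host_enumerates_bhi_no:
        mask |= BHI_DIS_S
    return mask


if __name__ == "__main__":
    mask = bits_to_enforce(True, True,
                           host_enumerates_rrsba=True,
                           host_enumerates_bhi_no=False)
    msr = mask                          # VMM sets the enforced bits
    msr = guest_write(msr, 0x1, mask)   # guest writes bit 0, clears the rest
    print(hex(msr))                     # enforced bits survive the write
```

In this model the guest's write to the unlocked bit goes through unchanged, while its attempt to clear the enforced bits has no effect, which is the property the series relies on to avoid intercepting MSR accesses.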


[PATCH v3 0/2] selftests/ftrace: Add tests for kprobes and optimized probes

by Akanksha J N

This patchset adds a stress test for kprobes and a test for checking
optimized probes.

The two tests are being added based on the below discussion:
https://lore.kernel.org/all/20230128101622.ce6f8e64d929e29d36b08b73@kernel.…

kprobe_opt_types.tc is modified as per the below review comments:
https://lore.kernel.org/all/1682506809.uus6y0ir3i.naveen@linux.ibm.com/#t

Changelog:
v3:
 * Add Acked-by for kprobe_insn_boundary.tc
 * Simplify test for optimized probe, as suggested by Masami
 * Add exit_unresolved to exit as unresolved in case no probe was
   optimized

v2:
 * Add an explicit fork after enabling the events ( echo "forked" )
 * Remove the extended test from multiple_kprobe_types.tc which adds
   multiple consecutive probes in a function and add it as a separate
   test case.
 * Add new test case which checks for optimized probes.

Akanksha J N (2):
  selftests/ftrace: Add new test case which adds multiple consecutive
    probes in a function
  selftests/ftrace: Add new test case which checks for optimized probes

 .../test.d/kprobe/kprobe_insn_boundary.tc     | 19 +++++++++++
 .../ftrace/test.d/kprobe/kprobe_opt_types.tc  | 34 +++++++++++++++++++
 2 files changed, 53 insertions(+)
 create mode 100644 tools/testing/selftests/ftrace/test.d/kprobe/kprobe_insn_boundary.tc
 create mode 100644 tools/testing/selftests/ftrace/test.d/kprobe/kprobe_opt_types.tc

--
2.31.1
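
The test scripts themselves are not quoted in this cover letter, so the following is only a guess at the shape of the "optimized probe" check, written as a small sketch. It assumes the check looks for the `[OPTIMIZED]` flag that the kernel prints in the debugfs `kprobes/list` file, and it maps "no probe was optimized" to an unresolved result as the v3 changelog describes. The function names and the sample text are made up for illustration.

```python
# Illustrative sketch; not the contents of kprobe_opt_types.tc.
# Assumption: probe status is read from text in the kprobes/list format,
# where an optimized probe's line carries the "[OPTIMIZED]" flag.


def count_optimized(kprobes_list: str) -> int:
    """Count registered probes whose line carries the [OPTIMIZED] flag."""
    return sum(1 for line in kprobes_list.splitlines()
               if "[OPTIMIZED]" in line)


def verdict(kprobes_list: str) -> str:
    """Return 'pass' if any probe was optimized, otherwise 'unresolved'."""
    return "pass" if count_optimized(kprobes_list) > 0 else "unresolved"


# Hypothetical sample; addresses and symbols are invented.
SAMPLE = """\
ffffffff81234560  k  vfs_read+0x0    [OPTIMIZED]
ffffffff81234565  k  vfs_read+0x5
"""

if __name__ == "__main__":
    print(count_optimized(SAMPLE), verdict(SAMPLE))
```

Reporting "unresolved" instead of "fail" fits the case where the kernel simply chose not to optimize any of the probes on that machine, which says nothing about correctness.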


[PATCH v2 0/2] KVM: s390: CMMA migration selftest and small bugfix

by Nico Boehr

v2:
---
 * swap order of patches (thanks Claudio)
 * add r-b
 * add comment why memslots are zeroed

Add a new selftest for CMMA migration. Also fix a small issue found
during development of the test.

Nico Boehr (2):
  KVM: s390: fix KVM_S390_GET_CMMA_BITS for GFNs in memslot holes
  KVM: s390: selftests: add selftest for CMMA migration

 arch/s390/kvm/kvm-s390.c                      |   4 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 tools/testing/selftests/kvm/s390x/cmma_test.c | 680 ++++++++++++++++++
 3 files changed, 685 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/s390x/cmma_test.c

--
2.39.1


[PATCH v1] selftests/bpf: Do not use sign-file as testcase

by Alexey Gladkov

The sign-file utility (from scripts/) is used in
prog_tests/verify_pkcs7_sig.c, but the utility should not be called as a
test. Executing this utility produces the following error:

  selftests: /linux/tools/testing/selftests/bpf: urandom_read
  ok 16 selftests: /linux/tools/testing/selftests/bpf: urandom_read
  selftests: /linux/tools/testing/selftests/bpf: sign-file
  not ok 17 selftests: /linux/tools/testing/selftests/bpf: sign-file # exit=2

Fixes: fc97590668ae ("selftests/bpf: Add test for bpf_verify_pkcs7_signature() kfunc")
Signed-off-by: Alexey Gladkov <legion(a)kernel.org>
---
 tools/testing/selftests/bpf/Makefile | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index b677dcd0b77a..fd214d1526d4 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -88,8 +88,7 @@ TEST_GEN_PROGS_EXTENDED = test_sock_addr test_skb_cgroup_id_user \
 	xskxceiver xdp_redirect_multi xdp_synproxy veristat xdp_hw_metadata \
 	xdp_features
 
-TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read $(OUTPUT)/sign-file
-TEST_GEN_FILES += liburandom_read.so
+TEST_GEN_FILES += liburandom_read.so urandom_read sign-file
 
 # Emit succinct information message describing current building step
 # $1 - generic step name (e.g., CC, LINK, etc);
--
2.33.7

