August 2023 - Linux-kselftest-mirror

by Willy Tarreau

Hi Shuah, hi Paul,

I'm sending you the list of planned nolibc changes for 6.6. A doc update may follow a bit later to document the contribution process. We also noticed a slight increase in binary sizes that might be fixed soon, but I wouldn't bet on it since it will require a lot of testing again, so I'd rather postpone it by default. In any case I have no intent to push any significant updates/fixes for 6.6 at this point.

I'm also pasting a summary of the changes in this pull request; feel free to use it for the merge commit message if you need.

If you have any questions or if anything is not clear, do not hesitate to ask!

Thanks,
Willy

----- changes ------

Nolibc:
- improved portability by replacing #error build failures with -ENOSYS
- added syscall6() on MIPS to support pselect6() and mmap()
- added setvbuf(), rmdir(), pipe(), pipe2()
- added support for ppc/ppc64
- environ is no longer optional
- fixed frame pointer issues at -O0
- dropped sys_stat() in favor of sys_statx()
- centralized _start_c() to remove lots of asm code
- switched size_t to __SIZE_TYPE__

Selftests:
- improved status reporting (success/warning/failure counts, path to log file)
- various code cleanups (indent, unused variables, ...)
- more consistent test numbering
- enabled compiler warnings
- dropped unreliable chmod_net test
- improved reliability (create /dev/zero & /tmp, rely less on /proc)
- new tests (brk/sbrk/mmap/munmap)
- improved compatibility with musl
- new run-nolibc-test target to build and run natively
- new run-libc-test target to build and run against native libc
- made the cmdline parser more reliable against boolean arguments
- dropped dependency on memfd for vfprintf() test
- nolibc-test is no longer stripped
- added support for extending ARCH via XARCH

Other:
- add Thomas as co-maintainer

-----------

The following changes since commit 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5:

  Linux 6.5-rc1 (2023-07-09 13:53:13 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/wtarreau/nolibc.git/ 20230806-for-6.6-1

for you to fetch changes up to d98c1e27e46e47a3ae67e1d048f153598ba82611:

  tools/nolibc: stackprotector.h: make __stack_chk_init static (2023-08-06 18:44:47 +0200)

----------------------------------------------------------------
Ryan Roberts (1):
      tools/nolibc/stdio: add setvbuf() to set buffering mode

Thomas Weißschuh (22):
      selftests/nolibc: drop test chmod_net
      selftests/nolibc: simplify call to ioperm
      tools/nolibc: completely remove optional environ support
      selftests/nolibc: make evaluation of test conditions
      selftests/nolibc: simplify status printing
      selftests/nolibc: avoid gaps in test numbers
      selftests/nolibc: avoid buffer underrun in space printing
      tools/nolibc: drop unused variables
      tools/nolibc: fix return type of getpagesize()
      tools/nolibc: setvbuf: avoid unused parameter warnings
      tools/nolibc: sys: avoid implicit sign cast
      tools/nolibc: stdint: use __SIZE_TYPE__ for size_t
      selftests/nolibc: drop unused variables
      selftests/nolibc: mark test helpers as potentially unused
      selftests/nolibc: make functions static if possible
      selftests/nolibc: avoid unused parameter warnings
      selftests/nolibc: avoid sign-compare warnings
      selftests/nolibc: use correct return type for read() and write()
      selftests/nolibc: prevent out of bounds access in expect_vfprintf
      selftests/nolibc: don't strip nolibc-test
      selftests/nolibc: enable compiler warnings
      MAINTAINERS: nolibc: add myself as co-maintainer

Willy Tarreau (1):
      selftests/nolibc: avoid warnings during intptr tests

Yuan Tan (2):
      tools/nolibc: add pipe() and pipe2() support
      selftests/nolibc: add testcase for pipe

Zhangjin Wu (74):
      selftests/nolibc: add a standalone test report macro
      selftests/nolibc: always print the path to test log file
      selftests/nolibc: restore the failed tests print
      tools/nolibc: fix up #error compile failures with -ENOSYS
      tools/nolibc: fix up undeclared syscall macros with #ifdef and -ENOSYS
      tools/nolibc: sys.h: add a syscall return helper
      tools/nolibc: unistd.h: apply __sysret() helper
      tools/nolibc: sys.h: apply __sysret() helper
      tools/nolibc: unistd.h: reorder the syscall macros
      tools/nolibc: arch-*.h: fix up code indent errors
      toolc/nolibc: arch-*.h: clean up whitespaces after __asm__
      tools/nolibc: arch-loongarch.h: shrink with _NOLIBC_SYSCALL_CLOBBERLIST
      tools/nolibc: arch-mips.h: shrink with _NOLIBC_SYSCALL_CLOBBERLIST
      tools/nolibc: add missing my_syscall6() for mips
      tools/nolibc: __sysret: support syscalls who return a pointer
      tools/nolibc: clean up mmap() routine
      tools/nolibc: clean up sbrk() routine
      selftests/nolibc: export argv0 for some tests
      selftests/nolibc: prepare: create /dev/zero
      selftests/nolibc: add EXPECT_PTREQ, EXPECT_PTRNE and EXPECT_PTRER
      selftests/nolibc: add sbrk_0 to test current brk getting
      selftests/nolibc: add mmap_bad test case
      selftests/nolibc: add munmap_bad test case
      selftests/nolibc: add mmap_munmap_good test case
      selftests/nolibc: add run-libc-test target
      selftests/nolibc: stat_fault: silence NULL argument warning with glibc
      selftests/nolibc: gettid: restore for glibc and musl
      selftests/nolibc: add _LARGEFILE64_SOURCE for musl
      selftests/nolibc: fix up int_fast16/32_t test cases for musl
      tools/nolibc: types.h: add RB_ flags for reboot()
      selftests/nolibc: prefer <sys/reboot.h> to <linux/reboot.h>
      selftests/nolibc: fix up kernel parameters support
      selftests/nolibc: link_cross: use /proc/self/cmdline
      tools/nolibc: add rmdir() support
      selftests/nolibc: add a new rmdir() test case
      selftests/nolibc: fix up failures when CONFIG_PROC_FS=n
      selftests/nolibc: prepare /tmp for tests that need to write
      selftests/nolibc: vfprintf: remove MEMFD_CREATE dependency
      selftests/nolibc: chdir_root: restore current path after test
      selftests/nolibc: stat_timestamps: remove procfs dependency
      selftests/nolibc: chroot_exe: remove procfs dependency
      selftests/nolibc: add chmod_argv0 test
      selftests/nolibc: report: print a summarized test status
      selftests/nolibc: report: print total tests
      selftests/nolibc: report: align passed, skipped and failed
      selftests/nolibc: report: extrude the test status line
      selftests/nolibc: report: add newline before test failures
      tools/nolibc: arch-*.h: add missing space after ','
      tools/nolibc: fix up startup failures for -O0 under gcc < 11.1.0
      tools/nolibc: remove the old sys_stat support
      tools/nolibc: add new crt.h with _start_c
      tools/nolibc: stackprotector.h: add empty __stack_chk_init for !_NOLIBC_STACKPROTECTOR
      tools/nolibc: crt.h: initialize stack protector
      tools/nolibc: arm: shrink _start with _start_c
      tools/nolibc: aarch64: shrink _start with _start_c
      tools/nolibc: i386: shrink _start with _start_c
      tools/nolibc: x86_64: shrink _start with _start_c
      tools/nolibc: mips: shrink _start with _start_c
      tools/nolibc: loongarch: shrink _start with _start_c
      tools/nolibc: riscv: shrink _start with _start_c
      tools/nolibc: s390: shrink _start with _start_c
      selftests/nolibc: add EXPECT_PTRGE, EXPECT_PTRGT, EXPECT_PTRLE, EXPECT_PTRLT
      selftests/nolibc: add testcases for startup code
      selftests/nolibc: allow run nolibc-test locally
      selftests/nolibc: allow test -include /path/to/nolibc.h
      selftests/nolibc: mmap_munmap_good: fix up return value
      tools/nolibc: add support for powerpc
      tools/nolibc: add support for powerpc64
      selftests/nolibc: add XARCH and ARCH mapping support
      selftests/nolibc: add test support for ppc
      selftests/nolibc: add test support for ppc64le
      selftests/nolibc: add test support for ppc64
      selftests/nolibc: allow report with existing test log
      tools/nolibc: stackprotector.h: make __stack_chk_init static

 MAINTAINERS                                  |   1 +
 tools/include/nolibc/Makefile                |   1 +
 tools/include/nolibc/arch-aarch64.h          |  85 +---
 tools/include/nolibc/arch-arm.h              | 111 +----
 tools/include/nolibc/arch-i386.h             |  86 +---
 tools/include/nolibc/arch-loongarch.h        |  83 +---
 tools/include/nolibc/arch-mips.h             | 147 +++----
 tools/include/nolibc/arch-powerpc.h          | 213 ++++++++++
 tools/include/nolibc/arch-riscv.h            |  83 +---
 tools/include/nolibc/arch-s390.h             |  77 +---
 tools/include/nolibc/arch-x86_64.h           |  86 +---
 tools/include/nolibc/arch.h                  |   2 +
 tools/include/nolibc/crt.h                   |  61 +++
 tools/include/nolibc/nolibc.h                |   9 +-
 tools/include/nolibc/stackprotector.h        |   5 +-
 tools/include/nolibc/stdint.h                |   2 +-
 tools/include/nolibc/stdio.h                 |  27 ++
 tools/include/nolibc/stdlib.h                |  12 +-
 tools/include/nolibc/sys.h                   | 554 +++++++-----------------
 tools/include/nolibc/types.h                 |  22 +-
 tools/include/nolibc/unistd.h                |  13 +-
 tools/testing/selftests/nolibc/Makefile      | 109 +++--
 tools/testing/selftests/nolibc/nolibc-test.c | 609 ++++++++++++++++++++-------
 23 files changed, 1216 insertions(+), 1182 deletions(-)
 create mode 100644 tools/include/nolibc/arch-powerpc.h
 create mode 100644 tools/include/nolibc/crt.h
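The pull request mentions two related ideas: a syscall return helper (__sysret()) and failing with -ENOSYS at run time instead of breaking the build. As a reading aid only, here is a minimal sketch of how such a pattern can look. It is not the nolibc code: every sketch_* name and SKETCH_MAX_ERRNO are assumptions made up for illustration, and the error range of 4095 values is borrowed from the kernel's usual convention.

```c
#include <assert.h>
#include <errno.h>

/* Assumption: as in the kernel convention, the top 4095 values of the
 * unsigned range are reserved for negated error codes. */
#define SKETCH_MAX_ERRNO 4095UL

/*
 * Hypothetical helper in the spirit of a "syscall return helper": turn a
 * raw syscall return value into the libc convention (-1 with errno set).
 * Comparing as unsigned keeps large valid values, such as pointers
 * returned by mmap(), from being mistaken for errors.
 */
static long sketch_sysret(unsigned long raw)
{
	if (raw >= (unsigned long)-SKETCH_MAX_ERRNO) {
		errno = (int)-(long)raw;
		return -1;
	}
	return (long)raw;
}

/*
 * Hypothetical stand-in for a syscall that the target architecture does
 * not provide: rather than a compile-time #error, it fails at run time.
 */
static long sketch_sys_missing(void)
{
	return -ENOSYS;
}

/* User-facing wrapper built from the two pieces above. */
static long sketch_missing(void)
{
	long ret = sketch_sysret((unsigned long)sketch_sys_missing());

	assert(ret == -1);	/* a missing syscall always reports failure */
	return ret;
}
```

With this shape, a caller of sketch_missing() gets -1 and errno set to ENOSYS, while a successful raw value such as 42 passes through sketch_sysret() unchanged.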

[PATCH net-next] selftests: bonding: remove redundant delete action of device link1_1

by Zhengchao Shao

Running "ip netns delete client" already deletes device link1_1, so there is no need to delete link1_1 again. Remove the redundant command.

Signed-off-by: Zhengchao Shao <shaozhengchao(a)huawei.com>
---
 .../drivers/net/bonding/bond-arp-interval-causes-panic.sh | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/testing/selftests/drivers/net/bonding/bond-arp-interval-causes-panic.sh b/tools/testing/selftests/drivers/net/bonding/bond-arp-interval-causes-panic.sh
index 71c00bfafbc9..7b2d421f09cf 100755
--- a/tools/testing/selftests/drivers/net/bonding/bond-arp-interval-causes-panic.sh
+++ b/tools/testing/selftests/drivers/net/bonding/bond-arp-interval-causes-panic.sh
@@ -11,7 +11,6 @@ finish()
 {
 	ip netns delete server || true
 	ip netns delete client || true
-	ip link del link1_1 || true
 }
 
 trap finish EXIT
-- 
2.34.1

[PATCH v31 0/6] Implement IOCTL to get and optionally clear info about PTEs

by Muhammad Usama Anjum

*Changes in v30:*
- Rebase on top of next-20230815
- Minor nitpicks

*Changes in v29:*
- Polish IOCTL and improve documentation

*Changes in v28:*
- Fix walk_end and add 17 test cases in selftests patch

*Changes in v27:*
- Handle review comments and minor improvements
- Add performance improvement patch on top with test for easy review

*Changes in v26:*
- Code restructuring and API changes in PAGEMAP_IOCTL

*Changes in v25:*
- Do proper filtering on hole as well (hole got missed earlier)

*Changes in v24:*
- Rebase on top of next-20230710
- Place WP markers in case of hole as well

*Changes in v23:*
- Set vec_buf_index in loop only when vec_buf_index is set
- Return -EFAULT instead of -EINVAL if vec is NULL
- Correctly return the walk ending address to the page granularity

*Changes in v22:*
- Interface change:
  - Replace [start, start + len) with [start, end)
  - Return the ending address of the address walk in start

*Changes in v21:*
- Abort walk instead of returning error if WP is to be performed on partial hugetlb

*Changes in v20:*
- Correct PAGE_IS_FILE and add PAGE_IS_PFNZERO

*Changes in v19:*
- Minor changes and interface updates

*Changes in v18:*
- Rebase on top of next-20230613
- Minor updates

*Changes in v17:*
- Rebase on top of next-20230606
- Minor improvements in PAGEMAP_SCAN IOCTL patch

*Changes in v16:*
- Fix a corner case
- Add exclusive PM_SCAN_OP_WP back

*Changes in v15:*
- Build fix (Add missed build fix in RESEND)

*Changes in v14:*
- Fix build error caused by #ifdef added at last minute in some configs

*Changes in v13:*
- Rebase on top of next-20230414
- Give up on using uffd_wp_range() and write new helpers, flush tlb only once

*Changes in v12:*
- Update and other memory types to UFFD_FEATURE_WP_ASYNC
- Rebase on top of next-20230406
- Review updates

*Changes in v11:*
- Rebase on top of next-20230307
- Base patches on UFFD_FEATURE_WP_UNPOPULATED
- Do a lot of cosmetic changes and review updates
- Remove ENGAGE_WP + !GET operation as it can be performed with UFFDIO_WRITEPROTECT

*Changes in v10:*
- Add specific condition to return error if hugetlb is used with wp async
- Move changes in tools/include/uapi/linux/fs.h to separate patch
- Add documentation

*Changes in v9:*
- Correct fault resolution for userfaultfd wp async
- Fix build warnings and errors which were happening on some configs
- Simplify pagemap ioctl's code

*Changes in v8:*
- Update uffd async wp implementation
- Improve PAGEMAP_IOCTL implementation

*Changes in v7:*
- Add uffd wp async
- Update the IOCTL to use uffd under the hood instead of soft-dirty flags

*Motivation*

The real motivation for adding the PAGEMAP_SCAN IOCTL is to emulate the Windows GetWriteWatch() and ResetWriteWatch() syscalls [1]. GetWriteWatch() retrieves the addresses of the pages that have been written to in a region of virtual memory. This syscall is used in Windows applications and games. It is currently emulated in a rather slow manner in userspace. Our purpose is to enhance the kernel so that it can be emulated efficiently. Currently some out-of-tree hack patches are used to emulate it efficiently on some kernels. We intend to replace those with these patches, so that gaming on Linux as a whole can benefit from this. It means there would be tons of users of this code.

The CRIU use case [2] was mentioned by Andrei and Danylo:
> Use cases for migrating sparse VMAs are binaries sanitized with ASAN,
> MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of
> shadow memory [4]. Being able to migrate such binaries allows to highly
> reduce the amount of work needed to identify and fix post-migration
> crashes, which happen constantly.

Andrei defines the following uses of this code:
* It is more granular and allows us to track changed pages more effectively. The current interface can clear dirty bits for the entire process only. In addition, reading info about pages is a separate operation. It means we must freeze the process to read information about all its pages and reset dirty bits; only then can we start dumping pages. The information about pages becomes more and more outdated while we are processing pages. The new interface solves both these downsides. First, it allows us to read pte bits and clear the soft-dirty bit atomically. It means that CRIU will not need to freeze processes to pre-dump their memory. Second, it clears soft-dirty bits for a specified region of memory. It means CRIU will have up-to-date info about pages at the moment of dumping them.
* The new interface has to be much faster because basic page filtering is happening in the kernel. With the old interface, we have to read pagemap for each page.

*Implementation Evolution (Short Summary)*

From the definition of GetWriteWatch(), we felt that the kernel's soft-dirty feature could be used under the hood with some additions like:
* reset soft-dirty flag for only a specific region of memory instead of clearing the flag for the entire process
* get and clear soft-dirty flag for a specific region atomically

So we decided to use an ioctl on the pagemap file to read and/or reset the soft-dirty flag. But when using the soft-dirty flag, we sometimes get extra pages which weren't even written. They had become soft-dirty because of VMA merging and the VM_SOFTDIRTY flag. This breaks the definition of GetWriteWatch(). We were able to bypass this shortcoming by ignoring VM_SOFTDIRTY until David reported that mprotect etc. messes up the soft-dirty flag while ignoring VM_SOFTDIRTY [5]. This wasn't happening until [6] got introduced. We discussed whether we could revert these patches, but we could not reach any conclusion. So at this point, I made a couple of attempts to solve this whole VM_SOFTDIRTY issue by correcting the soft-dirty implementation:
* [7] Correct the bug fixed wrongly back in 2014. It had the potential to cause regressions. We left it behind.
* [8] Keep a list of the soft-dirty parts of a VMA across splits and merges. The reply I got was not to increase the size of the VMA by 8 bytes.

At this point, we left soft-dirty behind, considering it too delicate, and userfaultfd [9] seemed like the only way forward. From there onward, we have been basing soft-dirty emulation on the userfaultfd wp feature, where the kernel resolves the faults itself when the WP_ASYNC feature is used. It was straightforward to add the WP_ASYNC feature to userfaultfd. Now only those pages which have really been written are reported as dirty or written-to. (PS: Another userfaultfd feature, WP_UNPOPULATED, is required to avoid pre-faulting memory before write-protecting [9].)

All the different masks were added at the request of CRIU devs to make the interface more generic and better.

[1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-…
[2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com
[3] https://github.com/google/sanitizers
[4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit
[5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com
[6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/
[7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com
[10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com

*Original Cover letter from v8*

Hello,

Note: Soft-dirty pages and pages which have been written-to are synonyms. As the kernel already has a soft-dirty feature, which we have given up on using, we use the written-to terminology while using UFFD async WP under the hood.

It is possible to find and clear soft-dirty pages entirely in userspace. But it isn't efficient:
- The mprotect and SIGSEGV handler for bookkeeping
- The userfaultfd wp (synchronous) with the handler for bookkeeping

Some benchmarks can be seen here [1]. This series adds features that weren't present earlier:
- There is no atomic get soft-dirty/written-to status and clear operation in the kernel.
- The pages which have been written-to cannot be found in an accurate way. (The kernel's soft-dirty PTE bit + soft_dirty VMA bit show more soft-dirty pages than there actually are.)

Historically, soft-dirty PTE bit tracking has been used in the CRIU project. The procfs interface is enough for finding the soft-dirty bit status and clearing the soft-dirty bit of all the pages of a process. We have a use case where we need to track the soft-dirty PTE bit for only specific pages on demand. We need this tracking and clearing mechanism for a region of memory while the process is running to emulate the GetWriteWatch() syscall of Windows.

*(Moved to using UFFD instead of soft-dirty feature to find pages which have been written-to from v7 patch series)*:

Stop using the soft-dirty flags for finding which pages have been written to. It is too delicate and wrong, as it shows more soft-dirty pages than the actual soft-dirty pages. There is no interest in correcting it [2][3] as this is how the feature was written years ago. Its behaviour shouldn't be changed now. Peter Xu has suggested using the async version of the UFFD WP [4] as it is inherently based on the PTEs.

So in this patch series, I've added a new mode to UFFD which is an asynchronous version of write protect. When this variant of the UFFD WP is used, the page faults are resolved automatically by the kernel. The pages which have been written-to can be found by reading the pagemap file (!PM_UFFD_WP). This feature can be used successfully to find which pages have been written to since the time the pages were write-protected. This works just like the soft-dirty flag without showing any extra pages which aren't soft-dirty in reality.

Information on whether a page is file-mapped, present or swapped is required for the CRIU project [5][6]. The required mask, any mask, excluded mask and return mask are also needed for the CRIU project [5].

The IOCTL returns the addresses of the pages which match the specific masks. The page addresses are returned in struct page_region in a compact form. The max_pages is needed to support a use case where the user only wants to get a specific number of pages, so there is no need to find all the pages of interest in the range when max_pages is specified. The IOCTL returns when the maximum number of pages has been found. The max_pages is optional. If max_pages is specified, it must be equal to or greater than the vec_size. This restriction is needed to handle the worst case, when one page_region only contains info about one page and cannot be compacted. This is needed to emulate the Windows GetWriteWatch() syscall.

The patch series includes a detailed selftest which can be used as an example for the uffd async wp test and PAGEMAP_IOCTL. It shows the interface usage as well.

[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n
[5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/

Regards,
Muhammad Usama Anjum

Muhammad Usama Anjum (5):
  fs/proc/task_mmu: Implement IOCTL to get and optionally clear info about PTEs
  fs/proc/task_mmu: Add fast paths to get/clear PAGE_IS_WRITTEN flag
  tools headers UAPI: Update linux/fs.h with the kernel sources
  mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL
  selftests: mm: add pagemap ioctl tests

Peter Xu (1):
  userfaultfd: UFFD_FEATURE_WP_ASYNC

 Documentation/admin-guide/mm/pagemap.rst     |   89 +
 Documentation/admin-guide/mm/userfaultfd.rst |   35 +
 fs/proc/task_mmu.c                           |  708 ++++++++
 fs/userfaultfd.c                             |   26 +-
 include/linux/hugetlb.h                      |    1 +
 include/linux/userfaultfd_k.h                |   28 +-
 include/uapi/linux/fs.h                      |   59 +
 include/uapi/linux/userfaultfd.h             |    9 +-
 mm/hugetlb.c                                 |   34 +-
 mm/memory.c                                  |   28 +-
 tools/include/uapi/linux/fs.h                |   59 +
 tools/testing/selftests/mm/.gitignore        |    2 +
 tools/testing/selftests/mm/Makefile          |    3 +-
 tools/testing/selftests/mm/config            |    1 +
 tools/testing/selftests/mm/pagemap_ioctl.c   | 1660 ++++++++++++++++++
 tools/testing/selftests/mm/run_vmtests.sh    |    4 +
 16 files changed, 2722 insertions(+), 24 deletions(-)
 create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c

-- 
2.40.1
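The cover letter describes results being returned as compact regions, with an optional max_pages limit and a walk ending address. The following userspace simulation is only a sketch of that idea under stated assumptions; it does not call the kernel and does not reproduce the UAPI. struct sketch_region, sketch_scan() and SKETCH_PAGE_SIZE are hypothetical names, the category bits are arbitrary, and the max_pages/vec_size restriction mentioned in the letter is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed page size for the demo */

/* Hypothetical region record: a half-open address range [start, end)
 * whose pages all carry the same category bits. */
struct sketch_region {
	uint64_t start;
	uint64_t end;
	uint64_t categories;
};

/*
 * Walk npages per-page category words starting at address base. Pages
 * whose categories intersect want_mask are reported; adjacent matching
 * pages with identical categories are folded into one region. The walk
 * stops when vec is full or when max_pages pages have been reported
 * (max_pages == 0 means no limit). Returns the number of regions written
 * and stores the address where the walk stopped in *walk_end.
 */
static size_t sketch_scan(uint64_t base, const uint64_t *page_cats,
			  size_t npages, uint64_t want_mask,
			  struct sketch_region *vec, size_t vec_len,
			  size_t max_pages, uint64_t *walk_end)
{
	size_t nregions = 0, found = 0, i;

	assert(page_cats && vec && walk_end);

	for (i = 0; i < npages; i++) {
		uint64_t addr = base + i * SKETCH_PAGE_SIZE;
		uint64_t cats = page_cats[i];

		if (!(cats & want_mask))
			continue;		/* page is not of interest */
		if (max_pages && found == max_pages)
			break;			/* page budget exhausted */

		if (nregions && vec[nregions - 1].end == addr &&
		    vec[nregions - 1].categories == cats) {
			/* Contiguous and same categories: extend the region. */
			vec[nregions - 1].end += SKETCH_PAGE_SIZE;
		} else {
			if (nregions == vec_len)
				break;		/* output vector is full */
			vec[nregions].start = addr;
			vec[nregions].end = addr + SKETCH_PAGE_SIZE;
			vec[nregions].categories = cats;
			nregions++;
		}
		found++;
	}

	*walk_end = base + i * SKETCH_PAGE_SIZE;
	return nregions;
}
```

In this simulation, a caller that hits the page budget or fills the vector can resume by passing the returned walk_end as the next base, which mirrors the reason the letter gives for returning the ending address of the walk.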

[PATCH v30 0/6] Implement IOCTL to get and optionally clear info about PTEs

by Muhammad Usama Anjum

*Changes in v30*: - Rebase on top of next-20230815 - Minor nitpicks *Changes in v29:* - Polish IOCTL and improve documentation *Changes in v28:* - Fix walk_end and add 17 test cases in selftests patch *Changes in v27:* - Handle review comments and minor improvements - Add performance improvement patch on top with test for easy review *Changes in v26:* - Code re-structurring and API changes in PAGEMAP_IOCTL *Changes in v25*: - Do proper filtering on hole as well (hole got missed earlier) *Changes in v24*: - Rebase on top of next-20230710 - Place WP markers in case of hole as well *Changes in v23*: - Set vec_buf_index in loop only when vec_buf_index is set - Return -EFAULT instead of -EINVAL if vec is NULL - Correctly return the walk ending address to the page granularity *Changes in v22*: - Interface change: - Replace [start start + len) with [start, end) - Return the ending address of the address walk in start *Changes in v21*: - Abort walk instead of returning error if WP is to be performed on partial hugetlb *Changes in v20* - Correct PAGE_IS_FILE and add PAGE_IS_PFNZERO *Changes in v19* - Minor changes and interface updates *Changes in v18* - Rebase on top of next-20230613 - Minor updates *Changes in v17* - Rebase on top of next-20230606 - Minor improvements in PAGEMAP_SCAN IOCTL patch *Changes in v16* - Fix a corner case - Add exclusive PM_SCAN_OP_WP back *Changes in v15* - Build fix (Add missed build fix in RESEND) *Changes in v14* - Fix build error caused by #ifdef added at last minute in some configs *Changes in v13* - Rebase on top of next-20230414 - Give-up on using uffd_wp_range() and write new helpers, flush tlb only once *Changes in v12* - Update and other memory types to UFFD_FEATURE_WP_ASYNC - Rebaase on top of next-20230406 - Review updates *Changes in v11* - Rebase on top of next-20230307 - Base patches on UFFD_FEATURE_WP_UNPOPULATED - Do a lot of cosmetic changes and review updates - Remove ENGAGE_WP + !GET operation as it can be performed with 
UFFDIO_WRITEPROTECT *Changes in v10* - Add specific condition to return error if hugetlb is used with wp async - Move changes in tools/include/uapi/linux/fs.h to separate patch - Add documentation *Changes in v9:* - Correct fault resolution for userfaultfd wp async - Fix build warnings and errors which were happening on some configs - Simplify pagemap ioctl's code *Changes in v8:* - Update uffd async wp implementation - Improve PAGEMAP_IOCTL implementation *Changes in v7:* - Add uffd wp async - Update the IOCTL to use uffd under the hood instead of soft-dirty flags *Motivation* The real motivation for adding PAGEMAP_SCAN IOCTL is to emulate Windows GetWriteWatch() and ResetWriteWatch() syscalls [1]. The GetWriteWatch() retrieves the addresses of the pages that are written to in a region of virtual memory. This syscall is used in Windows applications and games etc. This syscall is being emulated in pretty slow manner in userspace. Our purpose is to enhance the kernel such that we translate it efficiently in a better way. Currently some out of tree hack patches are being used to efficiently emulate it in some kernels. We intend to replace those with these patches. So the whole gaming on Linux can effectively get benefit from this. It means there would be tons of users of this code. CRIU use case [2] was mentioned by Andrei and Danylo: > Use cases for migrating sparse VMAs are binaries sanitized with ASAN, > MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of > shadow memory [4]. Being able to migrate such binaries allows to highly > reduce the amount of work needed to identify and fix post-migration > crashes, which happen constantly. Andrei's defines the following uses of this code: * it is more granular and allows us to track changed pages more effectively. The current interface can clear dirty bits for the entire process only. In addition, reading info about pages is a separate operation. 
It means we must freeze the process to read information about all its pages, reset dirty bits, only then we can start dumping pages. The information about pages becomes more and more outdated, while we are processing pages. The new interface solves both these downsides. First, it allows us to read pte bits and clear the soft-dirty bit atomically. It means that CRIU will not need to freeze processes to pre-dump their memory. Second, it clears soft-dirty bits for a specified region of memory. It means CRIU will have actual info about pages to the moment of dumping them. * The new interface has to be much faster because basic page filtering is happening in the kernel. With the old interface, we have to read pagemap for each page. *Implementation Evolution (Short Summary)* From the definition of GetWriteWatch(), we feel like kernel's soft-dirty feature can be used under the hood with some additions like: * reset soft-dirty flag for only a specific region of memory instead of clearing the flag for the entire process * get and clear soft-dirty flag for a specific region atomically So we decided to use ioctl on pagemap file to read or/and reset soft-dirty flag. But using soft-dirty flag, sometimes we get extra pages which weren't even written. They had become soft-dirty because of VMA merging and VM_SOFTDIRTY flag. This breaks the definition of GetWriteWatch(). We were able to by-pass this short coming by ignoring VM_SOFTDIRTY until David reported that mprotect etc messes up the soft-dirty flag while ignoring VM_SOFTDIRTY [5]. This wasn't happening until [6] got introduced. We discussed if we can revert these patches. But we could not reach to any conclusion. So at this point, I made couple of tries to solve this whole VM_SOFTDIRTY issue by correcting the soft-dirty implementation: * [7] Correct the bug fixed wrongly back in 2014. It had potential to cause regression. We left it behind. * [8] Keep a list of soft-dirty part of a VMA across splits and merges. 
I got the reply don't increase the size of the VMA by 8 bytes. At this point, we left soft-dirty considering it is too much delicate and userfaultfd [9] seemed like the only way forward. From there onward, we have been basing soft-dirty emulation on userfaultfd wp feature where kernel resolves the faults itself when WP_ASYNC feature is used. It was straight forward to add WP_ASYNC feature in userfautlfd. Now we get only those pages dirty or written-to which are really written in reality. (PS There is another WP_UNPOPULATED userfautfd feature is required which is needed to avoid pre-faulting memory before write-protecting [9].) All the different masks were added on the request of CRIU devs to create interface more generic and better. [1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-… [2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com [3] https://github.com/google/sanitizers [4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit [5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com [6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/ [7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.… [8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.… [9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com [10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com * Original Cover letter from v8* Hello, Note: Soft-dirty pages and pages which have been written-to are synonyms. As kernel already has soft-dirty feature inside which we have given up to use, we are using written-to terminology while using UFFD async WP under the hood. It is possible to find and clear soft-dirty pages entirely in userspace. 
But it isn't efficient: - The mprotect and SIGSEGV handler for bookkeeping - The userfaultfd wp (synchronous) with the handler for bookkeeping Some benchmarks can be seen here[1]. This series adds features that weren't present earlier: - There is no atomic get soft-dirty/Written-to status and clear present in the kernel. - The pages which have been written-to can not be found in accurate way. (Kernel's soft-dirty PTE bit + sof_dirty VMA bit shows more soft-dirty pages than there actually are.) Historically, soft-dirty PTE bit tracking has been used in the CRIU project. The procfs interface is enough for finding the soft-dirty bit status and clearing the soft-dirty bit of all the pages of a process. We have the use case where we need to track the soft-dirty PTE bit for only specific pages on-demand. We need this tracking and clear mechanism of a region of memory while the process is running to emulate the getWriteWatch() syscall of Windows. *(Moved to using UFFD instead of soft-dirty feature to find pages which have been written-to from v7 patch series)*: Stop using the soft-dirty flags for finding which pages have been written to. It is too delicate and wrong as it shows more soft-dirty pages than the actual soft-dirty pages. There is no interest in correcting it [2][3] as this is how the feature was written years ago. It shouldn't be updated to changed behaviour. Peter Xu has suggested using the async version of the UFFD WP [4] as it is based inherently on the PTEs. So in this patch series, I've added a new mode to the UFFD which is asynchronous version of the write protect. When this variant of the UFFD WP is used, the page faults are resolved automatically by the kernel. The pages which have been written-to can be found by reading pagemap file (!PM_UFFD_WP). This feature can be used successfully to find which pages have been written to from the time the pages were write protected. 
This works just like the soft-dirty flag, without showing any extra pages which aren't really soft-dirty.

Information about whether a page is file-mapped, present or swapped is required for the CRIU project [5][6]. The required mask, any mask, excluded mask and return mask are also required for the CRIU project [5].

The IOCTL returns the addresses of the pages which match the specified masks. The page addresses are returned in struct page_region in a compact form. max_pages is needed to support a use case where the user only wants to get a specific number of pages, so there is no need to find all the pages of interest in the range when max_pages is specified. The IOCTL returns when the maximum number of pages has been found. max_pages is optional. If max_pages is specified, it must be equal to or greater than vec_size. This restriction is needed to handle the worst case, when one page_region only contains info of one page and it cannot be compacted. This is needed to emulate the Windows GetWriteWatch() API.

The patch series includes a detailed selftest which can be used as an example for the uffd async wp test and PAGEMAP_IOCTL. It shows the interface usage as well.
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n
[5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/

Regards,
Muhammad Usama Anjum

Muhammad Usama Anjum (5):
  fs/proc/task_mmu: Implement IOCTL to get and optionally clear info about PTEs
  fs/proc/task_mmu: Add fast paths to get/clear PAGE_IS_WRITTEN flag
  tools headers UAPI: Update linux/fs.h with the kernel sources
  mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL
  selftests: mm: add pagemap ioctl tests

Peter Xu (1):
  userfaultfd: UFFD_FEATURE_WP_ASYNC

 Documentation/admin-guide/mm/pagemap.rst     |   89 +
 Documentation/admin-guide/mm/userfaultfd.rst |   35 +
 fs/proc/task_mmu.c                           |  705 ++++++++
 fs/userfaultfd.c                             |   26 +-
 include/linux/hugetlb.h                      |    1 +
 include/linux/userfaultfd_k.h                |   28 +-
 include/uapi/linux/fs.h                      |   59 +
 include/uapi/linux/userfaultfd.h             |    9 +-
 mm/hugetlb.c                                 |   34 +-
 mm/memory.c                                  |   28 +-
 tools/include/uapi/linux/fs.h                |   59 +
 tools/testing/selftests/mm/.gitignore        |    2 +
 tools/testing/selftests/mm/Makefile          |    3 +-
 tools/testing/selftests/mm/config            |    1 +
 tools/testing/selftests/mm/pagemap_ioctl.c   | 1660 ++++++++++++++++++
 tools/testing/selftests/mm/run_vmtests.sh    |    4 +
 16 files changed, 2719 insertions(+), 24 deletions(-)
 create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c

-- 
2.40.1
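To make the "compact form" and the role of max_pages more concrete, here is a minimal userspace sketch. It is not the kernel implementation and does not use the series' UAPI: struct region, compact_pages() and the per-page flag array are hypothetical names introduced only for illustration. It merges consecutive pages with identical flags into one region and stops once either the vector is full or max_pages pages have been reported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the series' page_region; not the kernel UAPI. */
struct region {
	uint64_t start_page;	/* index of the first page in the run */
	uint64_t len;		/* number of consecutive pages */
	uint64_t flags;		/* flags shared by every page in the run */
};

/*
 * Merge consecutive pages with identical non-zero flags into regions.
 * Stops when vec is full or when max_pages pages have been reported
 * (max_pages == 0 means "no limit"). Returns the number of regions used.
 */
static size_t compact_pages(const uint64_t *page_flags, size_t npages,
			    struct region *vec, size_t vec_len,
			    size_t max_pages)
{
	size_t used = 0, reported = 0;

	assert(page_flags && vec);

	for (size_t i = 0; i < npages; i++) {
		if (!page_flags[i])
			continue;	/* page is not of interest */
		if (max_pages && reported == max_pages)
			break;

		/* Extend the previous region if this page is adjacent and alike. */
		if (used &&
		    vec[used - 1].start_page + vec[used - 1].len == i &&
		    vec[used - 1].flags == page_flags[i]) {
			vec[used - 1].len++;
		} else {
			if (used == vec_len)
				break;	/* no room for another region */
			vec[used++] = (struct region){ i, 1, page_flags[i] };
		}
		reported++;
	}
	return used;
}
```

In the worst case every reported page differs from its neighbour, so each region holds exactly one page and nothing can be compacted; that is the case the max_pages/vec_size restriction above is meant to cover.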

[net-next v2 0/2] seg6: add NEXT-C-SID support for SRv6 End.X behavior

by Andrea Mayer

In the Segment Routing (SR) architecture a list of instructions, called segments, can be added to the packet headers to influence the forwarding and processing of the packets in an SR enabled network.

Considering the Segment Routing over IPv6 data plane (SRv6) [1], the segment identifiers (SIDs) are IPv6 addresses (128 bits) and the segment list (SID List) is carried in the Segment Routing Header (SRH). A segment may correspond to a "behavior" that is executed by a node when the packet is received. The Linux kernel currently supports a large subset of the behaviors described in [2] (e.g., End, End.X, End.T and so on).

In some SRv6 scenarios, the number of segments carried by the SID List may increase dramatically, reducing the effective MTU (Maximum Transmission Unit) and/or straining the processing capabilities of legacy hardware devices (due to longer IPv6 headers).

The NEXT-C-SID mechanism [3] extends the SRv6 architecture by providing several ways to efficiently represent the SID List. By leveraging NEXT-C-SID, it is possible to encode several SRv6 segments within a single 128-bit SID address (also referred to as a Compressed SID Container). In this way, the length of the SID List can be drastically reduced.

The NEXT-C-SID mechanism is built upon the "flavors" framework defined in [2]. This framework is already supported by the Linux SRv6 subsystem and is used to modify and/or extend a subset of existing behaviors.

In this patchset, we extend the SRv6 End.X behavior in order to support the NEXT-C-SID mechanism.

In detail, the patchset is made of:
 - patch 1/2: add NEXT-C-SID support for SRv6 End.X behavior;
 - patch 2/2: add selftest for NEXT-C-SID in SRv6 End.X behavior.

From the user space perspective, we do not need to change the iproute2 code to support the NEXT-C-SID flavor for the SRv6 End.X behavior. However, we will update the man page considering the NEXT-C-SID flavor applied to the SRv6 End.X behavior in a separate patch.

Comments, improvements and suggestions are always appreciated.

Thank you all,
Andrea

[1] - https://datatracker.ietf.org/doc/html/rfc8754
[2] - https://datatracker.ietf.org/doc/html/rfc8986
[3] - https://datatracker.ietf.org/doc/html/draft-ietf-spring-srv6-srh-compression

v1 -> v2:
 - Fix author tags in the commit message in patch 2/2, thanks to Paolo Abeni;
 - Remove unnecessary supp_ops == 0 check in patch 1/2, thanks to Hangbin Liu;
 - Fix 'is it possible' -> 'it is possible' in cover letter, thanks to Hangbin Liu.

Andrea Mayer (1):
  seg6: add NEXT-C-SID support for SRv6 End.X behavior

Paolo Lungaroni (1):
  selftests: seg6: add selftest for NEXT-C-SID flavor in SRv6 End.X behavior

 net/ipv6/seg6_local.c                         |  108 +-
 tools/testing/selftests/net/Makefile          |    1 +
 .../net/srv6_end_x_next_csid_l3vpn_test.sh    | 1213 +++++++++++++++++
 3 files changed, 1302 insertions(+), 20 deletions(-)
 create mode 100755 tools/testing/selftests/net/srv6_end_x_next_csid_l3vpn_test.sh

-- 
2.20.1
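The way several segments share one 128-bit container can be illustrated with a rough userspace sketch. This is only a reading of the address rewrite described in [3], not the code in net/ipv6/seg6_local.c: next_csid_advance() is a hypothetical name, and the Locator-Block and C-SID lengths are assumed to be whole bytes. When the bytes following the current C-SID are non-zero, they are shifted left over the consumed C-SID and the tail is zero-filled; when they are all zero, the container is exhausted and the address is left alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Advance a 16-byte C-SID container to the next C-SID.
 * block_len: Locator-Block length in bytes; csid_len: C-SID length in bytes.
 * Returns 1 if a next C-SID was shifted into place, 0 if the container is
 * exhausted (the remaining argument bytes are all zero) or the lengths are
 * invalid, in which case the address is left untouched.
 */
static int next_csid_advance(uint8_t addr[16], size_t block_len, size_t csid_len)
{
	size_t arg_off, arg_len, i;
	int has_next = 0;

	assert(addr != NULL);

	if (csid_len == 0 || block_len + csid_len >= 16)
		return 0;

	arg_off = block_len + csid_len;
	arg_len = 16 - arg_off;
	for (i = 0; i < arg_len; i++)
		has_next |= addr[arg_off + i] != 0;
	if (!has_next)
		return 0;

	/* Shift the remaining C-SIDs left over the one just consumed ... */
	memmove(addr + block_len, addr + arg_off, arg_len);
	/* ... and zero-fill the tail. */
	memset(addr + block_len + arg_len, 0, csid_len);
	return 1;
}
```

With a 4-byte Locator-Block and 2-byte C-SIDs, a container holding three C-SIDs can be advanced twice; the third call returns 0, which is the point where a node would fall back to the regular SRH processing.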

[PATCH bpf-next v3] selftests/bpf: trace_helpers.c: optimize kallsyms cache

by Rong Tao

From: Rong Tao <rongtao(a)cestc.cn>

Static ksyms often have problems because the number of symbols exceeds the MAX_SYMS limit. Changing MAX_SYMS from 300000 to 400000 in commit e76a014334a6 ("selftests/bpf: Bump and validate MAX_SYMS") solves the problem somewhat, but it is not a perfect fix.

This commit uses dynamic memory allocation, which completely solves the problem caused by the limit on the number of kallsyms.

Signed-off-by: Rong Tao <rongtao(a)cestc.cn>
---
v3: Do not use structs and judge ksyms__add_symbol function return value.
v2: https://lore.kernel.org/lkml/tencent_B655EE5E5D463110D70CD2846AB3262EED09@q…
    Do the usual len/capacity scheme here to amortize the cost of realloc, and
    don't free symbols.
v1: https://lore.kernel.org/lkml/tencent_AB461510B10CD484E0B2F62E3754165F2909@q…
---
 tools/testing/selftests/bpf/trace_helpers.c | 42 ++++++++++++++++-----
 1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/bpf/trace_helpers.c b/tools/testing/selftests/bpf/trace_helpers.c
index f83d9f65c65b..d8391a2122b4 100644
--- a/tools/testing/selftests/bpf/trace_helpers.c
+++ b/tools/testing/selftests/bpf/trace_helpers.c
@@ -18,10 +18,32 @@
 #define TRACEFS_PIPE	"/sys/kernel/tracing/trace_pipe"
 #define DEBUGFS_PIPE	"/sys/kernel/debug/tracing/trace_pipe"
 
-#define MAX_SYMS 400000
-static struct ksym syms[MAX_SYMS];
+static struct ksym *syms;
+static int sym_cap;
 static int sym_cnt;
 
+static int ksyms__add_symbol(const char *name, unsigned long addr)
+{
+	void *tmp;
+	unsigned int new_cap;
+
+	if (sym_cnt + 1 > sym_cap) {
+		new_cap = sym_cap * 4 / 3;
+		tmp = realloc(syms, sizeof(struct ksym) * new_cap);
+		if (!tmp)
+			return -ENOMEM;
+		syms = tmp;
+		sym_cap = new_cap;
+	}
+
+	syms[sym_cnt].addr = addr;
+	syms[sym_cnt].name = strdup(name);
+
+	sym_cnt++;
+
+	return 0;
+}
+
 static int ksym_cmp(const void *p1, const void *p2)
 {
 	return ((struct ksym *)p1)->addr - ((struct ksym *)p2)->addr;
@@ -33,9 +55,13 @@ int load_kallsyms_refresh(void)
 	char func[256], buf[256];
 	char symbol;
 	void *addr;
-	int i = 0;
+	int ret;
 
+	sym_cap = 1024;
 	sym_cnt = 0;
+	syms = malloc(sizeof(struct ksym) * sym_cap);
+	if (!syms)
+		return -ENOMEM;
 
 	f = fopen("/proc/kallsyms", "r");
 	if (!f)
@@ -46,15 +72,11 @@ int load_kallsyms_refresh(void)
 			break;
 		if (!addr)
 			continue;
-		if (i >= MAX_SYMS)
-			return -EFBIG;
-
-		syms[i].addr = (long) addr;
-		syms[i].name = strdup(func);
-		i++;
+		ret = ksyms__add_symbol(func, (unsigned long)addr);
+		if (ret)
+			return ret;
 	}
 	fclose(f);
-	sym_cnt = i;
 	qsort(syms, sym_cnt, sizeof(struct ksym), ksym_cmp);
 	return 0;
 }
-- 
2.41.0
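The len/capacity scheme mentioned in the v2 notes can also be shown in isolation. The following is a standalone userspace sketch, not the patch itself: struct sym_table, sym_table_add(), sym_table_free() and the doubling growth factor are assumptions chosen for illustration (the patch grows by 4/3 from an initial capacity of 1024 and keeps its state in file-scope variables). Unlike the patch, the sketch copies the name by hand and checks that allocation too.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct sym {
	unsigned long addr;
	char *name;
};

/* Hypothetical container; the patch uses file-scope variables instead. */
struct sym_table {
	struct sym *syms;
	size_t cnt;	/* entries in use */
	size_t cap;	/* entries allocated */
};

/* Append one symbol, growing the array geometrically when it is full. */
static int sym_table_add(struct sym_table *t, const char *name,
			 unsigned long addr)
{
	size_t len;
	char *copy;

	assert(t && name);

	if (t->cnt == t->cap) {
		size_t new_cap = t->cap ? t->cap * 2 : 16;
		struct sym *tmp = realloc(t->syms, new_cap * sizeof(*tmp));

		if (!tmp)
			return -ENOMEM;	/* the old array is still valid */
		t->syms = tmp;
		t->cap = new_cap;
	}

	/* Duplicate the name, checking the allocation. */
	len = strlen(name) + 1;
	copy = malloc(len);
	if (!copy)
		return -ENOMEM;
	memcpy(copy, name, len);

	t->syms[t->cnt].name = copy;
	t->syms[t->cnt].addr = addr;
	t->cnt++;
	return 0;
}

/* Release every name and the array itself. */
static void sym_table_free(struct sym_table *t)
{
	for (size_t i = 0; i < t->cnt; i++)
		free(t->syms[i].name);
	free(t->syms);
	memset(t, 0, sizeof(*t));
}
```

Because the capacity grows geometrically, the number of realloc() calls is logarithmic in the number of symbols, which is what amortizes the cost of growing the array.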

[PATCH v7 0/4] iommufd: Add iommu hardware info reporting

by Yi Liu

iommufd gives userspace the capability to manipulate the iommu subsystem, e.g. DMA map/unmap. In the near future, it will support iommu nested translation. Different platform vendors have different implementations of nested translation. For example, Intel VT-d supports using the guest I/O page table as the stage-1 translation table. This requires the guest I/O page table to be compatible with the hardware IOMMU. So before setting up nested translation, userspace needs to know the hardware iommu information to understand the nested translation requirements.

This series reports the iommu hardware information for a given device which has been bound to iommufd. It is preparation work for userspace to allocate a hwpt for a given device, as in the nested translation support [1].

This series introduces an iommu op to report the iommu hardware info, and an ioctl, IOMMU_GET_HW_INFO, is added to report such hardware info to the user. enum iommu_hw_info_type is defined to differentiate the iommu hardware info reported to the user, so that the user can decode it. This series only adds the framework for iommu hw info reporting; the complete reporting path needs vendor-specific definitions and driver support. The full code is available in [1] as well.

[1] https://github.com/yiliu1765/iommufd/tree/wip/iommufd_nesting_08112023-yi
(only the hw_info report path is the latest, other parts are WIP)

Change log:

v7:
 - Use clear_user() (Jason)
 - Add fail_nth for hw_info (Jason)

v6: https://lore.kernel.org/linux-iommu/20230808153510.4170-1-yi.l.liu@intel.co…
 - Add Jingqi's comment on patch 02
 - Add Baolu's r-b to patch 03
 - Address Jason's comment on patch 03

v5: https://lore.kernel.org/linux-iommu/20230803143144.200945-1-yi.l.liu@intel.…
 - Return hw_info_type in the .hw_info op, hence drop hw_info_type field in iommu_ops (Kevin)
 - Add Jason's r-b for patch 01
 - Address coding style comments from Jason and Kevin w.r.t. patch 02, 03 and 04

v4: https://lore.kernel.org/linux-iommu/20230724105936.107042-1-yi.l.liu@intel.…
 - Rename ioctl to IOMMU_GET_HW_INFO and structure to iommu_hw_info
 - Move the iommufd_get_hw_info handler to main.c
 - Place iommu_hw_info prior to iommu_hwpt_alloc
 - Update the function namings accordingly
 - Update uapi kdocs

v3: https://lore.kernel.org/linux-iommu/20230511143024.19542-1-yi.l.liu@intel.c…
 - Add r-b from Baolu
 - Rename IOMMU_HW_INFO_TYPE_DEFAULT to be IOMMU_HW_INFO_TYPE_NONE to better suit what it means
 - Let IOMMU_DEVICE_GET_HW_INFO succeed even if the underlying iommu driver does not have driver-specific data to report, per the remark below.
   https://lore.kernel.org/kvm/ZAcwJSK%2F9UVI9LXu@nvidia.com/

v2: https://lore.kernel.org/linux-iommu/20230309075358.571567-1-yi.l.liu@intel.…
 - Drop patch 05 of v1 as it is already covered by other series
 - Rename the capability info to be iommu hardware info

v1: https://lore.kernel.org/linux-iommu/20230209041642.9346-1-yi.l.liu@intel.co…

Regards,
Yi Liu

Lu Baolu (1):
  iommu: Add new iommu op to get iommu hardware information

Nicolin Chen (1):
  iommufd/selftest: Add coverage for IOMMU_GET_HW_INFO ioctl

Yi Liu (2):
  iommu: Move dev_iommu_ops() to private header
  iommufd: Add IOMMU_GET_HW_INFO

 drivers/iommu/iommu-priv.h                    | 11 +++
 drivers/iommu/iommufd/iommufd_test.h          |  9 ++
 drivers/iommu/iommufd/main.c                  | 85 +++++++++++++++++++
 drivers/iommu/iommufd/selftest.c              | 16 ++++
 include/linux/iommu.h                         | 20 ++---
 include/uapi/linux/iommufd.h                  | 45 ++++++++++
 tools/testing/selftests/iommu/iommufd.c       | 28 +++++-
 .../selftests/iommu/iommufd_fail_nth.c        |  4 +
 tools/testing/selftests/iommu/iommufd_utils.h | 47 ++++++++++
 9 files changed, 253 insertions(+), 12 deletions(-)

-- 
2.34.1

[PATCH 1/2] kunit: add ability to run tests after boot using debugfs

by Rae Moar

Add functionality to run built-in tests after boot by writing to a debugfs file.

Add a new debugfs file labeled "run" for each test suite to use for this purpose.

As an example, write to the file using the following:

echo "any string" > /sys/kernel/debug/kunit/<testsuite>/run

This will trigger the test suite to run and will print results to the kernel log.

Note that what you "write" to the debugfs file will not be saved.

To guard against running tests concurrently with this feature, add a mutex lock around running kunit. This supports the current practice of not allowing tests to be run concurrently on the same kernel.

This functionality may not work for all tests.

This new functionality could be used to design a parameter injection feature in the future.

Signed-off-by: Rae Moar <rmoar(a)google.com>
---

Interested in what people think of this idea. I will be adding documentation in v2. Note this may need to be changed once the patches on extending logs land.

Thanks!
-Rae

 lib/kunit/debugfs.c | 66 +++++++++++++++++++++++++++++++++++++++++++++
 lib/kunit/test.c    | 13 +++++++++
 2 files changed, 79 insertions(+)

diff --git a/lib/kunit/debugfs.c b/lib/kunit/debugfs.c
index 22c5c496a68f..7f76cb909a97 100644
--- a/lib/kunit/debugfs.c
+++ b/lib/kunit/debugfs.c
@@ -8,12 +8,14 @@
 #include <linux/module.h>
 
 #include <kunit/test.h>
+#include <kunit/test-bug.h>
 
 #include "string-stream.h"
 #include "debugfs.h"
 
 #define KUNIT_DEBUGFS_ROOT             "kunit"
 #define KUNIT_DEBUGFS_RESULTS          "results"
+#define KUNIT_DEBUGFS_RUN              "run"
 
 /*
  * Create a debugfs representation of test suites:
@@ -21,6 +23,8 @@
  * Path						Semantics
  * /sys/kernel/debug/kunit/<testsuite>/results	Show results of last run for
  *						testsuite
+ * /sys/kernel/debug/kunit/<testsuite>/run	Write to this file to trigger
+ *						testsuite to run
  *
  */
 
@@ -93,6 +97,51 @@ static int debugfs_results_open(struct inode *inode, struct file *file)
 	return single_open(file, debugfs_print_results, suite);
 }
 
+/*
+ * Print a usage message to the debugfs "run" file
+ * (/sys/kernel/debug/kunit/<testsuite>/run) if opened.
+ */
+static int debugfs_print_run(struct seq_file *seq, void *v)
+{
+	struct kunit_suite *suite = (struct kunit_suite *)seq->private;
+
+	seq_puts(seq, "Write to this file to trigger the test suite to run.\n");
+	seq_printf(seq, "usage: echo \"any string\" > /sys/kernel/debug/kunit/%s/run\n",
+			suite->name);
+	return 0;
+}
+
+/*
+ * The debugfs "run" file (/sys/kernel/debug/kunit/<testsuite>/run)
+ * contains no information. Write to the file to trigger the test suite
+ * to run.
+ */
+static int debugfs_run_open(struct inode *inode, struct file *file)
+{
+	struct kunit_suite *suite;
+
+	suite = (struct kunit_suite *)inode->i_private;
+
+	return single_open(file, debugfs_print_run, suite);
+}
+
+/*
+ * Trigger a test suite to run by writing to the suite's "run" debugfs
+ * file found at: /sys/kernel/debug/kunit/<testsuite>/run
+ *
+ * Note: what is written to this file will not be saved.
+ */
+static ssize_t debugfs_run(struct file *file,
+		const char __user *buf, size_t count, loff_t *ppos)
+{
+	struct inode *f_inode = file->f_inode;
+	struct kunit_suite *suite = (struct kunit_suite *) f_inode->i_private;
+
+	__kunit_test_suites_init(&suite, 1);
+
+	return count;
+}
+
 static const struct file_operations debugfs_results_fops = {
 	.open = debugfs_results_open,
 	.read = seq_read,
@@ -100,10 +149,23 @@ static const struct file_operations debugfs_results_fops = {
 	.release = debugfs_release,
 };
 
+static const struct file_operations debugfs_run_fops = {
+	.open = debugfs_run_open,
+	.read = seq_read,
+	.write = debugfs_run,
+	.llseek = seq_lseek,
+	.release = debugfs_release,
+};
+
 void kunit_debugfs_create_suite(struct kunit_suite *suite)
 {
 	struct kunit_case *test_case;
 
+	if (suite->log) {
+		/* Clear the suite log that's leftover from a previous run. */
+		suite->log[0] = '\0';
+		return;
+	}
 	/* Allocate logs before creating debugfs representation. */
 	suite->log = kzalloc(KUNIT_LOG_SIZE, GFP_KERNEL);
 	kunit_suite_for_each_test_case(suite, test_case)
@@ -114,6 +176,10 @@ void kunit_debugfs_create_suite(struct kunit_suite *suite)
 	debugfs_create_file(KUNIT_DEBUGFS_RESULTS, S_IFREG | 0444,
 			    suite->debugfs,
 			    suite, &debugfs_results_fops);
+
+	debugfs_create_file(KUNIT_DEBUGFS_RUN, S_IFREG | 0644,
+			    suite->debugfs,
+			    suite, &debugfs_run_fops);
 }
 
 void kunit_debugfs_destroy_suite(struct kunit_suite *suite)
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index 49698a168437..5058a72d9e8a 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -13,6 +13,7 @@
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/mutex.h>
 #include <linux/panic.h>
 #include <linux/sched/debug.h>
 #include <linux/sched.h>
@@ -22,6 +23,8 @@
 #include "string-stream.h"
 #include "try-catch-impl.h"
 
+static struct mutex kunit_run_lock;
+
 /*
  * Hook to fail the current test and print an error message to the log.
  */
@@ -702,6 +705,11 @@ int __kunit_test_suites_init(struct kunit_suite * const * const suites, int num_
 		return 0;
 	}
 
+	/* Use mutex lock to guard against running tests concurrently. */
+	if (mutex_lock_interruptible(&kunit_run_lock)) {
+		pr_err("kunit: test interrupted\n");
+		return -EINTR;
+	}
 	static_branch_inc(&kunit_running);
 
 	for (i = 0; i < num_suites; i++) {
@@ -710,6 +718,7 @@ int __kunit_test_suites_init(struct kunit_suite * const * const suites, int num_
 	}
 
 	static_branch_dec(&kunit_running);
+	mutex_unlock(&kunit_run_lock);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(__kunit_test_suites_init);
@@ -869,6 +878,10 @@ static int __init kunit_init(void)
 	kunit_install_hooks();
 
 	kunit_debugfs_init();
+
+	/* Initialize lock to guard against running tests concurrently. */
+	mutex_init(&kunit_run_lock);
+
 #ifdef CONFIG_MODULES
 	return register_module_notifier(&kunit_mod_nb);
 #else

base-commit: 582eb3aeed2d06b122fba95518b84506d3d4ceb9
-- 
2.41.0.694.ge786442a9b-goog
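From userspace, triggering a suite amounts to a single write to its run file. Below is a small C sketch of that step, assuming debugfs is mounted at the usual /sys/kernel/debug and that the run file proposed by this patch exists. kunit_run_path() and kunit_trigger_suite() are hypothetical helper names, not part of KUnit.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build "<debugfs>/kunit/<suite>/run"; returns 0 or -ENAMETOOLONG. */
static int kunit_run_path(char *buf, size_t len, const char *debugfs,
			  const char *suite)
{
	int n;

	assert(buf && debugfs && suite);
	n = snprintf(buf, len, "%s/kunit/%s/run", debugfs, suite);
	return (n < 0 || (size_t)n >= len) ? -ENAMETOOLONG : 0;
}

/* Write an arbitrary string to the run file; its content is not saved. */
static int kunit_trigger_suite(const char *debugfs, const char *suite)
{
	static const char msg[] = "run\n";
	char path[256];
	int fd, ret;

	ret = kunit_run_path(path, sizeof(path), debugfs, suite);
	if (ret)
		return ret;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;	/* e.g. -ENOENT if the suite or file is absent */

	ret = write(fd, msg, strlen(msg)) == (ssize_t)strlen(msg) ? 0 : -EIO;
	close(fd);
	return ret;
}
```

A caller would use something like kunit_trigger_suite("/sys/kernel/debug", "<testsuite>") and then look for the results in the kernel log, as described in the commit message.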

[PATCH mptcp-next v13 0/4] bpf: Force to MPTCP

by Geliang Tang

As described in the "How to use MPTCP?" section of the MPTCP wiki [1]:

"Your app should create sockets with IPPROTO_MPTCP as the proto: ( socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP); ). Legacy apps can be forced to create and use MPTCP sockets instead of TCP ones via the mptcpize command bundled with the mptcpd daemon."

But the mptcpize (LD_PRELOAD technique) command has some limitations [2]:

 - it doesn't work if the application is not using libc (e.g. GoLang apps)
 - in some envs, it might not be easy to set env vars / change the way apps are launched, e.g. on Android
 - mptcpize needs to be launched with all apps that want MPTCP: we could have more control from BPF to enable MPTCP only for some apps or all the ones of a netns or a cgroup, etc.
 - it is not in BPF, we cannot talk about it at netdev conf.

So this patchset attempts to use BPF to implement functionality similar to mptcpize.

The main idea is to add a hook in sys_socket() to change the protocol id from IPPROTO_TCP (or 0) to IPPROTO_MPTCP.

[1] https://github.com/multipath-tcp/mptcp_net-next/wiki
[2] https://github.com/multipath-tcp/mptcp_net-next/issues/79

v13:
 - drop "Use random netns name for mptcp" patch.

v12:
 - update diag_* log of update_socket_protocol.
 - add 'ip netns show' after 'ip netns del' to check whether a test did not clean up its netns.
 - return libbpf_get_error() instead of -EIO for the error from open_and_load().
 - Use getsockopt(SOL_PROTOCOL) to verify mptcp protocol instead of using 'ss -tOni'.

v11:
 - add comments about outputs of 'ss' and 'nstat'.
 - use "err = verify_mptcpify()" instead of using =+.

v10:
 - drop "#ifdef CONFIG_BPF_JIT".
 - include vmlinux.h and bpf_tracing_net.h to avoid defining some macros.
 - drop unneeded checks for mptcp.

v9:
 - update comment for 'update_socket_protocol'.

v8:
 - drop the additional checks on the 'protocol' value after the 'update_socket_protocol()' call.

v7:
 - add __weak and __diag_* for update_socket_protocol.

v6:
 - add update_socket_protocol.

v5:
 - add bpf_mptcpify helper.

v4:
 - use lsm_cgroup/socket_create

v3:
 - patch 8: char cmd[128]; -> char cmd[256];

v2:
 - Fix build selftests errors reported by CI

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/79

Geliang Tang (4):
  bpf: Add update_socket_protocol hook
  selftests/bpf: Add two mptcp netns helpers
  selftests/bpf: Fix error checks of mptcp open_and_load
  selftests/bpf: Add mptcpify test

 net/mptcp/bpf.c                               |  15 ++
 net/socket.c                                  |  26 +++-
 .../testing/selftests/bpf/prog_tests/mptcp.c  | 141 +++++++++++++++---
 tools/testing/selftests/bpf/progs/mptcpify.c  |  20 +++
 4 files changed, 182 insertions(+), 20 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/mptcpify.c

-- 
2.35.3
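The protocol rewrite behind the main idea boils down to a small decision function. The sketch below expresses it in plain userspace C so it can be read and tested in isolation. It is not the BPF program from the series: mptcpify_protocol() is a hypothetical name, and IPPROTO_MPTCP is defined locally (262, its value in the Linux UAPI) in case the libc headers predate it.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262
#endif

/*
 * Return the protocol a socket should be created with: plain TCP
 * requests (explicit IPPROTO_TCP or the default 0) on IPv4/IPv6 stream
 * sockets become MPTCP, everything else is returned unchanged.
 */
static int mptcpify_protocol(int family, int type, int protocol)
{
	if ((family == AF_INET || family == AF_INET6) &&
	    type == SOCK_STREAM &&
	    (protocol == 0 || protocol == IPPROTO_TCP))
		return IPPROTO_MPTCP;

	return protocol;
}
```

For example, mptcpify_protocol(AF_INET, SOCK_STREAM, 0) yields IPPROTO_MPTCP, while a SOCK_DGRAM or AF_UNIX request keeps its original protocol. The sketch compares the type exactly, so socket type flags such as SOCK_NONBLOCK are not handled here.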

[PATCH v2] rcutorture: Copy out ftrace into its own console file

by Joel Fernandes (Google)

From: Joel Fernandes (Google) <joel(a)joelfernandes.org>

Often during debugging, it is difficult to jump to the ftrace dump in the console log and treat it independently of the rest of the log file. Copy the contents of the buffers into their own file to make it easier to refer to the ftrace dump. The original ftrace dump is still available in the console log if it is desired to refer to it there.

Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
v1-v2: Change log updates, "From:" updates.

 .../selftests/rcutorture/bin/functions.sh     | 24 +++++++++++++++++++
 .../selftests/rcutorture/bin/parse-console.sh |  7 ++++++
 2 files changed, 31 insertions(+)
 mode change 100644 => 100755 tools/testing/selftests/rcutorture/bin/functions.sh

diff --git a/tools/testing/selftests/rcutorture/bin/functions.sh b/tools/testing/selftests/rcutorture/bin/functions.sh
old mode 100644
new mode 100755
index b8e2ea23cb3f..2ec4ab87a7f0
--- a/tools/testing/selftests/rcutorture/bin/functions.sh
+++ b/tools/testing/selftests/rcutorture/bin/functions.sh
@@ -331,3 +331,27 @@ specify_qemu_net () {
 		echo $1 -net none
 	fi
 }
+
+# Extract the ftrace output from the console log output
+# The ftrace output in the logs looks like:
+# Dumping ftrace buffer:
+# ---------------------------------
+# [...]
+# ---------------------------------
+extract_ftrace_from_console() {
+	awk '
+	/Dumping ftrace buffer:/ {
+		capture = 1
+		next
+	}
+	/---------------------------------/ {
+		if(capture == 1) {
+			capture = 2
+			next
+		} else if(capture == 2) {
+			capture = 0
+		}
+	}
+	capture == 2
+	' "$1";
+}
diff --git a/tools/testing/selftests/rcutorture/bin/parse-console.sh b/tools/testing/selftests/rcutorture/bin/parse-console.sh
index 9ab0f6bc172c..e3d2f69ec0fb 100755
--- a/tools/testing/selftests/rcutorture/bin/parse-console.sh
+++ b/tools/testing/selftests/rcutorture/bin/parse-console.sh
@@ -182,3 +182,10 @@ if ! test -s $file.diags
 then
 	rm -f $file.diags
 fi
+
+# Call extract_ftrace_from_console function, if the output is empty,
+# don't create $file.ftrace. Otherwise output the results to $file.ftrace
+extract_ftrace_from_console $file > $file.ftrace
+if [ ! -s $file.ftrace ]; then
+	rm -f $file.ftrace
+fi
-- 
2.41.0.640.ga95def55d0-goog
