February 2025 - Linux-kselftest-mirror

[PATCHv4 RESEND net-next 0/2] selftests: wireguards: use nftables for testing

by Hangbin Liu

This patch set convert iptables to nftables for wireguard testing, as iptables is deparated and nftables is the default framework of most releases. v3: drop iptables directly (Jason A. Donenfeld) Also convert to using nft for qemu testing (Jason A. Donenfeld) v2: use one nft table for testing (Phil Sutter) Hangbin Liu (2): selftests: wireguards: convert iptables to nft selftests: wireguard: update to using nft for qemu test tools/testing/selftests/wireguard/netns.sh | 29 +++++++++----- .../testing/selftests/wireguard/qemu/Makefile | 40 ++++++++++++++----- .../selftests/wireguard/qemu/kernel.config | 7 ++-- 3 files changed, 53 insertions(+), 23 deletions(-) -- 2.46.0

5 months

3
13
0 0

[PATCH 0/3] bitmap: convert self-test to KUnit

by Tamir Duberstein

This is one of just 3 remaining "Test Module" kselftests (the others being printf and scanf), the rest having been converted to KUnit. I tested this using: $ tools/testing/kunit/kunit.py run --arch arm64 --make_options LLVM=1 bitmap. I've already sent out a conversion series for each of printf[0] and scanf[1]. There was a previous attempt[2] to do this in July 2024. Please bear with me as I try to understand and address the objections from that time. I've spoken with Muhammad Usama Anjum, the author of that series, and received their approval to "take over" this work. Here we go... On 7/26/24 11:45 PM, John Hubbard wrote: > > This changes the situation from "works for Linus' tab completion > case", to "causes a tab completion problem"! :) > > I think a tests/ subdir is how we eventually decided to do this [1], > right? > > So: > > lib/tests/bitmap_kunit.c > > [1] https://lore.kernel.org/20240724201354.make.730-kees@kernel.org This is true and unfortunate, but not trivial to fix because new kallsyms tests were placed in lib/tests in commit 84b4a51fce4c ("selftests: add new kallsyms selftests") *after* the KUnit filename best practices were adopted. I propose that the KUnit maintainers blaze this trail using `string_kunit.c` which currently still lives in lib/ despite the KUnit docs giving it as an example at lib/tests/. On 7/27/24 12:24 AM, Shuah Khan wrote: > > This change will take away the ability to run bitmap tests during > boot on a non-kunit kernel. > > Nack on this change. I wan to see all tests that are being removed > from lib because they have been converted - also it doesn't make > sense to convert some tests like this one that add the ability test > during boot. This point was also discussed in another thread[3] in which: On 7/27/24 12:35 AM, Shuah Khan wrote: > > Please make sure you aren't taking away the ability to run these tests during > boot. > > It doesn't make sense to convert every single test especially when it > is intended to be run during boot without dependencies - not as a kunit test > but a regression test during boot. > > bitmap is one example - pay attention to the config help test - bitmap > one clearly states it runs regression testing during boot. Any test that > says that isn't a candidate for conversion. > > I am going to nack any such conversions. The crux of the argument seems to be that the config help text is taken to describe the author's intent with the fragment "at boot". I think this may be a case of confirmation bias: I see at least the following KUnit tests with "at boot" in their help text: - CPUMASK_KUNIT_TEST - BITFIELD_KUNIT - CHECKSUM_KUNIT - UTIL_MACROS_KUNIT It seems to me that the inference being made is that any test that runs "at boot" is intended to be run by both developers and users, but I find no evidence that bitmap in particular would ever provide additional value when run by users. There's further discussion about KUnit not being "ideal for cases where people would want to check a subsystem on a running kernel", but I find no evidence that bitmap in particular is actually testing the running kernel; it is a unit test of the bitmap functions, which is also stated in the config help text. David Gow made many of the same points in his final reply[4], which was never replied to. Link: https://lore.kernel.org/all/20250207-printf-kunit-convert-v2-0-057b23860823… [0] Link: https://lore.kernel.org/all/20250207-scanf-kunit-convert-v4-0-a23e2afaede8@… [1] Link: https://lore.kernel.org/all/20240726110658.2281070-1-usama.anjum@collabora.… [2] Link: https://lore.kernel.org/all/327831fb-47ab-4555-8f0b-19a8dbcaacd7@collabora.… [3] Link: https://lore.kernel.org/all/CABVgOSmMoPD3JfzVd4VTkzGL2fZCo8LfwzaVSzeFimPrhg… [4] Thanks for your attention. Signed-off-by: Tamir Duberstein <tamird(a)gmail.com> --- Tamir Duberstein (3): bitmap: remove _check_eq_u32_array bitmap: convert self-test to KUnit bitmap: break kunit into test cases MAINTAINERS | 2 +- arch/m68k/configs/amiga_defconfig | 1 - arch/m68k/configs/apollo_defconfig | 1 - arch/m68k/configs/atari_defconfig | 1 - arch/m68k/configs/bvme6000_defconfig | 1 - arch/m68k/configs/hp300_defconfig | 1 - arch/m68k/configs/mac_defconfig | 1 - arch/m68k/configs/multi_defconfig | 1 - arch/m68k/configs/mvme147_defconfig | 1 - arch/m68k/configs/mvme16x_defconfig | 1 - arch/m68k/configs/q40_defconfig | 1 - arch/m68k/configs/sun3_defconfig | 1 - arch/m68k/configs/sun3x_defconfig | 1 - arch/powerpc/configs/ppc64_defconfig | 1 - lib/Kconfig.debug | 24 +- lib/Makefile | 2 +- lib/{test_bitmap.c => bitmap_kunit.c} | 454 +++++++++++++--------------------- tools/testing/selftests/lib/bitmap.sh | 3 - tools/testing/selftests/lib/config | 1 - 19 files changed, 195 insertions(+), 304 deletions(-) --- base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b change-id: 20250207-bitmap-kunit-convert-92d3147b2eee Best regards, -- Tamir Duberstein <tamird(a)gmail.com>

5 months

8
26
0 0

[PATCH 0/2] fs/proc/task_mmu: add guard region bit to pagemap

by Lorenzo Stoakes

Currently there is no means of determining whether a give page in a mapping range is designated a guard region (as installed via madvise() using the MADV_GUARD_INSTALL flag). This is generally not an issue, but in some instances users may wish to determine whether this is the case. This series adds this ability via /proc/$pid/pagemap, updates the documentation and adds a self test to assert that this functions correctly. Lorenzo Stoakes (2): fs/proc/task_mmu: add guard region bit to pagemap tools/selftests: add guard region test for /proc/$pid/pagemap Documentation/admin-guide/mm/pagemap.rst | 3 +- fs/proc/task_mmu.c | 6 ++- tools/testing/selftests/mm/guard-regions.c | 47 ++++++++++++++++++++++ tools/testing/selftests/mm/vm_util.h | 1 + 4 files changed, 55 insertions(+), 2 deletions(-) -- 2.48.1

5 months, 1 week

5
14
0 0

[PATCH bpf-next 0/2] selftests/bpf: Migrate test_xdp_vlan.sh into test_progs

by Bastien Curutchet (eBPF Foundation)

Hi all, This patch series continues the work to migrate the script tests into prog_tests. test_xdp_vlan.sh tests the ability of an XDP program to modify the VLAN ids on the fly. This isn't currently covered by an other test in the test_progs framework so I add a new file prog_tests/xdp_vlan.c that does the exact same tests (same network topology, same BPF programs) and remove the script. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com> --- Bastien Curutchet (eBPF Foundation) (2): selftests/bpf: test_xdp_vlan: Rename BPF sections selftests/bpf: Migrate test_xdp_vlan.sh into test_progs tools/testing/selftests/bpf/Makefile | 4 +- tools/testing/selftests/bpf/prog_tests/xdp_vlan.c | 175 ++++++++++++++++ tools/testing/selftests/bpf/progs/test_xdp_vlan.c | 20 +- tools/testing/selftests/bpf/test_xdp_vlan.sh | 233 --------------------- .../selftests/bpf/test_xdp_vlan_mode_generic.sh | 9 - .../selftests/bpf/test_xdp_vlan_mode_native.sh | 9 - 6 files changed, 186 insertions(+), 264 deletions(-) --- base-commit: a814b9be27fb3c3f49343aee4b015b76f5875558 change-id: 20250130-xdp_vlan-e825cc4df14a Best regards, -- Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>

5 months, 1 week

4
8
0 0

[PATCH v4 00/12] Direct Map Removal for guest_memfd

by Patrick Roy

Unmapping virtual machine guest memory from the host kernel's direct map is a successful mitigation against Spectre-style transient execution issues: If the kernel page tables do not contain entries pointing to guest memory, then any attempted speculative read through the direct map will necessarily be blocked by the MMU before any observable microarchitectural side-effects happen. This means that Spectre-gadgets and similar cannot be used to target virtual machine memory. Roughly 60% of speculative execution issues fall into this category [1, Table 1]. This patch series extends guest_memfd with the ability to remove its memory from the host kernel's direct map, to be able to attain the above protection for KVM guests running inside guest_memfd. === Changes to RFC v3 === - Settle relationship between direct map removal and shared/private memory in guest_memfd (David H.) - Omit TLB flushes upon direct map removal again - Settle uABI for how KVM accesses guest memory in non-CoCo guest_memfd VMs (upstream guest_memfd calls) - Add selftests that exercise the codepaths of non-CoCo guest_memfd VMs Lastly, this series is rebased on top of Fuad's v4 for shared mapping of guest_memfd [2]. The KVM parts should also apply on top of 0ad2507d5d93 ("Linux 6.14-rc3"), but the selftest patches need Fuad's series as base. === Overview === guest_memfd should be usable for "non-CoCo" VMs - virtual machines where host userspace is trusted (e.g. can have access to all of guest memory), but which should still be hardened against speculative execution attacks (Spectre, etc.) staged through potentially existing gadgets in the host kernel. To attain this hardening, unmap guest memory from the host kernels address space (e.g. zap direct map entries), while allowing KVM to continue accessing guest memory through userspace mappings. This works because KVM already almost always uses userspace mappings whenever KVM needs to access guest memory - the only parts that require direct map entries (because they use GUP) are KVM's MMU, and kvm-clock on x86. Building on top of guest_memfd sidesteps the MMU problem, as for memslots with KVM_MEM_GUEST_MEMFD set, the MMU consumes fd + offset directly without going through any VMAs. kvm-clock on the other hand is not strictly needed (guests boot fine without it), so ignore it for now. === Implementation === Make KVM_CREATE_GUEST_MEMFD accept a flag (KVM_GMEM_NO_DIRECT_MAP) that instructs it to remove newly allocated folios from the host kernels direct map immediately after preparation. Nothing further is needed to make non-CoCo VMs work - particularly, KVM does not need to be taught any special ways of accessing guest memory if it is in guest_memfd. Userspace can simply mmap guest_memfd (via KVM_GMEM_SHARED_MEM added in Fuad's series), and set the memslot's userspace_addr to this userspace mapping of guest_memfd. === Open Questions === In this patch series, stale TLB entries do not get flushed after direct map entries are marked as not present. This is fine from a functional point of view (as the mapping is still valid, it's just temporarily not supposed to be used), but pokes a theoretical hole into the speculation protection: Something could try to keep alive stale TLB entries for specific pages until the guest starts using them for sensitive information, and then stage a Spectre attack on that memory, despite it being unmapped. In practice, this would require knowing in advance, at gmem fault-time, which pages will eventually contain information of interest, and then preventing these specific TLB entries from getting naturally evicted (where the number of pages that can be targeted like this is limited by the size of the TLB). These seem to be fairly difficult requisites to fulfill, but we were wondering what the community thinks. === Summary === Patch 1 adds a struct address_space flag that indices that folios in a mapping are direct map removed, and threads it through mm code to ensure direct map removed folios don't end up in places where they can cause mayhem (particularly, we reject them in get_user_pages). Since these checks end up being duplicates of already existing checks for secretmem folios, patch 2 unifies the two by using the new address_space flag for secretmem mappings. Patches 3 through 5 are about support for direct map removal in guest_memfd, while patches 6 through 12 are about testing the non-CoCo setup in KVM selftests, with patches 6 through 9 being preparatory, and patches 10 through 12 adding the actual test cases. [1]: https://download.vusec.net/papers/quarantine_raid23.pdf [2]: https://lore.kernel.org/kvm/20250218172500.807733-1-tabba@google.com/ [RFC v1]: https://lore.kernel.org/kvm/20240709132041.3625501-1-roypat@amazon.co.uk/ [RFC v2]: https://lore.kernel.org/kvm/20240910163038.1298452-1-roypat@amazon.co.uk/ [RFC v3]: https://lore.kernel.org/kvm/20241030134912.515725-1-roypat@amazon.co.uk/ Patrick Roy (12): mm: introduce AS_NO_DIRECT_MAP mm/secretmem: set AS_NO_DIRECT_MAP instead of special-casing KVM: guest_memfd: Add flag to remove from direct map KVM: Add capability to discover KVM_GMEM_NO_DIRECT_MAP support KVM: Documentation: document KVM_GMEM_NO_DIRECT_MAP flag KVM: selftests: load elf via bounce buffer KVM: selftests: set KVM_MEM_GUEST_MEMFD in vm_mem_add() if guest_memfd != -1 KVM: selftests: Add guest_memfd based vm_mem_backing_src_types KVM: selftests: stuff vm_mem_backing_src_type into vm_shape KVM: selftests: adjust test_create_guest_memfd_invalid KVM: selftests: set KVM_GMEM_NO_DIRECT_MAP in mem conversion tests KVM: selftests: Test guest execution from direct map removed gmem Documentation/virt/kvm/api.rst | 13 ++++ include/linux/pagemap.h | 16 +++++ include/linux/secretmem.h | 18 ------ include/uapi/linux/kvm.h | 3 + lib/buildid.c | 4 +- mm/gup.c | 14 +--- mm/mlock.c | 2 +- mm/secretmem.c | 6 +- .../testing/selftests/kvm/guest_memfd_test.c | 2 +- .../testing/selftests/kvm/include/kvm_util.h | 29 ++++++--- .../testing/selftests/kvm/include/test_util.h | 8 +++ tools/testing/selftests/kvm/lib/elf.c | 8 +-- tools/testing/selftests/kvm/lib/io.c | 23 +++++++ tools/testing/selftests/kvm/lib/kvm_util.c | 64 +++++++++++-------- tools/testing/selftests/kvm/lib/test_util.c | 8 +++ tools/testing/selftests/kvm/lib/x86/sev.c | 1 + .../selftests/kvm/pre_fault_memory_test.c | 1 + .../selftests/kvm/set_memory_region_test.c | 40 ++++++++++++ .../kvm/x86/private_mem_conversions_test.c | 7 +- virt/kvm/guest_memfd.c | 24 ++++++- virt/kvm/kvm_main.c | 5 ++ 21 files changed, 214 insertions(+), 82 deletions(-) base-commit: da40655874b54a2b563f8ceb3ed839c6cd38e0b4 -- 2.48.1

5 months, 1 week

3
25
0 0

[PATCH v3 00/10] selftests/mm: Some cleanups from trying to run them

by Brendan Jackman

I never had much luck running mm selftests so I spent a few hours digging into why. Looks like most of the reason is missing SKIP checks, so this series is just adding a bunch of those that I found. I did not do anything like all of them, just the ones I spotted in gup_longterm, gup_test, mmap, userfaultfd and memfd_secret. It's a bit unfortunate to have to skip those tests when ftruncate() fails, but I don't have time to dig deep enough into it to actually make them pass. I have observed the issue on 9pfs and heard rumours that NFS has a similar problem. I'm now able to run these test groups successfully: - mmap - gup_test - compaction - migration - page_frag - userfaultfd Signed-off-by: Brendan Jackman <jackmanb(a)google.com> --- Changes in v3: - Added fix for userfaultfd tests. - Dropped attempts to use sudo. - Fixed garbage printf in uffd-stress. (Added EXTRA_CFLAGS=-Werror FORCE_TARGETS=1 to my scripts to prevent such errors happening again). - Fixed missing newlines in ksft_test_result_skip() calls. - Link to v2: https://lore.kernel.org/r/20250221-mm-selftests-v2-0-28c4d66383c5@google.com Changes in v2 (Thanks to Dev for the reviews): - Improve and cleanup some error messages - Add some extra SKIPs - Fix misnaming of nr_cpus variable in uffd tests - Link to v1: https://lore.kernel.org/r/20250220-mm-selftests-v1-0-9bbf57d64463@google.com --- Brendan Jackman (10): selftests/mm: Report errno when things fail in gup_longterm selftests/mm: Skip uffd-stress if userfaultfd not available selftests/mm: Skip uffd-wp-mremap if userfaultfd not available selftests/mm/uffd: Rename nr_cpus -> nr_threads selftests/mm: Print some details when uffd-stress gets bad params selftests/mm: Don't fail uffd-stress if too many CPUs selftests/mm: Skip map_populate on weird filesystems selftests/mm: Skip gup_longerm tests on weird filesystems selftests/mm: Drop unnecessary sudo usage selftests/mm: Ensure uffd-wp-mremap gets pages of each size tools/testing/selftests/mm/gup_longterm.c | 45 ++++++++++++++++++---------- tools/testing/selftests/mm/map_populate.c | 7 +++++ tools/testing/selftests/mm/run_vmtests.sh | 25 ++++++++++++++-- tools/testing/selftests/mm/uffd-common.c | 8 ++--- tools/testing/selftests/mm/uffd-common.h | 2 +- tools/testing/selftests/mm/uffd-stress.c | 42 ++++++++++++++++---------- tools/testing/selftests/mm/uffd-unit-tests.c | 2 +- tools/testing/selftests/mm/uffd-wp-mremap.c | 5 +++- 8 files changed, 95 insertions(+), 41 deletions(-) --- base-commit: 76544811c850a1f4c055aa182b513b7a843868ea change-id: 20250220-mm-selftests-2d7d0542face Best regards, -- Brendan Jackman <jackmanb(a)google.com>

5 months, 1 week

4
36
0 0

[PATCH v4 0/6] Fix some issues related to an interrupt type in pci_endpoint_test

by Kunihiko Hayashi

This series solves some issues about global "irq_type" that is used for indicating the current type for users. In addition, avoid an unexpected warning that occur due to interrupts remaining after displaying an error caused by devm_request_irq(). Patch 1 includes adding GET_IRQTYPE test (check for failure). Patch 2-4 include fixes for stable kernels that have global "irq_type". Patch 5-6 include improvements for the latest. Changes since v3: - Add GET_IRQTYPE check to pci_endpoint test in selftests - Add the reason why global variables aren't necessary (patch 5/6) - Add Reviewed-by: lines (patch {2, 4, 6}/6) Changes since v2: - Rebase to v6.14-rc1 - Update message to clarify, and add result of call trace (patch 1/5) - Add Reviewed-by: lines (patch 2/5) - Add new patch to remove global "irq_type" variable (patch 4/5) - Add new patch to replace "devm" version of IRQ functions (patch 5/5) Changes since v1: - Divide original patch into two - Add an error message example - Add "pcitest" display example - Add a patch to fix an interrupt remaining issue Kunihiko Hayashi (6): selftests: pci_endpoint: Add GET_IRQTYPE checks to each interrupt test misc: pci_endpoint_test: Avoid issue of interrupts remaining after request_irq error misc: pci_endpoint_test: Fix displaying irq_type after request_irq error misc: pci_endpoint_test: Fix irq_type to convey the correct type misc: pci_endpoint_test: Remove global 'irq_type' and 'no_msi' misc: pci_endpoint_test: Do not use managed irq functions drivers/misc/pci_endpoint_test.c | 31 +++++++------------ .../pci_endpoint/pci_endpoint_test.c | 11 ++++++- 2 files changed, 21 insertions(+), 21 deletions(-) -- 2.25.1

5 months, 2 weeks

4
11
0 0

[PATCH v7 00/10] iommufd: Add vIOMMU infrastructure (Part-2: vDEVICE)

by Nicolin Chen

Following the previous vIOMMU series, this adds another vDEVICE structure, representing the association from an iommufd_device to an iommufd_viommu. This gives the whole architecture a new "v" layer: _______________________________________________________________________ | iommufd (with vIOMMU/vDEVICE) | | _____________ _____________ | | | | | | | | |----------------| vIOMMU |<---| vDEVICE |<------| | | | | | |_____________| | | | | ______ | | _____________ ___|____ | | | | | | | | | | | | | | | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE | | | | |______| |_____________| |_____________| |________| | |______|________|______________|__________________|_______________|_____| | | | | | ______v_____ | ______v_____ ______v_____ ___v__ | struct | | PFN | (paging) | | (nested) | |struct| |iommu_device| |------>|iommu_domain|<----|iommu_domain|<----|device| |____________| storage|____________| |____________| |______| This vDEVICE object is used to collect and store all vIOMMU-related device information/attributes in a VM. As an initial series for vDEVICE, add only the virt_id to the vDEVICE, which is a vIOMMU specific device ID in a VM: e.g. vSID of ARM SMMUv3, vDeviceID of AMD IOMMU, and vRID of Intel VT-d to a Context Table. This virt_id helps IOMMU drivers to link the vID to a pID of the device against the physical IOMMU instance. This is essential for a vIOMMU-based invalidation, where the request contains a device's vID for a device cache flush, e.g. ATC invalidation. Therefore, with this vDEVICE object, support a vIOMMU-based invalidation, by reusing IOMMUFD_CMD_HWPT_INVALIDATE for a vIOMMU object to flush cache with a given driver data. As for the implementation of the series, add driver support in ARM SMMUv3 for a real world use case. This series is on Github: https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p2-v7 (QEMU branch for testing will be provided in Jason's nesting series) Changelog v7 * Added "Reviewed-by" from Jason * Corrected a line of comments in iommufd_vdevice_destroy() v6 https://lore.kernel.org/all/cover.1730313494.git.nicolinc@nvidia.com/ * Fixed kdoc in the uAPI header * Fixed indentations in iommufd.rst * Replaced vdev->idev with vdev->dev * Added "Reviewed-by" from Kevin and Jason * Updated kdoc of struct iommu_vdevice_alloc * Fixed lockdep function call in iommufd_viommu_find_dev * Added missing iommu_dev validation between viommu and idev * Skipped SMMUv3 driver changes (to post in a separate series) * Replaced !cache_invalidate_user in WARN_ON of the allocation path with cache_invalidate_user validation in iommufd_hwpt_invalidate v5 https://lore.kernel.org/all/cover.1729897278.git.nicolinc@nvidia.com/ * Dropped driver-allocated vDEVICE support * Changed vdev_to_dev helper to iommufd_viommu_find_dev v4 https://lore.kernel.org/all/cover.1729555967.git.nicolinc@nvidia.com/ * Added missing brackets in switch-case * Fixed the unreleased idev refcount issue * Reworked the iommufd_vdevice_alloc allocator * Dropped support for IOMMU_VIOMMU_TYPE_DEFAULT * Added missing TEST_LENGTH and fail_nth coverages * Added a verification to the driver-allocated vDEVICE object * Added an iommufd_vdevice_abort for a missing mutex protection * Added a u64 structure arm_vsmmu_invalidation_cmd for user command conversion v3 https://lore.kernel.org/all/cover.1728491532.git.nicolinc@nvidia.com/ * Added Jason's Reviewed-by * Split this invalidation part out of the part-1 series * Repurposed VDEV_ID ioctl to a wider vDEVICE structure and ioctl * Reduced viommu_api functions by allowing drivers to access viommu and vdevice structure directly * Dropped vdevs_rwsem by using xa_lock instead * Dropped arm_smmu_cache_invalidate_user v2 https://lore.kernel.org/all/cover.1724776335.git.nicolinc@nvidia.com/ * Limited vdev_id to one per idev * Added a rw_sem to protect the vdev_id list * Reworked driver-level APIs with proper lockings * Added a new viommu_api file for IOMMUFD_DRIVER config * Dropped useless iommu_dev point from the viommu structure * Added missing index numnbers to new types in the uAPI header * Dropped IOMMU_VIOMMU_INVALIDATE uAPI; Instead, reuse the HWPT one * Reworked mock_viommu_cache_invalidate() using the new iommu helper * Reordered details of set/unset_vdev_id handlers for proper lockings v1 https://lore.kernel.org/all/cover.1723061377.git.nicolinc@nvidia.com/ Thanks! Nicolin Jason Gunthorpe (1): iommu: Add iommu_copy_struct_from_full_user_array helper Nicolin Chen (9): iommufd/viommu: Add IOMMUFD_OBJ_VDEVICE and IOMMU_VDEVICE_ALLOC ioctl iommufd/selftest: Add IOMMU_VDEVICE_ALLOC test coverage iommu/viommu: Add cache_invalidate to iommufd_viommu_ops iommufd: Allow hwpt_id to carry viommu_id for IOMMU_HWPT_INVALIDATE iommufd/viommu: Add iommufd_viommu_find_dev helper iommufd/selftest: Add mock_viommu_cache_invalidate iommufd/selftest: Add IOMMU_TEST_OP_DEV_CHECK_CACHE test command iommufd/selftest: Add vIOMMU coverage for IOMMU_HWPT_INVALIDATE ioctl Documentation: userspace-api: iommufd: Update vDEVICE drivers/iommu/iommufd/iommufd_private.h | 18 ++ drivers/iommu/iommufd/iommufd_test.h | 30 +++ include/linux/iommu.h | 48 ++++- include/linux/iommufd.h | 22 ++ include/uapi/linux/iommufd.h | 31 ++- tools/testing/selftests/iommu/iommufd_utils.h | 83 +++++++ drivers/iommu/iommufd/driver.c | 13 ++ drivers/iommu/iommufd/hw_pagetable.c | 40 +++- drivers/iommu/iommufd/main.c | 6 + drivers/iommu/iommufd/selftest.c | 98 ++++++++- drivers/iommu/iommufd/viommu.c | 76 +++++++ tools/testing/selftests/iommu/iommufd.c | 204 +++++++++++++++++- .../selftests/iommu/iommufd_fail_nth.c | 4 + Documentation/userspace-api/iommufd.rst | 41 +++- 14 files changed, 688 insertions(+), 26 deletions(-) base-commit: 0780dd4af09a5360392f5c376c35ffc2599a9c0e -- 2.43.0

5 months, 2 weeks

4
14
0 0

[PATCH v15 00/15] PCI: EP: Add RC-to-EP doorbell with platform MSI controller

by Frank Li

┌────────────┐ ┌───────────────────────────────────┐ ┌────────────────┐ │ │ │ │ │ │ │ │ │ PCI Endpoint │ │ PCI Host │ │ │ │ │ │ │ │ │◄──┤ 1.platform_msi_domain_alloc_irqs()│ │ │ │ │ │ │ │ │ │ MSI ├──►│ 2.write_msi_msg() ├──►├─BAR<n> │ │ Controller │ │ update doorbell register address│ │ │ │ │ │ for BAR │ │ │ │ │ │ │ │ 3. Write BAR<n>│ │ │◄──┼───────────────────────────────────┼───┤ │ │ │ │ │ │ │ │ ├──►│ 4.Irq Handle │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ └────────────┘ └───────────────────────────────────┘ └────────────────┘ This patches based on old https://lore.kernel.org/imx/20221124055036.1630573-1-Frank.Li@nxp.com/ Original patch only target to vntb driver. But actually it is common method. This patches add new API to pci-epf-core, so any EP driver can use it. Previous v2 discussion here. https://lore.kernel.org/imx/20230911220920.1817033-1-Frank.Li@nxp.com/ Changes in v15: - rebase to v6.14-rc1 - fix build issue find by kernel test robot - Link to v14: https://lore.kernel.org/r/20250207-ep-msi-v14-0-9671b136f2b8@nxp.com Changes in v14: Marc Zyngier raised concerns about adding DOMAIN_BUS_DEVICE_PCI_EP_MSI. As a result, the approach has been reverted to the v9 method. However, there are several improvements: MSI now supports msi-map in addition to msi-parent. - The struct device: id is used as the endpoint function (EPF) device identity to map to the stream ID (sideband information). - The EPC device tree source (DTS) utilizes msi-map to provide such information. - The EPF device's of_node is set to the EPC controller’s node. This approach is commonly used for multi-function device (MFD) platform child devices, allowing them to inherit properties from the MFD device’s DTS, such as reset-cells and gpio-cells. This method is well-suited for the current case, as the EPF is inherently created/binded to the EPC and should inherit the EPC’s DTS node properties. Additionally: Since the basic IMX95 LUT support has already been merged into the mainline, a DTS and driver increment patch is added to complete the solution. The patch is rebased onto the latest linux-next tree and aligned with the new pcitest framework. - Link to v13: https://lore.kernel.org/r/20241218-ep-msi-v13-0-646e2192dc24@nxp.com Changes in v13: - Change to use DOMAIN_BUS_PCI_DEVICE_EP_MSI - Change request id as func | vfunc << 3 - Remove IRQ_DOMAIN_MSI_IMMUTABLE Thomas Gleixner: I hope capture all your points in review comments. If missed, let me know. - Link to v12: https://lore.kernel.org/r/20241211-ep-msi-v12-0-33d4532fa520@nxp.com Changes in v12: - Change to use IRQ_DOMAIN_MSI_IMMUTABLE and add help function irq_domain_msi_is_immuatble(). - split PCI: endpoint: pci-ep-msi: Add MSI address/data pair mutable check to 3 patches - Link to v11: https://lore.kernel.org/r/20241209-ep-msi-v11-0-7434fa8397bd@nxp.com Changes in v11: - Change to use MSI_FLAG_MSG_IMMUTABLE - Link to v10: https://lore.kernel.org/r/20241204-ep-msi-v10-0-87c378dbcd6d@nxp.com Changes in v10: Thomas Gleixner: There are big change in pci-ep-msi.c. I am sure if go on the corrent path. The key improvement is remove only 1 function devices's limitation. I use new patch for imutable check, which relative additional feature compared to base enablement patch. - Remove patch Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all() - Add new patch irqchip/gic-v3-its: Avoid overwriting msi_prepare callback if provided by msi_domain_info - Remove only support 1 endpoint function limiation. - Create one MSI domain for each endpoint function devices. - Use "msi-map" in pci ep controler node, instead of of msi-parent. first argument is (func_no << 8 | vfunc_no) - Link to v9: https://lore.kernel.org/r/20241203-ep-msi-v9-0-a60dbc3f15dd@nxp.com Changes in v9 - Add patch platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all() - Remove patch PCI: endpoint: Add pci_epc_get_fn() API for customizable filtering - Remove API pci_epf_align_inbound_addr_lo_hi - Move doorbell_alloc in to doorbell_enable function. - Link to v8: https://lore.kernel.org/r/20241116-ep-msi-v8-0-6f1f68ffd1bb@nxp.com Changes in v8: - update helper function name to pci_epf_align_inbound_addr() - Link to v7: https://lore.kernel.org/r/20241114-ep-msi-v7-0-d4ac7aafbd2c@nxp.com Changes in v7: - Add helper function pci_epf_align_addr(); - Link to v6: https://lore.kernel.org/r/20241112-ep-msi-v6-0-45f9722e3c2a@nxp.com Changes in v6: - change doorbell_addr to doorbell_offset - use round_down() - add Niklas's test by tag - rebase to pci/endpoint - Link to v5: https://lore.kernel.org/r/20241108-ep-msi-v5-0-a14951c0d007@nxp.com Changes in v5: - Move request_irq to epf test function driver for more flexiable user case - Add fixed size bar handler - Some minor improvememtn to see each patches's changelog. - Link to v4: https://lore.kernel.org/r/20241031-ep-msi-v4-0-717da2d99b28@nxp.com Changes in v4: - Remove patch genirq/msi: Add cleanup guard define for msi_lock_descs()/msi_unlock_descs() - Use new method to avoid compatible problem. Add new command DOORBELL_ENABLE and DOORBELL_DISABLE. pcitest -B send DOORBELL_ENABLE first, EP test function driver try to remap one of BAR_N (except test register bar) to ITS MSI MMIO space. Old driver don't support new command, so failure return, not side effect. After test, DOORBELL_DISABLE command send out to recover original map, so pcitest bar test can pass as normal. - Other detail change see each patches's change log - Link to v3: https://lore.kernel.org/r/20241015-ep-msi-v3-0-cedc89a16c1a@nxp.com Change from v2 to v3 - Fixed manivannan's comments - Move common part to pci-ep-msi.c and pci-ep-msi.h - rebase to 6.12-rc1 - use RevID to distingiush old version mkdir /sys/kernel/config/pci_ep/functions/pci_epf_test/func1 echo 16 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/msi_interrupts echo 0x080c > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/deviceid echo 0x1957 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/vendorid echo 1 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/revid ^^^^^^ to enable platform msi support. ln -s /sys/kernel/config/pci_ep/functions/pci_epf_test/func1 /sys/kernel/config/pci_ep/controllers/4c380000.pcie-ep - use new device ID, which identify support doorbell to avoid broken compatility. Enable doorbell support only for PCI_DEVICE_ID_IMX8_DB, while other devices keep the same behavior as before. EP side RC with old driver RC with new driver PCI_DEVICE_ID_IMX8_DB no probe doorbell enabled Other device ID doorbell disabled* doorbell disabled* * Behavior remains unchanged. Change from v1 to v2 - Add missed patch for endpont/pci-epf-test.c - Move alloc and free to epc driver from epf. - Provide general help function for EPC driver to alloc platform msi irq. - Fixed manivannan's comments. Signed-off-by: Frank Li <Frank.Li(a)nxp.com> --- Frank Li (15): platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all() irqdomain: Add IRQ_DOMAIN_FLAG_MSI_IMMUTABLE and irq_domain_is_msi_immutable() irqchip/gic-v3-its: Set IRQ_DOMAIN_FLAG_MSI_IMMUTABLE for ITS irqchip/gic-v3-its: Add support for device tree msi-map and msi-mask PCI: endpoint: Set ID and of_node for function driver PCI: endpoint: Add RC-to-EP doorbell support using platform MSI controller PCI: endpoint: pci-ep-msi: Add MSI address/data pair mutable check PCI: endpoint: Add pci_epf_align_inbound_addr() helper for address alignment PCI: endpoint: pci-epf-test: Add doorbell test support misc: pci_endpoint_test: Add doorbell test case selftests: pci_endpoint: Add doorbell test case pci: imx6: Add helper function imx_pcie_add_lut_by_rid() pci: imx6: Add LUT setting for MSI/IOMMU in Endpoint mode arm64: dts: imx95: Add msi-map for pci-ep device arm64: dts: imx95-19x19-evk: Add PCIe1 endpoint function overlay file arch/arm64/boot/dts/freescale/Makefile | 3 + .../dts/freescale/imx95-19x19-evk-pcie1-ep.dtso | 21 ++++ arch/arm64/boot/dts/freescale/imx95.dtsi | 1 + drivers/base/platform-msi.c | 1 + drivers/irqchip/irq-gic-v3-its-msi-parent.c | 8 ++ drivers/irqchip/irq-gic-v3-its.c | 2 +- drivers/misc/pci_endpoint_test.c | 81 +++++++++++++ drivers/pci/controller/dwc/pci-imx6.c | 25 ++-- drivers/pci/endpoint/Makefile | 1 + drivers/pci/endpoint/functions/pci-epf-test.c | 132 +++++++++++++++++++++ drivers/pci/endpoint/pci-ep-msi.c | 90 ++++++++++++++ drivers/pci/endpoint/pci-epf-core.c | 48 ++++++++ include/linux/irqdomain.h | 7 ++ include/linux/pci-ep-msi.h | 28 +++++ include/linux/pci-epf.h | 21 ++++ include/uapi/linux/pcitest.h | 1 + .../selftests/pci_endpoint/pci_endpoint_test.c | 25 ++++ 17 files changed, 486 insertions(+), 9 deletions(-) --- base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b change-id: 20241010-ep-msi-8b4cab33b1be Best regards, --- Frank Li <Frank.Li(a)nxp.com>

5 months, 2 weeks

4
28
0 0

[PATCH v9 0/8] Buddy allocator like (or non-uniform) folio split

by Zi Yan

Hi all, This patchset adds a new buddy allocator like (or non-uniform) large folio split from a order-n folio to order-m with m < n. It reduces 1. the total number of after-split folios from 2^(n-m) to n-m+1; 2. the amount of memory needed for multi-index xarray split from 2^(n/6-m/6) to n/6-m/6, assuming XA_CHUNK_SHIFT=6; 3. keep more large folios after a split from all order-m folios to order-(n-1) to order-m folios. For example, to split an order-9 to order-0, folio split generates 10 (or 11 for anonymous memory) folios instead of 512, allocates 1 xa_node instead of 8, and leaves 1 order-8, 1 order-7, ..., 1 order-1 and 2 order-0 folios (or 4 order-0 for anonymous memory) instead of 512 order-0 folios. Instead of duplicating existing split_huge_page*() code, __folio_split() is introduced as the shared backend code for both split_huge_page_to_list_to_order() and folio_split(). __folio_split() can support both uniform split and buddy allocator like (or non-uniform) split. All existing split_huge_page*() users can be gradually converted to use folio_split() if possible. In this patchset, I converted truncate_inode_partial_folio() to use folio_split(). xfstests quick group passed for both tmpfs and xfs. It is on top of mm-everything-2025-02-26-03-56 with V8 reverted. It is ready to be merged. Changelog === From V8[11]: 1. Removed gfp parameter from xas_try_split() and GFP_NOWAIT is used all the time. (per Baolin Wang) 2. Used __xas_init_node_for_split() instead of __xas_alloc_node_for_split() and moved node allocation out. It fixed a bug when xa_node is pre-allocated by xas_nomem() before xas_try_split() is called without being initialized for split. From V7[9]: 1. Fixed a wrong function name in lib/test_xarray.c. 2. Made __split_folio_to_order() never fail, since the old order check is already done in __folio_split(). (per David Hildenbrand) 3. Fixed an issue reported by syzbot[10] by not dropping the original folio during truncate. 4. Fixed a WARNING when READ_ONLY_THP_FOR_FS is enabled. (Thank David Hildenbrand for reporting the issue) 5. Used two separate struct page* parameters, split_at and lock_at, to specify at which subpage the non-uniform split happens and which subpage to keep locked after the split, respectively. It improves code readability. From V6[8]: 1. Added an xarray function xas_try_split() to support iterative folio split, removing the need of using xas_split_alloc() and xas_split(). The function guarantees that at most one xa_node is allocated for each call. 2. Added concrete numbers of after-split folios and xa_node savings to cover letter, commit log. (per Andrew) From V5[7]: 1. Split shmem to any lower order patches are in mm tree, so dropped from this series. 2. Rename split_folio_at() to try_folio_split() to clarify that non-uniform split will not be used if it is not supported. From V4[6]: 1. Enabled shmem support in both uniform and buddy allocator like split and added selftests for it. 2. Added functions to check if uniform split and buddy allocator like split are supported for the given folio and order. 3. Made truncate fall back to uniform split if buddy allocator split is not supported (CONFIG_READ_ONLY_THP_FOR_FS and FS without large folio). 4. Added the missing folio_clear_has_hwpoisoned() to __split_unmapped_folio(). From V3[5]: 1. Used xas_split_alloc(GFP_NOWAIT) instead of xas_nomem(), since extra operations inside xas_split_alloc() are needed for correctness. 2. Enabled folio_split() for shmem and no issue was found with xfstests quick test group. 3. Split both ends of a truncate range in truncate_inode_partial_folio() to avoid wasting memory in shmem truncate (per David Hildenbrand). 4. Removed page_in_folio_offset() since page_folio() does the same thing. 5. Finished truncate related tests from xfstests quick test group on XFS and tmpfs without issues. 6. Disabled buddy allocator like split on CONFIG_READ_ONLY_THP_FOR_FS and FS without large folio. This check was missed in the prior versions. From V2[3]: 1. Incorporated all the feedback from Kirill[4]. 2. Used GFP_NOWAIT for xas_nomem(). 3. Tested the code path when xas_nomem() fails. 4. Added selftests for folio_split(). 5. Fixed no THP config build error. From V1[2]: 1. Split the original patch 1 into multiple ones for easy review (per Kirill). 2. Added xas_destroy() to avoid memory leak. 3. Fixed nr_dropped not used error (per kernel test robot). 4. Added proper error handling when xas_nomem() fails to allocate memory for xas_split() during buddy allocator like split. From RFC[1]: 1. Merged backend code of split_huge_page_to_list_to_order() and folio_split(). The same code is used for both uniform split and buddy allocator like split. 2. Use xas_nomem() instead of xas_split_alloc() for folio_split(). 3. folio_split() now leaves the first after-split folio unlocked, instead of the one containing the given page, since the caller of truncate_inode_partial_folio() locks and unlocks the first folio. 4. Extended split_huge_page debugfs to use folio_split(). 5. Added truncate_inode_partial_folio() as first user of folio_split(). Design === folio_split() splits a large folio in the same way as buddy allocator splits a large free page for allocation. The purpose is to minimize the number of folios after the split. For example, if user wants to free the 3rd subpage in a order-9 folio, folio_split() will split the order-9 folio as: O-0, O-0, O-0, O-0, O-2, O-3, O-4, O-5, O-6, O-7, O-8 if it is anon O-1, O-0, O-0, O-2, O-3, O-4, O-5, O-6, O-7, O-9 if it is pagecache Since anon folio does not support order-1 yet. The split process is similar to existing approach: 1. Unmap all page mappings (split PMD mappings if exist); 2. Split meta data like memcg, page owner, page alloc tag; 3. Copy meta data in struct folio to sub pages, but instead of spliting the whole folio into multiple smaller ones with the same order in a shot, this approach splits the folio iteratively. Taking the example above, this approach first splits the original order-9 into two order-8, then splits left part of order-8 to two order-7 and so on; 4. Post-process split folios, like write mapping->i_pages for pagecache, adjust folio refcounts, add split folios to corresponding list; 5. Remap split folios 6. Unlock split folios. __split_unmapped_folio() and __split_folio_to_order() replace __split_huge_page() and __split_huge_page_tail() respectively. __split_unmapped_folio() uses different approaches to perform uniform split and buddy allocator like split: 1. uniform split: one single call to __split_folio_to_order() is used to uniformly split the given folio. All resulting folios are put back to the list after split. The folio containing the given page is left to caller to unlock and others are unlocked. 2. buddy allocator like (or non-uniform) split: (old_order - new_order) calls to __split_folio_to_order() are used to split the given folio at order N to order N-1. After each call, the target folio is changed to the one containing the page, which is given as a folio_split() parameter. After each call, folios not containing the page are put back to the list. The folio containing the page is put back to the list when its order is new_order. All folios are unlocked except the first folio, which is left to caller to unlock. Patch Overview === 1. Patch 1 added a new xarray function xas_try_split() to perform iterative xarray split. 2. Patch 2 added __split_unmapped_folio() and __split_folio_to_order() to prepare for moving to new backend split code. 3. Patch 3 moved common code in split_huge_page_to_list_to_order() to __folio_split(). 4. Patch 4 added new folio_split() and made split_huge_page_to_list_to_order() share the new __split_unmapped_folio() with folio_split(). 5. Patch 5 removed no longer used __split_huge_page() and __split_huge_page_tail(). 6. Patch 6 added a new in_folio_offset to split_huge_page debugfs for folio_split() test. 7. Patch 7 used try_folio_split() for truncate operation. 8. Patch 8 added folio_split() tests. Any comments and/or suggestions are welcome. Thanks. [1] https://lore.kernel.org/linux-mm/20241008223748.555845-1-ziy@nvidia.com/ [2] https://lore.kernel.org/linux-mm/20241028180932.1319265-1-ziy@nvidia.com/ [3] https://lore.kernel.org/linux-mm/20241101150357.1752726-1-ziy@nvidia.com/ [4] https://lore.kernel.org/linux-mm/e6ppwz5t4p4kvir6eqzoto4y5fmdjdxdyvxvtw43nc… [5] https://lore.kernel.org/linux-mm/20241205001839.2582020-1-ziy@nvidia.com/ [6] https://lore.kernel.org/linux-mm/20250106165513.104899-1-ziy@nvidia.com/ [7] https://lore.kernel.org/linux-mm/20250116211042.741543-1-ziy@nvidia.com/ [8] https://lore.kernel.org/linux-mm/20250205031417.1771278-1-ziy@nvidia.com/ [9] https://lore.kernel.org/linux-mm/20250211155034.268962-1-ziy@nvidia.com/ [10] https://lore.kernel.org/all/67af65cb.050a0220.21dd3.004a.GAE@google.com/ [11] https://lore.kernel.org/linux-mm/20250218235012.1542225-1-ziy@nvidia.com/ Zi Yan (8): xarray: add xas_try_split() to split a multi-index entry mm/huge_memory: add two new (not yet used) functions for folio_split() mm/huge_memory: move folio split common code to __folio_split() mm/huge_memory: add buddy allocator like (non-uniform) folio_split() mm/huge_memory: remove the old, unused __split_huge_page() mm/huge_memory: add folio_split() to debugfs testing interface mm/truncate: use buddy allocator like folio split for truncate operation selftests/mm: add tests for folio_split(), buddy allocator like split Documentation/core-api/xarray.rst | 14 +- include/linux/huge_mm.h | 36 + include/linux/xarray.h | 6 + lib/test_xarray.c | 52 ++ lib/xarray.c | 131 ++- mm/huge_memory.c | 755 ++++++++++++------ mm/truncate.c | 31 +- tools/testing/radix-tree/Makefile | 1 + .../selftests/mm/split_huge_page_test.c | 34 +- 9 files changed, 783 insertions(+), 277 deletions(-) -- 2.47.2

5 months, 2 weeks

5
30
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-kselftest-mirror February 2025