February 2025 - Linux-kselftest-mirror

[PATCH v1 1/3] selftests: pidfd: add missing sys/mount.h include in pidfd_fdinfo_test.c

by Peter Seiderer

Fix compile on openSUSE Tumbleweed (gcc-14.2.1, glibc-2.40):

- add missing sys/mount.h include

Fixes:

  pidfd_fdinfo_test.c: In function ‘child_fdinfo_nspid_test’:
  pidfd_fdinfo_test.c:230:13: error: implicit declaration of function ‘mount’ [-Wimplicit-function-declaration]
    230 |         r = mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, 0);
        |             ^~~~~

Signed-off-by: Peter Seiderer <ps.report(a)gmx.net>
---
 tools/testing/selftests/pidfd/pidfd_fdinfo_test.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c b/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c
index f062a986e382..f718aac75068 100644
--- a/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c
+++ b/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c
@@ -13,6 +13,7 @@
 #include <syscall.h>
 #include <sys/wait.h>
 #include <sys/mman.h>
+#include <sys/mount.h>
 #include "pidfd.h"
 #include "../kselftest.h"

-- 
2.47.1

[PATCH v6 0/5] userfaultfd move option

by Suren Baghdasaryan

This patch series introduces the UFFDIO_MOVE feature to userfaultfd. Andrea has long implemented and maintained it in his local tree [1], but it was not upstreamed because there were no use cases where this approach would beat allocating a new page and copying the contents. Previous upstreaming attempts can be found at [6] and [7].

UFFDIO_COPY performs ~20% better than UFFDIO_MOVE when the application needs pages to be allocated [2]. However, with UFFDIO_MOVE, if pages are available (in userspace) for recycling, as is usually the case in heap compaction algorithms, then we can avoid the page allocation and memcpy (done by UFFDIO_COPY). Also, since the pages are recycled in userspace, we avoid the need to release (via madvise) the pages back to the kernel [3].

We see over 40% reduction (on a Google Pixel 6 device) in the compacting thread’s completion time by using UFFDIO_MOVE vs. UFFDIO_COPY. This was measured using a benchmark that emulates a heap compaction implementation using userfaultfd (to allow concurrent accesses by application threads). More details of the use case are explained in [3].

Furthermore, UFFDIO_MOVE enables moving swapped-out pages within the same VMA without touching them. Today, this can only be done with mremap, but mremap forces the VMA to be split.

TODOs for follow-up improvements:
- cross-mm support. Known differences from single-mm and missing pieces:
  - memcg recharging (might need to isolate pages in the process)
  - mm counters
  - cross-mm deposit table moves
  - cross-mm test
  - document the address space where src and dest reside in struct uffdio_move
- TLB flush batching. This will require extensive changes to PTL locking in move_pages_pte(). OTOH, that might let us reuse parts of the mremap code.
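The following minimal sketch shows how a compaction thread might issue a move instead of an allocate-and-copy. It assumes installed kernel headers that define `struct uffdio_move` and `UFFDIO_MOVE`, and it falls back to `-ENOSYS` when they do not. The wrapper name `uffd_move_pages()` and its up-front page-alignment check are assumptions added for illustration. The userfaultfd setup (enabling UFFD_FEATURE_MOVE and registering `dst`) is omitted.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical wrapper (not from the series): move 'len' bytes of pages
 * from 'src' to 'dst' instead of allocating and copying with UFFDIO_COPY.
 * 'uffd' is assumed to have UFFD_FEATURE_MOVE enabled and 'dst' registered.
 * Returns the number of bytes moved, or -errno.
 */
static long uffd_move_pages(int uffd, void *dst, void *src, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);

	/* Assumption: reject ranges that are not page aligned up front. */
	if (((uintptr_t)dst | (uintptr_t)src | len) & (uintptr_t)(page - 1))
		return -EINVAL;
#ifdef UFFDIO_MOVE
	struct uffdio_move mv;

	memset(&mv, 0, sizeof(mv));
	mv.dst = (uintptr_t)dst;
	mv.src = (uintptr_t)src;
	mv.len = len;
	mv.mode = 0; /* no DONTWAKE / ALLOW_SRC_HOLES in this sketch */
	/*
	 * On partial progress the kernel reports the bytes moved in mv.move;
	 * this sketch only returns -errno and ignores that count.
	 */
	if (ioctl(uffd, UFFDIO_MOVE, &mv) < 0)
		return -errno;
	return (long)mv.move; /* written back by the kernel */
#else
	(void)uffd;
	return -ENOSYS; /* installed headers predate UFFDIO_MOVE */
#endif
}
```

The source pages end up at `dst` without a memcpy, which is where the compaction savings described above come from.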
Changes since v5 [10]:
- added logic to split large folios in move_pages_pte(), per David Hildenbrand
- added check for PAE before split_huge_pmd() to avoid the split if the move operation can't be done
- replaced calls to default_huge_page_size() with read_pmd_pagesize() in uffd_move_pmd test, per David Hildenbrand
- fixed the condition in uffd_move_test_common() checking if area alignment is needed

Changes since v4 [9]:
- added Acked-by in patch 1, per Peter Xu
- added description for ctx, mm and mode parameters of move_pages(), per kernel test robot
- added Reviewed-by's, per Peter Xu and Axel Rasmussen
- removed unused operations in uffd_test_case_ops
- refactored uffd-unit-test changes to avoid using global variables and handle pmd moves without page size overrides, per Peter Xu

Changes since v3 [8]:
- changed retry path in folio_lock_anon_vma_read() to unlock and then relock RCU, per Peter Xu
- removed cross-mm support from initial patchset, per David Hildenbrand
- replaced BUG_ONs with VM_WARN_ON or WARN_ON_ONCE, per David Hildenbrand
- added missing cache flushing, per Lokesh Gidra and Peter Xu
- updated manpage text in the patch description, per Peter Xu
- renamed internal functions from "remap" to "move", per Peter Xu
- added mmap_changing check after taking mmap_lock, per Peter Xu
- changed uffd context check to ensure dst_mm is registered onto the uffd we are operating on, per Peter Xu and David Hildenbrand
- changed to non-maybe variants of maybe*_mkwrite(), per David Hildenbrand
- fixed warning for CONFIG_TRANSPARENT_HUGEPAGE=n, per kernel test robot
- comments cleanup, per David Hildenbrand and Peter Xu
- checks for VM_IO,VM_PFNMAP,VM_HUGETLB,..., per David Hildenbrand
- prevent moving pinned pages, per Peter Xu
- changed uffd tests to call uffd_test_ctx_clear() at the end of the test run instead of at the beginning of the next run
- added support for testcase-specific ops
- added test for moving PMD-aligned blocks

Changes since v2 [5]:
- renamed UFFDIO_REMAP to UFFDIO_MOVE, per David Hildenbrand
- rebase over mm-unstable to use folio_move_anon_rmap(), per David Hildenbrand
- added text for manpage explaining DONTFORK and KSM requirements for this feature, per David Hildenbrand
- check for anon_vma changes in the fast path of folio_lock_anon_vma_read, per Peter Xu
- updated the title and description of the first patch, per David Hildenbrand
- updated comments in folio_lock_anon_vma_read() explaining the need for anon_vma checks, per David Hildenbrand
- changed all mapcount checks to PageAnonExclusive, per Jann Horn and David Hildenbrand
- changed counters in remap_swap_pte() from MM_ANONPAGES to MM_SWAPENTS, per Jann Horn
- added a check for PTE change after folio is locked in remap_pages_pte(), per Jann Horn
- added handling of PMD migration entries and bailout when pmd_devmap(), per Jann Horn
- added checks to ensure both src and dst VMAs are writable, per Peter Xu
- added UFFD_FEATURE_MOVE, per Peter Xu
- removed obsolete comments, per Peter Xu
- renamed remap_anon_pte to remap_present_pte, per Peter Xu
- added a comment for folio_get_anon_vma() explaining the need for anon_vma checks, per Peter Xu
- changed error handling in remap_pages() to make it clearer, per Peter Xu
- changed EFAULT to EAGAIN to retry when a hugepage appears or disappears from under us, per Peter Xu
- added links to previous upstreaming attempts, per David Hildenbrand

Changes since v1 [4]:
- add mmget_not_zero in userfaultfd_remap, per Jann Horn
- removed extern from function definitions, per Matthew Wilcox
- converted to folios in remap_pages_huge_pmd, per Matthew Wilcox
- use PageAnonExclusive in remap_pages_huge_pmd, per David Hildenbrand
- handle pgtable transfers between MMs, per Jann Horn
- ignore concurrent A/D pte bit changes, per Jann Horn
- split functions into smaller units, per David Hildenbrand
- test for folio_test_large in remap_anon_pte, per Matthew Wilcox
- use pte_swp_exclusive for swapcount check, per David Hildenbrand
- eliminated use of mmu_notifier_invalidate_range_start_nonblock, per Jann Horn
- simplified THP alignment checks, per Jann Horn
- refactored the loop inside remap_pages, per Jann Horn
- additional clarifying comments, per Jann Horn

Main changes since Andrea's last version [1]:
- Trivial translations from page to folio, mmap_sem to mmap_lock
- Replace pmd_trans_unstable() with pte_offset_map_nolock() and handle its possible failure
- Move pte mapping into remap_pages_pte to allow for retries when source page or anon_vma is contended. Since pte_offset_map_nolock() starts an RCU read section, we can't block after mapping a pte, so we have to unmap the ptes, do the locking and retry.
- Add and use anon_vma_trylock_write() to avoid blocking while in RCU read section.
- Accommodate changes in mmu_notifier_range_init() API, switch to mmu_notifier_invalidate_range_start_nonblock() to avoid blocking while in RCU read section.
- Open-code now removed __swp_swapcount()
- Replace pmd_read_atomic() with pmdp_get_lockless()
- Add new selftest for UFFDIO_MOVE

[1] https://gitlab.com/aarcange/aa/-/commit/2aec7aea56b10438a3881a20a411aa4b1fc…
[2] https://lore.kernel.org/all/1425575884-2574-1-git-send-email-aarcange@redha…
[3] https://lore.kernel.org/linux-mm/CA+EESO4uO84SSnBhArH4HvLNhaUQ5nZKNKXqxRCyj…
[4] https://lore.kernel.org/all/20230914152620.2743033-1-surenb@google.com/
[5] https://lore.kernel.org/all/20230923013148.1390521-1-surenb@google.com/
[6] https://lore.kernel.org/all/1425575884-2574-21-git-send-email-aarcange@redh…
[7] https://lore.kernel.org/all/cover.1547251023.git.blake.caldwell@colorado.ed…
[8] https://lore.kernel.org/all/20231009064230.2952396-1-surenb@google.com/
[9] https://lore.kernel.org/all/20231028003819.652322-1-surenb@google.com/
[10] https://lore.kernel.org/all/20231121171643.3719880-1-surenb@google.com/

Andrea Arcangeli (2):
  mm/rmap: support move to different root anon_vma in folio_move_anon_rmap()
  userfaultfd: UFFDIO_MOVE uABI

Suren Baghdasaryan (3):
  selftests/mm: call uffd_test_ctx_clear at the end of the test
  selftests/mm: add uffd_test_case_ops to allow test case-specific operations
  selftests/mm: add UFFDIO_MOVE ioctl test

 Documentation/admin-guide/mm/userfaultfd.rst | 3 +
 fs/userfaultfd.c | 72 +++
 include/linux/rmap.h | 5 +
 include/linux/userfaultfd_k.h | 11 +
 include/uapi/linux/userfaultfd.h | 29 +-
 mm/huge_memory.c | 122 ++++
 mm/khugepaged.c | 3 +
 mm/rmap.c | 30 +
 mm/userfaultfd.c | 614 +++++++++++++++++++
 tools/testing/selftests/mm/uffd-common.c | 39 +-
 tools/testing/selftests/mm/uffd-common.h | 9 +
 tools/testing/selftests/mm/uffd-stress.c | 5 +-
 tools/testing/selftests/mm/uffd-unit-tests.c | 192 ++++++
 13 files changed, 1130 insertions(+), 4 deletions(-)

-- 
2.43.0.rc2.451.g8631bc7472-goog

[PATCH 0/4] mm: permit guard regions for file-backed/shmem mappings

by Lorenzo Stoakes

The guard regions feature was initially implemented to support anonymous mappings only, excluding shmem. This was done so as to introduce the feature carefully and incrementally, and to be conservative about the caveats and corner cases that apply to file-backed mappings but not to anonymous ones.

Now that this feature has landed in 6.13, it is time to revisit this and extend the functionality to file-backed and shmem mappings.

To make this maximally useful, and since file-backed mappings may be mapped read-only (for instance, ELF images), we also remove the restriction on read-only mappings and permit the establishment of guard regions in any non-hugetlb, non-mlock()'d mapping.

It is safe to allow guard regions in read-only mappings because guard regions only reduce access to the mapping. When they are removed, the existing attributes of the underlying VMA are simply reinstated, so no access violations can occur.

While the change in kernel code introduced in this series is small, most of the effort here is spent extending the tests to assert that the feature works correctly across numerous file-backed mapping scenarios.

Every guard region self-test that is run against anonymous memory (where the test is relevant and not anon-only) has now been updated to also run against shmem and against a mapping of a file in the working directory. This confirms that all cases also function correctly for file-backed guard regions.

In addition, a number of other tests are added for specific file-backed mapping scenarios.
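The following minimal sketch exercises the new capability from userspace. It maps a small temporary file read-only, installs a guard region over the middle page with madvise(), and then removes the guard again. The `MADV_GUARD_INSTALL` / `MADV_GUARD_REMOVE` fallback values (102 and 103) are assumptions taken from the asm-generic uAPI header, so prefer the system header where it defines them. The helper name is hypothetical. On kernels without file-backed support, the install step is expected to fail with EINVAL.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_GUARD_INSTALL
#define MADV_GUARD_INSTALL 102 /* assumed value, asm-generic/mman-common.h */
#endif
#ifndef MADV_GUARD_REMOVE
#define MADV_GUARD_REMOVE 103 /* assumed value, asm-generic/mman-common.h */
#endif

/*
 * Hypothetical helper: guard, then unguard, the middle page of a 3-page
 * read-only file mapping. Returns 0 on success or -errno from the first
 * failing step.
 */
static int guard_file_middle_page(void)
{
	long page = sysconf(_SC_PAGESIZE);
	FILE *f = tmpfile();
	char *map;
	int ret = 0;

	if (!f)
		return -errno;
	if (ftruncate(fileno(f), 3 * page) < 0) {
		ret = -errno;
		goto out_close;
	}
	map = mmap(NULL, 3 * page, PROT_READ, MAP_SHARED, fileno(f), 0);
	if (map == MAP_FAILED) {
		ret = -errno;
		goto out_close;
	}
	if (madvise(map + page, page, MADV_GUARD_INSTALL) < 0) {
		ret = -errno;
		goto out_unmap;
	}
	/* While guarded, touching map[page] would raise SIGSEGV. */
	if (madvise(map + page, page, MADV_GUARD_REMOVE) < 0) {
		ret = -errno;
		goto out_unmap;
	}
	/* Once unguarded, the page faults in from the (zero-filled) file. */
	if (map[page] != 0)
		ret = -EIO;
out_unmap:
	munmap(map, 3 * page);
out_close:
	fclose(f);
	return ret;
}
```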
There are a number of other concerns one might have about guard regions, addressed below:

Readahead
~~~~~~~~~

Readahead populates the page cache on the assumption that sequential reads will occur, thus amortising I/O. It uses the PG_readahead folio flag, which is set during a major fault and checked on a minor fault, to trigger asynchronous I/O as data is processed. This reduces I/O stalls as data is faulted in.

Guard regions do not alter this mechanism, which operates at the folio and fault level, but they do of course prevent the faulting of folios that would otherwise be mapped.

In the case of a major fault prior to a guard region, synchronous readahead will occur. This includes populating folios in the page cache which the guard regions will, for the mapping in question, prevent access to.

In addition, if PG_readahead is placed in a folio that is now inaccessible, this will prevent asynchronous readahead from occurring as it otherwise would. However, readahead has mechanisms for heuristically resetting this regardless, which will 'recover' correct readahead behaviour.

Readahead presumes sequential data access. The presence of a guard region clearly indicates that, at least within the guard region, no such sequential access will occur, because it cannot occur there. So this should have very little impact on any real workload.

The far more important question is whether readahead causes incorrect or inappropriate mapping of ranges that guard regions disallow. This is not the case, because readahead does not 'pre-fault' memory in this fashion. In any event, any mechanism that attempted to do so would hit the usual page fault paths, which correctly handle PTE markers as they do for anonymous mappings.
Fault-Around
~~~~~~~~~~~~

Like readahead, the fault-around logic attempts to improve efficiency for file-backed memory mappings. It differs in that it does not fetch folios that are about to be accessed into the page cache. Instead, it pre-maps a range of folios around the faulting address.

Because guard regions use PTE markers, this is relatively trivial: the case is already handled in filemap_map_folio_range() and filemap_map_order0_folio(). In both functions, the established page table entries are simply kept and the fault handler takes care of PTE markers, as per the comment:

        /*
         * NOTE: If there're PTE markers, we'll leave them to be
         * handled in the specific fault path, and it'll prohibit
         * the fault-around logic.
         */

This works because establishing guard regions installs PTE markers in the page tables, and clearing the guard regions removes them.

Truncation
~~~~~~~~~~

File truncation will not eliminate existing guard regions. The truncation operation ultimately zaps the range via unmap_mapping_range(), which specifically excludes PTE markers.

Zapping
~~~~~~~

As with anonymous mappings, zapping is handled by zap_nonpresent_ptes(). This function specifically deals with guard entries and leaves them intact, except in cases such as process teardown or munmap() where they need to be removed.

Reclaim
~~~~~~~

When reclaim is performed on file-backed folios, it ultimately invokes try_to_unmap_one() via the rmap. If the folio is not large, map_pte() will abort the operation for the guard region mapping. If it is large, check_pte() will determine that the entry is a 'swap' PTE that is neither a device-private nor a device-exclusive entry, and will abort the operation in that case too.

Therefore, nothing unexpected happens when reclaim is attempted on a file-backed guard region.
Hole Punching
~~~~~~~~~~~~~

Hole punching updates the page cache and ultimately invokes unmap_mapping_range(), which explicitly leaves PTE markers in place.

Establishing the guard regions already zapped any existing mappings of file-backed folios. So once the guard regions are removed, the hole-punched region will be faulted in as usual and everything will behave as expected.

Lorenzo Stoakes (4):
  mm: allow guard regions in file-backed and read-only mappings
  selftests/mm: rename guard-pages to guard-regions
  tools/selftests: expand all guard region tests to file-backed
  tools/selftests: add file/shmem-backed mapping guard region tests

 mm/madvise.c | 8 +-
 tools/testing/selftests/mm/.gitignore | 2 +-
 tools/testing/selftests/mm/Makefile | 2 +-
 .../mm/{guard-pages.c => guard-regions.c} | 921 ++++++++++++++++--
 4 files changed, 821 insertions(+), 112 deletions(-)
 rename tools/testing/selftests/mm/{guard-pages.c => guard-regions.c} (58%)

-- 
2.48.1

[PATCH bpf-next v2 0/6] selftests/bpf: Various sockmap-related fixes

by Michal Luczaj

This series fixes a few bugs and adds missing features, with the aim of improving the test coverage of sockmap/sockhash.

The last patch is a create_pair() rewrite that uses __attribute__((cleanup)) to manage the lifetime of socket fds.

Signed-off-by: Michal Luczaj <mhal(a)rbox.co>
---
Changes in v2:
- Rebase on bpf-next (Jakub)
- Use cleanup helpers from kernel's cleanup.h (Jakub)
- Fix subject of patch 3, rephrase patch 4, use correct prefix
- Link to v1: https://lore.kernel.org/r/20240724-sockmap-selftest-fixes-v1-0-46165d224712…

Changes in v1:
- No declarations in function body (Jakub)
- Don't touch output arguments until function succeeds (Jakub)
- Link to v0: https://lore.kernel.org/netdev/027fdb41-ee11-4be0-a493-22f28a1abd7c@rbox.co/

---
Michal Luczaj (6):
  selftests/bpf: Support more socket types in create_pair()
  selftests/bpf: Socket pair creation, cleanups
  selftests/bpf: Simplify inet_socketpair() and vsock_socketpair_connectible()
  selftests/bpf: Honour the sotype of af_unix redir tests
  selftests/bpf: Exercise SOCK_STREAM unix_inet_redir_to_connected()
  selftests/bpf: Introduce __attribute__((cleanup)) in create_pair()

 .../selftests/bpf/prog_tests/sockmap_basic.c | 28 ++--
 .../selftests/bpf/prog_tests/sockmap_helpers.h | 149 ++++++++++++++-------
 .../selftests/bpf/prog_tests/sockmap_listen.c | 117 ++--------------
 3 files changed, 124 insertions(+), 170 deletions(-)
---
base-commit: 92cc2456e9775dc4333fb4aa430763ae4ac2f2d9
change-id: 20240729-selftest-sockmap-fixes-bcca996e143b

Best regards,
-- 
Michal Luczaj <mhal(a)rbox.co>
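The cleanup-attribute pattern used in the last patch can be shown outside the BPF selftests. The following is a minimal sketch for GCC or Clang, which both support `__attribute__((cleanup))`. It is a create_pair()-style helper that builds a connected AF_UNIX stream pair through listen/connect/accept. The names `close_fdp`, `steal_fd`, `__cleanup_fd` and `unix_stream_pair()` are hypothetical. The real series uses the helpers from the kernel's cleanup.h and covers inet and vsock sockets as well.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Close the fd when the variable goes out of scope (GCC/Clang extension). */
static void close_fdp(int *fd)
{
	if (*fd >= 0)
		close(*fd);
}
#define __cleanup_fd __attribute__((cleanup(close_fdp)))

/* Transfer ownership to the caller and disarm the scope-exit close. */
static int steal_fd(int *fd)
{
	int ret = *fd;

	*fd = -1;
	return ret;
}

/*
 * Hypothetical helper: every early return closes whatever was opened so
 * far, and the output arguments are written only on success.
 */
static int unix_stream_pair(int *c0, int *c1)
{
	struct sockaddr_un addr = { .sun_family = AF_UNIX };
	socklen_t len = sizeof(sa_family_t);
	int lfd __cleanup_fd = socket(AF_UNIX, SOCK_STREAM, 0);
	int cfd __cleanup_fd = -1;
	int pfd __cleanup_fd = -1;

	if (lfd < 0)
		return -errno;
	/* Binding with only the family autobinds to an abstract address. */
	if (bind(lfd, (struct sockaddr *)&addr, len) < 0)
		return -errno;
	len = sizeof(addr);
	if (getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
		return -errno;
	if (listen(lfd, 1) < 0)
		return -errno;
	cfd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (cfd < 0)
		return -errno;
	if (connect(cfd, (struct sockaddr *)&addr, len) < 0)
		return -errno;
	pfd = accept(lfd, NULL, NULL);
	if (pfd < 0)
		return -errno;

	*c0 = steal_fd(&cfd);
	*c1 = steal_fd(&pfd);
	return 0; /* lfd is closed automatically here */
}
```

The `return -errno;` statements are safe because the return value is evaluated before the cleanup handlers run. A close() inside a cleanup handler therefore cannot clobber the reported error.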

[PATCH bpf-next v2 0/2] bpf: fix ktls panic with sockmap and add tests

by Jiayuan Chen

We can reproduce the issue using the existing test program: './test_sockmap --ktls'. Alternatively, run the selftest I provided, which causes the following panic:

------------[ cut here ]------------
kernel BUG at lib/iov_iter.c:629!
PKRU: 55555554
Call Trace:
 <TASK>
 ? die+0x36/0x90
 ? do_trap+0xdd/0x100
 ? iov_iter_revert+0x178/0x180
 ? iov_iter_revert+0x178/0x180
 ? do_error_trap+0x7d/0x110
 ? iov_iter_revert+0x178/0x180
 ? exc_invalid_op+0x50/0x70
 ? iov_iter_revert+0x178/0x180
 ? asm_exc_invalid_op+0x1a/0x20
 ? iov_iter_revert+0x178/0x180
 ? iov_iter_revert+0x5c/0x180
 tls_sw_sendmsg_locked.isra.0+0x794/0x840
 tls_sw_sendmsg+0x52/0x80
 ? inet_sendmsg+0x1f/0x70
 __sys_sendto+0x1cd/0x200
 ? find_held_lock+0x2b/0x80
 ? syscall_trace_enter+0x140/0x270
 ? __lock_release.isra.0+0x5e/0x170
 ? find_held_lock+0x2b/0x80
 ? syscall_trace_enter+0x140/0x270
 ? lockdep_hardirqs_on_prepare+0xda/0x190
 ? ktime_get_coarse_real_ts64+0xc2/0xd0
 __x64_sys_sendto+0x24/0x30
 do_syscall_64+0x90/0x170

1. The issue appears to date from when BPF support was introduced to ktls. The later addition of assertions to iov_iter then turned it into a panic. If my Fixes tag is incorrect, please help me correct it.
2. I have kept the changes minimal for now. They are enough to make ktls work correctly.

---
v1->v2: Added more content to the commit message
https://lore.kernel.org/all/20250123171552.57345-1-mrpre@163.com/#r
---
Jiayuan Chen (2):
  bpf: fix ktls panic with sockmap
  selftests/bpf: add ktls selftest

 net/tls/tls_sw.c | 8 +-
 .../selftests/bpf/prog_tests/sockmap_ktls.c | 174 +++++++++++++++++-
 .../selftests/bpf/progs/test_sockmap_ktls.c | 26 +++
 3 files changed, 205 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_sockmap_ktls.c

-- 
2.47.1

[PATCH v8 00/14] iommufd: Add vIOMMU infrastructure (Part-3: vEVENTQ)

by Nicolin Chen

As part 3 of the vIOMMU infrastructure series, this introduces a new vEVENTQ object. The existing FAULT object already provides a nice queue-based notification pathway to user space, so vEVENTQ reuses it.

Mimicking the HWPT structure, add a common EVENTQ structure to support its derivatives: IOMMUFD_OBJ_FAULT (existing) and IOMMUFD_OBJ_VEVENTQ (new).

An IOMMUFD_CMD_VEVENTQ_ALLOC is introduced to allocate vEVENTQ objects for vIOMMUs. One vIOMMU can have multiple vEVENTQs of different types, but it cannot have multiple vEVENTQs of the same type.

The forwarding part is fairly simple, but it might need to replace a physical device ID with a virtual device ID in a driver-level event data structure. So this series also adds some helpers for drivers to use.

As usual, this series comes with selftest coverage for the new ioctl and with a real-world use case in the ARM SMMUv3 driver.

This is on GitHub:
https://github.com/nicolinc/iommufd/commits/iommufd_veventq-v8
Paired QEMU branch for testing:
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_veventq-v8

Changelog
v8
 * Add Reviewed-by from Jason and Pranjal
 * Fix errno returned in arm_smmu_handle_event()
 * Validate domain->type outside of arm_smmu_attach_prepare_vmaster()
 * Drop unnecessary vmaster comparison in arm_smmu_attach_commit_vmaster()
v7
 https://lore.kernel.org/all/cover.1740238876.git.nicolinc@nvidia.com/
 * Rebase on Jason's for-next tree for latest fault.c
 * Add Reviewed-by
 * Update commit logs
 * Add __reserved field sanity
 * Skip kfree() on the static header
 * Replace "bool on_list" with list_is_last()
 * Use u32 for flags in iommufd_vevent_header
 * Drop casting in iommufd_viommu_get_vdev_id()
 * Update the bounding logic to veventq->sequence
 * Add missing cpu_to_le64() around STRTAB_STE_1_MEV
 * Reuse veventq->common.lock to fence sequence and num_events
 * Rename overflow to lost_events and log it upon kmalloc failure
 * Correct the error handling part in iommufd_veventq_deliver_fetch()
 * Add an arm_smmu_clear_vmaster() to simplify identity/blocked domain attach ops
 * Add four additional event records to forward to user space VM, and update the uAPI doc
 * Reuse the existing smmu->streams_mutex lock to fence master->vmaster pointer, instead of adding a new rwsem
v6
 https://lore.kernel.org/all/cover.1737754129.git.nicolinc@nvidia.com/
 * Drop supports_veventq viommu op
 * Split bug/cosmetics fixes out of the series
 * Drop the blocking mutex around copy_to_user()
 * Add veventq_depth in uAPI to limit vEVENTQ size
 * Revise the documentation for a clear description
 * Fix sparse warnings in arm_vmaster_report_event()
 * Rework iommufd_viommu_get_vdev_id() to return -ENOENT v.s. 0
 * Allow Abort/Bypass STEs to allocate vEVENTQ and set STE.MEV for DoS mitigations
v5
 https://lore.kernel.org/all/cover.1736237481.git.nicolinc@nvidia.com/
 * Add Reviewed-by from Baolu
 * Reorder the OBJ list as well
 * Fix alphabetical order after renaming in v4
 * Add supports_veventq viommu op for vEVENTQ type validation
v4
 https://lore.kernel.org/all/cover.1735933254.git.nicolinc@nvidia.com/
 * Rename "vIRQ" to "vEVENTQ"
 * Use flexible array in struct iommufd_vevent
 * Add the new ioctl command to union ucmd_buffer
 * Fix the alphabetical order in union ucmd_buffer too
 * Rename _TYPE_NONE to _TYPE_DEFAULT aligning with vIOMMU naming
v3
 https://lore.kernel.org/all/cover.1734477608.git.nicolinc@nvidia.com/
 * Rebase on Will's for-joerg/arm-smmu/updates for arm_smmu_event series
 * Add "Reviewed-by" lines from Kevin
 * Fix typos in comments, kdocs, and jump tags
 * Add a patch to sort struct iommufd_ioctl_op
 * Update iommufd's userspace-api documentation
 * Update uAPI kdoc to quote SMMUv3 official spec
 * Drop the unused workqueue in struct iommufd_virq
 * Drop might_sleep() in iommufd_viommu_report_irq() helper
 * Add missing "break" in iommufd_viommu_get_vdev_id() helper
 * Shrink the scope of the vmaster's read lock in SMMUv3 driver
 * Pass in two arguments to iommufd_eventq_virq_handler() helper
 * Move "!ops || !ops->read" validation into iommufd_eventq_init()
 * Move "fault->ictx = ictx" closer to iommufd_ctx_get(fault->ictx)
 * Update commit message for arm_smmu_attach_prepare/commit_vmaster()
 * Keep "iommufd_fault" as-is and rename "iommufd_eventq_virq" to just "iommufd_virq"
v2
 https://lore.kernel.org/all/cover.1733263737.git.nicolinc@nvidia.com/
 * Rebase on v6.13-rc1
 * Add IOPF and vIRQ in iommufd.rst (userspace-api)
 * Add a proper locking in iommufd_event_virq_destroy
 * Add iommufd_event_virq_abort with a lockdep_assert_held
 * Rename "EVENT_*" to "EVENTQ_*" to describe the objects better
 * Reorganize flows in iommufd_eventq_virq_alloc for abort() to work
 * Add struct arm_smmu_vmaster to store vSID upon attaching to a nested domain, calling a newly added iommufd_viommu_get_vdev_id helper
 * Add an arm_vmaster_report_event helper in arm-smmu-v3-iommufd file to simplify the routine in arm_smmu_handle_evt() of the main driver
v1
 https://lore.kernel.org/all/cover.1724777091.git.nicolinc@nvidia.com/

Thanks!
Nicolin

Nicolin Chen (14):
  iommufd/fault: Move two fault functions out of the header
  iommufd/fault: Add an iommufd_fault_init() helper
  iommufd: Abstract an iommufd_eventq from iommufd_fault
  iommufd: Rename fault.c to eventq.c
  iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC
  iommufd/viommu: Add iommufd_viommu_get_vdev_id helper
  iommufd/viommu: Add iommufd_viommu_report_event helper
  iommufd/selftest: Require vdev_id when attaching to a nested domain
  iommufd/selftest: Add IOMMU_TEST_OP_TRIGGER_VEVENT for vEVENTQ coverage
  iommufd/selftest: Add IOMMU_VEVENTQ_ALLOC test coverage
  Documentation: userspace-api: iommufd: Update FAULT and VEVENTQ
  iommu/arm-smmu-v3: Introduce struct arm_smmu_vmaster
  iommu/arm-smmu-v3: Report events that belong to devices attached to vIOMMU
  iommu/arm-smmu-v3: Set MEV bit in nested STE for DoS mitigations

 drivers/iommu/iommufd/Makefile | 2 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 36 ++
 drivers/iommu/iommufd/iommufd_private.h | 135 +++-
 drivers/iommu/iommufd/iommufd_test.h | 10 +
 include/linux/iommufd.h | 23 +
 include/uapi/linux/iommufd.h | 105 +++
 tools/testing/selftests/iommu/iommufd_utils.h | 115 ++++
 .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 64 ++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 82 ++-
 drivers/iommu/iommufd/driver.c | 72 +++
 drivers/iommu/iommufd/eventq.c | 597 ++++++++++++++++++
 drivers/iommu/iommufd/fault.c | 342 ----------
 drivers/iommu/iommufd/hw_pagetable.c | 6 +-
 drivers/iommu/iommufd/main.c | 7 +
 drivers/iommu/iommufd/selftest.c | 54 ++
 drivers/iommu/iommufd/viommu.c | 2 +
 tools/testing/selftests/iommu/iommufd.c | 36 ++
 .../selftests/iommu/iommufd_fail_nth.c | 7 +
 Documentation/userspace-api/iommufd.rst | 17 +
 19 files changed, 1304 insertions(+), 408 deletions(-)
 create mode 100644 drivers/iommu/iommufd/eventq.c
 delete mode 100644 drivers/iommu/iommufd/fault.c

base-commit: 598749522d4254afb33b8a6c1bea614a95896868
-- 
2.43.0
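The changelog items "Add veventq_depth in uAPI to limit vEVENTQ size" and "Rename overflow to lost_events" describe a bounded queue that records drops. The following single-threaded toy model may help make that concrete. Everything in it is an assumption made for illustration: the toy_* names, the depth of 4, dropping the newest event when the queue is full, and sequence numbers that advance even for dropped events so that gaps are visible. It is not the iommufd implementation or uAPI, which also has to handle locking and copy_to_user().

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_DEPTH 4 /* stands in for the uAPI's veventq_depth */

struct toy_event {
	uint32_t sequence; /* assigned to every reported event, kept or not */
	uint64_t data;
};

struct toy_veventq {
	struct toy_event ring[TOY_DEPTH];
	unsigned int head, count;
	uint32_t next_seq;
	bool lost_events; /* set when an event had to be dropped */
};

/* Producer side: queue an event, or drop it and remember the loss. */
static void toy_report(struct toy_veventq *q, uint64_t data)
{
	uint32_t seq = q->next_seq++;
	unsigned int tail;

	if (q->count == TOY_DEPTH) {
		q->lost_events = true;
		return;
	}
	tail = (q->head + q->count) % TOY_DEPTH;
	q->ring[tail].sequence = seq;
	q->ring[tail].data = data;
	q->count++;
}

/*
 * Consumer side: pop the oldest event. *lost reports (and clears) whether
 * anything was dropped since the previous read. Returns false if empty.
 */
static bool toy_read(struct toy_veventq *q, struct toy_event *out, bool *lost)
{
	if (q->count == 0)
		return false;
	*out = q->ring[q->head];
	q->head = (q->head + 1) % TOY_DEPTH;
	q->count--;
	*lost = q->lost_events;
	q->lost_events = false;
	return true;
}
```

If six events are reported into this depth-4 queue, the reader gets sequences 0 to 3 with the loss flag set on the first read. The next event to arrive carries sequence 6, so the gap reveals the two dropped events.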

[PATCH] selftest: rtc: skip some tests if the alarm only supports minutes

by Wolfram Sang

There are alarms which have only minute-granularity. The RTC core already has a flag to describe them. Use this flag to skip tests which require the alarm to support seconds.

Signed-off-by: Wolfram Sang <wsa+renesas(a)sang-engineering.com>
---

Tested with a Renesas RZ-N1D board. This RTC obviously has only minute resolution for the alarms. Output now looks like this:

#  RUN           rtc.alarm_alm_set ...
#      SKIP      Skipping test since alarms has only minute granularity.
#            OK  rtc.alarm_alm_set
ok 5 rtc.alarm_alm_set # SKIP Skipping test since alarms has only minute granularity.

Before it was like this:

#  RUN           rtc.alarm_alm_set ...
# rtctest.c:255:alarm_alm_set:Alarm time now set to 09:40:00.
# rtctest.c:275:alarm_alm_set:data: 1a0
# rtctest.c:281:alarm_alm_set:Expected new (1489743644) == secs (1489743647)

 tools/testing/selftests/rtc/rtctest.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/rtc/rtctest.c b/tools/testing/selftests/rtc/rtctest.c
index 3e4f0d5c5329..e0a148261e6f 100644
--- a/tools/testing/selftests/rtc/rtctest.c
+++ b/tools/testing/selftests/rtc/rtctest.c
@@ -29,6 +29,7 @@ enum rtc_alarm_state {
 	RTC_ALARM_UNKNOWN,
 	RTC_ALARM_ENABLED,
 	RTC_ALARM_DISABLED,
+	RTC_ALARM_RES_MINUTE,
 };
 
 FIXTURE(rtc) {
@@ -88,7 +89,7 @@ static void nanosleep_with_retries(long ns)
 	}
 }
 
-static enum rtc_alarm_state get_rtc_alarm_state(int fd)
+static enum rtc_alarm_state get_rtc_alarm_state(int fd, int need_seconds)
 {
 	struct rtc_param param = { 0 };
 	int rc;
@@ -103,6 +104,10 @@ static enum rtc_alarm_state get_rtc_alarm_state(int fd)
 	if ((param.uvalue & _BITUL(RTC_FEATURE_ALARM)) == 0)
 		return RTC_ALARM_DISABLED;
 
+	/* Check if alarm has desired granularity */
+	if (need_seconds && (param.uvalue & _BITUL(RTC_FEATURE_ALARM_RES_MINUTE)))
+		return RTC_ALARM_RES_MINUTE;
+
 	return RTC_ALARM_ENABLED;
 }
 
@@ -227,9 +232,11 @@ TEST_F(rtc, alarm_alm_set) {
 		SKIP(return, "Skipping test since %s does not exist", rtc_file);
 	ASSERT_NE(-1, self->fd);
 
-	alarm_state = get_rtc_alarm_state(self->fd);
+	alarm_state = get_rtc_alarm_state(self->fd, 1);
 	if (alarm_state == RTC_ALARM_DISABLED)
 		SKIP(return, "Skipping test since alarms are not supported.");
+	if (alarm_state == RTC_ALARM_RES_MINUTE)
+		SKIP(return, "Skipping test since alarms has only minute granularity.");
 
 	rc = ioctl(self->fd, RTC_RD_TIME, &tm);
 	ASSERT_NE(-1, rc);
@@ -295,9 +302,11 @@ TEST_F(rtc, alarm_wkalm_set) {
 		SKIP(return, "Skipping test since %s does not exist", rtc_file);
 	ASSERT_NE(-1, self->fd);
 
-	alarm_state = get_rtc_alarm_state(self->fd);
+	alarm_state = get_rtc_alarm_state(self->fd, 1);
 	if (alarm_state == RTC_ALARM_DISABLED)
 		SKIP(return, "Skipping test since alarms are not supported.");
+	if (alarm_state == RTC_ALARM_RES_MINUTE)
+		SKIP(return, "Skipping test since alarms has only minute granularity.");
 
 	rc = ioctl(self->fd, RTC_RD_TIME, &alarm.time);
 	ASSERT_NE(-1, rc);
@@ -357,7 +366,7 @@ TEST_F_TIMEOUT(rtc, alarm_alm_set_minute, 65) {
 		SKIP(return, "Skipping test since %s does not exist", rtc_file);
 	ASSERT_NE(-1, self->fd);
 
-	alarm_state = get_rtc_alarm_state(self->fd);
+	alarm_state = get_rtc_alarm_state(self->fd, 0);
 	if (alarm_state == RTC_ALARM_DISABLED)
 		SKIP(return, "Skipping test since alarms are not supported.");
 
@@ -425,7 +434,7 @@ TEST_F_TIMEOUT(rtc, alarm_wkalm_set_minute, 65) {
 		SKIP(return, "Skipping test since %s does not exist", rtc_file);
 	ASSERT_NE(-1, self->fd);
 
-	alarm_state = get_rtc_alarm_state(self->fd);
+	alarm_state = get_rtc_alarm_state(self->fd, 0);
 	if (alarm_state == RTC_ALARM_DISABLED)
 		SKIP(return, "Skipping test since alarms are not supported.");
 
-- 
2.39.2
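The decision added to get_rtc_alarm_state() can be factored into a pure function, which makes it testable without RTC hardware. The following sketch assumes that `features` is the bitmask the selftest reads into `param.uvalue` via RTC_PARAM_GET. The names `classify_alarm()` and `enum alarm_check` are hypothetical. The fallback bit numbers apply only if the installed `<linux/rtc.h>` lacks the RTC_FEATURE_* definitions.

```c
#include <stdint.h>
#include <linux/rtc.h> /* RTC_FEATURE_* bit numbers */

#ifndef RTC_FEATURE_ALARM
#define RTC_FEATURE_ALARM 0 /* fallback for older headers (assumed value) */
#endif
#ifndef RTC_FEATURE_ALARM_RES_MINUTE
#define RTC_FEATURE_ALARM_RES_MINUTE 1 /* fallback (assumed value) */
#endif

enum alarm_check {
	ALARM_DISABLED,
	ALARM_ENABLED,
	ALARM_RES_MINUTE,
};

/*
 * Hypothetical pure version of the patched check: tests that need
 * second resolution (need_seconds = 1) are skipped on minute-only alarms,
 * while the *_minute tests (need_seconds = 0) still run there.
 */
static enum alarm_check classify_alarm(uint64_t features, int need_seconds)
{
	if ((features & (UINT64_C(1) << RTC_FEATURE_ALARM)) == 0)
		return ALARM_DISABLED;
	if (need_seconds &&
	    (features & (UINT64_C(1) << RTC_FEATURE_ALARM_RES_MINUTE)))
		return ALARM_RES_MINUTE;
	return ALARM_ENABLED;
}
```

For a minute-only RTC such as the RZ-N1D described above, both feature bits are set. The alarm_alm_set and alarm_wkalm_set tests then skip, while the two *_minute tests still run.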

[PATCH v2 0/4] tools/nolibc: MIPS: entrypoint cleanups and N32/N64 ABIs

by Thomas Weißschuh

Introduce support for the N32 and N64 ABIs. As preparation, the entrypoint is first simplified significantly.

Thanks to Maciej for all the valuable information.

Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Changes in v2:
- Clean up entrypoint first
- Annotate #endifs
- Link to v1: https://lore.kernel.org/r/20250212-nolibc-mips-n32-v1-1-6892e58d1321@weisss…

---
Thomas Weißschuh (4):
  tools/nolibc: MIPS: drop $gp setup
  tools/nolibc: MIPS: drop manual stack pointer alignment
  tools/nolibc: MIPS: drop noreorder option
  tools/nolibc: MIPS: add support for N64 and N32 ABIs

 tools/include/nolibc/arch-mips.h | 117 +++++++++++++++++++++-------
 tools/testing/selftests/nolibc/Makefile | 28 ++++++-
 tools/testing/selftests/nolibc/run-tests.sh | 2 +-
 3 files changed, 118 insertions(+), 29 deletions(-)
---
base-commit: 9c812b01f13d37410ea103e00bc47e5e0f6d2bad
change-id: 20231105-nolibc-mips-n32-234901bd910d

Best regards,
-- 
Thomas Weißschuh <linux(a)weissschuh.net>

[RFC PATCH v3 0/8] PMU partitioning driver support

by Colton Lewis

This series adds support to KVM and the ARM PMUv3 driver for partitioning PMU counters into two separate ranges, using the MDCR_EL2.HPMN register field.

A partitioned PMU would allow KVM guests direct access to a subset of PMU functionality, greatly reducing the overhead of performance monitoring in guests.

While this feature could be accepted on its own merits, in practice there is a lot more to do before it is fully useful, so I'm sending it as an RFC for now.

v3:
* Include cpucap definition for FEAT_HPMN0 to allow for setting HPMN to 0
* Include PMU header cleanup provided by Marc [1] with some minor changes so compilation works
* Pull functions out of pmu-emul.c that aren't specific to the emulated PMU. This and the previous item aren't strictly needed, but they provide a nicer starting point.
* As suggested by Oliver, start a file for partitioned PMU functions and move the reserved_host_counters parameter and MDCR handling into KVM. This way the driver does not have to know about them, and we need fewer hacks to keep the driver working on 32-bit ARM. This was not a complete separation, because the driver still needs to start and stop the host counters all at once and needs to toggle MDCR_EL2.HPME to do that. Introduce kvm_pmu_host_counters_{enable,disable}() functions to handle this and define them as no-ops on 32-bit ARM.
* As suggested by Oliver, don't limit PMCR.N on the emulated PMU. This value will be read correctly once the right traps are disabled to use the partitioned PMU.

v2:
https://lore.kernel.org/kvm/20250208020111.2068239-1-coltonlewis@google.com/

v1:
https://lore.kernel.org/kvm/20250127222031.3078945-1-coltonlewis@google.com/

[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?…

Colton Lewis (7):
  arm64: cpufeature: Add cap for HPMN0
  arm64: Generate sign macro for sysreg Enums
  KVM: arm64: Reorganize PMU functions
  KVM: arm64: Introduce module param to partition the PMU
  perf: arm_pmuv3: Generalize counter bitmasks
  perf: arm_pmuv3: Keep out of guest counter partition
  KVM: arm64: selftests: Reword selftests error

Marc Zyngier (1):
  KVM: arm64: Cleanup PMU includes

 arch/arm/include/asm/arm_pmuv3.h | 2 +
 arch/arm64/include/asm/arm_pmuv3.h | 2 +-
 arch/arm64/include/asm/kvm_host.h | 199 +++++++-
 arch/arm64/include/asm/kvm_pmu.h | 47 ++
 arch/arm64/kernel/cpufeature.c | 8 +
 arch/arm64/kvm/Makefile | 2 +-
 arch/arm64/kvm/arm.c | 1 -
 arch/arm64/kvm/debug.c | 10 +-
 arch/arm64/kvm/hyp/include/hyp/switch.h | 1 +
 arch/arm64/kvm/pmu-emul.c | 464 +-----------------
 arch/arm64/kvm/pmu-part.c | 63 +++
 arch/arm64/kvm/pmu.c | 454 +++++++++++++++++
 arch/arm64/kvm/sys_regs.c | 2 +
 arch/arm64/tools/cpucaps | 1 +
 arch/arm64/tools/gen-sysreg.awk | 1 +
 arch/arm64/tools/sysreg | 6 +-
 drivers/perf/arm_pmuv3.c | 73 ++-
 include/kvm/arm_pmu.h | 204 --------
 include/linux/perf/arm_pmu.h | 16 +-
 include/linux/perf/arm_pmuv3.h | 27 +-
 .../selftests/kvm/arm64/vpmu_counter_access.c | 2 +-
 virt/kvm/kvm_main.c | 1 +
 22 files changed, 882 insertions(+), 704 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_pmu.h
 create mode 100644 arch/arm64/kvm/pmu-part.c
 delete mode 100644 include/kvm/arm_pmu.h

base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
-- 
2.48.1.601.g30ceb7b040-goog

[PATCH v7 0/3] Enable Zicbom in usermode

by Yunhui Cui

v1/v2: Only the first patch, "RISC-V: Enable cbo.clean/flush in usermode", which mainly removes the enabling of cbo.inval in user mode.
v3: Add the "Expose Zicbom" functionality and selftests for Zicbom.
v4: Modify the order of macros; add the test_no_cbo_inval function separately.
v5:
  1. Modify the order of RISCV_HWPROBE_KEY_ZICBOM_BLOCK_SIZE in hwprobe.rst
  2. "TEST_NO_ZICBOINVAL" -> "TEST_NO_CBO_INVAL"
v6: Change hwprobe_ext0_has's second param to u64.
v7: Rebase onto the latest linux-next.

Yunhui Cui (3):
  RISC-V: Enable cbo.clean/flush in usermode
  RISC-V: hwprobe: Expose Zicbom extension and its block size
  RISC-V: selftests: Add TEST_ZICBOM into CBO tests

 Documentation/arch/riscv/hwprobe.rst        |  6 ++
 arch/riscv/include/asm/hwprobe.h            |  2 +-
 arch/riscv/include/uapi/asm/hwprobe.h       |  2 +
 arch/riscv/kernel/cpufeature.c              |  8 +++
 arch/riscv/kernel/sys_hwprobe.c             |  8 ++-
 tools/testing/selftests/riscv/hwprobe/cbo.c | 66 +++++++++++++++++----
 6 files changed, 79 insertions(+), 13 deletions(-)

-- 
2.39.2
