March 2025 - Linux-kselftest-mirror

[RFC PATCH 00/39] 1G page support for guest_memfd

by Ackerley Tng

Hello,

This patchset is our exploration of how to support 1G pages in guest_memfd, and how the pages will be used in Confidential VMs.

The patchset covers:

+ How to get 1G pages
+ Allowing mmap() of guest_memfd to userspace so that both private and shared memory can use the same physical pages
+ Splitting and reconstructing pages to support conversions and mmap()
+ How the VM, userspace and guest_memfd interact to support conversions
+ Selftests to test all the above
  + Selftests also demonstrate the conversion flow between VM, userspace and guest_memfd.

Why 1G pages in guest_memfd?

Bring guest_memfd to performance and memory savings parity with VMs that are backed by HugeTLBfs.

+ Performance is improved with 1G pages by more TLB hits and faster page walks on TLB misses.
+ Memory savings from 1G pages come from HugeTLB Vmemmap Optimization (HVO).

Options for 1G page support:

1. HugeTLB
2. Contiguous Memory Allocator (CMA)
3. Other suggestions are welcome!

Comparison between options:

1. HugeTLB
  + Refactor HugeTLB to separate allocator from the rest of HugeTLB
  + Pro: Graceful transition for VMs backed with HugeTLB to guest_memfd
    + Near term: Allows co-tenancy of HugeTLB and guest_memfd backed VMs
  + Pro: Can provide iterative steps toward new future allocator
    + Unexplored: Managing userspace-visible changes
      + e.g. HugeTLB's free_hugepages will decrease if HugeTLB is used, but not when future allocator is used
2. CMA
  + Port some HugeTLB features to be applied on CMA
  + Pro: Clean slate

What would refactoring HugeTLB involve?

(Some refactoring was done in this RFC, more can be done.)

1. Broadly involves separating the HugeTLB allocator from the rest of HugeTLB
  + Brings more modularity to HugeTLB
  + No functionality change intended
  + Likely step towards HugeTLB's integration into core-mm
2. guest_memfd will use just the allocator component of HugeTLB, not including the complex parts of HugeTLB like
  + Userspace reservations (resv_map)
  + Shared PMD mappings
  + Special page walkers

What features would need to be ported to CMA?

+ Improved allocation guarantees
  + Per NUMA node pool of huge pages
  + Subpools per guest_memfd
+ Memory savings
  + Something like HugeTLB Vmemmap Optimization
+ Configuration/reporting features
  + Configuration of number of pages available (and per NUMA node) at and after host boot
  + Reporting of memory usage/availability statistics at runtime

HugeTLB was picked as the source of 1G pages for this RFC because it allows a graceful transition, and retains memory savings from HVO.

To illustrate the alternative: if guest_memfd took its pages from CMA, and a host machine uses HugeTLBfs to back VMs, and a confidential VM were to be scheduled on that host, some HugeTLBfs pages would have to be given up and returned to CMA for guest_memfd pages to be rebuilt from that memory. This requires memory to be reserved for HVO to be removed and reapplied on the new guest_memfd memory. This not only slows down memory allocation but also trims the benefits of HVO. Memory would have to be reserved on the host to facilitate these transitions.
Improving how guest_memfd uses the allocator in a future revision of this RFC:

To provide an easier transition away from HugeTLB, guest_memfd's use of HugeTLB should be limited to these allocator functions:

+ reserve(node, page_size, num_pages) => opaque handle
  + Used when a guest_memfd inode is created to reserve memory from backend allocator
+ allocate(handle, mempolicy, page_size) => folio
  + To allocate a folio from guest_memfd's reservation
+ split(handle, folio, target_page_size) => void
  + To take a huge folio, and split it to smaller folios, restore to filemap
+ reconstruct(handle, first_folio, nr_pages) => void
  + To take a folio, and reconstruct a huge folio out of nr_pages from the first_folio
+ free(handle, folio) => void
  + To return folio to guest_memfd's reservation
+ error(handle, folio) => void
  + To handle memory errors
+ unreserve(handle) => void
  + To return guest_memfd's reservation to allocator backend

Userspace should only provide a page size when creating a guest_memfd and should not have to specify HugeTLB.

Overview of patches:

+ Patches 01-12
  + Many small changes to HugeTLB, mostly to separate HugeTLBfs concepts from HugeTLB, and to expose HugeTLB functions.
+ Patches 13-16
  + Letting guest_memfd use HugeTLB
  + Creation of each guest_memfd reserves pages from HugeTLB's global hstate and puts them into the guest_memfd inode's subpool
  + Each folio allocation takes a page from the guest_memfd inode's subpool
+ Patches 17-21
  + Selftests for new HugeTLB features in guest_memfd
+ Patches 22-24
  + More small changes on the HugeTLB side to expose functions needed by guest_memfd
+ Patch 25
  + Uses the newly available functions from patches 22-24 to split HugeTLB pages. In this patch, HugeTLB folios are always split to 4K before any usage, private or shared.
+ Patches 26-28
  + Allow mmap() in guest_memfd and faulting in shared pages
+ Patch 29
  + Enables conversion between private/shared pages
+ Patch 30
  + Required to zero folios after conversions to avoid leaking initialized kernel memory
+ Patches 31-38
  + Add selftests to test mapping pages to userspace, guest/host memory sharing and update conversions tests
  + Patch 33 illustrates the conversion flow between VM/userspace/guest_memfd
+ Patch 39
  + Dynamically split and reconstruct HugeTLB pages instead of always splitting before use. All earlier selftests are expected to still pass.

TODOs:

+ Add logic to wait for safe_refcount [1]
+ Look into lazy splitting/reconstruction of pages
  + Currently, when KVM_SET_MEMORY_ATTRIBUTES is invoked, not only are the mem_attr_array and faultability updated, the pages in the requested range are also split/reconstructed as necessary. We want to look into delaying splitting/reconstruction to fault time.
+ Solve race between folios being faulted in and being truncated
  + When running private_mem_conversions_test with more than 1 vCPU, a folio getting truncated may get faulted in by another process, causing elevated mapcounts when the folio is freed (VM_BUG_ON_FOLIO).
+ Add intermediate splits (1G should first split to 2M and not split directly to 4K)
+ Use guest's lock instead of hugetlb_lock
+ Use multi-index xarray/replace xarray with some other data struct for faultability flag
+ Refactor HugeTLB better, present generic allocator interface

Please let us know your thoughts on:

+ HugeTLB as the choice of transitional allocator backend
+ Refactoring HugeTLB to provide generic allocator interface
+ Shared/private conversion flow
  + Requiring user to request kernel to unmap pages from userspace using madvise(MADV_DONTNEED)
  + Failing conversion on elevated mapcounts/pincounts/refcounts
+ Process of splitting/reconstructing page
+ Anything else!
[1] https://lore.kernel.org/all/20240829-guest-memfd-lib-v2-0-b9afc1ff3656@quic… Ackerley Tng (37): mm: hugetlb: Simplify logic in dequeue_hugetlb_folio_vma() mm: hugetlb: Refactor vma_has_reserves() to should_use_hstate_resv() mm: hugetlb: Remove unnecessary check for avoid_reserve mm: mempolicy: Refactor out policy_node_nodemask() mm: hugetlb: Refactor alloc_buddy_hugetlb_folio_with_mpol() to interpret mempolicy instead of vma mm: hugetlb: Refactor dequeue_hugetlb_folio_vma() to use mpol mm: hugetlb: Refactor out hugetlb_alloc_folio mm: truncate: Expose preparation steps for truncate_inode_pages_final mm: hugetlb: Expose hugetlb_subpool_{get,put}_pages() mm: hugetlb: Add option to create new subpool without using surplus mm: hugetlb: Expose hugetlb_acct_memory() mm: hugetlb: Move and expose hugetlb_zero_partial_page() KVM: guest_memfd: Make guest mem use guest mem inodes instead of anonymous inodes KVM: guest_memfd: hugetlb: initialization and cleanup KVM: guest_memfd: hugetlb: allocate and truncate from hugetlb KVM: guest_memfd: Add page alignment check for hugetlb guest_memfd KVM: selftests: Add basic selftests for hugetlb-backed guest_memfd KVM: selftests: Support various types of backing sources for private memory KVM: selftests: Update test for various private memory backing source types KVM: selftests: Add private_mem_conversions_test.sh KVM: selftests: Test that guest_memfd usage is reported via hugetlb mm: hugetlb: Expose vmemmap optimization functions mm: hugetlb: Expose HugeTLB functions for promoting/demoting pages mm: hugetlb: Add functions to add/move/remove from hugetlb lists KVM: guest_memfd: Track faultability within a struct kvm_gmem_private KVM: guest_memfd: Allow mmapping guest_memfd files KVM: guest_memfd: Use vm_type to determine default faultability KVM: Handle conversions in the SET_MEMORY_ATTRIBUTES ioctl KVM: guest_memfd: Handle folio preparation for guest_memfd mmap KVM: selftests: Allow vm_set_memory_attributes to be used without 
asserting return value of 0 KVM: selftests: Test using guest_memfd memory from userspace KVM: selftests: Test guest_memfd memory sharing between guest and host KVM: selftests: Add notes in private_mem_kvm_exits_test for mmap-able guest_memfd KVM: selftests: Test that pinned pages block KVM from setting memory attributes to PRIVATE KVM: selftests: Refactor vm_mem_add to be more flexible KVM: selftests: Add helper to perform madvise by memslots KVM: selftests: Update private_mem_conversions_test for mmap()able guest_memfd Vishal Annapurve (2): KVM: guest_memfd: Split HugeTLB pages for guest_memfd use KVM: guest_memfd: Dynamically split/reconstruct HugeTLB page fs/hugetlbfs/inode.c | 35 +- include/linux/hugetlb.h | 54 +- include/linux/kvm_host.h | 1 + include/linux/mempolicy.h | 2 + include/linux/mm.h | 1 + include/uapi/linux/kvm.h | 26 + include/uapi/linux/magic.h | 1 + mm/hugetlb.c | 346 ++-- mm/hugetlb_vmemmap.h | 11 - mm/mempolicy.c | 36 +- mm/truncate.c | 26 +- tools/include/linux/kernel.h | 4 +- tools/testing/selftests/kvm/Makefile | 3 + .../kvm/guest_memfd_hugetlb_reporting_test.c | 222 +++ .../selftests/kvm/guest_memfd_pin_test.c | 104 ++ .../selftests/kvm/guest_memfd_sharing_test.c | 160 ++ .../testing/selftests/kvm/guest_memfd_test.c | 238 ++- .../testing/selftests/kvm/include/kvm_util.h | 45 +- .../testing/selftests/kvm/include/test_util.h | 18 + tools/testing/selftests/kvm/lib/kvm_util.c | 443 +++-- tools/testing/selftests/kvm/lib/test_util.c | 99 ++ .../kvm/x86_64/private_mem_conversions_test.c | 158 +- .../x86_64/private_mem_conversions_test.sh | 91 + .../kvm/x86_64/private_mem_kvm_exits_test.c | 11 +- virt/kvm/guest_memfd.c | 1563 ++++++++++++++++- virt/kvm/kvm_main.c | 17 + virt/kvm/kvm_mm.h | 16 + 27 files changed, 3288 insertions(+), 443 deletions(-) create mode 100644 tools/testing/selftests/kvm/guest_memfd_hugetlb_reporting_test.c create mode 100644 tools/testing/selftests/kvm/guest_memfd_pin_test.c create mode 100644 
tools/testing/selftests/kvm/guest_memfd_sharing_test.c create mode 100755 tools/testing/selftests/kvm/x86_64/private_mem_conversions_test.sh -- 2.46.0.598.g6f2099f65c-goog
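The allocator functions proposed in the cover letter above (reserve, allocate, split, free) can be pictured with a small userspace model. This is only a sketch under stated assumptions: the names `gmem_reservation`, `toy_folio` and every `toy_*` function are hypothetical and are not part of the patch series or of any kernel API. The model only tracks page counts, to show how a per-inode reservation bounds allocation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handle: pages reserved for one guest_memfd inode. */
struct gmem_reservation {
	int node;          /* NUMA node the pages were reserved on */
	size_t page_size;  /* size of each reserved page, in bytes */
	size_t reserved;   /* pages set aside by toy_reserve() */
	size_t in_use;     /* pages currently handed out as folios */
};

/* Toy stand-in for a folio: remembers only its size and state. */
struct toy_folio {
	size_t size;
	int allocated;
};

/* Models reserve(node, page_size, num_pages) => opaque handle. */
static struct gmem_reservation toy_reserve(int node, size_t page_size,
					   size_t num_pages)
{
	struct gmem_reservation r = { node, page_size, num_pages, 0 };

	return r;
}

/* Models allocate(): returns 1 on success, 0 once the reservation is used up. */
static int toy_allocate(struct gmem_reservation *r, struct toy_folio *out)
{
	if (r->in_use == r->reserved)
		return 0;
	r->in_use++;
	out->size = r->page_size;
	out->allocated = 1;
	return 1;
}

/* Models split(): returns how many target-sized folios one folio becomes. */
static size_t toy_split_count(const struct toy_folio *f, size_t target_page_size)
{
	assert(target_page_size != 0 && f->size % target_page_size == 0);
	return f->size / target_page_size;
}

/* Models free(): returns the folio to the reservation it came from. */
static void toy_free(struct gmem_reservation *r, struct toy_folio *f)
{
	assert(f->allocated && r->in_use > 0);
	f->allocated = 0;
	r->in_use--;
}
```

In this model a reservation of one 1G page yields a single folio that splits into 262144 4K pieces; a second allocation fails until the first folio is freed, which is the guarantee a per-inode subpool is meant to give.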

6 months, 3 weeks

17
129

[PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)

by Nicolin Chen

This series introduces a new VIOMMU infrastructure and related ioctls.

IOMMUFD has been using the HWPT infrastructure for all cases, including nested IO page table support. Yet, there are limitations for an HWPT-based structure to support some advanced HW-accelerated features, such as CMDQV on NVIDIA Grace, and HW-accelerated vIOMMU on AMD. Even for a multi-IOMMU environment, it is not straightforward for nested HWPTs to share the same parent HWPT (stage-2 IO pagetable), with the HWPT infrastructure alone.

The new VIOMMU object is an additional layer, between the nested HWPT and its parent HWPT, to give both the IOMMUFD core and an IOMMU driver an additional structure to support HW-accelerated features:

                     ----------------------------
 ----------------    |         |  paging_hwpt0  |
 | hwpt_nested0 |--->| viommu0 ------------------
 ----------------    |         | HW-accel feats |
                     ----------------------------

On a multi-IOMMU system, the VIOMMU object can be instanced to the number of vIOMMUs in a guest VM, while holding the same parent HWPT to share the stage-2 IO pagetable. Each VIOMMU then only needs to allocate its own VMID to attach the shared stage-2 IO pagetable to the physical IOMMU:

                     ----------------------------
 ----------------    |         |  paging_hwpt0  |
 | hwpt_nested0 |--->| viommu0 ------------------
 ----------------    |         |     VMID0      |
                     ----------------------------
                     ----------------------------
 ----------------    |         |  paging_hwpt0  |
 | hwpt_nested1 |--->| viommu1 ------------------
 ----------------    |         |     VMID1      |
                     ----------------------------

As an initial part-1, add ioctls to support a VIOMMU-based invalidation:
    IOMMUFD_CMD_VIOMMU_ALLOC to allocate a VIOMMU object
    IOMMUFD_CMD_VIOMMU_SET/UNSET_VDEV_ID to set/clear device's virtual ID
    (Reuse IOMMUFD_CMD_HWPT_INVALIDATE for a VIOMMU object to flush cache by a given driver data)

Worth noting that the VDEV_ID is for a per-VIOMMU device list for drivers to look up the device's physical instance from its virtual ID in a VM. It is essential for a VIOMMU-based invalidation where the request contains a device's virtual ID for its device cache flush, e.g. ATC invalidation.

As for the implementation of the series, add an IOMMU_VIOMMU_TYPE_DEFAULT type for a core-allocated-core-managed VIOMMU object, allowing drivers to simply hook default viommu ops for viommu-based invalidation alone. And provide some viommu helpers to drivers for VDEV_ID translation and parent domain lookup. Add VIOMMU invalidation support to the ARM SMMUv3 driver for a real world use case. This adds support for arm-smmu-v3's CMDQ_OP_ATC_INV and CMDQ_OP_CFGI_CD/ALL commands, supplementing HWPT-based invalidations.

In the future, drivers will also be able to choose a driver-managed type to hold its own structure by adding a new type to enum iommu_viommu_type. More VIOMMU-based structures and ioctls will be introduced in part-2/3 to support a driver-managed VIOMMU, e.g. VQUEUE object for a HW accelerated queue, VIRQ (or VEVENT) object for IRQ injections.
Although we repurposed the VIOMMU object from an earlier RFC discussion, for reference:
https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/

This series is on Github:
https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p1-v2

Pairing QEMU branch for testing:
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_viommu_p1-v2

Changelog
v2
 * Limited vdev_id to one per idev
 * Added a rw_sem to protect the vdev_id list
 * Reworked driver-level APIs with proper lockings
 * Added a new viommu_api file for IOMMUFD_DRIVER config
 * Dropped useless iommu_dev point from the viommu structure
 * Added missing index numbers to new types in the uAPI header
 * Dropped IOMMU_VIOMMU_INVALIDATE uAPI; Instead, reuse the HWPT one
 * Reworked mock_viommu_cache_invalidate() using the new iommu helper
 * Reordered details of set/unset_vdev_id handlers for proper lockings
 * Added arm_smmu_cache_invalidate_user patch from Jason's nesting series
v1
 https://lore.kernel.org/all/cover.1723061377.git.nicolinc@nvidia.com/

Thanks!
Nicolin Jason Gunthorpe (3): iommu: Add iommu_copy_struct_from_full_user_array helper iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED iommu/arm-smmu-v3: Update comments about ATS and bypass Nicolin Chen (16): iommufd: Reorder struct forward declarations iommufd/viommu: Add IOMMUFD_OBJ_VIOMMU and IOMMU_VIOMMU_ALLOC ioctl iommu: Pass in a viommu pointer to domain_alloc_user op iommufd: Allow pt_id to carry viommu_id for IOMMU_HWPT_ALLOC iommufd/selftest: Add IOMMU_VIOMMU_ALLOC test coverage iommufd/viommu: Add IOMMU_VIOMMU_SET/UNSET_VDEV_ID ioctl iommufd/selftest: Add IOMMU_VIOMMU_SET/UNSET_VDEV_ID test coverage iommufd/viommu: Add cache_invalidate for IOMMU_VIOMMU_TYPE_DEFAULT iommufd: Allow hwpt_id to carry viommu_id for IOMMU_HWPT_INVALIDATE iommufd/viommu: Add vdev_id helpers for IOMMU drivers iommufd/selftest: Add mock_viommu_invalidate_user op iommufd/selftest: Add IOMMU_TEST_OP_DEV_CHECK_CACHE test command iommufd/selftest: Add VIOMMU coverage for IOMMU_HWPT_INVALIDATE ioctl iommufd/viommu: Add iommufd_viommu_to_parent_domain helper iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user iommu/arm-smmu-v3: Add arm_smmu_viommu_cache_invalidate drivers/iommu/amd/iommu.c | 1 + drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 218 ++++++++++++++- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 + drivers/iommu/intel/iommu.c | 1 + drivers/iommu/iommufd/Makefile | 5 +- drivers/iommu/iommufd/device.c | 12 + drivers/iommu/iommufd/hw_pagetable.c | 59 +++- drivers/iommu/iommufd/iommufd_private.h | 37 +++ drivers/iommu/iommufd/iommufd_test.h | 30 ++ drivers/iommu/iommufd/main.c | 12 + drivers/iommu/iommufd/selftest.c | 101 ++++++- drivers/iommu/iommufd/viommu.c | 196 +++++++++++++ drivers/iommu/iommufd/viommu_api.c | 53 ++++ include/linux/iommu.h | 56 +++- include/linux/iommufd.h | 51 +++- include/uapi/linux/iommufd.h | 117 +++++++- tools/testing/selftests/iommu/iommufd.c | 259 +++++++++++++++++- tools/testing/selftests/iommu/iommufd_utils.h | 126 +++++++++ 18 files 
changed, 1299 insertions(+), 38 deletions(-) create mode 100644 drivers/iommu/iommufd/viommu.c create mode 100644 drivers/iommu/iommufd/viommu_api.c -- 2.43.0
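The per-VIOMMU VDEV_ID list described in the cover letter above maps a device's virtual ID in the guest to its physical instance, with at most one virtual ID per device (per the v2 changelog). A minimal userspace sketch of that lookup follows. It assumes a fixed-size table and uses hypothetical names (`toy_viommu`, `toy_set_vdev_id`, `toy_lookup_phys_dev`, `toy_unset_vdev_id`); it is not the iommufd implementation and omits the locking the series adds.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_VDEVS 8

/* One entry: a guest-visible virtual ID bound to a physical device index. */
struct toy_vdev {
	uint64_t vdev_id;
	int phys_dev;
	int used;
};

/* Hypothetical per-VIOMMU device list (zero-initialise before use). */
struct toy_viommu {
	struct toy_vdev vdevs[TOY_MAX_VDEVS];
};

/*
 * Bind vdev_id to phys_dev. Returns 0 on success, -1 if the device already
 * has a virtual ID, the virtual ID is already taken, or the table is full.
 */
static int toy_set_vdev_id(struct toy_viommu *v, int phys_dev, uint64_t vdev_id)
{
	struct toy_vdev *slot = NULL;
	size_t i;

	for (i = 0; i < TOY_MAX_VDEVS; i++) {
		struct toy_vdev *e = &v->vdevs[i];

		if (!e->used) {
			if (!slot)
				slot = e;
			continue;
		}
		if (e->phys_dev == phys_dev || e->vdev_id == vdev_id)
			return -1;
	}
	if (!slot)
		return -1;
	slot->vdev_id = vdev_id;
	slot->phys_dev = phys_dev;
	slot->used = 1;
	return 0;
}

/* Translate a virtual ID to its physical device, or -1 if unknown. */
static int toy_lookup_phys_dev(const struct toy_viommu *v, uint64_t vdev_id)
{
	size_t i;

	for (i = 0; i < TOY_MAX_VDEVS; i++)
		if (v->vdevs[i].used && v->vdevs[i].vdev_id == vdev_id)
			return v->vdevs[i].phys_dev;
	return -1;
}

/* Remove the binding for phys_dev. Returns 0 on success, -1 if none exists. */
static int toy_unset_vdev_id(struct toy_viommu *v, int phys_dev)
{
	size_t i;

	for (i = 0; i < TOY_MAX_VDEVS; i++) {
		if (v->vdevs[i].used && v->vdevs[i].phys_dev == phys_dev) {
			v->vdevs[i].used = 0;
			return 0;
		}
	}
	return -1;
}
```

An invalidation request that carries a virtual device ID would call the lookup first and fail the request if it returns -1, which is why the binding has to be set before any device cache flush is issued for that device.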

6 months, 3 weeks

7
148

[PATCH v3 0/3] introduce PIDFD_SELF* sentinels

by Lorenzo Stoakes

If you wish to utilise a pidfd interface to refer to the current process or thread, it is rather cumbersome, requiring something like:

	int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD);

	...

	close(pidfd);

Or the equivalent call opening /proc/self. It is more convenient to use a sentinel value to indicate to an interface that accepts a pidfd that we simply wish to refer to the current process or thread.

This series introduces sentinels for this purpose, which can be passed as the pidfd in this instance rather than having to establish a dummy fd.

It is useful to refer both to the current thread from the userland's perspective, for which we use PIDFD_SELF, and to the current process from the userland's perspective, for which we use PIDFD_SELF_PROCESS.

There is unfortunately some confusion between the kernel and userland as to what constitutes a process: a thread from the userland perspective is a process from the kernel perspective, and a userland process is a thread group (more specifically, the thread group leader) from the kernel perspective. We therefore alias things thusly:

* PIDFD_SELF_THREAD aliased by PIDFD_SELF - use PIDTYPE_PID.
* PIDFD_SELF_THREAD_GROUP aliased by PIDFD_SELF_PROCESS - use PIDTYPE_TGID.

In all of the kernel code we refer to PIDFD_SELF_THREAD and PIDFD_SELF_THREAD_GROUP. However we expect users to use PIDFD_SELF and PIDFD_SELF_PROCESS.

This matters for cases where, for instance, a user unshare()'s FDs or does thread-specific signal handling and where the user would be hugely confused if the FDs referenced or signal processed referred to the thread group leader rather than the individual thread.

We ensure that pidfd_send_signal() and pidfd_getfd() work correctly, and assert as much in selftests. All other interfaces except setns() will work implicitly with this new interface, however it doesn't make sense to test waitid(P_PIDFD, ...) as waiting on ourselves is a blocking operation.
In the case of setns() we explicitly disallow use of PIDFD_SELF* as it doesn't make sense to obtain the namespaces of our own process, and it would require work to implement this functionality there that would be of no use. We also do not provide the ability to utilise PIDFD_SELF* in ordinary fd operations such as open() or poll(), as this would require extensive work and be of no real use. v3: * Do not fput() an invalid fd as reported by kernel test bot. * Fix unintended churn from moving variable declaration. v2: * Fix tests as reported by Shuah. * Correct RFC version lore link. https://lore.kernel.org/linux-mm/cover.1728643714.git.lorenzo.stoakes@oracl… Non-RFC v1: * Removed RFC tag - there seems to be general consensus that this change is a good idea, but perhaps some debate to be had on implementation. It seems sensible then to move forward with the RFC flag removed. * Introduced PIDFD_SELF_THREAD, PIDFD_SELF_THREAD_GROUP and their aliases PIDFD_SELF and PIDFD_SELF_PROCESS respectively. * Updated testing accordingly. https://lore.kernel.org/linux-mm/cover.1728578231.git.lorenzo.stoakes@oracl… RFC version: https://lore.kernel.org/linux-mm/cover.1727644404.git.lorenzo.stoakes@oracl… Lorenzo Stoakes (3): pidfd: extend pidfd_get_pid() and de-duplicate pid lookup pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process selftests: pidfd: add tests for PIDFD_SELF_* include/linux/pid.h | 43 +++++- include/uapi/linux/pidfd.h | 15 ++ kernel/exit.c | 3 +- kernel/nsproxy.c | 1 + kernel/pid.c | 73 ++++++--- kernel/signal.c | 26 +--- tools/testing/selftests/pidfd/pidfd.h | 8 + .../selftests/pidfd/pidfd_getfd_test.c | 141 ++++++++++++++++++ .../selftests/pidfd/pidfd_setns_test.c | 11 ++ tools/testing/selftests/pidfd/pidfd_test.c | 76 ++++++++-- 10 files changed, 342 insertions(+), 55 deletions(-) -- 2.46.2
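For comparison, the "cumbersome" pattern that the cover letter above wants to avoid looks roughly like this today. The sketch uses only existing syscalls (pidfd_open and pidfd_send_signal, invoked through syscall(2) so it does not depend on libc wrappers); the function name `probe_self_via_pidfd` is made up for illustration, and the PIDFD_SELF* sentinels themselves are not used because they are only proposed by this series.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Open a pidfd for the calling process, send it signal 0 (which performs
 * only the existence and permission checks), then close the pidfd again.
 * Returns 0 on success, -1 if any step fails (for example on a kernel
 * without pidfd support).
 */
static int probe_self_via_pidfd(void)
{
	int pidfd = (int)syscall(SYS_pidfd_open, getpid(), 0);
	long ret;

	if (pidfd < 0)
		return -1;

	ret = syscall(SYS_pidfd_send_signal, pidfd, 0, NULL, 0);

	/* The fd existed only to name ourselves; a sentinel would avoid it. */
	close(pidfd);

	return ret == 0 ? 0 : -1;
}
```

With the sentinels proposed in the series, the open and close steps would disappear and the sentinel would be passed where `pidfd` is passed here.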

6 months, 4 weeks

6
31

[PATCH] clk: test: Forward-declare struct of_phandle_args in kunit/clk.h

by Richard Fitzgerald

Add a forward-declare of struct of_phandle_args to prevent the compiler warning:

../include/kunit/clk.h:29:63: warning: ‘struct of_phandle_args’ declared inside parameter list will not be visible outside of this definition or declaration
   struct clk_hw *(*get)(struct of_phandle_args *clkspec, void *data),

Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com>
---
 include/kunit/clk.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/kunit/clk.h b/include/kunit/clk.h
index 0afae7688157..f226044cc78d 100644
--- a/include/kunit/clk.h
+++ b/include/kunit/clk.h
@@ -6,6 +6,7 @@ struct clk;
 struct clk_hw;
 struct device;
 struct device_node;
+struct of_phandle_args;
 struct kunit;
 
 struct clk *
-- 
2.43.0

7 months

2
1

[PATCH v8 00/10] Basic SEV-SNP Selftests

by Pratik R. Sampat

This patch series extends the sev_init2 and the sev_smoke test to exercise the SEV-SNP VM launch workflow.

Primarily, it introduces the architectural defines and their support in the SEV library, and extends the tests to interact with the SEV-SNP ioctl() wrappers.

Patch 1 - Do not advertise SNP on initialization failure
Patch 2 - SNP test for KVM_SEV_INIT2
Patch 3 - Add vmgexit helper
Patch 4 - Add SMT control interface helper
Patch 5 - Replace assert() with TEST_ASSERT_EQ()
Patch 6 - Introduce SEV+ VM type check
Patch 7 - SNP ioctl() plumbing for the SEV library
Patch 8 - Force set GUEST_MEMFD for SNP
Patch 9 - Cleanups of smoke test - Decouple policy from type
Patch 10 - SNP smoke test

The series is based on git.kernel.org/pub/scm/virt/kvm/kvm.git next

v7..v8:
* Dropped exporting the SNP initialized API from ccp to KVM. Instead call SNP_PLATFORM_STATUS within KVM to query the initialization. (Tom)
  While it may be cheaper to query sev->snp_initialized from ccp, making the SNP platform call within KVM does away with any dependencies.

v6..v7:
https://lore.kernel.org/kvm/20250221210200.244405-7-prsampat@amd.com/
Based on comments from Sean -
* Replaced FW check with sev->snp_initialized
* Dropped the patch which removes SEV+ KVM advertisement if INIT fails. This should now be resolved by the combination of the patches [1,2] from Ashish.
* Change vmgexit to an inline function * Export SMT control parsing interface to kvm_util Note: hyperv_cpuid KST only compile tested * Replace assert() with TEST_ASSERT_EQ() within SEV library * Define KVM_SEV_PAGE_TYPE_INVALID for SEV call of encrypt_region() * Parameterize encrypt_region() to include privatize_region() * Deduplication of sev test calls between SEV,SEV-ES and SNP * Removed FW version tests for SNP * Included testing of SNP_POLICY_DBG * Dropped most tags from patches that have been changed or indirectly affected [1] https://lore.kernel.org/all/d6d08c6b-9602-4f3d-92c2-8db6d50a1b92@amd.com [2] https://lore.kernel.org/all/f78ddb64087df27e7bcb1ae0ab53f55aa0804fab.173922… v5..v6: https://lore.kernel.org/kvm/ab433246-e97c-495b-ab67-b0cb1721fb99@amd.com/ * Rename is_sev_platform_init to sev_fw_initialized (Nikunj) * Rename KVM CPU feature X86_FEATURE_SNP to X86_FEATURE_SEV_SNP (Nikunj) * Collected Tags from Nikunj, Pankaj, Srikanth. v4..v5: https://lore.kernel.org/kvm/8e7d8172-879e-4a28-8438-343b1c386ec9@amd.com/ * Introduced a check to disable advertising support for SEV, SEV-ES and SNP when platform initialization fails (Nikunj) * Remove the redundant SNP check within is_sev_vm() (Nikunj) * Cleanup of the encrypt_region flow for better readability (Nikunj) * Refactor paths to use the canonical $(ARCH) to rebase for kvm/next v3..v4: https://lore.kernel.org/kvm/20241114234104.128532-1-pratikrajesh.sampat@amd… * Remove SNP FW API version check in the test and ensure the KVM capability advertises the presence of the feature. 
Retain the minimum version definitions to exercise these API versions in the smoke test * Retained only the SNP smoke test and SNP_INIT2 test * The SNP architectural defined merged with SNP_INIT2 test patch * SNP shutdown merged with SNP smoke test patch * Add SEV VM type check to abstract comparisons and reduce clutter * Define a SNP default policy which sets bits based on the presence of SMT * Decouple privatization and encryption for it to be SNP agnostic * Assert for only positive tests using vm_ioctl() * Dropped tested-by tags In summary - based on comments from Sean, I have primarily reduced the scope of this patch series to focus on breaking down the SNP smoke test patch (v3 - patch2) to first introduce SEV-SNP support and use this interface to extend the sev_init2 and the sev_smoke test. The rest of the v3 patchset that introduces ioctl, pre fault, fallocate and negative tests, will be re-worked and re-introduced subsequently in future patch series post addressing the issues discussed. v2..v3: https://lore.kernel.org/kvm/20240905124107.6954-1-pratikrajesh.sampat@amd.c… * Remove the assignments for the prefault and fallocate test type enums. * Fix error message for sev launch measure and finish. * Collect tested-by tags [Peter, Srikanth] Pratik R. 
Sampat (10): KVM: SEV: Disable SEV-SNP support on initialization failure KVM: selftests: SEV-SNP test for KVM_SEV_INIT2 KVM: selftests: Add vmgexit helper KVM: selftests: Add SMT control state helper KVM: selftests: Replace assert() with TEST_ASSERT_EQ() KVM: selftests: Introduce SEV VM type check KVM: selftests: Add library support for interacting with SNP KVM: selftests: Force GUEST_MEMFD flag for SNP VM type KVM: selftests: Abstractions for SEV to decouple policy from type KVM: selftests: Add a basic SEV-SNP smoke test arch/x86/include/uapi/asm/kvm.h | 1 + arch/x86/kvm/svm/sev.c | 30 +++++- tools/arch/x86/include/uapi/asm/kvm.h | 1 + .../testing/selftests/kvm/include/kvm_util.h | 35 +++++++ .../selftests/kvm/include/x86/processor.h | 1 + tools/testing/selftests/kvm/include/x86/sev.h | 42 ++++++++- tools/testing/selftests/kvm/lib/kvm_util.c | 7 +- .../testing/selftests/kvm/lib/x86/processor.c | 4 +- tools/testing/selftests/kvm/lib/x86/sev.c | 93 +++++++++++++++++-- .../testing/selftests/kvm/x86/hyperv_cpuid.c | 19 ---- .../selftests/kvm/x86/sev_init2_tests.c | 13 +++ .../selftests/kvm/x86/sev_smoke_test.c | 75 +++++++++------ 12 files changed, 261 insertions(+), 60 deletions(-) -- 2.43.0

7 months

4
20

[RFC PATCH 00/11] New KVM ioctl to link a gmem inode to a new gmem file

by Ackerley Tng

Hello, This patchset builds upon the code at https://lore.kernel.org/lkml/20230718234512.1690985-1-seanjc@google.com/T/. This code is available at https://github.com/googleprodkernel/linux-cc/tree/kvm-gmem-link-migrate-rfc…. In guest_mem v11, a split file/inode model was proposed, where memslot bindings belong to the file and pages belong to the inode. This model lends itself well to having different VMs use separate files pointing to the same inode. This RFC proposes an ioctl, KVM_LINK_GUEST_MEMFD, that takes a VM and a gmem fd, and returns another gmem fd referencing a different file and associated with VM. This RFC also includes an update to KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM to migrate memory context (slot->arch.lpage_info and kvm->mem_attr_array) from source to destination vm, intra-host. Intended usage of the two ioctls: 1. Source VM’s fd is passed to destination VM via unix sockets 2. Destination VM uses new ioctl KVM_LINK_GUEST_MEMFD to link source VM’s fd to a new fd. 3. Destination VM will pass new fds to KVM_SET_USER_MEMORY_REGION, which will bind the new file, pointing to the same inode that the source VM’s file points to, to memslots 4. Use KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM to move kvm->mem_attr_array and slot->arch.lpage_info to the destination VM. 5. 
Run the destination VM as per normal Some other approaches considered were: + Using the linkat() syscall, but that requires a mount/directory for a source fd to be linked to + Using the dup() syscall, but that only duplicates the fd, and both fds point to the same file --- Ackerley Tng (11): KVM: guest_mem: Refactor out kvm_gmem_alloc_file() KVM: guest_mem: Add ioctl KVM_LINK_GUEST_MEMFD KVM: selftests: Add tests for KVM_LINK_GUEST_MEMFD ioctl KVM: selftests: Test transferring private memory to another VM KVM: x86: Refactor sev's flag migration_in_progress to kvm struct KVM: x86: Refactor common code out of sev.c KVM: x86: Refactor common migration preparation code out of sev_vm_move_enc_context_from KVM: x86: Let moving encryption context be configurable KVM: x86: Handle moving of memory context for intra-host migration KVM: selftests: Generalize migration functions from sev_migrate_tests.c KVM: selftests: Add tests for migration of private mem arch/x86/include/asm/kvm_host.h | 4 +- arch/x86/kvm/svm/sev.c | 85 ++----- arch/x86/kvm/svm/svm.h | 3 +- arch/x86/kvm/x86.c | 221 +++++++++++++++++- arch/x86/kvm/x86.h | 6 + include/linux/kvm_host.h | 18 ++ include/uapi/linux/kvm.h | 8 + tools/testing/selftests/kvm/Makefile | 1 + .../testing/selftests/kvm/guest_memfd_test.c | 42 ++++ .../selftests/kvm/include/kvm_util_base.h | 31 +++ .../kvm/x86_64/private_mem_migrate_tests.c | 93 ++++++++ .../selftests/kvm/x86_64/sev_migrate_tests.c | 48 ++-- virt/kvm/guest_mem.c | 151 ++++++++++-- virt/kvm/kvm_main.c | 10 + virt/kvm/kvm_mm.h | 7 + 15 files changed, 596 insertions(+), 132 deletions(-) create mode 100644 tools/testing/selftests/kvm/x86_64/private_mem_migrate_tests.c -- 2.41.0.640.ga95def55d0-goog
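The cover letter above rejects dup() because the duplicate fd still refers to the same file, whereas the proposed ioctl needs a new file on the same inode. That distinction can be observed with ordinary files. The following is only an analogy under stated assumptions: it uses a temporary regular file in /tmp rather than a guest_memfd, it does not call KVM_LINK_GUEST_MEMFD (which exists only in this RFC), and the names `fd_sharing` and `compare_dup_and_open` are invented for the example.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

struct fd_sharing {
	int same_inode_dup;        /* dup()'d fd refers to the same inode */
	int same_inode_open;       /* separately opened fd refers to the same inode */
	off_t offset_seen_by_dup;  /* file offset observed through the dup()'d fd */
	off_t offset_seen_by_open; /* file offset observed through the opened fd */
};

/*
 * Create a temporary file, obtain a second fd with dup() and a third with a
 * separate open(), write 4 bytes through the original fd, then record what
 * each of the other fds sees. Returns 0 on success, -1 on any failure.
 */
static int compare_dup_and_open(struct fd_sharing *out)
{
	char path[] = "/tmp/gmem_dup_demo_XXXXXX";
	struct stat st_orig, st_dup, st_open;
	int fd, fd_dup, fd_open, ret = -1;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	fd_dup = dup(fd);              /* same file (shared offset), same inode */
	fd_open = open(path, O_RDWR);  /* new file (own offset), same inode */
	if (fd_dup < 0 || fd_open < 0)
		goto out;
	if (write(fd, "abcd", 4) != 4)
		goto out;
	if (fstat(fd, &st_orig) || fstat(fd_dup, &st_dup) || fstat(fd_open, &st_open))
		goto out;

	out->same_inode_dup = st_orig.st_ino == st_dup.st_ino;
	out->same_inode_open = st_orig.st_ino == st_open.st_ino;
	out->offset_seen_by_dup = lseek(fd_dup, 0, SEEK_CUR);
	out->offset_seen_by_open = lseek(fd_open, 0, SEEK_CUR);
	ret = 0;
out:
	if (fd_dup >= 0)
		close(fd_dup);
	if (fd_open >= 0)
		close(fd_open);
	close(fd);
	unlink(path);
	return ret;
}
```

All three fds report the same inode, but only the dup()'d fd observes the offset moved by the write. The separately opened fd has its own file state, which is the property the split file/inode model relies on when different VMs bind their own files to memslots.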

7 months

3
15

[PATCH] selftests/run_kselftest.sh: Use readlink if realpath is not available

by Yosry Ahmed

'realpath' is not always available; fall back to 'readlink -f' if it is not. They seem to work equally well in this context.

Signed-off-by: Yosry Ahmed <yosry.ahmed(a)linux.dev>
---
 tools/testing/selftests/run_kselftest.sh | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/run_kselftest.sh b/tools/testing/selftests/run_kselftest.sh
index 50e03eefe7ac7..0443beacf3621 100755
--- a/tools/testing/selftests/run_kselftest.sh
+++ b/tools/testing/selftests/run_kselftest.sh
@@ -3,7 +3,14 @@
 #
 # Run installed kselftest tests.
 #
-BASE_DIR=$(realpath $(dirname $0))
+
+# Fallback to readlink if realpath is not available
+if which realpath > /dev/null; then
+	BASE_DIR=$(realpath $(dirname $0))
+else
+	BASE_DIR=$(readlink -f $(dirname $0))
+fi
+
 cd $BASE_DIR
 TESTS="$BASE_DIR"/kselftest-list.txt
 if [ ! -r "$TESTS" ] ; then
-- 
2.49.0.rc1.451.g8f38331e32-goog

7 months

2
3

[PATCH v1 1/3] selftests: pidfd: add missing sys/mount.h include in pidfd_fdinfo_test.c

by Peter Seiderer

Fix compile on openSUSE Tumbleweed (gcc-14.2.1, glibc-2.40):
- add missing sys/mount.h include

Fixes:

  pidfd_fdinfo_test.c: In function ‘child_fdinfo_nspid_test’:
  pidfd_fdinfo_test.c:230:13: error: implicit declaration of function ‘mount’ [-Wimplicit-function-declaration]
    230 |         r = mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, 0);
        |             ^~~~~

Signed-off-by: Peter Seiderer <ps.report(a)gmx.net>
---
 tools/testing/selftests/pidfd/pidfd_fdinfo_test.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c b/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c
index f062a986e382..f718aac75068 100644
--- a/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c
+++ b/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c
@@ -13,6 +13,7 @@
 #include <syscall.h>
 #include <sys/wait.h>
 #include <sys/mman.h>
+#include <sys/mount.h>
 
 #include "pidfd.h"
 #include "../kselftest.h"
-- 
2.47.1


[PATCH v4 0/5] Add support for the Bus Lock Threshold

by Manali Shukla

Misbehaving guests can cause bus locks to degrade the performance of a
system. Non-WB (write-back) and misaligned locked RMW (read-modify-write)
instructions are referred to as "bus locks" and require system-wide
synchronization among all processors to guarantee atomicity. Bus locks can
impose notable performance penalties on all processors within the system.

Support for the Bus Lock Threshold is indicated by CPUID
Fn8000_000A_EDX[29] BusLockThreshold=1. The VMCB provides a Bus Lock
Threshold enable bit and an unsigned 16-bit Bus Lock Threshold count.

VMCB intercept bit

    VMCB Offset     Bits    Function
    14h             5       Intercept bus lock operations

Bus lock threshold count

    VMCB Offset     Bits    Function
    120h            15:0    Bus lock counter

During VMRUN, the bus lock threshold count is fetched and stored in an
internal count register. Prior to executing a bus lock within the guest,
the processor verifies the count in the bus lock register. If the count is
greater than zero, the processor executes the bus lock, reducing the count.
However, if the count is zero, the bus lock operation is not performed, and
instead, a Bus Lock Threshold #VMEXIT is triggered to transfer control to
the Virtual Machine Monitor (VMM).

A Bus Lock Threshold #VMEXIT is reported to the VMM with VMEXIT code 0xA5h,
VMEXIT_BUSLOCK. EXITINFO1 and EXITINFO2 are set to 0 on a VMEXIT_BUSLOCK.
On a #VMEXIT, the processor writes the current value of the Bus Lock
Threshold Counter to the VMCB.

Note: Currently, virtualizing the Bus Lock Threshold feature for L1 guest is
not supported.

More details about the Bus Lock Threshold feature can be found in AMD APM
[1].

v3 -> v4
- Incorporated Sean's review comments
- Added a preparatory patch to move linear_rip out of kvm_pio_request, so
  that it can be used by the bus lock threshold patches.
- Added complete_userspace_buslock() function to reload bus_lock_counter to
  '1' only if userspace has not changed the RIP.
- Added changes to continue running bus_lock_counter across the nested
  transitions.

v2 -> v3
- Dropped the patch to add virt tag in /proc/cpuinfo.
- Incorporated Tom's review comments.

v1 -> v2
- Incorporated misc review comments from Sean.
- Removed bus_lock_counter module parameter.
- Set the value of bus_lock_counter to zero by default and reload the value
  by 1 in bus lock exit handler.
- Added documentation for the behavioral difference for KVM_EXIT_BUS_LOCK.
- Improved selftest for buslock to work on SVM and VMX.
- Rewrote the commit messages.

Patches are prepared on kvm-next/next (c9ea48bb6ee6).

Testing done:
- Tested the Bus Lock Threshold functionality on normal, SEV, SEV-ES and
  SEV-SNP guests.
- Tested the Bus Lock Threshold functionality on nested guests.

v1: https://lore.kernel.org/kvm/20240709175145.9986-4-manali.shukla@amd.com/T/
v2: https://lore.kernel.org/kvm/20241001063413.687787-4-manali.shukla@amd.com/T/
v3: https://lore.kernel.org/kvm/20241004053341.5726-1-manali.shukla@amd.com/T/

[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024,
     Vol 2, 15.14.5 Bus Lock Threshold.
     https://bugzilla.kernel.org/attachment.cgi?id=306250

Manali Shukla (3):
  KVM: x86: Preparatory patch to move linear_rip out of kvm_pio_request
  x86/cpufeatures: Add CPUID feature bit for the Bus Lock Threshold
  KVM: SVM: Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM CPUs

Nikunj A Dadhania (2):
  KVM: SVM: Enable Bus lock threshold exit
  KVM: selftests: Add bus lock exit test

 Documentation/virt/kvm/api.rst                |  19 +++
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/kvm_host.h               |   2 +-
 arch/x86/include/asm/svm.h                    |   5 +-
 arch/x86/include/uapi/asm/svm.h               |   2 +
 arch/x86/kvm/svm/nested.c                     |  42 ++++++
 arch/x86/kvm/svm/svm.c                        |  38 +++++
 arch/x86/kvm/svm/svm.h                        |   2 +
 arch/x86/kvm/x86.c                            |   8 +-
 tools/testing/selftests/kvm/Makefile.kvm      |   1 +
 .../selftests/kvm/x86/kvm_buslock_test.c      | 135 ++++++++++++++++++
 11 files changed, 249 insertions(+), 6 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86/kvm_buslock_test.c

base-commit: c9ea48bb6ee6b28bbc956c1e8af98044618fed5e
--
2.34.1
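
The counter behaviour described in this cover letter can be illustrated with a small toy model in C. This is only a sketch under stated assumptions, not code from the series: every `toy_*` name and `struct toy_vmcb` are hypothetical, only the 16-bit counter, the CPUID bit position (EDX[29]) and the exit code 0xA5 come from the text above, and the RIP check that v4 adds before reloading the counter is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit position quoted in the cover letter: CPUID Fn8000_000A_EDX[29]. */
#define TOY_BUS_LOCK_THRESHOLD_EDX_BIT 29u
/* Exit code quoted in the cover letter for VMEXIT_BUSLOCK. */
#define TOY_VMEXIT_BUSLOCK 0xA5u

/* Hypothetical helper: test the feature bit in an already-read EDX value. */
static bool toy_has_bus_lock_threshold(uint32_t edx)
{
	return ((edx >> TOY_BUS_LOCK_THRESHOLD_EDX_BIT) & 1u) != 0;
}

/* Hypothetical state: only the 16-bit counter (VMCB offset 120h) is modeled. */
struct toy_vmcb {
	uint16_t bus_lock_counter;
};

/*
 * Model one bus lock attempt by the guest.
 * Returns 0 if the bus lock executes (the counter is decremented), or
 * TOY_VMEXIT_BUSLOCK if the counter was zero and the guest exits instead.
 */
static unsigned int toy_guest_bus_lock(struct toy_vmcb *vmcb)
{
	assert(vmcb != NULL);

	if (vmcb->bus_lock_counter > 0) {
		vmcb->bus_lock_counter--;
		return 0;
	}
	return TOY_VMEXIT_BUSLOCK;
}

/*
 * Model the policy from the v1 -> v2 changelog: the counter starts at zero
 * and the exit handler reloads it to 1.
 */
static void toy_handle_bus_lock_exit(struct toy_vmcb *vmcb)
{
	assert(vmcb != NULL);
	vmcb->bus_lock_counter = 1;
}
```

In this model, a counter of zero makes the first bus lock attempt exit. After the handler reloads the counter to 1, the next attempt executes, and the attempt after that exits again.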


[PATCH v2 0/6] Add support for FEAT_{LS64, LS64_V} and related tests

by Yicong Yang

From: Yicong Yang <yangyicong(a)hisilicon.com>

Armv8.7 introduces single-copy atomic 64-byte load and store instructions
and their variants, named under FEAT_{LS64, LS64_V}. Add support for
Armv8.7 FEAT_{LS64, LS64_V}:
- Add identifying and enabling in the cpufeature list
- Expose the support of these features to userspace through HWCAP3 and
  cpuinfo
- Add related hwcap test
- Handle the trap of unsupported memory (normal/uncacheable) access in a VM

A real scenario for this feature is that the userspace driver can make use
of this to implement direct WQE (workqueue entry) - a mechanism to fill WQE
directly into the hardware.

This patchset also complements Marc's patchset v2 [1] for handling LS64*
trapped if not advertised for a VM.

[1] https://lore.kernel.org/linux-arm-kernel/20250310122505.2857610-1-maz@kerne…

Tested with updated hwcap test:

On host:
root@localhost:/tmp# dmesg | grep "All CPU(s) started"
[    0.504846] CPU: All CPU(s) started at EL2
root@localhost:/tmp# ./hwcap
[...]
# LS64 present
ok 217 cpuinfo_match_LS64
ok 218 sigill_LS64
ok 219 # SKIP sigbus_LS64
# LS64_V present
ok 220 cpuinfo_match_LS64_V
ok 221 sigill_LS64_V
ok 222 # SKIP sigbus_LS64_V
# 115 skipped test(s) detected. Consider enabling relevant config options to improve coverage.
# Totals: pass:107 fail:0 xfail:0 xpass:0 skip:115 error:0

On guest:
root@localhost:/# dmesg | grep "All CPU(s) started"
[    0.205580] CPU: All CPU(s) started at EL1
root@localhost:/mnt# ./hwcap
[...]
# LS64 present
ok 217 cpuinfo_match_LS64
ok 218 sigill_LS64
ok 219 # SKIP sigbus_LS64
# LS64_V present
ok 220 cpuinfo_match_LS64_V
ok 221 sigill_LS64_V
ok 222 # SKIP sigbus_LS64_V
# 115 skipped test(s) detected. Consider enabling relevant config options to improve coverage.
# Totals: pass:107 fail:0 xfail:0 xpass:0 skip:115 error:0

Change since v1:
- Drop the support for LS64_ACCDATA
- Handle the DABT of unsupported memory type after checking the memory
  attributes
Link: https://lore.kernel.org/linux-arm-kernel/20241202135504.14252-1-yangyicong@…

Yicong Yang (6):
  arm64: Provide basic EL2 setup for FEAT_{LS64, LS64_V} usage at EL0/1
  arm64: Add support for FEAT_{LS64, LS64_V}
  KVM: arm64: Enable FEAT_{LS64, LS64_V} in the supported guest
  kselftest/arm64: Add HWCAP test for FEAT_{LS64, LS64_V}
  arm64: Add ESR.DFSC definition of unsupported exclusive or atomic access
  KVM: arm64: Handle DABT caused by LS64* instructions on unsupported memory

 Documentation/arch/arm64/booting.rst      | 12 +++
 Documentation/arch/arm64/elf_hwcaps.rst   |  6 ++
 arch/arm64/include/asm/el2_setup.h        | 12 ++-
 arch/arm64/include/asm/esr.h              |  8 ++
 arch/arm64/include/asm/hwcap.h            |  2 +
 arch/arm64/include/asm/kvm_emulate.h      |  7 ++
 arch/arm64/include/uapi/asm/hwcap.h       |  2 +
 arch/arm64/kernel/cpufeature.c            | 51 +++++++++++++
 arch/arm64/kernel/cpuinfo.c               |  2 +
 arch/arm64/kvm/inject_fault.c             | 35 +++++++++
 arch/arm64/kvm/mmu.c                      | 37 +++++++++-
 arch/arm64/tools/cpucaps                  |  2 +
 tools/testing/selftests/arm64/abi/hwcap.c | 90 +++++++++++++++++++++++
 13 files changed, 264 insertions(+), 2 deletions(-)

--
2.24.0

