Linux-kselftest-mirror

linux-kselftest-mirror@lists.linaro.org


by Yang Zhong

We highly appreciate your review. This version mostly addresses the
comments from Sean. Most comments were adopted, except three that are not
yet closed and need more discussion:

  - Move the entire xfd write emulation code to x86.c. Doing so requires
    introducing a new kvm_x86_ops callback to disable the msr write bitmap.
    According to Paolo's earlier comment, he prefers to handle it in vmx.c.

  - Directly check msr_bitmap in update_exception_bitmap() (for trapping
    #NM) and vcpu_enter_guest() (for syncing guest xfd after vm-exit)
    instead of introducing an extra flag in the last patch. However, doing
    so requires another new kvm_x86_ops callback for checking msr_bitmap,
    since vcpu_enter_guest() is x86 common code. Having an extra flag
    sounds simpler here (at least for the initial AMX support). It does
    penalize a nested guest with one xfd sync per exit, but that is no
    worse than a normal guest which initializes xfd but doesn't run AMX
    applications at all. Those could be improved afterwards.

  - Disable the #NM trap for nested guests. This version still chooses to
    always trap #NM (regardless of whether in L1 or L2) as long as xfd
    write interception is disabled. In reality #NM is rare if the nested
    guest doesn't intend to run AMX applications, and always-trap is safer
    than dynamic trap for the basic support in case of any oversight here.

(Jing is temporarily on leave for family reasons; Yang helped work out this
version.)

----
v3->v4:
  - Verify kvm selftest for AMX (Paolo)
  - Move fpstate buffer expansion from kvm_vcpu_after_set_cpuid() to
    kvm_check_cpuid() and improve patch description (Sean)
  - Drop 'preemption' word in #NM interception patch (Sean)
  - Remove 'trap_nm' flag. Replace it by: (Sean)
    * Trapping #NM according to guest_fpu::xfd when write to xfd is
      intercepted.
    * Always trapping #NM when xfd write interception is disabled
  - Use better name for #NM related functions (Sean)
  - Drop '#ifdef CONFIG_X86_64' in __kvm_set_xcr (Sean)
  - Update description for KVM_CAP_XSAVE2 and prevent the guest from using
    the wrong ioctl (Sean)
  - Replace 'xfd_out_of_sync' with a better name (Sean)

v2->v3:
  - Trap #NM until write IA32_XFD with a non-zero value (Thomas)
  - Revise return value in __xstate_request_perm() (Thomas)
  - Revise doc for KVM_GET_SUPPORTED_CPUID (Paolo)
  - Add Thomas's reviewed-by on one patch
  - Reorder disabling read interception of XFD_ERR patch (Paolo)
  - Move disabling r/w interception of XFD from x86.c to vmx.c (Paolo)
  - Provide the API doc together with the new KVM_GET_XSAVE2 ioctl (Paolo)
  - Make KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) return minimum size of struct
    kvm_xsave (4K) (Paolo)
  - Request permission at the start of vm_create_with_vcpus() in selftest
  - Request permission conditionally when XFD is supported (Paolo)

v1->v2:
  - Live migration supported and verified with a selftest
  - Rebase to Thomas's new series for guest fpstate reallocation [1]
  - Expand fpstate at KVM_SET_CPUID2 instead of when emulating XCR0 and
    IA32_XFD (Thomas/Paolo)
  - Accordingly remove all exit-to-userspace stuff
  - Intercept #NM to save guest XFD_ERR and restore host/guest value at
    preemption on/off boundary (Thomas)
  - Accordingly remove all xfd_err logic in preemption callback and
    fpu_swap_kvm_fpstate()
  - Reuse KVM_SET_XSAVE to handle both legacy and expanded buffer (Paolo)
  - Don't return dynamic bits w/o prctl() in KVM_GET_SUPPORTED_CPUID (Paolo)
  - Check guest permissions for dynamic features in CPUID[0xD] instead of
    only for AMX at KVM_SET_CPUID (Paolo)
  - Remove dynamic bit check for 32-bit guest in __kvm_set_xcr() (Paolo)
  - Fix CPUID emulation for 0x1d and 0x1e (Paolo)
  - Move "disable interception" to the end of the series (Paolo)

This series brings AMX (Advanced Matrix eXtensions) virtualization support
to KVM. The preparatory series from Thomas [1] is also included.

A large portion of the changes in this series deals with eXtended Feature
Disable (XFD), which allows resizing of the fpstate buffer to support
dynamically-enabled XSTATE features with a large state component (e.g. 8K
for AMX).

There are a lot of simplifications when comparing v2/v3 to the original
proposal [2] and the first version [3]. Thanks to Thomas and Paolo for many
good suggestions.

The support is based on the following key changes:

  - Guest permissions for dynamically-enabled XSAVE features

    Native tasks have to request permission via prctl() before touching a
    dynamically-resized XSTATE component. Introduce guest permissions for a
    similar purpose. The userspace VMM is expected to request guest
    permission only once, when the first vCPU is created.

    KVM checks guest permission in KVM_SET_CPUID2. Setting XFD in guest
    cpuid w/o proper permissions fails this operation. In the meantime,
    unpermitted features are also excluded in KVM_GET_SUPPORTED_CPUID.

  - Extend fpstate reallocation mechanism to cover guest fpu

    Unlike native tasks, which have reallocation triggered from the #NM
    handler, guest fpstate reallocation is requested by KVM when it
    identifies the intention to use dynamically-enabled XSAVE features
    inside the guest.

    Extend the fpu core to allow KVM to request fpstate buffer expansion
    for a guest fpu container.

  - Trigger fpstate reallocation in KVM

    This could be done either statically (before the guest runs) or
    dynamically (in the emulation path). According to discussion [1] we
    decided to statically enable all xfeatures allowed by guest perm in
    KVM_SET_CPUID2, with the fpstate buffer sized accordingly. This spares
    a lot of code and also avoids imposing an ordered restore sequence
    (XCR0, XFD and XSTATE) on the userspace VMM.

  - RDMSR/WRMSR emulation for IA32_XFD

    Because fpstate expansion is completed in KVM_SET_CPUID2, emulating
    r/w access to IA32_XFD simply involves the xfd field in the guest fpu
    container. On a write, if the guest fpu is currently active, the
    software state (guest_fpstate::xfd and per-cpu xfd cache) is also
    updated.

  - RDMSR/WRMSR emulation for XFD_ERR

    When XFD causes an instruction to generate #NM, XFD_ERR contains
    information about which disabled state components are being accessed.
    It'd be problematic if the XFD_ERR value generated in the guest were
    consumed/clobbered by the host before the guest itself did so.

    Intercept the #NM exception to save the guest XFD_ERR value once
    IA32_XFD is written with a non-zero value for the first time. For a
    given dynamic feature there is at most one interception per guest task.

    RDMSR/WRMSR emulation uses the saved value. The host value (always ZERO
    outside of the host #NM handler) is restored before enabling
    preemption. The saved guest value is restored right before entering the
    guest (with preemption disabled).

  - Get/set dynamic xfeature state for migration

    Introduce a new capability (KVM_CAP_XSAVE2) to deal with a >4KB fpstate
    buffer. Reading this capability returns the size of the current guest
    fpstate (e.g. after expansion). The userspace VMM uses a new ioctl
    (KVM_GET_XSAVE2) to read guest fpstate from the kernel and reuses the
    existing ioctl (KVM_SET_XSAVE) to update guest fpstate in the kernel.
    KVM_SET_XSAVE is extended to do a properly sized memdup_user() based on
    the guest fpstate.

  - Expose related cpuid bits to guest

    The last step is to allow exposing XFD, AMX_TILE, AMX_INT8 and AMX_BF16
    in guest cpuid. Adding those bits into kvm_cpu_caps finally activates
    all the preceding logic in this series.

  - Optimization: disable interception for IA32_XFD

    IA32_XFD can be frequently updated by the guest, as it is part of the
    task state and is swapped on context switch when prev and next have
    different XFD settings. Always intercepting WRMSR can easily cause
    non-negligible overhead.

    Disable r/w emulation for IA32_XFD after intercepting the first
    WRMSR(IA32_XFD) with a non-zero value. However, MSR passthrough implies
    that the software state (guest_fpstate::xfd and per-cpu xfd cache)
    might be out of sync with the MSR. This means KVM needs to re-sync them
    at VM-exit before preemption is enabled.

Thanks to Jun Nakajima and Kevin Tian for the design suggestions while this
version was being worked on internally.

[1] https://lore.kernel.org/all/20211214022825.563892248@linutronix.de/
[2] https://www.spinics.net/lists/kvm/msg259015.html
[3] https://lore.kernel.org/lkml/20211208000359.2853257-1-yang.zhong@intel.com/

Thanks,
Yang

---

Guang Zeng (1):
  kvm: x86: Add support for getting/setting expanded xstate buffer

Jing Liu (11):
  kvm: x86: Fix xstate_required_size() to follow XSTATE alignment rule
  kvm: x86: Exclude unpermitted xfeatures at KVM_GET_SUPPORTED_CPUID
  x86/fpu: Make XFD initialization in __fpstate_reset() a function argument
  kvm: x86: Check and enable permitted dynamic xfeatures at KVM_SET_CPUID2
  kvm: x86: Add emulation for IA32_XFD
  x86/fpu: Prepare xfd_err in struct fpu_guest
  kvm: x86: Intercept #NM for saving IA32_XFD_ERR
  kvm: x86: Emulate IA32_XFD_ERR for guest
  kvm: x86: Disable RDMSR interception of IA32_XFD_ERR
  kvm: x86: Add XCR0 support for Intel AMX
  kvm: x86: Add CPUID support for Intel AMX

Kevin Tian (3):
  x86/fpu: Provide fpu_update_guest_perm_features() for guest
  x86/fpu: Provide fpu_update_guest_xfd() for IA32_XFD emulation
  kvm: x86: Disable interception for IA32_XFD on demand

Thomas Gleixner (5):
  x86/fpu: Extend fpu_xstate_prctl() with guest permissions
  x86/fpu: Prepare guest FPU for dynamically enabled FPU features
  x86/fpu: Add guest support to xfd_enable_feature()
  x86/fpu: Add uabi_size to guest_fpu
  x86/fpu: Provide fpu_sync_guest_vmexit_xfd_state()

Wei Wang (1):
  kvm: selftests: Add support for KVM_CAP_XSAVE2

 Documentation/virt/kvm/api.rst                |  46 +++++-
 arch/x86/include/asm/cpufeatures.h            |   2 +
 arch/x86/include/asm/fpu/api.h                |  11 ++
 arch/x86/include/asm/fpu/types.h              |  32 ++++
 arch/x86/include/asm/kvm_host.h               |   1 +
 arch/x86/include/uapi/asm/kvm.h               |  16 +-
 arch/x86/include/uapi/asm/prctl.h             |  26 ++--
 arch/x86/kernel/fpu/core.c                    | 104 ++++++++++++-
 arch/x86/kernel/fpu/xstate.c                  | 147 +++++++++++-------
 arch/x86/kernel/fpu/xstate.h                  |  15 +-
 arch/x86/kernel/process.c                     |   2 +
 arch/x86/kvm/cpuid.c                          |  99 +++++++++---
 arch/x86/kvm/vmx/vmcs.h                       |   5 +
 arch/x86/kvm/vmx/vmx.c                        |  45 +++++-
 arch/x86/kvm/vmx/vmx.h                        |   2 +-
 arch/x86/kvm/x86.c                            | 105 ++++++++++++-
 include/uapi/linux/kvm.h                      |   4 +
 tools/arch/x86/include/uapi/asm/kvm.h         |  16 +-
 tools/include/uapi/linux/kvm.h                |   3 +
 .../testing/selftests/kvm/include/kvm_util.h  |   2 +
 .../selftests/kvm/include/x86_64/processor.h  |  10 ++
 tools/testing/selftests/kvm/lib/kvm_util.c    |  32 ++++
 .../selftests/kvm/lib/x86_64/processor.c      |  67 +++++++-
 .../testing/selftests/kvm/x86_64/evmcs_test.c |   2 +-
 tools/testing/selftests/kvm/x86_64/smm_test.c |   2 +-
 .../testing/selftests/kvm/x86_64/state_test.c |   2 +-
 .../kvm/x86_64/vmx_preemption_timer_test.c    |   2 +-
 27 files changed, 691 insertions(+), 109 deletions(-)
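
The two rules that replace the 'trap_nm' flag in the v3->v4 changelog can be read as a small decision function. What follows is only a minimal sketch in plain C, not code from the series: `struct vcpu_model` and `should_trap_nm()` are hypothetical names invented for illustration, and "according to guest_fpu::xfd" is interpreted here, as an assumption, as "trap while the software xfd value is non-zero".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the per-vCPU state the changelog refers to. */
struct vcpu_model {
	bool xfd_write_intercepted; /* WRMSR(IA32_XFD) still traps to KVM */
	uint64_t guest_xfd;         /* software copy, cf. guest_fpu::xfd */
};

/*
 * Rule 1: while writes to xfd are intercepted, trap #NM according to the
 *         software xfd value (assumption: trap only if it is non-zero).
 * Rule 2: once xfd write interception is disabled, always trap #NM,
 *         because the software copy may be stale.
 */
static bool should_trap_nm(const struct vcpu_model *v)
{
	assert(v != NULL);

	if (v->xfd_write_intercepted)
		return v->guest_xfd != 0;
	return true;
}
```

Under this reading, a guest that never writes a non-zero value to IA32_XFD never takes the #NM intercept, while a guest with passthrough enabled always does, which matches the "always-trap is safer" trade-off described in the open items.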

4 years

[PATCH v2 00/13] KVM: selftests: Add tests for SEV and SEV-ES guests

by Michael Roth

These patches are also available at:

  https://github.com/mdroth/linux/commits/sev-selftests-v2

They are based on top of the recent RFC:

  "KVM: selftests: Add support for test-selectable ucall implementations"
  https://lore.kernel.org/all/20211210164620.11636-1-michael.roth@amd.com/T/
  https://github.com/mdroth/linux/commits/sev-selftests-ucall-rfc1

which provides a new ucall implementation that this series relies on.
Those patches were in turn based on kvm/next as of 2021-12-10.

== OVERVIEW ==

This series introduces a set of memory encryption-related parameters/hooks
in the core kselftest library, then uses the hooks to implement a small
library for creating/managing SEV, SEV-ES, and (eventually) SEV-SNP guests.
This library is then used to implement a basic boot/memory test that's run
for variants of SEV/SEV-ES guests.

  - Patches 1-8 implement SEV boot tests and should run against existing
    kernels
  - Patch 9 is a KVM change that's required to allow SEV-ES/SEV-SNP guests
    to boot with an externally generated page table, and is a host kernel
    prerequisite for the remaining patches in the series.
  - Patches 10-13 extend the boot tests to cover SEV-ES

Any review/comments are greatly appreciated!

v2:
  - rebased on ucall_ops patchset (which is based on kvm/next 2021-12-10)
  - remove SEV-SNP support for now
  - provide encryption bitmap as const* to original rather than as a copy
    (Mingwei, Paolo)
  - drop SEV-specific synchronization helpers in favor of ucall_ops_halt
    (Paolo)
  - don't pass around addresses with c-bit included, add them as-needed via
    addr_gpa2raw() (e.g. when adding PTEs, or initializing initial
    cr3/vm->pgd) (Paolo)
  - rename lib/sev.c functions for better consistency (Krish)
  - move more test setup code out of main test function and into
    setup_test_common() (Krish)
  - suppress compiler warnings due to -Waddress-of-packed-member like
    kernel does
  - don't require SNP support in minimum firmware version detection (Marc)
  - allow SEV device path to be configured via make SEV_PATH= (Marc)

----------------------------------------------------------------
Michael Roth (13):
  KVM: selftests: move vm_phy_pages_alloc() earlier in file
  KVM: selftests: sparsebit: add const where appropriate
  KVM: selftests: add hooks for managing encrypted guest memory
  KVM: selftests: handle encryption bits in page tables
  KVM: selftests: add support for encrypted vm_vaddr_* allocations
  KVM: selftests: ensure ucall_shared_alloc() allocates shared memory
  KVM: selftests: add library for creating/interacting with SEV guests
  KVM: selftests: add SEV boot tests
  KVM: SVM: include CR3 in initial VMSA state for SEV-ES guests
  KVM: selftests: account for error code in #VC exception frame
  KVM: selftests: add support for creating SEV-ES guests
  KVM: selftests: add library for handling SEV-ES-related exits
  KVM: selftests: add SEV-ES boot tests

 arch/x86/include/asm/kvm-x86-ops.h                 |   1 +
 arch/x86/include/asm/kvm_host.h                    |   1 +
 arch/x86/kvm/svm/svm.c                             |  19 ++
 arch/x86/kvm/vmx/vmx.c                             |   6 +
 arch/x86/kvm/x86.c                                 |   1 +
 tools/testing/selftests/kvm/.gitignore             |   1 +
 tools/testing/selftests/kvm/Makefile               |  10 +-
 .../testing/selftests/kvm/include/kvm_util_base.h  |  10 +
 tools/testing/selftests/kvm/include/sparsebit.h    |  36 +--
 tools/testing/selftests/kvm/include/x86_64/sev.h   |  44 +++
 .../selftests/kvm/include/x86_64/sev_exitlib.h     |  14 +
 tools/testing/selftests/kvm/include/x86_64/svm.h   |  35 +++
 .../selftests/kvm/include/x86_64/svm_util.h        |   1 +
 tools/testing/selftests/kvm/lib/kvm_util.c         | 270 ++++++++++++------
 .../testing/selftests/kvm/lib/kvm_util_internal.h  |  10 +
 tools/testing/selftests/kvm/lib/sparsebit.c        |  48 ++--
 tools/testing/selftests/kvm/lib/ucall_common.c     |   4 +-
 tools/testing/selftests/kvm/lib/x86_64/handlers.S  |   4 +-
 tools/testing/selftests/kvm/lib/x86_64/processor.c |  16 +-
 tools/testing/selftests/kvm/lib/x86_64/sev.c       | 252 ++++++++++++++++
 .../testing/selftests/kvm/lib/x86_64/sev_exitlib.c | 249 ++++++++++++++++
 .../selftests/kvm/x86_64/sev_all_boot_test.c       | 316 +++++++++++++++++++++
 22 files changed, 1215 insertions(+), 133 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/include/x86_64/sev.h
 create mode 100644 tools/testing/selftests/kvm/include/x86_64/sev_exitlib.h
 create mode 100644 tools/testing/selftests/kvm/lib/x86_64/sev.c
 create mode 100644 tools/testing/selftests/kvm/lib/x86_64/sev_exitlib.c
 create mode 100644 tools/testing/selftests/kvm/x86_64/sev_all_boot_test.c
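
The v2 item about keeping the c-bit out of addresses and adding it only where a raw address is needed (PTEs, initial cr3) can be illustrated with a pair of tiny helpers. This is a sketch under stated assumptions, not the series' addr_gpa2raw(): the names `gpa_to_raw()` and `raw_to_gpa()` and their signatures are hypothetical, and the position of the encryption bit is passed in as a parameter because this sketch does not model how it is discovered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: produce the "raw" value to store in a page-table
 * entry or cr3. Callers keep plain guest-physical addresses everywhere
 * else and only add the encryption bit at this last step.
 */
static uint64_t gpa_to_raw(uint64_t gpa, unsigned int c_bit_pos, bool encrypted)
{
	assert(c_bit_pos < 64);
	return encrypted ? (gpa | (UINT64_C(1) << c_bit_pos)) : gpa;
}

/* Inverse direction: strip the encryption bit to recover the plain GPA. */
static uint64_t raw_to_gpa(uint64_t raw, unsigned int c_bit_pos)
{
	assert(c_bit_pos < 64);
	return raw & ~(UINT64_C(1) << c_bit_pos);
}
```

Keeping the conversion in one place means the rest of the library can compare and do arithmetic on addresses without first masking off a platform-specific bit.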

4 years

[PATCH v3 00/22] AMX Support in KVM

by Jing Liu

We highly appreciate your review. We will continue working on the remaining
selftest and send it out later.

TODO:
  - kvm selftest for AMX is still in progress;

----
v2->v3:
  - Trap #NM until write IA32_XFD with a non-zero value (Thomas)
  - Revise return value in __xstate_request_perm() (Thomas)
  - Revise doc for KVM_GET_SUPPORTED_CPUID (Paolo)
  - Add Thomas's reviewed-by on one patch
  - Reorder disabling read interception of XFD_ERR patch (Paolo)
  - Move disabling r/w interception of XFD from x86.c to vmx.c (Paolo)
  - Provide the API doc together with the new KVM_GET_XSAVE2 ioctl (Paolo)
  - Make KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) return minimum size of struct
    kvm_xsave (4K) (Paolo)
  - Request permission at the start of vm_create_with_vcpus() in selftest
  - Request permission conditionally when XFD is supported (Paolo)

v1->v2:
  - Live migration supported and verified with a selftest
  - Rebase to Thomas's new series for guest fpstate reallocation [1]
  - Expand fpstate at KVM_SET_CPUID2 instead of when emulating XCR0 and
    IA32_XFD (Thomas/Paolo)
  - Accordingly remove all exit-to-userspace stuff
  - Intercept #NM to save guest XFD_ERR and restore host/guest value at
    preemption on/off boundary (Thomas)
  - Accordingly remove all xfd_err logic in preemption callback and
    fpu_swap_kvm_fpstate()
  - Reuse KVM_SET_XSAVE to handle both legacy and expanded buffer (Paolo)
  - Don't return dynamic bits w/o prctl() in KVM_GET_SUPPORTED_CPUID (Paolo)
  - Check guest permissions for dynamic features in CPUID[0xD] instead of
    only for AMX at KVM_SET_CPUID (Paolo)
  - Remove dynamic bit check for 32-bit guest in __kvm_set_xcr() (Paolo)
  - Fix CPUID emulation for 0x1d and 0x1e (Paolo)
  - Move "disable interception" to the end of the series (Paolo)

This series brings AMX (Advanced Matrix eXtensions) virtualization support
to KVM. The preparatory series from Thomas [1] is also included.

A large portion of the changes in this series deals with eXtended Feature
Disable (XFD), which allows resizing of the fpstate buffer to support
dynamically-enabled XSTATE features with a large state component (e.g. 8K
for AMX).

There are a lot of simplifications when comparing v2/v3 to the original
proposal [2] and the first version [3]. Thanks to Thomas and Paolo for many
good suggestions.

The support is based on the following key changes:

  - Guest permissions for dynamically-enabled XSAVE features

    Native tasks have to request permission via prctl() before touching a
    dynamically-resized XSTATE component. Introduce guest permissions for a
    similar purpose. The userspace VMM is expected to request guest
    permission only once, when the first vCPU is created.

    KVM checks guest permission in KVM_SET_CPUID2. Setting XFD in guest
    cpuid w/o proper permissions fails this operation. In the meantime,
    unpermitted features are also excluded in KVM_GET_SUPPORTED_CPUID.

  - Extend fpstate reallocation mechanism to cover guest fpu

    Unlike native tasks, which have reallocation triggered from the #NM
    handler, guest fpstate reallocation is requested by KVM when it
    identifies the intention to use dynamically-enabled XSAVE features
    inside the guest.

    Extend the fpu core to allow KVM to request fpstate buffer expansion
    for a guest fpu container.

  - Trigger fpstate reallocation in KVM

    This could be done either statically (before the guest runs) or
    dynamically (in the emulation path). According to discussion [1] we
    decided to statically enable all xfeatures allowed by guest perm in
    KVM_SET_CPUID2, with the fpstate buffer sized accordingly. This spares
    a lot of code and also avoids imposing an ordered restore sequence
    (XCR0, XFD and XSTATE) on the userspace VMM.

  - RDMSR/WRMSR emulation for IA32_XFD

    Because fpstate expansion is completed in KVM_SET_CPUID2, emulating
    r/w access to IA32_XFD simply involves the xfd field in the guest fpu
    container. On a write, if the guest fpu is currently active, the
    software state (guest_fpstate::xfd and per-cpu xfd cache) is also
    updated.

  - RDMSR/WRMSR emulation for XFD_ERR

    When XFD causes an instruction to generate #NM, XFD_ERR contains
    information about which disabled state components are being accessed.
    It'd be problematic if the XFD_ERR value generated in the guest were
    consumed/clobbered by the host before the guest itself did so.

    Intercept the #NM exception to save the guest XFD_ERR value once
    IA32_XFD is written with a non-zero value for the first time. For a
    given dynamic feature there is at most one interception per guest task.

    RDMSR/WRMSR emulation uses the saved value. The host value (always ZERO
    outside of the host #NM handler) is restored before enabling
    preemption. The saved guest value is restored right before entering the
    guest (with preemption disabled).

  - Get/set dynamic xfeature state for migration

    Introduce a new capability (KVM_CAP_XSAVE2) to deal with a >4KB fpstate
    buffer. Reading this capability returns the size of the current guest
    fpstate (e.g. after expansion). The userspace VMM uses a new ioctl
    (KVM_GET_XSAVE2) to read guest fpstate from the kernel and reuses the
    existing ioctl (KVM_SET_XSAVE) to update guest fpstate in the kernel.
    KVM_SET_XSAVE is extended to do a properly sized memdup_user() based on
    the guest fpstate.

  - Expose related cpuid bits to guest

    The last step is to allow exposing XFD, AMX_TILE, AMX_INT8 and AMX_BF16
    in guest cpuid. Adding those bits into kvm_cpu_caps finally activates
    all the preceding logic in this series.

  - Optimization: disable interception for IA32_XFD

    IA32_XFD can be frequently updated by the guest, as it is part of the
    task state and is swapped on context switch when prev and next have
    different XFD settings. Always intercepting WRMSR can easily cause
    non-negligible overhead.

    Disable r/w emulation for IA32_XFD after intercepting the first
    WRMSR(IA32_XFD) with a non-zero value. However, MSR passthrough implies
    that the software state (guest_fpstate::xfd and per-cpu xfd cache)
    might be out of sync with the MSR. This means KVM needs to re-sync them
    at VM-exit before preemption is enabled.

To verify the AMX virtualization overhead on non-AMX usages, we ran the
Phoronix kernel build test in the guest w/ and w/o AMX in cpuid. The result
shows no observable difference between the two configurations.

Thanks to Jun Nakajima and Kevin Tian for the design suggestions while this
version was being worked on internally.

[1] https://lore.kernel.org/all/20211214022825.563892248@linutronix.de/
[2] https://www.spinics.net/lists/kvm/msg259015.html
[3] https://lore.kernel.org/lkml/20211208000359.2853257-1-yang.zhong@intel.com/

Thanks,
Jing

---

Guang Zeng (1):
  kvm: x86: Get/set expanded xstate buffer

Jing Liu (13):
  kvm: x86: Fix xstate_required_size() to follow XSTATE alignment rule
  kvm: x86: Exclude unpermitted xfeatures at KVM_GET_SUPPORTED_CPUID
  kvm: x86: Check permitted dynamic xfeatures at KVM_SET_CPUID2
  x86/fpu: Make XFD initialization in __fpstate_reset() a function argument
  kvm: x86: Enable dynamic XSAVE features at KVM_SET_CPUID2
  kvm: x86: Add emulation for IA32_XFD
  x86/fpu: Prepare xfd_err in struct fpu_guest
  kvm: x86: Intercept #NM for saving IA32_XFD_ERR
  kvm: x86: Emulate IA32_XFD_ERR for guest
  kvm: x86: Disable RDMSR interception of IA32_XFD_ERR
  kvm: x86: Add XCR0 support for Intel AMX
  kvm: x86: Add CPUID support for Intel AMX
  kvm: x86: Disable interception for IA32_XFD on demand

Kevin Tian (2):
  x86/fpu: Provide fpu_update_guest_perm_features() for guest
  x86/fpu: Provide fpu_update_guest_xfd() for IA32_XFD emulation

Thomas Gleixner (5):
  x86/fpu: Extend fpu_xstate_prctl() with guest permissions
  x86/fpu: Prepare guest FPU for dynamically enabled FPU features
  x86/fpu: Add guest support to xfd_enable_feature()
  x86/fpu: Add uabi_size to guest_fpu
  x86/fpu: Provide fpu_sync_guest_vmexit_xfd_state()

Wei Wang (1):
  kvm: selftests: Add support for KVM_CAP_XSAVE2

 Documentation/virt/kvm/api.rst                |  46 +++++-
 arch/x86/include/asm/cpufeatures.h            |   2 +
 arch/x86/include/asm/fpu/api.h                |  11 ++
 arch/x86/include/asm/fpu/types.h              |  32 ++++
 arch/x86/include/asm/kvm_host.h               |   2 +
 arch/x86/include/uapi/asm/kvm.h               |  16 +-
 arch/x86/include/uapi/asm/prctl.h             |  26 ++--
 arch/x86/kernel/fpu/core.c                    | 104 ++++++++++++-
 arch/x86/kernel/fpu/xstate.c                  | 147 +++++++++++-------
 arch/x86/kernel/fpu/xstate.h                  |  15 +-
 arch/x86/kernel/process.c                     |   2 +
 arch/x86/kvm/cpuid.c                          |  93 ++++++++---
 arch/x86/kvm/vmx/vmcs.h                       |   5 +
 arch/x86/kvm/vmx/vmx.c                        |  32 +++-
 arch/x86/kvm/vmx/vmx.h                        |   2 +-
 arch/x86/kvm/x86.c                            | 102 +++++++++++-
 include/uapi/linux/kvm.h                      |   4 +
 tools/arch/x86/include/uapi/asm/kvm.h         |  16 +-
 tools/include/uapi/linux/kvm.h                |   3 +
 .../testing/selftests/kvm/include/kvm_util.h  |   2 +
 .../selftests/kvm/include/x86_64/processor.h  |  10 ++
 tools/testing/selftests/kvm/lib/kvm_util.c    |  32 ++++
 .../selftests/kvm/lib/x86_64/processor.c      |  67 +++++++-
 .../testing/selftests/kvm/x86_64/evmcs_test.c |   2 +-
 tools/testing/selftests/kvm/x86_64/smm_test.c |   2 +-
 .../testing/selftests/kvm/x86_64/state_test.c |   2 +-
 .../kvm/x86_64/vmx_preemption_timer_test.c    |   2 +-
 27 files changed, 668 insertions(+), 111 deletions(-)

-- 
2.27.0
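
The guest-permission rules in the first key change (fail KVM_SET_CPUID2 when guest cpuid asks for an unpermitted dynamic feature, and hide unpermitted features from KVM_GET_SUPPORTED_CPUID) come down to a few bitmask operations. The sketch below is illustrative only and is not the kernel implementation: the function names, the `-1` error value and the `SKETCH_XFEATURE_XTILEDATA` macro are hypothetical; the only hardware fact relied on is that AMX tile data is XSAVE state component 18.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* AMX tile data is XSAVE state component 18; the macro name is made up. */
#define SKETCH_XFEATURE_XTILEDATA (UINT64_C(1) << 18)

/*
 * Features to report as supported: everything the host offers, minus
 * dynamic features for which no guest permission has been requested.
 */
static uint64_t supported_xfeatures(uint64_t host, uint64_t dynamic,
				    uint64_t guest_perm)
{
	return host & ~(dynamic & ~guest_perm);
}

/*
 * Check the features requested in guest cpuid. Returns 0 if all requested
 * dynamic features are permitted, -1 otherwise; *denied receives the
 * offending bits (0 on success).
 */
static int check_guest_xfeatures(uint64_t requested, uint64_t dynamic,
				 uint64_t guest_perm, uint64_t *denied)
{
	assert(denied != NULL);

	*denied = requested & dynamic & ~guest_perm;
	return *denied ? -1 : 0;
}
```

Because both helpers use the same "dynamic and not permitted" mask, a VMM that only sets bits taken from the supported mask cannot trip the check.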

4 years

[PATCH v1 00/11] mm: COW fixes part 1: fix the COW security issue for THP and hugetlb

by David Hildenbrand

Hi everybody,

as discussed in the linux-mm alignment session on Wednesday, this is part 1
of the COW fixes: fix the COW security issue using GUP-triggered unsharing
of shared anonymous pages (ordinary, THP, hugetlb). In the meeting slides,
this approach was referred to as "Copy On Read". If anybody wants to have
access to the slides, please feel free to reach out.

The patches are based on v5.16-rc5 and available at:
  https://github.com/davidhildenbrand/linux/pull/new/unshare_v1

It is currently again possible for a child process to observe modifications
of anonymous pages performed by the parent process after fork() in some
cases, which is not only a violation of the POSIX semantics of MAP_PRIVATE,
but more importantly a real security issue.

This issue, including other related COW issues, has been summarized at [1]:

"
  1. Observing Memory Modifications of Private Pages From A Child Process

  Long story short: process-private memory might not be as private as you
  think once you fork(): successive modifications of private memory
  regions in the parent process can still be observed by the child
  process, for example, by smart use of vmsplice()+munmap().

  The core problem is that pinning pages readable in a child process, such
  as done via the vmsplice system call, can result in a child process
  observing memory modifications done in the parent process the child is
  not supposed to observe. [1] contains an excellent summary and [2]
  contains further details. This issue was assigned CVE-2020-29374 [9].

  For this to trigger, it's required to use a fork() without subsequent
  exec(), for example, as used under Android zygote. Without further
  details about an application that forks less-privileged child processes,
  one cannot really say what's actually affected and what's not -- see the
  details section the end of this mail for a short sshd/openssh analysis.

  While commit 17839856fd58 ("gup: document and work around "COW can break
  either way" issue") fixed this issue and resulted in other problems
  (e.g., ptrace on pmem), commit 09854ba94c6a ("mm: do_wp_page()
  simplification") re-introduced part of the problem unfortunately.

  The original reproducer can be modified quite easily to use THP [3] and
  make the issue appear again on upstream kernels. I modified it to use
  hugetlb [4] and it triggers as well. The problem is certainly less
  severe with hugetlb than with THP; it merely highlights that we still
  have plenty of open holes we should be closing/fixing.

  Regarding vmsplice(), the only known workaround is to disallow the
  vmsplice() system call ... or disable THP and hugetlb. But who knows
  what else is affected (RDMA? O_DIRECT?) to achieve the same goal -- in
  the end, it's a more generic issue.
"

This security issue was first reported by Jann Horn on 27 May 2020 and it
currently affects anonymous THP and hugetlb again. The "security issue"
part for hugetlb might be less important than for THP. However, with this
approach it's easy to get the MAP_PRIVATE semantics of any anonymous pages
in that regard and avoid any such information leaks without much added
complexity.

Ordinary anonymous pages are currently not affected, because the COW logic
was changed in commit 09854ba94c6a ("mm: do_wp_page() simplification") for
them to COW on "page_count() != 1" instead of "mapcount > 1", which
unfortunately results in other COW issues, some of them documented in [1]
as well.

To fix this COW issue once and for all, introduce GUP-triggered unsharing
that can be conditionally triggered via FAULT_FLAG_UNSHARE. In contrast to
traditional COW, unsharing will leave the copied page mapped
write-protected in the page table, not having the semantics of a write
fault.

Logically, unsharing is triggered "early", as soon as GUP performs the
action that could result in a COW getting missed later and the security
issue triggering: however, unsharing is not triggered as before via a
write fault with undesired side effects.

Long story short, GUP triggers unsharing if all of the following conditions
are met:
* The page is mapped R/O
* We have an anonymous page, excluding KSM
* We want to read (!FOLL_WRITE)
* Unsharing is not disabled (!FOLL_NOUNSHARE)
* We want to take a reference (FOLL_GET or FOLL_PIN)
* The page is a shared anonymous page: mapcount > 1

To reliably detect shared anonymous THP without heavy locking, introduce a
mapcount_seqcount seqlock that protects the mapcount of a THP and can be
used to read an atomic mapcount value. The mapcount_seqlock is stored
inside the memmap of the compound page -- to keep it simple, factor out a
raw_seqlock_t from the seqlock_t.

As this patch series introduces the same unsharing logic for any anonymous
pages, it also paves the way to fix other COW issues, e.g., documented in
[1], without reintroducing the security issue or reintroducing other
issues we observed in the past (e.g., broken ptrace on pmem).

All reproducers for this COW issue have been consolidated in the selftest
included in this series. Hopefully we'll get this fixed for good.

Future work:

* get_user_pages_fast_only() can currently spin on the mapcount_seqcount
  when reading the mapcount, which might be a rare event. While this is
  fine even when done from get_user_pages_fast_only() in IRQ context, we
  might want to just fail fast in get_user_pages_fast_only(). We already
  have patches prepared that add page_anon_maybe_shared() and
  page_trans_huge_anon_maybe_shared() that will return "true" in case
  spinning would be required and make get_user_pages_fast_only() fail
  fast. I'm excluding them for simplicity.

  ... even better would be finding a way to just not need the
  mapcount_seqcount, but THP splitting and PageDoubleMap() give us a hard
  time -- but maybe we'll eventually find a way someday :)

* Part 2 will tackle the other user-space visible breakages / COW issues
  raised in [1]. This series is the basis for adjusting the COW logic once
  again without re-introducing the COW issue fixed in this series and
  without reintroducing the issues we saw with the original CVE fix (e.g.,
  breaking ptrace on pmem). There might be further parts to improve the
  GUP long-term <-> MM synchronicity and to optimize some things around
  that.

The idea is by Andrea and some patches are rewritten versions of prototype
patches by Andrea. I cross-compiled and tested as well as possible.

I'll CC locking+selftest folks only on the relevant patch and the cover
letter to minimize the noise. I'll put everyone on CC who was either
involved with the COW issues in the past or attended the linux-mm alignment
session on Wednesday. Apologies if I forget anyone :)

[1] https://lore.kernel.org/r/3ae33b08-d9ef-f846-56fb-645e3b9b4c66@redhat.com

David Hildenbrand (11):
  seqlock: provide lockdep-free raw_seqcount_t variant
  mm: thp: consolidate mapcount logic on THP split
  mm: simplify hugetlb and file-THP handling in __page_mapcount()
  mm: thp: simlify total_mapcount()
  mm: thp: allow for reading the THP mapcount atomically via a raw_seqlock_t
  mm: support GUP-triggered unsharing via FAULT_FLAG_UNSHARE (!hugetlb)
  mm: gup: trigger unsharing via FAULT_FLAG_UNSHARE when required (!hugetlb)
  mm: hugetlb: support GUP-triggered unsharing via FAULT_FLAG_UNSHARE
  mm: gup: trigger unsharing via FAULT_FLAG_UNSHARE when required (hugetlb)
  mm: thp: introduce and use page_trans_huge_anon_shared()
  selftests/vm: add tests for the known COW security issues

 Documentation/locking/seqlock.rst         |  50 ++++
 include/linux/huge_mm.h                   |  72 +++++
 include/linux/mm.h                        |  14 +
 include/linux/mm_types.h                  |   9 +
 include/linux/seqlock.h                   | 145 +++++++---
 mm/gup.c                                  |  89 +++++-
 mm/huge_memory.c                          | 120 +++++++--
 mm/hugetlb.c                              | 129 +++++++--
 mm/memory.c                               | 136 ++++++++--
 mm/rmap.c                                 |  40 +--
 mm/swapfile.c                             |  35 ++-
 mm/util.c                                 |  24 +-
 tools/testing/selftests/vm/Makefile       |   1 +
 tools/testing/selftests/vm/gup_cow.c      | 312 ++++++++++++++++++++++
 tools/testing/selftests/vm/run_vmtests.sh |  16 ++
 15 files changed, 1044 insertions(+), 148 deletions(-)
 create mode 100644 tools/testing/selftests/vm/gup_cow.c

-- 
2.31.1
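
The six conditions under which GUP triggers unsharing map directly onto a boolean predicate. The following is a userspace sketch of that checklist only, not code from the series: `struct sketch_page`, the `SK_FOLL_*` flag values and `gup_must_unshare()` are hypothetical stand-ins that mirror the names used in the cover letter, and the mapcount is modelled as a plain integer rather than being read under the mapcount_seqcount.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values; only the names echo the cover letter. */
enum sketch_foll {
	SK_FOLL_WRITE     = 1u << 0,
	SK_FOLL_GET       = 1u << 1,
	SK_FOLL_PIN       = 1u << 2,
	SK_FOLL_NOUNSHARE = 1u << 3,
};

/* Hypothetical, simplified view of the state GUP looks at. */
struct sketch_page {
	bool mapped_readonly; /* mapped R/O in the page table */
	bool anon;            /* anonymous page */
	bool ksm;             /* KSM pages are excluded */
	int mapcount;         /* > 1 means shared */
};

/* True only if all six conditions from the list are met. */
static bool gup_must_unshare(const struct sketch_page *page, unsigned int flags)
{
	assert(page != NULL);

	if (!page->mapped_readonly)
		return false;
	if (!page->anon || page->ksm)
		return false;
	if (flags & SK_FOLL_WRITE)
		return false; /* write access goes through ordinary COW */
	if (flags & SK_FOLL_NOUNSHARE)
		return false;
	if (!(flags & (SK_FOLL_GET | SK_FOLL_PIN)))
		return false; /* no reference is taken */
	return page->mapcount > 1;
}
```

For example, a read-only pin (`SK_FOLL_PIN`) of a R/O-mapped anonymous page with a mapcount of 2 yields true, while the same request on a page with a mapcount of 1 yields false.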

4 years

[PATCH AUTOSEL 5.15 16/16] userfaultfd/selftests: fix hugetlb area allocations

by Sasha Levin

From: Mike Kravetz <mike.kravetz(a)oracle.com>

[ Upstream commit f5c73297181c6b3ad76537bad98eaad6d29b9333 ]

Currently, userfaultfd selftest for hugetlb as run from run_vmtests.sh or any environment where there are 'just enough' hugetlb pages will always fail with:

  testing events (fork, remap, remove):
		ERROR: UFFDIO_COPY error: -12 (errno=12, line=616)

The ENOMEM error code implies there are not enough hugetlb pages. However, there are free hugetlb pages but they are all reserved. There is a basic problem with the way the test allocates hugetlb pages which has existed since the test was originally written.

Due to the way 'cleanup' was done between different phases of the test, this issue was masked until recently. The issue was uncovered by commit 8ba6e8640844 ("userfaultfd/selftests: reinitialize test context in each test").

For the hugetlb test, src and dst areas are allocated as PRIVATE mappings of a hugetlb file. This means that at mmap time, pages are reserved for the src and dst areas. At the start of event testing (and other tests) the src area is populated which results in allocation of huge pages to fill the area and consumption of reserves associated with the area. Then, a child is forked to fault in the dst area. Note that the dst area was allocated in the parent and hence the parent owns the reserves associated with the mapping. The child has normal access to the dst area, but can not use the reserves created/owned by the parent. Thus, if there are no other huge pages available allocation of a page for the dst by the child will fail.

Fix by not creating reserves for the dst area. In this way the child can use free (non-reserved) pages.

Also, MAP_PRIVATE of a file only makes sense if you are interested in the contents of the file before making a COW copy. The test does not do this. So, just use MAP_ANONYMOUS | MAP_HUGETLB to create an anonymous hugetlb mapping. There is no need to create a hugetlb file in the non-shared case.

Link: https://lkml.kernel.org/r/20211217172919.7861-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mina Almasry <almasrymina(a)google.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
 tools/testing/selftests/vm/userfaultfd.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 60aa1a4fc69b6..81690f1737c80 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -86,7 +86,7 @@ static bool test_uffdio_minor = false;
 static bool map_shared;
 static int shm_fd;
-static int huge_fd;
+static int huge_fd = -1;	/* only used for hugetlb_shared test */
 static char *huge_fd_off0;
 static unsigned long long *count_verify;
 static int uffd = -1;
@@ -222,6 +222,9 @@ static void noop_alias_mapping(__u64 *start, size_t len, unsigned long offset)
 
 static void hugetlb_release_pages(char *rel_area)
 {
+	if (huge_fd == -1)
+		return;
+
 	if (fallocate(huge_fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
 		      rel_area == huge_fd_off0 ? 0 : nr_pages * page_size,
 		      nr_pages * page_size))
@@ -234,16 +237,17 @@ static void hugetlb_allocate_area(void **alloc_area)
 	char **alloc_area_alias;
 
 	*alloc_area = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
-			   (map_shared ? MAP_SHARED : MAP_PRIVATE) |
-			   MAP_HUGETLB,
-			   huge_fd, *alloc_area == area_src ? 0 :
-			   nr_pages * page_size);
+			   map_shared ? MAP_SHARED :
+			   MAP_PRIVATE | MAP_HUGETLB |
+			   (*alloc_area == area_src ? 0 : MAP_NORESERVE),
+			   huge_fd,
+			   *alloc_area == area_src ? 0 : nr_pages * page_size);
 	if (*alloc_area == MAP_FAILED)
 		err("mmap of hugetlbfs file failed");
 
 	if (map_shared) {
 		area_alias = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
-				  MAP_SHARED | MAP_HUGETLB,
+				  MAP_SHARED,
 				  huge_fd, *alloc_area == area_src ? 0 :
 				  nr_pages * page_size);
 		if (area_alias == MAP_FAILED)
--
2.34.1
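The reservation behaviour described in the commit message can be sketched outside the selftest as follows. This is a standalone illustration, not the code from the patch: private_hugetlb_flags() and map_private_hugetlb() are hypothetical helpers, and the sketch assumes a private, anonymous hugetlb mapping in which only the dst area (the one a forked child faults in) skips reservations via MAP_NORESERVE.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: mmap flags for a private, anonymous hugetlb area.
 * The src area keeps its reservations; the dst area is mapped with
 * MAP_NORESERVE so that a child faulting it in draws from free pages
 * instead of reserves owned by the parent.
 */
static int private_hugetlb_flags(int is_dst)
{
	int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB;

	if (is_dst)
		flags |= MAP_NORESERVE;
	return flags;
}

/*
 * Returns the mapping, or MAP_FAILED (for example on a kernel built
 * without hugetlb support). len should be a multiple of the huge page size.
 */
static void *map_private_hugetlb(size_t len, int is_dst)
{
	return mmap(NULL, len, PROT_READ | PROT_WRITE,
		    private_hugetlb_flags(is_dst), -1, 0);
}
```

With MAP_NORESERVE the cost moves from mmap time to fault time: the mapping itself does not consume reserves, but a later page fault can still fail if no free huge pages remain.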

4 years

[Resource Leak] Missing closing files in testing/selftests/vm/mlock2

by Ryan Cai

Dear Kernel maintainers,

1. In testing/selftests/vm/mlock2, the file opened at Line 39 may not be closed when going to Line 65.
Location: https://github.com/torvalds/linux/blob/5bfc75d92efd494db37f5c4c173d3639d477…

Can I send a patch?

Best,
Ryan

4 years

[Resource Leak] Missing closing files in tools/bpf/bpf_asm.c

by Ryan Cai

Dear Kernel maintainers,

1. In ksm_read_sysfs, the file opened at Line 74 may not be closed when going to Line 83.
Location: https://github.com/torvalds/linux/blob/512b7931ad0561ffe14265f9ff554a3c081b…

2. In ksm_write_sysfs, the file opened at Line 56 may not be closed when going to Line 64.
Location: https://github.com/torvalds/linux/blob/512b7931ad0561ffe14265f9ff554a3c081b…

Are these bugs? I can send a patch for them.

Best,
Ryan

4 years

[Resource Leak] Missing closing files in testing/selftests/timens/procfs.c

by Ryan Cai

Dear Kernel maintainers,

1. In read_proc_uptime, the file opened at Line 75 may not be closed when going to Line 84 and 87.
Location: https://github.com/torvalds/linux/blob/5bfc75d92efd494db37f5c4c173d3639d477…

Can I send a patch?

Best,
Ryan
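The kind of fix these reports ask for usually amounts to routing every exit taken after a successful open through a single cleanup label. A minimal sketch of that pattern follows. read_uptime_seconds() is a hypothetical helper written for illustration; it is not the selftest's read_proc_uptime() and makes no claim about what the code at the referenced lines does.

```c
#include <stdio.h>

/*
 * Hypothetical helper: parse the first number from a file such as
 * /proc/uptime. Every path taken after fopen() succeeds goes through
 * the "out" label, so the stream is closed on both success and failure.
 */
static int read_uptime_seconds(const char *path, double *uptime)
{
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (!f)
		return -1;	/* nothing was opened, nothing to close */

	if (fscanf(f, "%lf", uptime) != 1)
		goto out;	/* parse error: still close the file */
	if (*uptime < 0)
		goto out;	/* implausible value: still close the file */
	ret = 0;
out:
	fclose(f);
	return ret;
}
```

A caller would use it as `read_uptime_seconds("/proc/uptime", &secs)` and treat a non-zero return as failure.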

4 years

[PATCH v4 0/4] KVM RISC-V 64-bit selftests support

by Anup Patel

This series adds initial support for testing KVM RISC-V 64-bit using the kernel selftests framework. PATCH1 & PATCH2 of this series do some groundwork in KVM RISC-V to implement RISC-V support in the KVM selftests, whereas the remaining patches make the required changes in the KVM selftests.

These patches can be found in riscv_kvm_selftests_v3 branch at: https://github.com/avpatel/linux.git

Changes since v2:
 - Rebased series on Linux-5.16-rc6
 - Renamed kvm_riscv_stage2_gpa_size() to kvm_riscv_stage2_gpa_bits() in PATCH2

Changes since v1:
 - Renamed kvm_sbi_ext_expevend_handler() to kvm_sbi_ext_forward_handler() in PATCH1
 - Renamed KVM_CAP_RISCV_VM_GPA_SIZE to KVM_CAP_VM_GPA_BITS in PATCH2 and PATCH4

Anup Patel (4):
  RISC-V: KVM: Forward SBI experimental and vendor extensions
  RISC-V: KVM: Add VM capability to allow userspace get GPA bits
  KVM: selftests: Add EXTRA_CFLAGS in top-level Makefile
  KVM: selftests: Add initial support for RISC-V 64-bit

 arch/riscv/include/asm/kvm_host.h                   |   1 +
 arch/riscv/kvm/mmu.c                                |   5 +
 arch/riscv/kvm/vcpu_sbi.c                           |   4 +
 arch/riscv/kvm/vcpu_sbi_base.c                      |  27 ++
 arch/riscv/kvm/vm.c                                 |   3 +
 include/uapi/linux/kvm.h                            |   1 +
 tools/testing/selftests/kvm/Makefile                |  14 +-
 .../testing/selftests/kvm/include/kvm_util.h        |  10 +
 .../selftests/kvm/include/riscv/processor.h         | 135 +++++++
 tools/testing/selftests/kvm/lib/guest_modes.c       |  10 +
 .../selftests/kvm/lib/riscv/processor.c             | 362 ++++++++++++++++++
 tools/testing/selftests/kvm/lib/riscv/ucall.c       |  87 +++++
 12 files changed, 658 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/kvm/include/riscv/processor.h
 create mode 100644 tools/testing/selftests/kvm/lib/riscv/processor.c
 create mode 100644 tools/testing/selftests/kvm/lib/riscv/ucall.c

--
2.25.1

4 years

Photovoltaic panel quote

by "Paweł Jasiński"

Good morning, I see an opportunity for cooperation with your company. We provide comprehensive services for photovoltaic investments, which reduce electricity costs by up to 90%. Are you interested in reviewing our preliminary proposals? Regards, Paweł Jasiński

4 years
