From: Jeff Xu <jeffxu(a)chromium.org>
This series increases the test coverage of mseal_test by:
- Adding checks for vma_size, prot, and error code to existing tests.
- Adding more test cases for madvise, munmap, mmap and mremap to cover
  sealing in different scenarios.
The increased test coverage will hopefully help prevent future regressions.
It doesn't change the semantics of any existing mm APIs, i.e. it passes on
Linux mainline and the 6.10 branch.
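The size and protection checks described above can be sketched by reading /proc/self/maps. This is a minimal illustration, not the code in mseal_test.c: the helper name vma_size_prot() is hypothetical, and it only shows the kind of assertion these patches add alongside the error-code checks.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the one in mseal_test.c): find the VMA that
 * contains addr in /proc/self/maps and report its size and PROT_* bits.
 * Returns 0 on success, -1 if the file can't be read or no VMA matches.
 */
static int vma_size_prot(void *addr, size_t *size, int *prot)
{
	unsigned long a = (unsigned long)addr, start, end;
	char perms[8], line[4096];
	int ret = -1;
	FILE *f = fopen("/proc/self/maps", "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		/* Each line starts with "start-end perms ...", e.g. "...-... rw-p". */
		if (sscanf(line, "%lx-%lx %7s", &start, &end, perms) != 3)
			continue;
		if (a < start || a >= end)
			continue;
		*size = end - start;
		*prot = (perms[0] == 'r' ? PROT_READ : 0) |
			(perms[1] == 'w' ? PROT_WRITE : 0) |
			(perms[2] == 'x' ? PROT_EXEC : 0);
		ret = 0;
		break;
	}
	fclose(f);
	return ret;
}
```

A sealing test can then assert that, after a rejected mprotect() or munmap() on a sealed range, the VMA still has its original size and protection, in addition to checking the returned error code.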
Note: to pass this test on mm-unstable, mm-unstable must include
Liam's mmap fix [1].
[1] https://lore.kernel.org/linux-kselftest/vyllxuh5xbqmaoyl2mselebij5ox7cseekj…
History:
V3:
- no functional change; incorporate feedback from Pedro Falcato
V2:
- https://lore.kernel.org/linux-kselftest/20240829214352.963001-1-jeffxu@chro…
- remove the mmap fix (Liam R. Howlett will fix it separately)
- Add cover letter (Lorenzo Stoakes)
- split the testcase for ease of review (Mark Brown)
V1:
- https://lore.kernel.org/linux-kselftest/20240828225522.684774-1-jeffxu@chro…
Jeff Xu (5):
selftests/mseal_test: Check vma_size, prot, error code.
selftests/mseal: add sealed madvise type
selftests/mseal: munmap across multiple vma ranges.
selftests/mseal: add more tests for mmap
selftests/mseal: add more tests for mremap
tools/testing/selftests/mm/mseal_test.c | 830 ++++++++++++++++++++++--
1 file changed, 763 insertions(+), 67 deletions(-)
--
2.46.0.469.g59c65b2a67-goog
The PSCI v1.3 spec (https://developer.arm.com/documentation/den0022)
adds support for a SYSTEM_OFF2 function enabling a HIBERNATE_OFF state
which is analogous to ACPI S4. This will allow hosting environments to
determine that a guest is hibernated rather than just powered off, and
ensure that they preserve the virtual environment appropriately to
allow the guest to resume safely (or bump the hardware_signature in the
FACS to trigger a clean reboot instead).
This updates KVM to support advertising PSCI v1.3, and unconditionally
enables the SYSTEM_OFF2 support when PSCI v1.3 is enabled.
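For reference, PSCI encodes its version as (major << 16) | minor, and the SMC32 and SMC64 calling conventions use the 0x84000000 and 0xC4000000 function-ID bases. The sketch below additionally assumes that SYSTEM_OFF2 is PSCI function number 0x15 (21), which is our reading of the v1.3 spec and should be checked against include/uapi/linux/psci.h. The gating helper is hypothetical and does not mirror KVM's internals.

```c
#include <stdbool.h>
#include <stdint.h>

/* PSCI version encoding: major version in the upper half, minor in bits [15:0]. */
#define PSCI_VER(maj, min) ((uint32_t)(((maj) << 16) | (min)))

/* Function-ID bases for the SMC32 and SMC64 calling conventions. */
#define PSCI_FN32(n) (0x84000000u + (n))
#define PSCI_FN64(n) (0xC4000000u + (n))

/* Assumption: SYSTEM_OFF2 is PSCI function number 0x15 in v1.3. */
#define SYSTEM_OFF2_NR 0x15u

/*
 * Hypothetical gate: SYSTEM_OFF2 is offered whenever the guest is given
 * PSCI v1.3 or later, with no separate opt-in.
 */
static bool system_off2_available(uint32_t psci_version)
{
	return psci_version >= PSCI_VER(1, 3);
}
```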
For the guest side, add a new SYS_OFF_MODE_POWER_OFF handler with higher
priority than the EFI one, but which *only* triggers when there's a
hibernation in progress. There are other ways to do this (see the commit
message for more details) but this seemed like the simplest.
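The guest-side idea can be modelled as a priority-ordered chain of power-off handlers, in which the higher-priority SYSTEM_OFF2 handler declines unless a hibernation is in progress. This is a user-space sketch only: the handler names, priority values and the hibernation flag are hypothetical stand-ins, not the kernel's register_sys_off_handler() machinery.

```c
#include <stdbool.h>
#include <stddef.h>

/* Result of a handler: decline and let the next one run, or finish. */
enum off_result { OFF_PASS, OFF_DONE };

struct off_handler {
	int priority;                  /* higher runs first (illustrative values) */
	const char *name;
	enum off_result (*fn)(bool hibernating);
};

/* Hypothetical SYSTEM_OFF2 handler: only acts when hibernating. */
static enum off_result psci_off2(bool hibernating)
{
	return hibernating ? OFF_DONE : OFF_PASS;
}

/* Hypothetical EFI handler: always powers off. */
static enum off_result efi_off(bool hibernating)
{
	(void)hibernating;
	return OFF_DONE;
}

/* Walk handlers (sorted by descending priority); return the one that acted. */
static const char *power_off(const struct off_handler *h, size_t n, bool hibernating)
{
	for (size_t i = 0; i < n; i++)
		if (h[i].fn(hibernating) == OFF_DONE)
			return h[i].name;
	return NULL;
}

static const struct off_handler chain[] = {
	{ 200, "psci-system-off2", psci_off2 },
	{ 100, "efi", efi_off },
};
```

With this ordering an ordinary power-off still falls through to the EFI handler, so only the hibernation path changes behaviour.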
Version 2 of the patch series splits out the psci.h definitions into a
separate commit (a dependency for both the guest and KVM sides), and adds
definitions for the other new functions added in v1.3. It also moves the
pKVM psci-relay support to a separate commit; although that code lives in
arch/arm64/kvm, it's actually about the *guest* side of SYSTEM_OFF2 (i.e.
using it from the host kernel, relayed through nVHE).
Version 3 drops the KVM_CAP which allowed userspace to explicitly opt
in to the new feature, as with SYSTEM_SUSPEND, and makes it depend only
on PSCI v1.3 being exposed to the guest.
Version 4 is no longer an RFC, as the PSCI v1.3 spec has finally been
published. It includes minor fixes from the last round of review, and
adds a KVM selftest.
Version 5 drops some of the changes which didn't make it into the final
v1.3 spec, and cleans up a couple of places which still referred to it
as 'alpha' or 'beta'. It also temporarily drops the guest-side patch to
invoke SYSTEM_OFF2 for hibernation, pending confirmation that the final
PSCI v1.3 spec simply has a typo: it now says that 0x1 should be passed
to mean HIBERNATE_OFF, even though HIBERNATE_OFF is advertised as bit 0.
The guest-side patch can be sent under separate cover, and perhaps should
have been anyway. The change in question doesn't matter for any of the
KVM patches, because we just treat SYSTEM_OFF2 like the existing
SYSTEM_RESET2, setting a flag to indicate that it was a SYSTEM_OFF2
call, but not actually caring about the argument; that's for userspace
to worry about.
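Since KVM only flags that a shutdown came from SYSTEM_OFF2 and leaves the argument to userspace, a VMM's reaction might look like the sketch below. The constant values are assumptions intended to mirror the uapi headers (KVM_SYSTEM_EVENT_SHUTDOWN and an arm64 shutdown flag along the lines of KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2); check them against your kernel's linux/kvm.h and asm/kvm.h. The function and enum names are hypothetical.

```c
#include <stdint.h>

/* Assumed to mirror the uapi values; verify against your headers. */
#define SYSTEM_EVENT_SHUTDOWN   1u
#define SHUTDOWN_FLAG_PSCI_OFF2 (1ULL << 0)

enum vm_action { VM_DESTROY, VM_PRESERVE_FOR_RESUME, VM_OTHER };

/*
 * Hypothetical VMM policy: a plain SYSTEM_OFF tears the VM down, while a
 * SYSTEM_OFF2 shutdown keeps the virtual environment so the guest can
 * resume from hibernation. A real VMM would also read the SYSTEM_OFF2
 * type argument from the guest's registers, since KVM does not inspect it.
 */
static enum vm_action on_system_event(uint32_t type, uint64_t flags)
{
	if (type != SYSTEM_EVENT_SHUTDOWN)
		return VM_OTHER;
	return (flags & SHUTDOWN_FLAG_PSCI_OFF2) ? VM_PRESERVE_FOR_RESUME
						 : VM_DESTROY;
}
```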
David Woodhouse (5):
firmware/psci: Add definitions for PSCI v1.3 specification
KVM: arm64: Add PSCI v1.3 SYSTEM_OFF2 function for hibernation
KVM: arm64: Add support for PSCI v1.2 and v1.3
KVM: selftests: Add test for PSCI SYSTEM_OFF2
KVM: arm64: nvhe: Pass through PSCI v1.3 SYSTEM_OFF2 call
Documentation/virt/kvm/api.rst | 11 +++++
arch/arm64/include/uapi/asm/kvm.h | 6 +++
arch/arm64/kvm/hyp/nvhe/psci-relay.c | 2 +
arch/arm64/kvm/hypercalls.c | 2 +
arch/arm64/kvm/psci.c | 43 ++++++++++++++++-
include/kvm/arm_psci.h | 4 +-
include/uapi/linux/psci.h | 5 ++
tools/testing/selftests/kvm/aarch64/psci_test.c | 61 +++++++++++++++++++++++++
8 files changed, 132 insertions(+), 2 deletions(-)
1. To make rtctest more explicit and robust, we propose using the
RTC_PARAM_GET ioctl interface to check the RTC alarm feature state before
running alarm-related tests.
2. rtctest requires read permission on /dev/rtc0; it will be skipped
if /dev/rtc0 is not readable.
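Both checks can be sketched as follows, assuming the RTC_PARAM_GET ioctl, struct rtc_param, RTC_PARAM_FEATURES and RTC_FEATURE_ALARM from linux/rtc.h. The function and enum names are hypothetical and this is not the code in rtctest.c; it also degrades to a skip when the installed headers predate RTC_PARAM_GET.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/rtc.h>

enum alarm_state { ALARM_SKIP = -1, ALARM_OFF = 0, ALARM_ON = 1 };

static enum alarm_state rtc_alarm_state(const char *dev)
{
	/* Point 2: skip, rather than fail, when the device is not readable. */
	if (access(dev, R_OK) != 0)
		return ALARM_SKIP;

#ifdef RTC_PARAM_GET
	struct rtc_param param = { .param = RTC_PARAM_FEATURES, .index = 0 };
	int fd = open(dev, O_RDONLY);
	int ret;

	if (fd < 0)
		return ALARM_SKIP;
	ret = ioctl(fd, RTC_PARAM_GET, &param);
	close(fd);
	if (ret < 0)
		return ALARM_SKIP;	/* e.g. a kernel without RTC_PARAM_GET */
	/* Point 1: the features word is a bitmask indexed by RTC_FEATURE_*. */
	return (param.uvalue & (1ULL << RTC_FEATURE_ALARM)) ? ALARM_ON : ALARM_OFF;
#else
	return ALARM_SKIP;	/* headers too old to know about RTC_PARAM_GET */
#endif
}
```

An alarm test would then skip when the result is ALARM_SKIP or ALARM_OFF, and only run when it is ALARM_ON.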
Joseph Jang (2):
selftest: rtc: Add to check rtc alarm status for alarm related test
selftest: rtc: Check if could access /dev/rtc0 before testing
tools/testing/selftests/rtc/Makefile | 2 +-
tools/testing/selftests/rtc/rtctest.c | 71 ++++++++++++++++++++++++++-
2 files changed, 71 insertions(+), 2 deletions(-)
--
2.34.1
This fixes several smaller issues I faced when compiling the arm64
kselftests on my machine.
Patch 1 avoids a warning about the double definition of _GNU_SOURCE
in the arm64/signal tests. Patch 2 fixes a typo where the f8dp2 hwcap
feature test was looking at the f8dp*4* cpuinfo name. Patch 3 adjusts
the output of the MTE tests when MTE is not available, so that tools
parsing the TAP output don't get confused and report errors.
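For reference, TAP parsers recognise a skipped test by a SKIP directive after a '#' on the result line. The hypothetical helper below just formats such a line; in kselftests the ksft_test_result_skip() family of helpers is the usual way to emit it.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a TAP result line for a skipped test.
 * Parsers look for "ok <n>" followed by "# SKIP <reason>"; a bare
 * free-form message may not be recognised as a skip at all.
 */
static int tap_skip_line(char *buf, size_t len, unsigned int num, const char *reason)
{
	return snprintf(buf, len, "ok %u # SKIP %s\n", num, reason);
}
```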
The remaining patches (4-8) fix wrong printf format specifiers; I
grouped them by type of error.
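The classes of format problems fixed here can be illustrated generically; this is a sketch, not the actual diff. It assumes __u64 is unsigned long long in arm64 user space (so it matches %llx rather than %lx), uses the <inttypes.h> macros for uint64_t, and passes runtime strings through "%s", since a non-literal format with no arguments triggers warnings such as GCC's -Wformat-security. The function names are hypothetical.

```c
#include <inttypes.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* format(printf, ...) makes the compiler check every caller's specifiers. */
static int fmt(char *buf, size_t len, const char *f, ...)
	__attribute__((format(printf, 3, 4)));

static int fmt(char *buf, size_t len, const char *f, ...)
{
	va_list ap;
	int n;

	va_start(ap, f);
	n = vsnprintf(buf, len, f, ap);
	va_end(ap);
	return n;
}

/*
 * Each argument uses a specifier matching its type:
 * - a runtime string goes through "%s" instead of being used as the format;
 * - uint64_t uses PRIx64 (a __u64 would match %llx on arm64);
 * - long uses %ld, and pointers use %p.
 */
static int describe(char *buf, size_t len, const char *msg, uint64_t mask,
		    long count, const void *ptr)
{
	return fmt(buf, len, "%s: mask=%#" PRIx64 " count=%ld ptr=%p",
		   msg, mask, count, ptr);
}
```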
Please have a look!
Cheers,
Andre
Andre Przywara (8):
kselftest/arm64: signal: drop now redundant GNU_SOURCE definition
kselftest/arm64: hwcap: fix f8dp2 cpuinfo name
kselftest/arm64: mte: use proper SKIP syntax
kselftest/arm64: mte: use string literal for printf-style functions
kselftest/arm64: mte: fix printf type warning about mask
kselftest/arm64: mte: fix printf type warnings about __u64
kselftest/arm64: mte: fix printf type warnings about pointers
kselftest/arm64: mte: fix printf type warnings about longs
tools/testing/selftests/arm64/abi/hwcap.c | 2 +-
.../selftests/arm64/mte/check_buffer_fill.c | 4 ++--
tools/testing/selftests/arm64/mte/check_prctl.c | 4 ++--
.../selftests/arm64/mte/check_tags_inclusion.c | 4 ++--
.../testing/selftests/arm64/mte/mte_common_util.c | 15 +++++++--------
.../testing/selftests/arm64/mte/mte_common_util.h | 6 +++---
tools/testing/selftests/arm64/signal/Makefile | 2 +-
7 files changed, 18 insertions(+), 19 deletions(-)
--
2.25.1
Currently, a guest accessing MMIO during event delivery is handled
differently on VMX and SVM: on VMX, KVM returns an internal error with
suberror = KVM_INTERNAL_ERROR_DELIVERY_EV, whereas on SVM it simply goes
into an infinite loop, trying to deliver the event again and again.
This patch series eliminates this difference by returning a KVM internal
error with suberror = KVM_INTERNAL_ERROR_DELIVERY_EV when the guest
performs MMIO during event delivery, on both VMX and SVM.
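From the VMM's point of view, the unified behaviour means a single exit path to handle on both vendors. The sketch below uses the KVM_EXIT_INTERNAL_ERROR and KVM_INTERNAL_ERROR_* constants and the kvm_run internal fields from <linux/kvm.h>; the classification function and enum are hypothetical.

```c
#include <linux/kvm.h>

enum exit_kind { EXIT_NOT_INTERNAL, EXIT_DELIVERY_EV, EXIT_EMULATION, EXIT_INTERNAL_OTHER };

/* Hypothetical VMM-side classification of a KVM_RUN exit. */
static enum exit_kind classify_exit(const struct kvm_run *run)
{
	if (run->exit_reason != KVM_EXIT_INTERNAL_ERROR)
		return EXIT_NOT_INTERNAL;
	switch (run->internal.suberror) {
	case KVM_INTERNAL_ERROR_DELIVERY_EV:
		/* With this series: also MMIO during event delivery, on VMX and SVM. */
		return EXIT_DELIVERY_EV;
	case KVM_INTERNAL_ERROR_EMULATION:
		return EXIT_EMULATION;
	default:
		return EXIT_INTERNAL_OTHER;
	}
}
```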
It also introduces a selftest case covering the handling of MMIO during
event delivery.
Ivan Orlov (3):
KVM: x86, vmx: Add function for event delivery error generation
KVM: vmx, svm, mmu: Process MMIO during event delivery
selftests: KVM: Add test case for MMIO during event delivery
arch/x86/include/asm/kvm_host.h | 8 ++++
arch/x86/kvm/mmu/mmu.c | 15 +++++-
arch/x86/kvm/svm/svm.c | 4 ++
arch/x86/kvm/vmx/vmx.c | 32 ++++---------
arch/x86/kvm/x86.c | 22 +++++++++
.../selftests/kvm/set_memory_region_test.c | 46 +++++++++++++++++++
6 files changed, 104 insertions(+), 23 deletions(-)
--
2.43.0
Hi all,
This series implements the Permission Overlay Extension introduced in the
2022 VMSA enhancements [1]. It is based on v6.11-rc4.
Changes since v4[2]:
- Added Acks and R-bs, thanks!
- KVM:
- Move POR_EL{0,1} handling inside TCR_EL2 blocks
- Add visibility functions for registers [4]
- Make ID_AA64MMFR3_EL1 writable
- use system_supports_poe() more consistently
- use BIT instead of hex constants
- fix off-by-one in arch_max_pkey() macro
- add PKEY_DISABLE_EXECUTE and PKEY_DISABLE_READ
- Update some comments and commit messages.
- No change to when we save/restore POR_EL0 for signals!
Conflicts with GCS:
- Uses the same (last) bit in HWCAP2
- Uses the same VM_HIGH_ARCH_5
Conflicts with arm64 KVM:
- Maz has taken patch 8 into one of his own series
- I have taken and modified a patch from Maz (patch 9)
The Permission Overlay Extension allows permissions on memory regions to be
constrained. This can be done from userspace (EL0) without a system call or
TLB invalidation.
POE is used to implement the Linux Memory Protection Keys [3] syscalls.
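As a rough illustration of how a pkey's access rights might be folded into POR_EL0, the sketch below assumes 4 permission bits per key (read = 0x1, execute = 0x2, write = 0x4) and access-rights flags with the values 0x1 (disable access), 0x2 (disable write), 0x4 (disable execute) and 0x8 (disable read). These encodings and the helper name are assumptions for illustration only; the authoritative definitions are in the series' arch/arm64/include/asm/por.h and uapi mman.h.

```c
#include <stdint.h>

/* Assumed POE permission encoding, 4 bits per protection key. */
#define POE_R    0x1u
#define POE_X    0x2u
#define POE_W    0x4u
#define POE_RXW  (POE_R | POE_X | POE_W)
#define POE_MASK 0xfu
#define POR_BITS_PER_PKEY 4

/* Assumed access-rights flags (hypothetical names, PKEY_DISABLE_*-style values). */
#define PKEY_DIS_ACCESS  0x1u
#define PKEY_DIS_WRITE   0x2u
#define PKEY_DIS_EXECUTE 0x4u
#define PKEY_DIS_READ    0x8u

/* Hypothetical helper: return por with pkey's field replaced by init_val's rights. */
static uint64_t por_set_pkey(uint64_t por, unsigned int pkey, unsigned int init_val)
{
	uint64_t perm = POE_RXW;
	unsigned int shift = pkey * POR_BITS_PER_PKEY;

	if (init_val & PKEY_DIS_WRITE)
		perm &= ~(uint64_t)POE_W;
	if (init_val & PKEY_DIS_ACCESS)		/* data access: reads and writes */
		perm &= ~(uint64_t)(POE_R | POE_W);
	if (init_val & PKEY_DIS_READ)
		perm &= ~(uint64_t)POE_R;
	if (init_val & PKEY_DIS_EXECUTE)
		perm &= ~(uint64_t)POE_X;

	por &= ~((uint64_t)POE_MASK << shift);	/* clear the key's old field */
	return por | (perm << shift);
}
```

Under these assumptions, disabling access alone leaves execute permission in place; denying everything needs the execute flag as well.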
The first few patches add the basic framework, then the PKEYS interface is
implemented, and then the selftests are made to work on arm64.
I have tested the modified protection_keys test on x86_64, but not PPC.
I haven't build-tested the x86/ppc arch changes.
Thanks,
Joey
[1] https://community.arm.com/arm-community-blogs/b/architectures-and-processor…
[2] https://lore.kernel.org/linux-arm-kernel/20240503130147.1154804-1-joey.goul…
[3] Documentation/core-api/protection-keys.rst
[4] https://lore.kernel.org/linux-arm-kernel/20240806-kvm-arm64-get-reg-list-v2…
Joey Gouly (30):
powerpc/mm: add ARCH_PKEY_BITS to Kconfig
x86/mm: add ARCH_PKEY_BITS to Kconfig
mm: use ARCH_PKEY_BITS to define VM_PKEY_BITN
arm64: disable trapping of POR_EL0 to EL2
arm64: cpufeature: add Permission Overlay Extension cpucap
arm64: context switch POR_EL0 register
KVM: arm64: Save/restore POE registers
KVM: arm64: make kvm_at() take an OP_AT_*
KVM: arm64: use `at s1e1a` for POE
KVM: arm64: Sanitise ID_AA64MMFR3_EL1
arm64: enable the Permission Overlay Extension for EL0
arm64: re-order MTE VM_ flags
arm64: add POIndex defines
arm64: convert protection key into vm_flags and pgprot values
arm64: mask out POIndex when modifying a PTE
arm64: handle PKEY/POE faults
arm64: add pte_access_permitted_no_overlay()
arm64: implement PKEYS support
arm64: add POE signal support
arm64/ptrace: add support for FEAT_POE
arm64: enable POE and PIE to coexist
arm64: enable PKEY support for CPUs with S1POE
arm64: add Permission Overlay Extension Kconfig
kselftest/arm64: move get_header()
selftests: mm: move fpregs printing
selftests: mm: make protection_keys test work on arm64
kselftest/arm64: add HWCAP test for FEAT_S1POE
kselftest/arm64: parse POE_MAGIC in a signal frame
kselftest/arm64: Add test case for POR_EL0 signal frame records
KVM: selftests: get-reg-list: add Permission Overlay registers
Documentation/arch/arm64/elf_hwcaps.rst | 2 +
arch/arm64/Kconfig | 23 +++
arch/arm64/include/asm/cpufeature.h | 6 +
arch/arm64/include/asm/el2_setup.h | 10 +-
arch/arm64/include/asm/hwcap.h | 1 +
arch/arm64/include/asm/kvm_asm.h | 3 +-
arch/arm64/include/asm/kvm_host.h | 4 +
arch/arm64/include/asm/mman.h | 10 +-
arch/arm64/include/asm/mmu.h | 1 +
arch/arm64/include/asm/mmu_context.h | 46 +++++-
arch/arm64/include/asm/pgtable-hwdef.h | 10 ++
arch/arm64/include/asm/pgtable-prot.h | 8 +-
arch/arm64/include/asm/pgtable.h | 34 ++++-
arch/arm64/include/asm/pkeys.h | 108 ++++++++++++++
arch/arm64/include/asm/por.h | 33 +++++
arch/arm64/include/asm/processor.h | 1 +
arch/arm64/include/asm/sysreg.h | 3 +
arch/arm64/include/asm/traps.h | 1 +
arch/arm64/include/asm/vncr_mapping.h | 1 +
arch/arm64/include/uapi/asm/hwcap.h | 1 +
arch/arm64/include/uapi/asm/mman.h | 9 ++
arch/arm64/include/uapi/asm/sigcontext.h | 7 +
arch/arm64/kernel/cpufeature.c | 23 +++
arch/arm64/kernel/cpuinfo.c | 1 +
arch/arm64/kernel/process.c | 28 ++++
arch/arm64/kernel/ptrace.c | 46 ++++++
arch/arm64/kernel/signal.c | 62 ++++++++
arch/arm64/kernel/traps.c | 6 +
arch/arm64/kvm/hyp/include/hyp/fault.h | 5 +-
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 27 ++++
arch/arm64/kvm/sys_regs.c | 25 +++-
arch/arm64/mm/fault.c | 55 ++++++-
arch/arm64/mm/mmap.c | 11 ++
arch/arm64/mm/mmu.c | 45 ++++++
arch/arm64/tools/cpucaps | 1 +
arch/powerpc/Kconfig | 4 +
arch/x86/Kconfig | 4 +
fs/proc/task_mmu.c | 2 +
include/linux/mm.h | 20 ++-
include/uapi/linux/elf.h | 1 +
tools/testing/selftests/arm64/abi/hwcap.c | 14 ++
.../testing/selftests/arm64/signal/.gitignore | 1 +
.../arm64/signal/testcases/poe_siginfo.c | 86 +++++++++++
.../arm64/signal/testcases/testcases.c | 27 +---
.../arm64/signal/testcases/testcases.h | 28 +++-
.../selftests/kvm/aarch64/get-reg-list.c | 14 ++
tools/testing/selftests/mm/Makefile | 2 +-
tools/testing/selftests/mm/pkey-arm64.h | 139 ++++++++++++++++++
tools/testing/selftests/mm/pkey-helpers.h | 8 +
tools/testing/selftests/mm/pkey-powerpc.h | 3 +
tools/testing/selftests/mm/pkey-x86.h | 4 +
tools/testing/selftests/mm/protection_keys.c | 109 ++++++++++++--
52 files changed, 1060 insertions(+), 63 deletions(-)
create mode 100644 arch/arm64/include/asm/pkeys.h
create mode 100644 arch/arm64/include/asm/por.h
create mode 100644 tools/testing/selftests/arm64/signal/testcases/poe_siginfo.c
create mode 100644 tools/testing/selftests/mm/pkey-arm64.h
--
2.25.1
From: Feng Zhou <zhoufeng.zf(a)bytedance.com>
When TCP runs over IPv4 via the INET6 API, sk->sk_family is AF_INET6 even though
the packets are IPv4: inet_csk(sk)->icsk_af_ops is ipv6_mapped, which uses
ip_queue_xmit. As a result, some socket options, such as tos, did not take effect.
0001: Use the sk_is_inet() helper to fix it.
0002: Add a setget_sockopt test for TCP over IPv4 via IPv6.
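The root cause can be illustrated in user space: an AF_INET6 socket talking to an IPv4 peer uses an IPv4-mapped address (::ffff:a.b.c.d), so a check that only accepts AF_INET for IP-level options rejects it. The sketch models the old check and an sk_is_inet()-style check with hypothetical helper names; it assumes, without quoting net/core/filter.c, that the pre-fix check compared the family against AF_INET only.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Modelled pre-fix behaviour: IP-level options only for AF_INET sockets. */
static bool ip_opt_allowed_old(int family)
{
	return family == AF_INET;
}

/* Modelled fix, in the spirit of sk_is_inet(): either inet family is accepted. */
static bool ip_opt_allowed_new(int family)
{
	return family == AF_INET || family == AF_INET6;
}

/* True if text is an IPv4-mapped IPv6 address such as ::ffff:127.0.0.1. */
static bool is_v4_mapped(const char *text)
{
	struct in6_addr a;

	return inet_pton(AF_INET6, text, &a) == 1 && IN6_IS_ADDR_V4MAPPED(&a);
}
```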
Changelog:
v2->v3: Addressed comments from Eric Dumazet
- Use sk_is_inet() helper
Details here:
https://lore.kernel.org/bpf/CANn89i+9GmBLCdgsfH=WWe-tyFYpiO27wONyxaxiU6aOBC…
v1->v2: Addressed comments from kernel test robot
- Fix compilation error
Details here:
https://lore.kernel.org/bpf/202408152058.YXAnhLgZ-lkp@intel.com/T/
Feng Zhou (2):
bpf: Fix bpf_get/setsockopt to tos not take effect when TCP over IPv4
via INET6 API
selftests/bpf: Setget_sockopt add a test for tcp over ipv4 via ipv6
net/core/filter.c | 7 +++-
.../selftests/bpf/prog_tests/setget_sockopt.c | 33 +++++++++++++++++++
.../selftests/bpf/progs/setget_sockopt.c | 13 ++++++--
3 files changed, 49 insertions(+), 4 deletions(-)
--
2.30.2
This series splits the preparation work for the iommu core and the Intel
iommu driver out of the iommufd pasid attach/replace series [1].
To support domain replacement, the definition of the set_dev_pasid op
needs to be enhanced. Meanwhile, the existing set_dev_pasid callbacks
should be extended as well to suit the new definition.
This series first prepares the Intel iommu set_dev_pasid op for the new
definition, adds the missing set_dev_pasid support for nested domains,
adapts the ARM SMMUv3 set_dev_pasid op to the new definition, and finally
enhances the definition of the set_dev_pasid op. The AMD set_dev_pasid
callback is extended to fail if the caller attempts a domain replacement,
so that it meets the new definition of the set_dev_pasid op; the AMD iommu
driver will support replacement later, per Vasant [2].
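The intended contract can be modelled in a few lines of user-space C. This is a sketch under stated assumptions, not the kernel API: the structures, function names and error codes are hypothetical, and it assumes the usual replace semantics in which a failed call leaves the old domain attached.

```c
#include <errno.h>
#include <stddef.h>

struct domain { int id; };
struct pasid_slot { struct domain *attached; };

/* A driver that supports replacement: switch directly from old to new. */
static int replace_set_dev_pasid(struct pasid_slot *s, struct domain *new_dom,
				 struct domain *old)
{
	if (s->attached != old)
		return -EINVAL;		/* caller's view of the old domain is stale */
	s->attached = new_dom;		/* switch without detaching first */
	return 0;
}

/*
 * A driver that does not (yet) support replacement, like the AMD callback
 * described above: refuse when an old domain is given, leaving it in place.
 */
static int no_replace_set_dev_pasid(struct pasid_slot *s, struct domain *new_dom,
				    struct domain *old)
{
	if (old)
		return -EOPNOTSUPP;
	s->attached = new_dom;
	return 0;
}
```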
[1] https://lore.kernel.org/linux-iommu/20240412081516.31168-1-yi.l.liu@intel.c…
[2] https://lore.kernel.org/linux-iommu/fa9c4fc3-9365-465e-8926-b4d2d6361b9c@am…
v2:
- Make ARM SMMUv3 set_dev_pasid op support domain replacement (Jason)
- Drop patch 03 of v1 (Kevin)
- Multiple tweaks in VT-d driver (Kevin)
v1: https://lore.kernel.org/linux-iommu/20240628085538.47049-1-yi.l.liu@intel.c…
Regards,
Yi Liu
Jason Gunthorpe (1):
iommu/arm-smmu-v3: Make smmuv3 set_dev_pasid() op support replace
Lu Baolu (1):
iommu/vt-d: Add set_dev_pasid callback for nested domain
Yi Liu (4):
iommu: Pass old domain to set_dev_pasid op
iommu/vt-d: Move intel_drain_pasid_prq() into
intel_pasid_tear_down_entry()
iommu/vt-d: Make intel_iommu_set_dev_pasid() to handle domain
replacement
iommu: Make set_dev_pasid op support domain replacement
drivers/iommu/amd/amd_iommu.h | 3 +-
drivers/iommu/amd/pasid.c | 6 +-
.../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 5 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 8 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 +-
drivers/iommu/intel/iommu.c | 122 ++++++++++++------
drivers/iommu/intel/iommu.h | 3 +
drivers/iommu/intel/nested.c | 1 +
drivers/iommu/intel/pasid.c | 13 +-
drivers/iommu/intel/pasid.h | 8 +-
drivers/iommu/intel/svm.c | 6 +-
drivers/iommu/iommu.c | 3 +-
include/linux/iommu.h | 5 +-
13 files changed, 129 insertions(+), 56 deletions(-)
--
2.34.1
Userland library functions such as allocators and threading implementations
often require regions of memory to act as 'guard pages' - mappings which,
when accessed, result in a fatal signal being sent to the accessing
process.
Currently these are implemented via PROT_NONE mmap() mappings, which
provide the required semantics but incur the overhead of a VMA for each
such region.
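The existing approach, and the extra VMA it costs, can be observed with a short program that maps a region, turns its lowest page into a PROT_NONE guard with mprotect(), and counts the /proc/self/maps entries overlapping the region. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count /proc/self/maps entries (VMAs) overlapping [addr, addr + len). */
static int count_vmas(void *addr, size_t len)
{
	unsigned long lo = (unsigned long)addr, hi = lo + len, start, end;
	char line[4096];
	int n = 0;
	FILE *f = fopen("/proc/self/maps", "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f))
		if (sscanf(line, "%lx-%lx", &start, &end) == 2 && start < hi && end > lo)
			n++;
	fclose(f);
	return n;
}

/* Map pages + 1 pages and make the lowest one a PROT_NONE guard page. */
static char *map_with_guard(size_t pages)
{
	size_t pg = (size_t)sysconf(_SC_PAGESIZE);
	char *base = mmap(NULL, (pages + 1) * pg, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (base == MAP_FAILED)
		return NULL;
	if (mprotect(base, pg, PROT_NONE)) {
		munmap(base, (pages + 1) * pg);
		return NULL;
	}
	return base;
}
```

Every such guard splits off its own VMA, which is the per-region overhead the series sets out to remove.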
With a great many processes and threads, this can rapidly add up and incur
a significant memory penalty. It also has the added problem of preventing
merges that might otherwise be permitted.
This series takes a different approach - an idea suggested by Vlastimil
Babka (and before him David Hildenbrand and Jann Horn - perhaps more - the
provenance becomes a little tricky to ascertain after this - please forgive
any omissions!) - rather than locating the guard pages at the VMA layer,
it places them in the page tables mapping the required ranges.
Early testing of the prototype version of this code suggests a 5x
speed-up in memory mapping invocations (in conjunction with use of
process_madvise()) and a 13% reduction in VMAs on an entirely idle Android
system with unoptimised code.
We expect that, with optimisation and on a loaded system with a larger
number of guard pages, these gains could increase significantly; in any
case, these numbers are encouraging.
This way, rather than having separate VMAs specifying which parts of a
range are guard pages, we have a single VMA spanning the entire range of
memory a user is permitted to access, including the ranges which are to be
'guarded'.
After mapping this, a user can specify which parts of the range should
result in a fatal signal when accessed.
By restricting the ability to specify guard pages to memory mapped by
existing VMAs, we can rely on the guard mappings being torn down when the
containing mappings are ultimately unmapped, and everything works simply as
if the memory were not faulted in, from the point of view of the containing
VMAs.
This mechanism in effect poisons memory ranges similar to hardware memory
poisoning, only it is an entirely software-controlled form of poisoning.
Any poisoned region of memory can also be 'unpoisoned', that is, have
its poison markers removed.
The mechanism is implemented via two new madvise() behaviours:
MADV_GUARD_POISON, which simply poisons ranges, and MADV_GUARD_UNPOISON,
which clears this poisoning.
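Using the proposed interface might look like the sketch below. MADV_GUARD_POISON is the name used in this series and is not assumed to exist in your headers or kernel, so the sketch falls back to the PROT_NONE approach when the constant is missing or madvise() rejects it. The helper names are hypothetical; the fault check forks a child so that the expected fatal signal does not kill the caller.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Guard [addr, addr + len): proposed madvise() first, PROT_NONE fallback. */
static int install_guard(void *addr, size_t len)
{
#ifdef MADV_GUARD_POISON
	if (madvise(addr, len, MADV_GUARD_POISON) == 0)
		return 0;
#endif
	return mprotect(addr, len, PROT_NONE);
}

/* True if writing to p kills a child process with SIGSEGV. */
static bool write_faults(volatile char *p)
{
	int status;
	pid_t pid = fork();

	if (pid == 0) {
		*p = 1;
		_exit(0);
	}
	if (pid < 0 || waitpid(pid, &status, 0) < 0)
		return false;
	return WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
}
```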
Poisoning can be performed across multiple VMAs and any existing mappings
will be cleared, that is zapped, before installing the poisoned page table
mappings.
There is no concept of 'nested' poisoning: after the first poisoning,
further attempts to poison a range have no effect.
Importantly, unpoisoning of poisoned ranges has no effect on non-poisoned
memory, so a user can safely unpoison a range of memory and clear only
the poison page table mappings, leaving the rest intact.
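The poisoning rules above (existing mappings zapped first, repeated poisoning a no-op, unpoisoning touching only guard entries) can be captured in a tiny model over an array of page-table entries. This is an illustrative simulation with hypothetical names, not the kernel implementation.

```c
#include <stddef.h>

/* Simplified per-page state of a page-table entry. */
enum pte { PTE_NONE, PTE_PRESENT, PTE_GUARD };

/* Poison: zap any existing mapping, then install a guard marker. */
static void guard_poison(enum pte *ptes, size_t start, size_t n)
{
	for (size_t i = start; i < start + n; i++)
		ptes[i] = PTE_GUARD;	/* PRESENT is zapped; GUARD stays GUARD */
}

/* Unpoison: clear guard markers only; ordinary entries are left intact. */
static void guard_unpoison(enum pte *ptes, size_t start, size_t n)
{
	for (size_t i = start; i < start + n; i++)
		if (ptes[i] == PTE_GUARD)
			ptes[i] = PTE_NONE;
}
```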
The actual mechanism by which the page table entries are specified makes
use of existing logic - PTE markers, which are used for the userfaultfd
UFFDIO_POISON mechanism.
Unfortunately PTE_MARKER_POISONED is not suited for the guard page
mechanism as it results in VM_FAULT_HWPOISON semantics in the fault
handler, so we add our own specific PTE_MARKER_GUARD and adapt existing
logic to handle it.
We also extend the generic page walk mechanism to allow for installation of
PTEs (carefully restricted to memory management logic only to prevent
unwanted abuse).
We ensure that zapping performed by, for instance, MADV_DONTNEED, does not
remove guard poison markers, nor does forking (except when VM_WIPEONFORK is
specified for a VMA which implies a total removal of memory
characteristics).
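The preservation rules for zapping and forking can be modelled the same way. Again this is an illustrative simulation with hypothetical names: ordinary pages are shown as simply inherited on fork, and copy-on-write details are not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified per-page state of a page-table entry. */
enum pte { PTE_NONE, PTE_PRESENT, PTE_GUARD };

/* MADV_DONTNEED-style zap: drop ordinary mappings, keep guard markers. */
static void model_dontneed(enum pte *ptes, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (ptes[i] == PTE_PRESENT)
			ptes[i] = PTE_NONE;
}

/*
 * fork(): the child inherits the parent's entries, guard markers included,
 * unless the VMA is VM_WIPEONFORK, in which case the child's range starts
 * out empty.
 */
static void model_fork(const enum pte *parent, enum pte *child, size_t n,
		       bool wipeonfork)
{
	for (size_t i = 0; i < n; i++)
		child[i] = wipeonfork ? PTE_NONE : parent[i];
}
```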
It's important to note that the guard page implementation is emphatically
NOT a security feature, so a user can remove the poisoning if they wish. We
simply implement it in such a way as to provide the least surprising
behaviour.
An extensive set of self-tests is provided, which ensures behaviour is as
expected and additionally self-documents the expected behaviour of poisoned
ranges.
Suggested-by: Vlastimil Babka <vbabka(a)suse.cz>
Suggested-by: Jann Horn <jannh(a)google.com>
Suggested-by: David Hildenbrand <david(a)redhat.com>
Lorenzo Stoakes (4):
mm: pagewalk: add the ability to install PTEs
mm: add PTE_MARKER_GUARD PTE marker
mm: madvise: implement lightweight guard page mechanism
selftests/mm: add self tests for guard page feature
arch/alpha/include/uapi/asm/mman.h | 3 +
arch/mips/include/uapi/asm/mman.h | 3 +
arch/parisc/include/uapi/asm/mman.h | 3 +
arch/xtensa/include/uapi/asm/mman.h | 3 +
include/linux/mm_inline.h | 2 +-
include/linux/pagewalk.h | 18 +-
include/linux/swapops.h | 26 +-
include/uapi/asm-generic/mman-common.h | 3 +
mm/hugetlb.c | 3 +
mm/internal.h | 6 +
mm/madvise.c | 158 +++
mm/memory.c | 18 +-
mm/mprotect.c | 3 +-
mm/mseal.c | 1 +
mm/pagewalk.c | 174 ++--
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/guard-pages.c | 1168 ++++++++++++++++++++++
18 files changed, 1525 insertions(+), 69 deletions(-)
create mode 100644 tools/testing/selftests/mm/guard-pages.c
--
2.46.2