Hello everyone,
This series implements the Permission Overlay Extension introduced in the 2022 VMSA enhancements [1]. It is based on v6.7-rc2.
Changes since v2 [2]:
 - Added ptrace support and a selftest
 - Added missing POR_EL0 initialisation in fork/clone
 - Rebased onto v6.7-rc2
 - Picked up Reviewed-by tags
The Permission Overlay Extension allows the permissions of memory regions to be constrained. This can be done from userspace (EL0) without a system call or TLB invalidation.
POE is used to implement the Memory Protection Keys [3] Linux syscall.
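[Editorial note: for readers less familiar with the interface being wired up here, a minimal userspace sketch of the pkeys flow, using the glibc wrappers; error handling is omitted, and the arm64 plumbing for this is exactly what the series adds.]

#define _GNU_SOURCE
#include <sys/mman.h>

int main(void)
{
	char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int pkey = pkey_alloc(0, 0);

	/* Tag the page with the key. */
	pkey_mprotect(p, 4096, PROT_READ | PROT_WRITE, pkey);

	/*
	 * Revoke access; with POE this is a plain POR_EL0 write:
	 * no syscall, no TLB invalidation.
	 */
	pkey_set(pkey, PKEY_DISABLE_ACCESS);
	/* A load/store through p would now fault... */
	pkey_set(pkey, 0);	/* ...until access is restored. */

	pkey_free(pkey);
	return 0;
}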
The first few patches add the basic framework, then the PKEYS interface is implemented, and then the selftests are made to work on arm64.
There was discussion about what the 'default' protection key value should be; I used disallow-all (apart from pkey 0), which matches what x86 does.
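[Editorial note: to make the default concrete, POR_EL0 holds eight 4-bit permission fields, one per pkey, with pkey 0 in bits [3:0]. 'Disallow-all apart from pkey 0' therefore corresponds to a reset value like the sketch below; POR_EL0_PERM_SHIFT is an illustrative macro, and the POE_* encodings are added by the first patch.]

/* 4 bits of permissions per pkey; pkey 0 occupies bits [3:0]. */
#define POR_EL0_PERM_SHIFT(pkey)	((pkey) * 4)

/* pkey 0 = RWX, pkeys 1-7 = no access: the value is simply 0x7. */
unsigned long por = POE_RXW << POR_EL0_PERM_SHIFT(0);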
I have tested the modified protection_keys test on x86_64, but not on PPC. I haven't build-tested the x86/ppc arch changes.
Thanks, Joey
[1] https://community.arm.com/arm-community-blogs/b/architectures-and-processors...
[2] https://lore.kernel.org/linux-arm-kernel/20231027180850.1068089-1-joey.gouly...
[3] Documentation/core-api/protection-keys.rst
Joey Gouly (25):
  arm64/sysreg: add system register POR_EL{0,1}
  arm64/sysreg: update CPACR_EL1 register
  arm64: cpufeature: add Permission Overlay Extension cpucap
  arm64: disable trapping of POR_EL0 to EL2
  arm64: context switch POR_EL0 register
  KVM: arm64: Save/restore POE registers
  arm64: enable the Permission Overlay Extension for EL0
  arm64: add POIndex defines
  arm64: define VM_PKEY_BIT* for arm64
  arm64: mask out POIndex when modifying a PTE
  arm64: enable ARCH_HAS_PKEYS on arm64
  arm64: handle PKEY/POE faults
  arm64: stop using generic mm_hooks.h
  arm64: implement PKEYS support
  arm64: add POE signal support
  arm64: enable PKEY support for CPUs with S1POE
  arm64: enable POE and PIE to coexist
  arm64/ptrace: add support for FEAT_POE
  kselftest/arm64: move get_header()
  selftests: mm: move fpregs printing
  selftests: mm: make protection_keys test work on arm64
  kselftest/arm64: add HWCAP test for FEAT_S1POE
  kselftest/arm64: parse POE_MAGIC in a signal frame
  kselftest/arm64: Add test case for POR_EL0 signal frame records
  KVM: selftests: get-reg-list: add Permission Overlay registers
 Documentation/arch/arm64/elf_hwcaps.rst            |   3 +
 arch/arm64/Kconfig                                 |  18 +++
 arch/arm64/include/asm/cpufeature.h                |   6 +
 arch/arm64/include/asm/el2_setup.h                 |  10 +-
 arch/arm64/include/asm/hwcap.h                     |   1 +
 arch/arm64/include/asm/kvm_arm.h                   |   4 +-
 arch/arm64/include/asm/kvm_host.h                  |   4 +
 arch/arm64/include/asm/mman.h                      |   8 +-
 arch/arm64/include/asm/mmu.h                       |   2 +
 arch/arm64/include/asm/mmu_context.h               |  51 ++++++-
 arch/arm64/include/asm/page.h                      |  10 ++
 arch/arm64/include/asm/pgtable-hwdef.h             |  10 ++
 arch/arm64/include/asm/pgtable-prot.h              |   8 +-
 arch/arm64/include/asm/pgtable.h                   |  26 +++-
 arch/arm64/include/asm/pkeys.h                     | 110 ++++++++++++++
 arch/arm64/include/asm/por.h                       |  33 +++++
 arch/arm64/include/asm/processor.h                 |   1 +
 arch/arm64/include/asm/sysreg.h                    |  16 ++
 arch/arm64/include/asm/traps.h                     |   1 +
 arch/arm64/include/uapi/asm/hwcap.h                |   1 +
 arch/arm64/include/uapi/asm/sigcontext.h           |   7 +
 arch/arm64/kernel/cpufeature.c                     |  23 +++
 arch/arm64/kernel/cpuinfo.c                        |   1 +
 arch/arm64/kernel/process.c                        |  22 +++
 arch/arm64/kernel/ptrace.c                         |  46 ++++++
 arch/arm64/kernel/signal.c                         |  51 +++++++
 arch/arm64/kernel/traps.c                          |  12 +-
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h         |  10 ++
 arch/arm64/kvm/sys_regs.c                          |   2 +
 arch/arm64/mm/fault.c                              |  44 +++++-
 arch/arm64/mm/mmap.c                               |   9 ++
 arch/arm64/mm/mmu.c                                |  40 +++++
 arch/arm64/tools/cpucaps                           |   1 +
 arch/arm64/tools/sysreg                            |  15 +-
 arch/powerpc/include/asm/page.h                    |  11 ++
 arch/x86/include/asm/page.h                        |  10 ++
 fs/proc/task_mmu.c                                 |   2 +
 include/linux/mm.h                                 |  13 --
 include/uapi/linux/elf.h                           |   1 +
 tools/testing/selftests/arm64/abi/hwcap.c          |  14 ++
 .../testing/selftests/arm64/signal/.gitignore      |   1 +
 .../arm64/signal/testcases/poe_siginfo.c           |  86 +++++++++++
 .../arm64/signal/testcases/testcases.c             |  27 +---
 .../arm64/signal/testcases/testcases.h             |  28 +++-
 .../selftests/kvm/aarch64/get-reg-list.c           |  14 ++
 tools/testing/selftests/mm/Makefile                |   2 +-
 tools/testing/selftests/mm/pkey-arm64.h            | 139 ++++++++++++++++++
 tools/testing/selftests/mm/pkey-helpers.h          |   8 +
 tools/testing/selftests/mm/pkey-powerpc.h          |   3 +
 tools/testing/selftests/mm/pkey-x86.h              |   4 +
 tools/testing/selftests/mm/protection_keys.c       | 109 ++++++++++++--
 51 files changed, 1011 insertions(+), 67 deletions(-)
 create mode 100644 arch/arm64/include/asm/pkeys.h
 create mode 100644 arch/arm64/include/asm/por.h
 create mode 100644 tools/testing/selftests/arm64/signal/testcases/poe_siginfo.c
 create mode 100644 tools/testing/selftests/mm/pkey-arm64.h
Add POR_EL{0,1} according to DDI0601 2023-03.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/sysreg.h | 13 +++++++++++++
 arch/arm64/tools/sysreg         | 12 ++++++++++++
 2 files changed, 25 insertions(+)
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 5e65f51c10d2..9c2caf0efdc7 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -1039,6 +1039,19 @@

 #define PIRx_ELx_PERM(idx, perm)	((perm) << ((idx) * 4))

+/*
+ * Permission Overlay Extension (POE) permission encodings.
+ */
+#define POE_NONE	UL(0x0)
+#define POE_R		UL(0x1)
+#define POE_X		UL(0x2)
+#define POE_RX		UL(0x3)
+#define POE_W		UL(0x4)
+#define POE_RW		UL(0x5)
+#define POE_XW		UL(0x6)
+#define POE_RXW		UL(0x7)
+#define POE_MASK	UL(0xf)
+
 #define ARM64_FEATURE_FIELD_BITS	4
 /* Defined for compatibility only, do not add new users. */

diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index 96cbeeab4eec..940040e82399 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -2510,6 +2510,18 @@ Sysreg PIR_EL2 3 4 10 2 3
 Fields PIRx_ELx
 EndSysreg

+Sysreg POR_EL0 3 3 10 2 4
+Fields PIRx_ELx
+EndSysreg
+
+Sysreg POR_EL1 3 0 10 2 4
+Fields PIRx_ELx
+EndSysreg
+
+Sysreg POR_EL12 3 5 10 2 4
+Fields PIRx_ELx
+EndSysreg
+
 Sysreg LORSA_EL1 3 0 10 4 0
 Res0 63:52
 Field 51:16 SA
On Fri, Nov 24, 2023 at 04:34:46PM +0000, Joey Gouly wrote:
Add POR_EL{0,1} according to DDI0601 2023-03.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Add E0POE bit that traps accesses to POR_EL0 from EL0. Updated according to DDI0601 2023-03.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/tools/sysreg | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index 940040e82399..5209693239e9 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -1747,7 +1747,8 @@ Field 0 M
 EndSysreg

 SysregFields CPACR_ELx
-Res0	63:29
+Res0	63:30
+Field	29	E0POE
 Field	28	TTA
 Res0	27:26
 Field	25:24	SMEN
On Fri, Nov 24, 2023 at 04:34:47PM +0000, Joey Gouly wrote:
Add E0POE bit that traps accesses to POR_EL0 from EL0.
Nitpick: s/traps/enables/
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
This indicates if the system supports POE. This is a CPUCAP_BOOT_CPU_FEATURE as the boot CPU will enable POE if it has it, so secondary CPUs must also have this feature.
Add a new config option: ARM64_POE
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/Kconfig             | 16 ++++++++++++++++
 arch/arm64/kernel/cpufeature.c |  9 +++++++++
 arch/arm64/tools/cpucaps       |  1 +
 3 files changed, 26 insertions(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7b071a00425d..d7df6c603190 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2078,6 +2078,22 @@ config ARM64_EPAN
	  if the cpu does not implement the feature.
 endmenu # "ARMv8.7 architectural features"

+menu "ARMv8.9 architectural features"
+config ARM64_POE
+	prompt "Permission Overlay Extension"
+	def_bool y
+	help
+	  The Permission Overlay Extension is used to implement Memory
+	  Protection Keys. Memory Protection Keys provides a mechanism for
+	  enforcing page-based protections, but without requiring modification
+	  of the page tables when an application changes protection domains.
+
+	  For details, see Documentation/core-api/protection-keys.rst
+
+	  If unsure, say y.
+
+endmenu # "ARMv8.9 architectural features"
+
 config ARM64_SVE
	bool "ARM Scalable Vector Extension support"
	default y

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 646591c67e7a..00b6d516ed3f 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2731,6 +2731,15 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
		.matches = has_cpuid_feature,
		ARM64_CPUID_FIELDS(ID_AA64MMFR2_EL1, EVT, IMP)
	},
+#ifdef CONFIG_ARM64_POE
+	{
+		.desc = "Stage-1 Permission Overlay Extension (S1POE)",
+		.capability = ARM64_HAS_S1POE,
+		.type = ARM64_CPUCAP_BOOT_CPU_FEATURE,
+		.matches = has_cpuid_feature,
+		ARM64_CPUID_FIELDS(ID_AA64MMFR3_EL1, S1POE, IMP)
+	},
+#endif
	{},
 };

diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index b98c38288a9d..bbd2fac9345a 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -43,6 +43,7 @@ HAS_NESTED_VIRT
 HAS_NO_HW_PREFETCH
 HAS_PAN
 HAS_S1PIE
+HAS_S1POE
 HAS_RAS_EXTN
 HAS_RNG
 HAS_SB
On Fri, Nov 24, 2023 at 04:34:48PM +0000, Joey Gouly wrote:
This indicates if the system supports POE. This is a CPUCAP_BOOT_CPU_FEATURE as the boot CPU will enable POE if it has it, so secondary CPUs must also have this feature.
Add a new config option: ARM64_POE
For bisection purposes if nothing else, I'd expect to see the Kconfig option added after everything else is in place, so that we don't build a kernel image which has partial POE support available and confuse things. With no way to enable the config option, the earlier patches would just build the !POE case until everything is ready.
On Fri, Nov 24, 2023 at 04:34:48PM +0000, Joey Gouly wrote:
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2078,6 +2078,22 @@ config ARM64_EPAN
	  if the cpu does not implement the feature.
 endmenu # "ARMv8.7 architectural features"

+menu "ARMv8.9 architectural features"
+config ARM64_POE
+	prompt "Permission Overlay Extension"
+	def_bool y
+	help
+	  The Permission Overlay Extension is used to implement Memory
+	  Protection Keys. Memory Protection Keys provides a mechanism for
+	  enforcing page-based protections, but without requiring modification
+	  of the page tables when an application changes protection domains.
+
+	  For details, see Documentation/core-api/protection-keys.rst
+
+	  If unsure, say y.
+
+endmenu # "ARMv8.9 architectural features"
I agree with Mark, we typically leave the Kconfig option towards the end of the series.
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 646591c67e7a..00b6d516ed3f 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2731,6 +2731,15 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
		.matches = has_cpuid_feature,
		ARM64_CPUID_FIELDS(ID_AA64MMFR2_EL1, EVT, IMP)
	},
+#ifdef CONFIG_ARM64_POE
+	{
+		.desc = "Stage-1 Permission Overlay Extension (S1POE)",
+		.capability = ARM64_HAS_S1POE,
+		.type = ARM64_CPUCAP_BOOT_CPU_FEATURE,
+		.matches = has_cpuid_feature,
+		ARM64_CPUID_FIELDS(ID_AA64MMFR3_EL1, S1POE, IMP)
+	},
+#endif
Keeping the #ifdef here is ok, it won't be defined at this point.
Allow EL0 or EL1 to access POR_EL0 without being trapped to EL2.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/el2_setup.h | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h
index b7afaa026842..df5614be4b70 100644
--- a/arch/arm64/include/asm/el2_setup.h
+++ b/arch/arm64/include/asm/el2_setup.h
@@ -184,12 +184,20 @@
 .Lset_pie_fgt_\@:
	mrs_s	x1, SYS_ID_AA64MMFR3_EL1
	ubfx	x1, x1, #ID_AA64MMFR3_EL1_S1PIE_SHIFT, #4
-	cbz	x1, .Lset_fgt_\@
+	cbz	x1, .Lset_poe_fgt_\@

	/* Disable trapping of PIR_EL1 / PIRE0_EL1 */
	orr	x0, x0, #HFGxTR_EL2_nPIR_EL1
	orr	x0, x0, #HFGxTR_EL2_nPIRE0_EL1

+.Lset_poe_fgt_\@:
+	mrs_s	x1, SYS_ID_AA64MMFR3_EL1
+	ubfx	x1, x1, #ID_AA64MMFR3_EL1_S1POE_SHIFT, #4
+	cbz	x1, .Lset_fgt_\@
+
+	/* Disable trapping of POR_EL0 */
+	orr	x0, x0, #HFGxTR_EL2_nPOR_EL0
+
 .Lset_fgt_\@:
	msr_s	SYS_HFGRTR_EL2, x0
	msr_s	SYS_HFGWTR_EL2, x0
On Fri, Nov 24, 2023 at 04:34:49PM +0000, Joey Gouly wrote:
Allow EL0 or EL1 to access POR_EL0 without being trapped to EL2.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
POR_EL0 is a register that can be modified by userspace directly, so it must be context switched.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/cpufeature.h |  6 ++++++
 arch/arm64/include/asm/processor.h  |  1 +
 arch/arm64/include/asm/sysreg.h     |  3 +++
 arch/arm64/kernel/process.c         | 22 ++++++++++++++++++++++
 4 files changed, 32 insertions(+)
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index f6d416fe49b0..6870e4d46334 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -819,6 +819,12 @@ static inline bool system_supports_tlb_range(void)
	return alternative_has_cap_unlikely(ARM64_HAS_TLB_RANGE);
 }

+static inline bool system_supports_poe(void)
+{
+	return IS_ENABLED(CONFIG_ARM64_POE) &&
+		alternative_has_cap_unlikely(ARM64_HAS_S1POE);
+}
+
 int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
 bool try_emulate_mrs(struct pt_regs *regs, u32 isn);
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index e5bc54522e71..b3ad719c2d0c 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -179,6 +179,7 @@ struct thread_struct {
	u64			sctlr_user;
	u64			svcr;
	u64			tpidr2_el0;
+	u64			por_el0;
 };

 static inline unsigned int thread_get_vl(struct thread_struct *thread,

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 9c2caf0efdc7..77a4797d0d54 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -1052,6 +1052,9 @@
 #define POE_RXW		UL(0x7)
 #define POE_MASK	UL(0xf)

+/* Initial value for Permission Overlay Extension for EL0 */
+#define POR_EL0_INIT	POE_RXW
+
 #define ARM64_FEATURE_FIELD_BITS	4
 /* Defined for compatibility only, do not add new users. */

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 7387b68c745b..fc899c12d759 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -271,12 +271,19 @@ static void flush_tagged_addr_state(void)
	clear_thread_flag(TIF_TAGGED_ADDR);
 }

+static void flush_poe(void)
+{
+	if (system_supports_poe())
+		write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0);
+}
+
 void flush_thread(void)
 {
	fpsimd_flush_thread();
	tls_thread_flush();
	flush_ptrace_hw_breakpoint(current);
	flush_tagged_addr_state();
+	flush_poe();
 }

 void arch_release_task_struct(struct task_struct *tsk)
@@ -374,6 +381,9 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
		if (system_supports_tpidr2())
			p->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0);

+		if (system_supports_poe())
+			p->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
+
		if (stack_start) {
			if (is_compat_thread(task_thread_info(p)))
				childregs->compat_sp = stack_start;
@@ -498,6 +508,17 @@ static void erratum_1418040_new_exec(void)
	preempt_enable();
 }

+static void permission_overlay_switch(struct task_struct *next)
+{
+	if (system_supports_poe()) {
+		current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
+		if (current->thread.por_el0 != next->thread.por_el0) {
+			write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
+			isb();
+		}
+	}
+}
+
 /*
  * __switch_to() checks current->thread.sctlr_user as an optimisation. Therefore
  * this function must be called with preemption disabled and the update to
@@ -533,6 +554,7 @@ struct task_struct *__switch_to(struct task_struct *prev,
	ssbs_thread_switch(next);
	erratum_1418040_thread_switch(next);
	ptrauth_thread_switch_user(next);
+	permission_overlay_switch(next);

	/*
	 * Complete any pending TLB or cache maintenance on this CPU in case
On Fri, Nov 24, 2023 at 04:34:50PM +0000, Joey Gouly wrote:
+static void flush_poe(void)
+{
+	if (system_supports_poe())
+		write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0);
+}
Here we have no isb()...
+static void permission_overlay_switch(struct task_struct *next)
+{
+	if (system_supports_poe()) {
+		current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
+		if (current->thread.por_el0 != next->thread.por_el0) {
+			write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
+			isb();
+		}
+	}
+}
...but here we do, I'd expect them to be consistent.
If the barrier is needed it'd probably be helpful to have a comment explaining why we need it before the ERET.
On Sat, Nov 25, 2023 at 12:02:49PM +0000, Mark Brown wrote:
On Fri, Nov 24, 2023 at 04:34:50PM +0000, Joey Gouly wrote:
+static void flush_poe(void)
+{
+	if (system_supports_poe())
+		write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0);
+}
Here we have no isb()...
My immediate thought was that we'd not care about the ISB here since we'll have an ERET before getting to EL0. However, we may have some LDTR/STTR populating the new process args page on exec which may, in theory, pick up a stale POR_EL0.
On Thu, Dec 07, 2023 at 01:55:31PM +0000, Catalin Marinas wrote:
On Sat, Nov 25, 2023 at 12:02:49PM +0000, Mark Brown wrote:
On Fri, Nov 24, 2023 at 04:34:50PM +0000, Joey Gouly wrote:
+static void flush_poe(void)
+{
+	if (system_supports_poe())
+		write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0);
+}
Here we have no isb()...
My immediate thought was that we'd not care about the ISB here since we'll have an ERET before getting to EL0. However, we may have some LDTR/STTR populating the new process args page on exec which may, in theory, pick up a stale POR_EL0.
Yeah, it was a combination of the inconsistency and the lack of clarity over there being a path which could potentially use POR_EL0 before ERET. We at least probably need some comments with regard to the requirements here.
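[Editorial note: for illustration, the shape being converged on here might look like the following sketch; this is not part of the series as posted.]

static void flush_poe(void)
{
	if (!system_supports_poe())
		return;

	write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0);
	/*
	 * No ERET to EL0 has happened yet, but exec may still write to the
	 * new mm with LDTR/STTR uaccess (e.g. populating the args page),
	 * so make the reset value visible before any such access.
	 */
	isb();
}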
On Fri, Nov 24, 2023 at 04:34:50PM +0000, Joey Gouly wrote:
@@ -498,6 +508,17 @@ static void erratum_1418040_new_exec(void)
	preempt_enable();
 }

+static void permission_overlay_switch(struct task_struct *next)
+{
+	if (system_supports_poe()) {
+		current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
+		if (current->thread.por_el0 != next->thread.por_el0) {
+			write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
+			isb();
+		}
+	}
Nitpick: use "if (!system_supports_poe()) return;" to avoid too much indentation.
W.r.t. the isb(), I think we accumulated quite a lot on this path. It might be worth going through them and having one at the end, where possible.
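[Editorial note: applying the early-return nitpick above, the function would read as in this sketch.]

static void permission_overlay_switch(struct task_struct *next)
{
	if (!system_supports_poe())
		return;

	current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
	if (current->thread.por_el0 != next->thread.por_el0) {
		write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
		isb();
	}
}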
Define the new system registers that POE introduces and context switch them.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_arm.h           |  4 ++--
 arch/arm64/include/asm/kvm_host.h          |  4 ++++
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 10 ++++++++++
 arch/arm64/kvm/sys_regs.c                  |  2 ++
 4 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index b85f46a73e21..597470e0b87b 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -346,14 +346,14 @@
  */
 #define __HFGRTR_EL2_RES0	(GENMASK(63, 56) | GENMASK(53, 51))
 #define __HFGRTR_EL2_MASK	GENMASK(49, 0)
-#define __HFGRTR_EL2_nMASK	(GENMASK(58, 57) | GENMASK(55, 54) | BIT(50))
+#define __HFGRTR_EL2_nMASK	(GENMASK(60, 57) | GENMASK(55, 54) | BIT(50))

 #define __HFGWTR_EL2_RES0	(GENMASK(63, 56) | GENMASK(53, 51) |	\
				 BIT(46) | BIT(42) | BIT(40) | BIT(28) |	\
				 GENMASK(26, 25) | BIT(21) | BIT(18) |	\
				 GENMASK(15, 14) | GENMASK(10, 9) | BIT(2))
 #define __HFGWTR_EL2_MASK	GENMASK(49, 0)
-#define __HFGWTR_EL2_nMASK	(GENMASK(58, 57) | GENMASK(55, 54) | BIT(50))
+#define __HFGWTR_EL2_nMASK	(GENMASK(60, 57) | GENMASK(55, 54) | BIT(50))

 #define __HFGITR_EL2_RES0	GENMASK(63, 57)
 #define __HFGITR_EL2_MASK	GENMASK(54, 0)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 824f29f04916..fa9ebd8fce40 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -401,6 +401,10 @@ enum vcpu_sysreg {
	PIR_EL1,	/* Permission Indirection Register 1 (EL1) */
	PIRE0_EL1,	/* Permission Indirection Register 0 (EL1) */

+	/* Permission Overlay Extension registers */
+	POR_EL1,	/* Permission Overlay Register 1 (EL1) */
+	POR_EL0,	/* Permission Overlay Register 0 (EL0) */
+
	/* 32bit specific registers. */
	DACR32_EL2,	/* Domain Access Control Register */
	IFSR32_EL2,	/* Instruction Fault Status Register */

diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
index bb6b571ec627..22f07ee43e7e 100644
--- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
+++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
@@ -19,6 +19,9 @@ static inline void __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
 {
	ctxt_sys_reg(ctxt, MDSCR_EL1) = read_sysreg(mdscr_el1);
+
+	if (system_supports_poe())
+		ctxt_sys_reg(ctxt, POR_EL0) = read_sysreg_s(SYS_POR_EL0);
 }
 static inline void __sysreg_save_user_state(struct kvm_cpu_context *ctxt)
@@ -59,6 +62,8 @@ static inline void __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
		ctxt_sys_reg(ctxt, PIR_EL1)	= read_sysreg_el1(SYS_PIR);
		ctxt_sys_reg(ctxt, PIRE0_EL1)	= read_sysreg_el1(SYS_PIRE0);
	}
+	if (system_supports_poe())
+		ctxt_sys_reg(ctxt, POR_EL1)	= read_sysreg_el1(SYS_POR);

	ctxt_sys_reg(ctxt, PAR_EL1)	= read_sysreg_par();
	ctxt_sys_reg(ctxt, TPIDR_EL1)	= read_sysreg(tpidr_el1);

@@ -89,6 +94,9 @@ static inline void __sysreg_save_el2_return_state(struct kvm_cpu_context *ctxt)
 static inline void __sysreg_restore_common_state(struct kvm_cpu_context *ctxt)
 {
	write_sysreg(ctxt_sys_reg(ctxt, MDSCR_EL1), mdscr_el1);
+
+	if (system_supports_poe())
+		write_sysreg_s(ctxt_sys_reg(ctxt, POR_EL0), SYS_POR_EL0);
 }

 static inline void __sysreg_restore_user_state(struct kvm_cpu_context *ctxt)
@@ -135,6 +143,8 @@ static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
		write_sysreg_el1(ctxt_sys_reg(ctxt, PIR_EL1),	SYS_PIR);
		write_sysreg_el1(ctxt_sys_reg(ctxt, PIRE0_EL1),	SYS_PIRE0);
	}
+	if (system_supports_poe())
+		write_sysreg_el1(ctxt_sys_reg(ctxt, POR_EL1),	SYS_POR);

	write_sysreg(ctxt_sys_reg(ctxt, PAR_EL1),	par_el1);
	write_sysreg(ctxt_sys_reg(ctxt, TPIDR_EL1),	tpidr_el1);
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 4735e1b37fb3..a54e5eadbf29 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2269,6 +2269,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
	{ SYS_DESC(SYS_MAIR_EL1), access_vm_reg, reset_unknown, MAIR_EL1 },
	{ SYS_DESC(SYS_PIRE0_EL1), NULL, reset_unknown, PIRE0_EL1 },
	{ SYS_DESC(SYS_PIR_EL1), NULL, reset_unknown, PIR_EL1 },
+	{ SYS_DESC(SYS_POR_EL1), NULL, reset_unknown, POR_EL1 },
	{ SYS_DESC(SYS_AMAIR_EL1), access_vm_reg, reset_amair_el1, AMAIR_EL1 },

	{ SYS_DESC(SYS_LORSA_EL1), trap_loregion },
@@ -2352,6 +2353,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
	  .access = access_pmovs, .reg = PMOVSSET_EL0,
	  .get_user = get_pmreg, .set_user = set_pmreg },

+	{ SYS_DESC(SYS_POR_EL0), NULL, reset_unknown, POR_EL0 },
	{ SYS_DESC(SYS_TPIDR_EL0), NULL, reset_unknown, TPIDR_EL0 },
	{ SYS_DESC(SYS_TPIDRRO_EL0), NULL, reset_unknown, TPIDRRO_EL0 },
	{ SYS_DESC(SYS_TPIDR2_EL0), undef_access },
On Fri, 24 Nov 2023 16:34:51 +0000, Joey Gouly <joey.gouly@arm.com> wrote:
Define the new system registers that POE introduces and context switch them.
I would really like to see a discussion on the respective lifetimes of these two registers (see below).
[...]
diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
[...]
 static inline void __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
 {
	ctxt_sys_reg(ctxt, MDSCR_EL1) = read_sysreg(mdscr_el1);
+
+	if (system_supports_poe())
+		ctxt_sys_reg(ctxt, POR_EL0) = read_sysreg_s(SYS_POR_EL0);
So this is saved as eagerly as it gets. Why? If it only affects EL0, it can be saved/restored in a much lazier way.
 }

[...]
@@ -59,6 +62,8 @@ static inline void __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
		ctxt_sys_reg(ctxt, PIR_EL1)	= read_sysreg_el1(SYS_PIR);
		ctxt_sys_reg(ctxt, PIRE0_EL1)	= read_sysreg_el1(SYS_PIRE0);
And the fact that you only touch PIRE0_EL1 here seems to be a good indication that the above can be relaxed.
	}
+	if (system_supports_poe())
+		ctxt_sys_reg(ctxt, POR_EL1)	= read_sysreg_el1(SYS_POR);
nit: missing new line before the if().
	ctxt_sys_reg(ctxt, PAR_EL1)	= read_sysreg_par();
	ctxt_sys_reg(ctxt, TPIDR_EL1)	= read_sysreg(tpidr_el1);
@@ -89,6 +94,9 @@ static inline void __sysreg_save_el2_return_state(struct kvm_cpu_context *ctxt)
 static inline void __sysreg_restore_common_state(struct kvm_cpu_context *ctxt)
 {
	write_sysreg(ctxt_sys_reg(ctxt, MDSCR_EL1), mdscr_el1);
+
+	if (system_supports_poe())
+		write_sysreg_s(ctxt_sys_reg(ctxt, POR_EL0), SYS_POR_EL0);
Same thing here about the eager restore.
 }

[...]
@@ -135,6 +143,8 @@ static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
		write_sysreg_el1(ctxt_sys_reg(ctxt, PIR_EL1),	SYS_PIR);
		write_sysreg_el1(ctxt_sys_reg(ctxt, PIRE0_EL1),	SYS_PIRE0);
	}
+	if (system_supports_poe())
+		write_sysreg_el1(ctxt_sys_reg(ctxt, POR_EL1),	SYS_POR);
new line.
	write_sysreg(ctxt_sys_reg(ctxt, PAR_EL1),	par_el1);
	write_sysreg(ctxt_sys_reg(ctxt, TPIDR_EL1),	tpidr_el1);
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
[...]
	{ SYS_DESC(SYS_PIRE0_EL1), NULL, reset_unknown, PIRE0_EL1 },
	{ SYS_DESC(SYS_PIR_EL1), NULL, reset_unknown, PIR_EL1 },
+	{ SYS_DESC(SYS_POR_EL1), NULL, reset_unknown, POR_EL1 },
	{ SYS_DESC(SYS_AMAIR_EL1), access_vm_reg, reset_amair_el1, AMAIR_EL1 },
[...]
+	{ SYS_DESC(SYS_POR_EL0), NULL, reset_unknown, POR_EL0 },
	{ SYS_DESC(SYS_TPIDR_EL0), NULL, reset_unknown, TPIDR_EL0 },
	{ SYS_DESC(SYS_TPIDRRO_EL0), NULL, reset_unknown, TPIDRRO_EL0 },
	{ SYS_DESC(SYS_TPIDR2_EL0), undef_access },
Another thing that is missing is the trap routing for NV in emulated-nested.c. Please fill in the various tables there.
Thanks,
M.
Hi Marc,
Thanks for taking a look.
On Mon, Nov 27, 2023 at 06:01:18PM +0000, Marc Zyngier wrote:
On Fri, 24 Nov 2023 16:34:51 +0000, Joey Gouly <joey.gouly@arm.com> wrote:
Define the new system registers that POE introduces and context switch them.
I would really like to see a discussion on the respective lifetimes of these two registers (see below).
[...]
+	if (system_supports_poe())
+		ctxt_sys_reg(ctxt, POR_EL0) = read_sysreg_s(SYS_POR_EL0);
So this is saved as eagerly as it gets. Why? If it only affects EL0, it can be saved/restored in a much lazier way.
Just to confirm I understand what you mean, the current code looks like:
vcpu_load() // Lazy save
while (ret > 0)
	check_vcpu_requests()
	kvm_arm_vcpu_enter_exit() // Eager save/restore
	ret = handle_exit()
vcpu_put() // Lazy restore
POR_EL0 does affect EL2, if EL2 does some form of {get,put}_user. This happens in vgic_its_process_commands() (as part of handle_exit()), also in the stolen time code (in check_vcpu_requests()), and could possibly happen if perf tries to walk the user stack.
So I think that it does need to happen eagerly, such that the host-userspace's POR_EL0 is used to access the VM's memory, not the guest-userspace's POR_EL0.
Does that make sense? It will need a comment, I agree.
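[Editorial note: a comment along the agreed lines might look like this sketch, inserted into the save path quoted above.]

static inline void __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
{
	ctxt_sys_reg(ctxt, MDSCR_EL1) = read_sysreg(mdscr_el1);

	/*
	 * POR_EL0 is saved/restored eagerly: between vcpu_load() and
	 * vcpu_put() the host may access VM memory via uaccess (vgic ITS
	 * command processing, stolen time updates, perf user stack
	 * walks), and those accesses must be checked against the host
	 * userspace's POR_EL0, not the guest's.
	 */
	if (system_supports_poe())
		ctxt_sys_reg(ctxt, POR_EL0) = read_sysreg_s(SYS_POR_EL0);
}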
[...]
		ctxt_sys_reg(ctxt, PIR_EL1)	= read_sysreg_el1(SYS_PIR);
		ctxt_sys_reg(ctxt, PIRE0_EL1)	= read_sysreg_el1(SYS_PIRE0);
And the fact that you only touch PIRE0_EL1 here seems to be a good indication that the above can be relaxed.
PIRE0_EL1 is not directly accessible from EL0. I'll have a think about this a bit more, and whether there is a potential similar issue here.
[...]
Thanks, Joey
On Wed, 29 Nov 2023 15:11:23 +0000, Joey Gouly <joey.gouly@arm.com> wrote:
Hi Marc,
Thanks for taking a look.
On Mon, Nov 27, 2023 at 06:01:18PM +0000, Marc Zyngier wrote:
On Fri, 24 Nov 2023 16:34:51 +0000, Joey Gouly <joey.gouly@arm.com> wrote:
Define the new system registers that POE introduces and context switch them.
I would really like to see a discussion on the respective lifetimes of these two registers (see below).
[...]
+	if (system_supports_poe())
+		ctxt_sys_reg(ctxt, POR_EL0) = read_sysreg_s(SYS_POR_EL0);
So this is saved as eagerly as it gets. Why? If it only affects EL0, it can be saved/restored in a much lazier way.
Just to confirm I understand what you mean, the current code looks like:
vcpu_load() // Lazy save
while (ret > 0)
	check_vcpu_requests()
	kvm_arm_vcpu_enter_exit() // Eager save/restore
	ret = handle_exit()
vcpu_put() // Lazy restore
Yes, with the additional caveat that VHE and nVHE/hVHE have different views of what can be lazy or not.
POR_EL0 does affect EL2, if EL2 does some form of {get,put}_user. This happens in vgic_its_process_commands() (as part of handle_exit()), also in the stolen time code (in check_vcpu_requests()), and could possibly happen if perf tries to walk the user stack.
So I think that it does need to happen eagerly, such that the host-userspace's POR_EL0 is used to access the VM's memory, not the guest-userspace's POR_EL0.
OK, I didn't quite see that initially, thanks for the explanation. I find it rather ugly that userspace can directly affect these functionalities, but hey, why not.
Does that make sense? It will need a comment, I agree.
Yes, that'd be good indeed.
[...]
		ctxt_sys_reg(ctxt, PIR_EL1)	= read_sysreg_el1(SYS_PIR);
		ctxt_sys_reg(ctxt, PIRE0_EL1)	= read_sysreg_el1(SYS_PIRE0);
And the fact that you only touch PIRE0_EL1 here seems to be a good indication that the above can be relaxed.
PIRE0_EL1 is not directly accessible from EL0. I'll have a think about this a bit more, and whether there is a potential similar issue here.
Ah, re-reading the spec, I see that there is a PIRE0_EL2 that is in use for VHE, meaning that in that case restoring POR_EL0 is enough to get the correct permissions. For nVHE/hVHE, this function is part of the 'eager' lot, so it doesn't matter.
So this code is correct in the end, and all that's missing is some comments and the NV stuff.
Thanks,
M.
On Fri, 24 Nov 2023 16:34:51 +0000, Joey Gouly <joey.gouly@arm.com> wrote:
Define the new system registers that POE introduces and context switch them.
Thinking about it some more, I don't think this is enough.
One fundamental thing that POE changes is that read permissions can now be removed from S1 by the guest. Which means that if we take (for example) a permission fault at S2 and perform (as we do today) an "AT S1E1R" to obtain the faulting IPA, we can end up with a failing translation because POE, under control of the guest, has removed the read permission.
Which is why FEAT_ATS1A exists, and ignores permission overlays so that we can get to the IPA.
I think this means we need to teach __translate_far_to_hpfar() about AT S1E1A.
Thanks,
M.
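[Editorial note: a rough sketch of what Marc suggests, assuming assembler support for the S1E1A operand (or an equivalent sys-instruction encoding); names, gating, and placement are illustrative only, and this is untested.]

/* In __translate_far_to_hpfar(), before reading PAR_EL1: */
if (system_supports_poe())	/* sketch: ideally gated on FEAT_ATS1A */
	asm volatile("at s1e1a, %0" : : "r" (far));
else
	asm volatile("at s1e1r, %0" : : "r" (far));
isb();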
Expose a HWCAP and ID_AA64MMFR3_EL1_S1POE to userspace, so they can be used to check if the CPU supports the feature.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 Documentation/arch/arm64/elf_hwcaps.rst |  3 +++
 arch/arm64/include/asm/hwcap.h          |  1 +
 arch/arm64/include/uapi/asm/hwcap.h     |  1 +
 arch/arm64/kernel/cpufeature.c          | 14 ++++++++++++++
 arch/arm64/kernel/cpuinfo.c             |  1 +
 5 files changed, 20 insertions(+)
diff --git a/Documentation/arch/arm64/elf_hwcaps.rst b/Documentation/arch/arm64/elf_hwcaps.rst
index ced7b335e2e0..fe7350a66cea 100644
--- a/Documentation/arch/arm64/elf_hwcaps.rst
+++ b/Documentation/arch/arm64/elf_hwcaps.rst
@@ -317,6 +317,9 @@ HWCAP2_LRCPC3
 HWCAP2_LSE128
     Functionality implied by ID_AA64ISAR0_EL1.Atomic == 0b0011.

+HWCAP2_POE
+    Functionality implied by ID_AA64MMFR3_EL1.S1POE == 0b0001.
+
 4. Unused AT_HWCAP bits
 -----------------------

diff --git a/arch/arm64/include/asm/hwcap.h b/arch/arm64/include/asm/hwcap.h
index cd71e09ea14d..9a1aa1e5e25c 100644
--- a/arch/arm64/include/asm/hwcap.h
+++ b/arch/arm64/include/asm/hwcap.h
@@ -142,6 +142,7 @@
 #define KERNEL_HWCAP_SVE_B16B16	__khwcap2_feature(SVE_B16B16)
 #define KERNEL_HWCAP_LRCPC3	__khwcap2_feature(LRCPC3)
 #define KERNEL_HWCAP_LSE128	__khwcap2_feature(LSE128)
+#define KERNEL_HWCAP_POE	__khwcap2_feature(POE)

 /*
  * This yields a mask that user programs can use to figure out what

diff --git a/arch/arm64/include/uapi/asm/hwcap.h b/arch/arm64/include/uapi/asm/hwcap.h
index 5023599fa278..69f09521b553 100644
--- a/arch/arm64/include/uapi/asm/hwcap.h
+++ b/arch/arm64/include/uapi/asm/hwcap.h
@@ -107,5 +107,6 @@
 #define HWCAP2_SVE_B16B16	(1UL << 45)
 #define HWCAP2_LRCPC3		(1UL << 46)
 #define HWCAP2_LSE128		(1UL << 47)
+#define HWCAP2_POE		(1UL << 48)

 #endif /* _UAPI__ASM_HWCAP_H */

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 00b6d516ed3f..02169cb3b84b 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -402,6 +402,8 @@ static const struct arm64_ftr_bits ftr_id_aa64mmfr2[] = {
 };

 static const struct arm64_ftr_bits ftr_id_aa64mmfr3[] = {
+	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_POE),
+		       FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64MMFR3_EL1_S1POE_SHIFT, 4, 0),
	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64MMFR3_EL1_S1PIE_SHIFT, 4, 0),
	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64MMFR3_EL1_TCRX_SHIFT, 4, 0),
	ARM64_FTR_END,
@@ -2242,6 +2244,14 @@ static void cpu_enable_mops(const struct arm64_cpu_capabilities *__unused)
	sysreg_clear_set(sctlr_el1, 0, SCTLR_EL1_MSCEn);
 }

+#ifdef CONFIG_ARM64_POE
+static void cpu_enable_poe(const struct arm64_cpu_capabilities *__unused)
+{
+	sysreg_clear_set(REG_TCR2_EL1, 0, TCR2_EL1x_E0POE);
+	sysreg_clear_set(CPACR_EL1, 0, CPACR_ELx_E0POE);
+}
+#endif
+
 /* Internal helper functions to match cpu capability type */
 static bool cpucap_late_cpu_optional(const struct arm64_cpu_capabilities *cap)
@@ -2737,6 +2747,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
		.capability = ARM64_HAS_S1POE,
		.type = ARM64_CPUCAP_BOOT_CPU_FEATURE,
		.matches = has_cpuid_feature,
+		.cpu_enable = cpu_enable_poe,
		ARM64_CPUID_FIELDS(ID_AA64MMFR3_EL1, S1POE, IMP)
	},
 #endif
@@ -2889,6 +2900,9 @@ static const struct arm64_cpu_capabilities arm64_elf_hwcaps[] = {
	HWCAP_CAP(ID_AA64SMFR0_EL1, BI32I32, IMP, CAP_HWCAP, KERNEL_HWCAP_SME_BI32I32),
	HWCAP_CAP(ID_AA64SMFR0_EL1, F32F32, IMP, CAP_HWCAP, KERNEL_HWCAP_SME_F32F32),
 #endif /* CONFIG_ARM64_SME */
+#ifdef CONFIG_ARM64_POE
+	HWCAP_CAP(ID_AA64MMFR3_EL1, S1POE, IMP, CAP_HWCAP, KERNEL_HWCAP_POE),
+#endif
	{},
 };
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index a257da7b56fe..5515c50f5219 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -130,6 +130,7 @@ static const char *const hwcap_str[] = {
	[KERNEL_HWCAP_SVE_B16B16]	= "sveb16b16",
	[KERNEL_HWCAP_LRCPC3]		= "lrcpc3",
	[KERNEL_HWCAP_LSE128]		= "lse128",
+	[KERNEL_HWCAP_POE]		= "poe",
 };
#ifdef CONFIG_COMPAT
On Fri, Nov 24, 2023 at 04:34:52PM +0000, Joey Gouly wrote:
+#ifdef CONFIG_ARM64_POE
+static void cpu_enable_poe(const struct arm64_cpu_capabilities *__unused)
+{
+	sysreg_clear_set(REG_TCR2_EL1, 0, TCR2_EL1x_E0POE);
+	sysreg_clear_set(CPACR_EL1, 0, CPACR_ELx_E0POE);
+}
+#endif
Don't we need the TCR2_EL1x.POE bit (for EL1) enabled as well? I'm thinking of the LDXR/STXR instructions accessing user memory (the futex code).
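[Editorial note: if the EL1 overlay does need enabling, the fix could be as small as the following sketch, assuming the sysreg tooling generates a TCR2_EL1x_POE field define.]

static void cpu_enable_poe(const struct arm64_cpu_capabilities *__unused)
{
	/* Enable the overlay for EL0 and, per the comment above, EL1 */
	sysreg_clear_set(REG_TCR2_EL1, 0, TCR2_EL1x_E0POE | TCR2_EL1x_POE);
	sysreg_clear_set(CPACR_EL1, 0, CPACR_ELx_E0POE);
}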
The 3-bit POIndex is stored in the PTE at bits 60..62.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/pgtable-hwdef.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index e4944d517c99..fe270fa39110 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -178,6 +178,16 @@
 #define PTE_PI_IDX_2	53	/* PXN */
 #define PTE_PI_IDX_3	54	/* UXN */

+/*
+ * POIndex[2:0] encoding (Permission Overlay Extension)
+ */
+#define PTE_PO_IDX_0	(_AT(pteval_t, 1) << 60)
+#define PTE_PO_IDX_1	(_AT(pteval_t, 1) << 61)
+#define PTE_PO_IDX_2	(_AT(pteval_t, 1) << 62)
+
+#define PTE_PO_IDX_MASK	GENMASK_ULL(62, 60)
+
 /*
  * Memory Attribute override for Stage-2 (MemAttr[3:0])
  */
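[Editorial note: to make the encoding concrete, a pkey round-trips through a pte as in this sketch; pte_to_pkey() is an illustrative helper, not part of the patch.]

#include <linux/bitfield.h>

/* Recover the 3-bit POIndex (the pkey) from a pte. */
static inline int pte_to_pkey(pte_t pte)
{
	return FIELD_GET(PTE_PO_IDX_MASK, pte_val(pte));
}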
Define the VM_PKEY_BIT* values for arm64, and convert them into the arm64 specific pgprot values.
Move the current values for x86 and PPC into arch/*.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/mman.h   |  8 +++++++-
 arch/arm64/include/asm/page.h   | 10 ++++++++++
 arch/arm64/mm/mmap.c            |  9 +++++++++
 arch/powerpc/include/asm/page.h | 11 +++++++++++
 arch/x86/include/asm/page.h     | 10 ++++++++++
 fs/proc/task_mmu.c              |  2 ++
 include/linux/mm.h              | 13 -------------
 7 files changed, 49 insertions(+), 14 deletions(-)
diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h
index 5966ee4a6154..ecb2d18dc4d7 100644
--- a/arch/arm64/include/asm/mman.h
+++ b/arch/arm64/include/asm/mman.h
@@ -7,7 +7,7 @@
 #include <uapi/asm/mman.h>

 static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
-	unsigned long pkey __always_unused)
+	unsigned long pkey)
 {
	unsigned long ret = 0;

@@ -17,6 +17,12 @@ static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
	if (system_supports_mte() && (prot & PROT_MTE))
		ret |= VM_MTE;

+#if defined(CONFIG_ARCH_HAS_PKEYS)
+	ret |= pkey & 0x1 ? VM_PKEY_BIT0 : 0;
+	ret |= pkey & 0x2 ? VM_PKEY_BIT1 : 0;
+	ret |= pkey & 0x4 ? VM_PKEY_BIT2 : 0;
+#endif
+
	return ret;
 }
 #define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey)

diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 2312e6ee595f..aabfda2516d2 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -49,6 +49,16 @@ int pfn_is_map_memory(unsigned long pfn);

 #define VM_DATA_DEFAULT_FLAGS	(VM_DATA_FLAGS_TSK_EXEC | VM_MTE_ALLOWED)

+#if defined(CONFIG_ARCH_HAS_PKEYS)
+/* A protection key is a 3-bit value */
+# define VM_PKEY_SHIFT	VM_HIGH_ARCH_BIT_2
+# define VM_PKEY_BIT0	VM_HIGH_ARCH_2
+# define VM_PKEY_BIT1	VM_HIGH_ARCH_3
+# define VM_PKEY_BIT2	VM_HIGH_ARCH_4
+# define VM_PKEY_BIT3	0
+# define VM_PKEY_BIT4	0
+#endif
+
 #include <asm-generic/getorder.h>
 #endif

diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
index 645fe60d000f..2e2a5a9bcfa1 100644
--- a/arch/arm64/mm/mmap.c
+++ b/arch/arm64/mm/mmap.c
@@ -98,6 +98,15 @@ pgprot_t vm_get_page_prot(unsigned long vm_flags)
	if (vm_flags & VM_MTE)
		prot |= PTE_ATTRINDX(MT_NORMAL_TAGGED);

+#ifdef CONFIG_ARCH_HAS_PKEYS
+	if (vm_flags & VM_PKEY_BIT0)
+		prot |= PTE_PO_IDX_0;
+	if (vm_flags & VM_PKEY_BIT1)
+		prot |= PTE_PO_IDX_1;
+	if (vm_flags & VM_PKEY_BIT2)
+		prot |= PTE_PO_IDX_2;
+#endif
+
	return __pgprot(prot);
 }
 EXPORT_SYMBOL(vm_get_page_prot);

diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index e5fcc79b5bfb..a5e75ec333ad 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -330,6 +330,17 @@ static inline unsigned long kaslr_offset(void)
 }

 #include <asm-generic/memory_model.h>
+
+#if defined(CONFIG_ARCH_HAS_PKEYS)
+# define VM_PKEY_SHIFT	VM_HIGH_ARCH_BIT_0
+/* A protection key is a 5-bit value */
+# define VM_PKEY_BIT0	VM_HIGH_ARCH_0
+# define VM_PKEY_BIT1	VM_HIGH_ARCH_1
+# define VM_PKEY_BIT2	VM_HIGH_ARCH_2
+# define VM_PKEY_BIT3	VM_HIGH_ARCH_3
+# define VM_PKEY_BIT4	VM_HIGH_ARCH_4
+#endif /* CONFIG_ARCH_HAS_PKEYS */
+
 #endif /* __ASSEMBLY__ */

 #endif /* _ASM_POWERPC_PAGE_H */

diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index d18e5c332cb9..b770db1a21e7 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -87,5 +87,15 @@ static __always_inline u64 __is_canonical_address(u64 vaddr, u8 vaddr_bits)

 #define HAVE_ARCH_HUGETLB_UNMAPPED_AREA

+#if defined(CONFIG_ARCH_HAS_PKEYS)
+# define VM_PKEY_SHIFT	VM_HIGH_ARCH_BIT_0
+/* A protection key is a 4-bit value */
+# define VM_PKEY_BIT0	VM_HIGH_ARCH_0
+# define VM_PKEY_BIT1	VM_HIGH_ARCH_1
+# define VM_PKEY_BIT2	VM_HIGH_ARCH_2
+# define VM_PKEY_BIT3	VM_HIGH_ARCH_3
+# define VM_PKEY_BIT4	0
+#endif /* CONFIG_ARCH_HAS_PKEYS */
+
 #endif /* __KERNEL__ */
 #endif /* _ASM_X86_PAGE_H */

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index ef2eb12906da..8c2790abeffb 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -691,7 +691,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
		[ilog2(VM_PKEY_BIT0)]	= "",
		[ilog2(VM_PKEY_BIT1)]	= "",
		[ilog2(VM_PKEY_BIT2)]	= "",
+#if VM_PKEY_BIT3
		[ilog2(VM_PKEY_BIT3)]	= "",
+#endif
 #if VM_PKEY_BIT4
		[ilog2(VM_PKEY_BIT4)]	= "",
 #endif

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 418d26608ece..47f42d9893fe 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -328,19 +328,6 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_HIGH_ARCH_5	BIT(VM_HIGH_ARCH_BIT_5)
 #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */

-#ifdef CONFIG_ARCH_HAS_PKEYS
-# define VM_PKEY_SHIFT	VM_HIGH_ARCH_BIT_0
-# define VM_PKEY_BIT0	VM_HIGH_ARCH_0	/* A protection key is a 4-bit value */
-# define VM_PKEY_BIT1	VM_HIGH_ARCH_1	/* on x86 and 5-bit value on ppc64 */
-# define VM_PKEY_BIT2	VM_HIGH_ARCH_2
-# define VM_PKEY_BIT3	VM_HIGH_ARCH_3
-#ifdef CONFIG_PPC
-# define VM_PKEY_BIT4	VM_HIGH_ARCH_4
-#else
-# define VM_PKEY_BIT4	0
-#endif
-#endif /* CONFIG_ARCH_HAS_PKEYS */
-
 #ifdef CONFIG_X86_USER_SHADOW_STACK
 /*
  * VM_SHADOW_STACK should not be set with VM_SHARED because of lack of
On Fri, Nov 24, 2023 at 04:34:54PM +0000, Joey Gouly wrote:
 arch/arm64/include/asm/mman.h   |  8 +++++++-
 arch/arm64/include/asm/page.h   | 10 ++++++++++
 arch/arm64/mm/mmap.c            |  9 +++++++++
 arch/powerpc/include/asm/page.h | 11 +++++++++++
 arch/x86/include/asm/page.h     | 10 ++++++++++
 fs/proc/task_mmu.c              |  2 ++
 include/linux/mm.h              | 13 -------------
 7 files changed, 49 insertions(+), 14 deletions(-)
It might be worth splitting out the powerpc/x86/generic parts into a separate patch.
diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h
index 5966ee4a6154..ecb2d18dc4d7 100644
--- a/arch/arm64/include/asm/mman.h
+++ b/arch/arm64/include/asm/mman.h
@@ -7,7 +7,7 @@
 #include <uapi/asm/mman.h>

 static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
-	unsigned long pkey __always_unused)
+	unsigned long pkey)
 {
	unsigned long ret = 0;

@@ -17,6 +17,12 @@ static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
	if (system_supports_mte() && (prot & PROT_MTE))
		ret |= VM_MTE;

+#if defined(CONFIG_ARCH_HAS_PKEYS)
+	ret |= pkey & 0x1 ? VM_PKEY_BIT0 : 0;
+	ret |= pkey & 0x2 ? VM_PKEY_BIT1 : 0;
+	ret |= pkey & 0x4 ? VM_PKEY_BIT2 : 0;
+#endif
Is there anywhere that rejects pkey & 8 on arm64? Because with 128-bit PTEs, if we ever support them, we can have 4-bit pkeys.
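[Editorial note: one explicit way to reject out-of-range keys would be a 3-bit validity check in the arch code; arch_validate_pkey() and ARCH_NR_PKEYS are hypothetical names here, and the usual route is for mm_pkey_alloc() to bound keys via arch_max_pkey().]

#define ARCH_NR_PKEYS	8	/* 3-bit POIndex today */

static inline bool arch_validate_pkey(int pkey)
{
	return pkey >= 0 && pkey < ARCH_NR_PKEYS;
}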
+
	return ret;
 }
 #define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey)

diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 2312e6ee595f..aabfda2516d2 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -49,6 +49,16 @@ int pfn_is_map_memory(unsigned long pfn);

 #define VM_DATA_DEFAULT_FLAGS	(VM_DATA_FLAGS_TSK_EXEC | VM_MTE_ALLOWED)

+#if defined(CONFIG_ARCH_HAS_PKEYS)
+/* A protection key is a 3-bit value */
+# define VM_PKEY_SHIFT	VM_HIGH_ARCH_BIT_2
+# define VM_PKEY_BIT0	VM_HIGH_ARCH_2
+# define VM_PKEY_BIT1	VM_HIGH_ARCH_3
+# define VM_PKEY_BIT2	VM_HIGH_ARCH_4
+# define VM_PKEY_BIT3	0
+# define VM_PKEY_BIT4	0
+#endif
I think we should start from VM_HIGH_ARCH_BIT_0 and just move the VM_MTE, VM_MTE_ALLOWED to VM_HIGH_ARCH_BIT_{4,5}.
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index e5fcc79b5bfb..a5e75ec333ad 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -330,6 +330,17 @@ static inline unsigned long kaslr_offset(void)
 }

 #include <asm-generic/memory_model.h>
+
+#if defined(CONFIG_ARCH_HAS_PKEYS)
+# define VM_PKEY_SHIFT	VM_HIGH_ARCH_BIT_0
+/* A protection key is a 5-bit value */
+# define VM_PKEY_BIT0	VM_HIGH_ARCH_0
+# define VM_PKEY_BIT1	VM_HIGH_ARCH_1
+# define VM_PKEY_BIT2	VM_HIGH_ARCH_2
+# define VM_PKEY_BIT3	VM_HIGH_ARCH_3
+# define VM_PKEY_BIT4	VM_HIGH_ARCH_4
+#endif /* CONFIG_ARCH_HAS_PKEYS */
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_PAGE_H */

diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index d18e5c332cb9..b770db1a21e7 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -87,5 +87,15 @@ static __always_inline u64 __is_canonical_address(u64 vaddr, u8 vaddr_bits)

 #define HAVE_ARCH_HUGETLB_UNMAPPED_AREA

+#if defined(CONFIG_ARCH_HAS_PKEYS)
+# define VM_PKEY_SHIFT	VM_HIGH_ARCH_BIT_0
+/* A protection key is a 4-bit value */
+# define VM_PKEY_BIT0	VM_HIGH_ARCH_0
+# define VM_PKEY_BIT1	VM_HIGH_ARCH_1
+# define VM_PKEY_BIT2	VM_HIGH_ARCH_2
+# define VM_PKEY_BIT3	VM_HIGH_ARCH_3
+# define VM_PKEY_BIT4	0
+#endif /* CONFIG_ARCH_HAS_PKEYS */
Rather than moving these to arch code, we could instead keep the generic definitions with some additional CONFIG_ARCH_HAS_PKEYS_{3,4,5}BIT selected from the arch code.
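[Editorial note: Catalin's alternative could keep the definitions in include/linux/mm.h, keyed off a per-arch key width; a sketch with a hypothetical CONFIG_ARCH_PKEY_BITS symbol set by each architecture.]

#ifdef CONFIG_ARCH_HAS_PKEYS
# define VM_PKEY_SHIFT	VM_HIGH_ARCH_BIT_0
# define VM_PKEY_BIT0	VM_HIGH_ARCH_0
# define VM_PKEY_BIT1	VM_HIGH_ARCH_1
# define VM_PKEY_BIT2	VM_HIGH_ARCH_2
# if CONFIG_ARCH_PKEY_BITS > 3
#  define VM_PKEY_BIT3	VM_HIGH_ARCH_3
# else
#  define VM_PKEY_BIT3	0
# endif
# if CONFIG_ARCH_PKEY_BITS > 4
#  define VM_PKEY_BIT4	VM_HIGH_ARCH_4
# else
#  define VM_PKEY_BIT4	0
# endif
#endif /* CONFIG_ARCH_HAS_PKEYS */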
When a PTE is modified, the POIndex bits must be masked off so that the new protection value can update them.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/pgtable.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index b19a8aee684c..e45105336ca0 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -828,7 +828,8 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
	 */
	const pteval_t mask = PTE_USER | PTE_PXN | PTE_UXN | PTE_RDONLY |
			      PTE_PROT_NONE | PTE_VALID | PTE_WRITE | PTE_GP |
-			      PTE_ATTRINDX_MASK;
+			      PTE_ATTRINDX_MASK | PTE_PO_IDX_MASK;
+
	/* preserve the hardware dirty information */
	if (pte_hw_dirty(pte))
		pte = set_pte_bit(pte, __pgprot(PTE_DIRTY));
On Fri, Nov 24, 2023 at 04:34:55PM +0000, Joey Gouly wrote:
When a PTE is modified, the POIndex must be masked off so that it can be modified.
Signed-off-by: Joey Gouly joey.gouly@arm.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will@kernel.org
Reviewed-by: Catalin Marinas catalin.marinas@arm.com
Enable the ARCH_HAS_PKEYS config, but provide dummy functions for the entire interface.
Signed-off-by: Joey Gouly joey.gouly@arm.com
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Will Deacon will@kernel.org
---
 arch/arm64/Kconfig             |  2 ++
 arch/arm64/include/asm/pkeys.h | 54 ++++++++++++++++++++++++++++++++++
 arch/arm64/mm/mmu.c            |  7 +++++
 3 files changed, 63 insertions(+)
 create mode 100644 arch/arm64/include/asm/pkeys.h
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index d7df6c603190..72a71a9834dd 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2082,6 +2082,8 @@ menu "ARMv8.9 architectural features"
 config ARM64_POE
	prompt "Permission Overlay Extension"
	def_bool y
+	select ARCH_USES_HIGH_VMA_FLAGS
+	select ARCH_HAS_PKEYS
	help
	  The Permission Overlay Extension is used to implement Memory
	  Protection Keys. Memory Protection Keys provides a mechanism for

diff --git a/arch/arm64/include/asm/pkeys.h b/arch/arm64/include/asm/pkeys.h
new file mode 100644
index 000000000000..5761fb48fd53
--- /dev/null
+++ b/arch/arm64/include/asm/pkeys.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2023 Arm Ltd.
+ *
+ * Based on arch/x86/include/asm/pkeys.h
+ */
+
+#ifndef _ASM_ARM64_PKEYS_H
+#define _ASM_ARM64_PKEYS_H
+
+#define ARCH_VM_PKEY_FLAGS (VM_PKEY_BIT0 | VM_PKEY_BIT1 | VM_PKEY_BIT2)
+
+#define arch_max_pkey() 0
+
+int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
+		unsigned long init_val);
+
+static inline bool arch_pkeys_enabled(void)
+{
+	return false;
+}
+
+static inline int vma_pkey(struct vm_area_struct *vma)
+{
+	return -1;
+}
+
+static inline int arch_override_mprotect_pkey(struct vm_area_struct *vma,
+		int prot, int pkey)
+{
+	return -1;
+}
+
+static inline int execute_only_pkey(struct mm_struct *mm)
+{
+	return -1;
+}
+
+static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
+{
+	return false;
+}
+
+static inline int mm_pkey_alloc(struct mm_struct *mm)
+{
+	return -1;
+}
+
+static inline int mm_pkey_free(struct mm_struct *mm, int pkey)
+{
+	return -EINVAL;
+}
+
+#endif /* _ASM_ARM64_PKEYS_H */

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 15f6347d23b6..f7bf41eae904 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1486,3 +1486,10 @@ void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, pte
 {
	set_pte_at(vma->vm_mm, addr, ptep, pte);
 }
+
+#ifdef CONFIG_ARCH_HAS_PKEYS
+int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, unsigned long init_val)
+{
+	return -ENOSPC;
+}
+#endif
On Fri, Nov 24, 2023 at 04:34:56PM +0000, Joey Gouly wrote:
diff --git a/arch/arm64/include/asm/pkeys.h b/arch/arm64/include/asm/pkeys.h
new file mode 100644
index 000000000000..5761fb48fd53
--- /dev/null
+++ b/arch/arm64/include/asm/pkeys.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2023 Arm Ltd.
+ *
+ * Based on arch/x86/include/asm/pkeys.h
+ */
+
+#ifndef _ASM_ARM64_PKEYS_H
+#define _ASM_ARM64_PKEYS_H
+
+#define ARCH_VM_PKEY_FLAGS (VM_PKEY_BIT0 | VM_PKEY_BIT1 | VM_PKEY_BIT2)
+
+#define arch_max_pkey() 0
+
+int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
+		unsigned long init_val);
+
+static inline bool arch_pkeys_enabled(void)
+{
+	return false;
+}
+
+static inline int vma_pkey(struct vm_area_struct *vma)
+{
+	return -1;
+}
What's the point of these dummies? I guess they'll be populated later but I haven't reached that point. Could we not just leave them out for now and add the complete version directly? This would work better with an earlier comment to move the Kconfig entry towards the end of the series.
Also, they don't seem to match the generic include/linux/pkeys.h dummies. For example, vma_pkey() returns 0 in the generic version, -1 here. Should they actually match?
Hi,
Thanks to you and Mark for the comments so far!
On Thu, Dec 07, 2023 at 03:25:17PM +0000, Catalin Marinas wrote:
On Fri, Nov 24, 2023 at 04:34:56PM +0000, Joey Gouly wrote:
diff --git a/arch/arm64/include/asm/pkeys.h b/arch/arm64/include/asm/pkeys.h
new file mode 100644
index 000000000000..5761fb48fd53
--- /dev/null
+++ b/arch/arm64/include/asm/pkeys.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2023 Arm Ltd.
+ *
+ * Based on arch/x86/include/asm/pkeys.h
+ */
+
+#ifndef _ASM_ARM64_PKEYS_H
+#define _ASM_ARM64_PKEYS_H
+
+#define ARCH_VM_PKEY_FLAGS (VM_PKEY_BIT0 | VM_PKEY_BIT1 | VM_PKEY_BIT2)
+
+#define arch_max_pkey() 0
+
+int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
+		unsigned long init_val);
+
+static inline bool arch_pkeys_enabled(void)
+{
+	return false;
+}
+
+static inline int vma_pkey(struct vm_area_struct *vma)
+{
+	return -1;
+}
What's the point of these dummies? I guess they'll be populated later but I haven't reached that point. Could we not just leave them out for now and add the complete version directly? This would work better with an earlier comment to move the Kconfig entry towards the end of the series.
I think the suggestion to move the Kconfig entry to the end is good, and I agree that it will probably remove the need for these dummy implementations.
Also, they don't seem to match the generic include/linux/pkeys.h dummies. For example, vma_pkey() returns 0 in the generic version, -1 here. Should they actually match?
If for some reason I need to keep them after the Kconfig move, I will look into this further.
Thanks, Joey
If a memory fault occurs that is due to an overlay/pkey fault, report that to userspace with a SEGV_PKUERR.
Signed-off-by: Joey Gouly joey.gouly@arm.com
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Will Deacon will@kernel.org
---
 arch/arm64/include/asm/traps.h |  1 +
 arch/arm64/kernel/traps.c      | 12 ++++++++--
 arch/arm64/mm/fault.c          | 44 +++++++++++++++++++++++++++++++---
 3 files changed, 52 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/include/asm/traps.h b/arch/arm64/include/asm/traps.h
index eefe766d6161..f6f6f2cb7f10 100644
--- a/arch/arm64/include/asm/traps.h
+++ b/arch/arm64/include/asm/traps.h
@@ -25,6 +25,7 @@ try_emulate_armv8_deprecated(struct pt_regs *regs, u32 insn)
 void force_signal_inject(int signal, int code, unsigned long address, unsigned long err);
 void arm64_notify_segfault(unsigned long addr);
 void arm64_force_sig_fault(int signo, int code, unsigned long far, const char *str);
+void arm64_force_sig_fault_pkey(int signo, int code, unsigned long far, const char *str, int pkey);
 void arm64_force_sig_mceerr(int code, unsigned long far, short lsb, const char *str);
 void arm64_force_sig_ptrace_errno_trap(int errno, unsigned long far, const char *str);

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 215e6d7f2df8..1bac6c84d3f5 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -263,16 +263,24 @@ static void arm64_show_signal(int signo, const char *str)
	__show_regs(regs);
 }

-void arm64_force_sig_fault(int signo, int code, unsigned long far,
-			   const char *str)
+void arm64_force_sig_fault_pkey(int signo, int code, unsigned long far,
+				const char *str, int pkey)
 {
	arm64_show_signal(signo, str);
	if (signo == SIGKILL)
		force_sig(SIGKILL);
+	else if (code == SEGV_PKUERR)
+		force_sig_pkuerr((void __user *)far, pkey);
	else
		force_sig_fault(signo, code, (void __user *)far);
 }

+void arm64_force_sig_fault(int signo, int code, unsigned long far,
+			   const char *str)
+{
+	arm64_force_sig_fault_pkey(signo, code, far, str, 0);
+}
+
 void arm64_force_sig_mceerr(int code, unsigned long far, short lsb,
			    const char *str)
 {

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 460d799e1296..efd263f56da7 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -23,6 +23,7 @@
 #include <linux/sched/debug.h>
 #include <linux/highmem.h>
 #include <linux/perf_event.h>
+#include <linux/pkeys.h>
 #include <linux/preempt.h>
 #include <linux/hugetlb.h>

@@ -497,6 +498,23 @@ static void do_bad_area(unsigned long far, unsigned long esr,
 #define VM_FAULT_BADMAP		((__force vm_fault_t)0x010000)
 #define VM_FAULT_BADACCESS	((__force vm_fault_t)0x020000)

+static bool fault_from_pkey(unsigned long esr, struct vm_area_struct *vma,
+			    unsigned int mm_flags)
+{
+	unsigned long iss2 = ESR_ELx_ISS2(esr);
+
+	if (!arch_pkeys_enabled())
+		return false;
+
+	if (iss2 & ESR_ELx_Overlay)
+		return true;
+
+	return !arch_vma_access_permitted(vma,
+			mm_flags & FAULT_FLAG_WRITE,
+			mm_flags & FAULT_FLAG_INSTRUCTION,
+			mm_flags & FAULT_FLAG_REMOTE);
+}
+
 static vm_fault_t __do_page_fault(struct mm_struct *mm,
				  struct vm_area_struct *vma, unsigned long addr,
				  unsigned int mm_flags, unsigned long vm_flags,
@@ -688,9 +706,29 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
		 * Something tried to access memory that isn't in our memory
		 * map.
		 */
-		arm64_force_sig_fault(SIGSEGV,
-				      fault == VM_FAULT_BADACCESS ? SEGV_ACCERR : SEGV_MAPERR,
-				      far, inf->name);
+		int fault_kind;
+		/*
+		 * The pkey value that we return to userspace can be different
+		 * from the pkey that caused the fault.
+		 *
+		 * 1. T1 : mprotect_key(foo, PAGE_SIZE, pkey=4);
+		 * 2. T1 : set AMR to deny access to pkey=4, touches page
+		 * 3. T1 : faults...
+		 * 4. T2 : mprotect_key(foo, PAGE_SIZE, pkey=5);
+		 * 5. T1 : enters fault handler, takes mmap_lock, etc...
+		 * 6. T1 : reaches here, sees vma_pkey(vma)=5, when we really
+		 *	   faulted on a pte with its pkey=4.
+		 */
+		int pkey = vma_pkey(vma);
+
+		if (fault_from_pkey(esr, vma, mm_flags))
+			fault_kind = SEGV_PKUERR;
+		else
+			fault_kind = fault == VM_FAULT_BADACCESS ? SEGV_ACCERR : SEGV_MAPERR;
+
+		arm64_force_sig_fault_pkey(SIGSEGV,
+				      fault_kind,
+				      far, inf->name, pkey);
	}
return 0;
On Fri, Nov 24, 2023 at 04:34:57PM +0000, Joey Gouly wrote:
@@ -497,6 +498,23 @@ static void do_bad_area(unsigned long far, unsigned long esr,
 #define VM_FAULT_BADMAP		((__force vm_fault_t)0x010000)
 #define VM_FAULT_BADACCESS	((__force vm_fault_t)0x020000)

+static bool fault_from_pkey(unsigned long esr, struct vm_area_struct *vma,
+			    unsigned int mm_flags)
+{
+	unsigned long iss2 = ESR_ELx_ISS2(esr);
+
+	if (!arch_pkeys_enabled())
+		return false;
+
+	if (iss2 & ESR_ELx_Overlay)
+		return true;
+
+	return !arch_vma_access_permitted(vma,
+			mm_flags & FAULT_FLAG_WRITE,
+			mm_flags & FAULT_FLAG_INSTRUCTION,
+			mm_flags & FAULT_FLAG_REMOTE);
+}
Do we actually need this additional arch_vma_access_permitted() check? The ESR should tell us if it was a POE fault. Permission overlay faults have priority over the base permission faults, so we'd not need to fall back to these additional checks. Well, see below, we could do something slightly smarter here.
I can see x86 and powerpc have similar checks (though at a different point under the mmap lock) but I'm not familiar with their exception model, exception priorities.
 static vm_fault_t __do_page_fault(struct mm_struct *mm,
				  struct vm_area_struct *vma, unsigned long addr,
				  unsigned int mm_flags, unsigned long vm_flags,
@@ -688,9 +706,29 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
		 * Something tried to access memory that isn't in our memory
		 * map.
		 */
-		arm64_force_sig_fault(SIGSEGV,
-				      fault == VM_FAULT_BADACCESS ? SEGV_ACCERR : SEGV_MAPERR,
-				      far, inf->name);
+		int fault_kind;
+		/*
+		 * The pkey value that we return to userspace can be different
+		 * from the pkey that caused the fault.
+		 *
+		 * 1. T1 : mprotect_key(foo, PAGE_SIZE, pkey=4);
+		 * 2. T1 : set AMR to deny access to pkey=4, touches page
+		 * 3. T1 : faults...
+		 * 4. T2 : mprotect_key(foo, PAGE_SIZE, pkey=5);
+		 * 5. T1 : enters fault handler, takes mmap_lock, etc...
+		 * 6. T1 : reaches here, sees vma_pkey(vma)=5, when we really
+		 *	   faulted on a pte with its pkey=4.
+		 */
+		int pkey = vma_pkey(vma);
Other than the vma_pkey() race, I'm more worried about the vma completely disappearing, e.g. munmap() in another thread. We end up dereferencing a freed pointer here. We need to do this under the mmap lock.
Since we need to do this check under the mmap lock, we could add an additional check to see if the pkey fault we got was racy and just restart the instruction if it no longer faults according to por_el0_allows_pkey(). However, the code below is too late in the fault handling to be able to do much other than SIGSEGV.
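For illustration, a minimal sketch of that recheck; the helper name is hypothetical and it assumes por_el0_allows_pkey() from the later patch in this series:

static bool pkey_fault_was_racy(struct vm_area_struct *vma,
				unsigned int mm_flags)
{
	bool write = mm_flags & FAULT_FLAG_WRITE;
	bool exec = mm_flags & FAULT_FLAG_INSTRUCTION;

	/*
	 * If the current POR_EL0 now allows the access, the overlay
	 * fault was racy: the handler could return and retry the
	 * instruction instead of raising SIGSEGV.
	 */
	return por_el0_allows_pkey(vma_pkey(vma), write, exec);
}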
+		if (fault_from_pkey(esr, vma, mm_flags))
+			fault_kind = SEGV_PKUERR;
+		else
+			fault_kind = fault == VM_FAULT_BADACCESS ? SEGV_ACCERR : SEGV_MAPERR;
+
+		arm64_force_sig_fault_pkey(SIGSEGV,
+				      fault_kind,
+				      far, inf->name, pkey);
	}
return 0;
Hi,
On Mon, Dec 11, 2023 at 06:18:17PM +0000, Catalin Marinas wrote:
On Fri, Nov 24, 2023 at 04:34:57PM +0000, Joey Gouly wrote:
@@ -497,6 +498,23 @@ static void do_bad_area(unsigned long far, unsigned long esr,
 #define VM_FAULT_BADMAP		((__force vm_fault_t)0x010000)
 #define VM_FAULT_BADACCESS	((__force vm_fault_t)0x020000)

+static bool fault_from_pkey(unsigned long esr, struct vm_area_struct *vma,
+			    unsigned int mm_flags)
+{
+	unsigned long iss2 = ESR_ELx_ISS2(esr);
+
+	if (!arch_pkeys_enabled())
+		return false;
+
+	if (iss2 & ESR_ELx_Overlay)
+		return true;
+
+	return !arch_vma_access_permitted(vma,
+			mm_flags & FAULT_FLAG_WRITE,
+			mm_flags & FAULT_FLAG_INSTRUCTION,
+			mm_flags & FAULT_FLAG_REMOTE);
+}
Do we actually need this additional arch_vma_access_permitted() check? The ESR should tell us if it was a POE fault. Permission overlay faults have priority over the base permission faults, so we'd not need to fall back to these additional checks. Well, see below, we could do something slightly smarter here.
We want this here as it follows the other arches, which will fail with a pkey fault even if the page isn't actually mapped. If the page isn't mapped we'd get a translation fault, but since we know the type of access and have the pkey in the VMA, we check it here.
I can see x86 and powerpc have similar checks (though at a different point under the mmap lock) but I'm not familiar with their exception model, exception priorities.
 static vm_fault_t __do_page_fault(struct mm_struct *mm,
				  struct vm_area_struct *vma, unsigned long addr,
				  unsigned int mm_flags, unsigned long vm_flags,
@@ -688,9 +706,29 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
		 * Something tried to access memory that isn't in our memory
		 * map.
		 */
-		arm64_force_sig_fault(SIGSEGV,
-				      fault == VM_FAULT_BADACCESS ? SEGV_ACCERR : SEGV_MAPERR,
-				      far, inf->name);
+		int fault_kind;
+		/*
+		 * The pkey value that we return to userspace can be different
+		 * from the pkey that caused the fault.
+		 *
+		 * 1. T1 : mprotect_key(foo, PAGE_SIZE, pkey=4);
+		 * 2. T1 : set AMR to deny access to pkey=4, touches page
+		 * 3. T1 : faults...
+		 * 4. T2 : mprotect_key(foo, PAGE_SIZE, pkey=5);
+		 * 5. T1 : enters fault handler, takes mmap_lock, etc...
+		 * 6. T1 : reaches here, sees vma_pkey(vma)=5, when we really
+		 *	   faulted on a pte with its pkey=4.
+		 */
+		int pkey = vma_pkey(vma);
Other than the vma_pkey() race, I'm more worried about the vma completely disappearing, e.g. munmap() in another thread. We end up dereferencing a freed pointer here. We need to do this under the mmap lock.
Since we need to do this check under the mmap lock, we could add an additional check to see if the pkey fault we got was racy and just restart the instruction if it no longer faults according to por_el0_allows_pkey(). However, the code below is too late in the fault handling to be able to do much other than SIGSEGV.
After discussing in person, I agree with the assessment that this is the wrong place to do the check, and after looking at the x86 arch code, I see how they're doing it earlier.
Thanks, Joey
These functions will be extended to support pkeys.
Signed-off-by: Joey Gouly joey.gouly@arm.com
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Will Deacon will@kernel.org
---
 arch/arm64/include/asm/mmu_context.h | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 9ce4200508b1..16b48fe9353f 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -20,7 +20,6 @@
 #include <asm/cpufeature.h>
 #include <asm/daifflags.h>
 #include <asm/proc-fns.h>
-#include <asm-generic/mm_hooks.h>
 #include <asm/cputype.h>
 #include <asm/sysreg.h>
 #include <asm/tlbflush.h>
@@ -215,6 +214,20 @@ init_new_context(struct task_struct *tsk, struct mm_struct *mm)
	return 0;
 }

+static inline int arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
+{
+	return 0;
+}
+
+static inline void arch_exit_mmap(struct mm_struct *mm)
+{
+}
+
+static inline void arch_unmap(struct mm_struct *mm,
+			unsigned long start, unsigned long end)
+{
+}
+
 #ifdef CONFIG_ARM64_SW_TTBR0_PAN
 static inline void update_saved_ttbr0(struct task_struct *tsk,
				      struct mm_struct *mm)
@@ -304,6 +317,12 @@ static inline unsigned long mm_untag_mask(struct mm_struct *mm)
	return -1UL >> 8;
 }

+static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
+		bool write, bool execute, bool foreign)
+{
+	return true;
+}
+
 #include <asm-generic/mmu_context.h>
#endif /* !__ASSEMBLY__ */
On Fri, Nov 24, 2023 at 04:34:58PM +0000, Joey Gouly wrote:
These functions will be extended to support pkeys.
Signed-off-by: Joey Gouly joey.gouly@arm.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will@kernel.org
Nit: can you squash it into patch 14 introducing those functions? Fewer patches to carry.
Implement the PKEYS interface, using the Permission Overlay Extension.
Signed-off-by: Joey Gouly joey.gouly@arm.com
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Will Deacon will@kernel.org
---
 arch/arm64/include/asm/mmu.h         |  2 +
 arch/arm64/include/asm/mmu_context.h | 32 ++++++++++++-
 arch/arm64/include/asm/pgtable.h     | 23 +++++++++-
 arch/arm64/include/asm/pkeys.h       | 68 +++++++++++++++++++++++++---
 arch/arm64/include/asm/por.h         | 33 ++++++++++++++
 arch/arm64/mm/mmu.c                  | 35 +++++++++++++-
 6 files changed, 184 insertions(+), 9 deletions(-)
 create mode 100644 arch/arm64/include/asm/por.h
diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 2fcf51231d6e..55338b14b453 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -25,6 +25,8 @@ typedef struct {
	refcount_t pinned;
	void *vdso;
	unsigned long flags;
+
+	u8 pkey_allocation_map;
 } mm_context_t;

 /*

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 16b48fe9353f..3fc739fb831c 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -15,6 +15,7 @@
 #include <linux/sched/hotplug.h>
 #include <linux/mm_types.h>
 #include <linux/pgtable.h>
+#include <linux/pkeys.h>

 #include <asm/cacheflush.h>
 #include <asm/cpufeature.h>
@@ -211,11 +212,24 @@ init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 {
	atomic64_set(&mm->context.id, 0);
	refcount_set(&mm->context.pinned, 0);
+
+	// pkey 0 is the default, so always reserve it.
+	mm->context.pkey_allocation_map = 0x1;
+
	return 0;
 }

+static inline void arch_dup_pkeys(struct mm_struct *oldmm,
+				  struct mm_struct *mm)
+{
+	/* Duplicate the oldmm pkey state in mm: */
+	mm->context.pkey_allocation_map = oldmm->context.pkey_allocation_map;
+}
+
 static inline int arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
 {
+	arch_dup_pkeys(oldmm, mm);
+
	return 0;
 }

@@ -317,10 +331,26 @@ static inline unsigned long mm_untag_mask(struct mm_struct *mm)
	return -1UL >> 8;
 }

+/*
+ * We only want to enforce protection keys on the current process
+ * because we effectively have no access to POR_EL0 for other
+ * processes or any way to tell *which* POR_EL0 in a threaded
+ * process we could use.
+ *
+ * So do not enforce things if the VMA is not from the current
+ * mm, or if we are in a kernel thread.
+ */
 static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
		bool write, bool execute, bool foreign)
 {
-	return true;
+	if (!arch_pkeys_enabled())
+		return true;
+
+	/* allow access if the VMA is not one from this process */
+	if (foreign || vma_is_foreign(vma))
+		return true;
+
+	return por_el0_allows_pkey(vma_pkey(vma), write, execute);
 }

 #include <asm-generic/mmu_context.h>

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index e45105336ca0..789c88b138f5 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -30,6 +30,7 @@
 #include <asm/cmpxchg.h>
 #include <asm/fixmap.h>
+#include <asm/por.h>
 #include <linux/mmdebug.h>
 #include <linux/mm_types.h>
 #include <linux/sched.h>
@@ -143,6 +144,24 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
 #define pte_accessible(mm, pte)	\
	(mm_tlb_flush_pending(mm) ? pte_present(pte) : pte_valid(pte))

+static inline bool por_el0_allows_pkey(u8 pkey, bool write, bool execute)
+{
+	u64 por;
+
+	if (!system_supports_poe())
+		return true;
+
+	por = read_sysreg_s(SYS_POR_EL0);
+
+	if (write)
+		return por_elx_allows_write(por, pkey);
+
+	if (execute)
+		return por_elx_allows_exec(por, pkey);
+
+	return por_elx_allows_read(por, pkey);
+}
+
 /*
  * p??_access_permitted() is true for valid user mappings (PTE_USER
  * bit set, subject to the write permission check). For execute-only
@@ -151,7 +170,9 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
  * PTE_VALID bit set.
  */
 #define pte_access_permitted(pte, write) \
-	(((pte_val(pte) & (PTE_VALID | PTE_USER)) == (PTE_VALID | PTE_USER)) && (!(write) || pte_write(pte)))
+	(((pte_val(pte) & (PTE_VALID | PTE_USER)) == (PTE_VALID | PTE_USER)) && \
+	 (!(write) || pte_write(pte)) && \
+	 por_el0_allows_pkey(FIELD_GET(PTE_PO_IDX_MASK, pte_val(pte)), write, false))
 #define pmd_access_permitted(pmd, write) \
	(pte_access_permitted(pmd_pte(pmd), (write)))
 #define pud_access_permitted(pud, write) \

diff --git a/arch/arm64/include/asm/pkeys.h b/arch/arm64/include/asm/pkeys.h
index 5761fb48fd53..a80c654da93d 100644
--- a/arch/arm64/include/asm/pkeys.h
+++ b/arch/arm64/include/asm/pkeys.h
@@ -10,7 +10,7 @@

 #define ARCH_VM_PKEY_FLAGS (VM_PKEY_BIT0 | VM_PKEY_BIT1 | VM_PKEY_BIT2)

-#define arch_max_pkey() 0
+#define arch_max_pkey() 7

 int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
		unsigned long init_val);
@@ -22,33 +22,89 @@ static inline bool arch_pkeys_enabled(void)

 static inline int vma_pkey(struct vm_area_struct *vma)
 {
-	return -1;
+	return (vma->vm_flags & ARCH_VM_PKEY_FLAGS) >> VM_PKEY_SHIFT;
 }

 static inline int arch_override_mprotect_pkey(struct vm_area_struct *vma,
		int prot, int pkey)
 {
-	return -1;
+	if (pkey != -1)
+		return pkey;
+
+	return vma_pkey(vma);
 }

 static inline int execute_only_pkey(struct mm_struct *mm)
 {
+	// Execute-only mappings are handled by EPAN/FEAT_PAN3.
+	WARN_ON_ONCE(!cpus_have_final_cap(ARM64_HAS_EPAN));
+
	return -1;
 }

+#define mm_pkey_allocation_map(mm)	(mm->context.pkey_allocation_map)
+#define mm_set_pkey_allocated(mm, pkey) do {		\
+	mm_pkey_allocation_map(mm) |= (1U << pkey);	\
+} while (0)
+#define mm_set_pkey_free(mm, pkey) do {			\
+	mm_pkey_allocation_map(mm) &= ~(1U << pkey);	\
+} while (0)
+
 static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
 {
-	return false;
+	/*
+	 * "Allocated" pkeys are those that have been returned
+	 * from pkey_alloc() or pkey 0 which is allocated
+	 * implicitly when the mm is created.
+	 */
+	if (pkey < 0)
+		return false;
+	if (pkey >= arch_max_pkey())
+		return false;
+
+	return mm_pkey_allocation_map(mm) & (1U << pkey);
 }

+/*
+ * Returns a positive, 3-bit key on success, or -1 on failure.
+ */
 static inline int mm_pkey_alloc(struct mm_struct *mm)
 {
-	return -1;
+	/*
+	 * Note: this is the one and only place we make sure
+	 * that the pkey is valid as far as the hardware is
+	 * concerned. The rest of the kernel trusts that
+	 * only good, valid pkeys come out of here.
+	 */
+	u8 all_pkeys_mask = ((1U << arch_max_pkey()) - 1);
+	int ret;
+
+	if (!arch_pkeys_enabled())
+		return -1;
+
+	/*
+	 * Are we out of pkeys? We must handle this specially
+	 * because ffz() behavior is undefined if there are no
+	 * zeros.
+	 */
+	if (mm_pkey_allocation_map(mm) == all_pkeys_mask)
+		return -1;
+
+	ret = ffz(mm_pkey_allocation_map(mm));
+
+	mm_set_pkey_allocated(mm, ret);
+
+	return ret;
 }

 static inline int mm_pkey_free(struct mm_struct *mm, int pkey)
 {
-	return -EINVAL;
+	if (!mm_pkey_is_allocated(mm, pkey))
+		return -EINVAL;
+
+	mm_set_pkey_free(mm, pkey);
+
+	return 0;
 }

 #endif /* _ASM_ARM64_PKEYS_H */

diff --git a/arch/arm64/include/asm/por.h b/arch/arm64/include/asm/por.h
new file mode 100644
index 000000000000..90484dae9920
--- /dev/null
+++ b/arch/arm64/include/asm/por.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2023 Arm Ltd.
+ */
+
+#ifndef _ASM_ARM64_POR_H
+#define _ASM_ARM64_POR_H
+
+#define POR_BITS_PER_PKEY		4
+#define POR_ELx_IDX(por_elx, idx)	(((por_elx) >> (idx * POR_BITS_PER_PKEY)) & 0xf)
+
+static inline bool por_elx_allows_read(u64 por, u8 pkey)
+{
+	u8 perm = POR_ELx_IDX(por, pkey);
+
+	return perm & POE_R;
+}
+
+static inline bool por_elx_allows_write(u64 por, u8 pkey)
+{
+	u8 perm = POR_ELx_IDX(por, pkey);
+
+	return perm & POE_W;
+}
+
+static inline bool por_elx_allows_exec(u64 por, u8 pkey)
+{
+	u8 perm = POR_ELx_IDX(por, pkey);
+
+	return perm & POE_X;
+}
+
+#endif /* _ASM_ARM64_POR_H */

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index f7bf41eae904..c255345e6a6a 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -25,6 +25,7 @@
 #include <linux/vmalloc.h>
 #include <linux/set_memory.h>
 #include <linux/kfence.h>
+#include <linux/pkeys.h>

 #include <asm/barrier.h>
 #include <asm/cputype.h>
@@ -1490,6 +1491,38 @@ void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, pte
 #ifdef CONFIG_ARCH_HAS_PKEYS
 int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, unsigned long init_val)
 {
-	return -ENOSPC;
+	u64 new_por = POE_RXW;
+	u64 old_por;
+	u64 pkey_shift;
+
+	if (!arch_pkeys_enabled())
+		return -ENOSPC;
+
+	/*
+	 * This code should only be called with valid 'pkey'
+	 * values originating from in-kernel users. Complain
+	 * if a bad value is observed.
+	 */
+	if (WARN_ON_ONCE(pkey >= arch_max_pkey()))
+		return -EINVAL;
+
+	/* Set the bits we need in POR: */
+	if (init_val & PKEY_DISABLE_ACCESS)
+		new_por = POE_X;
+	else if (init_val & PKEY_DISABLE_WRITE)
+		new_por = POE_RX;
+
+	/* Shift the bits in to the correct place in POR for pkey: */
+	pkey_shift = pkey * POR_BITS_PER_PKEY;
+	new_por <<= pkey_shift;
+
+	/* Get old POR and mask off any old bits in place: */
+	old_por = read_sysreg_s(SYS_POR_EL0);
+	old_por &= ~(POE_MASK << pkey_shift);
+
+	/* Write old part along with new part: */
+	write_sysreg_s(old_por | new_por, SYS_POR_EL0);
+
+	return 0;
 }
 #endif
On Fri, Nov 24, 2023 at 04:34:59PM +0000, Joey Gouly wrote:
@@ -211,11 +212,24 @@ init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 {
	atomic64_set(&mm->context.id, 0);
	refcount_set(&mm->context.pinned, 0);
+
+	// pkey 0 is the default, so always reserve it.
+	mm->context.pkey_allocation_map = 0x1;
Nit: use /* */ style comments.
@@ -151,7 +170,9 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
  * PTE_VALID bit set.
  */
 #define pte_access_permitted(pte, write) \
-	(((pte_val(pte) & (PTE_VALID | PTE_USER)) == (PTE_VALID | PTE_USER)) && (!(write) || pte_write(pte)))
+	(((pte_val(pte) & (PTE_VALID | PTE_USER)) == (PTE_VALID | PTE_USER)) && \
+	 (!(write) || pte_write(pte)) && \
+	 por_el0_allows_pkey(FIELD_GET(PTE_PO_IDX_MASK, pte_val(pte)), write, false))
Do not change pte_access_permitted(), just let it handle the base permissions. This check is about the mm tables, not some current POR_EL0 setting of the thread.
As an example, with this change Linux may decide not to clear the MTE tags just because the current POR_EL0 says no-access. The thread subsequently changes POR_EL0 and it can read the stale tags.
I haven't checked what x86 and powerpc do here. There may be some implications on GUP but I'd rather ignore POE for this case.
 #define pmd_access_permitted(pmd, write) \
	(pte_access_permitted(pmd_pte(pmd), (write)))
 #define pud_access_permitted(pud, write) \

diff --git a/arch/arm64/include/asm/pkeys.h b/arch/arm64/include/asm/pkeys.h
index 5761fb48fd53..a80c654da93d 100644
--- a/arch/arm64/include/asm/pkeys.h
+++ b/arch/arm64/include/asm/pkeys.h
[...]
 static inline int execute_only_pkey(struct mm_struct *mm)
 {
+	// Execute-only mappings are handled by EPAN/FEAT_PAN3.
+	WARN_ON_ONCE(!cpus_have_final_cap(ARM64_HAS_EPAN));
+
	return -1;
 }
Why the WARN_ON_ONCE() here? It will trigger if the user asks for PROT_EXEC and I can't see any subsequent patch that changes the core code not to call it. I think we need some arch_has_execute_only_pkey() to avoid going on this path. Our arch would support exec-only with any pkey.
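A sketch of what such a hook might look like; arch_has_execute_only_pkey() is only named in the comment above, so both the definition and the core-code call site here are hypothetical:

/* arm64: exec-only mappings already work with any pkey thanks to
 * EPAN/FEAT_PAN3, so no dedicated execute-only pkey is needed. */
#define arch_has_execute_only_pkey()	false

/* Core mm code could then gate the special-pkey path on it: */
if ((prot == PROT_EXEC) && arch_has_execute_only_pkey())
	pkey = execute_only_pkey(current->mm);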
@@ -1490,6 +1491,38 @@ void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, pte
 #ifdef CONFIG_ARCH_HAS_PKEYS
 int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, unsigned long init_val)
 {
-	return -ENOSPC;
+	u64 new_por = POE_RXW;
+	u64 old_por;
+	u64 pkey_shift;
+
+	if (!arch_pkeys_enabled())
+		return -ENOSPC;
+
+	/*
+	 * This code should only be called with valid 'pkey'
+	 * values originating from in-kernel users. Complain
+	 * if a bad value is observed.
+	 */
+	if (WARN_ON_ONCE(pkey >= arch_max_pkey()))
+		return -EINVAL;
+
+	/* Set the bits we need in POR: */
+	if (init_val & PKEY_DISABLE_ACCESS)
+		new_por = POE_X;
Does PKEY_DISABLE_ACCESS mean allow execute? Or does x86 not have a way to disable execution?
+	else if (init_val & PKEY_DISABLE_WRITE)
+		new_por = POE_RX;
+
+	/* Shift the bits in to the correct place in POR for pkey: */
+	pkey_shift = pkey * POR_BITS_PER_PKEY;
+	new_por <<= pkey_shift;
+
+	/* Get old POR and mask off any old bits in place: */
+	old_por = read_sysreg_s(SYS_POR_EL0);
+	old_por &= ~(POE_MASK << pkey_shift);
+
+	/* Write old part along with new part: */
+	write_sysreg_s(old_por | new_por, SYS_POR_EL0);
+
+	return 0;
 }
 #endif
On Mon, Dec 11, 2023 at 06:49:37PM +0000, Catalin Marinas wrote:
On Fri, Nov 24, 2023 at 04:34:59PM +0000, Joey Gouly wrote:
@@ -211,11 +212,24 @@ init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 {
	atomic64_set(&mm->context.id, 0);
	refcount_set(&mm->context.pinned, 0);
+
+	// pkey 0 is the default, so always reserve it.
+	mm->context.pkey_allocation_map = 0x1;
Nit: use /* */ style comments.
@@ -151,7 +170,9 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
  * PTE_VALID bit set.
  */
 #define pte_access_permitted(pte, write) \
-	(((pte_val(pte) & (PTE_VALID | PTE_USER)) == (PTE_VALID | PTE_USER)) && (!(write) || pte_write(pte)))
+	(((pte_val(pte) & (PTE_VALID | PTE_USER)) == (PTE_VALID | PTE_USER)) && \
+	 (!(write) || pte_write(pte)) && \
+	 por_el0_allows_pkey(FIELD_GET(PTE_PO_IDX_MASK, pte_val(pte)), write, false))
Do not change pte_access_permitted(), just let it handle the base permissions. This check is about the mm tables, not some current POR_EL0 setting of the thread.
As an example, with this change Linux may decide not to clear the MTE tags just because the current POR_EL0 says no-access. The thread subsequently changes POR_EL0 and it can read the stale tags.
I haven't checked what x86 and powerpc do here. There may be some implications on GUP but I'd rather ignore POE for this case.
There are tests in tools/testing/selftests/mm/protection_keys.c (test_kernel_gup_of_access_disabled_region etc.) that explicitly check that GUP is affected by pkeys. So if we didn't modify pte_access_permitted(), it would fail the test and work differently from the other architectures. For MTE/__sync_cache_and_tags, I think we could add a pte_access_permitted_no_poe().
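For illustration, the suggested helper could look like a sketch of this shape, reusing the pre-POE definition quoted above (the name pte_access_permitted_no_poe is only proposed here, not part of the posted series):

/* Hypothetical: base permission check only, ignoring the current
 * thread's POR_EL0, for callers such as the MTE tag handling that
 * must not depend on transient overlay state. */
#define pte_access_permitted_no_poe(pte, write) \
	(((pte_val(pte) & (PTE_VALID | PTE_USER)) == (PTE_VALID | PTE_USER)) && \
	 (!(write) || pte_write(pte)))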
 #define pmd_access_permitted(pmd, write) \
	(pte_access_permitted(pmd_pte(pmd), (write)))
 #define pud_access_permitted(pud, write) \

diff --git a/arch/arm64/include/asm/pkeys.h b/arch/arm64/include/asm/pkeys.h
index 5761fb48fd53..a80c654da93d 100644
--- a/arch/arm64/include/asm/pkeys.h
+++ b/arch/arm64/include/asm/pkeys.h
[...]
 static inline int execute_only_pkey(struct mm_struct *mm)
 {
+	// Execute-only mappings are handled by EPAN/FEAT_PAN3.
+	WARN_ON_ONCE(!cpus_have_final_cap(ARM64_HAS_EPAN));
+
	return -1;
 }
Why the WARN_ON_ONCE() here? It will trigger if the user asks for PROT_EXEC and I can't see any subsequent patch that changes the core code not to call it. I think we need some arch_has_execute_only_pkey() to avoid going on this path. Our arch would support exec-only with any pkey.
This would warn if you somehow had a CPU with FEAT_POE but not FEAT_PAN3. So I don't really need the WARN() here: if the CPU supports FEAT_POE, it should also support FEAT_PAN3, since Armv8.7 made FEAT_PAN3 mandatory.
@@ -1490,6 +1491,38 @@ void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, pte
 #ifdef CONFIG_ARCH_HAS_PKEYS
 int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, unsigned long init_val)
 {
-	return -ENOSPC;
+	u64 new_por = POE_RXW;
+	u64 old_por;
+	u64 pkey_shift;
+
+	if (!arch_pkeys_enabled())
+		return -ENOSPC;
+
+	/*
+	 * This code should only be called with valid 'pkey'
+	 * values originating from in-kernel users. Complain
+	 * if a bad value is observed.
+	 */
+	if (WARN_ON_ONCE(pkey >= arch_max_pkey()))
+		return -EINVAL;
+
+	/* Set the bits we need in POR: */
+	if (init_val & PKEY_DISABLE_ACCESS)
+		new_por = POE_X;
Does PKEY_DISABLE_ACCESS mean allow execute? Or does x86 not have a way to disable execution?
As I understand it, x86 can either disable read and write, or disable write only; there is no way to disable execution.
The man page says: PKEY_DISABLE_ACCESS Disable all data access to memory covered by the returned protection key.
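To make the userspace-visible semantics concrete, a minimal sketch using glibc's pkey_alloc()/pkey_mprotect() wrappers (error handling trimmed; the claim that execution stays allowed relies on the PKEY_DISABLE_ACCESS -> POE_X mapping in the patch above):

#define _GNU_SOURCE
#include <sys/mman.h>

int main(void)
{
	/* Deny all data access (reads and writes) through this key;
	 * on arm64 with the mapping above, execution stays allowed. */
	int pkey = pkey_alloc(0, PKEY_DISABLE_ACCESS);
	void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	pkey_mprotect(p, 4096, PROT_READ | PROT_WRITE, pkey);

	/* Any load/store through p now raises SIGSEGV with
	 * si_code == SEGV_PKUERR, until access is re-enabled. */

	pkey_free(pkey);
	return 0;
}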
+	else if (init_val & PKEY_DISABLE_WRITE)
+		new_por = POE_RX;
+
+	/* Shift the bits in to the correct place in POR for pkey: */
+	pkey_shift = pkey * POR_BITS_PER_PKEY;
+	new_por <<= pkey_shift;
+
+	/* Get old POR and mask off any old bits in place: */
+	old_por = read_sysreg_s(SYS_POR_EL0);
+	old_por &= ~(POE_MASK << pkey_shift);
+
+	/* Write old part along with new part: */
+	write_sysreg_s(old_por | new_por, SYS_POR_EL0);
+
+	return 0;
 }
 #endif
Thanks, Joey
Add PKEY support to signals, by saving and restoring POR_EL0 from the stackframe.
Signed-off-by: Joey Gouly joey.gouly@arm.com
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Will Deacon will@kernel.org
Reviewed-by: Mark Brown broonie@kernel.org
---
 arch/arm64/include/uapi/asm/sigcontext.h |  7 ++++
 arch/arm64/kernel/signal.c               | 51 ++++++++++++++++++++++++
 2 files changed, 58 insertions(+)
diff --git a/arch/arm64/include/uapi/asm/sigcontext.h b/arch/arm64/include/uapi/asm/sigcontext.h
index f23c1dc3f002..cef85eeaf541 100644
--- a/arch/arm64/include/uapi/asm/sigcontext.h
+++ b/arch/arm64/include/uapi/asm/sigcontext.h
@@ -98,6 +98,13 @@ struct esr_context {
	__u64 esr;
 };

+#define POE_MAGIC	0x504f4530
+
+struct poe_context {
+	struct _aarch64_ctx head;
+	__u64 por_el0;
+};
+
 /*
  * extra_context: describes extra space in the signal frame for
  * additional structures that don't fit in sigcontext.__reserved[].

diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 0e8beb3349ea..379f364005bf 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -62,6 +62,7 @@ struct rt_sigframe_user_layout {
	unsigned long zt_offset;
	unsigned long extra_offset;
	unsigned long end_offset;
+	unsigned long poe_offset;
 };

 #define BASE_SIGFRAME_SIZE round_up(sizeof(struct rt_sigframe), 16)
@@ -182,6 +183,8 @@ struct user_ctxs {
	u32 za_size;
	struct zt_context __user *zt;
	u32 zt_size;
+	struct poe_context __user *poe;
+	u32 poe_size;
 };

 static int preserve_fpsimd_context(struct fpsimd_context __user *ctx)
@@ -227,6 +230,20 @@ static int restore_fpsimd_context(struct user_ctxs *user)
	return err ? -EFAULT : 0;
 }

+static int restore_poe_context(struct user_ctxs *user)
+{
+	u64 por_el0;
+	int err = 0;
+
+	if (user->poe_size != sizeof(*user->poe))
+		return -EINVAL;
+
+	__get_user_error(por_el0, &(user->poe->por_el0), err);
+	if (!err)
+		write_sysreg_s(por_el0, SYS_POR_EL0);
+
+	return err;
+}
+
 #ifdef CONFIG_ARM64_SVE

@@ -590,6 +607,7 @@ static int parse_user_sigframe(struct user_ctxs *user,
	user->tpidr2 = NULL;
	user->za = NULL;
	user->zt = NULL;
+	user->poe = NULL;

	if (!IS_ALIGNED((unsigned long)base, 16))
		goto invalid;
@@ -640,6 +658,17 @@ static int parse_user_sigframe(struct user_ctxs *user,
			/* ignore */
			break;

+		case POE_MAGIC:
+			if (!system_supports_poe())
+				goto invalid;
+
+			if (user->poe)
+				goto invalid;
+
+			user->poe = (struct poe_context __user *)head;
+			user->poe_size = size;
+			break;
+
		case SVE_MAGIC:
			if (!system_supports_sve() && !system_supports_sme())
				goto invalid;
@@ -812,6 +841,9 @@ static int restore_sigframe(struct pt_regs *regs,
	if (err == 0 && system_supports_sme2() && user.zt)
		err = restore_zt_context(&user);

+	if (err == 0 && system_supports_poe() && user.poe)
+		err = restore_poe_context(&user);
+
	return err;
 }

@@ -928,6 +960,13 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user,
		}
	}

+	if (system_supports_poe()) {
+		err = sigframe_alloc(user, &user->poe_offset,
+				     sizeof(struct poe_context));
+		if (err)
+			return err;
+	}
+
	return sigframe_alloc_end(user);
 }

@@ -968,6 +1007,15 @@ static int setup_sigframe(struct rt_sigframe_user_layout *user,
		__put_user_error(current->thread.fault_code, &esr_ctx->esr, err);
	}

+	if (system_supports_poe() && err == 0 && user->poe_offset) {
+		struct poe_context __user *poe_ctx =
+			apply_user_offset(user, user->poe_offset);
+
+		__put_user_error(POE_MAGIC, &poe_ctx->head.magic, err);
+		__put_user_error(sizeof(*poe_ctx), &poe_ctx->head.size, err);
+		__put_user_error(read_sysreg_s(SYS_POR_EL0), &poe_ctx->por_el0, err);
+	}
+
	/* Scalable Vector Extension state (including streaming), if present */
	if ((system_supports_sve() || system_supports_sme()) &&
	    err == 0 && user->sve_offset) {
@@ -1119,6 +1167,9 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
		sme_smstop();
	}

+	if (system_supports_poe())
+		write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0);
+
	if (ka->sa.sa_flags & SA_RESTORER)
		sigtramp = ka->sa.sa_restorer;
	else
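For userspace consumers, a hedged sketch of locating the new record in a delivered signal frame, assuming the updated asm/sigcontext.h uapi header is available; this mirrors what the poe_siginfo.c selftest later in the series does:

#include <signal.h>
#include <stdint.h>
#include <asm/sigcontext.h>	/* struct _aarch64_ctx, poe_context */

static uint64_t signal_frame_por_el0(ucontext_t *uc)
{
	struct _aarch64_ctx *head =
		(struct _aarch64_ctx *)uc->uc_mcontext.__reserved;

	/* Records are (magic, size) headers; a zero magic terminates. */
	while (head->magic) {
		if (head->magic == POE_MAGIC)
			return ((struct poe_context *)head)->por_el0;
		head = (struct _aarch64_ctx *)((char *)head + head->size);
	}

	return 0;	/* no POE record: CPU lacks FEAT_S1POE */
}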
+ Szabolcs for libc ack (and keeping the full patch quoted below)
You should cc Szabolcs when reposting; we need his ack on the UAPI changes.
On Fri, Nov 24, 2023 at 04:35:00PM +0000, Joey Gouly wrote:
Add PKEY support to signals, by saving and restoring POR_EL0 from the stackframe.
Signed-off-by: Joey Gouly joey.gouly@arm.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will@kernel.org Reviewed-by: Mark Brown broonie@kernel.org
 arch/arm64/include/uapi/asm/sigcontext.h |  7 ++++
 arch/arm64/kernel/signal.c               | 51 ++++++++++++++++++++++++
 2 files changed, 58 insertions(+)
diff --git a/arch/arm64/include/uapi/asm/sigcontext.h b/arch/arm64/include/uapi/asm/sigcontext.h
index f23c1dc3f002..cef85eeaf541 100644
--- a/arch/arm64/include/uapi/asm/sigcontext.h
+++ b/arch/arm64/include/uapi/asm/sigcontext.h
@@ -98,6 +98,13 @@ struct esr_context {
	__u64 esr;
 };

+#define POE_MAGIC	0x504f4530
+
+struct poe_context {
+	struct _aarch64_ctx head;
+	__u64 por_el0;
+};
+
 /*
  * extra_context: describes extra space in the signal frame for
  * additional structures that don't fit in sigcontext.__reserved[].

diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 0e8beb3349ea..379f364005bf 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -62,6 +62,7 @@ struct rt_sigframe_user_layout {
	unsigned long zt_offset;
	unsigned long extra_offset;
	unsigned long end_offset;
+	unsigned long poe_offset;
 };

 #define BASE_SIGFRAME_SIZE round_up(sizeof(struct rt_sigframe), 16)
@@ -182,6 +183,8 @@ struct user_ctxs {
	u32 za_size;
	struct zt_context __user *zt;
	u32 zt_size;
+	struct poe_context __user *poe;
+	u32 poe_size;
 };

 static int preserve_fpsimd_context(struct fpsimd_context __user *ctx)
@@ -227,6 +230,20 @@ static int restore_fpsimd_context(struct user_ctxs *user)
	return err ? -EFAULT : 0;
 }

+static int restore_poe_context(struct user_ctxs *user)
+{
+	u64 por_el0;
+	int err = 0;
+
+	if (user->poe_size != sizeof(*user->poe))
+		return -EINVAL;
+
+	__get_user_error(por_el0, &(user->poe->por_el0), err);
+	if (!err)
+		write_sysreg_s(por_el0, SYS_POR_EL0);
+
+	return err;
+}
+
 #ifdef CONFIG_ARM64_SVE

@@ -590,6 +607,7 @@ static int parse_user_sigframe(struct user_ctxs *user,
	user->tpidr2 = NULL;
	user->za = NULL;
	user->zt = NULL;
+	user->poe = NULL;

	if (!IS_ALIGNED((unsigned long)base, 16))
		goto invalid;
@@ -640,6 +658,17 @@ static int parse_user_sigframe(struct user_ctxs *user,
			/* ignore */
			break;

+		case POE_MAGIC:
+			if (!system_supports_poe())
+				goto invalid;
+
+			if (user->poe)
+				goto invalid;
+
+			user->poe = (struct poe_context __user *)head;
+			user->poe_size = size;
+			break;
+
		case SVE_MAGIC:
			if (!system_supports_sve() && !system_supports_sme())
				goto invalid;
@@ -812,6 +841,9 @@ static int restore_sigframe(struct pt_regs *regs,
	if (err == 0 && system_supports_sme2() && user.zt)
		err = restore_zt_context(&user);

+	if (err == 0 && system_supports_poe() && user.poe)
+		err = restore_poe_context(&user);
+
	return err;
 }

@@ -928,6 +960,13 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user,
		}
	}

+	if (system_supports_poe()) {
+		err = sigframe_alloc(user, &user->poe_offset,
+				     sizeof(struct poe_context));
+		if (err)
+			return err;
+	}
+
	return sigframe_alloc_end(user);
 }

@@ -968,6 +1007,15 @@ static int setup_sigframe(struct rt_sigframe_user_layout *user,
		__put_user_error(current->thread.fault_code, &esr_ctx->esr, err);
	}

+	if (system_supports_poe() && err == 0 && user->poe_offset) {
+		struct poe_context __user *poe_ctx =
+			apply_user_offset(user, user->poe_offset);
+
+		__put_user_error(POE_MAGIC, &poe_ctx->head.magic, err);
+		__put_user_error(sizeof(*poe_ctx), &poe_ctx->head.size, err);
+		__put_user_error(read_sysreg_s(SYS_POR_EL0), &poe_ctx->por_el0, err);
+	}
+
	/* Scalable Vector Extension state (including streaming), if present */
	if ((system_supports_sve() || system_supports_sme()) &&
	    err == 0 && user->sve_offset) {
@@ -1119,6 +1167,9 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
		sme_smstop();
	}

+	if (system_supports_poe())
+		write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0);
+
	if (ka->sa.sa_flags & SA_RESTORER)
		sigtramp = ka->sa.sa_restorer;
	else
--
2.25.1
The 12/11/2023 18:53, Catalin Marinas wrote:
+ Szabolcs for libc ack (and keeping the full patch quoted below)
You should cc Szabolcs when reposting, we need his ack on the UAPI changes.
On Fri, Nov 24, 2023 at 04:35:00PM +0000, Joey Gouly wrote:
Add PKEY support to signals, by saving and restoring POR_EL0 from the stackframe.
this looks good.
Acked-by: Szabolcs Nagy szabolcs.nagy@arm.com
Signed-off-by: Joey Gouly joey.gouly@arm.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will@kernel.org Reviewed-by: Mark Brown broonie@kernel.org
 arch/arm64/include/uapi/asm/sigcontext.h |  7 ++++
 arch/arm64/kernel/signal.c               | 51 ++++++++++++++++++++++++
 2 files changed, 58 insertions(+)
diff --git a/arch/arm64/include/uapi/asm/sigcontext.h b/arch/arm64/include/uapi/asm/sigcontext.h
index f23c1dc3f002..cef85eeaf541 100644
--- a/arch/arm64/include/uapi/asm/sigcontext.h
+++ b/arch/arm64/include/uapi/asm/sigcontext.h
@@ -98,6 +98,13 @@ struct esr_context {
	__u64 esr;
 };

+#define POE_MAGIC	0x504f4530
+
+struct poe_context {
+	struct _aarch64_ctx head;
+	__u64 por_el0;
+};
+
 /*
  * extra_context: describes extra space in the signal frame for
  * additional structures that don't fit in sigcontext.__reserved[].

diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 0e8beb3349ea..379f364005bf 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -62,6 +62,7 @@ struct rt_sigframe_user_layout {
	unsigned long zt_offset;
	unsigned long extra_offset;
	unsigned long end_offset;
+	unsigned long poe_offset;
 };

 #define BASE_SIGFRAME_SIZE round_up(sizeof(struct rt_sigframe), 16)
@@ -182,6 +183,8 @@ struct user_ctxs {
	u32 za_size;
	struct zt_context __user *zt;
	u32 zt_size;
+	struct poe_context __user *poe;
+	u32 poe_size;
 };

 static int preserve_fpsimd_context(struct fpsimd_context __user *ctx)
@@ -227,6 +230,20 @@ static int restore_fpsimd_context(struct user_ctxs *user)
	return err ? -EFAULT : 0;
 }

+static int restore_poe_context(struct user_ctxs *user)
+{
+	u64 por_el0;
+	int err = 0;
+
+	if (user->poe_size != sizeof(*user->poe))
+		return -EINVAL;
+
+	__get_user_error(por_el0, &(user->poe->por_el0), err);
+	if (!err)
+		write_sysreg_s(por_el0, SYS_POR_EL0);
+
+	return err;
+}
+
 #ifdef CONFIG_ARM64_SVE

@@ -590,6 +607,7 @@ static int parse_user_sigframe(struct user_ctxs *user,
	user->tpidr2 = NULL;
	user->za = NULL;
	user->zt = NULL;
+	user->poe = NULL;

	if (!IS_ALIGNED((unsigned long)base, 16))
		goto invalid;
@@ -640,6 +658,17 @@ static int parse_user_sigframe(struct user_ctxs *user,
			/* ignore */
			break;

+		case POE_MAGIC:
+			if (!system_supports_poe())
+				goto invalid;
+
+			if (user->poe)
+				goto invalid;
+
+			user->poe = (struct poe_context __user *)head;
+			user->poe_size = size;
+			break;
+
		case SVE_MAGIC:
			if (!system_supports_sve() && !system_supports_sme())
				goto invalid;
@@ -812,6 +841,9 @@ static int restore_sigframe(struct pt_regs *regs,
	if (err == 0 && system_supports_sme2() && user.zt)
		err = restore_zt_context(&user);

+	if (err == 0 && system_supports_poe() && user.poe)
+		err = restore_poe_context(&user);
+
	return err;
 }

@@ -928,6 +960,13 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user,
		}
	}

+	if (system_supports_poe()) {
+		err = sigframe_alloc(user, &user->poe_offset,
+				     sizeof(struct poe_context));
+		if (err)
+			return err;
+	}
+
	return sigframe_alloc_end(user);
 }

@@ -968,6 +1007,15 @@ static int setup_sigframe(struct rt_sigframe_user_layout *user,
		__put_user_error(current->thread.fault_code, &esr_ctx->esr, err);
	}

+	if (system_supports_poe() && err == 0 && user->poe_offset) {
+		struct poe_context __user *poe_ctx =
+			apply_user_offset(user, user->poe_offset);
+
+		__put_user_error(POE_MAGIC, &poe_ctx->head.magic, err);
+		__put_user_error(sizeof(*poe_ctx), &poe_ctx->head.size, err);
+		__put_user_error(read_sysreg_s(SYS_POR_EL0), &poe_ctx->por_el0, err);
+	}
+
	/* Scalable Vector Extension state (including streaming), if present */
	if ((system_supports_sve() || system_supports_sme()) &&
	    err == 0 && user->sve_offset) {
@@ -1119,6 +1167,9 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
		sme_smstop();
	}

+	if (system_supports_poe())
+		write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0);
+
	if (ka->sa.sa_flags & SA_RESTORER)
		sigtramp = ka->sa.sa_restorer;
	else
--
2.25.1
Now that PKEYs support has been implemented, enable it for CPUs that support S1POE.
Signed-off-by: Joey Gouly joey.gouly@arm.com
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Will Deacon will@kernel.org
---
 arch/arm64/include/asm/pkeys.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/pkeys.h b/arch/arm64/include/asm/pkeys.h
index a80c654da93d..23c473300058 100644
--- a/arch/arm64/include/asm/pkeys.h
+++ b/arch/arm64/include/asm/pkeys.h
@@ -17,7 +17,7 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,

 static inline bool arch_pkeys_enabled(void)
 {
-	return false;
+	return system_supports_poe();
 }

 static inline int vma_pkey(struct vm_area_struct *vma)
On Fri, Nov 24, 2023 at 04:35:01PM +0000, Joey Gouly wrote:
Now that PKEYs support has been implemented, enable it for CPUs that support S1POE.
Signed-off-by: Joey Gouly joey.gouly@arm.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will@kernel.org
Acked-by: Catalin Marinas catalin.marinas@arm.com
Set the EL0/userspace indirection encodings to be the overlay-enabled variants of the permissions.
Signed-off-by: Joey Gouly joey.gouly@arm.com
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Will Deacon will@kernel.org
---
 arch/arm64/include/asm/pgtable-prot.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index e9624f6326dd..3007208e04aa 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -137,10 +137,10 @@ extern bool arm64_use_ng_mappings;

 #define PIE_E0	( \
	PIRx_ELx_PERM(pte_pi_index(_PAGE_EXECONLY),      PIE_X_O) | \
-	PIRx_ELx_PERM(pte_pi_index(_PAGE_READONLY_EXEC), PIE_RX)  | \
-	PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED_EXEC),   PIE_RWX) | \
-	PIRx_ELx_PERM(pte_pi_index(_PAGE_READONLY),      PIE_R)   | \
-	PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED),        PIE_RW))
+	PIRx_ELx_PERM(pte_pi_index(_PAGE_READONLY_EXEC), PIE_RX_O)  | \
+	PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED_EXEC),   PIE_RWX_O) | \
+	PIRx_ELx_PERM(pte_pi_index(_PAGE_READONLY),      PIE_R_O)   | \
+	PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED),        PIE_RW_O))

 #define PIE_E1	( \
	PIRx_ELx_PERM(pte_pi_index(_PAGE_EXECONLY),      PIE_NONE_O) | \
On Fri, Nov 24, 2023 at 04:35:02PM +0000, Joey Gouly wrote:
Set the EL0/userspace indirection encodings to be the overlay-enabled variants of the permissions.
Signed-off-by: Joey Gouly joey.gouly@arm.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will@kernel.org
 arch/arm64/include/asm/pgtable-prot.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index e9624f6326dd..3007208e04aa 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -137,10 +137,10 @@ extern bool arm64_use_ng_mappings;

 #define PIE_E0	( \
	PIRx_ELx_PERM(pte_pi_index(_PAGE_EXECONLY),      PIE_X_O) | \
-	PIRx_ELx_PERM(pte_pi_index(_PAGE_READONLY_EXEC), PIE_RX)  | \
-	PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED_EXEC),   PIE_RWX) | \
-	PIRx_ELx_PERM(pte_pi_index(_PAGE_READONLY),      PIE_R)   | \
-	PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED),        PIE_RW))
+	PIRx_ELx_PERM(pte_pi_index(_PAGE_READONLY_EXEC), PIE_RX_O)  | \
+	PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED_EXEC),   PIE_RWX_O) | \
+	PIRx_ELx_PERM(pte_pi_index(_PAGE_READONLY),      PIE_R_O)   | \
+	PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED),        PIE_RW_O))

 #define PIE_E1	( \
	PIRx_ELx_PERM(pte_pi_index(_PAGE_EXECONLY),      PIE_NONE_O) | \
Don't we need to do this for PIE_E1? Or do we consider the futex (LDXR/STXR) accesses not checked by POE? That's fine by me if we go this route but we should document it. The alternative is to enable overlay variants in PIE_E1 but we need to reserve a POE key for the kernel to use.
Add a regset for POE containing POR_EL0.
Signed-off-by: Joey Gouly joey.gouly@arm.com
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Will Deacon will@kernel.org
---
 arch/arm64/kernel/ptrace.c | 46 ++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/elf.h   |  1 +
 2 files changed, 47 insertions(+)
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 20d7ef82de90..c3257a5c97f1 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -1409,6 +1409,39 @@ static int tagged_addr_ctrl_set(struct task_struct *target, const struct
 }
 #endif

+#ifdef CONFIG_ARM64_POE
+static int poe_get(struct task_struct *target,
+		   const struct user_regset *regset,
+		   struct membuf to)
+{
+	if (!system_supports_poe())
+		return -EINVAL;
+
+	return membuf_write(&to, &target->thread.por_el0,
+			    sizeof(target->thread.por_el0));
+}
+
+static int poe_set(struct task_struct *target, const struct
+		   user_regset *regset, unsigned int pos,
+		   unsigned int count, const void *kbuf, const
+		   void __user *ubuf)
+{
+	int ret;
+	long ctrl;
+
+	if (!system_supports_poe())
+		return -EINVAL;
+
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &ctrl, 0, -1);
+	if (ret)
+		return ret;
+
+	target->thread.por_el0 = ctrl;
+
+	return 0;
+}
+#endif
+
 enum aarch64_regset {
	REGSET_GPR,
	REGSET_FPR,
@@ -1437,6 +1470,9 @@ enum aarch64_regset {
 #ifdef CONFIG_ARM64_TAGGED_ADDR_ABI
	REGSET_TAGGED_ADDR_CTRL,
 #endif
+#ifdef CONFIG_ARM64_POE
+	REGSET_POE
+#endif
 };

 static const struct user_regset aarch64_regsets[] = {
@@ -1587,6 +1623,16 @@ static const struct user_regset aarch64_regsets[] = {
		.set = tagged_addr_ctrl_set,
	},
 #endif
+#ifdef CONFIG_ARM64_POE
+	[REGSET_POE] = {
+		.core_note_type = NT_ARM_POE,
+		.n = 1,
+		.size = sizeof(long),
+		.align = sizeof(long),
+		.regset_get = poe_get,
+		.set = poe_set,
+	},
+#endif
 };

 static const struct user_regset_view user_aarch64_view = {

diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index 9417309b7230..f2713efcd81b 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -440,6 +440,7 @@ typedef struct elf64_shdr {
 #define NT_ARM_SSVE	0x40b		/* ARM Streaming SVE registers */
 #define NT_ARM_ZA	0x40c		/* ARM SME ZA registers */
 #define NT_ARM_ZT	0x40d		/* ARM SME ZT registers */
+#define NT_ARM_POE	0x40f		/* ARM POE registers */
 #define NT_ARC_V2	0x600		/* ARCv2 accumulator/extra registers */
 #define NT_VMCOREDD	0x700		/* Vmcore Device Dump Note */
 #define NT_MIPS_DSP	0x800		/* MIPS DSP ASE registers */
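A hedged userspace sketch of exercising the new regset from a tracer; the helper name is illustrative, and per the patch it fails with EINVAL on CPUs without FEAT_S1POE:

#include <stdint.h>
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/uio.h>
#include <elf.h>

#ifndef NT_ARM_POE
#define NT_ARM_POE 0x40f	/* value from this patch */
#endif

static long read_por_el0(pid_t pid, uint64_t *por)
{
	struct iovec iov = { .iov_base = por, .iov_len = sizeof(*por) };

	/* The tracee must already be ptrace-attached and stopped. */
	return ptrace(PTRACE_GETREGSET, pid, (void *)NT_ARM_POE, &iov);
}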
On Fri, Nov 24, 2023 at 04:35:03PM +0000, Joey Gouly wrote:
Add a regset for POE containing POR_EL0.
+++ b/include/uapi/linux/elf.h
@@ -440,6 +440,7 @@ typedef struct elf64_shdr {
 #define NT_ARM_SSVE	0x40b		/* ARM Streaming SVE registers */
 #define NT_ARM_ZA	0x40c		/* ARM SME ZA registers */
 #define NT_ARM_ZT	0x40d		/* ARM SME ZT registers */
+#define NT_ARM_POE	0x40f		/* ARM POE registers */
Not 0x40e?
Otherwise
Reviewed-by: Mark Brown broonie@kernel.org
On Fri, Nov 24, 2023 at 04:35:03PM +0000, Joey Gouly wrote:
Add a regset for POE containing POR_EL0.
Signed-off-by: Joey Gouly joey.gouly@arm.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will@kernel.org
Reviewed-by: Catalin Marinas catalin.marinas@arm.com
Put this function in the header so that it can be used by other tests, without needing to link to testcases.c.
Signed-off-by: Joey Gouly joey.gouly@arm.com
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Will Deacon will@kernel.org
Cc: Andrew Morton akpm@linux-foundation.org
Cc: Shuah Khan shuah@kernel.org
Cc: Dave Hansen dave.hansen@linux.intel.com
Cc: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
---
 .../arm64/signal/testcases/testcases.c       | 23 -----------------
 .../arm64/signal/testcases/testcases.h       | 25 +++++++++++++++++--
 2 files changed, 23 insertions(+), 25 deletions(-)
diff --git a/tools/testing/selftests/arm64/signal/testcases/testcases.c b/tools/testing/selftests/arm64/signal/testcases/testcases.c
index 9f580b55b388..fe950b6bca6b 100644
--- a/tools/testing/selftests/arm64/signal/testcases/testcases.c
+++ b/tools/testing/selftests/arm64/signal/testcases/testcases.c
@@ -6,29 +6,6 @@

 #include "testcases.h"

-struct _aarch64_ctx *get_header(struct _aarch64_ctx *head, uint32_t magic,
-				size_t resv_sz, size_t *offset)
-{
-	size_t offs = 0;
-	struct _aarch64_ctx *found = NULL;
-
-	if (!head || resv_sz < HDR_SZ)
-		return found;
-
-	while (offs <= resv_sz - HDR_SZ &&
-	       head->magic != magic && head->magic) {
-		offs += head->size;
-		head = GET_RESV_NEXT_HEAD(head);
-	}
-	if (head->magic == magic) {
-		found = head;
-		if (offset)
-			*offset = offs;
-	}
-
-	return found;
-}
-
 bool validate_extra_context(struct extra_context *extra, char **err,
			    void **extra_data, size_t *extra_size)
 {

diff --git a/tools/testing/selftests/arm64/signal/testcases/testcases.h b/tools/testing/selftests/arm64/signal/testcases/testcases.h
index a08ab0d6207a..d33154c9a4bd 100644
--- a/tools/testing/selftests/arm64/signal/testcases/testcases.h
+++ b/tools/testing/selftests/arm64/signal/testcases/testcases.h
@@ -87,8 +87,29 @@ struct fake_sigframe {

 bool validate_reserved(ucontext_t *uc, size_t resv_sz, char **err);

-struct _aarch64_ctx *get_header(struct _aarch64_ctx *head, uint32_t magic,
-				size_t resv_sz, size_t *offset);
+static inline struct _aarch64_ctx *get_header(struct _aarch64_ctx *head, uint32_t magic,
+					      size_t resv_sz, size_t *offset)
+{
+	size_t offs = 0;
+	struct _aarch64_ctx *found = NULL;
+
+	if (!head || resv_sz < HDR_SZ)
+		return found;
+
+	while (offs <= resv_sz - HDR_SZ &&
+	       head->magic != magic && head->magic) {
+		offs += head->size;
+		head = GET_RESV_NEXT_HEAD(head);
+	}
+	if (head->magic == magic) {
+		found = head;
+		if (offset)
+			*offset = offs;
+	}
+
+	return found;
+}
+
 static inline struct _aarch64_ctx *get_terminator(struct _aarch64_ctx *head,
						  size_t resv_sz,
On Fri, Nov 24, 2023 at 04:35:04PM +0000, Joey Gouly wrote:
Put this function in the header so that it can be used by other tests, without needing to link to testcases.c.
It would've been good to explain a bit more of the context here but
Reviewed-by: Mark Brown broonie@kernel.org
arm64's fpregs are not at a constant offset from sigcontext. Since this is not an important part of the test, don't print the fpregs pointer on arm64.
Signed-off-by: Joey Gouly joey.gouly@arm.com
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Will Deacon will@kernel.org
Cc: Andrew Morton akpm@linux-foundation.org
Cc: Shuah Khan shuah@kernel.org
Cc: Dave Hansen dave.hansen@linux.intel.com
Cc: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
Acked-by: Dave Hansen dave.hansen@linux.intel.com
---
 tools/testing/selftests/mm/pkey-powerpc.h    | 1 +
 tools/testing/selftests/mm/pkey-x86.h        | 2 ++
 tools/testing/selftests/mm/protection_keys.c | 6 ++++++
 3 files changed, 9 insertions(+)
diff --git a/tools/testing/selftests/mm/pkey-powerpc.h b/tools/testing/selftests/mm/pkey-powerpc.h
index ae5df26104e5..6275d0f474b3 100644
--- a/tools/testing/selftests/mm/pkey-powerpc.h
+++ b/tools/testing/selftests/mm/pkey-powerpc.h
@@ -9,6 +9,7 @@
 #endif
 #define REG_IP_IDX		PT_NIP
 #define REG_TRAPNO		PT_TRAP
+#define MCONTEXT_FPREGS
 #define gregs			gp_regs
 #define fpregs			fp_regs
 #define si_pkey_offset		0x20
diff --git a/tools/testing/selftests/mm/pkey-x86.h b/tools/testing/selftests/mm/pkey-x86.h
index 814758e109c0..b9170a26bfcb 100644
--- a/tools/testing/selftests/mm/pkey-x86.h
+++ b/tools/testing/selftests/mm/pkey-x86.h
@@ -15,6 +15,8 @@
 
 #endif
 
+#define MCONTEXT_FPREGS
+
 #ifndef PKEY_DISABLE_ACCESS
 # define PKEY_DISABLE_ACCESS	0x1
 #endif
diff --git a/tools/testing/selftests/mm/protection_keys.c b/tools/testing/selftests/mm/protection_keys.c
index 48dc151f8fca..b3dbd76ea27c 100644
--- a/tools/testing/selftests/mm/protection_keys.c
+++ b/tools/testing/selftests/mm/protection_keys.c
@@ -314,7 +314,9 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext)
 	ucontext_t *uctxt = vucontext;
 	int trapno;
 	unsigned long ip;
+#ifdef MCONTEXT_FPREGS
 	char *fpregs;
+#endif
 #if defined(__i386__) || defined(__x86_64__) /* arch */
 	u32 *pkey_reg_ptr;
 	int pkey_reg_offset;
@@ -330,7 +332,9 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext)
 
 	trapno = uctxt->uc_mcontext.gregs[REG_TRAPNO];
 	ip = uctxt->uc_mcontext.gregs[REG_IP_IDX];
+#ifdef MCONTEXT_FPREGS
 	fpregs = (char *) uctxt->uc_mcontext.fpregs;
+#endif
 
 	dprintf2("%s() trapno: %d ip: 0x%016lx info->si_code: %s/%d\n",
 		__func__, trapno, ip, si_code_str(si->si_code),
@@ -359,7 +363,9 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext)
 #endif /* arch */
 
 	dprintf1("siginfo: %p\n", si);
+#ifdef MCONTEXT_FPREGS
 	dprintf1(" fpregs: %p\n", fpregs);
+#endif
 
 	if ((si->si_code == SEGV_MAPERR) ||
 	    (si->si_code == SEGV_ACCERR) ||
The encoding of the pkey register differs on arm64 from that on x86/ppc. On those platforms, a set bit in the register disables a permission; on arm64, a set bit indicates that the permission is allowed.

This drops two asserts of the form:

	assert(read_pkey_reg() <= orig_pkey_reg);

because on arm64 this doesn't hold, due to the encoding.

The pkey must be reset to both access-allowed and write-allowed in the signal handler. pkey_access_allow() currently works for PowerPC because PKEY_DISABLE_ACCESS and PKEY_DISABLE_WRITE have overlapping bits set.

Access to the uc_mcontext is abstracted, as arm64 has a different structure.
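[Editorial note: an illustrative comparison of the two encodings, using the POE_* and PKEY_BITS_PER_PKEY values introduced by this patch and the usual x86 PKRU access-disable/write-disable bits; it is a sketch, not code from the series.]

	/*
	 * x86 PKRU: 2 bits per key, a *set* bit removes a right
	 * (AD = 0x1, WD = 0x2), so all-zeroes means everything allowed.
	 * Denying writes via pkey 1 *raises* the register value:
	 */
	u64 pkru = 0x2ULL << (1 * 2);

	/*
	 * arm64 POR_EL0: 4 bits per key, a *set* bit grants a right
	 * (POE_RWX = 0x7), so all-zeroes means nothing allowed. Denying
	 * writes means shrinking the allow-mask to POE_RX (0x3), which
	 * *lowers* the register value:
	 */
	u64 por_el0 = (u64)POE_RX << (1 * PKEY_BITS_PER_PKEY);

	/*
	 * Hence assertions that disabling permissions can only increase
	 * the register value hold on x86/ppc but not on arm64, which is
	 * why the asserts above are dropped.
	 */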
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
---
 .../arm64/signal/testcases/testcases.h        |   3 +
 tools/testing/selftests/mm/Makefile           |   2 +-
 tools/testing/selftests/mm/pkey-arm64.h       | 139 ++++++++++++++++++
 tools/testing/selftests/mm/pkey-helpers.h     |   8 +
 tools/testing/selftests/mm/pkey-powerpc.h     |   2 +
 tools/testing/selftests/mm/pkey-x86.h         |   2 +
 tools/testing/selftests/mm/protection_keys.c  | 103 +++++++++++--
 7 files changed, 247 insertions(+), 12 deletions(-)
 create mode 100644 tools/testing/selftests/mm/pkey-arm64.h
diff --git a/tools/testing/selftests/arm64/signal/testcases/testcases.h b/tools/testing/selftests/arm64/signal/testcases/testcases.h
index d33154c9a4bd..e445027d5ec2 100644
--- a/tools/testing/selftests/arm64/signal/testcases/testcases.h
+++ b/tools/testing/selftests/arm64/signal/testcases/testcases.h
@@ -25,6 +25,9 @@
 #define HDR_SZ \
 	sizeof(struct _aarch64_ctx)
 
+#define GET_UC_RESV_HEAD(uc) \
+	(struct _aarch64_ctx *)(&(uc->uc_mcontext.__reserved))
+
 #define GET_SF_RESV_HEAD(sf) \
 	(struct _aarch64_ctx *)(&(sf).uc.uc_mcontext.__reserved)
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 78dfec8bc676..33922ae4bb6e 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -97,7 +97,7 @@ TEST_GEN_FILES += $(BINARIES_64)
 endif
 else
 
-ifneq (,$(findstring $(ARCH),ppc64))
+ifneq (,$(filter $(ARCH),arm64 ppc64))
 TEST_GEN_FILES += protection_keys
 endif
diff --git a/tools/testing/selftests/mm/pkey-arm64.h b/tools/testing/selftests/mm/pkey-arm64.h
new file mode 100644
index 000000000000..2861564f6415
--- /dev/null
+++ b/tools/testing/selftests/mm/pkey-arm64.h
@@ -0,0 +1,139 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2023 Arm Ltd.
+ */
+
+#ifndef _PKEYS_ARM64_H
+#define _PKEYS_ARM64_H
+
+#include "vm_util.h"
+/* for signal frame parsing */
+#include "../arm64/signal/testcases/testcases.h"
+
+#ifndef SYS_mprotect_key
+# define SYS_mprotect_key	288
+#endif
+#ifndef SYS_pkey_alloc
+# define SYS_pkey_alloc		289
+# define SYS_pkey_free		290
+#endif
+#define MCONTEXT_IP(mc)		mc.pc
+#define MCONTEXT_TRAPNO(mc)	-1
+
+#define PKEY_MASK		0xf
+
+#define POE_NONE		0x0
+#define POE_X			0x2
+#define POE_RX			0x3
+#define POE_RWX			0x7
+
+#define NR_PKEYS		7
+#define NR_RESERVED_PKEYS	1 /* pkey-0 */
+
+#define PKEY_ALLOW_ALL		0x77777777
+
+#define PKEY_BITS_PER_PKEY	4
+#define PAGE_SIZE		sysconf(_SC_PAGESIZE)
+#undef  HPAGE_SIZE
+#define HPAGE_SIZE		default_huge_page_size()
+
+/* 4-byte instructions * 16384 = 64K page */
+#define __page_o_noops() asm(".rept 16384 ; nop; .endr")
+
+static inline u64 __read_pkey_reg(void)
+{
+	u64 pkey_reg = 0;
+
+	// POR_EL0
+	asm volatile("mrs %0, S3_3_c10_c2_4" : "=r" (pkey_reg));
+
+	return pkey_reg;
+}
+
+static inline void __write_pkey_reg(u64 pkey_reg)
+{
+	u64 por = pkey_reg;
+
+	dprintf4("%s() changing %016llx to %016llx\n",
+			__func__, __read_pkey_reg(), pkey_reg);
+
+	// POR_EL0
+	asm volatile("msr S3_3_c10_c2_4, %0\nisb" :: "r" (por) :);
+
+	dprintf4("%s() pkey register after changing %016llx to %016llx\n",
+			__func__, __read_pkey_reg(), pkey_reg);
+}
+
+static inline int cpu_has_pkeys(void)
+{
+	/* No simple way to determine this */
+	return 1;
+}
+
+static inline u32 pkey_bit_position(int pkey)
+{
+	return pkey * PKEY_BITS_PER_PKEY;
+}
+
+static inline int get_arch_reserved_keys(void)
+{
+	return NR_RESERVED_PKEYS;
+}
+
+void expect_fault_on_read_execonly_key(void *p1, int pkey)
+{
+}
+
+void *malloc_pkey_with_mprotect_subpage(long size, int prot, u16 pkey)
+{
+	return PTR_ERR_ENOTSUP;
+}
+
+#define set_pkey_bits	set_pkey_bits
+static inline u64 set_pkey_bits(u64 reg, int pkey, u64 flags)
+{
+	u32 shift = pkey_bit_position(pkey);
+	u64 new_val = POE_RWX;
+
+	/* mask out bits from pkey in old value */
+	reg &= ~((u64)PKEY_MASK << shift);
+
+	if (flags & PKEY_DISABLE_ACCESS)
+		new_val = POE_X;
+	else if (flags & PKEY_DISABLE_WRITE)
+		new_val = POE_RX;
+
+	/* OR in new bits for pkey */
+	reg |= new_val << shift;
+
+	return reg;
+}
+
+#define get_pkey_bits	get_pkey_bits
+static inline u64 get_pkey_bits(u64 reg, int pkey)
+{
+	u32 shift = pkey_bit_position(pkey);
+	/*
+	 * shift down the relevant bits to the lowest two, then
+	 * mask off all the other higher bits
+	 */
+	u32 perm = (reg >> shift) & PKEY_MASK;
+
+	if (perm == POE_X)
+		return PKEY_DISABLE_ACCESS;
+	if (perm == POE_RX)
+		return PKEY_DISABLE_WRITE;
+	return 0;
+}
+
+static void aarch64_write_signal_pkey(ucontext_t *uctxt, u64 pkey)
+{
+	struct _aarch64_ctx *ctx = GET_UC_RESV_HEAD(uctxt);
+	struct poe_context *poe_ctx =
+		(struct poe_context *) get_header(ctx, POE_MAGIC,
+						sizeof(uctxt->uc_mcontext), NULL);
+	if (poe_ctx)
+		poe_ctx->por_el0 = pkey;
+}
+
+#endif /* _PKEYS_ARM64_H */
diff --git a/tools/testing/selftests/mm/pkey-helpers.h b/tools/testing/selftests/mm/pkey-helpers.h
index 1af3156a9db8..15608350fc01 100644
--- a/tools/testing/selftests/mm/pkey-helpers.h
+++ b/tools/testing/selftests/mm/pkey-helpers.h
@@ -91,12 +91,17 @@ void record_pkey_malloc(void *ptr, long size, int prot);
 #include "pkey-x86.h"
 #elif defined(__powerpc64__) /* arch */
 #include "pkey-powerpc.h"
+#elif defined(__aarch64__) /* arch */
+#include "pkey-arm64.h"
 #else /* arch */
 #error Architecture not supported
 #endif /* arch */
 
+#ifndef PKEY_MASK
 #define PKEY_MASK	(PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)
+#endif
 
+#ifndef set_pkey_bits
 static inline u64 set_pkey_bits(u64 reg, int pkey, u64 flags)
 {
 	u32 shift = pkey_bit_position(pkey);
@@ -106,7 +111,9 @@ static inline u64 set_pkey_bits(u64 reg, int pkey, u64 flags)
 	reg |= (flags & PKEY_MASK) << shift;
 	return reg;
 }
+#endif
 
+#ifndef get_pkey_bits
 static inline u64 get_pkey_bits(u64 reg, int pkey)
 {
 	u32 shift = pkey_bit_position(pkey);
@@ -116,6 +123,7 @@ static inline u64 get_pkey_bits(u64 reg, int pkey)
 	 */
 	return ((reg >> shift) & PKEY_MASK);
 }
+#endif
 
 extern u64 shadow_pkey_reg;
diff --git a/tools/testing/selftests/mm/pkey-powerpc.h b/tools/testing/selftests/mm/pkey-powerpc.h
index 6275d0f474b3..3d0c0bdae5bc 100644
--- a/tools/testing/selftests/mm/pkey-powerpc.h
+++ b/tools/testing/selftests/mm/pkey-powerpc.h
@@ -8,6 +8,8 @@
 # define SYS_pkey_free		385
 #endif
 #define REG_IP_IDX		PT_NIP
+#define MCONTEXT_IP(mc)		mc.gp_regs[REG_IP_IDX]
+#define MCONTEXT_TRAPNO(mc)	mc.gp_regs[REG_TRAPNO]
 #define REG_TRAPNO		PT_TRAP
 #define MCONTEXT_FPREGS
 #define gregs			gp_regs
diff --git a/tools/testing/selftests/mm/pkey-x86.h b/tools/testing/selftests/mm/pkey-x86.h
index b9170a26bfcb..5f28e26a2511 100644
--- a/tools/testing/selftests/mm/pkey-x86.h
+++ b/tools/testing/selftests/mm/pkey-x86.h
@@ -15,6 +15,8 @@
 
 #endif
 
+#define MCONTEXT_IP(mc)		mc.gregs[REG_IP_IDX]
+#define MCONTEXT_TRAPNO(mc)	mc.gregs[REG_TRAPNO]
 #define MCONTEXT_FPREGS
 
 #ifndef PKEY_DISABLE_ACCESS
diff --git a/tools/testing/selftests/mm/protection_keys.c b/tools/testing/selftests/mm/protection_keys.c
index b3dbd76ea27c..989fdf489e33 100644
--- a/tools/testing/selftests/mm/protection_keys.c
+++ b/tools/testing/selftests/mm/protection_keys.c
@@ -147,7 +147,7 @@ void abort_hooks(void)
  * will then fault, which makes sure that the fault code handles
  * execute-only memory properly.
  */
-#ifdef __powerpc64__
+#if defined(__powerpc64__) || defined(__aarch64__)
 /* This way, both 4K and 64K alignment are maintained */
 __attribute__((__aligned__(65536)))
 #else
@@ -212,7 +212,6 @@ void pkey_disable_set(int pkey, int flags)
 	unsigned long syscall_flags = 0;
 	int ret;
 	int pkey_rights;
-	u64 orig_pkey_reg = read_pkey_reg();
 
 	dprintf1("START->%s(%d, 0x%x)\n", __func__,
 		pkey, flags);
@@ -242,8 +241,6 @@ void pkey_disable_set(int pkey, int flags)
 
 	dprintf1("%s(%d) pkey_reg: 0x%016llx\n",
 		__func__, pkey, read_pkey_reg());
-	if (flags)
-		pkey_assert(read_pkey_reg() >= orig_pkey_reg);
 	dprintf1("END<---%s(%d, 0x%x)\n", __func__,
 		pkey, flags);
 }
@@ -253,7 +250,6 @@ void pkey_disable_clear(int pkey, int flags)
 	unsigned long syscall_flags = 0;
 	int ret;
 	int pkey_rights = hw_pkey_get(pkey, syscall_flags);
-	u64 orig_pkey_reg = read_pkey_reg();
 
 	pkey_assert(flags & (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE));
 
@@ -273,8 +269,6 @@ void pkey_disable_clear(int pkey, int flags)
 
 	dprintf1("%s(%d) pkey_reg: 0x%016llx\n", __func__,
 			pkey, read_pkey_reg());
-	if (flags)
-		assert(read_pkey_reg() <= orig_pkey_reg);
 }
 
 void pkey_write_allow(int pkey)
@@ -330,8 +324,8 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext)
 		__func__, __LINE__,
 		__read_pkey_reg(), shadow_pkey_reg);
 
-	trapno = uctxt->uc_mcontext.gregs[REG_TRAPNO];
-	ip = uctxt->uc_mcontext.gregs[REG_IP_IDX];
+	trapno = MCONTEXT_TRAPNO(uctxt->uc_mcontext);
+	ip = MCONTEXT_IP(uctxt->uc_mcontext);
 #ifdef MCONTEXT_FPREGS
 	fpregs = (char *) uctxt->uc_mcontext.fpregs;
 #endif
@@ -395,6 +389,8 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext)
 #elif defined(__powerpc64__) /* arch */
 	/* restore access and let the faulting instruction continue */
 	pkey_access_allow(siginfo_pkey);
+#elif defined(__aarch64__)
+	aarch64_write_signal_pkey(uctxt, PKEY_ALLOW_ALL);
 #endif /* arch */
 	pkey_faults++;
 	dprintf1("<<<<==================================================\n");
@@ -908,7 +904,9 @@ void expected_pkey_fault(int pkey)
 	 * test program continue.  We now have to restore it.
 	 */
 	if (__read_pkey_reg() != 0)
-#else /* arch */
+#elif defined(__aarch64__)
+	if (__read_pkey_reg() != PKEY_ALLOW_ALL)
+#else
 	if (__read_pkey_reg() != shadow_pkey_reg)
 #endif /* arch */
 		pkey_assert(0);
@@ -1498,6 +1496,11 @@ void test_executing_on_unreadable_memory(int *ptr, u16 pkey)
 	lots_o_noops_around_write(&scratch);
 	do_not_expect_pkey_fault("executing on PROT_EXEC memory");
 	expect_fault_on_read_execonly_key(p1, pkey);
+
+	// Reset back to PROT_EXEC | PROT_READ for architectures that support
+	// non-PKEY execute-only permissions.
+	ret = mprotect_pkey(p1, PAGE_SIZE, PROT_EXEC | PROT_READ, (u64)pkey);
+	pkey_assert(!ret);
 }
 
 void test_implicit_mprotect_exec_only_memory(int *ptr, u16 pkey)
@@ -1671,6 +1674,84 @@ void test_ptrace_modifies_pkru(int *ptr, u16 pkey)
 }
 #endif
 
+#if defined(__aarch64__)
+void test_ptrace_modifies_pkru(int *ptr, u16 pkey)
+{
+	pid_t child;
+	int status, ret;
+	struct iovec iov;
+	u64 trace_pkey;
+	/* Just a random pkey value.. */
+	u64 new_pkey = (POE_X << PKEY_BITS_PER_PKEY * 2) |
+		       (POE_NONE << PKEY_BITS_PER_PKEY) |
+		       POE_RWX;
+
+	child = fork();
+	pkey_assert(child >= 0);
+	dprintf3("[%d] fork() ret: %d\n", getpid(), child);
+	if (!child) {
+		ptrace(PTRACE_TRACEME, 0, 0, 0);
+
+		/* Stop and allow the tracer to modify PKRU directly */
+		raise(SIGSTOP);
+
+		/*
+		 * need __read_pkey_reg() version so we do not do shadow_pkey_reg
+		 * checking
+		 */
+		if (__read_pkey_reg() != new_pkey)
+			exit(1);
+
+		raise(SIGSTOP);
+
+		exit(0);
+	}
+
+	pkey_assert(child == waitpid(child, &status, 0));
+	dprintf3("[%d] waitpid(%d) status: %x\n", getpid(), child, status);
+	pkey_assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP);
+
+	iov.iov_base = &trace_pkey;
+	iov.iov_len = 8;
+	ret = ptrace(PTRACE_GETREGSET, child, (void *)NT_ARM_POE, &iov);
+	pkey_assert(ret == 0);
+	pkey_assert(trace_pkey == read_pkey_reg());
+
+	trace_pkey = new_pkey;
+
+	ret = ptrace(PTRACE_SETREGSET, child, (void *)NT_ARM_POE, &iov);
+	pkey_assert(ret == 0);
+
+	/* Test that the modification is visible in ptrace before any execution */
+	memset(&trace_pkey, 0, sizeof(trace_pkey));
+	ret = ptrace(PTRACE_GETREGSET, child, (void *)NT_ARM_POE, &iov);
+	pkey_assert(ret == 0);
+	pkey_assert(trace_pkey == new_pkey);
+
+	/* Execute the tracee */
+	ret = ptrace(PTRACE_CONT, child, 0, 0);
+	pkey_assert(ret == 0);
+
+	/* Test that the tracee saw the PKRU value change */
+	pkey_assert(child == waitpid(child, &status, 0));
+	dprintf3("[%d] waitpid(%d) status: %x\n", getpid(), child, status);
+	pkey_assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP);
+
+	/* Test that the modification is visible in ptrace after execution */
+	memset(&trace_pkey, 0, sizeof(trace_pkey));
+	ret = ptrace(PTRACE_GETREGSET, child, (void *)NT_ARM_POE, &iov);
+	pkey_assert(ret == 0);
+	pkey_assert(trace_pkey == new_pkey);
+
+	ret = ptrace(PTRACE_CONT, child, 0, 0);
+	pkey_assert(ret == 0);
+	pkey_assert(child == waitpid(child, &status, 0));
+	dprintf3("[%d] waitpid(%d) status: %x\n", getpid(), child, status);
+	pkey_assert(WIFEXITED(status));
+	pkey_assert(WEXITSTATUS(status) == 0);
+}
+#endif
+
 void test_mprotect_pkey_on_unsupported_cpu(int *ptr, u16 pkey)
 {
 	int size = PAGE_SIZE;
@@ -1706,7 +1787,7 @@ void (*pkey_tests[])(int *ptr, u16 pkey) = {
 	test_pkey_syscalls_bad_args,
 	test_pkey_alloc_exhaust,
 	test_pkey_alloc_free_attach_pkey0,
-#if defined(__i386__) || defined(__x86_64__)
+#if defined(__i386__) || defined(__x86_64__) || defined(__aarch64__)
 	test_ptrace_modifies_pkru,
 #endif
 };
Check that when POE is enabled, the POR_EL0 register is accessible.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
---
 tools/testing/selftests/arm64/abi/hwcap.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
diff --git a/tools/testing/selftests/arm64/abi/hwcap.c b/tools/testing/selftests/arm64/abi/hwcap.c
index 1189e77c8152..9ee7b04f3fbb 100644
--- a/tools/testing/selftests/arm64/abi/hwcap.c
+++ b/tools/testing/selftests/arm64/abi/hwcap.c
@@ -115,6 +115,12 @@ static void pmull_sigill(void)
 	asm volatile(".inst 0x0ee0e000" : : : );
 }
 
+static void poe_sigill(void)
+{
+	/* mrs x0, POR_EL0 */
+	asm volatile("mrs x0, S3_3_C10_C2_4" : : : "x0");
+}
+
 static void rng_sigill(void)
 {
 	asm volatile("mrs x0, S3_3_C2_C4_0" : : : "x0");
@@ -426,6 +432,14 @@ static const struct hwcap_data {
 		.cpuinfo = "pmull",
 		.sigill_fn = pmull_sigill,
 	},
+	{
+		.name = "POE",
+		.at_hwcap = AT_HWCAP2,
+		.hwcap_bit = HWCAP2_POE,
+		.cpuinfo = "poe",
+		.sigill_fn = poe_sigill,
+		.sigill_reliable = true,
+	},
 	{
 		.name = "RNG",
 		.at_hwcap = AT_HWCAP2,
On Fri, Nov 24, 2023 at 04:35:07PM +0000, Joey Gouly wrote:
Check that when POE is enabled, the POR_EL0 register is accessible.
Reviewed-by: Mark Brown <broonie@kernel.org>
Teach the signal frame parsing about the new POE frame; this avoids a warning when it is generated.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
---
 tools/testing/selftests/arm64/signal/testcases/testcases.c | 4 ++++
 1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/arm64/signal/testcases/testcases.c b/tools/testing/selftests/arm64/signal/testcases/testcases.c
index fe950b6bca6b..5dda753870aa 100644
--- a/tools/testing/selftests/arm64/signal/testcases/testcases.c
+++ b/tools/testing/selftests/arm64/signal/testcases/testcases.c
@@ -161,6 +161,10 @@ bool validate_reserved(ucontext_t *uc, size_t resv_sz, char **err)
 			if (head->size != sizeof(struct esr_context))
 				*err = "Bad size for esr_context";
 			break;
+		case POE_MAGIC:
+			if (head->size != sizeof(struct poe_context))
+				*err = "Bad size for poe_context";
+			break;
 		case TPIDR2_MAGIC:
 			if (head->size != sizeof(struct tpidr2_context))
 				*err = "Bad size for tpidr2_context";
Ensure that we get signal context for POR_EL0 if and only if POE is present on the system.
Copied from the TPIDR2 test.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
---
 .../testing/selftests/arm64/signal/.gitignore |  1 +
 .../arm64/signal/testcases/poe_siginfo.c      | 86 +++++++++++++++++++
 2 files changed, 87 insertions(+)
 create mode 100644 tools/testing/selftests/arm64/signal/testcases/poe_siginfo.c
diff --git a/tools/testing/selftests/arm64/signal/.gitignore b/tools/testing/selftests/arm64/signal/.gitignore
index 839e3a252629..6bcb27bd506b 100644
--- a/tools/testing/selftests/arm64/signal/.gitignore
+++ b/tools/testing/selftests/arm64/signal/.gitignore
@@ -5,6 +5,7 @@ sme_*
 ssve_*
 sve_*
 tpidr2_*
+poe_siginfo
 za_*
 zt_*
 !*.[ch]
diff --git a/tools/testing/selftests/arm64/signal/testcases/poe_siginfo.c b/tools/testing/selftests/arm64/signal/testcases/poe_siginfo.c
new file mode 100644
index 000000000000..d890029304c4
--- /dev/null
+++ b/tools/testing/selftests/arm64/signal/testcases/poe_siginfo.c
@@ -0,0 +1,86 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023 Arm Limited
+ *
+ * Verify that the POR_EL0 register context in signal frames is set up as
+ * expected.
+ */
+
+#include <signal.h>
+#include <ucontext.h>
+#include <sys/auxv.h>
+#include <sys/prctl.h>
+#include <unistd.h>
+#include <asm/sigcontext.h>
+
+#include "test_signals_utils.h"
+#include "testcases.h"
+
+static union {
+	ucontext_t uc;
+	char buf[1024 * 128];
+} context;
+
+#define SYS_POR_EL0 "S3_3_C10_C2_4"
+
+static uint64_t get_por_el0(void)
+{
+	uint64_t val;
+
+	asm volatile(
+		"mrs	%0, " SYS_POR_EL0 "\n"
+		: "=r"(val)
+		:
+		: "cc");
+
+	return val;
+}
+
+int poe_present(struct tdescr *td, siginfo_t *si, ucontext_t *uc)
+{
+	struct _aarch64_ctx *head = GET_BUF_RESV_HEAD(context);
+	struct poe_context *poe_ctx;
+	size_t offset;
+	bool in_sigframe;
+	bool have_poe;
+	__u64 orig_poe;
+
+	have_poe = getauxval(AT_HWCAP2) & HWCAP2_POE;
+	if (have_poe)
+		orig_poe = get_por_el0();
+
+	if (!get_current_context(td, &context.uc, sizeof(context)))
+		return 1;
+
+	poe_ctx = (struct poe_context *)
+		get_header(head, POE_MAGIC, td->live_sz, &offset);
+
+	in_sigframe = poe_ctx != NULL;
+
+	fprintf(stderr, "POR_EL0 sigframe %s on system %s POE\n",
+		in_sigframe ? "present" : "absent",
+		have_poe ? "with" : "without");
+
+	td->pass = (in_sigframe == have_poe);
+
+	/*
+	 * Check that the value we read back was the one present at
+	 * the time that the signal was triggered.
+	 */
+	if (have_poe && poe_ctx) {
+		if (poe_ctx->por_el0 != orig_poe) {
+			fprintf(stderr, "POR_EL0 in frame is %llx, was %llx\n",
+				poe_ctx->por_el0, orig_poe);
+			td->pass = false;
+		}
+	}
+
+	return 0;
+}
+
+struct tdescr tde = {
+	.name = "POR_EL0",
+	.descr = "Validate that POR_EL0 is present as expected",
+	.timeout = 3,
+	.run = poe_present,
+};
On Fri, Nov 24, 2023 at 04:35:09PM +0000, Joey Gouly wrote:
+++ b/tools/testing/selftests/arm64/signal/.gitignore
@@ -5,6 +5,7 @@ sme_*
 ssve_*
 sve_*
 tpidr2_*
+poe_siginfo
 za_*
 zt_*
 !*.[ch]
Please keep this sorted, otherwise
Reviewed-by: Mark Brown <broonie@kernel.org>
Add new system registers:
- POR_EL1
- POR_EL0
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
---
 tools/testing/selftests/kvm/aarch64/get-reg-list.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
diff --git a/tools/testing/selftests/kvm/aarch64/get-reg-list.c b/tools/testing/selftests/kvm/aarch64/get-reg-list.c
index 709d7d721760..ac661ebf6859 100644
--- a/tools/testing/selftests/kvm/aarch64/get-reg-list.c
+++ b/tools/testing/selftests/kvm/aarch64/get-reg-list.c
@@ -40,6 +40,18 @@ static struct feature_id_reg feat_id_regs[] = {
 		ARM64_SYS_REG(3, 0, 0, 7, 3),	/* ID_AA64MMFR3_EL1 */
 		4,
 		1
+	},
+	{
+		ARM64_SYS_REG(3, 0, 10, 2, 4),	/* POR_EL1 */
+		ARM64_SYS_REG(3, 0, 0, 7, 3),	/* ID_AA64MMFR3_EL1 */
+		16,
+		1
+	},
+	{
+		ARM64_SYS_REG(3, 3, 10, 2, 4),	/* POR_EL0 */
+		ARM64_SYS_REG(3, 0, 0, 7, 3),	/* ID_AA64MMFR3_EL1 */
+		16,
+		1
 	}
 };
 
@@ -468,6 +480,7 @@ static __u64 base_regs[] = {
 	ARM64_SYS_REG(3, 0, 10, 2, 0),	/* MAIR_EL1 */
 	ARM64_SYS_REG(3, 0, 10, 2, 2),	/* PIRE0_EL1 */
 	ARM64_SYS_REG(3, 0, 10, 2, 3),	/* PIR_EL1 */
+	ARM64_SYS_REG(3, 0, 10, 2, 4),	/* POR_EL1 */
 	ARM64_SYS_REG(3, 0, 10, 3, 0),	/* AMAIR_EL1 */
 	ARM64_SYS_REG(3, 0, 12, 0, 0),	/* VBAR_EL1 */
 	ARM64_SYS_REG(3, 0, 12, 1, 1),	/* DISR_EL1 */
@@ -475,6 +488,7 @@ static __u64 base_regs[] = {
 	ARM64_SYS_REG(3, 0, 13, 0, 4),	/* TPIDR_EL1 */
 	ARM64_SYS_REG(3, 0, 14, 1, 0),	/* CNTKCTL_EL1 */
 	ARM64_SYS_REG(3, 2, 0, 0, 0),	/* CSSELR_EL1 */
+	ARM64_SYS_REG(3, 3, 10, 2, 4),	/* POR_EL0 */
 	ARM64_SYS_REG(3, 3, 13, 0, 2),	/* TPIDR_EL0 */
 	ARM64_SYS_REG(3, 3, 13, 0, 3),	/* TPIDRRO_EL0 */
 	ARM64_SYS_REG(3, 3, 14, 0, 1),	/* CNTPCT_EL0 */
On Fri, Nov 24, 2023 at 04:35:10PM +0000, Joey Gouly wrote:
Add new system registers:
- POR_EL1
- POR_EL0
Reviewed-by: Mark Brown <broonie@kernel.org>
Hi Joey,
On Fri, 24 Nov 2023 16:34:45 +0000, Joey Gouly <joey.gouly@arm.com> wrote:
Hello everyone,
This series implements the Permission Overlay Extension introduced in 2022 VMSA enhancements [1]. It is based on v6.7-rc2.
Changes since v2[2]: # Added ptrace support and selftest # Add missing POR_EL0 initialisation in fork/clone # Rebase onto v6.7-rc2 # Add r-bs
The Permission Overlay Extension allows to constrain permissions on memory regions. This can be used from userspace (EL0) without a system call or TLB invalidation.
I have given this series a few more thoughts, and came to the conclusion that it is still incomplete on the KVM front:
* FEAT_S1POE often comes together with FEAT_S2POE. For obvious reasons, we cannot afford to let the guest play with S2POR_EL1, nor do we want to advertise FEAT_S2POE to the guest.
You will need to add some additional FGT for this, and mask out FEAT_S2POE from the guest's view of the ID registers.
* letting the guest play with POE comes with some interesting strings attached: a guest that has started on a POE-enabled host cannot be migrated to one that doesn't have POE, which means that the POE registers should only be visible to the host userspace if enabled in the guest's ID registers, and thus only context-switched in these conditions. They should otherwise UNDEF.
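[Editorial note: a rough sketch of the shape these two points could take in KVM's sys_regs.c, assuming the generated ID_AA64MMFR3_EL1 field macros, the IDREG() accessor for the VM-wide ID register values, and the existing REG_HIDDEN visibility mechanism; the names are indicative, not taken from this series.]

	/* Never let the guest see stage-2 permission overlays. */
	static u64 limit_mmfr3(u64 val)
	{
		return val & ~ID_AA64MMFR3_EL1_S2POE_MASK;
	}

	/*
	 * Only expose POR_EL{0,1} when S1POE is enabled in the guest's
	 * ID registers; hidden registers UNDEF in the guest and are not
	 * listed to host userspace.
	 */
	static unsigned int poe_visibility(const struct kvm_vcpu *vcpu,
					   const struct sys_reg_desc *rd)
	{
		u64 mmfr3 = IDREG(vcpu->kvm, SYS_ID_AA64MMFR3_EL1);

		if (SYS_FIELD_GET(ID_AA64MMFR3_EL1, S1POE, mmfr3))
			return 0;

		return REG_HIDDEN;
	}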
Thanks,
M.
Hi Marc,
On Mon, Dec 04, 2023 at 11:03:24AM +0000, Marc Zyngier wrote:
Hi Joey,
On Fri, 24 Nov 2023 16:34:45 +0000, Joey Gouly <joey.gouly@arm.com> wrote:
Hello everyone,
This series implements the Permission Overlay Extension introduced in 2022 VMSA enhancements [1]. It is based on v6.7-rc2.
Changes since v2[2]: # Added ptrace support and selftest # Add missing POR_EL0 initialisation in fork/clone # Rebase onto v6.7-rc2 # Add r-bs
The Permission Overlay Extension allows to constrain permissions on memory regions. This can be used from userspace (EL0) without a system call or TLB invalidation.
I have given this series a few more thoughts, and came to the conclusion that is it still incomplete on the KVM front:
FEAT_S1POE often comes together with FEAT_S2POE. For obvious reasons, we cannot afford to let the guest play with S2POR_EL1, nor do we want to advertise FEAT_S2POE to the guest.
You will need to add some additional FGT for this, and mask out FEAT_S2POE from the guest's view of the ID registers.
I found out last week that I had misunderstood S2POR_EL1, so yes, it looks like we need to trap that. I will add that in.
- letting the guest play with POE comes with some interesting strings attached: a guest that has started on a POE-enabled host cannot be migrated to one that doesn't have POE, which means that the POE registers should only be visible to the host userspace if enabled in the guest's ID registers, and thus only context-switched in these conditions. They should otherwise UNDEF.
Can you give me some clarification here?

- By "visible to the host userspace", do you mean via the GET_ONE_REG API?
- Currently the ID register (ID_AA64MMFR3_EL1) is not ID_WRITABLE; should this series or another make it so? Currently, if the host has POE it's enabled in the guest, so I believe migration to a non-POE host will fail?
- For the context switch, do you mean something like:
	if (system_supports_poe() && ID_REG(MMFR3_EL1) & S1POE)
		ctxt_sys_reg(ctxt, POR_EL0) = read_sysreg_s(SYS_POR_EL0);
That would need some refactoring, since I don't see how to access id_regs from struct kvm_cpu_context.
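[Editorial note: a hypothetical sketch of the refactoring being asked about, doing the check at a point where a vcpu pointer is available rather than from struct kvm_cpu_context itself; IDREG() and SYS_FIELD_GET() are assumed helpers for the VM-wide ID register values and field extraction, and the exact save/restore hook may differ.]

	static bool vcpu_has_s1poe(struct kvm_vcpu *vcpu)
	{
		u64 mmfr3 = IDREG(vcpu->kvm, SYS_ID_AA64MMFR3_EL1);

		return SYS_FIELD_GET(ID_AA64MMFR3_EL1, S1POE, mmfr3);
	}

	static void __sysreg_save_poe(struct kvm_vcpu *vcpu)
	{
		/* Only context-switch POR_EL0 if the guest can use POE. */
		if (vcpu_has_s1poe(vcpu))
			__vcpu_sys_reg(vcpu, POR_EL0) = read_sysreg_s(SYS_POR_EL0);
	}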
Thanks, Joey