On Mon, Oct 21, 2024 at 06:19:38PM +0100, Will Deacon wrote:
On Mon, Oct 21, 2024 at 04:30:04PM +0100, Catalin Marinas wrote:
On Mon, Oct 21, 2024 at 02:31:08PM +0100, Dave P Martin wrote:
On Thu, Oct 17, 2024 at 02:39:04PM +0100, Kevin Brodsky wrote:
This series is a follow-up to Joey's Permission Overlay Extension (POE) series [1] that recently landed on mainline. The goal is to improve the way we handle the register that governs which pkeys/POIndex are accessible (POR_EL0) during signal delivery. As things stand, we may unexpectedly fail to write the signal frame on the stack because POR_EL0 is not reset before the uaccess operations. See patch 3 for more details and the main changes this series brings.
A similar series landed recently for x86/MPK [2]; the present series aims at aligning arm64 with x86. Worth noting: once the signal frame is written, POR_EL0 is still set to POR_EL0_INIT, granting access to pkey 0 only. This means that a program that sets up an alternate signal stack with a non-zero pkey will need some assembly trampoline to set POR_EL0 before invoking the real signal handler, as discussed here [3].
[...]
Memory with a non-zero pkey cannot be used 100% portably, period, and having non-RW(X) permissions on pkey 0 at any time is also not portable, period. So I'm not sure that having libc magically guess what userspace's pkeys policy is supposed to be based on racily digging metadata out of /proc/self/maps or a cache of it etc. would be such a good idea.
I agree that changing RWX overlay permission for pkey 0 to anything else is a really bad idea. We can't prevent it but we shouldn't actively try to work around it in the kernel either. With the current signal ABI, I don't think we should support anything other than pkey 0 for the stack. Since the user shouldn't change the pkey 0 RWX overlay permission anyway, I don't think we should reset POR_EL0 _prior_ to writing the signal frame. The best we can do is document it somewhere.
So on patch 3 I'd only ensure that we have POR_EL0_INIT when invoking the signal handler and not when performing the uaccess. If the uaccess fails, we'd get a fatal SIGSEGV. The user may have got it already if it made the stack read-only.
Hmm, but based on what Kevin's saying, this would mean actively choosing a different ABI than what x86 is trying to get to.
I was more thinking of not relaxing the ABI further at this point in the rc cycle rather than completely diverging (x86 did relax the ABI subsequently to handle non-zero pkey sigaltstack).
Currently the primary use of pkeys is for W^X and signal stacks shouldn't fall into this category. If we ever have a strong case for non-zero pkeys on the signal stack, we'll need to look into some new ABI. I'm not sure about SS_* flags though, I think the signal POR_EL0 should be associated with the sigaction rather than the stack (the latter would just be mapped by the user with the right pkey, the kernel doesn't need to know which, only what POR_EL0 is needed by the handler).
Until such case turns up, I'd not put any effort into ABI improvements.
Kevin -- do you know what prompted x86 to want the pkey to be reset early in signal delivery? Perhaps such a use-case already exists.
Given the email from Pierre with Chrome potentially using a sigaltstack with a non-zero pkey, Kevin's patches (and the x86 changes) make more sense. The question is whether we do this as a fix now or we leave the relaxation for a subsequent kernel release. I guess we could squeeze it now if we agree on the implementation.