On Mon, Jan 6, 2025 at 5:26 PM Isaac Manjarres <isaacmanjarres@google.com> wrote:
On Mon, Jan 06, 2025 at 09:35:09AM -0800, Jeff Xu wrote:
+ Kees, because this is related to W^X memfd and security.
On Fri, Jan 3, 2025 at 7:04 AM Jann Horn <jannh@google.com> wrote:
On Fri, Jan 3, 2025 at 12:32 AM Isaac J. Manjarres <isaacmanjarres@google.com> wrote:
Android currently uses the ashmem driver [1] for creating shared memory regions between processes. Ashmem buffers can initially be mapped with PROT_READ, PROT_WRITE, and PROT_EXEC. Processes can then use the ASHMEM_SET_PROT_MASK ioctl command to restrict, but never add to, the permissions that the buffer can be mapped with.
Processes can remove the ability to map ashmem buffers as executable to ensure that those buffers cannot be exploited to run unintended code.
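The restrict-only semantics of ASHMEM_SET_PROT_MASK can be modeled in a few lines of C. This is a userspace sketch and not the ashmem driver's code. `struct shm_model` and the `shm_model_*` helpers are hypothetical names, and only the PROT_* constants come from <sys/mman.h>.

```c
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>

/* Hypothetical model of an ashmem-style buffer (not the driver's struct). */
struct shm_model {
	int prot_mask; /* PROT_* bits that future mappings may still request */
};

#define MODEL_PROT_ALL (PROT_READ | PROT_WRITE | PROT_EXEC)

static void shm_model_init(struct shm_model *b)
{
	b->prot_mask = MODEL_PROT_ALL; /* everything is allowed initially */
}

/* Narrow the mask; any request that would add a bit is rejected. */
static int shm_model_set_prot_mask(struct shm_model *b, int prot)
{
	if ((b->prot_mask & prot) != prot)
		return -EINVAL;
	b->prot_mask = prot;
	return 0;
}

/* A new mapping is allowed only if every requested bit is still in the mask. */
static int shm_model_check_mmap(const struct shm_model *b, int prot)
{
	if (prot & ~b->prot_mask)
		return -EPERM;
	return 0;
}
```

Once PROT_EXEC has been dropped from the mask, no later call can put it back. That is the property the proposed exec seal is meant to provide for memfds.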
Is there really code out there that first maps an ashmem buffer with PROT_EXEC, then uses the ioctl to remove execute permission for future mappings? I don't see why anyone would do that.
For instance, suppose process A allocates a memfd that is meant to be read and written by itself and another process, call it B.
Process A shares the buffer with process B, but B injects code into the buffer and compromises A so that A maps the buffer with PROT_EXEC. This gives process A an opportunity to run the code that process B injected into the buffer.
If process A had the ability to seal the buffer against future executable mappings before sharing the buffer with process B, this attack would not be possible.
I think if you want to enforce such restrictions in a scenario where the attacker can already make the target process perform semi-arbitrary syscalls, it would probably be more reliable to enforce rules on executable mappings with something like SELinux policy and/or F_SEAL_EXEC.
I would like to second the suggestion of making this part of F_SEAL_EXEC.
Thanks for taking a look at this patch, Jeff! Can you please elaborate on how F_SEAL_EXEC should be extended to restrict executable mappings?
I understand that if a memfd file is non-executable (either because it was made non-executable via fchmod() or because it was created with MFD_NOEXEC_SEAL), one could argue that applying F_SEAL_EXEC to that file should also prevent any executable mappings. However, it is not clear to me that we should tie a file's executable permissions to whether it can be mapped as executable. For example, shared object files don't need executable permissions, but processes should still be able to map them as executable.
The case where we apply F_SEAL_EXEC to an executable memfd also seems awkward to me: memfds can be mapped as executable by default, so what would happen in that scenario?
I also shared the same concerns in my response to Jann in [1].
Apologies for not being clear. I meant the following: when (1) the memfd is created with MFD_NOEXEC_SEAL, or (2) the memfd is non-executable (NX) and F_SEAL_EXEC is set, we could also block the memfd from being mapped as executable.
MFD_NOEXEC_SEAL/F_SEAL_EXEC was added in 6fd7353829ca, about two years ago. I'm not sure any application creates an MFD_NOEXEC_SEAL memfd and still wants to mmap it as executable memory; that would be a strange use case. It is more logical for applications to want to block both execve() and mmap() for a non-executable memfd. Therefore, for simplicity, I think we could reuse the F_SEAL_EXEC bit plus the NX state for this feature.
diff --git a/mm/memfd.c b/mm/memfd.c
index 5f5a23c9051d..cfd62454df5e 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -184,6 +184,7 @@ static unsigned int *memfd_file_seals_ptr(struct file *file)
 }

 #define F_ALL_SEALS (F_SEAL_SEAL | \
+		     F_SEAL_FUTURE_EXEC |\
 		     F_SEAL_EXEC | \
 		     F_SEAL_SHRINK | \
 		     F_SEAL_GROW | \
@@ -357,14 +358,50 @@ static int check_write_seal(unsigned long *vm_flags_ptr)
 	return 0;
 }
+static inline bool is_exec_sealed(unsigned int seals)
+{
+	return seals & F_SEAL_FUTURE_EXEC;
+}
+static int check_exec_seal(unsigned long *vm_flags_ptr)
+{
+	unsigned long vm_flags = *vm_flags_ptr;
+	unsigned long mask = vm_flags & (VM_SHARED | VM_EXEC);
+	/* Executability is not a concern for private mappings. */
+	if (!(mask & VM_SHARED))
+		return 0;
Why is it not a concern for private mappings?
+	/*
+	 * New PROT_EXEC and MAP_SHARED mmaps are not allowed when exec seal
+	 * is active.
+	 */
+	if (mask & VM_EXEC)
+		return -EPERM;
+	/*
+	 * Prevent mprotect() from making an exec-sealed mapping executable in
+	 * the future.
+	 */
+	*vm_flags_ptr &= ~VM_MAYEXEC;
+	return 0;
+}
 int memfd_check_seals_mmap(struct file *file, unsigned long *vm_flags_ptr)
 {
 	int err = 0;
 	unsigned int *seals_ptr = memfd_file_seals_ptr(file);
 	unsigned int seals = seals_ptr ? *seals_ptr : 0;
-	if (is_write_sealed(seals))
+	if (is_write_sealed(seals)) {
 		err = check_write_seal(vm_flags_ptr);
+		if (err)
+			return err;
+	}
+	if (is_exec_sealed(seals))
+		err = check_exec_seal(vm_flags_ptr);
memfd_check_seals_mmap() is only for the mmap() path, right?
What about the mprotect() path? For example, an attacker could first create a RW VMA mapping for the memfd and later mprotect() the VMA to be executable.
Similar to the check_write_seal() call, we might want to block mprotect() for the write seal as well.
So when memfd_check_seals_mmap() is called and the file is exec-sealed, check_exec_seal() will not only check that VM_EXEC is not set but will also clear VM_MAYEXEC, which prevents the mapping from being made executable via mprotect() later.
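The effect described above can be checked with a userspace copy of the patch's check_exec_seal() logic. In this sketch the VM_* constants are copied from include/linux/mm.h so that it builds outside the kernel. `model_check_exec_seal()` is a hypothetical name for the copy; it only shows which flags change and is not a replacement for the kernel function.

```c
#include <assert.h>
#include <errno.h>

/* VM_* values as in include/linux/mm.h, copied for a userspace build. */
#define VM_READ    0x00000001UL
#define VM_EXEC    0x00000004UL
#define VM_SHARED  0x00000008UL
#define VM_MAYREAD 0x00000010UL
#define VM_MAYEXEC 0x00000040UL

/* Same logic as check_exec_seal() in the patch. */
static int model_check_exec_seal(unsigned long *vm_flags_ptr)
{
	unsigned long mask = *vm_flags_ptr & (VM_SHARED | VM_EXEC);

	if (!(mask & VM_SHARED))
		return 0;            /* private mapping: left untouched */
	if (mask & VM_EXEC)
		return -EPERM;       /* shared + exec: refused outright */
	*vm_flags_ptr &= ~VM_MAYEXEC; /* shared, non-exec: no later upgrade */
	return 0;
}
```

Shared non-executable mappings succeed but lose VM_MAYEXEC. Shared executable mappings fail with -EPERM. Private mappings pass through unchanged, which matches the comment in the patch.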
Thanks for the clarification.
The name check_exec_seal() is misleading: "check" implies a read-only operation, but this function also updates the flags. Maybe rename it to check_and_update_exec_seal() or something similar?
Do you know which code checks the VM_MAYEXEC flag in the mprotect() code path? It isn't obvious to me. When I grep for VM_MAYEXEC under mm/, it shows only one place in mprotect, and that one doesn't do this check.
~/mm/mm$ grep VM_MAYEXEC *
mmap.c: mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
mmap.c: vm_flags &= ~VM_MAYEXEC;
mprotect.c: if (rier && (vma->vm_flags & VM_MAYEXEC))
nommu.c: vm_flags |= VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
nommu.c: vm_flags |= VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
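For reference, mprotect() does not test VM_MAYEXEC by name, which is why grep finds no relevant hit. do_mprotect_pkey() in mm/mprotect.c fails with -EACCES when `(newflags & ~(newflags >> 4)) & VM_ACCESS_FLAGS` is non-zero. Each VM_MAY* bit sits four bits above its VM_* counterpart, so the shift lines them up. The sketch below models only that test in userspace. `model_mprotect_check()` is a hypothetical name, and the flag values are copied from include/linux/mm.h.

```c
#include <assert.h>
#include <errno.h>

/* VM_* values as in include/linux/mm.h, copied for a userspace build. */
#define VM_READ     0x00000001UL
#define VM_WRITE    0x00000002UL
#define VM_EXEC     0x00000004UL
#define VM_SHARED   0x00000008UL
#define VM_MAYREAD  0x00000010UL
#define VM_MAYWRITE 0x00000020UL
#define VM_MAYEXEC  0x00000040UL
#define VM_MAYSHARE 0x00000080UL
#define VM_ACCESS_FLAGS (VM_READ | VM_WRITE | VM_EXEC)

/*
 * newflags >> 4 moves each VM_MAY* bit onto its VM_* position, so a
 * requested VM_READ/VM_WRITE/VM_EXEC without its VM_MAY* partner
 * survives the mask and the request is refused.
 */
static int model_mprotect_check(unsigned long newflags)
{
	if ((newflags & ~(newflags >> 4)) & VM_ACCESS_FLAGS)
		return -EACCES;
	return 0;
}
```

With VM_MAYEXEC cleared by the exec seal at mmap() time, a later request that adds VM_EXEC to a shared RW mapping is refused. The same request on a mapping that still carries VM_MAYEXEC is allowed.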
Thanks
-Jeff
[1] https://lore.kernel.org/all/Z3x_8uFn2e0EpDqM@google.com/
Thanks, Isaac