On Tue, Jan 23, 2024 at 10:58 AM Theo de Raadt deraadt@openbsd.org wrote:
It's the same with MAP_MSEALABLE. I don't get it. So now there are 3 memory types:
 - cannot be sealed, ever
 - not yet sealed
 - sealed
What purpose does the first type serve? Please explain the use case.
Today, processes have control over their entire address space.
What is the purpose of "permissions cannot be locked"? Please supply an example. If I am wrong, I'd like to know where I went wrong.
The Linux example is in the open discussion section of the V3 and V4 cover letters [1] [2].
[1] https://lore.kernel.org/linux-mm/20231212231706.2680890-1-jeffxu@chromium.or...
[2] https://lore.kernel.org/linux-mm/20240104185138.169307-3-jeffxu@chromium.org...
Copied below for ease of reading.
-----------------------------------------------------------------------------------------
During the development of V3, I had new questions and thoughts that I wished to discuss.
1> shm/aio
From reading the code, it seems to me that aio/shm can mmap/munmap mappings on behalf of userspace, e.g. ksys_shmdt() in shm.c. The lifetime of those mappings is not tied to the lifetime of the process. If that memory is sealed from userspace, then the unmap will fail. This isn’t a huge problem, since the memory will eventually be freed at exit or exec. However, it feels like the solution is not complete, because of the leaks in VMA address space during the lifetime of the process.
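A minimal sketch of the attach/detach path referred to in topic 1, using only the standard System V calls. The function name shm_attach_detach() is made up for illustration. The shmdt() call at the end is where the kernel unmaps on the process's behalf; that is the step that would fail if the attached range had been sealed in between.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: attach a private SysV segment, touch it, detach it.
 * Returns 0 on success, -1 on failure. */
int shm_attach_detach(void)
{
    const size_t len = 4096;

    int id = shmget(IPC_PRIVATE, len, IPC_CREAT | 0600);
    if (id < 0)
        return -1;

    /* The kernel picks the address and creates the mapping for us. */
    void *addr = shmat(id, NULL, 0);

    /* Mark the segment for removal so it is destroyed after the last detach. */
    shmctl(id, IPC_RMID, NULL);

    if (addr == (void *)-1)
        return -1;

    ((volatile char *)addr)[0] = 1;
    assert(((volatile char *)addr)[0] == 1);

    /* The kernel unmaps here, on behalf of userspace. */
    return shmdt(addr);
}
```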
2> Brk (heap/stack)
Currently, userspace applications can seal parts of the heap by calling malloc() and mseal(). This raises the question of what the expected behavior is when sealing the heap is attempted.
Let's assume the following calls from user space:
ptr = malloc(size);
mprotect(ptr, size, RO);
mseal(ptr, size, SEAL_PROT_PKEY);
free(ptr);
Technically, before mseal() is added, the user can change the protection of the heap by calling mprotect(RO). As long as the user changes the protection back to RW before free(), the memory can be reused.
With mseal() in the picture, however, the heap is then partially sealed. The user can still free it, but the memory remains RO, and the result of a brk shrink is nondeterministic, depending on whether munmap() tries to free the sealed memory (brk uses munmap to shrink the heap).
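For reference, a rough sketch of the pre-mseal() round trip described in topic 2: make a heap page read-only, then restore read-write before free(). It deliberately does not call mseal(). The function name heap_protect_round_trip() is made up for illustration, and posix_memalign() is used instead of plain malloc() only because mprotect() needs a page-aligned address.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper. Returns 0 on success, -1 on failure. */
int heap_protect_round_trip(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t len = (size_t)page;

    /* One page-aligned page of heap memory. */
    void *ptr = NULL;
    if (posix_memalign(&ptr, len, len) != 0)
        return -1;
    memset(ptr, 0xAB, len);

    /* Heap page becomes read-only; reads still work, writes would fault. */
    if (mprotect(ptr, len, PROT_READ) != 0) {
        free(ptr);
        return -1;
    }
    unsigned char first = *(volatile unsigned char *)ptr;

    /* If the page had been sealed at this point, this call would be
     * refused and the page would stay read-only after free(). */
    if (mprotect(ptr, len, PROT_READ | PROT_WRITE) != 0)
        return -1; /* not safe to free(): the allocator may write here */

    free(ptr);
    return first == 0xAB ? 0 : -1;
}
```

Restoring RW before free() matters because the allocator may write its own bookkeeping into the freed block.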
3> The above two cases lead to the third topic:
There is one option to address the problems mentioned above.
Option 1: A “MAP_SEALABLE” flag in mmap(). If a mapping is created without this flag, the mseal() operation will fail. Applications that are not concerned with sealing will see their behavior unchanged. For those that are concerned, adding a flag at mmap time to opt in is not difficult. For the short term, this solves problems 1 and 2 above. The memory in shm/aio/brk will not have the MAP_SEALABLE flag at mmap(), and the same is true for the heap.
If we choose not to go down this path, all mappings will be sealable by default. We could document the above-mentioned limitations so devs are more careful when choosing what memory to seal. I think denial of service through mseal() by an attacker is probably not a concern: if attackers have access to mseal() and unsealed memory, then they can also do other harmful things to the memory, such as munmap, etc.
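To make the opt-in semantics of Option 1 concrete, here is a toy user-space model. It is not kernel code and calls no real API: struct toy_mapping, toy_mseal() and toy_munmap() are hypothetical names, and returning -EPERM is an assumption, since the text above only says the operation "will fail".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: NOT the kernel's data structures. */
struct toy_mapping {
    bool sealable; /* created with the proposed MAP_SEALABLE flag */
    bool sealed;   /* a seal has been applied successfully */
};

/* Sealing succeeds only on mappings that opted in at creation time. */
int toy_mseal(struct toy_mapping *m)
{
    assert(m != NULL);
    if (!m->sealable)
        return -EPERM; /* assumed error code */
    m->sealed = true;
    return 0;
}

/* Unmapping is refused once a mapping is sealed. */
int toy_munmap(const struct toy_mapping *m)
{
    assert(m != NULL);
    return m->sealed ? -EPERM : 0;
}
```

In this model a shm/aio/brk-style mapping (sealable == false) can never reach the sealed state, so a later kernel-side unmap cannot be blocked by a seal.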
4> I think it might be possible to seal the stack or other special mappings created at runtime (vdso, vsyscall, vvar). This means we can enforce and seal W^X for certain types of applications. For instance, the stack is typically used in read-write mode, but in some cases, it can become executable. To defend against unintended addition of the executable bit to the stack, we could let the application seal it.
Sealing the heap (for adding X) requires special handling, since the heap can shrink, and shrink is implemented through munmap().
Indeed, it might be possible that all virtual memory accessible to user space, regardless of its usage pattern, could be sealed. However, this would require additional research and development work.
-----------------------------------------------------------------------------------------------------