* Resending because I accidentally forgot to include Lorenzo in the "to" list.
Android uses the ashmem driver [1] for creating shared memory regions between processes. The ashmem driver exposes an ioctl command for processes to restrict the permissions an ashmem buffer can be mapped with.
Buffers are created with the ability to be mapped as readable, writable, and executable. Processes remove the ability to map some ashmem buffers as executable to ensure that those buffers cannot be used to inject malicious code for another process to run. Other buffers retain their ability to be mapped as executable, as these buffers can be used for just-in-time (JIT) compilation. So there is a need to be able to remove the ability to map a buffer as executable on a per-buffer basis.
Android is currently trying to migrate towards replacing its ashmem driver usage with memfd. Part of the transition involved introducing a library that serves to abstract away how shared memory regions are allocated (i.e. ashmem vs memfd). This allows clients to use a single interface for restricting how a buffer can be mapped without having to worry about how it is handled for ashmem (through the ioctl command mentioned earlier) or memfd (through file seals).
While memfd has support for preventing buffers from being mapped as writable beyond a certain point in time (thanks to F_SEAL_FUTURE_WRITE), it does not have a similar interface to prevent buffers from being mapped as executable beyond a certain point. However, that could be implemented as a file seal (F_SEAL_FUTURE_EXEC) which works similarly to F_SEAL_FUTURE_WRITE.
F_SEAL_FUTURE_WRITE was chosen as a template for how this new seal should behave, instead of F_SEAL_WRITE, for the following reasons:
1. Having the new seal behave like F_SEAL_FUTURE_WRITE matches the behavior that was present with ashmem. This aids in seamlessly transitioning clients away from ashmem to memfd.
2. Making the new seal behave like F_SEAL_WRITE would mean that no mappings that could become executable in the future (i.e. via mprotect()) can exist when the seal is applied. However, there are known cases (e.g. CursorWindow [2]) where restrictions are applied on how a buffer can be mapped after a mapping has already been made. That mapping may have VM_MAYEXEC set, which would not allow the seal to be applied successfully.
Therefore, the F_SEAL_FUTURE_EXEC seal was designed to have the same semantics as F_SEAL_FUTURE_WRITE.
Note: this series depends on Lorenzo's work [3], [4], [5] from Andrew Morton's mm-unstable branch [6], which reworks memfd's file seal checks, allowing for newer file seals to be implemented in a cleaner fashion.
Changes from v1 ==> v2:
- Changed the return code to be -EPERM instead of -EACCES when attempting to map an exec sealed file with PROT_EXEC to align to mmap()'s man page. Thank you Kalesh Singh for spotting this!
- Rebased on top of Lorenzo's work to cleanup memfd file seal checks in mmap() ([3], [4], and [5]). Thank you for this Lorenzo!
- Changed to deny PROT_EXEC mappings only if the mapping is shared, instead of for both shared and private mappings, after discussing this with Lorenzo.
Opens:
- Lorenzo brought up that this patch may negatively impact the usage of MFD_NOEXEC_SCOPE_NOEXEC_ENFORCED [7]. However, it is not clear to me why that is the case. At the moment, my intent is for the executable permissions of the file to be disjoint from the ability to create executable mappings.
Links:
[1] https://cs.android.com/android/kernel/superproject/+/common-android-mainline... [2] https://developer.android.com/reference/android/database/CursorWindow [3] https://lore.kernel.org/all/cover.1732804776.git.lorenzo.stoakes@oracle.com/ [4] https://lkml.kernel.org/r/20241206212846.210835-1-lorenzo.stoakes@oracle.com [5] https://lkml.kernel.org/r/7dee6c5d-480b-4c24-b98e-6fa47dbd8a23@lucifer.local [6] https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/tree/?h=mm-unsta... [7] https://lore.kernel.org/all/3a53b154-1e46-45fb-a559-65afa7a8a788@lucifer.loc...
Links to previous versions:
v1: https://lore.kernel.org/all/20241206010930.3871336-1-isaacmanjarres@google.c...
Isaac J. Manjarres (2): mm/memfd: Add support for F_SEAL_FUTURE_EXEC to memfd selftests/memfd: Add tests for F_SEAL_FUTURE_EXEC
include/uapi/linux/fcntl.h | 1 + mm/memfd.c | 39 ++++++++++- tools/testing/selftests/memfd/memfd_test.c | 79 ++++++++++++++++++++++ 3 files changed, 118 insertions(+), 1 deletion(-)
Android currently uses the ashmem driver [1] for creating shared memory regions between processes. Ashmem buffers can initially be mapped with PROT_READ, PROT_WRITE, and PROT_EXEC. Processes can then use the ASHMEM_SET_PROT_MASK ioctl command to restrict--never add--the permissions that the buffer can be mapped with.
Processes can remove the ability to map ashmem buffers as executable to ensure that those buffers cannot be exploited to run unintended code.
For instance, suppose process A allocates a memfd that is meant to be read and written by itself and another process, call it B.
Process A shares the buffer with process B, but process B injects code into the buffer, and compromises process A, such that it makes A map the buffer with PROT_EXEC. This provides an opportunity for process A to run the code that process B injected into the buffer.
If process A had the ability to seal the buffer against future executable mappings before sharing the buffer with process B, this attack would not be possible.
Android is currently trying to replace ashmem with memfd. However, memfd does not have a provision to permanently remove the ability to map a buffer as executable, and leaves itself open to the type of attack described earlier. However, this should be something that can be achieved via a new file seal.
There are known usecases (e.g. CursorWindow [2]) where a process maps a buffer with read/write permissions before restricting the buffer to being mapped as read-only for future mappings.
The resulting VMA from the writable mapping has VM_MAYEXEC set, meaning that mprotect() can change the mapping to be executable. Therefore, implementing the seal similar to F_SEAL_WRITE would not be appropriate, since it would not work with the CursorWindow usecase. This is because the CursorWindow process restricts the mapping permissions to read-only after the writable mapping is created. So, adding a file seal for executable mappings that operates like F_SEAL_WRITE would fail.
Therefore, add support for F_SEAL_FUTURE_EXEC, which is handled similarly to F_SEAL_FUTURE_WRITE. This ensures that CursorWindow can continue to create a writable mapping initially, and then restrict the permissions on the buffer to be mappable as read-only by using both F_SEAL_FUTURE_WRITE and F_SEAL_FUTURE_EXEC. After the seal is applied, any calls to mmap() with PROT_EXEC will fail.
[1] https://cs.android.com/android/kernel/superproject/+/common-android-mainline... [2] https://developer.android.com/reference/android/database/CursorWindow
Signed-off-by: Isaac J. Manjarres isaacmanjarres@google.com --- include/uapi/linux/fcntl.h | 1 + mm/memfd.c | 39 +++++++++++++++++++++++++++++++++++++- 2 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h index 6e6907e63bfc..ef066e524777 100644 --- a/include/uapi/linux/fcntl.h +++ b/include/uapi/linux/fcntl.h @@ -49,6 +49,7 @@ #define F_SEAL_WRITE 0x0008 /* prevent writes */ #define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */ #define F_SEAL_EXEC 0x0020 /* prevent chmod modifying exec bits */ +#define F_SEAL_FUTURE_EXEC 0x0040 /* prevent future executable mappings */ /* (1U << 31) is reserved for signed error codes */
/* diff --git a/mm/memfd.c b/mm/memfd.c index 5f5a23c9051d..cfd62454df5e 100644 --- a/mm/memfd.c +++ b/mm/memfd.c @@ -184,6 +184,7 @@ static unsigned int *memfd_file_seals_ptr(struct file *file) }
#define F_ALL_SEALS (F_SEAL_SEAL | \ + F_SEAL_FUTURE_EXEC |\ F_SEAL_EXEC | \ F_SEAL_SHRINK | \ F_SEAL_GROW | \ @@ -357,14 +358,50 @@ static int check_write_seal(unsigned long *vm_flags_ptr) return 0; }
+static inline bool is_exec_sealed(unsigned int seals) +{ + return seals & F_SEAL_FUTURE_EXEC; +} + +static int check_exec_seal(unsigned long *vm_flags_ptr) +{ + unsigned long vm_flags = *vm_flags_ptr; + unsigned long mask = vm_flags & (VM_SHARED | VM_EXEC); + + /* Executability is not a concern for private mappings. */ + if (!(mask & VM_SHARED)) + return 0; + + /* + * New PROT_EXEC and MAP_SHARED mmaps are not allowed when exec seal + * is active. + */ + if (mask & VM_EXEC) + return -EPERM; + + /* + * Prevent mprotect() from making an exec-sealed mapping executable in + * the future. + */ + *vm_flags_ptr &= ~VM_MAYEXEC; + + return 0; +} + int memfd_check_seals_mmap(struct file *file, unsigned long *vm_flags_ptr) { int err = 0; unsigned int *seals_ptr = memfd_file_seals_ptr(file); unsigned int seals = seals_ptr ? *seals_ptr : 0;
- if (is_write_sealed(seals)) + if (is_write_sealed(seals)) { err = check_write_seal(vm_flags_ptr); + if (err) + return err; + } + + if (is_exec_sealed(seals)) + err = check_exec_seal(vm_flags_ptr);
return err; }
On Fri, Jan 3, 2025 at 12:32 AM Isaac J. Manjarres isaacmanjarres@google.com wrote:
Android currently uses the ashmem driver [1] for creating shared memory regions between processes. Ashmem buffers can initially be mapped with PROT_READ, PROT_WRITE, and PROT_EXEC. Processes can then use the ASHMEM_SET_PROT_MASK ioctl command to restrict--never add--the permissions that the buffer can be mapped with.
Processes can remove the ability to map ashmem buffers as executable to ensure that those buffers cannot be exploited to run unintended code.
Is there really code out there that first maps an ashmem buffer with PROT_EXEC, then uses the ioctl to remove execute permission for future mappings? I don't see why anyone would do that.
For instance, suppose process A allocates a memfd that is meant to be read and written by itself and another process, call it B.
Process A shares the buffer with process B, but process B injects code into the buffer, and compromises process A, such that it makes A map the buffer with PROT_EXEC. This provides an opportunity for process A to run the code that process B injected into the buffer.
If process A had the ability to seal the buffer against future executable mappings before sharing the buffer with process B, this attack would not be possible.
I think if you want to enforce such restrictions in a scenario where the attacker can already make the target process perform semi-arbitrary syscalls, it would probably be more reliable to enforce rules on executable mappings with something like SELinux policy and/or F_SEAL_EXEC.
Android is currently trying to replace ashmem with memfd. However, memfd does not have a provision to permanently remove the ability to map a buffer as executable, and leaves itself open to the type of attack described earlier. However, this should be something that can be achieved via a new file seal.
There are known usecases (e.g. CursorWindow [2]) where a process maps a buffer with read/write permissions before restricting the buffer to being mapped as read-only for future mappings.
Here you're talking about write permission, but the patch is about execute permission?
The resulting VMA from the writable mapping has VM_MAYEXEC set, meaning that mprotect() can change the mapping to be executable. Therefore, implementing the seal similar to F_SEAL_WRITE would not be appropriate, since it would not work with the CursorWindow usecase. This is because the CursorWindow process restricts the mapping permissions to read-only after the writable mapping is created. So, adding a file seal for executable mappings that operates like F_SEAL_WRITE would fail.
Therefore, add support for F_SEAL_FUTURE_EXEC, which is handled similarly to F_SEAL_FUTURE_WRITE. This ensures that CursorWindow can continue to create a writable mapping initially, and then restrict the permissions on the buffer to be mappable as read-only by using both F_SEAL_FUTURE_WRITE and F_SEAL_FUTURE_EXEC. After the seal is applied, any calls to mmap() with PROT_EXEC will fail.
[1] https://cs.android.com/android/kernel/superproject/+/common-android-mainline... [2] https://developer.android.com/reference/android/database/CursorWindow
Signed-off-by: Isaac J. Manjarres isaacmanjarres@google.com
include/uapi/linux/fcntl.h | 1 + mm/memfd.c | 39 +++++++++++++++++++++++++++++++++++++- 2 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h index 6e6907e63bfc..ef066e524777 100644 --- a/include/uapi/linux/fcntl.h +++ b/include/uapi/linux/fcntl.h @@ -49,6 +49,7 @@ #define F_SEAL_WRITE 0x0008 /* prevent writes */ #define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */ #define F_SEAL_EXEC 0x0020 /* prevent chmod modifying exec bits */ +#define F_SEAL_FUTURE_EXEC 0x0040 /* prevent future executable mappings */ /* (1U << 31) is reserved for signed error codes */
/* diff --git a/mm/memfd.c b/mm/memfd.c index 5f5a23c9051d..cfd62454df5e 100644 --- a/mm/memfd.c +++ b/mm/memfd.c @@ -184,6 +184,7 @@ static unsigned int *memfd_file_seals_ptr(struct file *file) }
#define F_ALL_SEALS (F_SEAL_SEAL | \
F_SEAL_FUTURE_EXEC |\ F_SEAL_EXEC | \ F_SEAL_SHRINK | \ F_SEAL_GROW | \
@@ -357,14 +358,50 @@ static int check_write_seal(unsigned long *vm_flags_ptr) return 0; }
+static inline bool is_exec_sealed(unsigned int seals) +{
return seals & F_SEAL_FUTURE_EXEC;
+}
+static int check_exec_seal(unsigned long *vm_flags_ptr) +{
unsigned long vm_flags = *vm_flags_ptr;
unsigned long mask = vm_flags & (VM_SHARED | VM_EXEC);
/* Executability is not a concern for private mappings. */
if (!(mask & VM_SHARED))
return 0;
Why is it not a concern for private mappings?
/*
* New PROT_EXEC and MAP_SHARED mmaps are not allowed when exec seal
* is active.
*/
if (mask & VM_EXEC)
return -EPERM;
/*
* Prevent mprotect() from making an exec-sealed mapping executable in
* the future.
*/
*vm_flags_ptr &= ~VM_MAYEXEC;
return 0;
+}
int memfd_check_seals_mmap(struct file *file, unsigned long *vm_flags_ptr) { int err = 0; unsigned int *seals_ptr = memfd_file_seals_ptr(file); unsigned int seals = seals_ptr ? *seals_ptr : 0;
if (is_write_sealed(seals))
if (is_write_sealed(seals)) { err = check_write_seal(vm_flags_ptr);
if (err)
return err;
}
if (is_exec_sealed(seals))
err = check_exec_seal(vm_flags_ptr); return err;
}
2.47.1.613.gc27f4b7a9f-goog
+ Kees because this is related to W^X memfd and security.
On Fri, Jan 3, 2025 at 7:04 AM Jann Horn jannh@google.com wrote:
On Fri, Jan 3, 2025 at 12:32 AM Isaac J. Manjarres isaacmanjarres@google.com wrote:
Android currently uses the ashmem driver [1] for creating shared memory regions between processes. Ashmem buffers can initially be mapped with PROT_READ, PROT_WRITE, and PROT_EXEC. Processes can then use the ASHMEM_SET_PROT_MASK ioctl command to restrict--never add--the permissions that the buffer can be mapped with.
Processes can remove the ability to map ashmem buffers as executable to ensure that those buffers cannot be exploited to run unintended code.
Is there really code out there that first maps an ashmem buffer with PROT_EXEC, then uses the ioctl to remove execute permission for future mappings? I don't see why anyone would do that.
For instance, suppose process A allocates a memfd that is meant to be read and written by itself and another process, call it B.
Process A shares the buffer with process B, but process B injects code into the buffer, and compromises process A, such that it makes A map the buffer with PROT_EXEC. This provides an opportunity for process A to run the code that process B injected into the buffer.
If process A had the ability to seal the buffer against future executable mappings before sharing the buffer with process B, this attack would not be possible.
I think if you want to enforce such restrictions in a scenario where the attacker can already make the target process perform semi-arbitrary syscalls, it would probably be more reliable to enforce rules on executable mappings with something like SELinux policy and/or F_SEAL_EXEC.
I would like to second on the suggestion of making this as part of F_SEAL_EXEC.
Android is currently trying to replace ashmem with memfd. However, memfd does not have a provision to permanently remove the ability to map a buffer as executable, and leaves itself open to the type of attack described earlier. However, this should be something that can be achieved via a new file seal.
There are known usecases (e.g. CursorWindow [2]) where a process maps a buffer with read/write permissions before restricting the buffer to being mapped as read-only for future mappings.
Here you're talking about write permission, but the patch is about execute permission?
The resulting VMA from the writable mapping has VM_MAYEXEC set, meaning that mprotect() can change the mapping to be executable. Therefore, implementing the seal similar to F_SEAL_WRITE would not be appropriate, since it would not work with the CursorWindow usecase. This is because the CursorWindow process restricts the mapping permissions to read-only after the writable mapping is created. So, adding a file seal for executable mappings that operates like F_SEAL_WRITE would fail.
Therefore, add support for F_SEAL_FUTURE_EXEC, which is handled similarly to F_SEAL_FUTURE_WRITE. This ensures that CursorWindow can continue to create a writable mapping initially, and then restrict the permissions on the buffer to be mappable as read-only by using both F_SEAL_FUTURE_WRITE and F_SEAL_FUTURE_EXEC. After the seal is applied, any calls to mmap() with PROT_EXEC will fail.
[1] https://cs.android.com/android/kernel/superproject/+/common-android-mainline... [2] https://developer.android.com/reference/android/database/CursorWindow
Signed-off-by: Isaac J. Manjarres isaacmanjarres@google.com
include/uapi/linux/fcntl.h | 1 + mm/memfd.c | 39 +++++++++++++++++++++++++++++++++++++- 2 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h index 6e6907e63bfc..ef066e524777 100644 --- a/include/uapi/linux/fcntl.h +++ b/include/uapi/linux/fcntl.h @@ -49,6 +49,7 @@ #define F_SEAL_WRITE 0x0008 /* prevent writes */ #define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */ #define F_SEAL_EXEC 0x0020 /* prevent chmod modifying exec bits */ +#define F_SEAL_FUTURE_EXEC 0x0040 /* prevent future executable mappings */ /* (1U << 31) is reserved for signed error codes */
/* diff --git a/mm/memfd.c b/mm/memfd.c index 5f5a23c9051d..cfd62454df5e 100644 --- a/mm/memfd.c +++ b/mm/memfd.c @@ -184,6 +184,7 @@ static unsigned int *memfd_file_seals_ptr(struct file *file) }
#define F_ALL_SEALS (F_SEAL_SEAL | \
F_SEAL_FUTURE_EXEC |\ F_SEAL_EXEC | \ F_SEAL_SHRINK | \ F_SEAL_GROW | \
@@ -357,14 +358,50 @@ static int check_write_seal(unsigned long *vm_flags_ptr) return 0; }
+static inline bool is_exec_sealed(unsigned int seals) +{
return seals & F_SEAL_FUTURE_EXEC;
+}
+static int check_exec_seal(unsigned long *vm_flags_ptr) +{
unsigned long vm_flags = *vm_flags_ptr;
unsigned long mask = vm_flags & (VM_SHARED | VM_EXEC);
/* Executability is not a concern for private mappings. */
if (!(mask & VM_SHARED))
return 0;
Why is it not a concern for private mappings?
/*
* New PROT_EXEC and MAP_SHARED mmaps are not allowed when exec seal
* is active.
*/
if (mask & VM_EXEC)
return -EPERM;
/*
* Prevent mprotect() from making an exec-sealed mapping executable in
* the future.
*/
*vm_flags_ptr &= ~VM_MAYEXEC;
return 0;
+}
int memfd_check_seals_mmap(struct file *file, unsigned long *vm_flags_ptr) { int err = 0; unsigned int *seals_ptr = memfd_file_seals_ptr(file); unsigned int seals = seals_ptr ? *seals_ptr : 0;
if (is_write_sealed(seals))
if (is_write_sealed(seals)) { err = check_write_seal(vm_flags_ptr);
if (err)
return err;
}
if (is_exec_sealed(seals))
err = check_exec_seal(vm_flags_ptr);
memfd_check_seals_mmap is only for mmap() path, right ?
How about the mprotect() path ? i.e. An attacker can first create a RW VMA mapping for the memfd and later mprotect the VMA to be executable.
Similar to the check_write_seal call , we might want to block mprotect for write seal as well.
return err;
}
2.47.1.613.gc27f4b7a9f-goog
On Mon, Jan 06, 2025 at 09:35:09AM -0800, Jeff Xu wrote:
- Kees because this is related to W^X memfd and security.
On Fri, Jan 3, 2025 at 7:04 AM Jann Horn jannh@google.com wrote:
On Fri, Jan 3, 2025 at 12:32 AM Isaac J. Manjarres isaacmanjarres@google.com wrote:
Android currently uses the ashmem driver [1] for creating shared memory regions between processes. Ashmem buffers can initially be mapped with PROT_READ, PROT_WRITE, and PROT_EXEC. Processes can then use the ASHMEM_SET_PROT_MASK ioctl command to restrict--never add--the permissions that the buffer can be mapped with.
Processes can remove the ability to map ashmem buffers as executable to ensure that those buffers cannot be exploited to run unintended code.
Is there really code out there that first maps an ashmem buffer with PROT_EXEC, then uses the ioctl to remove execute permission for future mappings? I don't see why anyone would do that.
For instance, suppose process A allocates a memfd that is meant to be read and written by itself and another process, call it B.
Process A shares the buffer with process B, but process B injects code into the buffer, and compromises process A, such that it makes A map the buffer with PROT_EXEC. This provides an opportunity for process A to run the code that process B injected into the buffer.
If process A had the ability to seal the buffer against future executable mappings before sharing the buffer with process B, this attack would not be possible.
I think if you want to enforce such restrictions in a scenario where the attacker can already make the target process perform semi-arbitrary syscalls, it would probably be more reliable to enforce rules on executable mappings with something like SELinux policy and/or F_SEAL_EXEC.
I would like to second on the suggestion of making this as part of F_SEAL_EXEC.
Thanks for taking a look at this patch Jeff! Can you please elaborate some more on how F_SEAL_EXEC should be extended to restricting executable mappings?
I understand that if a memfd file is non-executable (either because it was made non-executable via fchmod() or by being created with MFD_NOEXEC_SEAL) one could argue that applying F_SEAL_EXEC to that file would also mean preventing any executable mappings. However, it is not clear to me if we should tie a file's executable permissions to whether or not if it can be mapped as executable. For example, shared object files don't have to have executable permissions, but processes should be able to map them as executable.
The case where we apply F_SEAL_EXEC on an executable memfd also seems awkward to me, since memfds can be mapped as executable by default so what would happen in that scenario?
I also shared the same concerns in my response to Jann in [1].
diff --git a/mm/memfd.c b/mm/memfd.c index 5f5a23c9051d..cfd62454df5e 100644 --- a/mm/memfd.c +++ b/mm/memfd.c @@ -184,6 +184,7 @@ static unsigned int *memfd_file_seals_ptr(struct file *file) }
#define F_ALL_SEALS (F_SEAL_SEAL | \
F_SEAL_FUTURE_EXEC |\ F_SEAL_EXEC | \ F_SEAL_SHRINK | \ F_SEAL_GROW | \
@@ -357,14 +358,50 @@ static int check_write_seal(unsigned long *vm_flags_ptr) return 0; }
+static inline bool is_exec_sealed(unsigned int seals) +{
return seals & F_SEAL_FUTURE_EXEC;
+}
+static int check_exec_seal(unsigned long *vm_flags_ptr) +{
unsigned long vm_flags = *vm_flags_ptr;
unsigned long mask = vm_flags & (VM_SHARED | VM_EXEC);
/* Executability is not a concern for private mappings. */
if (!(mask & VM_SHARED))
return 0;
Why is it not a concern for private mappings?
/*
* New PROT_EXEC and MAP_SHARED mmaps are not allowed when exec seal
* is active.
*/
if (mask & VM_EXEC)
return -EPERM;
/*
* Prevent mprotect() from making an exec-sealed mapping executable in
* the future.
*/
*vm_flags_ptr &= ~VM_MAYEXEC;
return 0;
+}
int memfd_check_seals_mmap(struct file *file, unsigned long *vm_flags_ptr) { int err = 0; unsigned int *seals_ptr = memfd_file_seals_ptr(file); unsigned int seals = seals_ptr ? *seals_ptr : 0;
if (is_write_sealed(seals))
if (is_write_sealed(seals)) { err = check_write_seal(vm_flags_ptr);
if (err)
return err;
}
if (is_exec_sealed(seals))
err = check_exec_seal(vm_flags_ptr);
memfd_check_seals_mmap is only for mmap() path, right ?
How about the mprotect() path ? i.e. An attacker can first create a RW VMA mapping for the memfd and later mprotect the VMA to be executable.
Similar to the check_write_seal call , we might want to block mprotect for write seal as well.
So when memfd_check_seals_mmap() is called, if the file is exec_sealed, check_exec_seal() will not only just check that VM_EXEC is not set, but it will also clear VM_MAYEXEC, which will prevent the mapping from being changed to executable via mprotect() later.
[1] https://lore.kernel.org/all/Z3x_8uFn2e0EpDqM@google.com/
Thanks, Isaac
On Mon, Jan 6, 2025 at 5:26 PM Isaac Manjarres isaacmanjarres@google.com wrote:
On Mon, Jan 06, 2025 at 09:35:09AM -0800, Jeff Xu wrote:
- Kees because this is related to W^X memfd and security.
On Fri, Jan 3, 2025 at 7:04 AM Jann Horn jannh@google.com wrote:
On Fri, Jan 3, 2025 at 12:32 AM Isaac J. Manjarres isaacmanjarres@google.com wrote:
Android currently uses the ashmem driver [1] for creating shared memory regions between processes. Ashmem buffers can initially be mapped with PROT_READ, PROT_WRITE, and PROT_EXEC. Processes can then use the ASHMEM_SET_PROT_MASK ioctl command to restrict--never add--the permissions that the buffer can be mapped with.
Processes can remove the ability to map ashmem buffers as executable to ensure that those buffers cannot be exploited to run unintended code.
Is there really code out there that first maps an ashmem buffer with PROT_EXEC, then uses the ioctl to remove execute permission for future mappings? I don't see why anyone would do that.
For instance, suppose process A allocates a memfd that is meant to be read and written by itself and another process, call it B.
Process A shares the buffer with process B, but process B injects code into the buffer, and compromises process A, such that it makes A map the buffer with PROT_EXEC. This provides an opportunity for process A to run the code that process B injected into the buffer.
If process A had the ability to seal the buffer against future executable mappings before sharing the buffer with process B, this attack would not be possible.
I think if you want to enforce such restrictions in a scenario where the attacker can already make the target process perform semi-arbitrary syscalls, it would probably be more reliable to enforce rules on executable mappings with something like SELinux policy and/or F_SEAL_EXEC.
I would like to second on the suggestion of making this as part of F_SEAL_EXEC.
Thanks for taking a look at this patch Jeff! Can you please elaborate some more on how F_SEAL_EXEC should be extended to restricting executable mappings?
I understand that if a memfd file is non-executable (either because it was made non-executable via fchmod() or by being created with MFD_NOEXEC_SEAL) one could argue that applying F_SEAL_EXEC to that file would also mean preventing any executable mappings. However, it is not clear to me if we should tie a file's executable permissions to whether or not if it can be mapped as executable. For example, shared object files don't have to have executable permissions, but processes should be able to map them as executable.
The case where we apply F_SEAL_EXEC on an executable memfd also seems awkward to me, since memfds can be mapped as executable by default so what would happen in that scenario?
I also shared the same concerns in my response to Jann in [1].
Apology for not being clear. I meant this below: when 1> memfd is created with MFD_NOEXEC_SEAL or 2> memfd is no-exec (NX) and F_SEAL_EXEC is set. We could also block the memfd from being mapped as executable.
MFD_NOEXEC_SEAL/F_SEAL_EXEC is added in 6fd7353829ca, which is about 2 years old, I m not sure any application uses the case of creating a MFD_NOEXEC_SEAL memfd and still wants to mmap it as executable memory, that is a strange user case. It is more logical that applications want to block both execve() and mmap() for a non-executable memfd. Therefore I think we could reuse the F_SEAL_EXEC bit + NX state for this feature, for simplicity.
diff --git a/mm/memfd.c b/mm/memfd.c index 5f5a23c9051d..cfd62454df5e 100644 --- a/mm/memfd.c +++ b/mm/memfd.c @@ -184,6 +184,7 @@ static unsigned int *memfd_file_seals_ptr(struct file *file) }
#define F_ALL_SEALS (F_SEAL_SEAL | \
F_SEAL_FUTURE_EXEC |\ F_SEAL_EXEC | \ F_SEAL_SHRINK | \ F_SEAL_GROW | \
@@ -357,14 +358,50 @@ static int check_write_seal(unsigned long *vm_flags_ptr) return 0; }
+static inline bool is_exec_sealed(unsigned int seals) +{
return seals & F_SEAL_FUTURE_EXEC;
+}
+static int check_exec_seal(unsigned long *vm_flags_ptr) +{
unsigned long vm_flags = *vm_flags_ptr;
unsigned long mask = vm_flags & (VM_SHARED | VM_EXEC);
/* Executability is not a concern for private mappings. */
if (!(mask & VM_SHARED))
return 0;
Why is it not a concern for private mappings?
/*
* New PROT_EXEC and MAP_SHARED mmaps are not allowed when exec seal
* is active.
*/
if (mask & VM_EXEC)
return -EPERM;
/*
* Prevent mprotect() from making an exec-sealed mapping executable in
* the future.
*/
*vm_flags_ptr &= ~VM_MAYEXEC;
return 0;
+}
int memfd_check_seals_mmap(struct file *file, unsigned long *vm_flags_ptr) { int err = 0; unsigned int *seals_ptr = memfd_file_seals_ptr(file); unsigned int seals = seals_ptr ? *seals_ptr : 0;
if (is_write_sealed(seals))
if (is_write_sealed(seals)) { err = check_write_seal(vm_flags_ptr);
if (err)
return err;
}
if (is_exec_sealed(seals))
err = check_exec_seal(vm_flags_ptr);
memfd_check_seals_mmap is only for mmap() path, right ?
How about the mprotect() path ? i.e. An attacker can first create a RW VMA mapping for the memfd and later mprotect the VMA to be executable.
Similar to the check_write_seal call , we might want to block mprotect for write seal as well.
So when memfd_check_seals_mmap() is called, if the file is exec_sealed, check_exec_seal() will not only just check that VM_EXEC is not set, but it will also clear VM_MAYEXEC, which will prevent the mapping from being changed to executable via mprotect() later.
Thanks for clarification.
The name of check_exec_seal() is misleading , check implies a read operation, but this function actually does update. Maybe renaming to check_and_update_exec_seal or something like that ?
Do you know which code checks for VM_MAYEXEC flag in the mprotect code path ? it isn't obvious to me, i.e. when I grep the VM_MAYEXEC inside mm path, it only shows one place in mprotect and that doesn't do the work.
~/mm/mm$ grep VM_MAYEXEC * mmap.c: mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; mmap.c: vm_flags &= ~VM_MAYEXEC; mprotect.c: if (rier && (vma->vm_flags & VM_MAYEXEC)) nommu.c: vm_flags |= VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; nommu.c: vm_flags |= VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
Thanks -Jeff
[1] https://lore.kernel.org/all/Z3x_8uFn2e0EpDqM@google.com/
Thanks, Isaac
On Tue, Jan 7, 2025 at 6:21 AM Jeff Xu jeffxu@chromium.org wrote:
Do you know which code checks for VM_MAYEXEC flag in the mprotect code path ? it isn't obvious to me, i.e. when I grep the VM_MAYEXEC inside mm path, it only shows one place in mprotect and that doesn't do the work.
~/mm/mm$ grep VM_MAYEXEC * mmap.c: mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; mmap.c: vm_flags &= ~VM_MAYEXEC; mprotect.c: if (rier && (vma->vm_flags & VM_MAYEXEC)) nommu.c: vm_flags |= VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; nommu.c: vm_flags |= VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
The check happens here:
/* newflags >> 4 shift VM_MAY% in place of VM_% */ if ((newflags & ~(newflags >> 4)) & VM_ACCESS_FLAGS) { error = -EACCES; break; }
Alice
On Wed, Jan 8, 2025 at 5:57 AM Alice Ryhl aliceryhl@google.com wrote:
On Tue, Jan 7, 2025 at 6:21 AM Jeff Xu jeffxu@chromium.org wrote:
Do you know which code checks for VM_MAYEXEC flag in the mprotect code path ? it isn't obvious to me, i.e. when I grep the VM_MAYEXEC inside mm path, it only shows one place in mprotect and that doesn't do the work.
~/mm/mm$ grep VM_MAYEXEC * mmap.c: mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; mmap.c: vm_flags &= ~VM_MAYEXEC; mprotect.c: if (rier && (vma->vm_flags & VM_MAYEXEC)) nommu.c: vm_flags |= VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; nommu.c: vm_flags |= VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
The check happens here:
/* newflags >> 4 shift VM_MAY% in place of VM_% */ if ((newflags & ~(newflags >> 4)) & VM_ACCESS_FLAGS) { error = -EACCES; break; }
Thanks for helping ! -Jeff
Alice
On Fri, Jan 03, 2025 at 04:03:44PM +0100, Jann Horn wrote:
On Fri, Jan 3, 2025 at 12:32 AM Isaac J. Manjarres isaacmanjarres@google.com wrote:
Android currently uses the ashmem driver [1] for creating shared memory regions between processes. Ashmem buffers can initially be mapped with PROT_READ, PROT_WRITE, and PROT_EXEC. Processes can then use the ASHMEM_SET_PROT_MASK ioctl command to restrict--never add--the permissions that the buffer can be mapped with.
Processes can remove the ability to map ashmem buffers as executable to ensure that those buffers cannot be exploited to run unintended code.
Is there really code out there that first maps an ashmem buffer with PROT_EXEC, then uses the ioctl to remove execute permission for future mappings? I don't see why anyone would do that.
Hi Jann,
Thanks for your feedback and for taking the time to review these patches!
Not that I'm aware of. The reason why I made this seal have semantics where it prevents future executable mappings is because there are existing applications that allocate an ashmem buffer (default permissions are RWX), map the buffer as RW, and then restrict the permissions to just R.
When the buffer is mapped as RW, do_mmap() unconditionally sets VM_MAYEXEC on the VMA for the mapping, which means that the mapping could later be mapped as executable via mprotect(). Therefore, having the semantics of the seal be that it prevents any executable mappings would break existing code that has already been released. It would make transitioning clients to memfd difficult, because to amend that, the ashmem users would have to first restrict the permissions of the buffer to be RW, then map it as RW, and then restrict the permissions again to be just R, which also means an additional system call.
For instance, suppose process A allocates a memfd that is meant to be read and written by itself and another process, call it B.
Process A shares the buffer with process B, but process B injects code into the buffer, and compromises process A, such that it makes A map the buffer with PROT_EXEC. This provides an opportunity for process A to run the code that process B injected into the buffer.
If process A had the ability to seal the buffer against future executable mappings before sharing the buffer with process B, this attack would not be possible.
I think if you want to enforce such restrictions in a scenario where the attacker can already make the target process perform semi-arbitrary syscalls, it would probably be more reliable to enforce rules on executable mappings with something like SELinux policy and/or F_SEAL_EXEC.
For SELinux policy, do you mean to not allow execmem permissions? What about scenarios where a process wants to use JIT compilation, but doesn't want memfd data buffers to be executable? My thought was to use this new seal to have a finer granularity to control what buffers can be mapped as executable. If not, could you please clarify?
Also, F_SEAL_EXEC just seals the memfd's current executable permissions, and doesn't affect the mapping permissions at all. Are you saying that F_SEAL_EXEC should be extended to cover mappings as well? If so, it is not clear to me what the semantics of that would be.
For instance, if a memfd is non-executable and F_SEAL_EXEC is applied, we can also prevent any executable mappings at that point. I'm not sure if that's the right thing to do though. For instance, there are shared object files that don't have executable permissions, but their code sections should be mapped as executable. So, drawing from that, I'm not sure if it makes sense to tie the file execution permissions to the mapping permissions.
There's also the case where F_SEAL_EXEC is invoked on an executable memfd. In that case, there doesn't seem to be anything to do from a mapping perspective since memfds can be mapped as executable by default?
Android is currently trying to replace ashmem with memfd. However, memfd does not have a provision to permanently remove the ability to map a buffer as executable, and leaves itself open to the type of attack described earlier. However, this should be something that can be achieved via a new file seal.
There are known usecases (e.g. CursorWindow [2]) where a process maps a buffer with read/write permissions before restricting the buffer to being mapped as read-only for future mappings.
Here you're talking about write permission, but the patch is about execute permission?
Sorry, I used this example about write permission to show why I implemented the seal with support for preventing future mappings, since the writable mappings that get created can become executable in the future, as described later in the commit text.
The resulting VMA from the writable mapping has VM_MAYEXEC set, meaning that mprotect() can change the mapping to be executable. Therefore, implementing the seal similar to F_SEAL_WRITE would not be appropriate, since it would not work with the CursorWindow usecase. This is because the CursorWindow process restricts the mapping permissions to read-only after the writable mapping is created. So, adding a file seal for executable mappings that operates like F_SEAL_WRITE would fail.
Therefore, add support for F_SEAL_FUTURE_EXEC, which is handled similarly to F_SEAL_FUTURE_WRITE. This ensures that CursorWindow can continue to create a writable mapping initially, and then restrict the permissions on the buffer to be mappable as read-only by using both F_SEAL_FUTURE_WRITE and F_SEAL_FUTURE_EXEC. After the seal is applied, any calls to mmap() with PROT_EXEC will fail.
[1] https://cs.android.com/android/kernel/superproject/+/common-android-mainline... [2] https://developer.android.com/reference/android/database/CursorWindow
Signed-off-by: Isaac J. Manjarres isaacmanjarres@google.com
include/uapi/linux/fcntl.h | 1 + mm/memfd.c | 39 +++++++++++++++++++++++++++++++++++++- 2 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h index 6e6907e63bfc..ef066e524777 100644 --- a/include/uapi/linux/fcntl.h +++ b/include/uapi/linux/fcntl.h @@ -49,6 +49,7 @@ #define F_SEAL_WRITE 0x0008 /* prevent writes */ #define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */ #define F_SEAL_EXEC 0x0020 /* prevent chmod modifying exec bits */ +#define F_SEAL_FUTURE_EXEC 0x0040 /* prevent future executable mappings */ /* (1U << 31) is reserved for signed error codes */
/* diff --git a/mm/memfd.c b/mm/memfd.c index 5f5a23c9051d..cfd62454df5e 100644 --- a/mm/memfd.c +++ b/mm/memfd.c @@ -184,6 +184,7 @@ static unsigned int *memfd_file_seals_ptr(struct file *file) }
#define F_ALL_SEALS (F_SEAL_SEAL | \
F_SEAL_FUTURE_EXEC |\ F_SEAL_EXEC | \ F_SEAL_SHRINK | \ F_SEAL_GROW | \
@@ -357,14 +358,50 @@ static int check_write_seal(unsigned long *vm_flags_ptr) return 0; }
+static inline bool is_exec_sealed(unsigned int seals) +{
return seals & F_SEAL_FUTURE_EXEC;
+}
+static int check_exec_seal(unsigned long *vm_flags_ptr) +{
unsigned long vm_flags = *vm_flags_ptr;
unsigned long mask = vm_flags & (VM_SHARED | VM_EXEC);
/* Executability is not a concern for private mappings. */
if (!(mask & VM_SHARED))
return 0;
Why is it not a concern for private mappings?
I didn't consider private mappings since it wasn't clear as to how they could be a threat to another process. A process can copy the contents of the buffer into another executable region of memory and just run it from there? Or are you saying that because it can do that, is there any value in differentiating between shared and private mappings?
Thanks, Isaac
On Mon, Jan 06, 2025 at 05:14:26PM -0800, Isaac Manjarres wrote:
On Fri, Jan 03, 2025 at 04:03:44PM +0100, Jann Horn wrote:
On Fri, Jan 3, 2025 at 12:32 AM Isaac J. Manjarres isaacmanjarres@google.com wrote:
Android currently uses the ashmem driver [1] for creating shared memory regions between processes. Ashmem buffers can initially be mapped with PROT_READ, PROT_WRITE, and PROT_EXEC. Processes can then use the ASHMEM_SET_PROT_MASK ioctl command to restrict--never add--the permissions that the buffer can be mapped with.
Processes can remove the ability to map ashmem buffers as executable to ensure that those buffers cannot be exploited to run unintended code.
Is there really code out there that first maps an ashmem buffer with PROT_EXEC, then uses the ioctl to remove execute permission for future mappings? I don't see why anyone would do that.
Hi Jann,
Thanks for your feedback and for taking the time to review these patches!
Not that I'm aware of. The reason why I made this seal have semantics where it prevents future executable mappings is because there are existing applications that allocate an ashmem buffer (default permissions are RWX), map the buffer as RW, and then restrict the permissions to just R.
When the buffer is mapped as RW, do_mmap() unconditionally sets VM_MAYEXEC on the VMA for the mapping, which means that the mapping could later be mapped as executable via mprotect(). Therefore, having the semantics of the seal be that it prevents any executable mappings would break existing code that has already been released. It would make transitioning clients to memfd difficult, because to amend that, the ashmem users would have to first restrict the permissions of the buffer to be RW, then map it as RW, and then restrict the permissions again to be just R, which also means an additional system call.
You could do something similar to my adjustments to the F_SEAL_WRITE changes that clears VM_MAYEXEC in cases where do_mmap() maps an F_SEAL_EXEC'd without PROT_EXEC.
Please note that F_SEAL_EXEC implies:
F_SEAL_SHRINK F_SEAL_GROW F_SEAL_WRITE <- important, obviously F_SEAL_FUTURE_WRITE <- also important
if (seals & F_SEAL_EXEC && inode->i_mode & 0111) seals |= F_SEAL_SHRINK|F_SEAL_GROW|F_SEAL_WRITE|F_SEAL_FUTURE_WRITE;
Though interestingly only _after_ the mapping_deny_writable() call is performed which is... odd.
Probably worth exploring F_SEAL_EXEC semantics in detail if you haven't already to see how viable this is.
For instance, suppose process A allocates a memfd that is meant to be read and written by itself and another process, call it B.
Process A shares the buffer with process B, but process B injects code into the buffer, and compromises process A, such that it makes A map the buffer with PROT_EXEC. This provides an opportunity for process A to run the code that process B injected into the buffer.
If process A had the ability to seal the buffer against future executable mappings before sharing the buffer with process B, this attack would not be possible.
I think if you want to enforce such restrictions in a scenario where the attacker can already make the target process perform semi-arbitrary syscalls, it would probably be more reliable to enforce rules on executable mappings with something like SELinux policy and/or F_SEAL_EXEC.
For SELinux policy, do you mean to not allow execmem permissions? What about scenarios where a process wants to use JIT compilation, but doesn't want memfd data buffers to be executable? My thought was to use this new seal to have a finer granularity to control what buffers can be mapped as executable. If not, could you please clarify?
Also, F_SEAL_EXEC just seals the memfd's current executable permissions, and doesn't affect the mapping permissions at all. Are you saying that F_SEAL_EXEC should be extended to cover mappings as well? If so, it is not clear to me what the semantics of that would be.
I need to dig into how this functions, but I'm guessing then it doesn't immediatley enforce anything preventing an existing mapping from executing? Which differs from F_SEAL_WRITE semantics and then makes it seem like it is already acting like an F_SEAL_FUTURE_EXEC in effect?
Hm need to dig into this a bit.
For instance, if a memfd is non-executable and F_SEAL_EXEC is applied, we can also prevent any executable mappings at that point. I'm not sure if that's the right thing to do though. For instance, there are shared object files that don't have executable permissions, but their code sections should be mapped as executable. So, drawing from that, I'm not sure if it makes sense to tie the file execution permissions to the mapping permissions.
There's also the case where F_SEAL_EXEC is invoked on an executable memfd. In that case, there doesn't seem to be anything to do from a mapping perspective since memfds can be mapped as executable by default?
Android is currently trying to replace ashmem with memfd. However, memfd does not have a provision to permanently remove the ability to map a buffer as executable, and leaves itself open to the type of attack described earlier. However, this should be something that can be achieved via a new file seal.
There are known usecases (e.g. CursorWindow [2]) where a process maps a buffer with read/write permissions before restricting the buffer to being mapped as read-only for future mappings.
Here you're talking about write permission, but the patch is about execute permission?
Sorry, I used this example about write permission to show why I implemented the seal with support for preventing future mappings, since the writable mappings that get created can become executable in the future, as described later in the commit text.
The resulting VMA from the writable mapping has VM_MAYEXEC set, meaning that mprotect() can change the mapping to be executable. Therefore, implementing the seal similar to F_SEAL_WRITE would not be appropriate, since it would not work with the CursorWindow usecase. This is because the CursorWindow process restricts the mapping permissions to read-only after the writable mapping is created. So, adding a file seal for executable mappings that operates like F_SEAL_WRITE would fail.
Therefore, add support for F_SEAL_FUTURE_EXEC, which is handled similarly to F_SEAL_FUTURE_WRITE. This ensures that CursorWindow can continue to create a writable mapping initially, and then restrict the permissions on the buffer to be mappable as read-only by using both F_SEAL_FUTURE_WRITE and F_SEAL_FUTURE_EXEC. After the seal is applied, any calls to mmap() with PROT_EXEC will fail.
[1] https://cs.android.com/android/kernel/superproject/+/common-android-mainline... [2] https://developer.android.com/reference/android/database/CursorWindow
Signed-off-by: Isaac J. Manjarres isaacmanjarres@google.com
include/uapi/linux/fcntl.h | 1 + mm/memfd.c | 39 +++++++++++++++++++++++++++++++++++++- 2 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h index 6e6907e63bfc..ef066e524777 100644 --- a/include/uapi/linux/fcntl.h +++ b/include/uapi/linux/fcntl.h @@ -49,6 +49,7 @@ #define F_SEAL_WRITE 0x0008 /* prevent writes */ #define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */ #define F_SEAL_EXEC 0x0020 /* prevent chmod modifying exec bits */ +#define F_SEAL_FUTURE_EXEC 0x0040 /* prevent future executable mappings */ /* (1U << 31) is reserved for signed error codes */
/* diff --git a/mm/memfd.c b/mm/memfd.c index 5f5a23c9051d..cfd62454df5e 100644 --- a/mm/memfd.c +++ b/mm/memfd.c @@ -184,6 +184,7 @@ static unsigned int *memfd_file_seals_ptr(struct file *file) }
#define F_ALL_SEALS (F_SEAL_SEAL | \
F_SEAL_FUTURE_EXEC |\ F_SEAL_EXEC | \ F_SEAL_SHRINK | \ F_SEAL_GROW | \
@@ -357,14 +358,50 @@ static int check_write_seal(unsigned long *vm_flags_ptr) return 0; }
+static inline bool is_exec_sealed(unsigned int seals) +{
return seals & F_SEAL_FUTURE_EXEC;
+}
+static int check_exec_seal(unsigned long *vm_flags_ptr) +{
unsigned long vm_flags = *vm_flags_ptr;
unsigned long mask = vm_flags & (VM_SHARED | VM_EXEC);
/* Executability is not a concern for private mappings. */
if (!(mask & VM_SHARED))
return 0;
Why is it not a concern for private mappings?
I didn't consider private mappings since it wasn't clear as to how they could be a threat to another process. A process can copy the contents of the buffer into another executable region of memory and just run it from there? Or are you saying that because it can do that, is there any value in differentiating between shared and private mappings?
Yes this is my point of view also but I might be missing something.
Thanks, Isaac
On Wed, Jan 08, 2025 at 08:43:38PM +0000, Lorenzo Stoakes wrote:
On Mon, Jan 06, 2025 at 05:14:26PM -0800, Isaac Manjarres wrote:
On Fri, Jan 03, 2025 at 04:03:44PM +0100, Jann Horn wrote:
On Fri, Jan 3, 2025 at 12:32 AM Isaac J. Manjarres isaacmanjarres@google.com wrote:
Android currently uses the ashmem driver [1] for creating shared memory regions between processes. Ashmem buffers can initially be mapped with PROT_READ, PROT_WRITE, and PROT_EXEC. Processes can then use the ASHMEM_SET_PROT_MASK ioctl command to restrict--never add--the permissions that the buffer can be mapped with.
Processes can remove the ability to map ashmem buffers as executable to ensure that those buffers cannot be exploited to run unintended code.
Is there really code out there that first maps an ashmem buffer with PROT_EXEC, then uses the ioctl to remove execute permission for future mappings? I don't see why anyone would do that.
Hi Jann,
Thanks for your feedback and for taking the time to review these patches!
Not that I'm aware of. The reason why I made this seal have semantics where it prevents future executable mappings is because there are existing applications that allocate an ashmem buffer (default permissions are RWX), map the buffer as RW, and then restrict the permissions to just R.
When the buffer is mapped as RW, do_mmap() unconditionally sets VM_MAYEXEC on the VMA for the mapping, which means that the mapping could later be mapped as executable via mprotect(). Therefore, having the semantics of the seal be that it prevents any executable mappings would break existing code that has already been released. It would make transitioning clients to memfd difficult, because to amend that, the ashmem users would have to first restrict the permissions of the buffer to be RW, then map it as RW, and then restrict the permissions again to be just R, which also means an additional system call.
You could do something similar to my adjustments to the F_SEAL_WRITE changes that clears VM_MAYEXEC in cases where do_mmap() maps an F_SEAL_EXEC'd without PROT_EXEC.
Please note that F_SEAL_EXEC implies:
F_SEAL_SHRINK F_SEAL_GROW F_SEAL_WRITE <- important, obviously F_SEAL_FUTURE_WRITE <- also important
if (seals & F_SEAL_EXEC && inode->i_mode & 0111) seals |= F_SEAL_SHRINK|F_SEAL_GROW|F_SEAL_WRITE|F_SEAL_FUTURE_WRITE;
Though interestingly only _after_ the mapping_deny_writable() call is performed which is... odd.
Probably worth exploring F_SEAL_EXEC semantics in detail if you haven't already to see how viable this is.
OK please disregard this (+ other waffling on F_SEAL_EXEC), I dug in and this flag is weirdly named and simply prevents chmod() changes to the mapping (...!).
For the semantics you need, you do very much appear to need something completely new and what you suggest with F_SEAL_FUTURE_EXEC does appear to fit the bill nicely.
On Thu, Jan 02, 2025 at 03:32:50PM -0800, Isaac J. Manjarres wrote:
Android currently uses the ashmem driver [1] for creating shared memory regions between processes. Ashmem buffers can initially be mapped with PROT_READ, PROT_WRITE, and PROT_EXEC. Processes can then use the ASHMEM_SET_PROT_MASK ioctl command to restrict--never add--the permissions that the buffer can be mapped with.
Processes can remove the ability to map ashmem buffers as executable to ensure that those buffers cannot be exploited to run unintended code.
For instance, suppose process A allocates a memfd that is meant to be read and written by itself and another process, call it B.
Process A shares the buffer with process B, but process B injects code into the buffer, and compromises process A, such that it makes A map the buffer with PROT_EXEC. This provides an opportunity for process A to run the code that process B injected into the buffer.
If process A had the ability to seal the buffer against future executable mappings before sharing the buffer with process B, this attack would not be possible.
Android is currently trying to replace ashmem with memfd. However, memfd does not have a provision to permanently remove the ability to map a buffer as executable, and leaves itself open to the type of attack described earlier. However, this should be something that can be achieved via a new file seal.
There are known usecases (e.g. CursorWindow [2]) where a process maps a buffer with read/write permissions before restricting the buffer to being mapped as read-only for future mappings.
The resulting VMA from the writable mapping has VM_MAYEXEC set, meaning that mprotect() can change the mapping to be executable. Therefore, implementing the seal similar to F_SEAL_WRITE would not be appropriate, since it would not work with the CursorWindow usecase. This is because the CursorWindow process restricts the mapping permissions to read-only after the writable mapping is created. So, adding a file seal for executable mappings that operates like F_SEAL_WRITE would fail.
Therefore, add support for F_SEAL_FUTURE_EXEC, which is handled similarly to F_SEAL_FUTURE_WRITE. This ensures that CursorWindow can continue to create a writable mapping initially, and then restrict the permissions on the buffer to be mappable as read-only by using both F_SEAL_FUTURE_WRITE and F_SEAL_FUTURE_EXEC. After the seal is applied, any calls to mmap() with PROT_EXEC will fail.
[1] https://cs.android.com/android/kernel/superproject/+/common-android-mainline... [2] https://developer.android.com/reference/android/database/CursorWindow
Signed-off-by: Isaac J. Manjarres isaacmanjarres@google.com
include/uapi/linux/fcntl.h | 1 + mm/memfd.c | 39 +++++++++++++++++++++++++++++++++++++- 2 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h index 6e6907e63bfc..ef066e524777 100644 --- a/include/uapi/linux/fcntl.h +++ b/include/uapi/linux/fcntl.h @@ -49,6 +49,7 @@ #define F_SEAL_WRITE 0x0008 /* prevent writes */ #define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */ #define F_SEAL_EXEC 0x0020 /* prevent chmod modifying exec bits */
Hmm ok I just noticed this... F_SEAL_EXEC is weird then.
It doesn't prevent execution in the same way F_SEAL_WRITE does, nor does it seem to check or care about VM_MAYEXEC...
It just 'prevents chmod from modifying exec bits'.
I mean lord above haha.
And of course the code for it is in shmem_setattr()...
I have not enough faces to palm or palms to face.
So yes I suppose for any sane exec semantics you'll need something new...
+#define F_SEAL_FUTURE_EXEC 0x0040 /* prevent future executable mappings */ /* (1U << 31) is reserved for signed error codes */
/* diff --git a/mm/memfd.c b/mm/memfd.c index 5f5a23c9051d..cfd62454df5e 100644 --- a/mm/memfd.c +++ b/mm/memfd.c @@ -184,6 +184,7 @@ static unsigned int *memfd_file_seals_ptr(struct file *file) }
#define F_ALL_SEALS (F_SEAL_SEAL | \
F_SEAL_FUTURE_EXEC |\ F_SEAL_EXEC | \ F_SEAL_SHRINK | \ F_SEAL_GROW | \
@@ -357,14 +358,50 @@ static int check_write_seal(unsigned long *vm_flags_ptr) return 0; }
+static inline bool is_exec_sealed(unsigned int seals)
This should say 'future', otherwise this is very confusing vs. F_SEAL_EXEC.
Also no need for inline outside of a header.
+{
- return seals & F_SEAL_FUTURE_EXEC;
+}
+static int check_exec_seal(unsigned long *vm_flags_ptr) +{
- unsigned long vm_flags = *vm_flags_ptr;
- unsigned long mask = vm_flags & (VM_SHARED | VM_EXEC);
- /* Executability is not a concern for private mappings. */
- if (!(mask & VM_SHARED))
return 0;
- /*
* New PROT_EXEC and MAP_SHARED mmaps are not allowed when exec seal
* is active.
*/
- if (mask & VM_EXEC)
return -EPERM;
- /*
* Prevent mprotect() from making an exec-sealed mapping executable in
* the future.
*/
- *vm_flags_ptr &= ~VM_MAYEXEC;
- return 0;
+}
int memfd_check_seals_mmap(struct file *file, unsigned long *vm_flags_ptr) { int err = 0; unsigned int *seals_ptr = memfd_file_seals_ptr(file); unsigned int seals = seals_ptr ? *seals_ptr : 0;
- if (is_write_sealed(seals))
if (is_write_sealed(seals)) { err = check_write_seal(vm_flags_ptr);
if (err)
return err;
}
if (is_exec_sealed(seals))
err = check_exec_seal(vm_flags_ptr);
return err;
}
OK this is actually quite neat now we have everything set up in do_mmap().
I think we probably want some comments to very clearly point out that F_SEAL_EXEC is a bit crazy and weird and meaningless and this is actually vaguely sane...
-- 2.47.1.613.gc27f4b7a9f-goog
Add tests to ensure that F_SEAL_FUTURE_EXEC behaves as expected.
Signed-off-by: Isaac J. Manjarres isaacmanjarres@google.com --- tools/testing/selftests/memfd/memfd_test.c | 79 ++++++++++++++++++++++ 1 file changed, 79 insertions(+)
diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c index c0c53451a16d..abc213a5ce99 100644 --- a/tools/testing/selftests/memfd/memfd_test.c +++ b/tools/testing/selftests/memfd/memfd_test.c @@ -31,6 +31,7 @@ #define STACK_SIZE 65536
#define F_SEAL_EXEC 0x0020 +#define F_SEAL_FUTURE_EXEC 0x0040
#define F_WX_SEALS (F_SEAL_SHRINK | \ F_SEAL_GROW | \ @@ -318,6 +319,37 @@ static void *mfd_assert_mmap_private(int fd) return p; }
+static void *mfd_fail_mmap_exec(int fd) +{ + void *p; + + p = mmap(NULL, + mfd_def_size, + PROT_EXEC, + MAP_SHARED, + fd, + 0); + if (p != MAP_FAILED) { + printf("mmap() didn't fail as expected\n"); + abort(); + } + + return p; +} + +static void mfd_fail_mprotect_exec(void *p) +{ + int ret; + + ret = mprotect(p, + mfd_def_size, + PROT_EXEC); + if (!ret) { + printf("mprotect didn't fail as expected\n"); + abort(); + } +} + static int mfd_assert_open(int fd, int flags, mode_t mode) { char buf[512]; @@ -998,6 +1030,52 @@ static void test_seal_future_write(void) close(fd); }
+/* + * Test SEAL_FUTURE_EXEC_MAPPING + * Test whether SEAL_FUTURE_EXEC_MAPPING actually prevents executable mappings. + */ +static void test_seal_future_exec_mapping(void) +{ + int fd; + void *p; + + + printf("%s SEAL-FUTURE-EXEC-MAPPING\n", memfd_str); + + fd = mfd_assert_new("kern_memfd_seal_future_exec_mapping", + mfd_def_size, + MFD_CLOEXEC | MFD_ALLOW_SEALING); + + /* + * PROT_READ | PROT_WRITE mappings create VMAs with VM_MAYEXEC set. + * However, F_SEAL_FUTURE_EXEC applies to subsequent mappings, + * so it should still succeed even if this mapping is active when the + * seal is applied. + */ + p = mfd_assert_mmap_shared(fd); + + mfd_assert_has_seals(fd, 0); + + mfd_assert_add_seals(fd, F_SEAL_FUTURE_EXEC); + mfd_assert_has_seals(fd, F_SEAL_FUTURE_EXEC); + + mfd_fail_mmap_exec(fd); + + munmap(p, mfd_def_size); + + /* Ensure that new mappings without PROT_EXEC work. */ + p = mfd_assert_mmap_shared(fd); + + /* + * Ensure that mappings created after the seal was applied cannot be + * made executable via mprotect(). + */ + mfd_fail_mprotect_exec(p); + + munmap(p, mfd_def_size); + close(fd); +} + static void test_seal_write_map_read_shared(void) { int fd; @@ -1639,6 +1717,7 @@ int main(int argc, char **argv) test_seal_shrink(); test_seal_grow(); test_seal_resize(); + test_seal_future_exec_mapping();
if (pid_ns_supported()) { test_sysctl_simple();
On Thu, Jan 02, 2025 at 03:32:51PM -0800, Isaac J. Manjarres wrote:
Add tests to ensure that F_SEAL_FUTURE_EXEC behaves as expected.
Signed-off-by: Isaac J. Manjarres isaacmanjarres@google.com
This looks reasonable,
Reviewed-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com
tools/testing/selftests/memfd/memfd_test.c | 79 ++++++++++++++++++++++ 1 file changed, 79 insertions(+)
diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c index c0c53451a16d..abc213a5ce99 100644 --- a/tools/testing/selftests/memfd/memfd_test.c +++ b/tools/testing/selftests/memfd/memfd_test.c @@ -31,6 +31,7 @@ #define STACK_SIZE 65536
#define F_SEAL_EXEC 0x0020 +#define F_SEAL_FUTURE_EXEC 0x0040
#define F_WX_SEALS (F_SEAL_SHRINK | \ F_SEAL_GROW | \ @@ -318,6 +319,37 @@ static void *mfd_assert_mmap_private(int fd) return p; }
+static void *mfd_fail_mmap_exec(int fd) +{
- void *p;
- p = mmap(NULL,
mfd_def_size,
PROT_EXEC,
MAP_SHARED,
fd,
0);
- if (p != MAP_FAILED) {
printf("mmap() didn't fail as expected\n");
abort();
- }
- return p;
+}
+static void mfd_fail_mprotect_exec(void *p) +{
- int ret;
- ret = mprotect(p,
mfd_def_size,
PROT_EXEC);
- if (!ret) {
printf("mprotect didn't fail as expected\n");
abort();
- }
+}
static int mfd_assert_open(int fd, int flags, mode_t mode) { char buf[512]; @@ -998,6 +1030,52 @@ static void test_seal_future_write(void) close(fd); }
+/*
- Test SEAL_FUTURE_EXEC_MAPPING
- Test whether SEAL_FUTURE_EXEC_MAPPING actually prevents executable mappings.
- */
+static void test_seal_future_exec_mapping(void) +{
- int fd;
- void *p;
- printf("%s SEAL-FUTURE-EXEC-MAPPING\n", memfd_str);
- fd = mfd_assert_new("kern_memfd_seal_future_exec_mapping",
mfd_def_size,
MFD_CLOEXEC | MFD_ALLOW_SEALING);
- /*
* PROT_READ | PROT_WRITE mappings create VMAs with VM_MAYEXEC set.
* However, F_SEAL_FUTURE_EXEC applies to subsequent mappings,
* so it should still succeed even if this mapping is active when the
* seal is applied.
*/
- p = mfd_assert_mmap_shared(fd);
- mfd_assert_has_seals(fd, 0);
- mfd_assert_add_seals(fd, F_SEAL_FUTURE_EXEC);
- mfd_assert_has_seals(fd, F_SEAL_FUTURE_EXEC);
- mfd_fail_mmap_exec(fd);
- munmap(p, mfd_def_size);
- /* Ensure that new mappings without PROT_EXEC work. */
- p = mfd_assert_mmap_shared(fd);
- /*
* Ensure that mappings created after the seal was applied cannot be
* made executable via mprotect().
*/
- mfd_fail_mprotect_exec(p);
- munmap(p, mfd_def_size);
- close(fd);
+}
static void test_seal_write_map_read_shared(void) { int fd; @@ -1639,6 +1717,7 @@ int main(int argc, char **argv) test_seal_shrink(); test_seal_grow(); test_seal_resize();
test_seal_future_exec_mapping();
if (pid_ns_supported()) { test_sysctl_simple();
-- 2.47.1.613.gc27f4b7a9f-goog
On Thu, Jan 02, 2025 at 03:32:49PM -0800, Isaac J. Manjarres wrote:
- Resending because I accidentally forgot to include Lorenzo in the "to" list.
Android uses the ashmem driver [1] for creating shared memory regions between processes. The ashmem driver exposes an ioctl command for processes to restrict the permissions an ashmem buffer can be mapped with.
Buffers are created with the ability to be mapped as readable, writable, and executable. Processes remove the ability to map some ashmem buffers as executable to ensure that those buffers cannot be used to inject malicious code for another process to run. Other buffers retain their ability to be mapped as executable, as these buffers can be used for just-in-time (JIT) compilation. So there is a need to be able to remove the ability to map a buffer as executable on a per-buffer basis.
Android is currently trying to migrate towards replacing its ashmem driver usage with memfd. Part of the transition involved introducing a library that serves to abstract away how shared memory regions are allocated (i.e. ashmem vs memfd). This allows clients to use a single interface for restricting how a buffer can be mapped without having to worry about how it is handled for ashmem (through the ioctl command mentioned earlier) or memfd (through file seals).
While memfd has support for preventing buffers from being mapped as writable beyond a certain point in time (thanks to F_SEAL_FUTURE_WRITE), it does not have a similar interface to prevent buffers from being mapped as executable beyond a certain point. However, that could be implemented as a file seal (F_SEAL_FUTURE_EXEC) which works similarly to F_SEAL_FUTURE_WRITE.
F_SEAL_FUTURE_WRITE was chosen as a template for how this new seal should behave, instead of F_SEAL_WRITE, for the following reasons:
- Having the new seal behave like F_SEAL_FUTURE_WRITE matches the behavior that was present with ashmem. This aids in seamlessly transitioning clients away from ashmem to memfd.
I do appreciate the background and of course this is the motivation for the change (and I appreciate the transparency), however it is important to note that _any_ company's internal tooling is of absolutely no relevance to upstream.
All justification has to be applied in general terms and be what's right for the core kernel and all users.
Hopefully we can achieve a happy scenario where your aims internally align nicely with what's right for (or at least not harmful to) upstream, but obviously justification must be made in terms of the benefits of this change _generally_.
- Making the new seal behave like F_SEAL_WRITE would mean that no mappings that could become executable in the future (i.e. via mprotect()) can exist when the seal is applied. However, there are known cases (e.g. CursorWindow [2]) where restrictions are applied on how a buffer can be mapped after a mapping has already been made. That mapping may have VM_MAYEXEC set, which would not allow the seal to be applied successfully.
All mappings have VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC set unless explicitly cleared.
I mean this fits in again with the whole problem with what you might call a 'raw' seal and what I addressed with my various patches around write-sealing - that is, unless you explicitly provide some means of clearing the relevant flag on a non-problematic mapping of a sealed memfd, you will simply _not_ be able to mmap() the fd (but rather have to interact with it some other way).
This feels honestly like a design flaw and really as if the future variants should always have been how it functioned but I may be missing some context.
Anyway this is a side-note and not really relevant to this series, per-se (in fact it argues _for_ it :>)
Therefore, the F_SEAL_FUTURE_EXEC seal was designed to have the same semantics as F_SEAL_FUTURE_WRITE.
Note: this series depends on Lorenzo's work [3], [4], [5] from Andrew Morton's mm-unstable branch [6], which reworks memfd's file seal checks, allowing for newer file seals to be implemented in a cleaner fashion.
:>)
Changes from v1 ==> v2:
Changed the return code to be -EPERM instead of -EACCES when attempting to map an exec sealed file with PROT_EXEC to align to mmap()'s man page. Thank you Kalesh Singh for spotting this!
Rebased on top of Lorenzo's work to cleanup memfd file seal checks in mmap() ([3], [4], and [5]). Thank you for this Lorenzo!
:>))
- Changed to deny PROT_EXEC mappings only if the mapping is shared, instead of for both shared and private mappings, after discussing this with Lorenzo.
Yeah I am happy to continue the discussion on this, I very well might be missing something/just wrong here.
Security is not my area of expertise but I do need to understand the details of the attack vector to be convinced here (we very well might need an 'explain like I am 5 years old' deal here).
Opens:
- Lorenzo brought up that this patch may negatively impact the usage of MFD_NOEXEC_SCOPE_NOEXEC_ENFORCED [7]. However, it is not clear to me why that is the case. At the moment, my intent is for the executable permissions of the file to be disjoint from the ability to create executable mappings.
Sorry for not getting back to you on this, holidays intervened...
If the vm.memfd_noexec sysctl is set to 2, you are simply not permitted to allow executable mappings _at all_.
see check_sysctl_memfd_noexec():
if (!(*flags & MFD_NOEXEC_SEAL) && sysctl >= MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED) { pr_err_ratelimited( "%s[%d]: memfd_create() requires MFD_NOEXEC_SEAL with vm.memfd_noexec=%d\n", current->comm, task_pid_nr(current), sysctl); return -EACCES; }
MFD_NOEXEC_SEAL naturally conflicts with a future noexec flag.
I suppose you could say in this instance, that any system that sets this is not compatible with your flag. But you do need to address this.
Links:
[1] https://cs.android.com/android/kernel/superproject/+/common-android-mainline... [2] https://developer.android.com/reference/android/database/CursorWindow [3] https://lore.kernel.org/all/cover.1732804776.git.lorenzo.stoakes@oracle.com/ [4] https://lkml.kernel.org/r/20241206212846.210835-1-lorenzo.stoakes@oracle.com [5] https://lkml.kernel.org/r/7dee6c5d-480b-4c24-b98e-6fa47dbd8a23@lucifer.local [6] https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/tree/?h=mm-unsta... [7] https://lore.kernel.org/all/3a53b154-1e46-45fb-a559-65afa7a8a788@lucifer.loc...
Links to previous versions:
v1: https://lore.kernel.org/all/20241206010930.3871336-1-isaacmanjarres@google.c...
Isaac J. Manjarres (2): mm/memfd: Add support for F_SEAL_FUTURE_EXEC to memfd selftests/memfd: Add tests for F_SEAL_FUTURE_EXEC
include/uapi/linux/fcntl.h | 1 + mm/memfd.c | 39 ++++++++++- tools/testing/selftests/memfd/memfd_test.c | 79 ++++++++++++++++++++++
I'd like to see some documentation changes too.
You need to update userspace-api/mfd_noexec.rst for a start to reference the issue raised above.
But it'd be good to actually document this.
It'd be good to send a follow up patch to the man pages too should this series end up landing, but we can come to that later... :)
3 files changed, 118 insertions(+), 1 deletion(-)
-- 2.47.1.613.gc27f4b7a9f-goog
Thanks for chasing up with a v2, appreciate your hard work, and sorry for not getting back sooner, have had something of a backlog... (and it seems, this shall continue, indefinitely ;)
linux-kselftest-mirror@lists.linaro.org