Android uses the ashmem driver [1] for creating shared memory regions
between processes. The ashmem driver exposes an ioctl command for
processes to restrict the permissions an ashmem buffer can be mapped
with.
Buffers are created with the ability to be mapped as readable, writable,
and executable. Processes remove the ability to map some ashmem buffers
as executable to ensure that those buffers cannot be exploited to run
unintended code. Other buffers retain their ability to be mapped as
executable, as these buffers can be used for just-in-time (JIT)
compilation. So there is a need to be able to remove the ability to
map a buffer as executable on a per-buffer basis.
Android is currently trying to migrate towards replacing its ashmem
driver usage with memfd. Part of the transition involved introducing a
library that serves to abstract away how shared memory regions are
allocated (i.e. ashmem vs memfd). This allows clients to use a single
interface for restricting how a buffer can be mapped without having to
worry about how it is handled for ashmem (through the ioctl
command mentioned earlier) or memfd (through file seals).
While memfd has support for preventing buffers from being mapped as
writable beyond a certain point in time (thanks to
F_SEAL_FUTURE_WRITE), it does not have a similar interface to prevent
buffers from being mapped as executable beyond a certain point.
However, that could be implemented as a file seal (F_SEAL_FUTURE_EXEC)
which works similarly to F_SEAL_FUTURE_WRITE.
F_SEAL_FUTURE_WRITE was chosen as a template for how this new seal
should behave, instead of F_SEAL_WRITE, for the following reasons:
1. Having the new seal behave like F_SEAL_FUTURE_WRITE matches the
behavior that was present with ashmem. This aids in seamlessly
transitioning clients away from ashmem to memfd.
2. Making the new seal behave like F_SEAL_WRITE would mean that no
mappings that could become executable in the future (i.e. via
mprotect()) can exist when the seal is applied. However, there are
known cases (e.g. CursorWindow [2]) where restrictions are applied
on how a buffer can be mapped after a mapping has already been made.
That mapping may have VM_MAYEXEC set, which would not allow the seal
to be applied successfully.
Therefore, the F_SEAL_FUTURE_EXEC seal was designed to have the same
semantics as F_SEAL_FUTURE_WRITE.
Note: this series depends on Lorenzo's work [3] which allows for a
memfd's file seals to be read in do_mmap().
[1] https://cs.android.com/android/kernel/superproject/+/common-android-mainlin…
[2] https://developer.android.com/reference/android/database/CursorWindow
[3] https://lore.kernel.org/all/cover.1732804776.git.lorenzo.stoakes@oracle.com/
Isaac J. Manjarres (2):
mm/memfd: Add support for F_SEAL_FUTURE_EXEC to memfd
selftests/memfd: Add tests for F_SEAL_FUTURE_EXEC
include/linux/mm.h | 5 ++
include/uapi/linux/fcntl.h | 1 +
mm/memfd.c | 1 +
mm/mmap.c | 11 +++
tools/testing/selftests/memfd/memfd_test.c | 79 ++++++++++++++++++++++
5 files changed, 97 insertions(+)
--
2.47.0.338.g60cca15819-goog
Last week, Jakub reported [1] that the MPTCP Connect selftest was
unstable. It looked like it started after the introduction of some fixes
[2]. After analysis from Paolo, these patches revealed existing bugs,
that should be fixed by the following patches.
- Patch 1: Make sure ACK are sent when MPTCP-level window re-opens. In
some corner cases, the other peer was not notified when more data
could be sent. A fix for v5.11, but depending on a feature introduced
in v5.19.
- Patch 2: Fix spurious wake-up under memory pressure. In this
situation, the userspace could be invited to read data not being there
yet. A fix for v6.7.
- Patch 3: Fix a false positive error when running the MPTCP Connect
selftest with the "disconnect" cases. The userspace could disconnect
the socket too soon, which would reset (MP_FASTCLOSE) the connection,
interpreted as an error by the test. A fix for v5.17.
Link: https://lore.kernel.org/20250107131845.5e5de3c5@kernel.org [1]
Link: https://lore.kernel.org/20241230-net-mptcp-rbuf-fixes-v1-0-8608af434ceb@ker… [2]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Paolo Abeni (3):
mptcp: be sure to send ack when mptcp-level window re-opens
mptcp: fix spurious wake-up on under memory pressure
selftests: mptcp: avoid spurious errors on disconnect
net/mptcp/options.c | 6 ++--
net/mptcp/protocol.h | 9 +++--
tools/testing/selftests/net/mptcp/mptcp_connect.c | 43 +++++++++++++++++------
3 files changed, 43 insertions(+), 15 deletions(-)
---
base-commit: 76201b5979768500bca362871db66d77cb4c225e
change-id: 20250113-net-mptcp-connect-st-flakes-4af6389808de
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
The orig_a0 is missing in struct user_regs_struct of riscv, and there is
no way to add it without breaking UAPI. (See Link tag below)
Like NT_ARM_SYSTEM_CALL do, we add a new regset name NT_RISCV_ORIG_A0 to
access original a0 register from userspace via ptrace API.
Link: https://lore.kernel.org/all/59505464-c84a-403d-972f-d4b2055eeaac@gmail.com/
Signed-off-by: Celeste Liu <uwu(a)coelacanthus.name>
---
Changes in v4:
- Fix a copy paste error in selftest. (Forget to commit...)
- Link to v3: https://lore.kernel.org/r/20241226-riscv-new-regset-v3-0-f5b96465826b@coela…
Changes in v3:
- Use return 0 directly for readability.
- Fix test for modify a0.
- Add Fixes: tag
- Remove useless Cc: stable.
- Selftest will check both a0 and orig_a0, but depends on the
correctness of PTRACE_GET_SYSCALL_INFO.
- Link to v2: https://lore.kernel.org/r/20241203-riscv-new-regset-v2-0-d37da8c0cba6@coela…
Changes in v2:
- Fix integer width.
- Add selftest.
- Link to v1: https://lore.kernel.org/r/20241201-riscv-new-regset-v1-1-c83c58abcc7b@coela…
---
Celeste Liu (2):
riscv/ptrace: add new regset to access original a0 register
riscv: selftests: Add a ptrace test to verify syscall parameter modification
arch/riscv/kernel/ptrace.c | 32 ++++++
include/uapi/linux/elf.h | 1 +
tools/testing/selftests/riscv/abi/.gitignore | 1 +
tools/testing/selftests/riscv/abi/Makefile | 5 +-
tools/testing/selftests/riscv/abi/ptrace.c | 151 +++++++++++++++++++++++++++
5 files changed, 189 insertions(+), 1 deletion(-)
---
base-commit: 0e287d31b62bb53ad81d5e59778384a40f8b6f56
change-id: 20241201-riscv-new-regset-d529b952ad0d
Best regards,
--
Celeste Liu <uwu(a)coelacanthus.name>
Fix several issues in the mptcp connect test's main_loop function.
- Fix a bug where the wrong file descriptor was being checked for errors
- Fix the input file descriptor lifecycle in the reconnection loop to
prevent use of invalid fd
- Add proper resource cleanup in error paths
Cong Liu (3):
selftests: mptcp: Fix incorrect file descriptor check in main_loop
selftests: mptcp: Fix input fd lifecycle in reconnection loop
selftests: mptcp: Clean up resources properly in main_loop
.../selftests/net/mptcp/mptcp_connect.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
base-commit: 2b88851f583d3c4e40bcd40cfe1965241ec229dd
--
2.43.0
When working on OpenRISC support for restartable sequences I noticed
and fixed these two issues with the riscv support bits.
1 The 'inc' argument to RSEQ_ASM_OP_R_DEREF_ADDV was being implicitly
passed to the macro. Fix this by adding 'inc' to the list of macro
arguments.
2 The inline asm input constraints for 'inc' and 'off' use "er", The
riscv gcc port does not have an "e" constraint, this looks to be
copied from the x86 port. Fix this by just using an "r" constraint.
I have compile tested this only for riscv. However, the same fixes I
use in the OpenRISC rseq selftests and everything passes with no issues.
Fixes: 171586a6ab66 ("selftests/rseq: riscv: Template memory ordering and percpu access mode")
Signed-off-by: Stafford Horne <shorne(a)gmail.com>
Tested-by: Charlie Jenkins <charlie(a)rivosinc.com>
Reviewed-by: Charlie Jenkins <charlie(a)rivosinc.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Acked-by: Shuah Khan <skhan(a)linuxfoundation.org>
---
Since v1:
- Added Fixes, Tested-by, Reviewed-by etc.
tools/testing/selftests/rseq/rseq-riscv-bits.h | 6 +++---
tools/testing/selftests/rseq/rseq-riscv.h | 2 +-
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/rseq/rseq-riscv-bits.h b/tools/testing/selftests/rseq/rseq-riscv-bits.h
index de31a0143139..f02f411d550d 100644
--- a/tools/testing/selftests/rseq/rseq-riscv-bits.h
+++ b/tools/testing/selftests/rseq/rseq-riscv-bits.h
@@ -243,7 +243,7 @@ int RSEQ_TEMPLATE_IDENTIFIER(rseq_offset_deref_addv)(intptr_t *ptr, off_t off, i
#ifdef RSEQ_COMPARE_TWICE
RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, "%l[error1]")
#endif
- RSEQ_ASM_OP_R_DEREF_ADDV(ptr, off, 3)
+ RSEQ_ASM_OP_R_DEREF_ADDV(ptr, off, inc, 3)
RSEQ_INJECT_ASM(4)
RSEQ_ASM_DEFINE_ABORT(4, abort)
: /* gcc asm goto does not allow outputs */
@@ -251,8 +251,8 @@ int RSEQ_TEMPLATE_IDENTIFIER(rseq_offset_deref_addv)(intptr_t *ptr, off_t off, i
[current_cpu_id] "m" (rseq_get_abi()->RSEQ_TEMPLATE_CPU_ID_FIELD),
[rseq_cs] "m" (rseq_get_abi()->rseq_cs.arch.ptr),
[ptr] "r" (ptr),
- [off] "er" (off),
- [inc] "er" (inc)
+ [off] "r" (off),
+ [inc] "r" (inc)
RSEQ_INJECT_INPUT
: "memory", RSEQ_ASM_TMP_REG_1
RSEQ_INJECT_CLOBBER
diff --git a/tools/testing/selftests/rseq/rseq-riscv.h b/tools/testing/selftests/rseq/rseq-riscv.h
index 37e598d0a365..67d544aaa9a3 100644
--- a/tools/testing/selftests/rseq/rseq-riscv.h
+++ b/tools/testing/selftests/rseq/rseq-riscv.h
@@ -158,7 +158,7 @@ do { \
"bnez " RSEQ_ASM_TMP_REG_1 ", 222b\n" \
"333:\n"
-#define RSEQ_ASM_OP_R_DEREF_ADDV(ptr, off, post_commit_label) \
+#define RSEQ_ASM_OP_R_DEREF_ADDV(ptr, off, inc, post_commit_label) \
"mv " RSEQ_ASM_TMP_REG_1 ", %[" __rseq_str(ptr) "]\n" \
RSEQ_ASM_OP_R_ADD(off) \
REG_L RSEQ_ASM_TMP_REG_1 ", 0(" RSEQ_ASM_TMP_REG_1 ")\n" \
--
2.47.0
When working on OpenRISC support for restartable sequences I noticed
and fixed these two issues with the riscv support bits.
1 The 'inc' argument to RSEQ_ASM_OP_R_DEREF_ADDV was being implicitly
passed to the macro. Fix this by adding 'inc' to the list of macro
arguments.
2 The inline asm input constraints for 'inc' and 'off' use "er", The
riscv gcc port does not have an "e" constraint, this looks to be
copied from the x86 port. Fix this by just using an "r" constraint.
I have compile tested this only for riscv. However, the same fixes I
use in the OpenRISC rseq selftests and everything passes with no issues.
Signed-off-by: Stafford Horne <shorne(a)gmail.com>
---
tools/testing/selftests/rseq/rseq-riscv-bits.h | 6 +++---
tools/testing/selftests/rseq/rseq-riscv.h | 2 +-
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/rseq/rseq-riscv-bits.h b/tools/testing/selftests/rseq/rseq-riscv-bits.h
index de31a0143139..f02f411d550d 100644
--- a/tools/testing/selftests/rseq/rseq-riscv-bits.h
+++ b/tools/testing/selftests/rseq/rseq-riscv-bits.h
@@ -243,7 +243,7 @@ int RSEQ_TEMPLATE_IDENTIFIER(rseq_offset_deref_addv)(intptr_t *ptr, off_t off, i
#ifdef RSEQ_COMPARE_TWICE
RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, "%l[error1]")
#endif
- RSEQ_ASM_OP_R_DEREF_ADDV(ptr, off, 3)
+ RSEQ_ASM_OP_R_DEREF_ADDV(ptr, off, inc, 3)
RSEQ_INJECT_ASM(4)
RSEQ_ASM_DEFINE_ABORT(4, abort)
: /* gcc asm goto does not allow outputs */
@@ -251,8 +251,8 @@ int RSEQ_TEMPLATE_IDENTIFIER(rseq_offset_deref_addv)(intptr_t *ptr, off_t off, i
[current_cpu_id] "m" (rseq_get_abi()->RSEQ_TEMPLATE_CPU_ID_FIELD),
[rseq_cs] "m" (rseq_get_abi()->rseq_cs.arch.ptr),
[ptr] "r" (ptr),
- [off] "er" (off),
- [inc] "er" (inc)
+ [off] "r" (off),
+ [inc] "r" (inc)
RSEQ_INJECT_INPUT
: "memory", RSEQ_ASM_TMP_REG_1
RSEQ_INJECT_CLOBBER
diff --git a/tools/testing/selftests/rseq/rseq-riscv.h b/tools/testing/selftests/rseq/rseq-riscv.h
index 37e598d0a365..67d544aaa9a3 100644
--- a/tools/testing/selftests/rseq/rseq-riscv.h
+++ b/tools/testing/selftests/rseq/rseq-riscv.h
@@ -158,7 +158,7 @@ do { \
"bnez " RSEQ_ASM_TMP_REG_1 ", 222b\n" \
"333:\n"
-#define RSEQ_ASM_OP_R_DEREF_ADDV(ptr, off, post_commit_label) \
+#define RSEQ_ASM_OP_R_DEREF_ADDV(ptr, off, inc, post_commit_label) \
"mv " RSEQ_ASM_TMP_REG_1 ", %[" __rseq_str(ptr) "]\n" \
RSEQ_ASM_OP_R_ADD(off) \
REG_L RSEQ_ASM_TMP_REG_1 ", 0(" RSEQ_ASM_TMP_REG_1 ")\n" \
--
2.47.0
The selftest started failing since commit e93d2521b27f
("x86/vdso: Split virtual clock pages into dedicated mapping")
was merged. While debugging I stumbled upon some memory usage
optimizations.
With these test now runs on a VM with only 60MiB of memory.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Changes in v4:
- Pick up review tags
- Correct Fixes: of patch 1
- Drop git rebase commit message artifacts
- Replace strtok_r() with strspn() and strcspn()
- Avoid uninitialized read on error in __get_smap_entry()
- Link to v3: https://lore.kernel.org/r/20250113-virtual_address_range-tests-v3-0-f4a8e6b…
Changes in v3:
- Pick up review tags
- Fix naming around PR_SET_VMA_ANON_NAME helper functions
- Skip selftest if PR_SET_VMA_ANON_NAME is not supported
- Check for VM_IO instead of [vvar name prefix
- Link to v2: https://lore.kernel.org/r/20250110-virtual_address_range-tests-v2-0-262a2bf…
Changes in v2:
- Drop /dev/null usage
- Avoid overcommit restrictions by dropping PROT_WRITE
- Avoid high memory usage due to PTEs
- Link to v1: https://lore.kernel.org/r/20250107-virtual_address_range-tests-v1-0-3834a2f…
---
Thomas Weißschuh (4):
selftests/mm: virtual_address_range: mmap() without PROT_WRITE
selftests/mm: virtual_address_range: Unmap chunks after validation
selftests/mm: vm_util: Split up /proc/self/smaps parsing
selftests/mm: virtual_address_range: Avoid reading from VM_IO mappings
tools/testing/selftests/mm/config | 1 +
tools/testing/selftests/mm/virtual_address_range.c | 41 ++++++++++++--
tools/testing/selftests/mm/vm_util.c | 66 +++++++++++++++++-----
tools/testing/selftests/mm/vm_util.h | 1 +
4 files changed, 92 insertions(+), 17 deletions(-)
---
base-commit: 3043cb9a517b707c12a3f5879f4970c97bfeb3fb
change-id: 20250107-virtual_address_range-tests-95843766fa97
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Currently the rseq constructor, rseq_init(), assumes that glibc always
has the support for rseq symbols (__rseq_size for instance). However,
glibc supports rseq from version 2.35 onwards. As a result, for the
systems that run glibc less than 2.35, the global rseq_size remains
initialized to -1U. When a thread then tries to register for rseq,
get_rseq_min_alloc_size() would end up returning -1U, which is
incorrect. Hence, initialize rseq_size for the cases where glibc doesn't
have the support for rseq symbols.
Cc: stable(a)vger.kernel.org
Fixes: 73a4f5a704a2 ("selftests/rseq: Fix mm_cid test failure")
Signed-off-by: Raghavendra Rao Ananta <rananta(a)google.com>
---
tools/testing/selftests/rseq/rseq.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c
index 5b9772cdf265..9eb5356f25fa 100644
--- a/tools/testing/selftests/rseq/rseq.c
+++ b/tools/testing/selftests/rseq/rseq.c
@@ -142,6 +142,16 @@ unsigned int get_rseq_kernel_feature_size(void)
return ORIG_RSEQ_FEATURE_SIZE;
}
+static void set_default_rseq_size(void)
+{
+ unsigned int rseq_kernel_feature_size = get_rseq_kernel_feature_size();
+
+ if (rseq_kernel_feature_size < ORIG_RSEQ_ALLOC_SIZE)
+ rseq_size = rseq_kernel_feature_size;
+ else
+ rseq_size = ORIG_RSEQ_ALLOC_SIZE;
+}
+
int rseq_register_current_thread(void)
{
int rc;
@@ -219,12 +229,7 @@ void rseq_init(void)
fallthrough;
case ORIG_RSEQ_ALLOC_SIZE:
{
- unsigned int rseq_kernel_feature_size = get_rseq_kernel_feature_size();
-
- if (rseq_kernel_feature_size < ORIG_RSEQ_ALLOC_SIZE)
- rseq_size = rseq_kernel_feature_size;
- else
- rseq_size = ORIG_RSEQ_ALLOC_SIZE;
+ set_default_rseq_size();
break;
}
default:
@@ -239,8 +244,10 @@ void rseq_init(void)
rseq_size = 0;
return;
}
+
rseq_offset = (void *)&__rseq_abi - rseq_thread_pointer();
rseq_flags = 0;
+ set_default_rseq_size();
}
static __attribute__((destructor))
base-commit: 40384c840ea1944d7c5a392e8975ed088ecf0b37
--
2.47.0.338.g60cca15819-goog
The selftest started failing since commit e93d2521b27f
("x86/vdso: Split virtual clock pages into dedicated mapping")
was merged. While debugging I stumbled upon some memory usage
optimizations.
With these test now runs on a VM with only 60MiB of memory.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Changes in v3:
- Pick up review tags
- Fix naming around PR_SET_VMA_ANON_NAME helper functions
- Skip selftest if PR_SET_VMA_ANON_NAME is not supported
- Check for VM_IO instead of [vvar name prefix
- Link to v2: https://lore.kernel.org/r/20250110-virtual_address_range-tests-v2-0-262a2bf…
Changes in v2:
- Drop /dev/null usage
- Avoid overcommit restrictions by dropping PROT_WRITE
- Avoid high memory usage due to PTEs
- Link to v1: https://lore.kernel.org/r/20250107-virtual_address_range-tests-v1-0-3834a2f…
---
Thomas Weißschuh (4):
selftests/mm: virtual_address_range: mmap() without PROT_WRITE
selftests/mm: virtual_address_range: Unmap chunks after validation
selftests/mm: vm_util: Split up /proc/self/smaps parsing
selftests/mm: virtual_address_range: Avoid reading from VM_IO mappings
tools/testing/selftests/mm/config | 1 +
tools/testing/selftests/mm/virtual_address_range.c | 41 ++++++++++++--
tools/testing/selftests/mm/vm_util.c | 63 +++++++++++++++++-----
tools/testing/selftests/mm/vm_util.h | 1 +
4 files changed, 89 insertions(+), 17 deletions(-)
---
base-commit: 7793bee8fed2027eb15219014de6fb0dc15d4a03
change-id: 20250107-virtual_address_range-tests-95843766fa97
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>