This introduces signal->exec_bprm, which is used to
fix the case where at least one of the sibling threads
is traced: the tracer process may deadlock
in ptrace_attach, while de_thread needs to wait for the
tracer to continue execution.
The solution is to detect this situation and allow
ptrace_attach to continue by temporarily releasing the
cred_guard_mutex, while de_thread() is still waiting for
traced zombies to be eventually released by the tracer.
In the case of the thread group leader we only have to wait
for the thread to become a zombie, which may also need
co-operation from the tracer due to PTRACE_O_TRACEEXIT.
When a tracer wants to ptrace_attach a task that already
is in execve, we simply retry the ptrace_may_access
check while temporarily installing the new credentials
and dumpability which are about to be used after execve
completes. If the ptrace_attach happens on a thread that
is a sibling thread of the thread doing execve, it is
sufficient to check against the old credentials, as this
thread will be waited for before the new credentials are
installed.
Other threads die quickly since the cred_guard_mutex is
released, but a deadly signal is already pending. In case
the mutex_lock_killable misses the signal, the non-zero
current->signal->exec_bprm makes sure they release the
mutex immediately and return with -ERESTARTNOINTR.
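For illustration, the check performed by the cred_guard_mutex
users outside of execve boils down to the following pattern (as
added below to prepare_bprm_creds(), ptrace_traceme(),
proc_pid_attr_write() and seccomp_set_mode_filter()):

	if (mutex_lock_interruptible(&current->signal->cred_guard_mutex))
		return -ERESTARTNOINTR;

	if (unlikely(current->signal->exec_bprm)) {
		/* de_thread() has zapped us; a fatal signal is pending. */
		mutex_unlock(&current->signal->cred_guard_mutex);
		return -ERESTARTNOINTR;
	}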
This means there is no API change, unlike the previous
version of this patch which was discussed here:
https://lore.kernel.org/lkml/b6537ae6-31b1-5c50-f32b-8b8332ace882@hotmail.d…
See tools/testing/selftests/ptrace/vmaccess.c
for a test case that gets fixed by this change.
Note that since the test case was originally designed to
test that ptrace_attach returns an error in this situation,
the test expectation needed to be adjusted to allow the
API to succeed at the first attempt.
Signed-off-by: Bernd Edlinger <bernd.edlinger(a)hotmail.de>
---
fs/exec.c | 69 ++++++++++++++++-------
fs/proc/base.c | 6 ++
include/linux/cred.h | 1 +
include/linux/sched/signal.h | 18 ++++++
kernel/cred.c | 28 +++++++--
kernel/ptrace.c | 32 +++++++++++
kernel/seccomp.c | 12 +++-
tools/testing/selftests/ptrace/vmaccess.c | 23 +++++---
8 files changed, 155 insertions(+), 34 deletions(-)
v10: Changes to the previous version: make PTRACE_ATTACH
return -EAGAIN, instead of having execve return -ERESTARTSYS.
Added some lessons learned to the description.
v11: Check old and new credentials in PTRACE_ATTACH again without
changing the API.
Note: I actually got one response from an automatic checker to the v11 patch,
https://lore.kernel.org/lkml/202107121344.wu68hEPF-lkp@intel.com/
which complains about:
>> kernel/ptrace.c:425:26: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected struct cred const *old_cred @@ got struct cred const [noderef] __rcu *real_cred @@
417 struct linux_binprm *bprm = task->signal->exec_bprm;
418 const struct cred *old_cred;
419 struct mm_struct *old_mm;
420
421 retval = down_write_killable(&task->signal->exec_update_lock);
422 if (retval)
423 goto unlock_creds;
424 task_lock(task);
> 425 old_cred = task->real_cred;
v12: Essentially identical to v11.
- Fixed a minor merge conflict in linux v5.17, and fixed the
above mentioned nit by adding __rcu to the declaration.
- re-tested the patch with all linux versions from v5.11 to v6.6
v10 was an alternative approach which did imply an API change,
but I would prefer to avoid such an API change.
The difficult part is getting the right dumpability flags assigned
before de_thread starts; I hope you like this version.
If not, v10 is of course also acceptable.
Thanks
Bernd.
diff --git a/fs/exec.c b/fs/exec.c
index 2f2b0acec4f0..902d3b230485 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1041,11 +1041,13 @@ static int exec_mmap(struct mm_struct *mm)
return 0;
}
-static int de_thread(struct task_struct *tsk)
+static int de_thread(struct task_struct *tsk, struct linux_binprm *bprm)
{
struct signal_struct *sig = tsk->signal;
struct sighand_struct *oldsighand = tsk->sighand;
spinlock_t *lock = &oldsighand->siglock;
+ struct task_struct *t = tsk;
+ bool unsafe_execve_in_progress = false;
if (thread_group_empty(tsk))
goto no_thread_group;
@@ -1068,6 +1070,19 @@ static int de_thread(struct task_struct *tsk)
if (!thread_group_leader(tsk))
sig->notify_count--;
+ while_each_thread(tsk, t) {
+ if (unlikely(t->ptrace)
+ && (t != tsk->group_leader || !t->exit_state))
+ unsafe_execve_in_progress = true;
+ }
+
+ if (unlikely(unsafe_execve_in_progress)) {
+ spin_unlock_irq(lock);
+ sig->exec_bprm = bprm;
+ mutex_unlock(&sig->cred_guard_mutex);
+ spin_lock_irq(lock);
+ }
+
while (sig->notify_count) {
__set_current_state(TASK_KILLABLE);
spin_unlock_irq(lock);
@@ -1158,6 +1173,11 @@ static int de_thread(struct task_struct *tsk)
release_task(leader);
}
+ if (unlikely(unsafe_execve_in_progress)) {
+ mutex_lock(&sig->cred_guard_mutex);
+ sig->exec_bprm = NULL;
+ }
+
sig->group_exec_task = NULL;
sig->notify_count = 0;
@@ -1169,6 +1189,11 @@ static int de_thread(struct task_struct *tsk)
return 0;
killed:
+ if (unlikely(unsafe_execve_in_progress)) {
+ mutex_lock(&sig->cred_guard_mutex);
+ sig->exec_bprm = NULL;
+ }
+
/* protects against exit_notify() and __exit_signal() */
read_lock(&tasklist_lock);
sig->group_exec_task = NULL;
@@ -1253,6 +1278,24 @@ int begin_new_exec(struct linux_binprm * bprm)
if (retval)
return retval;
+ /* If the binary is not readable then enforce mm->dumpable=0 */
+ would_dump(bprm, bprm->file);
+ if (bprm->have_execfd)
+ would_dump(bprm, bprm->executable);
+
+ /*
+ * Figure out dumpability. Note that this checking only of current
+ * is wrong, but userspace depends on it. This should be testing
+ * bprm->secureexec instead.
+ */
+ if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP ||
+ is_dumpability_changed(current_cred(), bprm->cred) ||
+ !(uid_eq(current_euid(), current_uid()) &&
+ gid_eq(current_egid(), current_gid())))
+ set_dumpable(bprm->mm, suid_dumpable);
+ else
+ set_dumpable(bprm->mm, SUID_DUMP_USER);
+
/*
* Ensure all future errors are fatal.
*/
@@ -1261,7 +1304,7 @@ int begin_new_exec(struct linux_binprm * bprm)
/*
* Make this the only thread in the thread group.
*/
- retval = de_thread(me);
+ retval = de_thread(me, bprm);
if (retval)
goto out;
@@ -1284,11 +1327,6 @@ int begin_new_exec(struct linux_binprm * bprm)
if (retval)
goto out;
- /* If the binary is not readable then enforce mm->dumpable=0 */
- would_dump(bprm, bprm->file);
- if (bprm->have_execfd)
- would_dump(bprm, bprm->executable);
-
/*
* Release all of the old mmap stuff
*/
@@ -1350,18 +1388,6 @@ int begin_new_exec(struct linux_binprm * bprm)
me->sas_ss_sp = me->sas_ss_size = 0;
- /*
- * Figure out dumpability. Note that this checking only of current
- * is wrong, but userspace depends on it. This should be testing
- * bprm->secureexec instead.
- */
- if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP ||
- !(uid_eq(current_euid(), current_uid()) &&
- gid_eq(current_egid(), current_gid())))
- set_dumpable(current->mm, suid_dumpable);
- else
- set_dumpable(current->mm, SUID_DUMP_USER);
-
perf_event_exec();
__set_task_comm(me, kbasename(bprm->filename), true);
@@ -1480,6 +1506,11 @@ static int prepare_bprm_creds(struct linux_binprm *bprm)
if (mutex_lock_interruptible(&current->signal->cred_guard_mutex))
return -ERESTARTNOINTR;
+ if (unlikely(current->signal->exec_bprm)) {
+ mutex_unlock(&current->signal->cred_guard_mutex);
+ return -ERESTARTNOINTR;
+ }
+
bprm->cred = prepare_exec_creds();
if (likely(bprm->cred))
return 0;
diff --git a/fs/proc/base.c b/fs/proc/base.c
index ffd54617c354..0da9adfadb48 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2788,6 +2788,12 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf,
if (rv < 0)
goto out_free;
+ if (unlikely(current->signal->exec_bprm)) {
+ mutex_unlock(&current->signal->cred_guard_mutex);
+ rv = -ERESTARTNOINTR;
+ goto out_free;
+ }
+
rv = security_setprocattr(PROC_I(inode)->op.lsm,
file->f_path.dentry->d_name.name, page,
count);
diff --git a/include/linux/cred.h b/include/linux/cred.h
index f923528d5cc4..b01e309f5686 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -159,6 +159,7 @@ extern const struct cred *get_task_cred(struct task_struct *);
extern struct cred *cred_alloc_blank(void);
extern struct cred *prepare_creds(void);
extern struct cred *prepare_exec_creds(void);
+extern bool is_dumpability_changed(const struct cred *, const struct cred *);
extern int commit_creds(struct cred *);
extern void abort_creds(struct cred *);
extern const struct cred *override_creds(const struct cred *);
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 0014d3adaf84..14df7073a0a8 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -234,9 +234,27 @@ struct signal_struct {
struct mm_struct *oom_mm; /* recorded mm when the thread group got
* killed by the oom killer */
+ struct linux_binprm *exec_bprm; /* Used to check ptrace_may_access
+ * against new credentials while
+ * de_thread is waiting for other
+ * traced threads to terminate.
+ * Set while de_thread is executing.
+ * The cred_guard_mutex is released
+ * after de_thread() has called
+ * zap_other_threads(), therefore
+ * a fatal signal is guaranteed to be
+ * already pending in the unlikely
+ * event, that
+ * current->signal->exec_bprm happens
+ * to be non-zero after the
+ * cred_guard_mutex was acquired.
+ */
+
struct mutex cred_guard_mutex; /* guard against foreign influences on
* credential calculations
* (notably. ptrace)
+ * Held while execve runs, except when
+ * a sibling thread is being traced.
* Deprecated do not use in new code.
* Use exec_update_lock instead.
*/
diff --git a/kernel/cred.c b/kernel/cred.c
index 98cb4eca23fb..586cb6c7cf6b 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -433,6 +433,28 @@ static bool cred_cap_issubset(const struct cred *set, const struct cred *subset)
return false;
}
+/**
+ * is_dumpability_changed - Will changing creds from old to new
+ * affect the dumpability in commit_creds?
+ *
+ * Return: false - dumpability will not be changed in commit_creds.
+ * Return: true - dumpability will be changed to non-dumpable.
+ *
+ * @old: The old credentials
+ * @new: The new credentials
+ */
+bool is_dumpability_changed(const struct cred *old, const struct cred *new)
+{
+ if (!uid_eq(old->euid, new->euid) ||
+ !gid_eq(old->egid, new->egid) ||
+ !uid_eq(old->fsuid, new->fsuid) ||
+ !gid_eq(old->fsgid, new->fsgid) ||
+ !cred_cap_issubset(old, new))
+ return true;
+
+ return false;
+}
+
/**
* commit_creds - Install new credentials upon the current task
* @new: The credentials to be assigned
@@ -467,11 +489,7 @@ int commit_creds(struct cred *new)
get_cred(new); /* we will require a ref for the subj creds too */
/* dumpability changes */
- if (!uid_eq(old->euid, new->euid) ||
- !gid_eq(old->egid, new->egid) ||
- !uid_eq(old->fsuid, new->fsuid) ||
- !gid_eq(old->fsgid, new->fsgid) ||
- !cred_cap_issubset(old, new)) {
+ if (is_dumpability_changed(old, new)) {
if (task->mm)
set_dumpable(task->mm, suid_dumpable);
task->pdeath_signal = 0;
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 443057bee87c..eb1c450bb7d7 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -20,6 +20,7 @@
#include <linux/pagemap.h>
#include <linux/ptrace.h>
#include <linux/security.h>
+#include <linux/binfmts.h>
#include <linux/signal.h>
#include <linux/uio.h>
#include <linux/audit.h>
@@ -435,6 +436,28 @@ static int ptrace_attach(struct task_struct *task, long request,
if (retval)
goto unlock_creds;
+ if (unlikely(task->in_execve)) {
+ struct linux_binprm *bprm = task->signal->exec_bprm;
+ const struct cred __rcu *old_cred;
+ struct mm_struct *old_mm;
+
+ retval = down_write_killable(&task->signal->exec_update_lock);
+ if (retval)
+ goto unlock_creds;
+ task_lock(task);
+ old_cred = task->real_cred;
+ old_mm = task->mm;
+ rcu_assign_pointer(task->real_cred, bprm->cred);
+ task->mm = bprm->mm;
+ retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS);
+ rcu_assign_pointer(task->real_cred, old_cred);
+ task->mm = old_mm;
+ task_unlock(task);
+ up_write(&task->signal->exec_update_lock);
+ if (retval)
+ goto unlock_creds;
+ }
+
write_lock_irq(&tasklist_lock);
retval = -EPERM;
if (unlikely(task->exit_state))
@@ -508,6 +531,14 @@ static int ptrace_traceme(void)
{
int ret = -EPERM;
+ if (mutex_lock_interruptible(&current->signal->cred_guard_mutex))
+ return -ERESTARTNOINTR;
+
+ if (unlikely(current->signal->exec_bprm)) {
+ mutex_unlock(&current->signal->cred_guard_mutex);
+ return -ERESTARTNOINTR;
+ }
+
write_lock_irq(&tasklist_lock);
/* Are we already being traced? */
if (!current->ptrace) {
@@ -523,6 +554,7 @@ static int ptrace_traceme(void)
}
}
write_unlock_irq(&tasklist_lock);
+ mutex_unlock(&current->signal->cred_guard_mutex);
return ret;
}
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 255999ba9190..b29bbfa0b044 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -1955,9 +1955,15 @@ static long seccomp_set_mode_filter(unsigned int flags,
* Make sure we cannot change seccomp or nnp state via TSYNC
* while another thread is in the middle of calling exec.
*/
- if (flags & SECCOMP_FILTER_FLAG_TSYNC &&
- mutex_lock_killable(&current->signal->cred_guard_mutex))
- goto out_put_fd;
+ if (flags & SECCOMP_FILTER_FLAG_TSYNC) {
+ if (mutex_lock_killable(&current->signal->cred_guard_mutex))
+ goto out_put_fd;
+
+ if (unlikely(current->signal->exec_bprm)) {
+ mutex_unlock(&current->signal->cred_guard_mutex);
+ goto out_put_fd;
+ }
+ }
spin_lock_irq(&current->sighand->siglock);
diff --git a/tools/testing/selftests/ptrace/vmaccess.c b/tools/testing/selftests/ptrace/vmaccess.c
index 4db327b44586..3b7d81fb99bb 100644
--- a/tools/testing/selftests/ptrace/vmaccess.c
+++ b/tools/testing/selftests/ptrace/vmaccess.c
@@ -39,8 +39,15 @@ TEST(vmaccess)
f = open(mm, O_RDONLY);
ASSERT_GE(f, 0);
close(f);
- f = kill(pid, SIGCONT);
- ASSERT_EQ(f, 0);
+ f = waitpid(-1, NULL, 0);
+ ASSERT_NE(f, -1);
+ ASSERT_NE(f, 0);
+ ASSERT_NE(f, pid);
+ f = waitpid(-1, NULL, 0);
+ ASSERT_EQ(f, pid);
+ f = waitpid(-1, NULL, 0);
+ ASSERT_EQ(f, -1);
+ ASSERT_EQ(errno, ECHILD);
}
TEST(attach)
@@ -57,22 +64,24 @@ TEST(attach)
sleep(1);
k = ptrace(PTRACE_ATTACH, pid, 0L, 0L);
- ASSERT_EQ(errno, EAGAIN);
- ASSERT_EQ(k, -1);
+ ASSERT_EQ(k, 0);
k = waitpid(-1, &s, WNOHANG);
ASSERT_NE(k, -1);
ASSERT_NE(k, 0);
ASSERT_NE(k, pid);
ASSERT_EQ(WIFEXITED(s), 1);
ASSERT_EQ(WEXITSTATUS(s), 0);
- sleep(1);
- k = ptrace(PTRACE_ATTACH, pid, 0L, 0L);
+ k = waitpid(-1, &s, 0);
+ ASSERT_EQ(k, pid);
+ ASSERT_EQ(WIFSTOPPED(s), 1);
+ ASSERT_EQ(WSTOPSIG(s), SIGTRAP);
+ k = ptrace(PTRACE_CONT, pid, 0L, 0L);
ASSERT_EQ(k, 0);
k = waitpid(-1, &s, 0);
ASSERT_EQ(k, pid);
ASSERT_EQ(WIFSTOPPED(s), 1);
ASSERT_EQ(WSTOPSIG(s), SIGSTOP);
- k = ptrace(PTRACE_DETACH, pid, 0L, 0L);
+ k = ptrace(PTRACE_CONT, pid, 0L, 0L);
ASSERT_EQ(k, 0);
k = waitpid(-1, &s, 0);
ASSERT_EQ(k, pid);
--
2.39.2
Hello,
This patchset builds upon the code at
https://lore.kernel.org/lkml/20230718234512.1690985-1-seanjc@google.com/T/.
This code is available at
https://github.com/googleprodkernel/linux-cc/tree/kvm-gmem-link-migrate-rfc….
In guest_mem v11, a split file/inode model was proposed, where memslot
bindings belong to the file and pages belong to the inode. This model
lends itself well to having different VMs use separate files pointing
to the same inode.
This RFC proposes an ioctl, KVM_LINK_GUEST_MEMFD, that takes a VM and
a gmem fd, and returns another gmem fd referencing a different file
that is associated with the VM. This RFC also includes an update to
KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM to migrate memory context
(slot->arch.lpage_info and kvm->mem_attr_array) from source to
destination vm, intra-host.
Intended usage of the two ioctls (a rough sketch in code follows the list):
1. Source VM’s fd is passed to destination VM via unix sockets
2. Destination VM uses new ioctl KVM_LINK_GUEST_MEMFD to link source
VM’s fd to a new fd.
3. Destination VM passes the new fds to KVM_SET_USER_MEMORY_REGION,
which binds the new file (pointing to the same inode as the
source VM's file) to memslots
4. Use KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM to move kvm->mem_attr_array
and slot->arch.lpage_info to the destination VM.
5. Run the destination VM as per normal
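Roughly, from the destination VM's side (the argument layout of
KVM_LINK_GUEST_MEMFD below is only a guess for illustration; the real
uapi and the ioctl itself are defined by this series, and the memslot
setup in step 3 is elided):

	#include <stdint.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Hypothetical argument layout for the proposed ioctl. */
	struct kvm_link_guest_memfd_args {
		uint64_t fd;     /* gmem fd received from the source VM */
		uint64_t flags;
	};

	int link_and_migrate(int dst_vm_fd, int src_vm_fd, int src_gmem_fd)
	{
		struct kvm_link_guest_memfd_args args = { .fd = src_gmem_fd };
		/* Step 2: new file, same inode, associated with the destination VM. */
		int new_gmem_fd = ioctl(dst_vm_fd, KVM_LINK_GUEST_MEMFD, &args);

		if (new_gmem_fd < 0)
			return -1;

		/* Step 3: pass new_gmem_fd to KVM_SET_USER_MEMORY_REGION (omitted). */

		/* Step 4: move kvm->mem_attr_array and lpage_info to the destination. */
		struct kvm_enable_cap cap = {
			.cap = KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM,
			.args = { (uint64_t)src_vm_fd },
		};
		return ioctl(dst_vm_fd, KVM_ENABLE_CAP, &cap);
	}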
Some other approaches considered were:
+ Using the linkat() syscall, but that requires a mount/directory for
a source fd to be linked to
+ Using the dup() syscall, but that only duplicates the fd, and both
fds point to the same file
---
Ackerley Tng (11):
KVM: guest_mem: Refactor out kvm_gmem_alloc_file()
KVM: guest_mem: Add ioctl KVM_LINK_GUEST_MEMFD
KVM: selftests: Add tests for KVM_LINK_GUEST_MEMFD ioctl
KVM: selftests: Test transferring private memory to another VM
KVM: x86: Refactor sev's flag migration_in_progress to kvm struct
KVM: x86: Refactor common code out of sev.c
KVM: x86: Refactor common migration preparation code out of
sev_vm_move_enc_context_from
KVM: x86: Let moving encryption context be configurable
KVM: x86: Handle moving of memory context for intra-host migration
KVM: selftests: Generalize migration functions from
sev_migrate_tests.c
KVM: selftests: Add tests for migration of private mem
arch/x86/include/asm/kvm_host.h | 4 +-
arch/x86/kvm/svm/sev.c | 85 ++-----
arch/x86/kvm/svm/svm.h | 3 +-
arch/x86/kvm/x86.c | 221 +++++++++++++++++-
arch/x86/kvm/x86.h | 6 +
include/linux/kvm_host.h | 18 ++
include/uapi/linux/kvm.h | 8 +
tools/testing/selftests/kvm/Makefile | 1 +
.../testing/selftests/kvm/guest_memfd_test.c | 42 ++++
.../selftests/kvm/include/kvm_util_base.h | 31 +++
.../kvm/x86_64/private_mem_migrate_tests.c | 93 ++++++++
.../selftests/kvm/x86_64/sev_migrate_tests.c | 48 ++--
virt/kvm/guest_mem.c | 151 ++++++++++--
virt/kvm/kvm_main.c | 10 +
virt/kvm/kvm_mm.h | 7 +
15 files changed, 596 insertions(+), 132 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/private_mem_migrate_tests.c
--
2.41.0.640.ga95def55d0-goog
In the middle of the thread about a patch to add the skip test result,
I suggested documenting the process of deprecating the KTAP v1 Specification
method of marking a skipped test:
https://lore.kernel.org/all/490271eb-1429-2217-6e38-837c6e5e328b@gmail.com/…
In a reply to that email I suggested that we ought to have a process to transition
the KTAP Specification from v1 to v2, and possibly to v3 and future versions.
This email is meant to be the root of that discussion.
My initial thinking is that there are at least three different types of project
and/or community that may have different needs in this area.
Type 1 - project controls both the test output generation and the test output
parsing tool. Both generation and parsing code are in the same repository
and/or synchronized versions are distributed together.
Devicetree unittests are an example of Type 1. I plan to maintain changes
of test output to KTAP v2 format in coordination with updating the parser
to process KTAP v2 data.
Type 2 - project controls both the test output generation and the test output
parsing tool. The test output generation and parser modifications may be
controlled by the project, BUT there are one or more external testing projects
that (1) may have their own parsers, and (2) may have a single framework that
tests multiple versions of the tests.
I think that kselftest and kunit tests are probably examples of Type 2. I also
think that DT unittests will become a Type 2 project as a result of converting
to KTAP v2 data.
Type 3 - project may create and maintain some tests, but is primarily a consumer
of tests created by other projects. Type 3 projects typically have a single
framework that is able to execute and process multiple versions of the tests.
The Fuego test project is an example of Type 3.
Maybe adding all of this complexity of different Types in my initial thinking
was silly -- maybe everything in this topic is governed by the more complex
Type 3.
My thinking was that the three different Types of project would be impacted
in different ways by transition plans. Type 3 would be the most impacted,
so I wanted to be sure that any transition plan especially considered their
needs.
There is an important aspect of the KTAP format that might ease the transition
from one version to another: All KTAP formatted results begin with a "version
line", so as soon as a parser has processed the first line of a test, it can
apply the appropriate KTAP Specification version to all subsequent lines of
test output. A parser implementation could choose to process all versions,
invoke a version-specific parser, or take some other approach altogether.
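As a trivial sketch of that last option, a front end only needs to look
at the first line before deciding how to parse the rest (illustrative
only, not taken from any existing parser):

	#include <stdio.h>

	/*
	 * Read the KTAP version line (e.g. "KTAP version 1") and return the
	 * spec version the remaining output should be parsed against, or -1.
	 * The per-version parsers themselves are out of scope here.
	 */
	static int ktap_version(FILE *in)
	{
		char line[256];
		int version;

		if (!fgets(line, sizeof(line), in))
			return -1;
		if (sscanf(line, "KTAP version %d", &version) != 1)
			return -1;	/* not KTAP-formatted output */
		return version;
	}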
In the "add skip test results" thread, I suggested deprecating the v1
method of marking a skipped test in v2, with a scheduled removal of
the v1 method in v3. But since the KTAP format version is available
in the very first line of test output, is it necessary to do a slow
deprecation and removal over two versions?
One argument for doing a two-version deprecation/removal process is that
a parser that is one version older than the test output _might_ be able
to process the test output without error, but would not be able to take
advantage of features added in the newer version of the Specification.
My opinion is that a two version deprecation/removal process will slow
the Specification update process and lead to more versions of the
Specification over a given time interval.
A one version deprecation/removal process puts more of a burden on Type 3
projects and external parsers for Type 2 projects to implement parsers
that can process the newer Specification more quickly and puts a burden
on test maintainers to delay a move to the newer Specification, or possibly
pressure to support selection of more than one Specification version format
for output data.
One additional item... On the KTAP Specification version 2 process wiki page,
I suggested that it is "desirable for test result parsers that understand the
KTAP Specification version 2 data also be able to parse version 1 data."
The implication is that "Converting version 1 compliant data to version 2
compliant data should not require a 'flag day' switch of test result parsers."
If this thread discussion results in a different decision, I will update the wiki.
Thoughts?
-Frank
The perf subsystem today unifies various tracing and monitoring
features, from both software and hardware. One benefit of the perf
subsystem is automatically inheriting events to child tasks, which
enables process-wide events monitoring with low overheads. By default
perf events are non-intrusive, not affecting behaviour of the tasks
being monitored.
For certain use-cases, however, it makes sense to leverage the
generality of the perf events subsystem and optionally allow the tasks
being monitored to receive signals on events they are interested in.
This patch series adds the option to synchronously signal user space on
events.
To better support process-wide synchronous self-monitoring, without
events propagating to children that do not share the current process's
shared environment, two pre-requisite patches are added to optionally
restrict inheritance to CLONE_THREAD, and remove events on exec (without
affecting the parent).
Examples of how to use these features can be found in the tests added at
the end of the series. In addition to the tests added, the series has
also been subjected to syzkaller fuzzing (focus on 'kernel/events/'
coverage).
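As a rough sketch of the intended usage (the attr bits and siginfo field
names below, sigtrap, sig_data and si_perf, are the ones proposed in this
series, so building this needs the series' updated uapi headers; the
selftests in the series are the authoritative reference):

	#include <signal.h>
	#include <stdio.h>
	#include <sys/syscall.h>
	#include <unistd.h>
	#include <linux/hw_breakpoint.h>
	#include <linux/perf_event.h>

	static long watched;	/* location we want process-wide watchpoints on */

	static void on_sigtrap(int sig, siginfo_t *info, void *uc)
	{
		(void)sig; (void)uc;
		if (info->si_code == TRAP_PERF)	/* new si_code from this series */
			fprintf(stderr, "watchpoint hit, sig_data=%llu\n",
				(unsigned long long)info->si_perf);
	}

	int main(void)
	{
		struct sigaction sa = { .sa_sigaction = on_sigtrap,
					.sa_flags = SA_SIGINFO };
		struct perf_event_attr attr = {
			.type = PERF_TYPE_BREAKPOINT,
			.size = sizeof(attr),
			.bp_type = HW_BREAKPOINT_W,
			.bp_addr = (unsigned long)&watched,
			.bp_len = HW_BREAKPOINT_LEN_8,
			.sample_period = 1,	/* signal on every hit */
			.inherit = 1,
			.inherit_thread = 1,	/* only CLONE_THREAD children */
			.remove_on_exec = 1,	/* drop the event across exec */
			.sigtrap = 1,		/* deliver synchronous SIGTRAP */
			.sig_data = 42,		/* propagated to si_perf */
		};

		sigaction(SIGTRAP, &sa, NULL);
		if (syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0) < 0)
			return 1;
		watched = 1;	/* write triggers SIGTRAP in the writing thread */
		return 0;
	}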
Motivation and Example Uses
---------------------------
1. Our immediate motivation is low-overhead sampling-based race
detection for user space [1]. By using perf_event_open() at
process initialization, we can create hardware
breakpoint/watchpoint events that are propagated automatically
to all threads in a process. As far as we are aware, today no
existing kernel facility (such as ptrace) allows us to set up
process-wide watchpoints with minimal overheads (that are
comparable to mprotect() of whole pages).
2. Other low-overhead error detectors that rely on detecting
accesses to certain memory locations or code, process-wide and
also only in a specific set of subtasks or threads.
[1] https://llvm.org/devmtg/2020-09/slides/Morehouse-GWP-Tsan.pdf
Other ideas for use-cases we found interesting, but should only
illustrate the range of potential to further motivate the utility (we're
sure there are more):
3. Code hot patching without full stop-the-world. Specifically, by
setting a code breakpoint on entry to the patched routine, then
sending signals to threads and checking that they are not in the
routine, but without stopping them further. If any of the
threads enters the routine, it will receive SIGTRAP and
pause.
4. Safepoints without mprotect(). Some Java implementations use
"load from a known memory location" as a safepoint. When threads
need to be stopped, the page containing the location is
mprotect()ed and threads get a signal. This could be replaced with
a watchpoint, which requires neither a whole page nor DTLB
shootdowns.
5. Threads receiving signals on performance events to
throttle/unthrottle themselves.
6. Tracking data flow globally.
Changelog
---------
v4:
* Fix for parent and child racing to exit in sync_child_event().
* Fix race between irq_work running and task's sighand being released by
release_task().
* Generalize setting si_perf and si_addr independent of event type;
introduces perf_event_attr::sig_data, which can be set by user space
to be propagated to si_perf.
* Warning in perf_sigtrap() if ctx->task and current mismatch; we expect
this on architectures that do not properly implement
arch_irq_work_raise().
* Require events that want sigtrap to be associated with a task.
* Dropped "perf: Add breakpoint information to siginfo on SIGTRAP"
in favor of more generic solution (perf_event_attr::sig_data).
v3:
* Add patch "perf: Rework perf_event_exit_event()" to beginning of
series, courtesy of Peter Zijlstra.
* Rework "perf: Add support for event removal on exec" based on
the added "perf: Rework perf_event_exit_event()".
* Fix kselftests to work with more recent libc, due to the way it forces
using the kernel's own siginfo_t.
* Add basic perf-tool built-in test.
v2/RFC: https://lkml.kernel.org/r/20210310104139.679618-1-elver@google.com
* Patch "Support only inheriting events if cloned with CLONE_THREAD"
added to series.
* Patch "Add support for event removal on exec" added to series.
* Patch "Add kselftest for process-wide sigtrap handling" added to
series.
* Patch "Add kselftest for remove_on_exec" added to series.
* Implicitly restrict inheriting events if sigtrap, but the child was
cloned with CLONE_CLEAR_SIGHAND, because it is not generally safe if
the child cleared all signal handlers to continue sending SIGTRAP.
* Various minor fixes (see details in patches).
v1/RFC: https://lkml.kernel.org/r/20210223143426.2412737-1-elver@google.com
Pre-series: The discussion at [2] led to the changes in this series. The
approach taken in "Add support for SIGTRAP on perf events" to trigger
the signal was suggested by Peter Zijlstra in [3].
[2] https://lore.kernel.org/lkml/CACT4Y+YPrXGw+AtESxAgPyZ84TYkNZdP0xpocX2jwVAbZ…
[3] https://lore.kernel.org/lkml/YBv3rAT566k+6zjg@hirez.programming.kicks-ass.n…
Marco Elver (9):
perf: Apply PERF_EVENT_IOC_MODIFY_ATTRIBUTES to children
perf: Support only inheriting events if cloned with CLONE_THREAD
perf: Add support for event removal on exec
signal: Introduce TRAP_PERF si_code and si_perf to siginfo
perf: Add support for SIGTRAP on perf events
selftests/perf_events: Add kselftest for process-wide sigtrap handling
selftests/perf_events: Add kselftest for remove_on_exec
tools headers uapi: Sync tools/include/uapi/linux/perf_event.h
perf test: Add basic stress test for sigtrap handling
Peter Zijlstra (1):
perf: Rework perf_event_exit_event()
arch/m68k/kernel/signal.c | 3 +
arch/x86/kernel/signal_compat.c | 5 +-
fs/signalfd.c | 4 +
include/linux/compat.h | 2 +
include/linux/perf_event.h | 9 +-
include/linux/signal.h | 1 +
include/uapi/asm-generic/siginfo.h | 6 +-
include/uapi/linux/perf_event.h | 12 +-
include/uapi/linux/signalfd.h | 4 +-
kernel/events/core.c | 302 +++++++++++++-----
kernel/fork.c | 2 +-
kernel/signal.c | 11 +
tools/include/uapi/linux/perf_event.h | 12 +-
tools/perf/tests/Build | 1 +
tools/perf/tests/builtin-test.c | 5 +
tools/perf/tests/sigtrap.c | 150 +++++++++
tools/perf/tests/tests.h | 1 +
.../testing/selftests/perf_events/.gitignore | 3 +
tools/testing/selftests/perf_events/Makefile | 6 +
tools/testing/selftests/perf_events/config | 1 +
.../selftests/perf_events/remove_on_exec.c | 260 +++++++++++++++
tools/testing/selftests/perf_events/settings | 1 +
.../selftests/perf_events/sigtrap_threads.c | 210 ++++++++++++
23 files changed, 924 insertions(+), 87 deletions(-)
create mode 100644 tools/perf/tests/sigtrap.c
create mode 100644 tools/testing/selftests/perf_events/.gitignore
create mode 100644 tools/testing/selftests/perf_events/Makefile
create mode 100644 tools/testing/selftests/perf_events/config
create mode 100644 tools/testing/selftests/perf_events/remove_on_exec.c
create mode 100644 tools/testing/selftests/perf_events/settings
create mode 100644 tools/testing/selftests/perf_events/sigtrap_threads.c
--
2.31.0.208.g409f899ff0-goog
From: Dmitry Vyukov <dvyukov(a)google.com>
POSIX timers using the CLOCK_PROCESS_CPUTIME_ID clock prefer the main
thread of a thread group for signal delivery. However, this has a
significant downside: it requires waking up a potentially idle thread.
Instead, prefer to deliver signals to the current thread (in the same
thread group) if SIGEV_THREAD_ID is not set by the user. This does not
change guaranteed semantics, since POSIX process CPU time timers have
never guaranteed that signal delivery is to a specific thread (without
SIGEV_THREAD_ID set).
The effect is that we no longer wake up potentially idle threads, and
the kernel is no longer biased towards delivering the timer signal to
any particular thread (which better distributes the timer signals, especially
when multiple timers fire concurrently).
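For context, the affected case is a plain process-wide CPU-time timer
created without SIGEV_THREAD_ID, for example (sketch only; a SIGALRM
handler is assumed to be installed elsewhere):

	#include <signal.h>
	#include <time.h>

	/*
	 * Process CPU-time timer with SIGEV_SIGNAL and no SIGEV_THREAD_ID:
	 * the kernel may deliver SIGALRM to any thread in the group.
	 */
	static timer_t setup_timer(void)
	{
		struct sigevent sev = {
			.sigev_notify = SIGEV_SIGNAL,
			.sigev_signo  = SIGALRM,
		};
		struct itimerspec its = {
			.it_value    = { .tv_nsec = 10 * 1000 * 1000 },
			.it_interval = { .tv_nsec = 10 * 1000 * 1000 },
		};
		timer_t id;

		timer_create(CLOCK_PROCESS_CPUTIME_ID, &sev, &id);
		timer_settime(id, 0, &its, NULL);
		return id;
	}

With this patch, the thread that is running when such a timer expires
receives the signal itself (if it is in the same thread group), instead
of the group leader being woken up for it.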
Signed-off-by: Dmitry Vyukov <dvyukov(a)google.com>
Suggested-by: Oleg Nesterov <oleg(a)redhat.com>
Reviewed-by: Oleg Nesterov <oleg(a)redhat.com>
Signed-off-by: Marco Elver <elver(a)google.com>
---
v6:
- Split test from this patch.
- Update wording on what this patch aims to improve.
v5:
- Rebased onto v6.2.
v4:
- Restructured checks in send_sigqueue() as suggested.
v3:
- Switched to the completely different implementation (much simpler)
based on the Oleg's idea.
RFC v2:
- Added additional Cc as Thomas asked.
---
kernel/signal.c | 25 ++++++++++++++++++++++---
1 file changed, 22 insertions(+), 3 deletions(-)
diff --git a/kernel/signal.c b/kernel/signal.c
index 8cb28f1df294..605445fa27d4 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1003,8 +1003,7 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
/*
* Now find a thread we can wake up to take the signal off the queue.
*
- * If the main thread wants the signal, it gets first crack.
- * Probably the least surprising to the average bear.
+ * Try the suggested task first (may or may not be the main thread).
*/
if (wants_signal(sig, p))
t = p;
@@ -1970,8 +1969,23 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
ret = -1;
rcu_read_lock();
+ /*
+ * This function is used by POSIX timers to deliver a timer signal.
+ * Where type is PIDTYPE_PID (such as for timers with SIGEV_THREAD_ID
+ * set), the signal must be delivered to the specific thread (queues
+ * into t->pending).
+ *
+ * Where type is not PIDTYPE_PID, signals must just be delivered to the
+ * current process. In this case, prefer to deliver to current if it is
+ * in the same thread group as the target, as it avoids unnecessarily
+ * waking up a potentially idle task.
+ */
t = pid_task(pid, type);
- if (!t || !likely(lock_task_sighand(t, &flags)))
+ if (!t)
+ goto ret;
+ if (type != PIDTYPE_PID && same_thread_group(t, current))
+ t = current;
+ if (!likely(lock_task_sighand(t, &flags)))
goto ret;
ret = 1; /* the signal is ignored */
@@ -1993,6 +2007,11 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
q->info.si_overrun = 0;
signalfd_notify(t, sig);
+ /*
+ * If the type is not PIDTYPE_PID, we just use shared_pending, which
+ * won't guarantee that the specified task will receive the signal, but
+ * is sufficient if t==current in the common case.
+ */
pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending;
list_add_tail(&q->list, &pending->list);
sigaddset(&pending->signal, sig);
--
2.40.0.rc1.284.g88254d51c5-goog