On Sun, 2020-03-08 at 16:38 -0500, Eric W. Biederman wrote:
The cred_guard_mutex is problematic. The cred_guard_mutex is held over the userspace accesses as the arguments from userspace are read. The cred_guard_mutex is held of PTRACE_EVENT_EXIT as the the other threads are killed. The cred_guard_mutex is held over "put_user(0, tsk->clear_child_tid)" in exit_mm().
Any of those can result in deadlock, as the cred_guard_mutex is held over a possible indefinite userspace waits for userspace.
Add exec_update_mutex that is only held over exec updating process with the new contents of exec, so that code that needs not to be confused by exec changing the mm and the cred in ways that can not happen during ordinary execution of a process.
The plan is to switch the users of cred_guard_mutex to exec_udpate_mutex one by one. This lets us move forward while still being careful and not introducing any regressions.
Link: https://lore.kernel.org/lkml/20160921152946.GA24210@dhcp22.suse.cz/ Link: https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E60@AM6PR03M... Link: https://lore.kernel.org/linux-fsdevel/20161102181806.GB1112@redhat.com/ Link: https://lore.kernel.org/lkml/20160923095031.GA14923@redhat.com/ Link: https://lore.kernel.org/lkml/20170213141452.GA30203@redhat.com/ Ref: 45c1a159b85b ("Add PTRACE_O_TRACEVFORKDONE and PTRACE_O_TRACEEXIT facilities.") Ref: 456f17cd1a28 ("[PATCH] user-vm-unlock-2.5.31-A2") Signed-off-by: "Eric W. Biederman" ebiederm@xmission.com
This patch will trigger a warning during boot,
[ 19.707214][ T1] pci 0035:01:00.0: enabling device (0545 -> 0547) [ 19.707287][ T1] EEH: Capable adapter found: recovery enabled. [ 19.732541][ T1] cpuidle-powernv: Default stop: psscr = 0x0000000000000330,mask=0x00000000003003ff [ 19.732567][ T1] cpuidle-powernv: Deepest stop: psscr = 0x0000000000300375,mask=0x00000000003003ff [ 19.732598][ T1] cpuidle-powernv: First stop level that may lose SPRs = 0x4 [ 19.732617][ T1] cpuidle-powernv: First stop level that may lose timebase = 0x10 [ 19.769784][ T1] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages [ 19.769810][ T1] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages [ 19.789344][ T718] [ 19.789367][ T718] ===================================== [ 19.789379][ T718] WARNING: bad unlock balance detected! [ 19.789393][ T718] 5.6.0-rc5-next-20200311+ #4 Not tainted [ 19.789414][ T718] ------------------------------------- [ 19.789426][ T718] kworker/u257:0/718 is trying to release lock (&sig-
exec_update_mutex) at:
[ 19.789459][ T718] [<c0000000004c6770>] free_bprm+0xe0/0xf0 [ 19.789481][ T718] but there are no more locks to release! [ 19.789502][ T718] [ 19.789502][ T718] other info that might help us debug this: [ 19.789537][ T718] 1 lock held by kworker/u257:0/718: [ 19.789558][ T718] #0: c000001fa8842808 (&sig->cred_guard_mutex){+.+.}, at: __do_execve_file.isra.33+0x1b0/0xda0 [ 19.789611][ T718] [ 19.789611][ T718] stack backtrace: [ 19.789645][ T718] CPU: 8 PID: 718 Comm: kworker/u257:0 Not tainted 5.6.0- rc5-next-20200311+ #4 [ 19.789681][ T718] Call Trace: [ 19.789703][ T718] [c000000dad8cfa70] [c000000000979b40] dump_stack+0xf4/0x164 (unreliable) [ 19.789742][ T718] [c000000dad8cfac0] [c0000000001c1d78] print_unlock_imbalance_bug+0x118/0x140 [ 19.789780][ T718] [c000000dad8cfb40] [c0000000001ceaa0] lock_release+0x270/0x520 [ 19.789817][ T718] [c000000dad8cfbf0] [c0000000009a2898] __mutex_unlock_slowpath+0x68/0x400 [ 19.789854][ T718] [c000000dad8cfcc0] [c0000000004c6770] free_bprm+0xe0/0xf0 [ 19.789900][ T718] [c000000dad8cfcf0] [c0000000004c845c] __do_execve_file.isra.33+0x44c/0xda0 __do_execve_file at fs/exec.c:1904 [ 19.789938][ T718] [c000000dad8cfde0] [c0000000001391d8] call_usermodehelper_exec_async+0x218/0x250 [ 19.789977][ T718] [c000000dad8cfe20] [c00000000000b748] ret_from_kernel_thread+0x5c/0x74
fs/exec.c | 9 +++++++++ include/linux/sched/signal.h | 9 ++++++++- init/init_task.c | 1 + kernel/fork.c | 1 + 4 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/fs/exec.c b/fs/exec.c index d820a7272a76..ffeebb1f167b 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1014,6 +1014,7 @@ static int exec_mmap(struct mm_struct *mm) { struct task_struct *tsk; struct mm_struct *old_mm, *active_mm;
- int ret;
/* Notify parent that we're no longer interested in the old VM */ tsk = current; @@ -1034,6 +1035,11 @@ static int exec_mmap(struct mm_struct *mm) return -EINTR; } }
- ret = mutex_lock_killable(&tsk->signal->exec_update_mutex);
- if (ret)
return ret;
- task_lock(tsk); active_mm = tsk->active_mm; membarrier_exec_mmap(mm);
@@ -1438,6 +1444,8 @@ static void free_bprm(struct linux_binprm *bprm) { free_arg_pages(bprm); if (bprm->cred) {
if (!bprm->mm)
mutex_unlock(¤t->signal->cred_guard_mutex); abort_creds(bprm->cred); }mutex_unlock(¤t->signal->exec_update_mutex);
@@ -1487,6 +1495,7 @@ void install_exec_creds(struct linux_binprm *bprm) * credentials; any time after this it may be unlocked. */ security_bprm_committed_creds(bprm);
- mutex_unlock(¤t->signal->exec_update_mutex); mutex_unlock(¤t->signal->cred_guard_mutex);
} EXPORT_SYMBOL(install_exec_creds); diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h index 88050259c466..a29df79540ce 100644 --- a/include/linux/sched/signal.h +++ b/include/linux/sched/signal.h @@ -224,7 +224,14 @@ struct signal_struct { struct mutex cred_guard_mutex; /* guard against foreign influences on * credential calculations
* (notably. ptrace) */
* (notably. ptrace)
* Deprecated do not use in new code.
* Use exec_update_mutex instead.
*/
- struct mutex exec_update_mutex; /* Held while task_struct is being
* updated during exec, and may have
* inconsistent permissions.
*/
} __randomize_layout; /* diff --git a/init/init_task.c b/init/init_task.c index 9e5cbe5eab7b..bd403ed3e418 100644 --- a/init/init_task.c +++ b/init/init_task.c @@ -26,6 +26,7 @@ static struct signal_struct init_signals = { .multiprocess = HLIST_HEAD_INIT, .rlim = INIT_RLIMITS, .cred_guard_mutex = __MUTEX_INITIALIZER(init_signals.cred_guard_mutex),
- .exec_update_mutex = __MUTEX_INITIALIZER(init_signals.exec_update_mutex),
#ifdef CONFIG_POSIX_TIMERS .posix_timers = LIST_HEAD_INIT(init_signals.posix_timers), .cputimer = { diff --git a/kernel/fork.c b/kernel/fork.c index 60a1295f4384..12896a6ecee6 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1594,6 +1594,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk) sig->oom_score_adj_min = current->signal->oom_score_adj_min; mutex_init(&sig->cred_guard_mutex);
- mutex_init(&sig->exec_update_mutex);
return 0; }