Kirill Tkhai ktkhai@virtuozzo.com writes:
On 09.03.2020 00:38, Eric W. Biederman wrote:
The cred_guard_mutex is problematic. The cred_guard_mutex is held over the userspace accesses as the arguments from userspace are read. The cred_guard_mutex is held of PTRACE_EVENT_EXIT as the the other threads are killed. The cred_guard_mutex is held over "put_user(0, tsk->clear_child_tid)" in exit_mm().
Any of those can result in deadlock, as the cred_guard_mutex is held over a possible indefinite userspace waits for userspace.
Add exec_update_mutex that is only held over exec updating process with the new contents of exec, so that code that needs not to be confused by exec changing the mm and the cred in ways that can not happen during ordinary execution of a process.
The plan is to switch the users of cred_guard_mutex to exec_udpate_mutex one by one. This lets us move forward while still being careful and not introducing any regressions.
Link: https://lore.kernel.org/lkml/20160921152946.GA24210@dhcp22.suse.cz/ Link: https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E60@AM6PR03M... Link: https://lore.kernel.org/linux-fsdevel/20161102181806.GB1112@redhat.com/ Link: https://lore.kernel.org/lkml/20160923095031.GA14923@redhat.com/ Link: https://lore.kernel.org/lkml/20170213141452.GA30203@redhat.com/ Ref: 45c1a159b85b ("Add PTRACE_O_TRACEVFORKDONE and PTRACE_O_TRACEEXIT facilities.") Ref: 456f17cd1a28 ("[PATCH] user-vm-unlock-2.5.31-A2") Signed-off-by: "Eric W. Biederman" ebiederm@xmission.com
fs/exec.c | 9 +++++++++ include/linux/sched/signal.h | 9 ++++++++- init/init_task.c | 1 + kernel/fork.c | 1 + 4 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/fs/exec.c b/fs/exec.c index d820a7272a76..ffeebb1f167b 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1014,6 +1014,7 @@ static int exec_mmap(struct mm_struct *mm) { struct task_struct *tsk; struct mm_struct *old_mm, *active_mm;
- int ret;
/* Notify parent that we're no longer interested in the old VM */ tsk = current; @@ -1034,6 +1035,11 @@ static int exec_mmap(struct mm_struct *mm) return -EINTR; } }
- ret = mutex_lock_killable(&tsk->signal->exec_update_mutex);
- if (ret)
return ret;
You missed old_mm->mmap_sem unlock. See here:
Duh. Thank you.
I actually need to switch the lock ordering here, and I haven't yet because my son was sick yesterday.
Something like this.
diff --git a/fs/exec.c b/fs/exec.c index 96f89401b4d1..03d50c27ec01 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1020,9 +1020,14 @@ static int exec_mmap(struct mm_struct *mm) tsk = current; old_mm = current->mm; exec_mm_release(tsk, old_mm); + if (old_mm) + sync_mm_rss(old_mm); + + ret = mutex_lock_killable(&tsk->signal->exec_update_mutex); + if (ret) + return ret;
if (old_mm) { - sync_mm_rss(old_mm); /* * Make sure that if there is a core dump in progress * for the old mm, we get out and die instead of going @@ -1032,14 +1037,11 @@ static int exec_mmap(struct mm_struct *mm) down_read(&old_mm->mmap_sem); if (unlikely(old_mm->core_state)) { up_read(&old_mm->mmap_sem); + mutex_unlock(&tsk->signal->exec_update_mutex); return -EINTR; } }
- ret = mutex_lock_killable(&tsk->signal->exec_update_mutex); - if (ret) - return ret; - task_lock(tsk); active_mm = tsk->active_mm; membarrier_exec_mmap(mm);
diff --git a/fs/exec.c b/fs/exec.c index 47582cd97f86..d557bac3e862 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1063,8 +1063,11 @@ static int exec_mmap(struct mm_struct *mm) } ret = mutex_lock_killable(&tsk->signal->exec_update_mutex);
- if (ret)
- if (ret) {
if (old_mm)
return ret;up_read(&old_mm->mmap_sem);
- }
task_lock(tsk); active_mm = tsk->active_mm;
Eric