On 12.03.2020 17:38, Eric W. Biederman wrote:
Kirill Tkhai ktkhai@virtuozzo.com writes:
On 12.03.2020 15:24, Eric W. Biederman wrote:
Kirill Tkhai ktkhai@virtuozzo.com writes:
On 09.03.2020 00:38, Eric W. Biederman wrote:
The cred_guard_mutex is problematic. The cred_guard_mutex is held over the userspace accesses as the arguments from userspace are read. The cred_guard_mutex is held of PTRACE_EVENT_EXIT as the the other threads are killed. The cred_guard_mutex is held over "put_user(0, tsk->clear_child_tid)" in exit_mm().
Any of those can result in deadlock, as the cred_guard_mutex is held over a possible indefinite userspace waits for userspace.
Add exec_update_mutex that is only held over exec updating process with the new contents of exec, so that code that needs not to be confused by exec changing the mm and the cred in ways that can not happen during ordinary execution of a process.
The plan is to switch the users of cred_guard_mutex to exec_udpate_mutex one by one. This lets us move forward while still being careful and not introducing any regressions.
Link: https://lore.kernel.org/lkml/20160921152946.GA24210@dhcp22.suse.cz/ Link: https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E60@AM6PR03M... Link: https://lore.kernel.org/linux-fsdevel/20161102181806.GB1112@redhat.com/ Link: https://lore.kernel.org/lkml/20160923095031.GA14923@redhat.com/ Link: https://lore.kernel.org/lkml/20170213141452.GA30203@redhat.com/ Ref: 45c1a159b85b ("Add PTRACE_O_TRACEVFORKDONE and PTRACE_O_TRACEEXIT facilities.") Ref: 456f17cd1a28 ("[PATCH] user-vm-unlock-2.5.31-A2") Signed-off-by: "Eric W. Biederman" ebiederm@xmission.com
fs/exec.c | 9 +++++++++ include/linux/sched/signal.h | 9 ++++++++- init/init_task.c | 1 + kernel/fork.c | 1 + 4 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/fs/exec.c b/fs/exec.c index d820a7272a76..ffeebb1f167b 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1014,6 +1014,7 @@ static int exec_mmap(struct mm_struct *mm) { struct task_struct *tsk; struct mm_struct *old_mm, *active_mm;
- int ret;
/* Notify parent that we're no longer interested in the old VM */ tsk = current; @@ -1034,6 +1035,11 @@ static int exec_mmap(struct mm_struct *mm) return -EINTR; } }
- ret = mutex_lock_killable(&tsk->signal->exec_update_mutex);
- if (ret)
return ret;
You missed old_mm->mmap_sem unlock. See here:
Duh. Thank you.
I actually need to switch the lock ordering here, and I haven't yet because my son was sick yesterday.
There is some fundamental problem with your patch, since the below fires in 100% cases on current linux-next:
Thank you.
I have just backed this out of linux-next for now because it is clearly flawed.
You make some good points about the recursion. I will go back to the drawing board and see what I can work out.
[ 22.838717] kernel BUG at fs/exec.c:1474!
diff --git a/fs/exec.c b/fs/exec.c index 47582cd97f86..0f77f8c94905 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1470,8 +1470,10 @@ static void free_bprm(struct linux_binprm *bprm) { free_arg_pages(bprm); if (bprm->cred) {
if (!bprm->mm)
if (!bprm->mm) {
BUG_ON(!mutex_is_locked(¤t->signal->exec_update_mutex)); mutex_unlock(¤t->signal->exec_update_mutex);
mutex_unlock(¤t->signal->cred_guard_mutex); abort_creds(bprm->cred); }}
@@ -1521,6 +1523,7 @@ void install_exec_creds(struct linux_binprm *bprm) * credentials; any time after this it may be unlocked. */ security_bprm_committed_creds(bprm);
- BUG_ON(!mutex_is_locked(¤t->signal->exec_update_mutex)); mutex_unlock(¤t->signal->exec_update_mutex); mutex_unlock(¤t->signal->cred_guard_mutex);
}
First time the mutex is unlocked in:
exec_binprm()->search_binary_handler()->.load_binary->install_exec_creds()
Then exec_binprm()->search_binary_handler()->.load_binary->flush_old_exec() clears mm:
bprm->mm = NULL;
Second time the mutex is unlocked in free_bprm():
if (bprm->cred) { if (!bprm->mm) mutex_unlock(¤t->signal->exec_update_mutex);
My opinion is we should not relay on side indicators like bprm->mm. Better you may introduce struct linux_binprm::exec_update_mutex_is_locked. So the next person dealing with this after you won't waste much time on diving into this. Also, if someone decides to change the place, where bprm->mm is set into NULL, this person will bump into hell of dependences between unrelated components like your newly introduced mutex.
So, I'm strongly for *struct linux_binprm::exec_update_mutex_is_locked*, since this improves modularity.
Am I wrong or is that also a problem with cred_guard_mutex?
No, there is no a problem.
cred_guard_mutex is locked in a pair with bprm->cred = prepare_exec_creds() assignment.
cred_guard_mutex is unlocked in a pair with bprm->cred = NULL clearing (see install_exec_creds()). Further free_bprm() skip unlock in case of bprm->cred is NULL.