On 11/17, Bernd Edlinger wrote:
On 11/11/25 10:21, Christian Brauner wrote:
On Wed, Nov 05, 2025 at 03:32:10PM +0100, Oleg Nesterov wrote:
But this is minor. Why do we need "bool unsafe_execve_in_progress" ? If this patch is correct, de_thread() can drop/reacquire cred_guard_mutex unconditionally.
I would not like to drop the mutex when no absolutely necessary for performance reasons.
OK, I won't insist... But I don't really understand how this can help to improve the performance. If nothing else, this adds another for_other_threads() loop.
And again, the unsafe_execve_in_progress == T case is unlikely. I'm afraid this case (de_thread() without cred_guard_mutex) won't have enough testing.
In any case, why you dislike the suggestion to add this unsafe_execve_in_progress logic in a separate patch?
- if (unlikely(unsafe_execve_in_progress)) {
spin_unlock_irq(lock);sig->exec_bprm = bprm;mutex_unlock(&sig->cred_guard_mutex);spin_lock_irq(lock);I don't think spin_unlock_irq() + spin_lock_irq() makes any sense...
Since the spin lock was acquired while holding the mutex, both should be unlocked in reverse sequence and the spin lock re-acquired after releasing the mutex.
Why?
I'd expect the scheduler to do a task switch after the cred_guard_mutex is unlocked, at least in the RT-linux variant, while the spin lock is not yet unlocked.
I must have missed something, but I still don't understand why this would be wrong...
@@ -1114,13 +1139,31 @@ int begin_new_exec(struct linux_binprm * bprm) */ trace_sched_prepare_exec(current, bprm);
- /* If the binary is not readable then enforce mm->dumpable=0 */
- would_dump(bprm, bprm->file);
- if (bprm->have_execfd)
would_dump(bprm, bprm->executable);- /*
* Figure out dumpability. Note that this checking only of current* is wrong, but userspace depends on it. This should be testing* bprm->secureexec instead.*/- if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP ||
is_dumpability_changed(current_cred(), bprm->cred) ||!(uid_eq(current_euid(), current_uid()) &&gid_eq(current_egid(), current_gid())))set_dumpable(bprm->mm, suid_dumpable);- else
set_dumpable(bprm->mm, SUID_DUMP_USER);OK, we need to do this before de_thread() drops cred_guard_mutex. But imo this too should be done in a separate patch, the changelog should explain this change.
The dumpability need to be determined before de_thread, because ptrace_may_access needs this information to determine if the tracer is allowed to ptrace. That is part of the core of the patch, it would not work without that.
Yes,
I will add more comments to make that more easy to understand.
But again, why this change can't come in a separate patch? Before the patch which drops cred_guard_mutex in de_thread().
int lock_current_cgm(void) { if (mutex_lock_interruptible(¤t->signal->cred_guard_mutex)) return -ERESTARTNOINTR;
if (!current->signal->group_exec_task) return 0; WARN_ON(!fatal_signal_pending(current)); mutex_unlock(¤t->signal->cred_guard_mutex); return -ERESTARTNOINTR;}
?
Some use mutex_lock_interruptible and some use mutex_lock_killable here, so it wont work for all of them. I would not consider this a new kind of dead-lock free mutex, but just an open-coded state machine, handling the state that the tasks have whild de_thread is running.
OK. and we don't have mutex_lock_state(). I think that all users could use mutex_lock_killable(), but you are right anyway, and this is minor.
Note that it checks ->group_exec_task, not ->exec_bprm. So this change can come in a separate patch too, but I won't insist.
Yes. Although this is minor too ;)
This is the most problematic change which I can't review...
Firstly, it changes task->mm/real_cred for __ptrace_may_access() and this looks dangerous to me.
Yeah, that is not ok. This is effectively override_creds for real_cred and that is not a pattern I want to see us establish at all! Temporary credential overrides for the subjective credentials is already terrible but at least we have the explicit split between real_cred and cred expressely for that. So no, that's not an acceptable solution.
Okay I understand your point. I did this originally just to avoid to have to change the interface to all the security engines, but instead I could add a flag PTRACE_MODE_BPRMCREDS to the ptrace_may_access which must be handled in all security engines, to use child->signal->exec_bprm->creds instead of __task_cred(child).
Can't comment... I don't understand your idea, but this is my fault. I guess this needs more changes, in particular __ptrace_may_access_mm_cred(), but most probably I misunderstood your idea.
Or. check_unsafe_exec() sets LSM_UNSAFE_PTRACE if ptrace. Is it safe to ptrace the execing task after that? I have no idea what the security hooks can do...
That means the tracee is already ptraced before the execve, and SUID-bits do not work as usual, and are more or less ignored. But in this patch the tracee is not yet ptraced.
Well. I meant that if LSM_UNSAFE_PTRACE is not set, then currently (say) security_bprm_committing_creds() has all rights to assume that the execing task is not ptraced. Yes, I don't see any potential problem right now, but still.
And just in case... Lets look at this code
+ rcu_assign_pointer(task->real_cred, bprm->cred); + task->mm = bprm->mm; + retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS); + rcu_assign_pointer(task->real_cred, old_cred); + task->mm = old_mm;
again.
This is mostly theoretical, but what if begin_new_exec() fails after de_thread() and before exec_mmap() and/or commit_creds(bprm->cred) ? In this case the execing thread will report SIGSEGV to debugger which can (say) read old_mm.
No?
I am starting to think that ptrace_attach() should simply fail with -EWOULDBLOCK if it detects "unsafe_execve_in_progress" ... And perhaps this is what you already tried to do in the past, I can't recall :/
Oleg.