Hi,
On Sat, Jan 18, 2025 at 12:21 PM Kees Cook <kees@kernel.org> wrote:
On Thu, Jan 16, 2025 at 04:55:39PM -0800, Eyal Birger wrote:
Since uretprobe is a "kernel implementation detail" system call which is not used by userspace application code directly, it is impractical, and there's very little point, to force all userspace applications to explicitly allow it in order to avoid crashing tracked processes.
How is this any different from sigreturn, rt_sigreturn, or restart_syscall? These are all handled explicitly by userspace filters already, and I don't see why uretprobe should be any different. Docker has had plenty of experience with fixing their seccomp filters for new syscalls. For example, many times already a given libc will suddenly start using a new syscall when it sees it's available, etc.
I think the difference is that this syscall is not part of the process's code - it is inserted there by another process tracing it. So this is different from wanting to deploy a new version of a binary that uses a new libc or a new syscall. Here there are three players - the tracer running outside of Docker, the tracee running in Docker, and Docker itself. All three were running fine on a specific kernel version, but upgrading the kernel now crashes the traced process.
Basically, this is a Docker issue, not a kernel issue.
As mentioned above, for all three given binaries, nothing changed - only the kernel version.
Seccomp is behaving correctly. I don't want to start making syscalls invisible without an extremely good reason. If _anything_ should be invisible, it is restart_syscall (which actually IS invisible under certain architectures).
I think this syscall is different in that respect for the reasons described. I don't know if seccomp is behaving correctly when it blocks a kernel implementation detail that isn't user created. IMHO the fact that this implementation detail is implemented as a syscall is unfortunate, and I'm trying to mitigate the result.
Eyal.
-Kees
-- Kees Cook
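
The failure mode debated in this thread can be illustrated with a toy model. This is only a sketch under stated assumptions: it is plain Python, not real seccomp or BPF, and the profile contents, syscall traces, and the helper names make_filter and run are all hypothetical. It assumes an allowlist-style filter that kills the process on any syscall it does not list, and it models the kernel issuing uretprobe in the tracee's context once an outside tracer attaches a return probe.

```python
# Toy model of an allowlist-style seccomp profile. No kernel calls are made;
# syscall names stand in for syscall numbers. All names here are illustrative.

ALLOW = "allow"
KILL = "kill"


def make_filter(allowed):
    """Return a check function: ALLOW for listed syscalls, KILL otherwise."""
    allowed = frozenset(allowed)

    def check(syscall):
        return ALLOW if syscall in allowed else KILL

    return check


def run(trace, check):
    """Apply the filter to each syscall in order; stop at the first KILL."""
    for name in trace:
        if check(name) == KILL:
            return ("killed", name)
    return ("ok", None)


# Hypothetical profile written before the uretprobe syscall existed.
# Note that rt_sigreturn and restart_syscall are listed explicitly.
old_profile = ["read", "write", "exit_group", "rt_sigreturn", "restart_syscall"]

# Syscalls the unmodified application issues on its own.
app_trace = ["read", "write", "exit_group"]

# Same binary, same profile, but an outside tracer has attached a return
# probe and the (assumed newer) kernel issues uretprobe in the tracee's context.
traced_trace = ["read", "uretprobe", "write", "exit_group"]

if __name__ == "__main__":
    print(run(app_trace, make_filter(old_profile)))
    print(run(traced_trace, make_filter(old_profile)))
    print(run(traced_trace, make_filter(old_profile + ["uretprobe"])))
```

In this model the untraced application passes, the traced one is killed at uretprobe even though neither the binary nor the profile changed, and adding uretprobe to the profile (the approach Kees argues for) lets it pass again. Whether the profile or the kernel should absorb that change is exactly the point under discussion.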