Re: [PATCH] seccomp: passthrough uretprobe systemcall without filtering

21 Jan 2025


On Tue, Jan 21, 2025 at 8:55 AM Jiri Olsa olsajiri@gmail.com wrote:
...
On Tue, Jan 21, 2025 at 11:16:31AM -0500, Steven Rostedt wrote:
...
[ Watching this with popcorn from the sidelines, but I'll chime in anyway ]
On Tue, 21 Jan 2025 15:38:48 +0100
Jiri Olsa olsajiri@gmail.com wrote:
...
I'm still trying to come up with some other solution but wanted
to exhaust all the options I could think of
I think this may have been mentioned, but is there a way that the kernel
could know that this system call is being monitored by seccomp, and if so,
just stick with the interrupt version? If not, enable the system call?
yes [1], the problem with that solution is that we install the uretprobe
trampoline when the function's entry uprobe fires, so we won't catch the
case where seccomp is enabled inside the probed function, like:
foo
    uprobe -> install uretprobe trampoline
    ...
    seccomp(SECCOMP_MODE_STRICT..
    ...
    ret -> execute uretprobe trampoline with sys_uretprobe
I thought we could perhaps switch existing uretprobe trampoline to
int3 when we are in sys_seccomp, but another user thread might be
already executing the existing uretprobe trampoline, so I don't
think we can do that
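
The ordering problem in the foo sequence above can be replayed with a small toy model. This is only a sketch for illustration: every name in it (task_model, on_uprobe_entry, replay_foo, and so on) is hypothetical and does not correspond to kernel code. It assumes the kernel would check the seccomp state once, at entry-uprobe time, and pick the trampoline flavor from that check.

```c
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
enum tramp_kind { TRAMP_INT3, TRAMP_SYSCALL };

struct task_model {
    bool seccomp_blocks_uretprobe; /* current seccomp state of the task */
    enum tramp_kind tramp;         /* trampoline flavor chosen at entry */
};

/* Entry uprobe fires: check the seccomp state once and pick a trampoline. */
static void on_uprobe_entry(struct task_model *t)
{
    t->tramp = t->seccomp_blocks_uretprobe ? TRAMP_INT3 : TRAMP_SYSCALL;
}

/* The probed function enables seccomp after the trampoline was chosen. */
static void enable_seccomp(struct task_model *t)
{
    t->seccomp_blocks_uretprobe = true;
}

/* Returns true if the return path works, false if the syscall is blocked. */
static bool on_function_return(const struct task_model *t)
{
    return !(t->tramp == TRAMP_SYSCALL && t->seccomp_blocks_uretprobe);
}

/* Replays foo: entry probe, optional seccomp call mid-function, return. */
static bool replay_foo(bool seccomp_inside_foo)
{
    struct task_model t = {
        .seccomp_blocks_uretprobe = false,
        .tramp = TRAMP_INT3,
    };

    on_uprobe_entry(&t);
    if (seccomp_inside_foo)
        enable_seccomp(&t);
    return on_function_return(&t);
}
```

In this model replay_foo(true) returns false: the check at entry saw no seccomp, so the syscall trampoline was chosen, and the decision is stale by the time the function returns.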
Jiri,
We should abandon the vector of "let's try to detect whether someone
is blocking sys_uretprobe" as a solution, I don't believe it's
possible. Blocking sys_uretprobe is too dynamic of a thing. There is
an arbitrary period of time between adding the uretprobe trampoline
(i.e., sys_uretprobe) and actually disabling sys_uretprobe through
seccomp (or even BPF: LSM or even kprobes can do that, why not?), and
userspace can flip this decision many times over.
And as Oleg said, sysctl
"please-make-my-uretprobe-2x-faster-assuming-i-know-about-this-option"
makes no sense either, this will basically almost never get enabled.
Kees,
You said yourself that sys_uretprobe is no different from rt_sigreturn
and restart_syscall, so why would we roll back sys_uretprobe if we
wouldn't roll back rt_sigreturn/restart_syscall? Given it's impossible,
generally speaking, to know if userspace is blocking the syscall (and
that can change dynamically and very frequently), any improvement or
optimization that the kernel would do with the help of a special syscall
is now effectively prohibited. It doesn't seem wise to restrict
kernel development so much just because libseccomp blocks any unknown
syscall by default.
I'm OK with either asking libseccomp to learn about sys_uretprobe and not
block it (like systemd is doing), or, if we want to bend over
backwards, preventing user policy from filtering these special syscalls
which are meant to be used by the kernel only. We can't single out
sys_uretprobe just because it's the newest of this special cohort.
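
For readers unfamiliar with what "not blocking it" looks like at the filter level, here is a minimal sketch of a classic-BPF seccomp filter that lets the uretprobe syscall through on x86_64 and rejects everything else. It is not libseccomp's or systemd's actual policy: the array name, the URETPROBE_NR_X86_64 macro, and the ENOSYS default action are assumptions made for illustration. The value 335 is the x86_64 syscall number for uretprobe.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Hypothetical macro name; 335 is the uretprobe syscall number on x86_64. */
#define URETPROBE_NR_X86_64 335

/* Illustrative filter: allow uretprobe, fail every other syscall. */
static const struct sock_filter allow_uretprobe_filter[] = {
    /* [0] Load the architecture of the calling task. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    /* [1] Not x86_64: skip ahead to the default action at [5]. */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 0, 3),
    /* [2] Load the syscall number. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* [3] uretprobe falls through to [4]; anything else jumps to [5]. */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, URETPROBE_NR_X86_64, 0, 1),
    /* [4] Let the kernel-installed trampoline do its syscall. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    /* [5] Assumed default action for this sketch: fail with ENOSYS. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (ENOSYS & SECCOMP_RET_DATA)),
};
```

A real policy would have many more allow rules ahead of the default action; the point is only that the uretprobe entry has to be present before the catch-all.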
You also asked "what if userspace wants to block uprobes"? If that's
really the goal, that would be done at uprobe attachment time, not
when uprobe is (conceptually) attached, new process is forked, and
kernel installs uretprobe trampoline with uretprobe syscall. Or just
control that through (lack of) capabilities. Using seccomp to block
*second part of uretprobe handling* doesn't make much sense. It's just
the wrong place for that.
P.S. Also using FRED as an excuse for not doing sys_uretprobe is
manipulative. When we get FRED-enabled CPUs widely available and
deployed *and* all (or at least majority of) the currently used CPUs
are decommissioned, only then can we realistically talk about
sys_uretprobe being unnecessary. That's years and years. sys_uretprobe
is necessary and important *right now* and will be for the foreseeable
future.
...
jirka
[1] https://lore.kernel.org/bpf/20250114123257.GD19816@redhat.com/
