On Tue, Jan 21, 2025 at 8:55 AM Jiri Olsa <olsajiri@gmail.com> wrote:
On Tue, Jan 21, 2025 at 11:16:31AM -0500, Steven Rostedt wrote:
[ Watching this with popcorn from the sidelines, but I'll chime in anyway ]
On Tue, 21 Jan 2025 15:38:48 +0100 Jiri Olsa <olsajiri@gmail.com> wrote:
I'm still trying to come up with some other solution but wanted to exhaust all the options I could think of.
I think this may have been mentioned, but is there a way that the kernel could know that this system call is being monitored by seccomp, and if so, just stick with the interrupt version? If not, enable the system call?
yes [1], the problem with that solution is that we install the uretprobe trampoline when the function's entry uprobe fires, so we won't catch the case where seccomp is enabled inside the probed function, like:
  foo
    uprobe -> install uretprobe trampoline
    ...
    seccomp(SECCOMP_MODE_STRICT..
    ...
    ret -> execute uretprobe trampoline with sys_uretprobe
I thought we could perhaps switch the existing uretprobe trampoline to int3 when we are in sys_seccomp, but another user thread might already be executing the existing uretprobe trampoline, so I don't think we can do that.
Jiri,
We should abandon the vector of "let's try to detect whether someone is blocking sys_uretprobe" as a solution; I don't believe it's possible. Blocking sys_uretprobe is too dynamic of a thing. There is an arbitrary period of time between adding the uretprobe trampoline (i.e., sys_uretprobe) and actually disabling sys_uretprobe through seccomp (or even BPF: LSM or even kprobes can do that, why not?), and userspace can flip this decision many times over.
And as Oleg said, a sysctl "please-make-my-uretprobe-2x-faster-assuming-i-know-about-this-option" makes no sense either; it will basically almost never get enabled.
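To make the timing problem concrete, here is a toy model, not kernel code. It assumes a hypothetical detection scheme that picks the trampoline flavor ("syscall" or "int3") from the seccomp state visible when the entry uprobe fires. All names (`Task`, `on_function_entry`, `run`, and so on) are made up for illustration, and "blocked" stands in for whatever the policy does to the task.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    """Toy stand-in for a traced task; not a kernel structure."""
    uretprobe_syscall_blocked: bool = False  # current seccomp state
    trampoline: Optional[str] = None         # "syscall" or "int3"


def on_function_entry(task: Task) -> None:
    # Hypothetical detection: choose the trampoline flavor from the
    # seccomp state visible at the moment the entry uprobe fires.
    task.trampoline = "int3" if task.uretprobe_syscall_blocked else "syscall"


def enable_seccomp(task: Task) -> None:
    # The probed function (or anything else) blocks the syscall later.
    task.uretprobe_syscall_blocked = True


def on_function_return(task: Task) -> str:
    # The trampoline chosen at entry runs now, against the current state.
    if task.trampoline == "syscall" and task.uretprobe_syscall_blocked:
        return "blocked"
    return "ok"


def run(seccomp_before_entry: bool, seccomp_inside_function: bool) -> str:
    """Play out one probed call of foo() and report what happens at return."""
    task = Task()
    if seccomp_before_entry:
        enable_seccomp(task)
    on_function_entry(task)
    if seccomp_inside_function:
        enable_seccomp(task)
    return on_function_return(task)
```

In this model `run(True, False)` is fine because detection sees the filter in time, while `run(False, True)` ends up blocked: the decision was made before the policy changed, which is the window described above.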
Kees,
You said yourself that sys_uretprobe is no different from rt_sigreturn and restart_syscall, so why would we roll back sys_uretprobe if we wouldn't roll back rt_sigreturn/restart_syscall? Given it's impossible, generally speaking, to know if userspace is blocking the syscall (and that can change dynamically and very frequently), any improvement or optimization that the kernel would do with the help of a special syscall is now effectively prohibited. It doesn't seem wise to restrict kernel development so much just because libseccomp blocks any unknown syscall by default.
I'm OK with either asking libseccomp to learn about sys_uretprobe and not block it (like systemd is doing), or, if we want to bend over backwards, preventing user policy from filtering these special syscalls, which are meant to be used by the kernel only. We can't single out sys_uretprobe just because it's the newest of this special cohort.
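A toy sketch of the second option, again a model and not kernel or libseccomp code: an allowlist policy written before sys_uretprobe existed denies it by default, unless the evaluation step exempts the kernel-only cohort. The set contents come from this thread; `filter_decision` and `exempt_kernel_only` are hypothetical names.

```python
# Syscalls this thread describes as issued on the kernel's behalf.
# The list is taken from the discussion, not from kernel source.
KERNEL_ONLY_SYSCALLS = frozenset({"rt_sigreturn", "restart_syscall", "uretprobe"})


def filter_decision(allowlist, syscall, exempt_kernel_only):
    """Return "allow" or "deny" for one syscall under an allowlist policy."""
    if exempt_kernel_only and syscall in KERNEL_ONLY_SYSCALLS:
        # Hypothetical behavior: user policy never sees these syscalls.
        return "allow"
    # Default-deny: anything the policy author did not list is blocked.
    return "allow" if syscall in allowlist else "deny"


# A policy written before uretprobe existed: it cannot know the new name.
old_policy = {"read", "write", "rt_sigreturn", "restart_syscall"}
```

With `old_policy`, `filter_decision(old_policy, "uretprobe", False)` gives "deny", while the same call with `exempt_kernel_only=True` gives "allow"; ordinary syscalls missing from the list stay denied either way.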
You also asked "what if userspace wants to block uprobes"? If that's really the goal, that would be done at uprobe attachment time, not when the uprobe is (conceptually) attached, a new process is forked, and the kernel installs the uretprobe trampoline with the uretprobe syscall. Or just control that through (lack of) capabilities. Using seccomp to block the *second part of uretprobe handling* doesn't make much sense. It's just the wrong place for that.
P.S. Also, using FRED as an excuse for not doing sys_uretprobe is manipulative. When we get FRED-enabled CPUs widely available and deployed *and* all (or at least the majority of) the currently used CPUs are decommissioned, only then can we realistically talk about sys_uretprobe being unnecessary. That's years and years. sys_uretprobe is necessary and important *right now* and will be for the foreseeable future.
jirka
[1] https://lore.kernel.org/bpf/20250114123257.GD19816@redhat.com/