On Mon, Dec 9, 2024 at 2:14 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
On Mon, Dec 9, 2024 at 10:22 AM Jann Horn <jannh@google.com> wrote:
On Sat, Dec 7, 2024 at 12:15 AM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
On Fri, Dec 6, 2024 at 3:14 PM Jann Horn <jannh@google.com> wrote:
On Fri, Dec 6, 2024 at 11:43 PM Jann Horn <jannh@google.com> wrote:
On Fri, Dec 6, 2024 at 11:30 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
On Fri, Dec 6, 2024 at 2:25 PM Jann Horn <jannh@google.com> wrote:
>
> On Fri, Dec 6, 2024 at 11:15 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> > On Fri, Dec 6, 2024 at 12:45 PM Jann Horn <jannh@google.com> wrote:
> > >
> > > Currently, the pointer stored in call->prog_array is loaded in
> > > __uprobe_perf_func(), with no RCU annotation and no RCU protection, so the
> > > loaded pointer can immediately be dangling. Later,
> > > bpf_prog_run_array_uprobe() starts a RCU-trace read-side critical section,
> > > but this is too late. It then uses rcu_dereference_check(), but this use of
> > > rcu_dereference_check() does not actually dereference anything.
> > >
> > > It looks like the intention was to pass a pointer to the member
> > > call->prog_array into bpf_prog_run_array_uprobe() and actually dereference
> > > the pointer in there. Fix the issue by actually doing that.
> > >
> > > Fixes: 8c7dcb84e3b7 ("bpf: implement sleepable uprobes by chaining gps")
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Jann Horn <jannh@google.com>
> > > ---
> > > To reproduce, in include/linux/bpf.h, patch in a mdelay(10000) directly
> > > before the might_fault() in bpf_prog_run_array_uprobe() and add an
> > > include of linux/delay.h.
> > > Build this userspace program:
> > >
> > > ```
> > > $ cat dummy.c
> > > #include <stdio.h>
> > > int main(void) {
> > > 	printf("hello world\n");
> > > }
> > > $ gcc -o dummy dummy.c
> > > ```
> > >
> > > Then build this BPF program and load it (change the path to point to
> > > the "dummy" binary you built):
> > >
> > > ```
> > > $ cat bpf-uprobe-kern.c
> > > #include <linux/bpf.h>
> > > #include <bpf/bpf_helpers.h>
> > > #include <bpf/bpf_tracing.h>
> > > char _license[] SEC("license") = "GPL";
> > >
> > > SEC("uprobe//home/user/bpf-uprobe-uaf/dummy:main")
> > > int BPF_UPROBE(main_uprobe) {
> > > 	bpf_printk("main uprobe triggered\n");
> > > 	return 0;
> > > }
> > > $ clang -O2 -g -target bpf -c -o bpf-uprobe-kern.o bpf-uprobe-kern.c
> > > $ sudo bpftool prog loadall bpf-uprobe-kern.o uprobe-test autoattach
> > > ```
> > >
> > > Then run ./dummy in one terminal, and after launching it, run
> > > `sudo umount uprobe-test` in another terminal. Once the 10-second
> > > mdelay() is over, a use-after-free should occur, which may or may
> > > not crash your kernel at the `prog->sleepable` check in
> > > bpf_prog_run_array_uprobe() depending on your luck.
> > > ---
> > > Changes in v2:
> > > - remove diff chunk in patch notes that confuses git
> > > - Link to v1: https://lore.kernel.org/r/20241206-bpf-fix-uprobe-uaf-v1-1-6869c8a17258@goog...
> > > ---
> > >  include/linux/bpf.h         | 4 ++--
> > >  kernel/trace/trace_uprobe.c | 2 +-
> > >  2 files changed, 3 insertions(+), 3 deletions(-)
> > >
> >
> > Looking at how similar in spirit bpf_prog_run_array() is meant to be
> > used, it seems like it is the caller's responsibility to
> > RCU-dereference array and keep RCU critical section before calling
> > into bpf_prog_run_array(). So I wonder if it's best to do this instead
> > (Gmail will butcher the diff, but it's about the idea):
>
> Yeah, that's the other option I was considering. That would be more
> consistent with the existing bpf_prog_run_array(), but has the
> downside of unnecessarily pushing responsibility up to the caller...
> I'm fine with either.
There is really just one caller (the "legacy" singular uprobe handler), so I think this should be fine. Unless someone objects, I'd keep it consistent with the other "prog_array_run" helpers.
Ack, I will make it consistent.
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index eaee2a819f4c..4b8a9edd3727 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -2193,26 +2193,25 @@ bpf_prog_run_array(const struct bpf_prog_array *array,
> >   * rcu-protected dynamically sized maps.
> >   */
> >  static __always_inline u32
> > -bpf_prog_run_array_uprobe(const struct bpf_prog_array __rcu *array_rcu,
> > +bpf_prog_run_array_uprobe(const struct bpf_prog_array *array,
> >  			  const void *ctx, bpf_prog_run_fn run_prog)
> >  {
> >  	const struct bpf_prog_array_item *item;
> >  	const struct bpf_prog *prog;
> > -	const struct bpf_prog_array *array;
> >  	struct bpf_run_ctx *old_run_ctx;
> >  	struct bpf_trace_run_ctx run_ctx;
> >  	u32 ret = 1;
> >
> >  	might_fault();
> > +	RCU_LOCKDEP_WARN(!rcu_read_lock_trace_held(), "no rcu lock held");
> > +
> > +	if (unlikely(!array))
> > +		goto out;
> >
> > -	rcu_read_lock_trace();
> >  	migrate_disable();
> >
> >  	run_ctx.is_uprobe = true;
> >
> > -	array = rcu_dereference_check(array_rcu, rcu_read_lock_trace_held());
> > -	if (unlikely(!array))
> > -		goto out;
> >  	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> >  	item = &array->items[0];
> >  	while ((prog = READ_ONCE(item->prog))) {
> > @@ -2229,7 +2228,6 @@ bpf_prog_run_array_uprobe(const struct bpf_prog_array __rcu *array_rcu,
> >  	bpf_reset_run_ctx(old_run_ctx);
> >  out:
> >  	migrate_enable();
> > -	rcu_read_unlock_trace();
> >  	return ret;
> >  }
> >
> > diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> > index fed382b7881b..87a2b8fefa90 100644
> > --- a/kernel/trace/trace_uprobe.c
> > +++ b/kernel/trace/trace_uprobe.c
> > @@ -1404,7 +1404,9 @@ static void __uprobe_perf_func(struct trace_uprobe *tu,
> >  	if (bpf_prog_array_valid(call)) {
> >  		u32 ret;
> >
> > +		rcu_read_lock_trace();
> >  		ret = bpf_prog_run_array_uprobe(call->prog_array,
> >  						regs, bpf_prog_run);
>
> But then this should be something like this (possibly split across
> multiple lines with a helper variable or such):
>
>	ret = bpf_prog_run_array_uprobe(rcu_dereference_check(call->prog_array,
>							      rcu_read_lock_trace_held()),
>					regs, bpf_prog_run);
Yeah, absolutely, I forgot to move the RCU dereference part, good catch! But I wouldn't use the _check() variant here: the previous line literally does rcu_read_lock_trace(), so the check part seems like unnecessary verbosity. I'd go with a plain rcu_dereference().
rcu_dereference() is not legal there: it asserts that we are in a normal RCU read-side critical section, which we are not. rcu_dereference_raw() would be legal, but I think it is nice to document the semantics and make explicit which lock we are operating under.
sure, I don't mind
I'll send a v3 in a bit after testing it.
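[For reference, putting the two pieces discussed above together, the caller side in __uprobe_perf_func() would look roughly like this. This is only a sketch of the agreed direction, not the actual v3 patch; surrounding code is abbreviated and the exact unlock placement is assumed:]

```c
	if (bpf_prog_array_valid(call)) {
		u32 ret;

		/* Enter the tasks-trace read-side section *before* loading
		 * call->prog_array, and document that lock in the deref. */
		rcu_read_lock_trace();
		ret = bpf_prog_run_array_uprobe(
				rcu_dereference_check(call->prog_array,
						      rcu_read_lock_trace_held()),
				regs, bpf_prog_run);
		rcu_read_unlock_trace();
	}
```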
Actually, now I'm still hitting a page fault with my WIP v3 fix applied... I'll probably poke at this some more next week.
OK, that's interesting, keep us posted!
If I replace the "uprobe/" in my reproducer with "uprobe.s/", the reproducer stops crashing even on bpf/master without this fix, because handle_swbp() happens to already hold an rcu_read_lock_trace() lock way up the stack. So I think this fix should still be applied, but it probably doesn't need to go into stable unless there is another path to the buggy code that doesn't come from handle_swbp(). I guess I should resend my patch with an updated commit message pointing out this caveat?
The problem I'm actually hitting seems to be a use-after-free of a "struct bpf_prog" because of mismatching RCU flavors. Uprobes always use bpf_prog_run_array_uprobe() under tasks-trace-RCU protection. But it is possible to attach a non-sleepable BPF program to a uprobe, and non-sleepable BPF programs are freed via normal RCU (see __bpf_prog_put_noref()). And that is what happens with the reproducer from my initial post (https://lore.kernel.org/all/20241206-bpf-fix-uprobe-uaf-v1-1-6869c8a17258@go...): I can see that __bpf_prog_put_noref() runs with prog->sleepable == 0.
So I think that while I am delaying execution in bpf_prog_run_array_uprobe(), perf_event_detach_bpf_prog() NULLs out the event->tp_event->prog_array pointer and does bpf_prog_array_free_sleepable() followed by bpf_prog_put(), and then the BPF program can be freed since the reader doesn't hold an RCU read lock. This seems a bit annoying to fix - there could legitimately be several versions of the bpf_prog_array that are still used by tasks-trace-RCU readers, so I think we can't just NULL out the array entry and use RCU for the bpf_prog_array access on the reader side. I guess we could add another flag on BPF programs that answers "should this program be freed via tasks-trace-RCU" (separately from whether the program is sleepable)?
Yes, we shouldn't NULL anything out.
This is the same issue we've been solving recently for sleepable tracepoints, see [0] and other patches in the same patch set. We solved it for sleepable (aka "faultable") tracepoints, but uprobes have the same problem where the attachment point is sleepable by nature (and thus protected by RCU Tasks Trace), but BPF program itself is non-sleepable (and thus we only wait for synchronize_rcu() before freeing), which causes a disconnect.
We can easily fix this for BPF link-based uprobes, but legacy uprobes can be attached directly to a perf event, so that's a bit more cumbersome. Let me think about the best way to handle this.
Meanwhile, I agree, please send your original fix (with the changes we discussed); it's good to have it even if it doesn't fix your original issue. I'll CC you on the fixes once I have them.
Ok, weeding through the perf/uprobe plumbing for BPF, I think we avoid this problem with uprobe BPF link because uprobe_unregister_sync() waits for RCU Tasks Trace GP, and so once we finish uprobe unregistration, we have a guarantee that there is no more uprobe that might dereference our BPF program. (I might have thought about this problem when fixing BPF link for sleepable tracepoints, but I missed the legacy perf_event attach/detach case).
With the legacy perf event path, perf_event_detach_bpf_prog() doesn't do any of that; we just NULL out the pointer and do bpf_prog_put(), not caring whether this is a uprobe, kprobe, or tracepoint...
So one way to solve this is to teach perf_event_detach_bpf_prog() to delay bpf_prog_put() until after an RCU Tasks Trace GP (which is what we already do for the bpf_prog_array, but not for the program itself). Another is to add a prog->aux->sleepable_hook flag in addition to prog->aux->sleepable, and then inside bpf_prog_put() check (prog->aux->sleepable || prog->aux->sleepable_hook) and do the RCU Tasks Trace delay (in addition to the normal call_rcu()).
Third alternative would be to have something like bpf_prog_put_sleepable() (just like we have bpf_prog_array_free_sleepable()), where this would do additional call_rcu_tasks_trace() even if BPF program itself isn't sleepable.
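[In code, the flag-based variant might look roughly like this. This is a sketch only: the sleepable_hook field and the callback names are hypothetical, and the real callback would need to chain call_rcu_tasks_trace() into call_rcu() so that both grace periods elapse before the program is freed:]

```c
/* Hypothetical variant of the freeing decision in __bpf_prog_put_noref():
 * pick the freeing delay based on whether either the program itself or
 * its attachment point needs RCU Tasks Trace protection. */
static void __bpf_prog_put_deferred(struct bpf_prog *prog)
{
	if (prog->aux->sleepable || prog->aux->sleepable_hook)
		/* tasks-trace GP first; the callback chains into
		 * call_rcu() for the normal GP afterwards */
		call_rcu_tasks_trace(&prog->aux->rcu, __bpf_prog_put_tasks_trace);
	else
		call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
}
```

[The third alternative would keep this decision out of the common put path: a bpf_prog_put_sleepable() used by callers with sleepable hooks (like the legacy uprobe detach path) would unconditionally take the call_rcu_tasks_trace() route, regardless of prog->aux->sleepable.]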
Alexei, Daniel, any thoughts or preferences?
[0] https://lore.kernel.org/all/20241101181754.782341-1-andrii@kernel.org/