Currently, the pointer stored in call->prog_array is loaded in __uprobe_perf_func(), with no RCU annotation and no RCU protection, so the loaded pointer can immediately be dangling. Later, bpf_prog_run_array_uprobe() starts an RCU-trace read-side critical section, but this is too late. It then uses rcu_dereference_check(), but that call does not actually dereference anything.
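To illustrate, the broken flow versus the fixed flow looks roughly like this (simplified sketch, not the literal kernel code):

```c
/* Simplified sketch, not the literal kernel code. */

/* Before (buggy): the caller passes the __rcu pointer BY VALUE, so it is
 * loaded outside any RCU-trace read-side critical section: */
ret = bpf_prog_run_array_uprobe(call->prog_array, regs, bpf_prog_run);
	/* ...and inside the helper, protection only starts afterwards: */
	rcu_read_lock_trace();	/* too late: the array may already be freed */
	array = rcu_dereference_check(array_rcu, rcu_read_lock_trace_held());

/* After (this patch): pass a pointer to the member, and do the load
 * inside the critical section: */
ret = bpf_prog_run_array_uprobe(&call->prog_array, regs, bpf_prog_run);
	/* ...inside the helper: */
	rcu_read_lock_trace();
	array = rcu_dereference_check(*array_rcu, rcu_read_lock_trace_held());
```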
It looks like the intention was to pass a pointer to the member call->prog_array into bpf_prog_run_array_uprobe() and actually dereference the pointer in there. Fix the issue by actually doing that.
Fixes: 8c7dcb84e3b7 ("bpf: implement sleepable uprobes by chaining gps")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
---
To reproduce, in include/linux/bpf.h, patch in an mdelay(10000) directly before the might_fault() in bpf_prog_run_array_uprobe() and add an include of linux/delay.h.
Build this userspace program:
```
$ cat dummy.c
#include <stdio.h>

int main(void)
{
	printf("hello world\n");
}

$ gcc -o dummy dummy.c
```
Then build this BPF program and load it (change the path to point to the "dummy" binary you built):
```
$ cat bpf-uprobe-kern.c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

SEC("uprobe//home/user/bpf-uprobe-uaf/dummy:main")
int BPF_UPROBE(main_uprobe)
{
	bpf_printk("main uprobe triggered\n");
	return 0;
}

$ clang -O2 -g -target bpf -c -o bpf-uprobe-kern.o bpf-uprobe-kern.c
$ sudo bpftool prog loadall bpf-uprobe-kern.o uprobe-test autoattach
```
Then run ./dummy in one terminal, and after launching it, run `sudo umount uprobe-test` in another terminal. Once the 10-second mdelay() is over, a use-after-free should occur, which may or may not crash your kernel at the `prog->sleepable` check in bpf_prog_run_array_uprobe(), depending on your luck.
---
Changes in v2:
- remove diff chunk in patch notes that confuses git
- Link to v1: https://lore.kernel.org/r/20241206-bpf-fix-uprobe-uaf-v1-1-6869c8a17258@goog...
---
 include/linux/bpf.h         | 4 ++--
 kernel/trace/trace_uprobe.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index eaee2a819f4c150a34a7b1075584711609682e4c..00b3c5b197df75a0386233b9885b480b2ce72f5f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2193,7 +2193,7 @@ bpf_prog_run_array(const struct bpf_prog_array *array,
  * rcu-protected dynamically sized maps.
  */
 static __always_inline u32
-bpf_prog_run_array_uprobe(const struct bpf_prog_array __rcu *array_rcu,
+bpf_prog_run_array_uprobe(struct bpf_prog_array __rcu **array_rcu,
 			  const void *ctx, bpf_prog_run_fn run_prog)
 {
 	const struct bpf_prog_array_item *item;
@@ -2210,7 +2210,7 @@ bpf_prog_run_array_uprobe(const struct bpf_prog_array __rcu *array_rcu,
 
 	run_ctx.is_uprobe = true;
 
-	array = rcu_dereference_check(array_rcu, rcu_read_lock_trace_held());
+	array = rcu_dereference_check(*array_rcu, rcu_read_lock_trace_held());
 	if (unlikely(!array))
 		goto out;
 	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index fed382b7881b82ee3c334ea77860cce77581a74d..c4eef1eb5ddb3c65205aa9d64af1c72d62fab87f 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -1404,7 +1404,7 @@ static void __uprobe_perf_func(struct trace_uprobe *tu,
 	if (bpf_prog_array_valid(call)) {
 		u32 ret;
 
-		ret = bpf_prog_run_array_uprobe(call->prog_array, regs, bpf_prog_run);
+		ret = bpf_prog_run_array_uprobe(&call->prog_array, regs, bpf_prog_run);
 		if (!ret)
 			return;
 	}

---
base-commit: 509df676c2d79c985ec2eaa3e3a3bbe557645861
change-id: 20241206-bpf-fix-uprobe-uaf-53d928bab3d0
On Fri, Dec 6, 2024 at 12:45 PM Jann Horn jannh@google.com wrote:
[...]
Looking at how similar in spirit bpf_prog_run_array() is meant to be used, it seems like it is the caller's responsibility to RCU-dereference array and keep RCU critical section before calling into bpf_prog_run_array(). So I wonder if it's best to do this instead (Gmail will butcher the diff, but it's about the idea):
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index eaee2a819f4c..4b8a9edd3727 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2193,26 +2193,25 @@ bpf_prog_run_array(const struct bpf_prog_array *array,
  * rcu-protected dynamically sized maps.
  */
 static __always_inline u32
-bpf_prog_run_array_uprobe(const struct bpf_prog_array __rcu *array_rcu,
+bpf_prog_run_array_uprobe(const struct bpf_prog_array *array,
 			  const void *ctx, bpf_prog_run_fn run_prog)
 {
 	const struct bpf_prog_array_item *item;
 	const struct bpf_prog *prog;
-	const struct bpf_prog_array *array;
 	struct bpf_run_ctx *old_run_ctx;
 	struct bpf_trace_run_ctx run_ctx;
 	u32 ret = 1;
 
 	might_fault();
+	RCU_LOCKDEP_WARN(!rcu_read_lock_trace_held(), "no rcu lock held");
+
+	if (unlikely(!array))
+		goto out;
 
-	rcu_read_lock_trace();
 	migrate_disable();
 
 	run_ctx.is_uprobe = true;
 
-	array = rcu_dereference_check(array_rcu, rcu_read_lock_trace_held());
-	if (unlikely(!array))
-		goto out;
 	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
 	item = &array->items[0];
 	while ((prog = READ_ONCE(item->prog))) {
@@ -2229,7 +2228,6 @@ bpf_prog_run_array_uprobe(const struct bpf_prog_array __rcu *array_rcu,
 	bpf_reset_run_ctx(old_run_ctx);
 out:
 	migrate_enable();
-	rcu_read_unlock_trace();
 	return ret;
 }
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index fed382b7881b..87a2b8fefa90 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -1404,7 +1404,9 @@ static void __uprobe_perf_func(struct trace_uprobe *tu,
 	if (bpf_prog_array_valid(call)) {
 		u32 ret;
 
+		rcu_read_lock_trace();
 		ret = bpf_prog_run_array_uprobe(call->prog_array, regs, bpf_prog_run);
+		rcu_read_unlock_trace();
 		if (!ret)
 			return;
 	}
[...]
-- Jann Horn jannh@google.com
On Fri, Dec 6, 2024 at 11:15 PM Andrii Nakryiko andrii.nakryiko@gmail.com wrote:
[...]
Looking at how similar in spirit bpf_prog_run_array() is meant to be used, it seems like it is the caller's responsibility to RCU-dereference array and keep RCU critical section before calling into bpf_prog_run_array(). So I wonder if it's best to do this instead (Gmail will butcher the diff, but it's about the idea):
Yeah, that's the other option I was considering. That would be more consistent with the existing bpf_prog_run_array(), but has the downside of unnecessarily pushing responsibility up to the caller... I'm fine with either.
[...]
But then this should be something like this (possibly split across multiple lines with a helper variable or such):
	ret = bpf_prog_run_array_uprobe(rcu_dereference_check(call->prog_array,
							      rcu_read_lock_trace_held()),
					regs, bpf_prog_run);
On Fri, Dec 6, 2024 at 2:25 PM Jann Horn jannh@google.com wrote:
[...]
Yeah, that's the other option I was considering. That would be more consistent with the existing bpf_prog_run_array(), but has the downside of unnecessarily pushing responsibility up to the caller... I'm fine with either.
there is really just one caller ("legacy" singular uprobe handler), so I think this should be fine. Unless someone objects I'd keep it consistent with other "prog_array_run" helpers
[...]
But then this should be something like this (possibly split across multiple lines with a helper variable or such):
	ret = bpf_prog_run_array_uprobe(rcu_dereference_check(call->prog_array,
							      rcu_read_lock_trace_held()),
					regs, bpf_prog_run);
Yeah, absolutely, forgot to move the RCU dereference part, good catch! But I wouldn't do the _check() variant here: literally the previous line does rcu_read_lock_trace(), so the check part seems like just unnecessary verbosity. I'd go with a simple rcu_dereference().
On Fri, Dec 6, 2024 at 11:30 PM Andrii Nakryiko andrii.nakryiko@gmail.com wrote:
[...]
there is really just one caller ("legacy" singular uprobe handler), so I think this should be fine. Unless someone objects I'd keep it consistent with other "prog_array_run" helpers
Ack, I will make it consistent.
[...]
Yeah, absolutely, forgot to move the RCU dereference part, good catch! But I wouldn't do the _check() variant here: literally the previous line does rcu_read_lock_trace(), so the check part seems like just unnecessary verbosity. I'd go with a simple rcu_dereference().
rcu_dereference() is not legal there - that asserts that we are in a normal RCU read-side critical section, which we are not. rcu_dereference_raw() would be, but I think it is nice to document the semantics to make it explicit under which lock we're operating.
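For context, the three variants differ only in the lockdep assertion they make under CONFIG_PROVE_RCU (simplified summary; see include/linux/rcupdate.h for the real definitions):

```c
/* Splats unless a *normal* rcu_read_lock() section is held,
 * which is not what we hold here: */
array = rcu_dereference(call->prog_array);

/* Splats unless the extra condition is true; documents that the
 * RCU-trace lock is what protects this pointer: */
array = rcu_dereference_check(call->prog_array, rcu_read_lock_trace_held());

/* Never splats; legal, but documents nothing: */
array = rcu_dereference_raw(call->prog_array);
```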
I'll send a v3 in a bit after testing it.
On Fri, Dec 6, 2024 at 11:43 PM Jann Horn jannh@google.com wrote:
[...]
I'll send a v3 in a bit after testing it.
Actually, now I'm still hitting a page fault with my WIP v3 fix applied... I'll probably poke at this some more next week.
On Fri, Dec 6, 2024 at 11:15 PM Andrii Nakryiko andrii.nakryiko@gmail.com wrote:
Looking at how similar in spirit bpf_prog_run_array() is meant to be used, it seems like it is the caller's responsibility to RCU-dereference array and keep RCU critical section before calling into bpf_prog_run_array(). So I wonder if it's best to do this instead (Gmail will butcher the diff, but it's about the idea):
Yeah, that's the other option I was considering. That would be more consistent with the existing bpf_prog_run_array(), but has the downside of unnecessarily pushing responsibility up to the caller... I'm fine with either.
there is really just one caller ("legacy" singular uprobe handler), so I think this should be fine. Unless someone objects I'd keep it consistent with other "prog_array_run" helpers
Ack, I will make it consistent.
```
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index eaee2a819f4c..4b8a9edd3727 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2193,26 +2193,25 @@ bpf_prog_run_array(const struct bpf_prog_array *array,
  * rcu-protected dynamically sized maps.
  */
 static __always_inline u32
-bpf_prog_run_array_uprobe(const struct bpf_prog_array __rcu *array_rcu,
+bpf_prog_run_array_uprobe(const struct bpf_prog_array *array,
 			  const void *ctx, bpf_prog_run_fn run_prog)
 {
 	const struct bpf_prog_array_item *item;
 	const struct bpf_prog *prog;
-	const struct bpf_prog_array *array;
 	struct bpf_run_ctx *old_run_ctx;
 	struct bpf_trace_run_ctx run_ctx;
 	u32 ret = 1;

 	might_fault();
+	RCU_LOCKDEP_WARN(!rcu_read_lock_trace_held(), "no rcu lock held");
+
+	if (unlikely(!array))
+		goto out;

-	rcu_read_lock_trace();
 	migrate_disable();

 	run_ctx.is_uprobe = true;

-	array = rcu_dereference_check(array_rcu, rcu_read_lock_trace_held());
-	if (unlikely(!array))
-		goto out;
 	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
 	item = &array->items[0];
 	while ((prog = READ_ONCE(item->prog))) {
@@ -2229,7 +2228,6 @@ bpf_prog_run_array_uprobe(const struct bpf_prog_array __rcu *array_rcu,
 	bpf_reset_run_ctx(old_run_ctx);
 out:
 	migrate_enable();
-	rcu_read_unlock_trace();
 	return ret;
 }

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index fed382b7881b..87a2b8fefa90 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -1404,7 +1404,9 @@ static void __uprobe_perf_func(struct trace_uprobe *tu,
 	if (bpf_prog_array_valid(call)) {
 		u32 ret;

+		rcu_read_lock_trace();
 		ret = bpf_prog_run_array_uprobe(call->prog_array,
 						regs, bpf_prog_run);
```
But then this should be something like this (possibly split across multiple lines with a helper variable or such):

```
ret = bpf_prog_run_array_uprobe(rcu_dereference_check(call->prog_array,
						      rcu_read_lock_trace_held()),
				regs, bpf_prog_run);
```
Yeah, absolutely, forgot to move the RCU dereference part, good catch! But I wouldn't do the _check() variant here; literally the previous line does rcu_read_lock_trace(), so the check part seems like unnecessary verbosity. I'd go with a simple rcu_dereference().
rcu_dereference() is not legal there - that asserts that we are in a normal RCU read-side critical section, which we are not. rcu_dereference_raw() would be, but I think it is nice to document the semantics to make it explicit under which lock we're operating.
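To illustrate the distinction being drawn here, a minimal sketch (illustrative only, not from the patch; assumes the caller-side variant of the fix, with `call` as in __uprobe_perf_func()):

```
rcu_read_lock_trace();

/* Documents the protection: the lockdep condition is satisfied by
 * rcu_read_lock_trace_held(), so this is both legal and self-describing. */
array = rcu_dereference_check(call->prog_array, rcu_read_lock_trace_held());

/* Legal but undocumented: skips the lockdep assertion entirely. */
array = rcu_dereference_raw(call->prog_array);

/* Wrong flavor: would splat under CONFIG_PROVE_RCU, because rcu_dereference()
 * asserts a *normal* RCU read-side critical section (rcu_read_lock()),
 * which is not held here. */
array = rcu_dereference(call->prog_array);

rcu_read_unlock_trace();
```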
sure, I don't mind
I'll send a v3 in a bit after testing it.
Actually, now I'm still hitting a page fault with my WIP v3 fix applied... I'll probably poke at this some more next week.
OK, that's interesting, keep us posted!
On Sat, Dec 7, 2024 at 12:15 AM Andrii Nakryiko andrii.nakryiko@gmail.com wrote:
OK, that's interesting, keep us posted!
If I replace the "uprobe/" in my reproducer with "uprobe.s/", the reproducer stops crashing even on bpf/master without this fix - because it happens that handle_swbp() is already holding a rcu_read_lock_trace() lock way up the stack. So I think this fix should still be applied, but it probably doesn't need to go into stable unless there is another path to the buggy code that doesn't come from handle_swbp(). I guess I probably should resend my patch with an updated commit message pointing out this caveat?
The problem I'm actually hitting seems to be a use-after-free of a "struct bpf_prog" because of mismatching RCU flavors. Uprobes always use bpf_prog_run_array_uprobe() under tasks-trace-RCU protection. But it is possible to attach a non-sleepable BPF program to a uprobe, and non-sleepable BPF programs are freed via normal RCU (see __bpf_prog_put_noref()). And that is what happens with the reproducer from my initial post (https://lore.kernel.org/all/20241206-bpf-fix-uprobe-uaf-v1-1-6869c8a17258@go...) - I can see that __bpf_prog_put_noref runs with prog->sleepable==0.
So I think that while I am delaying execution in bpf_prog_run_array_uprobe(), perf_event_detach_bpf_prog() NULLs out the event->tp_event->prog_array pointer and does bpf_prog_array_free_sleepable() followed by bpf_prog_put(), and then the BPF program can be freed since the reader doesn't hold an RCU read lock. This seems a bit annoying to fix - there could legitimately be several versions of the bpf_prog_array that are still used by tasks-trace-RCU readers, so I think we can't just NULL out the array entry and use RCU for the bpf_prog_array access on the reader side. I guess we could add another flag on BPF programs that answers "should this program be freed via tasks-trace-RCU" (separately from whether the program is sleepable)?
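The flavor mismatch can be summarized in a simplified sketch (illustrative, not actual kernel code; the writer-side sequence follows the detach path described above):

```
/* Reader (uprobe path): only tasks-trace RCU protects the array and progs. */
rcu_read_lock_trace();
array = rcu_dereference_check(call->prog_array, rcu_read_lock_trace_held());
prog = READ_ONCE(array->items[0].prog);
run_prog(prog, ctx);                      /* sleepable: may fault and block */
rcu_read_unlock_trace();

/* Writer (perf_event_detach_bpf_prog()), running concurrently: */
bpf_prog_array_free_sleepable(old_array); /* array itself waits for
                                           * tasks-trace RCU, fine */
bpf_prog_put(prog);                       /* but a non-sleepable prog is freed
                                           * after only a *normal* RCU grace
                                           * period, which the reader above
                                           * does not participate in */
```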
On Mon, Dec 9, 2024 at 10:22 AM Jann Horn jannh@google.com wrote:
Yes, we shouldn't NULL anything out.
This is the same issue we've been solving recently for sleepable tracepoints, see [0] and other patches in the same patch set. We solved it for sleepable (aka "faultable") tracepoints, but uprobes have the same problem where the attachment point is sleepable by nature (and thus protected by RCU Tasks Trace), but BPF program itself is non-sleepable (and thus we only wait for synchronize_rcu() before freeing), which causes a disconnect.
We can easily fix this for BPF link-based uprobes, but legacy uprobes can be directly attached to perf event, so that's a bit more cumbersome. Let me think what should be the best way to handle this.
Meanwhile, I agree, please send your original fix (with the changes we discussed); it's good to have it even if it doesn't fix your original issue. I'll CC you on fixes once I have them.
[0] https://lore.kernel.org/all/20241101181754.782341-1-andrii@kernel.org/
On Mon, Dec 9, 2024 at 2:14 PM Andrii Nakryiko andrii.nakryiko@gmail.com wrote:
Ok, weeding through the perf/uprobe plumbing for BPF, I think we avoid this problem with uprobe BPF link because uprobe_unregister_sync() waits for RCU Tasks Trace GP, and so once we finish uprobe unregistration, we have a guarantee that there is no more uprobe that might dereference our BPF program. (I might have thought about this problem when fixing BPF link for sleepable tracepoints, but I missed the legacy perf_event attach/detach case).
With the legacy perf event path, perf_event_detach_bpf_prog() doesn't do any of that; we just NULL out the pointer and do bpf_prog_put(), not caring whether this is a uprobe, kprobe, or tracepoint...
So one way to solve this is to either teach perf_event_detach_bpf_prog() to delay bpf_prog_put() until after RCU Tasks Trace GP (which is what we do with bpf_prog_array, but not the program itself), or add prog->aux->sleepable_hook flag in addition to prog->aux->sleepable, and then inside bpf_prog_put() check (prog->aux->sleepable || prog->aux->sleepable_hook) and do RCU Tasks Trace delay (in addition to normal call_rcu()).
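For the second option, a rough, non-compilable sketch; the `sleepable_hook` field and callback names are hypothetical, and the real bpf_prog_put() path in kernel/bpf/syscall.c is more involved than this:

```c
/* Sketch only -- simplified, not actual kernel code.
 * Assumes a hypothetical prog->aux->sleepable_hook flag, set when the
 * program is attached to a sleepable attachment point (e.g. a uprobe),
 * even if the program itself is not sleepable. */
void bpf_prog_put(struct bpf_prog *prog)
{
	if (!atomic64_dec_and_test(&prog->aux->refcnt))
		return;

	if (prog->aux->sleepable || prog->aux->sleepable_hook)
		/* First wait for RCU Tasks Trace readers (sleepable
		 * hooks); the callback would then chain into a normal
		 * call_rcu() before actually freeing the program. */
		call_rcu_tasks_trace(&prog->aux->rcu, bpf_prog_free_chain_gp);
	else
		call_rcu(&prog->aux->rcu, bpf_prog_free_rcu_cb);
}
```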
Third alternative would be to have something like bpf_prog_put_sleepable() (just like we have bpf_prog_array_free_sleepable()), where this would do additional call_rcu_tasks_trace() even if BPF program itself isn't sleepable.
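The third option, again as a hypothetical sketch, mirrors how bpf_prog_array_free_sleepable() chains grace periods; bpf_prog_put_sleepable() does not exist, it is the API being proposed:

```c
/* Sketch only -- bpf_prog_put_sleepable() is a hypothetical API here.
 * Callers that attached the program to a sleepable hook would use this
 * instead of bpf_prog_put(), forcing a Tasks Trace GP even for
 * non-sleepable programs. */
static void bpf_prog_put_tt_cb(struct rcu_head *rcu)
{
	/* Tasks Trace GP has elapsed; now go through the normal
	 * call_rcu() + free path so both grace periods elapse. */
}

void bpf_prog_put_sleepable(struct bpf_prog *prog)
{
	if (!atomic64_dec_and_test(&prog->aux->refcnt))
		return;
	call_rcu_tasks_trace(&prog->aux->rcu, bpf_prog_put_tt_cb);
}
```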
Alexei, Daniel, any thoughts or preferences?
[0] https://lore.kernel.org/all/20241101181754.782341-1-andrii@kernel.org/
On Mon, Dec 9, 2024 at 2:45 PM Andrii Nakryiko andrii.nakryiko@gmail.com wrote:
> Ok, weeding through the perf/uprobe plumbing for BPF, I think we avoid this problem with uprobe BPF link because uprobe_unregister_sync() waits for RCU Tasks Trace GP, and so once we finish uprobe unregistration, we have a guarantee that there is no more uprobe that might dereference our BPF program. (I might have thought about this problem when fixing BPF link for sleepable tracepoints, but I missed the legacy perf_event attach/detach case).
>
> With legacy perf event perf_event_detach_bpf_prog() we don't do any of that, we just NULL out pointer and do bpf_prog_put(), not caring whether this is uprobe, kprobe, or tracepoint...
>
> So one way to solve this is to either teach perf_event_detach_bpf_prog() to delay bpf_prog_put() until after RCU Tasks Trace GP (which is what we do with bpf_prog_array, but not the program itself),
Since this is a legacy prog detach API, I would just add sync_rcu_tt there. It's a backportable one-line change.
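As a sketch (simplified and non-compilable; not the actual committed fix, and the real function also handles the prog_array bookkeeping and locking):

```c
/* Sketch of option 1 -- simplified, not actual kernel code. */
void perf_event_detach_bpf_prog(struct perf_event *event)
{
	struct bpf_prog *prog = event->prog;

	/* ... unlink prog from event->tp_event->prog_array and free the
	 * old array via bpf_prog_array_free_sleepable(), as today ... */

	/* New: wait for sleepable hooks (uprobe handlers running under
	 * rcu_read_lock_trace()) to finish before the final put, so the
	 * program cannot be freed out from under them. */
	synchronize_rcu_tasks_trace();
	bpf_prog_put(prog);
}
```

This is the "sync_rcu_tt" variant: waiting synchronously is fine in this path (detach can sleep), which is part of why it is preferred over the call_rcu_tt-based alternatives below.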
> or add prog->aux->sleepable_hook flag in addition to prog->aux->sleepable, and then inside bpf_prog_put() check (prog->aux->sleepable || prog->aux->sleepable_hook) and do RCU Tasks Trace delay (in addition to normal call_rcu()).
That sounds like more work, and if we introduce sleepable_hook we'd better generalize it and rely on it everywhere. Which makes it even more work, and certainly not for stable.
> Third alternative would be to have something like bpf_prog_put_sleepable() (just like we have bpf_prog_array_free_sleepable()), where this would do additional call_rcu_tasks_trace() even if BPF program itself isn't sleepable.
Sounds like less work than 2, but conceptually it's the same as 1. Just call_rcu_tt vs sync_rcu_tt. And we can wait just fine in that code path. So I'd go with 1.