JIT'd bpf programs that have subprograms can have a postive value for num_extentries but a NULL value for extable. This is problematic if one of these bpf programs encounters a fault during its execution. The fault handlers correctly identify that the faulting IP belongs to a bpf program. However, performing a search_extable call on a NULL extable leads to a second fault.
Fix up by refusing to search a NULL extable, and by checking the subprograms' extables if the umbrella program has subprograms configured.
Once I realized what was going on, I was able to use the following bpf program to get an oops from this failure:
#include "vmlinux.h" #include <bpf/bpf_helpers.h> #include <bpf/bpf_tracing.h>
char LICENSE[] SEC("license") = "Dual BSD/GPL";
#define PATH_MAX 4096
struct callback_ctx { u8 match; };
struct filter_value { char prefix[PATH_MAX]; }; struct { __uint(type, BPF_MAP_TYPE_ARRAY); __uint(max_entries, 256); __type(key, int); __type(value, struct filter_value); } test_filter SEC(".maps");
static __u64 test_filter_cb(struct bpf_map *map, __u32 *key, struct filter_value *val, struct callback_ctx *data) { return 1; }
SEC("fentry/__sys_bind") int BPF_PROG(__sys_bind, int fd, struct sockaddr *umyaddr, int addrlen) { pid_t pid;
struct callback_ctx cx = { .match = 0 }; pid = bpf_get_current_pid_tgid() >> 32; bpf_for_each_map_elem(&test_filter, test_filter_cb, &cx, 0); bpf_printk("fentry: pid = %d, family = %llx\n", pid, umyaddr->sa_family); return 0; }
And then the following code to actually trigger a failure:
#include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <sys/socket.h> #include <netinet/in.h> #include <netinet/ip.h>
int main(int argc, char *argv[]) { int sfd, rc; struct sockaddr *sockptr = (struct sockaddr *)0x900000000000;
sfd = socket(AF_INET, SOCK_STREAM, 0); if (sfd < 0) { perror("socket"); exit(EXIT_FAILURE); }
while (1) { rc = bind(sfd, (struct sockaddr *) sockptr, sizeof(struct sockaddr_in)); if (rc < 0) { perror("bind"); sleep(5); } else { break; } }
return 0; }
I was able to validate that this problem does not occur when subprograms are not in use, or when the direct pointer accesses are replaced with bpf_probe_read calls. I further validated that this did not break the extable handling in existing bpf programs. The same program caused no failures when subprograms were removed, but the exception was still injected.
Cc: stable@vger.kernel.org Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs") Signed-off-by: Krister Johansen kjlx@templeofstupid.com --- kernel/bpf/core.c | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 7421487422d4..0e12238e4340 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -736,15 +736,33 @@ const struct exception_table_entry *search_bpf_extables(unsigned long addr) { const struct exception_table_entry *e = NULL; struct bpf_prog *prog; + struct bpf_prog_aux *aux; + int i;
rcu_read_lock(); prog = bpf_prog_ksym_find(addr); if (!prog) goto out; - if (!prog->aux->num_exentries) + aux = prog->aux; + if (!aux->num_exentries) goto out;
- e = search_extable(prog->aux->extable, prog->aux->num_exentries, addr); + /* prog->aux->extable can be NULL if subprograms are in use. In that + * case, check each sub-function's aux->extables to see if it has a + * matching entry. + */ + if (aux->extable != NULL) { + e = search_extable(prog->aux->extable, + prog->aux->num_exentries, addr); + } else { + for (i = 0; (i < aux->func_cnt) && (e == NULL); i++) { + if (!aux->func[i]->aux->num_exentries || + aux->func[i]->aux->extable == NULL) + continue; + e = search_extable(aux->func[i]->aux->extable, + aux->func[i]->aux->num_exentries, addr); + } + } out: rcu_read_unlock(); return e;
On Mon, Jun 5, 2023 at 9:50 AM Krister Johansen kjlx@templeofstupid.com wrote:
JIT'd bpf programs that have subprograms can have a postive value for num_extentries but a NULL value for extable. This is problematic if one of these bpf programs encounters a fault during its execution. The fault handlers correctly identify that the faulting IP belongs to a bpf program. However, performing a search_extable call on a NULL extable leads to a second fault.
Fix up by refusing to search a NULL extable, and by checking the subprograms' extables if the umbrella program has subprograms configured.
Once I realized what was going on, I was able to use the following bpf program to get an oops from this failure:
#include "vmlinux.h" #include <bpf/bpf_helpers.h> #include <bpf/bpf_tracing.h>
char LICENSE[] SEC("license") = "Dual BSD/GPL";
#define PATH_MAX 4096
struct callback_ctx { u8 match; };
struct filter_value { char prefix[PATH_MAX]; }; struct { __uint(type, BPF_MAP_TYPE_ARRAY); __uint(max_entries, 256); __type(key, int); __type(value, struct filter_value); } test_filter SEC(".maps");
static __u64 test_filter_cb(struct bpf_map *map, __u32 *key, struct filter_value *val, struct callback_ctx *data) { return 1; }
SEC("fentry/__sys_bind") int BPF_PROG(__sys_bind, int fd, struct sockaddr *umyaddr, int addrlen) { pid_t pid;
struct callback_ctx cx = { .match = 0 }; pid = bpf_get_current_pid_tgid() >> 32; bpf_for_each_map_elem(&test_filter, test_filter_cb, &cx, 0); bpf_printk("fentry: pid = %d, family = %llx\n", pid, umyaddr->sa_family);
Instead of printk please do a volatile read of umyaddr->sa_family.
Please convert this commit log to a test in selftest/bpf/ and resubmit as two patches.
Also see bpf_testmod_return_ptr() and SEC("fexit/bpf_testmod_return_ptr") in progs/test_module_attach.c.
Probably easier to tweak that test for subprogs instead of adding your own SEC("fentry/__sys_bind") test and triggering bind() from user space.
return 0;
}
And then the following code to actually trigger a failure:
#include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <sys/socket.h> #include <netinet/in.h> #include <netinet/ip.h>
int main(int argc, char *argv[]) { int sfd, rc; struct sockaddr *sockptr = (struct sockaddr *)0x900000000000;
sfd = socket(AF_INET, SOCK_STREAM, 0); if (sfd < 0) { perror("socket"); exit(EXIT_FAILURE); } while (1) { rc = bind(sfd, (struct sockaddr *) sockptr, sizeof(struct sockaddr_in)); if (rc < 0) { perror("bind"); sleep(5); } else { break; } } return 0;
}
I was able to validate that this problem does not occur when subprograms are not in use, or when the direct pointer accesses are replaced with bpf_probe_read calls. I further validated that this did not break the extable handling in existing bpf programs. The same program caused no failures when subprograms were removed, but the exception was still injected.
Cc: stable@vger.kernel.org Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs") Signed-off-by: Krister Johansen kjlx@templeofstupid.com
kernel/bpf/core.c | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 7421487422d4..0e12238e4340 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -736,15 +736,33 @@ const struct exception_table_entry *search_bpf_extables(unsigned long addr) { const struct exception_table_entry *e = NULL; struct bpf_prog *prog;
struct bpf_prog_aux *aux;
int i; rcu_read_lock(); prog = bpf_prog_ksym_find(addr); if (!prog) goto out;
if (!prog->aux->num_exentries)
aux = prog->aux;
if (!aux->num_exentries) goto out;
e = search_extable(prog->aux->extable, prog->aux->num_exentries, addr);
/* prog->aux->extable can be NULL if subprograms are in use. In that
* case, check each sub-function's aux->extables to see if it has a
* matching entry.
*/
if (aux->extable != NULL) {
e = search_extable(prog->aux->extable,
prog->aux->num_exentries, addr);
} else {
for (i = 0; (i < aux->func_cnt) && (e == NULL); i++) {
() are redundant. !e is preferred over e == NULL
if (!aux->func[i]->aux->num_exentries ||
aux->func[i]->aux->extable == NULL)
continue;
e = search_extable(aux->func[i]->aux->extable,
aux->func[i]->aux->num_exentries, addr);
}
}
something odd here. We do bpf_prog_kallsyms_add(func[i]); for each subprog. So bpf_prog_ksym_find() in search_bpf_extables() should be finding ksym and extable of the subprog and not the main prog. The bug is probably elsewhere.
Once you respin with a selftest we can help debugging.
On Mon, Jun 05, 2023 at 04:30:29PM -0700, Alexei Starovoitov wrote:
On Mon, Jun 5, 2023 at 9:50 AM Krister Johansen kjlx@templeofstupid.com wrote:
if (!aux->func[i]->aux->num_exentries ||
aux->func[i]->aux->extable == NULL)
continue;
e = search_extable(aux->func[i]->aux->extable,
aux->func[i]->aux->num_exentries, addr);
}
}
something odd here. We do bpf_prog_kallsyms_add(func[i]); for each subprog. So bpf_prog_ksym_find() in search_bpf_extables() should be finding ksym and extable of the subprog and not the main prog. The bug is probably elsewhere.
I have a kdump (or more) of this bug so if there's additional state you'd like me to share, let me know. With your comments in mind, I took another look at the ksym fields in the aux structs. I have this in the main program:
ksym = { start = 18446744072638420852, end = 18446744072638423040, name = <...> lnode = { next = 0xffff88d9c1065168, prev = 0xffff88da91609168 }, tnode = { node = {{ __rb_parent_color = 18446613068361611640, rb_right = 0xffff88da91609178, rb_left = 0xffff88d9f0c5a578 }, { __rb_parent_color = 18446613068361611664, rb_right = 0xffff88da91609190, rb_left = 0xffff88d9f0c5a590 }} }, prog = true },
and this in the func[0] subprogram:
ksym = { start = 18446744072638420852, end = 18446744072638423040, name = <...> lnode = { next = 0xffff88da91609168, prev = 0xffffffff981f8990 <bpf_kallsyms> }, tnode = { node = {{ __rb_parent_color = 18446613068361606520, rb_right = 0x0, rb_left = 0x0 }, { __rb_parent_color = 18446613068361606544, rb_right = 0x0, rb_left = 0x0 }} }, prog = true },
That sure looks like func[0] is a leaf in the rbtree and the main program is an intermediate node with leaves. If that's the case, then bpf_prog_ksym_find may have found the main program instead of the subprogram. In that case, do you think it's better to skip the main program's call to bpf_prog_ksym_set_addr() if it has subprograms instead of searching for subprograms if the main program is found?
-K
On Mon, Jun 5, 2023 at 5:46 PM Krister Johansen kjlx@templeofstupid.com wrote:
On Mon, Jun 05, 2023 at 04:30:29PM -0700, Alexei Starovoitov wrote:
On Mon, Jun 5, 2023 at 9:50 AM Krister Johansen kjlx@templeofstupid.com wrote:
if (!aux->func[i]->aux->num_exentries ||
aux->func[i]->aux->extable == NULL)
continue;
e = search_extable(aux->func[i]->aux->extable,
aux->func[i]->aux->num_exentries, addr);
}
}
something odd here. We do bpf_prog_kallsyms_add(func[i]); for each subprog. So bpf_prog_ksym_find() in search_bpf_extables() should be finding ksym and extable of the subprog and not the main prog. The bug is probably elsewhere.
I have a kdump (or more) of this bug so if there's additional state you'd like me to share, let me know.
Please convert the test into selftest. Then everyone will be able to reproduce easily and it will serve us later to make sure we don't regress.
With your comments in mind, I took another look at the ksym fields in the aux structs. I have this in the main program:
ksym = { start = 18446744072638420852, end = 18446744072638423040, name = <...> lnode = { next = 0xffff88d9c1065168, prev = 0xffff88da91609168 }, tnode = { node = {{ __rb_parent_color = 18446613068361611640, rb_right = 0xffff88da91609178, rb_left = 0xffff88d9f0c5a578 }, { __rb_parent_color = 18446613068361611664, rb_right = 0xffff88da91609190, rb_left = 0xffff88d9f0c5a590 }} }, prog = true },
and this in the func[0] subprogram:
ksym = { start = 18446744072638420852, end = 18446744072638423040, name = <...> lnode = { next = 0xffff88da91609168, prev = 0xffffffff981f8990 <bpf_kallsyms> }, tnode = { node = {{ __rb_parent_color = 18446613068361606520, rb_right = 0x0, rb_left = 0x0 }, { __rb_parent_color = 18446613068361606544, rb_right = 0x0, rb_left = 0x0 }} }, prog = true },
That sure looks like func[0] is a leaf in the rbtree and the main program is an intermediate node with leaves. If that's the case, then bpf_prog_ksym_find may have found the main program instead of the subprogram. In that case, do you think it's better to skip the main program's call to bpf_prog_ksym_set_addr() if it has subprograms instead of searching for subprograms if the main program is found?
I see. Looks like we're doing double bpf_prog_kallsyms_add(). First in in jit_subprogs(): for (i = 0; i < env->subprog_cnt; i++) { bpf_prog_lock_ro(func[i]); bpf_prog_kallsyms_add(func[i]); } and then again: bpf_prog_kallsyms_add(prog); in bpf_prog_load().
because func[0] is the main prog.
We are also doing double bpf_prog_lock_ro() for main prog, but that's not causing harm.
The fix is probably just this:
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 1e38584d497c..89266dac9c12 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -17633,7 +17633,7 @@ static int jit_subprogs(struct bpf_verifier_env *env) /* finally lock prog and jit images for all functions and * populate kallsysm */ - for (i = 0; i < env->subprog_cnt; i++) { + for (i = 1; i < env->subprog_cnt; i++) { bpf_prog_lock_ro(func[i]); bpf_prog_kallsyms_add(func[i]); }
On Mon, Jun 05, 2023 at 06:31:57PM -0700, Alexei Starovoitov wrote:
On Mon, Jun 5, 2023 at 5:46 PM Krister Johansen kjlx@templeofstupid.com wrote:
With your comments in mind, I took another look at the ksym fields in the aux structs. I have this in the main program:
ksym = { start = 18446744072638420852, end = 18446744072638423040, name = <...> lnode = { next = 0xffff88d9c1065168, prev = 0xffff88da91609168 }, tnode = { node = {{ __rb_parent_color = 18446613068361611640, rb_right = 0xffff88da91609178, rb_left = 0xffff88d9f0c5a578 }, { __rb_parent_color = 18446613068361611664, rb_right = 0xffff88da91609190, rb_left = 0xffff88d9f0c5a590 }} }, prog = true },
and this in the func[0] subprogram:
ksym = { start = 18446744072638420852, end = 18446744072638423040, name = <...> lnode = { next = 0xffff88da91609168, prev = 0xffffffff981f8990 <bpf_kallsyms> }, tnode = { node = {{ __rb_parent_color = 18446613068361606520, rb_right = 0x0, rb_left = 0x0 }, { __rb_parent_color = 18446613068361606544, rb_right = 0x0, rb_left = 0x0 }} }, prog = true },
That sure looks like func[0] is a leaf in the rbtree and the main program is an intermediate node with leaves. If that's the case, then bpf_prog_ksym_find may have found the main program instead of the subprogram. In that case, do you think it's better to skip the main program's call to bpf_prog_ksym_set_addr() if it has subprograms instead of searching for subprograms if the main program is found?
I see. Looks like we're doing double bpf_prog_kallsyms_add(). First in in jit_subprogs(): for (i = 0; i < env->subprog_cnt; i++) { bpf_prog_lock_ro(func[i]); bpf_prog_kallsyms_add(func[i]); } and then again: bpf_prog_kallsyms_add(prog); in bpf_prog_load().
because func[0] is the main prog.
We are also doing double bpf_prog_lock_ro() for main prog, but that's not causing harm.
The fix is probably just this:
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 1e38584d497c..89266dac9c12 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -17633,7 +17633,7 @@ static int jit_subprogs(struct bpf_verifier_env *env) /* finally lock prog and jit images for all functions and * populate kallsysm */
for (i = 0; i < env->subprog_cnt; i++) {
for (i = 1; i < env->subprog_cnt; i++) { bpf_prog_lock_ro(func[i]); bpf_prog_kallsyms_add(func[i]); }
This will cause the oops to always occur, because func[0] has a extable entry when jit_subporgs() completes, but prog->aux doesn't. jit_subprogs also sets prog->bpf_func which prevents the other copy of the main program from getting jit'd, and consequently getting an extable assigned.
There are probably a few options to fix:
1. skip the bpf_prog_kallsyms_add in bpf_prog_load if the program being loaded has subprograms
2. check extables when searching to see if they're NULL and if the subprogram has one instead
3. copy the main program's extable back to prog->aux
I'll send out a v2 here shortly that includes the selftest you requested. It takes approach #3, which is also a 1-line change.
-K
linux-stable-mirror@lists.linaro.org