Hi everyone,
This patchset introduces a new BPF program type that allows overriding a tracepoint probe function registered via register_trace_*.
Motivation
----------
Tracepoint probe functions registered via register_trace_* in the kernel cannot be modified dynamically: changing a probe function requires recompiling the kernel and rebooting. Nor can BPF programs replace an existing probe function.
Overriding a tracepoint probe offers a way to apply changes to the kernel quickly (for example, security fixes) through predefined static tracepoints, without waiting for upstream integration.
This patchset demonstrates a way to override probe functions with a BPF program.
Overview
--------
This patchset adds the BPF_PROG_TYPE_RAW_TRACEPOINT_OVERRIDE program type. When a program of this type is attached, it overrides the target tracepoint probe function.
It also introduces a new struct type, "tracepoint_func_snapshot", referenced from the tracepoint structure. It records the original probe function registered by the kernel when a BPF program is attached, so that the original can be restored after detachment.
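For reference, the pairing recorded in a snapshot (as added to include/linux/tracepoint-defs.h by patch 1) looks like this; each struct tracepoint gains a pointer to an array of these:

    struct tracepoint_func_snapshot {
            struct tracepoint_func orig;      /* probe originally registered by the kernel */
            struct tracepoint_func override;  /* probe installed for the attached BPF program */
    };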
Critical steps
--------------
1. Attach: Attach programs via the raw_tracepoint_open syscall.
2. Override:
   (a) Locate the target probe by `probe_name`.
   (b) Override the target probe with the BPF program.
   (c) Save the BPF program and the target probe function into a
       "tracepoint_func_snapshot".
3. Restore: When the BPF program is detached, automatically restore the
   original probe function from the snapshot saved earlier.
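From user space, the intended usage is roughly the following (a minimal sketch; the "raw_tp.o/<tracepoint>:<probe_to_override>" section format comes from the libbpf patch, and the tracepoint and probe names below are placeholders):

    SEC("raw_tp.o/some_tracepoint:some_kernel_probe")
    int BPF_PROG(my_override /* , tracepoint arguments of some_tracepoint */)
    {
            /* runs in place of some_kernel_probe while attached; the original
             * probe is restored automatically when the program is detached */
            return 0;
    }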
Future work
-----------
This patchset is intended as a first step toward supporting BPF programs that can override tracepoint probes. The current implementation may not yet cover all use cases or handle every corner case.
I welcome feedback and suggestions from the community, and will continue to refine and improve the design based on comments and real-world requirements.
Thanks! Fuyu
Fuyu Zhao (3):
  bpf: Introduce BPF_PROG_TYPE_RAW_TRACEPOINT_OVERRIDE
  libbpf: Add support for BPF_PROG_TYPE_RAW_TRACEPOINT_OVERRIDE
  selftests/bpf: Add selftest for "raw_tp.o"
 include/linux/bpf_types.h                          |   2 +
 include/linux/trace_events.h                       |   9 +
 include/linux/tracepoint-defs.h                    |   6 +
 include/linux/tracepoint.h                         |   3 +
 include/uapi/linux/bpf.h                           |   2 +
 kernel/bpf/syscall.c                               |  35 +++-
 kernel/trace/bpf_trace.c                           |  31 +++
 kernel/tracepoint.c                                | 190 +++++++++++++++++-
 tools/include/uapi/linux/bpf.h                     |   2 +
 tools/lib/bpf/bpf.c                                |   1 +
 tools/lib/bpf/bpf.h                                |   3 +-
 tools/lib/bpf/libbpf.c                             |  27 ++-
 tools/lib/bpf/libbpf.h                             |   3 +-
 .../bpf/prog_tests/raw_tp_override_test_run.c      |  23 +++
 .../bpf/progs/test_raw_tp_override_test_run.c      |  20 ++
 .../selftests/bpf/test_kmods/bpf_testmod.c         |   7 +
 16 files changed, 352 insertions(+), 12 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/raw_tp_override_test_run.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_raw_tp_override_test_run.c
This patch introduces a new program type -- BPF_PROG_TYPE_RAW_TRACEPOINT_OVERRIDE. Programs of this type take an additional parameter -- probe_name -- used to locate the target tracepoint probe function registered via register_trace_* in the kernel.
This type reuses the existing RAW_TRACEPOINT infrastructure and differs only when probe_name is specified. In that case, the newly attached RAW_TRACEPOINT_OVERRIDE program and the target probe function are paired and stored in a snapshot.
When the BPF program is detached, snapshots are consulted to determine whether restoration of the original probe function is required.
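For illustration, attaching via the raw syscall interface would look roughly like this (a sketch only; the tracepoint and probe names are placeholders and error handling is omitted):

    union bpf_attr attr = {};
    int link_fd;

    /* prog_fd: a loaded BPF_PROG_TYPE_RAW_TRACEPOINT_OVERRIDE program */
    attr.raw_tracepoint.prog_fd    = prog_fd;
    attr.raw_tracepoint.name       = (__u64)(unsigned long)"some_tracepoint";
    attr.raw_tracepoint.probe_name = (__u64)(unsigned long)"some_kernel_probe";

    link_fd = syscall(__NR_bpf, BPF_RAW_TRACEPOINT_OPEN, &attr, sizeof(attr));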
Signed-off-by: Fuyu Zhao <zhaofuyu@vivo.com>
---
 include/linux/bpf_types.h       |   2 +
 include/linux/trace_events.h    |   9 ++
 include/linux/tracepoint-defs.h |   6 +
 include/linux/tracepoint.h      |   3 +
 include/uapi/linux/bpf.h        |   2 +
 kernel/bpf/syscall.c            |  35 ++++--
 kernel/trace/bpf_trace.c        |  31 ++++++
 kernel/tracepoint.c             | 190 +++++++++++++++++++++++++++++++-
 8 files changed, 269 insertions(+), 9 deletions(-)
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h index fa78f49d4a9a..e5cf8a1af6cd 100644 --- a/include/linux/bpf_types.h +++ b/include/linux/bpf_types.h @@ -48,6 +48,8 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE, raw_tracepoint_writable, struct bpf_raw_tracepoint_args, u64) BPF_PROG_TYPE(BPF_PROG_TYPE_TRACING, tracing, void *, void *) +BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT_OVERRIDE, raw_tracepoint_override, + struct bpf_raw_tracepoint_args, u64) #endif #ifdef CONFIG_CGROUP_BPF BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_DEVICE, cg_dev, diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 04307a19cde3..fcb2d62d0c9f 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -768,6 +768,9 @@ int perf_event_query_prog_array(struct perf_event *event, void __user *info); struct bpf_raw_tp_link; int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link); int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link); +int bpf_probe_override(struct bpf_raw_event_map *btp, + struct bpf_raw_tp_link *link, + const char *probe_name);
struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name); void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp); @@ -805,6 +808,12 @@ static inline int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf { return -EOPNOTSUPP; } +static inline int bpf_probe_override(struct bpf_raw_event_map *btp, + struct bpf_raw_tp_link *link, + const char *probe_name) +{ + return -EOPNOTSUPP; +} static inline struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name) { return NULL; diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h index aebf0571c736..9d7b1710c0aa 100644 --- a/include/linux/tracepoint-defs.h +++ b/include/linux/tracepoint-defs.h @@ -29,6 +29,11 @@ struct tracepoint_func { int prio; };
+struct tracepoint_func_snapshot { + struct tracepoint_func orig; + struct tracepoint_func override; +}; + struct tracepoint_ext { int (*regfunc)(void); void (*unregfunc)(void); @@ -45,6 +50,7 @@ struct tracepoint { void *probestub; struct tracepoint_func __rcu *funcs; struct tracepoint_ext *ext; + struct tracepoint_func_snapshot *snapshot; };
#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h index 826ce3f8e1f8..399001e2afca 100644 --- a/include/linux/tracepoint.h +++ b/include/linux/tracepoint.h @@ -50,6 +50,9 @@ tracepoint_probe_register_may_exist(struct tracepoint *tp, void *probe, return tracepoint_probe_register_prio_may_exist(tp, probe, data, TRACEPOINT_DEFAULT_PRIO); } +extern int +tracepoint_probe_override(struct tracepoint *tp, void *probe, void *data, + const char *func_replaced); extern void for_each_kernel_tracepoint(void (*fct)(struct tracepoint *tp, void *priv), void *priv); diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 233de8677382..cd3d889fe634 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1071,6 +1071,7 @@ enum bpf_prog_type { BPF_PROG_TYPE_SK_LOOKUP, BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */ BPF_PROG_TYPE_NETFILTER, + BPF_PROG_TYPE_RAW_TRACEPOINT_OVERRIDE, __MAX_BPF_PROG_TYPE };
@@ -1707,6 +1708,7 @@ union bpf_attr { __u32 prog_fd; __u32 :32; __aligned_u64 cookie; + __aligned_u64 probe_name; } raw_tracepoint;
struct { /* anonymous struct for BPF_BTF_LOAD */ diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 3f178a0f8eb1..e360062db34e 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -4092,14 +4092,16 @@ static int bpf_perf_link_attach(const union bpf_attr *attr, struct bpf_prog *pro #endif /* CONFIG_PERF_EVENTS */
static int bpf_raw_tp_link_attach(struct bpf_prog *prog, - const char __user *user_tp_name, u64 cookie, + const char __user *user_tp_name, + const char __user *user_probe_name, + u64 cookie, enum bpf_attach_type attach_type) { struct bpf_link_primer link_primer; struct bpf_raw_tp_link *link; struct bpf_raw_event_map *btp; - const char *tp_name; - char buf[128]; + const char *tp_name, *probe_name; + char buf[128], probe[128]; int err;
switch (prog->type) { @@ -4124,6 +4126,17 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog, buf[sizeof(buf) - 1] = 0; tp_name = buf; break; + case BPF_PROG_TYPE_RAW_TRACEPOINT_OVERRIDE: + if (strncpy_from_user(buf, user_tp_name, sizeof(buf) - 1) < 0) + return -EFAULT; + buf[sizeof(buf) - 1] = 0; + tp_name = buf; + + if (strncpy_from_user(probe, user_probe_name, sizeof(probe) - 1) < 0) + return -EFAULT; + probe[sizeof(probe) - 1] = 0; + probe_name = probe; + break; default: return -EINVAL; } @@ -4149,7 +4162,10 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog, goto out_put_btp; }
- err = bpf_probe_register(link->btp, link); + if (prog->type == BPF_PROG_TYPE_RAW_TRACEPOINT_OVERRIDE) + err = bpf_probe_override(link->btp, link, probe_name); + else + err = bpf_probe_register(link->btp, link); if (err) { bpf_link_cleanup(&link_primer); goto out_put_btp; @@ -4162,12 +4178,12 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog, return err; }
-#define BPF_RAW_TRACEPOINT_OPEN_LAST_FIELD raw_tracepoint.cookie +#define BPF_RAW_TRACEPOINT_OPEN_LAST_FIELD raw_tracepoint.probe_name
static int bpf_raw_tracepoint_open(const union bpf_attr *attr) { struct bpf_prog *prog; - void __user *tp_name; + void __user *tp_name, *probe_name; __u64 cookie; int fd;
@@ -4180,7 +4196,9 @@ static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
tp_name = u64_to_user_ptr(attr->raw_tracepoint.name); cookie = attr->raw_tracepoint.cookie; - fd = bpf_raw_tp_link_attach(prog, tp_name, cookie, prog->expected_attach_type); + probe_name = u64_to_user_ptr(attr->raw_tracepoint.probe_name); + fd = bpf_raw_tp_link_attach(prog, tp_name, probe_name, + cookie, prog->expected_attach_type); if (fd < 0) bpf_prog_put(prog); return fd; @@ -5565,7 +5583,8 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr) goto out; } if (prog->expected_attach_type == BPF_TRACE_RAW_TP) - ret = bpf_raw_tp_link_attach(prog, NULL, attr->link_create.tracing.cookie, + ret = bpf_raw_tp_link_attach(prog, NULL, NULL, + attr->link_create.tracing.cookie, attr->link_create.attach_type); else if (prog->expected_attach_type == BPF_TRACE_ITER) ret = bpf_iter_link_attach(attr, uattr, prog); diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index 606007c387c5..1e965517ba05 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -1998,6 +1998,14 @@ const struct bpf_verifier_ops raw_tracepoint_writable_verifier_ops = { const struct bpf_prog_ops raw_tracepoint_writable_prog_ops = { };
+const struct bpf_verifier_ops raw_tracepoint_override_verifier_ops = { + .get_func_proto = raw_tp_prog_func_proto, + .is_valid_access = raw_tp_writable_prog_is_valid_access, +}; + +const struct bpf_prog_ops raw_tracepoint_override_prog_ops = { +}; + static bool pe_prog_is_valid_access(int off, int size, enum bpf_access_type type, const struct bpf_prog *prog, struct bpf_insn_access_aux *info) @@ -2307,6 +2315,29 @@ BPF_TRACE_DEFN_x(10); BPF_TRACE_DEFN_x(11); BPF_TRACE_DEFN_x(12);
+int bpf_probe_override(struct bpf_raw_event_map *btp, + struct bpf_raw_tp_link *link, + const char *probe_name) +{ + struct tracepoint *tp = btp->tp; + struct bpf_prog *prog = link->link.prog; + + if (!probe_name) + return -EINVAL; + + /* + * check that program doesn't access arguments beyond what's + * available in this tracepoint + */ + if (prog->aux->max_ctx_offset > btp->num_args * sizeof(u64)) + return -EINVAL; + + if (prog->aux->max_tp_access > btp->writable_size) + return -EINVAL; + + return tracepoint_probe_override(tp, (void *)btp->bpf_func, link, probe_name); +} + int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link) { struct tracepoint *tp = btp->tp; diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c index 62719d2941c9..3b8317306edc 100644 --- a/kernel/tracepoint.c +++ b/kernel/tracepoint.c @@ -14,6 +14,7 @@ #include <linux/sched/signal.h> #include <linux/sched/task.h> #include <linux/static_key.h> +#include <linux/kallsyms.h>
enum tp_func_state { TP_FUNC_0, @@ -130,6 +131,121 @@ static void debug_print_probes(struct tracepoint_func *funcs) printk(KERN_DEBUG "Probe %d : %pSb\n", i, funcs[i].func); }
+static struct tracepoint_func * +find_func_to_override(struct tracepoint_func *funcs, + unsigned long probe_addr) +{ + int iter; + + if (!funcs) + return NULL; + + for (iter = 0; funcs[iter].func; iter++) { + if ((unsigned long)funcs[iter].func == probe_addr) + return &(funcs[iter]); + } + + return NULL; +} + +static struct tracepoint_func_snapshot * +find_func_snapshot(struct tracepoint_func_snapshot **ss, + struct tracepoint_func *func, + bool *is_override) +{ + int iter; + struct tracepoint_func_snapshot *shots; + + shots = *ss; + if (!shots) + return NULL; + + for (iter = 0; shots[iter].override.func; iter++) { + if (shots[iter].override.func == func->func && + shots[iter].override.data == func->data) { + *is_override = true; + return &(shots[iter]); + } + + if (shots[iter].orig.func == func->func && + shots[iter].orig.data == func->data) { + *is_override = false; + return &(shots[iter]); + } + } + + return NULL; +} + +static void drop_func_snapshot(struct tracepoint_func_snapshot **ss, + struct tracepoint_func_snapshot *drop) +{ + struct tracepoint_func_snapshot *old, *new; + int nr_snapshots; /* Counter for snapshots */ + int iter; /* Iterate over old snapshots */ + int idx = 0; /* Index of snapshot to drop */ + + old = *ss; + if (!old) + return; + + for (nr_snapshots = 0; old[nr_snapshots].override.func; nr_snapshots++) { + if (&(old[nr_snapshots]) == drop) + idx = nr_snapshots; + } + + if (nr_snapshots == 0) { + kfree(old); + *ss = NULL; + return; + } + + new = kmalloc_array(nr_snapshots, sizeof(struct tracepoint_func_snapshot), GFP_KERNEL); + if (!new) { + for (iter = idx; iter < nr_snapshots - 1; iter++) + old[iter] = old[iter + 1]; + memset(&(old[nr_snapshots - 1]), 0, sizeof(struct tracepoint_func_snapshot)); + } else { + int j = 0; + + for (iter = 0; iter < nr_snapshots; iter++) { + if (iter != idx) + new[j++] = old[iter]; + } + kfree(old); + *ss = new; + } +} + +static int save_func_snapshot(struct tracepoint_func_snapshot **ss, + struct tracepoint_func *new_func, + struct tracepoint_func *old_func) +{ + struct tracepoint_func_snapshot *old, *new; + int nr_shots = 0; /* Counter for old snapshots */ + int total; /* Total count of new snapshots */ + + old = *ss; + if (old) + while (old[nr_shots].override.func) + nr_shots++; + + /* + 2 : one for new snapshot, one for NULL snapshot */ + total = nr_shots + 2; + new = kmalloc_array(total, sizeof(struct tracepoint_func_snapshot), GFP_KERNEL); + if (!new) + return -ENOMEM; + + memcpy(new, old, nr_shots * sizeof(struct tracepoint_func_snapshot)); + new[nr_shots].orig = *old_func; + new[nr_shots].override = *new_func; + new[nr_shots + 1].override.func = NULL; + + *ss = new; + kfree(old); + return 0; +} + static struct tracepoint_func * func_add(struct tracepoint_func **funcs, struct tracepoint_func *tp_func, int prio) @@ -412,6 +528,52 @@ static int tracepoint_remove_func(struct tracepoint *tp, return 0; }
+static int tracepoint_override_func(struct tracepoint *tp, + struct tracepoint_func *func, + struct tracepoint_func *func_override) +{ + int ret = tracepoint_remove_func(tp, func); + + return ret ? : tracepoint_add_func(tp, func_override, + func_override->prio, false); +} + +static int tracepoint_restore_func(struct tracepoint *tp, + struct tracepoint_func *func, + struct tracepoint_func *func_restore) +{ + int ret = tracepoint_remove_func(tp, func); + + return ret ? : tracepoint_add_func(tp, func_restore, + func_restore->prio, false); +} + +int tracepoint_probe_override(struct tracepoint *tp, void *probe, + void *data, const char *probe_name) +{ + struct tracepoint_func tp_func; + struct tracepoint_func *target_func; + unsigned long probe_addr; + int ret; + + probe_addr = kallsyms_lookup_name(probe_name); + mutex_lock(&tracepoints_mutex); + target_func = find_func_to_override(tp->funcs, probe_addr); + if (!target_func) + return -ESRCH; + tp_func.func = probe; + tp_func.data = data; + tp_func.prio = target_func->prio; + ret = save_func_snapshot(&(tp->snapshot), &tp_func, target_func); + if (ret) + goto unlock; + + ret = tracepoint_override_func(tp, target_func, &tp_func); +unlock: + mutex_unlock(&tracepoints_mutex); + return ret; +} + /** * tracepoint_probe_register_prio_may_exist - Connect a probe to a tracepoint with priority * @tp: tracepoint @@ -496,12 +658,38 @@ EXPORT_SYMBOL_GPL(tracepoint_probe_register); int tracepoint_probe_unregister(struct tracepoint *tp, void *probe, void *data) { struct tracepoint_func tp_func; + struct tracepoint_func_snapshot *shot; int ret; + bool is_override; /* whether probe is an overriding func */
mutex_lock(&tracepoints_mutex); tp_func.func = probe; tp_func.data = data; - ret = tracepoint_remove_func(tp, &tp_func); + + shot = find_func_snapshot(&(tp->snapshot), &tp_func, &is_override); + if (!shot) { + ret = tracepoint_remove_func(tp, &tp_func); + } else { + /* unregister probe rengistered by raw_tracepoint_open, + * restore to original tp_func. + * + * 1. restore orig func from snapshot. + * 2. remove snapshot. + */ + if (is_override) + ret = tracepoint_restore_func(tp, &tp_func, &(shot->orig)); + /* unregister orig probe registered by register_trace_*. + * + * 1. remove curr probe func(registered by raw_tracepoint_open) + * from tp->funcs. + * 2. remove snapshot. + */ + else + ret = tracepoint_remove_func(tp, &(shot->override)); + if (!ret) + drop_func_snapshot(&(tp->snapshot), shot); + } + mutex_unlock(&tracepoints_mutex); return ret; }
Extend libbpf to support the new BPF_PROG_TYPE_RAW_TRACEPOINT_OVERRIDE program type, making it available to user space applications through the standard libbpf API.
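With this in place, an application can attach roughly as follows (a sketch; "some_tracepoint" and "some_kernel_probe" are placeholders):

    LIBBPF_OPTS(bpf_raw_tracepoint_opts, opts,
            .probe_name = "some_kernel_probe",
    );
    struct bpf_link *link;

    link = bpf_program__attach_raw_tracepoint_opts(prog, "some_tracepoint", &opts);
    if (!link)
            /* handle error */;

Programs placed in a SEC("raw_tp.o/<tracepoint>:<probe>") section are auto-attached the same way through attach_raw_tp().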
Signed-off-by: Fuyu Zhao <zhaofuyu@vivo.com>
---
 tools/include/uapi/linux/bpf.h |  2 ++
 tools/lib/bpf/bpf.c            |  1 +
 tools/lib/bpf/bpf.h            |  3 ++-
 tools/lib/bpf/libbpf.c         | 27 ++++++++++++++++++++++++++-
 tools/lib/bpf/libbpf.h         |  3 ++-
 5 files changed, 33 insertions(+), 3 deletions(-)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 233de8677382..7438836b3e4b 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -1071,6 +1071,7 @@ enum bpf_prog_type { BPF_PROG_TYPE_SK_LOOKUP, BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */ BPF_PROG_TYPE_NETFILTER, + BPF_PROG_TYPE_RAW_TRACEPOINT_OVERRIDE, __MAX_BPF_PROG_TYPE };
@@ -1707,6 +1708,7 @@ union bpf_attr { __u32 prog_fd; __u32 :32; __aligned_u64 cookie; + __aligned_u64 probe_name; } raw_tracepoint;
struct { /* anonymous struct for BPF_BTF_LOAD */ diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c index ab40dbf9f020..95b73f94ce72 100644 --- a/tools/lib/bpf/bpf.c +++ b/tools/lib/bpf/bpf.c @@ -1235,6 +1235,7 @@ int bpf_raw_tracepoint_open_opts(int prog_fd, struct bpf_raw_tp_opts *opts) attr.raw_tracepoint.prog_fd = prog_fd; attr.raw_tracepoint.name = ptr_to_u64(OPTS_GET(opts, tp_name, NULL)); attr.raw_tracepoint.cookie = OPTS_GET(opts, cookie, 0); + attr.raw_tracepoint.probe_name = ptr_to_u64(OPTS_GET(opts, probe_name, NULL));
fd = sys_bpf_fd(BPF_RAW_TRACEPOINT_OPEN, &attr, attr_sz); return libbpf_err_errno(fd); diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h index 7252150e7ad3..0ebedbd99fe5 100644 --- a/tools/lib/bpf/bpf.h +++ b/tools/lib/bpf/bpf.h @@ -630,9 +630,10 @@ struct bpf_raw_tp_opts { size_t sz; /* size of this struct for forward/backward compatibility */ const char *tp_name; __u64 cookie; + const char *probe_name; size_t :0; }; -#define bpf_raw_tp_opts__last_field cookie +#define bpf_raw_tp_opts__last_field probe_name
LIBBPF_API int bpf_raw_tracepoint_open_opts(int prog_fd, struct bpf_raw_tp_opts *opts); LIBBPF_API int bpf_raw_tracepoint_open(const char *name, int prog_fd); diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index fe4fc5438678..ce67c917ba59 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -9557,6 +9557,8 @@ static const struct bpf_sec_def section_defs[] = { SEC_DEF("raw_tp+", RAW_TRACEPOINT, 0, SEC_NONE, attach_raw_tp), SEC_DEF("raw_tracepoint.w+", RAW_TRACEPOINT_WRITABLE, 0, SEC_NONE, attach_raw_tp), SEC_DEF("raw_tp.w+", RAW_TRACEPOINT_WRITABLE, 0, SEC_NONE, attach_raw_tp), + SEC_DEF("raw_tracepoint.o+", RAW_TRACEPOINT_OVERRIDE, 0, SEC_NONE, attach_raw_tp), + SEC_DEF("raw_tp.o+", RAW_TRACEPOINT_OVERRIDE, 0, SEC_NONE, attach_raw_tp), SEC_DEF("tp_btf+", TRACING, BPF_TRACE_RAW_TP, SEC_ATTACH_BTF, attach_trace), SEC_DEF("fentry+", TRACING, BPF_TRACE_FENTRY, SEC_ATTACH_BTF, attach_trace), SEC_DEF("fmod_ret+", TRACING, BPF_MODIFY_RETURN, SEC_ATTACH_BTF, attach_trace), @@ -12684,6 +12686,7 @@ bpf_program__attach_raw_tracepoint_opts(const struct bpf_program *prog,
raw_opts.tp_name = tp_name; raw_opts.cookie = OPTS_GET(opts, cookie, 0); + raw_opts.probe_name = OPTS_GET(opts, probe_name, NULL); pfd = bpf_raw_tracepoint_open_opts(prog_fd, &raw_opts); if (pfd < 0) { pfd = -errno; @@ -12704,14 +12707,18 @@ struct bpf_link *bpf_program__attach_raw_tracepoint(const struct bpf_program *pr
static int attach_raw_tp(const struct bpf_program *prog, long cookie, struct bpf_link **link) { + LIBBPF_OPTS(bpf_raw_tracepoint_opts, raw_opts); static const char *const prefixes[] = { "raw_tp", "raw_tracepoint", "raw_tp.w", "raw_tracepoint.w", + "raw_tp.o", + "raw_tracepoint.o", }; size_t i; const char *tp_name = NULL; + char *dup = NULL, *sep = NULL;
*link = NULL;
@@ -12739,7 +12746,25 @@ static int attach_raw_tp(const struct bpf_program *prog, long cookie, struct bpf return -EINVAL; }
- *link = bpf_program__attach_raw_tracepoint(prog, tp_name); + if (prog->type == BPF_PROG_TYPE_RAW_TRACEPOINT_OVERRIDE) { + dup = strdup(tp_name); + if (!dup) + return -ENOMEM; + + sep = strchr(dup, ':'); + if (!sep) { + free(dup); + return -EINVAL; + } + *sep = '\0'; + + tp_name = dup; + raw_opts.probe_name = sep + 1, + *link = bpf_program__attach_raw_tracepoint_opts(prog, tp_name, &raw_opts); + free(dup); + } else { + *link = bpf_program__attach_raw_tracepoint(prog, tp_name); + } return libbpf_get_error(*link); }
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h index 2e91148d9b44..f4e9cb819b75 100644 --- a/tools/lib/bpf/libbpf.h +++ b/tools/lib/bpf/libbpf.h @@ -820,9 +820,10 @@ bpf_program__attach_tracepoint_opts(const struct bpf_program *prog, struct bpf_raw_tracepoint_opts { size_t sz; /* size of this struct for forward/backward compatibility */ __u64 cookie; + const char *probe_name; size_t :0; }; -#define bpf_raw_tracepoint_opts__last_field cookie +#define bpf_raw_tracepoint_opts__last_field probe_name
LIBBPF_API struct bpf_link * bpf_program__attach_raw_tracepoint(const struct bpf_program *prog,
Add a test for the new BPF_PROG_TYPE_RAW_TRACEPOINT_OVERRIDE program type. The test verifies that a BPF program can successfully override the target tracepoint probe function.
Signed-off-by: Fuyu Zhao <zhaofuyu@vivo.com>
---
 .../bpf/prog_tests/raw_tp_override_test_run.c | 23 +++++++++++++++++++
 .../bpf/progs/test_raw_tp_override_test_run.c | 20 ++++++++++++++++
 .../selftests/bpf/test_kmods/bpf_testmod.c    |  7 ++++++
 3 files changed, 50 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/raw_tp_override_test_run.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_raw_tp_override_test_run.c
diff --git a/tools/testing/selftests/bpf/prog_tests/raw_tp_override_test_run.c b/tools/testing/selftests/bpf/prog_tests/raw_tp_override_test_run.c new file mode 100644 index 000000000000..02301253cd9b --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/raw_tp_override_test_run.c @@ -0,0 +1,23 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <test_progs.h> +#include "bpf/libbpf_internal.h" +#include "test_raw_tp_override_test_run.skel.h" + +void test_raw_tp_override_test_run(void) +{ + struct test_raw_tp_override_test_run *skel; + + skel = test_raw_tp_override_test_run__open_and_load(); + if (!ASSERT_OK_PTR(skel, "test_raw_tp_override_test_run__open_and_load")) + return; + + if (!ASSERT_OK(test_raw_tp_override_test_run__attach(skel), + "test_raw_tp_override_test_run__attach")) + goto cleanup; + ASSERT_OK(trigger_module_test_write(1), "trigger_write"); + ASSERT_EQ(skel->bss->flag, 1, "check_flag"); + +cleanup: + test_raw_tp_override_test_run__destroy(skel); +} diff --git a/tools/testing/selftests/bpf/progs/test_raw_tp_override_test_run.c b/tools/testing/selftests/bpf/progs/test_raw_tp_override_test_run.c new file mode 100644 index 000000000000..eb6d24e1c737 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_raw_tp_override_test_run.c @@ -0,0 +1,20 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include "vmlinux.h" +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_tracing.h> + +__u32 flag = 0; + +/** + * This program overrides raw_tp_override_probe handler in + * tracepoint bpf_testmode_test_raw_tp_null_tp. + */ +SEC("raw_tp.o/bpf_testmod_test_write_bare_tp:raw_tp_override_probe") +int BPF_PROG(tp_override, struct task_struct *task, char *comm) +{ + flag = 1; + return 0; +} + +char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c index 2beb9b2fcbd8..7a49178d2343 100644 --- a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c +++ b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c @@ -1628,6 +1628,11 @@ static struct bpf_testmod_multi_st_ops multi_st_ops_cfi_stubs = { .test_1 = bpf_testmod_multi_st_ops__test_1, };
+static void raw_tp_override_probe(void *ignored, struct task_struct *task, + struct bpf_testmod_test_write_ctx *ctx) +{ +} + struct bpf_struct_ops testmod_multi_st_ops = { .verifier_ops = &bpf_testmod_verifier_ops, .init = multi_st_ops_init, @@ -1665,6 +1670,7 @@ static int bpf_testmod_init(void) ret = ret ?: register_btf_id_dtor_kfuncs(bpf_testmod_dtors, ARRAY_SIZE(bpf_testmod_dtors), THIS_MODULE); + ret = ret ?: register_trace_bpf_testmod_test_write_bare_tp(raw_tp_override_probe, NULL); if (ret < 0) return ret; if (bpf_fentry_test1(0) < 0) @@ -1701,6 +1707,7 @@ static void bpf_testmod_exit(void) bpf_kfunc_close_sock(); sysfs_remove_bin_file(kernel_kobj, &bin_attr_bpf_testmod_file); unregister_bpf_testmod_uprobe(); + unregister_trace_bpf_testmod_test_write_bare_tp(raw_tp_override_probe, NULL); }
module_init(bpf_testmod_init);
On Wed, 17 Sep 2025 15:22:39 +0800 Fuyu Zhao zhaofuyu@vivo.com wrote:
Hi everyone,
This patchset introduces a new BPF program type that allows overriding a tracepoint probe function registered via register_trace_*.
Motivation
Tracepoint probe functions registered via register_trace_* in the kernel cannot be dynamically modified, changing a probe function requires recompiling the kernel and rebooting. Nor can BPF programs change an existing probe function.
I'm confused by what you mean by "tracepoint probe function"?
You mean the function callback that gets called via the "register_trace_*()"?
Overiding tracepoint supports a way to apply patches into kernel quickly (such as applying security ones), through predefined static tracepoints, without waiting for upstream integration.
This sounds way out of scope for tracepoints. Please provide a solid example for this.
This patchset demonstrates the way to override probe functions by BPF program.
Overview
This patchset adds BPF_PROG_TYPE_RAW_TRACEPOINT_OVERRIDE program type. When this type of BPF program attaches, it overrides the target tracepoint probe function.
And it also extends a new struct type "tracepoint_func_snapshot", which extends the tracepoint structure. It is used to record the original probe function registered by kernel after BPF program being attached and restore from it after detachment.
The tracepoint structure exists for every tracepoint in the kernel. By adding a pointer to it, you just increased the size of the tracepoint. I'm already complaining that each tracepoint causes around 5K of memory overhead, and I'd like to make it smaller.
-- Steve
Sorry, I just realized that I forgot to include the CC list in my first reply. Resending with CCs. Apologies to Steven for the extra noise.
On 9/18/2025 3:30 AM, Steven Rostedt wrote:
On Wed, 17 Sep 2025 15:22:39 +0800 Fuyu Zhao zhaofuyu@vivo.com wrote:
Hi everyone,
This patchset introduces a new BPF program type that allows overriding a tracepoint probe function registered via register_trace_*.
Motivation
Tracepoint probe functions registered via register_trace_* in the kernel cannot be dynamically modified, changing a probe function requires recompiling the kernel and rebooting. Nor can BPF programs change an existing probe function.
I'm confused by what you mean by "tracepoint probe function"?
You mean the function callback that gets called via the "register_trace_*()"?
Yes, that’s correct. My earlier wording was not very precise — thanks for pointing that out.
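To make it concrete, by "probe function" I mean a callback registered the way patch 3 of this series does it in bpf_testmod:

    static void raw_tp_override_probe(void *ignored, struct task_struct *task,
                                      struct bpf_testmod_test_write_ctx *ctx)
    {
    }
    ...
    register_trace_bpf_testmod_test_write_bare_tp(raw_tp_override_probe, NULL);

That registered callback is what this series allows a BPF program to override.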
Overiding tracepoint supports a way to apply patches into kernel quickly (such as applying security ones), through predefined static tracepoints, without waiting for upstream integration.
This sounds way out of scope for tracepoints. Please provide a solid example for this.
I appreciate your comment. The example I gave about security patches probably wasn’t a good one here — I just meant to show the idea of changing kernel behavior at runtime. Sorry for the confusion.
At the moment, I don’t have a solid real-world example to provide. This work is still in an exploratory stage.
One possible use case is CPU core selection under certain scenarios. For example, developers may want to experiment with alternative strategies for deciding which CPU a task should run on to improve performance.
If a tracepoint is added as a hook point in this path, then overriding its function callback could make it possible to dynamically adjust the cpu-selection logic without rebuilding and rebooting the kernel.
The same mechanism could also be applied in other kernel paths where developers want to make quick changes from user space.
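Purely as an illustration of the idea (the tracepoint, its arguments and the probe below are hypothetical and do not exist upstream), it could look roughly like this:

    /* hypothetical hook placed in the cpu-selection path */
    trace_select_task_rq_tp(p, prev_cpu, &target_cpu);

    /* default policy, registered in-kernel via
     * register_trace_select_task_rq_tp(default_select_rq_probe, NULL)
     */
    static void default_select_rq_probe(void *data, struct task_struct *p,
                                        int prev_cpu, int *target_cpu)
    {
            /* existing selection logic */
    }

A raw_tp.o program could then override default_select_rq_probe at runtime to experiment with a different policy, and detaching it would restore the original behavior.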
This patchset demonstrates the way to override probe functions by BPF program.
Overview
This patchset adds BPF_PROG_TYPE_RAW_TRACEPOINT_OVERRIDE program type. When this type of BPF program attaches, it overrides the target tracepoint probe function.
And it also extends a new struct type "tracepoint_func_snapshot", which extends the tracepoint structure. It is used to record the original probe function registered by kernel after BPF program being attached and restore from it after detachment.
The tracepoint structure exists for every tracepoint in the kernel. By adding a pointer to it, you just increased the size of the tracepoint. I'm already complaining that each tracepoint causes around 5K of memory overhead, and I'd like to make it smaller.
-- Steve
It is true that adding a pointer to the tracepoint structure increases memory overhead. However, the memory behind the "snapshot" pointer is only allocated after a BPF program is attached, and freed once it is detached.
I am also considering whether it is possible to reuse existing structures to reduce memory usage.
I'd be very grateful for any suggestions or guidance you might have.
Thanks, Fuyu
On Thu, 18 Sep 2025 20:33:22 +0800 Fuyu Zhao zhaofuyu@vivo.com wrote:
At the moment, I don’t have a solid real-world example to provide. This work is still in an exploratory stage.
We shouldn't be in the business of "if you build it, they will come". Unless there is a concrete use case now, I would not be adding anything.
My entire workflow for what I created in the tracing system was "I have a need, I will implement it". The "need" came first. I then wrote code to satisfy that need. It should not be the other way around.
-- Steve
On 9/18/2025 11:24 PM, Steven Rostedt wrote:
On Thu, 18 Sep 2025 20:33:22 +0800 Fuyu Zhao zhaofuyu@vivo.com wrote:
At the moment, I don’t have a solid real-world example to provide. This work is still in an exploratory stage.
We shouldn't be in the business of "if you build it, they will come". Unless there is a concrete use case now, I would not be adding anything.
My entire workflow for what I created in the tracing system was "I have a need, I will implement it". The "need" came first. I then wrote code to satisfy that need. It should not be the other way around.
-- Steve
Thanks a lot for the feedback and guidance.
I understand your point that new functionality should be driven by real needs rather than exploratory ideas.
I’ll keep looking into this. If I find a concrete use case that demonstrates clear value, I’ll bring it back for discussion.
Thanks again.
On Thu Sep 18, 2025 at 3:29 PM UTC, Steven Rostedt wrote:
My entire workflow for what I created in the tracing system was "I have a need, I will implement it". The "need" came first. I then wrote code to satisfy that need. It should not be the other way around.
Tagging on to this sentiment - the kernel's design is emergent and will always remain so.
Speculative features have a very low probability of reflecting the required design language. On the other hand, if someone needs a thing, the need will drive the use of conformal design language.
..Ch:W..
On Wed, Sep 17, 2025 at 12:23 AM Fuyu Zhao zhaofuyu@vivo.com wrote:
Hi everyone,
This patchset introduces a new BPF program type that allows overriding a tracepoint probe function registered via register_trace_*.
Motivation
Tracepoint probe functions registered via register_trace_* in the kernel cannot be dynamically modified, changing a probe function requires recompiling the kernel and rebooting. Nor can BPF programs change an existing probe function.
Overiding tracepoint supports a way to apply patches into kernel quickly (such as applying security ones), through predefined static tracepoints, without waiting for upstream integration.
IIUC, this work solves the same problem as raw tracepoint (raw_tp) or raw tracepoint with btf (tp_btf).
Did I miss something?
Thanks, Song
On 9/18/2025 4:02 AM, Song Liu wrote:
On Wed, Sep 17, 2025 at 12:23 AM Fuyu Zhao zhaofuyu@vivo.com wrote:
Hi everyone,
This patchset introduces a new BPF program type that allows overriding a tracepoint probe function registered via register_trace_*.
Motivation
Tracepoint probe functions registered via register_trace_* in the kernel cannot be dynamically modified, changing a probe function requires recompiling the kernel and rebooting. Nor can BPF programs change an existing probe function.
Overiding tracepoint supports a way to apply patches into kernel quickly (such as applying security ones), through predefined static tracepoints, without waiting for upstream integration.
IIUC, this work solves the same problem as raw tracepoint (raw_tp) or raw tracepoint with btf (tp_btf).
Did I miss something?
Thanks, Song
As I understand it, raw tracepoints (raw_tp) and raw tracepoints with BTF (tp_btf) are designed mainly for tracing the kernel. The goal of this work is to provide a way to override the tracepoint callback, so that kernel behavior can be adjusted dynamically.
Thanks, Fuyu
On Thu, Sep 18, 2025 at 04:05:51PM +0800, Fuyu Zhao wrote:
On 9/18/2025 4:02 AM, Song Liu wrote:
On Wed, Sep 17, 2025 at 12:23 AM Fuyu Zhao zhaofuyu@vivo.com wrote:
Hi everyone,
This patchset introduces a new BPF program type that allows overriding a tracepoint probe function registered via register_trace_*.
Motivation
Tracepoint probe functions registered via register_trace_* in the kernel cannot be dynamically modified, changing a probe function requires recompiling the kernel and rebooting. Nor can BPF programs change an existing probe function.
Overiding tracepoint supports a way to apply patches into kernel quickly (such as applying security ones), through predefined static tracepoints, without waiting for upstream integration.
IIUC, this work solves the same problem as raw tracepoint (raw_tp) or raw tracepoint with btf (tp_btf).
Did I miss something?
Thanks, Song
As I understand it, raw tracepoints (raw_tp) and raw tracepoint (raw_tp) are designed mainly for tracing the kernel. The goal of this work is to provide a way to override the tracepoint callback, so that kernel behavior can be adjusted dynamically.
hi, what's the use case for this? also I'd think you can do that just by unregistering the callback you want to override and registering a new one?
thanks, jirka
On 9/18/2025 4:47 PM, Jiri Olsa wrote:
On Thu, Sep 18, 2025 at 04:05:51PM +0800, Fuyu Zhao wrote:
On 9/18/2025 4:02 AM, Song Liu wrote:
On Wed, Sep 17, 2025 at 12:23 AM Fuyu Zhao zhaofuyu@vivo.com wrote:
Hi everyone,
This patchset introduces a new BPF program type that allows overriding a tracepoint probe function registered via register_trace_*.
Motivation
Tracepoint probe functions registered via register_trace_* in the kernel cannot be dynamically modified, changing a probe function requires recompiling the kernel and rebooting. Nor can BPF programs change an existing probe function.
Overiding tracepoint supports a way to apply patches into kernel quickly (such as applying security ones), through predefined static tracepoints, without waiting for upstream integration.
IIUC, this work solves the same problem as raw tracepoint (raw_tp) or raw tracepoint with btf (tp_btf).
Did I miss something?
Thanks, Song
As I understand it, raw tracepoints (raw_tp) and raw tracepoint (raw_tp) are designed mainly for tracing the kernel. The goal of this work is to provide a way to override the tracepoint callback, so that kernel behavior can be adjusted dynamically.
hi, what's the use case for this? also I'd think you can do that just by unregister the callback you want to override and register new one?
thanks, jirka
At this moment, I don't have a real-world example. However, I mentioned one possible use case in my reply to Steven:
One possible use case is CPU core selection under certain scenarios. For example, developers may want to experiment with alternative strategies for deciding which CPU a task should run on to improve performance. If a tracepoint is added as a hook point in this path, then overriding its function callback could make it possible to dynamically adjust the cpu-selection logic without rebuilding and rebooting the kernel.
As for why we cannot simply unregister the callback and register a new one: callbacks registered directly inside the kernel cannot be unregistered from user space. From user space, we can only attach additional callbacks with BPF programs, but cannot remove or replace the ones already registered in the kernel. Therefore, an override mechanism is needed.
Thanks, Fuyu
On Thu, 18 Sep 2025 21:15:57 +0800 Fuyu Zhao zhaofuyu@vivo.com wrote:
As for the reason not to unregister and register a new callback: callbacks registered directly inside the kernel cannot be unregistered from user space. From user space, we can only attach additional callbacks with BPF programs, but can not remove or replace the ones already registered in the kernel. Therefore, an override mechanism is needed.
The fact that user space cannot unregister or override the current callbacks, to me is a feature and not a bug.
-- Steve
On 9/18/2025 11:32 PM, Steven Rostedt wrote:
On Thu, 18 Sep 2025 21:15:57 +0800 Fuyu Zhao zhaofuyu@vivo.com wrote:
As for the reason not to unregister and register a new callback: callbacks registered directly inside the kernel cannot be unregistered from user space. From user space, we can only attach additional callbacks with BPF programs, but can not remove or replace the ones already registered in the kernel. Therefore, an override mechanism is needed.
The fact that user space cannot unregister or override the current callbacks, to me is a feature and not a bug.
-- Steve
I see, thank you for sharing your view — I’ll keep it in mind.
Sincerely, Fuyu