On Tue, Sep 10, 2024 at 11:36 AM Alexei Starovoitov alexei.starovoitov@gmail.com wrote:
On Mon, Sep 9, 2024 at 5:55 PM Daniel Xu dxu@dxuuu.xyz wrote:
Right now there exists prog produce / userspace consume and userspace produce / prog consume support. But it is also useful to have prog produce / prog consume.
For example, we want to track the latency overhead of cpumap in production. Since we need to store enqueue timestamps somewhere and cpumap is MPSC, we need an MPSC data structure to shadow cpumap. BPF ringbuf is such a data structure. Rather than reimplement (possibly poorly) a custom ringbuffer in BPF, extend the existing interface to allow the final quadrant of ringbuf usecases to be filled. Note we ignore userspace to userspace use case - there is no need to involve kernel for that.
Signed-off-by: Daniel Xu dxu@dxuuu.xyz
kernel/bpf/verifier.c | 6 +- tools/testing/selftests/bpf/Makefile | 3 +- .../selftests/bpf/prog_tests/ringbuf.c | 50 +++++++++++++++ .../bpf/progs/test_ringbuf_bpf_to_bpf.c | 64 +++++++++++++++++++ 4 files changed, 120 insertions(+), 3 deletions(-) create mode 100644 tools/testing/selftests/bpf/progs/test_ringbuf_bpf_to_bpf.c
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 53d0556fbbf3..56bfe559f228 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -9142,7 +9142,8 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env, func_id != BPF_FUNC_ringbuf_query && func_id != BPF_FUNC_ringbuf_reserve_dynptr && func_id != BPF_FUNC_ringbuf_submit_dynptr &&
func_id != BPF_FUNC_ringbuf_discard_dynptr)
func_id != BPF_FUNC_ringbuf_discard_dynptr &&
func_id != BPF_FUNC_user_ringbuf_drain) goto error; break; case BPF_MAP_TYPE_USER_RINGBUF:
@@ -9276,7 +9277,8 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env, goto error; break; case BPF_FUNC_user_ringbuf_drain:
if (map->map_type != BPF_MAP_TYPE_USER_RINGBUF)
if (map->map_type != BPF_MAP_TYPE_USER_RINGBUF &&
map->map_type != BPF_MAP_TYPE_RINGBUF) goto error;
I think it should work.
Andrii,
do you see any issues with such use?
Not from a quick glance. Both ringbufs have the same memory layout, so user_ringbuf_drain() should probably work fine for normal BPF ringbuf (and either way bpf_user_ringbuf_drain() has to protect from malicious user space, so its code is pretty unassuming).
We should make it very explicit, though, that the user is responsible for making sure that bpf_user_ringbuf_drain() will not be called simultaneously in two threads, kernel or user space.
Also, Daniel, can you please make sure that dynptr we return for each sample is read-only? We shouldn't let consumer BPF program ability to corrupt ringbuf record headers (accidentally or otherwise).
And as a thought exercise. I wonder what would it take to have an open-coded iterator returning these read-only dynptrs for each consumed record? Maybe we already have all the pieces together. So consider looking into that as well.
P.S. And yeah "user_" part in helper name is kind of unfortunate given it will work for both ringbufs. Can/should we add some sort of alias for this helper so it can be used with both bpf_user_ringbuf_drain() and bpf_ringbuf_drain() names?
break; case BPF_FUNC_get_stackid:
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index 9905e3739dd0..233109843d4d 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -503,7 +503,8 @@ LINKED_SKELS := test_static_linked.skel.h linked_funcs.skel.h \ LSKELS := fentry_test.c fexit_test.c fexit_sleep.c atomics.c \ trace_printk.c trace_vprintk.c map_ptr_kern.c \ core_kern.c core_kern_overflow.c test_ringbuf.c \
test_ringbuf_n.c test_ringbuf_map_key.c test_ringbuf_write.c
test_ringbuf_n.c test_ringbuf_map_key.c test_ringbuf_write.c \
test_ringbuf_bpf_to_bpf.c
[...]