This reply is being resent because the previous email was missing the In-Reply-To header.
On Mon, 2025-02-10 at 16:06 -0800, Andrii Nakryiko wrote:
Tracking associated maps for a program is not necessary. Once the last BPF program using a BPF map is unloaded, the kernel will automatically free the no-longer-referenced map. Note that bpf_object itself keeps FDs for BPF maps, so you'd need to make sure to call bpf_object__close() to release those references.
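To make the lifetime rule concrete, here is a toy model (plain C, not kernel code) of the reference counting described above. The map stays alive while either a loaded program or an open FD, such as the one a bpf_object keeps until bpf_object__close(), still references it. struct toy_map and its helpers are hypothetical names used only for illustration.

```c
#include <stdbool.h>

/* Toy model, not kernel code: a BPF map stays alive while anything
 * references it, whether a loaded program or an open FD (such as the
 * one bpf_object keeps until bpf_object__close()). */
struct toy_map {
    int refcnt;   /* programs + open FDs currently referencing the map */
    bool freed;   /* set once the last reference has been dropped */
};

/* Take a reference: a program was loaded or an FD was opened. */
static void toy_map_get(struct toy_map *m)
{
    m->refcnt++;
}

/* Drop a reference; the kernel would free the map when it hits zero. */
static void toy_map_put(struct toy_map *m)
{
    if (--m->refcnt == 0)
        m->freed = true;
}
```

In this model, unloading the last program alone does not free the map while the object's FD is still open; both references have to go.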
But if you are going to ask to re-create BPF maps the next time the BPF program is loaded... Well, I'd say you are asking for a bit too much, tbh. If you want to be *that* sophisticated, it shouldn't be too hard for you to get all this information from the BPF program's instructions.
We really are that sophisticated (see below for more details). We could scan program instructions, but we'd then tie our logic to BPF implementation details and duplicate logic already present in libbpf (https://elixir.bootlin.com/linux/v6.13.2/source/tools/lib/bpf/libbpf.c#L6087). Obviously this *can* be done, but it's not at all ideal from an application perspective.
I agree it's not ideal, but it's also not some complicated, bound-to-change logic. What you point out in the libbpf source code is a somewhat different thing; the reality is much simpler. Only the so-called ldimm64 instruction (BPF_LD | BPF_IMM | BPF_DW opcode) can reference a map FD, so analysing this is borderline trivial. And this is part of the BPF ISA, so it's not going to change.
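As a rough illustration of why this scan is simple, the sketch below walks an instruction array and collects the distinct map FDs referenced by ldimm64 instructions whose src_reg is BPF_PSEUDO_MAP_FD or BPF_PSEUDO_MAP_VALUE. It assumes map references have already been resolved to FDs, as they are right before loading. To stay self-contained it mirrors struct bpf_insn and the relevant constants from <linux/bpf.h> locally; collect_map_fds() is a hypothetical helper, not a libbpf API.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the UAPI struct bpf_insn from <linux/bpf.h>, so the
 * sketch compiles without kernel headers. */
struct bpf_insn {
    uint8_t code;        /* opcode */
    uint8_t dst_reg : 4; /* destination register */
    uint8_t src_reg : 4; /* source register / pseudo marker for ldimm64 */
    int16_t off;         /* signed offset */
    int32_t imm;         /* signed immediate */
};

/* Values mirroring the UAPI definitions. */
#define LDIMM64_OPCODE   0x18 /* BPF_LD | BPF_IMM | BPF_DW */
#define PSEUDO_MAP_FD    1    /* BPF_PSEUDO_MAP_FD */
#define PSEUDO_MAP_VALUE 2    /* BPF_PSEUDO_MAP_VALUE */

/* Collect the distinct map FDs referenced by ldimm64 instructions.
 * Returns the number of FDs stored in fds (never more than max_fds). */
static size_t collect_map_fds(const struct bpf_insn *insns, size_t cnt,
                              int *fds, size_t max_fds)
{
    size_t n = 0;

    for (size_t i = 0; i < cnt; i++) {
        const struct bpf_insn *insn = &insns[i];

        if (insn->code != LDIMM64_OPCODE)
            continue;

        if (insn->src_reg == PSEUDO_MAP_FD ||
            insn->src_reg == PSEUDO_MAP_VALUE) {
            int fd = insn->imm; /* first slot's imm holds the map FD */
            size_t j;

            for (j = 0; j < n; j++)
                if (fds[j] == fd)
                    break;
            if (j == n && n < max_fds)
                fds[n++] = fd;
        }
        /* ldimm64 occupies two 8-byte slots; skip the second one. */
        i++;
    }
    return n;
}
```

Real code would include <linux/bpf.h> instead of the local mirror; other src_reg encodings are not handled by this sketch.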
Our approach is to associate an array of maps with each BPF program as a property that is initialised at the relocation stage. This way we do not need to parse BPF program instructions; instead, we rely on the recorded relocations. I think this is a more robust and cleaner solution, with the advantage that all the code lives in one place and works at the higher level of abstraction provided by the relocation table.
Mainline libbpf keeps an array of maps per bpf_object; we extended this by adding an array of maps associated with each bpf_program.
For example, here is an excerpt from our development branch that associates a map with a bpf_program during the relocation phase:
insn[0].src_reg = BPF_PSEUDO_MAP_FD;
insn[0].imm = map->fd;
err = bpf_program__add_map(prog, map);
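The per-program map array described above might look roughly like the following. This is a hypothetical sketch, not the development-branch code: struct hyp_map, struct hyp_prog_maps and hyp_prog_add_map() are invented names, and deduplication is an assumption, on the reasoning that one map can be referenced by several relocations in the same program.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a libbpf map; only the FD matters here. */
struct hyp_map {
    int fd;
};

/* Hypothetical per-program list of associated maps, filled while
 * relocations are applied (mainline libbpf has no such field). */
struct hyp_prog_maps {
    struct hyp_map **maps;
    size_t cnt;
    size_t cap;
};

/* Record that the program references map. A map referenced by several
 * relocations is stored only once. Returns 0 or -ENOMEM. */
static int hyp_prog_add_map(struct hyp_prog_maps *pm, struct hyp_map *map)
{
    for (size_t i = 0; i < pm->cnt; i++)
        if (pm->maps[i] == map)
            return 0; /* already recorded */

    if (pm->cnt == pm->cap) {
        size_t new_cap = pm->cap ? pm->cap * 2 : 4;
        struct hyp_map **tmp = realloc(pm->maps, new_cap * sizeof(*tmp));

        if (!tmp)
            return -ENOMEM;
        pm->maps = tmp;
        pm->cap = new_cap;
    }
    pm->maps[pm->cnt++] = map;
    return 0;
}
```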
bpf_object is the unit of coherence in libbpf, so I don't see us refcounting maps between bpf_objects. The kernel does refcounting based on FDs, so see if you can use that.
I can understand that. That said, if there's no logic spanning objects and bpf_object access is not thread-safe, it puts us in a tough situation:
- Complex refcounting, code scanning, etc. to keep things consistent when manipulating maps used by multiple programs.
- Poorly balanced parallel loading if we split programs across objects.
We could alternatively write our own custom loader, but then we’d have to duplicate much of the useful logic that libbpf already implements: skeleton generation, map/program association, embedding programs into ELFs, loading logic and kernel probing, etc. We’d like some way to handle dynamic/parallel loading without having to replicate all the advantages libbpf grants us.
Yeah, I can understand that as well, but bpf_object's single-threaded design, and the fact that bpf_object__load() is kind of the final step where programs are loaded (or not), are pretty baked in. I don't see bpf_object becoming multi-threaded.
We understood this, but the current bpf_object design allowed us to use it in a multithreaded environment with only minor modifications to bpf_program loading.
We understand that libbpf's single-threaded design choice is unlikely to be reconsidered.
Yeah, I can understand that as well, but bpf_object's single-threaded design, and the fact that bpf_object__load() is kind of the final step where programs are loaded (or not), are pretty baked in. I don't see bpf_object becoming multi-threaded. Dynamic program loading, unloading and re-loading is something that I can't yet justify, tbh.
So the best I can propose is for you to use libbpf's skeleton and bpf_object concept for, effectively, ELF handling, relocations, and all the preparation up to loading BPF programs. After that, you can take over loading and handling program lifetime outside of bpf_object.
Dynamic map creation after bpf_object__load() is, I think, completely out of scope, and you'll have to solve that problem yourself. I would point out, though, that internally libbpf has already switched to sort of pre-creating stable FDs for maps before they are actually created in the kernel. So it's conceivable that we could have more granularity in bpf_object preparation: the first step would parse the ELF, handle relocations and prepare everything; after that, a step to create maps; and then another one to create programs. Usually people would do all of that, but you could stop right before map creation or before program creation, whichever fits your use case better.
The key is that program instructions will be final and won't need adjustments regardless of whether maps are actually created or not. FDs, as I mentioned, are stable regardless.
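The general idea of a stable FD can be illustrated with plain POSIX calls: reserve a descriptor number with a placeholder, embed that number wherever it is needed, and later dup2() the real object onto it. This is a minimal sketch of the idea only; libbpf's internal implementation differs in detail, and the /dev/null placeholder and the helper names reserve_fd() and install_at_fd() are assumptions of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Reserve a descriptor number before the real object exists.
 * /dev/null is an arbitrary placeholder chosen for this sketch. */
static int reserve_fd(void)
{
    return open("/dev/null", O_RDONLY | O_CLOEXEC);
}

/* Move the real object (real_fd) onto the reserved number so any code
 * that already embedded fixed_fd keeps working. dup2() closes the
 * placeholder atomically. Returns fixed_fd, or -1 on error. */
static int install_at_fd(int fixed_fd, int real_fd)
{
    if (real_fd == fixed_fd)
        return fixed_fd;
    if (dup2(real_fd, fixed_fd) < 0)
        return -1;
    close(real_fd); /* fixed_fd now refers to the real object */
    /* dup2() clears close-on-exec on the new descriptor; restore it. */
    if (fcntl(fixed_fd, F_SETFD, FD_CLOEXEC) < 0)
        return -1;
    return fixed_fd;
}
```

The number handed out by reserve_fd() never changes, which is why anything that captured it early (for example, instructions prepared before the object was created) does not need to be patched afterwards.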
We rely on this in our design: it means we do not need to scan BPF program instructions to fix up the map FDs referenced by a dynamically loaded bpf_program whose maps are created dynamically.
The use case here is that our security monitoring agent leverages eBPF as its foundational technology to gather telemetry from the kernel. As part of that, we hook many different kernel subsystems (process, memory, filesystem, network, etc.), tying them together and tracking state with maps. So we legitimately have a very large number of programs, all doing different work. For a product at this scale, loading this set of programs and their maps in an optimized, parallel fashion, and subsequently changing the loaded set of programs and maps dynamically without disturbing the rest of the application, improves both security and performance.
Yes, makes sense. You'll need to decide for yourself whether it's more meaningful to split those 200 programs into independent bpf_objects by feature and be rigorous about sharing state (maps) through bpf_map__reuse_fd(), which would let you parallelize loading within the confines of the existing libbpf APIs. Or you can go a bit more low-level and load programs outside of the bpf_object API, as I described above.
Yes, this is one way to share BPF maps across multiple bpf_objects and use existing libbpf for parallel BPF program loading while keeping full libbpf compatibility, but at the cost of a more complicated design: we would need to convert a single-bpf_object model into multiple bpf_objects with a new layer that manages them.
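A minimal sketch of the bookkeeping such a managing layer would need, assuming shared maps are identified by name: the first bpf_object that creates a shared map publishes its FD, and later objects look it up before loading and pass it to bpf_map__reuse_fd(). struct map_registry, registry_lookup() and registry_publish() are hypothetical names; the fixed capacity and 64-byte name limit are arbitrary choices for the sketch.

```c
#include <stddef.h>
#include <string.h>

#define REG_MAX_MAPS 64

/* Hypothetical registry shared by all per-program bpf_objects:
 * one FD per shared map name. */
struct map_registry {
    char names[REG_MAX_MAPS][64];
    int fds[REG_MAX_MAPS];
    size_t cnt;
};

/* FD of an already-created map, or -1 if no object has created it yet. */
static int registry_lookup(const struct map_registry *reg, const char *name)
{
    for (size_t i = 0; i < reg->cnt; i++)
        if (strcmp(reg->names[i], name) == 0)
            return reg->fds[i];
    return -1;
}

/* Publish the FD of a map created by the first bpf_object that needed it.
 * Returns 0, or -1 if fd is invalid, the registry is full, the name is too
 * long, or the name is already registered. */
static int registry_publish(struct map_registry *reg, const char *name, int fd)
{
    if (fd < 0 || reg->cnt == REG_MAX_MAPS ||
        strlen(name) >= sizeof(reg->names[0]) ||
        registry_lookup(reg, name) >= 0)
        return -1;
    strcpy(reg->names[reg->cnt], name);
    reg->fds[reg->cnt++] = fd;
    return 0;
}
```

With loader threads running in parallel, the lookup, map creation and publish steps for a given name would need to happen under one lock; otherwise two objects could both miss in the registry and each create its own copy of the map.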
In our case, because a bpf_program can map to multiple features that can be modified independently, and to achieve even load balancing across multiple threads, we would probably end up with one bpf_program per bpf_object.
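To illustrate why finer-grained units balance better, here is a sketch of greedy longest-processing-time (LPT) assignment: sort work units by estimated cost and give each one to the currently least-loaded thread. The cost values (for example, an instruction count per bpf_object) and balance_lpt() are assumptions for illustration, not part of libbpf or of our loader.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort() comparator that orders costs from largest to smallest. */
static int cmp_desc(const void *a, const void *b)
{
    unsigned x = *(const unsigned *)a, y = *(const unsigned *)b;

    return (x < y) - (x > y);
}

/* Greedy LPT: sort costs descending (in place), then assign each unit to
 * the least-loaded worker. load[] receives per-worker totals.
 * Returns the makespan, i.e. the largest per-worker total. */
static unsigned balance_lpt(unsigned *costs, size_t n,
                            unsigned *load, size_t workers)
{
    unsigned max = 0;

    if (workers == 0)
        return 0;
    qsort(costs, n, sizeof(*costs), cmp_desc);
    for (size_t w = 0; w < workers; w++)
        load[w] = 0;
    for (size_t i = 0; i < n; i++) {
        size_t best = 0;

        for (size_t w = 1; w < workers; w++)
            if (load[w] < load[best])
                best = w;
        load[best] += costs[i];
    }
    for (size_t w = 0; w < workers; w++)
        if (load[w] > max)
            max = load[w];
    return max;
}
```

For example, four programs of cost 5 split over two threads give a maximum per-thread load of 10 when each program is its own unit, but 15 when three of them are grouped into one bpf_object.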