On Mon, Jul 28, 2025 at 3:35 PM Alexei Starovoitov alexei.starovoitov@gmail.com wrote:
Please make a full description of what the test does, since it's not trivial to decipher from the code. If I'm reading it correctly, first, the user space makes the LPM completely full and then lookup/update use random key-s within range.
Yep, that's correct. I'll provide a more comprehensive description in v4.
But delete looks different. It seems the kernel delete operation can interleave with user space refilling the LPM, so ...
lookup: throughput 7.423 ± 0.023 M ops/s ( 7.423M ops/prod), latency 134.710 ns/op update: throughput 2.643 ± 0.015 M ops/s ( 2.643M ops/prod), latency 378.310 ns/op delete: throughput 0.712 ± 0.008 M ops/s ( 0.712M ops/prod), latency 1405.152 ns/op
this comparison doesn't look like apples to apples, since delete will include user space refill time.
Yeah, you're right. Though we only measure the delete time from in the BPF prog, delete throughput is essentially blocked while the refill happens and because things are measured with a 1-second timer (bench.c:sigalarm_handler) the refill time gets accounted for anyway. I could try extrapolating the delete time like I've done for the free op, i.e. we calculate how many ops were completed in what fraction of a second.
free: throughput 0.574 ± 0.003 K ops/s ( 0.574K ops/prod), latency 1.743 ms/op
Does this measure the free-ing of full LPM ?
Yes, this measures the total time to free every element in the trie.
+static void __lpm_validate(void)
why double underscore ? Just lpm_validate() ?
The double underscore is used for the common validation parts, but I'll rename this to include "_common()" (along with all other uses of __).
/*
* Ideally we'd have access to the map ID but that's already
* freed before we enter trie_free().
*/
BPF_CORE_READ_STR_INTO(&name, map, name);
if (bpf_strncmp(name, BPF_OBJ_NAME_LEN, "trie_free_map"))
return 0;
val = bpf_ktime_get_ns();
bpf_map_update_elem(&latency_free_start, &map, &val, BPF_ANY);
Looks like there is only one lpm map. What's the point of that extra map ? Just have a global var for start time ?
Sure, I can make this a global variable for the start time instead.
bpf_get_prandom_u32() is not free and modulo operation isn't free either. The benchmark includes their time. It's ok to have it, but add a mode where the bench tests linear lookup/update too with simple key.data++
Good idea.
Since the map suppose to full before we start all keys should be there, right?
Yes.
Let's add a sanity check that update() succeeds.
Will do.
+static int delete (__u32 index, bool *need_refill) +{
struct trie_key key = {
.data = deleted_entries,
.prefixlen = prefixlen,
};
bpf_map_delete_elem(&trie_map, &key);
/* Do we need to refill the map? */
if (++deleted_entries == nr_entries) {
/*
* Atomicity isn't required because DELETE only supports
* one producer running concurrently. What we need is a
* way to track how many entries have been deleted from
* the trie between consecutive invocations of the BPF
* prog because a single bpf_loop() call might not
* delete all entries, e.g. when NR_LOOPS < nr_entries.
*/
deleted_entries = 0;
*need_refill = true;
return 1;
This early break is another reason that makes 'delete' op different from 'lookup/update'. Pls make all 3 consistent, so they can be compared to each other.
Hmm.. I'm not quite sure how to do that. lookup/update don't modify the number of entries in the map whereas delete does (update only updates existing entries, it doesn't create new ones). So when the map is empty it needs to be refilled. You're right that somehow the time to refill needs to be removed from the delete throughput/latency numbers but fundamentally these 3 ops are not the same and I don't see how to treat them as such.