On 28/07/2025 16.34, Alexei Starovoitov wrote:
diff --git a/tools/testing/selftests/bpf/progs/lpm_trie_bench.c b/tools/testing/selftests/bpf/progs/lpm_trie_bench.c
new file mode 100644
index 000000000000..522e1cbef490
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/lpm_trie_bench.c
[...]
+static void gen_random_key(struct trie_key *key)
+{
+	key->prefixlen = prefixlen;
+	key->data = bpf_get_prandom_u32() % nr_entries;
bpf_get_prandom_u32() is not free and modulo operation isn't free either. The benchmark includes their time. It's ok to have it, but add a mode where the bench tests linear lookup/update too with simple key.data++
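A linear mode along the lines suggested above might look roughly like the sketch below. It is plain userspace C for illustration only, not the code from the patch: the layout of struct trie_key, the values given to prefixlen and nr_entries, and the names gen_sequential_key and next_key are all assumptions.

```c
#include <stdint.h>

/* Hypothetical stand-in for the benchmark's key layout. */
struct trie_key {
	uint32_t prefixlen;
	uint32_t data;
};

static uint32_t prefixlen = 32;	/* assumed value */
static uint32_t nr_entries = 4;	/* assumed value, kept small for the example */
static uint32_t next_key;	/* hypothetical per-run counter */

/*
 * Sequential key generation: no PRNG call, and the wrap-around is a
 * compare-and-reset rather than a modulo, so key generation adds very
 * little to the measured time.
 */
static void gen_sequential_key(struct trie_key *key)
{
	key->prefixlen = prefixlen;
	key->data = next_key++;
	if (next_key >= nr_entries)
		next_key = 0;
}
```

With nr_entries set to 4, successive calls yield key->data values 0, 1, 2, 3 and then wrap back to 0.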
I've extended this bench with a "noop" and "baseline" benchmark[1].
[1] https://lore.kernel.org/all/175509897596.2755384.18413775753563966331.stgit@...
This allowed us to measure and deduce that:
  bpf_get_prandom_u32() % nr_entries
takes about 14.1 ns for the rand + modulo.
The "noop" test shows the harness overhead is 13.402 ns/op, and the "baseline" test, which adds the random key generation on top of that, takes 27.529 ns/op. The difference, 27.529 - 13.402 = 14.127 ns/op, is the cost of the rand + modulo.
--Jesper