On Wed, May 28, 2025 at 11:28:54AM +0200, Toke Høiland-Jørgensen wrote:
Mina Almasry <almasrymina@google.com> writes:
On Mon, May 26, 2025 at 5:51 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
Back when you posted the first RFC, Jesper and I chatted about ways to avoid the ugly "load module and read the output from dmesg" interface to the test.
I agree the existing interface is ugly.
One idea we came up with was to make the module include only the "inner" functions for the benchmark, and expose those to BPF as kfuncs. Then the test runner can be a BPF program that runs the tests, collects the data and passes it to userspace via maps or a ringbuffer or something. That's a nicer and more customisable interface than the printk output. And if they're small enough, maybe we could even include the functions into the page_pool code itself, instead of in a separate benchmark module?
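A minimal sketch of what the module side could look like, using the kernel's kfunc registration machinery; bench_page_pool_cycle() is a made-up name and its body is a placeholder, not the actual benchmark code:

/* Hypothetical sketch: expose one "inner" benchmark function as a kfunc
 * so a BPF program can drive it. Untested illustration only. */
#include <linux/module.h>
#include <linux/bpf.h>
#include <linux/btf.h>
#include <linux/btf_ids.h>

__bpf_kfunc_start_defs();

__bpf_kfunc u64 bench_page_pool_cycle(u32 nr_pages)
{
	/* ... one timed alloc/free cycle on a page_pool, returning
	 * the measured cost ... */
	return 0;
}

__bpf_kfunc_end_defs();

BTF_KFUNCS_START(bench_kfunc_ids)
BTF_ID_FLAGS(func, bench_page_pool_cycle)
BTF_KFUNCS_END(bench_kfunc_ids)

static const struct btf_kfunc_id_set bench_kfunc_set = {
	.owner	= THIS_MODULE,
	.set	= &bench_kfunc_ids,
};

static int __init bench_init(void)
{
	/* Make the kfunc callable from BPF_PROG_TYPE_SYSCALL programs,
	 * which can be run on demand via BPF_PROG_TEST_RUN. */
	return register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL,
					 &bench_kfunc_set);
}
module_init(bench_init);
MODULE_LICENSE("GPL");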
WDYT of that idea? :)
...but this sounds like an enormous amount of effort for something that is a bit ugly but isn't THAT bad. Especially for me: I'm not enough of an expert to know how to implement what you're referring to off the top of my head. I'm normally open to spending the time, but this is not that high on my todo list and I have limited bandwidth to resolve it :(
I also feel that this is something that could be improved post merge.
agreed
I think it's very beneficial to have this merged in some form that can be improved later. Byungchul is making a lot of changes to these mm things, and it would be nice to have an easy way to run the benchmark in tree and maybe even get automated results from NIPA. If we could agree on an MVP that is appropriate to merge without too much scope creep, that would be ideal from my side at least.
Right, fair. I guess we can merge it as-is, and then investigate whether we can move it to something BPF-based (or maybe 'perf bench' - Cc acme) later :)
tl;dr: I'd advise merging it as-is, then kfunc'ifying parts of it and using it from a 'perf bench' suite.
Yeah, the model would be what I did for uprobes, but even then there is a selftests based uprobes benchmark ;-)
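For reference, the glue in tools/perf/builtin-bench.c is just a table of collections; a page_pool collection would look roughly like this (a sketch only, the bench_page_pool_fast_path() name is made up, the table layout mirrors the existing collections):

/* Sketch of builtin-bench.c glue for a hypothetical new collection. */
static struct bench page_pool_benchmarks[] = {
	{ "fast_path",	"page_pool fast path recycling cost",	bench_page_pool_fast_path },
	{ "all",	"Run all page_pool benchmarks",		NULL },
	{ NULL,		NULL,					NULL }
};

/* ...plus one line in the collections[] table: */
	{ "page_pool",	"page_pool benchmarks",			page_pool_benchmarks },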
The 'perf bench' part that calls into the skel:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
The skel:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
While this one is just there to generate BPF load to measure the impact on uprobes, for your case it would involve using a ring buffer to communicate from the skel (BPF/kernel side) to the userspace part, similar to what is done in various other BPF-based perf tooling available in:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
Like at this line (BPF skel part):
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tre...
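Stripped down, the skel side of that pattern is just a ringbuf map plus reserve/fill/submit. A sketch (struct bench_event and the map/program names are made up, not code from that tree):

// SPDX-License-Identifier: GPL-2.0
/* BPF-side sketch: reserve a ringbuf sample, fill it, submit it. */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

struct bench_event {
	__u64 cycles;
	__u32 step;
};

struct {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
	__uint(max_entries, 256 * 1024);
} events SEC(".maps");

SEC("syscall")
int run_bench(void *ctx)
{
	struct bench_event *e;

	e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
	if (!e)
		return 0;

	/* Stand-in for calling the benchmark kfunc and recording its cost. */
	e->cycles = bpf_ktime_get_ns();
	e->step = 0;
	bpf_ringbuf_submit(e, 0);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";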
The simplest example is in the canonical, standalone runqslower tool, also hosted in the kernel sources:
BPF skel sending stuff to userspace:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
The userspace part that reads it:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
This is a callback that gets invoked for every event the BPF skel produces; it is called from this loop:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
That handle_event callback was associated via:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
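Boiled down, that association plus the poll loop look something like this (a sketch only; runqslower itself may use a different buffer type, this one uses the libbpf ringbuf API to match the BPF side sketched above, and "bench.skel.h"/bench_bpf are made-up skeleton names):

// SPDX-License-Identifier: GPL-2.0
/* Userspace sketch: associate a callback with the ringbuf, then poll. */
#include <stdio.h>
#include <linux/types.h>
#include <bpf/libbpf.h>
#include "bench.skel.h"

struct bench_event {
	__u64 cycles;
	__u32 step;
};

/* Invoked once per sample submitted by the BPF side. */
static int handle_event(void *ctx, void *data, size_t data_sz)
{
	const struct bench_event *e = data;

	printf("step %u: %llu ns\n", e->step, (unsigned long long)e->cycles);
	return 0;
}

int main(void)
{
	struct bench_bpf *skel = bench_bpf__open_and_load();
	struct ring_buffer *rb = NULL;
	int err = 1;

	if (!skel)
		return 1;

	/* Associate the callback with the ringbuf map fd... */
	rb = ring_buffer__new(bpf_map__fd(skel->maps.events),
			      handle_event, NULL, NULL);
	if (!rb)
		goto out;

	/* ...then poll; handle_event() fires for each submitted sample. */
	err = 0;
	while (ring_buffer__poll(rb, 100 /* timeout, ms */) >= 0)
		;
out:
	ring_buffer__free(rb);
	bench_bpf__destroy(skel);
	return err;
}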
There is a dissection I did of this process a long time ago, but it's still relevant, I think:
http://oldvger.kernel.org/~acme/bpf/devconf.cz-2020-BPF-The-Status-of-BTF-pr...
The part explaining the interaction userspace/kernel starts here:
http://oldvger.kernel.org/~acme/bpf/devconf.cz-2020-BPF-The-Status-of-BTF-pr...
(yeah, it's http, but then, it's _old_vger ;-)
Doing it in perf is interesting because perf gets widely packaged, so whatever you add to it gets visibility for people using 'perf bench' and also becomes available in most places. It would add to this collection:
root@number:~# perf bench
Usage: perf bench [<common options>] <collection> <benchmark> [<options>]

        # List of all available benchmark collections:

         sched: Scheduler and IPC benchmarks
       syscall: System call benchmarks
           mem: Memory access benchmarks
          numa: NUMA scheduling and MM benchmarks
         futex: Futex stressing benchmarks
         epoll: Epoll stressing benchmarks
     internals: Perf-internals benchmarks
    breakpoint: Breakpoint benchmarks
        uprobe: uprobe benchmarks
           all: All benchmarks
root@number:~#
The 'perf bench' benchmark that uses a BPF skel:
root@number:~# perf bench uprobe baseline
# Running 'uprobe/baseline' benchmark:
# Executed 1,000 usleep(1000) calls
Total time: 1,050,383 usecs

  1,050.383 usecs/op
root@number:~# perf trace --summary perf bench uprobe trace_printk
# Running 'uprobe/trace_printk' benchmark:
# Executed 1,000 usleep(1000) calls
Total time: 1,053,082 usecs

  1,053.082 usecs/op

 Summary of events:

 uprobe-trace_pr (1247691), 3316 events, 96.9%

   syscall            calls  errors  total       min       avg       max    stddev
                                     (msec)    (msec)    (msec)    (msec)      (%)
   --------------- --------  ------ -------- --------- --------- --------- ------
   clock_nanosleep     1000      0  1101.236     1.007     1.101    50.939   4.53%
   close                 98      0    32.979     0.001     0.337    32.821  99.52%
   perf_event_open        1      0    18.691    18.691    18.691    18.691   0.00%
   mmap                 209      0     0.567     0.001     0.003     0.007   2.59%
   bpf                   38      2     0.380     0.000     0.010     0.092  28.38%
   openat                65      0     0.171     0.001     0.003     0.012   7.14%
   mprotect              56      0     0.141     0.001     0.003     0.008   6.86%
   read                  68      0     0.082     0.001     0.001     0.010  11.60%
   fstat                 65      0     0.056     0.001     0.001     0.003   5.40%
   brk                   10      0     0.050     0.001     0.005     0.012  24.29%
   pread64                8      0     0.042     0.001     0.005     0.021  49.29%
   <SNIP other syscalls>
root@number:~#
- Arnaldo