On 2024/9/10 19:27, Jesper Dangaard Brouer wrote: ...
The main issue I remembered was that it only supports x86 :(
Yes, because I added ASM code for reading the TSC counter in a very precise manner. Given that we run many iterations, I don't think we need such precise readings. I guess it can simply be replaced with get_cycles() or get_cycles64(). Then it should work on all archs.
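A rough user-space sketch of why timer precision matters little when the loop count is large: read the clock once before and once after the whole loop, then divide by the number of iterations, so the cost and jitter of the two reads are spread across every iteration. Here clock_gettime(CLOCK_MONOTONIC) stands in for get_cycles()/ktime_get(), and now_ns(), op_under_test() and bench_ns_per_op() are hypothetical names, not code from either module.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic timestamp in ns; a user-space stand-in for get_cycles()/ktime_get(). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Hypothetical operation under test; volatile keeps the loop from being optimized away. */
static volatile uint64_t sink;

static void op_under_test(uint64_t i)
{
	sink += i;
}

/*
 * Time the whole loop once and divide by the iteration count, so the
 * cost and imprecision of the two timer reads are amortized over all loops.
 */
static double bench_ns_per_op(uint64_t loops)
{
	uint64_t start, stop, i;

	assert(loops > 0);
	start = now_ns();
	for (i = 0; i < loops; i++)
		op_under_test(i);
	stop = now_ns();

	return (double)(stop - start) / (double)loops;
}
```

With, say, bench_ns_per_op(10000000), any fixed error in the two clock reads is divided by ten million, which is the point being made above.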
Agreed.
The code already supports wall-clock time via ktime_get() (specifically ktime_get_real_ts64()).
My preference here (for the performance part) is to upstream the out-of-tree tests that Jesper (and probably others) are using, rather than adding a new performance test that is not as battle-hardened.
I looked through the out-of-tree tests again, and it seems we can take the best of them. For Jesper's ko: it seems we can prefill the ring, as pp_fill_ptr_ring() does in bench_page_pool_simple.c, to avoid noise from the page allocator.
For the ko in this patch: it uses NAPI rather than a tasklet that mimics the NAPI context, supports testing with the PP_FLAG_DMA_MAP flag, and returns -EAGAIN from module_init() so that perf stat can be used to collect and calculate the performance data.
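To illustrate the prefill idea, here is a hypothetical user-space sketch (not the actual pp_fill_ptr_ring() code, and not the kernel ptr_ring API): all objects are allocated into a small ring before timing starts, and the timed loop only recycles them through the ring, so malloc()/free() never show up in the measurement. The ring_* helpers, recycle_loop() and RING_SIZE are made-up names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RING_SIZE 64 /* hypothetical capacity; power of two keeps wrap-around consistent */

/* Minimal single-threaded pointer ring; a stand-in, not the kernel's ptr_ring. */
struct ring {
	void *slot[RING_SIZE];
	size_t head, tail; /* head: next slot to consume, tail: next slot to produce */
};

static int ring_produce(struct ring *r, void *ptr)
{
	if (r->tail - r->head == RING_SIZE)
		return -1; /* full */
	r->slot[r->tail++ % RING_SIZE] = ptr;
	return 0;
}

static void *ring_consume(struct ring *r)
{
	if (r->head == r->tail)
		return NULL; /* empty */
	return r->slot[r->head++ % RING_SIZE];
}

/* Prefill: do every allocation up front, outside the timed region. */
static size_t ring_prefill(struct ring *r, size_t n, size_t obj_size)
{
	size_t i;

	assert(obj_size > 0);
	for (i = 0; i < n; i++) {
		void *obj = malloc(obj_size);

		if (!obj || ring_produce(r, obj)) {
			free(obj); /* free(NULL) is a no-op */
			break;
		}
	}
	return i; /* number of objects actually placed in the ring */
}

/* Timed-loop body: recycle objects through the ring without touching the allocator. */
static size_t recycle_loop(struct ring *r, size_t loops)
{
	size_t i, done = 0;

	for (i = 0; i < loops; i++) {
		void *obj = ring_consume(r);

		if (!obj)
			break;
		ring_produce(r, obj); /* cannot fail: a slot was just freed */
		done++;
	}
	return done;
}

/* Teardown, also outside the timed region. */
static void ring_drain(struct ring *r)
{
	void *obj;

	while ((obj = ring_consume(r)))
		free(obj);
}
```

Only recycle_loop() would sit between the start/stop timestamps; prefill and drain happen before and after.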
My bench doesn't return a negative value on module load, because I used perf record, and perf report can only decode the module's symbols while the module is still loaded.
I started on reading the PMU counters[1] around the bench loop. It works if you enable the PMU counters yourself (manually), but I never finished that work.
[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/include/...
Are there other test cases or better practices we can learn from Jesper's out-of-tree ko?
I created a time_bench.c [2] module that other modules [3] can use, to make it easier to reuse the benchmarking code.
[2] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time...
[3] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/benc...
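As a rough sketch of the reuse pattern, assuming a callback-based design: a shared runner owns the timing and the per-op arithmetic, and each benchmark only supplies a loop function. This is a simplified user-space illustration; struct bench_record, bench_run() and bench_counter_inc() are hypothetical and do not reproduce the actual time_bench.c API.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical result record shared by all benchmarks. */
struct bench_record {
	uint64_t loops;      /* iterations requested */
	uint64_t loops_done; /* iterations the callback reports having run */
	uint64_t elapsed_ns; /* time for the whole callback run */
	double ns_per_op;    /* elapsed_ns / loops_done */
};

/* Benchmark callback: runs its own loop rec->loops times, returns iterations done. */
typedef uint64_t (*bench_fn)(struct bench_record *rec, void *data);

static uint64_t bench_now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Shared runner: every benchmark reuses the same timing and bookkeeping. */
static int bench_run(uint64_t loops, bench_fn fn, void *data, struct bench_record *rec)
{
	uint64_t start, stop;

	assert(fn && rec && loops > 0);
	rec->loops = loops;
	start = bench_now_ns();
	rec->loops_done = fn(rec, data);
	stop = bench_now_ns();
	if (rec->loops_done == 0)
		return -1; /* nothing measured */
	rec->elapsed_ns = stop - start;
	rec->ns_per_op = (double)rec->elapsed_ns / (double)rec->loops_done;
	return 0;
}

/* Example benchmark: increment a counter passed in via data. */
static uint64_t bench_counter_inc(struct bench_record *rec, void *data)
{
	volatile uint64_t *counter = data;
	uint64_t i;

	for (i = 0; i < rec->loops; i++)
		(*counter)++;
	return i;
}
```

Usage would be along the lines of bench_run(1000000, bench_counter_inc, &counter, &rec), then reading rec.ns_per_op. Letting the callback own the loop keeps the indirect call out of the per-iteration path.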
Will take a look at it, thanks.
--Jesper