While exploring the uretprobe syscall and trampoline for ARM64, we observed a slight performance gain in Redis benchmarks when using the uretprobe syscall. This patchset aims to further improve uretprobe performance by optimizing the management of struct return_instance data.
In detail, uretprobe uses dynamically allocated memory for struct return_instance data, which tracks the call chain of instrumented functions. This approach is inefficient, especially considering the inherent locality of function invocation.
This patchset proposes a rework of the return_instance management. It replaces dynamic memory allocation with a statically allocated array. This approach leverages the stack-style usage of return_instance and removes the need for kmalloc/kfree operations.
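The stack-style usage described above can be illustrated with a minimal userspace sketch. It is not the code from this patchset: struct ri_entry, struct ri_stack, RI_MAX_DEPTH, ri_push() and ri_pop() are hypothetical names, and the fields are a simplified stand-in for struct return_instance. It only shows how a fixed array with a depth counter can replace a per-call allocation and free.

```c
#include <assert.h>
#include <stddef.h>

#define RI_MAX_DEPTH 64  /* hypothetical cap on nested uretprobe frames */

/* Simplified stand-in for struct return_instance; fields are illustrative. */
struct ri_entry {
    unsigned long func;          /* entry address of the probed function */
    unsigned long orig_ret_addr; /* return address that was hijacked */
};

/* Per-task storage: a fixed array used as a stack, no heap allocation. */
struct ri_stack {
    struct ri_entry slots[RI_MAX_DEPTH];
    size_t depth;                /* number of live entries */
};

/* Reserve the next slot on function entry; NULL when the stack is full. */
static struct ri_entry *ri_push(struct ri_stack *s, unsigned long func,
                                unsigned long ret_addr)
{
    struct ri_entry *e;

    if (s->depth >= RI_MAX_DEPTH)
        return NULL;
    e = &s->slots[s->depth++];
    e->func = func;
    e->orig_ret_addr = ret_addr;
    return e;
}

/* Release the most recent slot on function return; NULL when empty. */
static struct ri_entry *ri_pop(struct ri_stack *s)
{
    if (s->depth == 0)
        return NULL;
    return &s->slots[--s->depth];
}
```

In this sketch, push and pop are each a bounds check plus an index update, and consecutive entries sit next to each other in memory, which is the locality argument made above. The pointer returned by ri_pop() stays valid only until the next ri_push() reuses that slot.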
This patchset has been tested on Kunpeng916 (Hi1616), 4 NUMA nodes, 64 cores @ 2.4GHz. Redis benchmarks show a throughput gain of about 2% for the Redis GET and SET commands compared with the current uretprobe implementation:
------------------------------------------------------------------
Test case       | No uretprobes | uretprobes     | uretprobes
                |               | (current)      | (optimized)
==================================================================
Redis SET (RPS) | 47025         | 40619 (-13.6%) | 41529 (-11.6%)
------------------------------------------------------------------
Redis GET (RPS) | 46715         | 41426 (-11.3%) | 42306 (-9.4%)
------------------------------------------------------------------
Liao Chang (2):
  uprobes: Optimize the return_instance related routines
  selftests/bpf: Add uretprobe test for return_instance management
 include/linux/uprobes.h                       |  10 +-
 kernel/events/uprobes.c                       | 162 +++++++++++-------
 .../bpf/prog_tests/uretprobe_depth.c          | 150 ++++++++++++++++
 .../selftests/bpf/progs/uretprobe_depth.c     |  19 ++
 4 files changed, 274 insertions(+), 67 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/uretprobe_depth.c
 create mode 100644 tools/testing/selftests/bpf/progs/uretprobe_depth.c