When a caller already guards a tracepoint with an explicit enabled check:
    if (trace_foo_enabled() && cond)
        trace_foo(args);
trace_foo() then re-evaluates the static_branch_unlikely() key internally. Because static branches are implemented as runtime-patched instructions, the compiler cannot fold the two evaluations into one, so every such site executes the static-branch check twice.
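To make the double evaluation concrete, here is a userspace model, not kernel code. A plain bool plus a counter stands in for the static key (in the kernel the key is a runtime-patched jump or NOP emitted via asm goto, which is why the compiler cannot merge two checks), and the counter makes each evaluation visible. Apart from trace_foo_enabled(), trace_foo() and __do_trace_foo(), every name here is invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only: a bool and a counter stand in for the kernel's
 * static key so that each evaluation of the "static branch" can be counted.
 */
static bool foo_key_enabled = true;
static int foo_key_checks;	/* evaluations of the modeled static branch */
static int foo_events;		/* events actually emitted */

static bool model_static_branch(void)
{
	foo_key_checks++;
	return foo_key_enabled;
}

/* Models trace_foo_enabled(): one key evaluation. */
static bool trace_foo_enabled(void)
{
	return model_static_branch();
}

/* Models the internal helper that runs the probes. */
static void __do_trace_foo(int arg)
{
	(void)arg;
	foo_events++;
}

/* Models trace_foo(): re-checks the key before calling the probes. */
static void trace_foo(int arg)
{
	if (model_static_branch())
		__do_trace_foo(arg);
}

/* Runs one guarded call site and returns how many key checks it cost. */
static int guarded_site_key_checks(bool cond)
{
	foo_key_checks = 0;
	if (trace_foo_enabled() && cond)
		trace_foo(42);
	return foo_key_checks;
}
```

With cond true, guarded_site_key_checks(true) reports two key evaluations for a single event; with cond false, only the guard's own check runs.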
This series introduces trace_invoke_##name() as a companion to trace_##name(). It calls __do_trace_##name() directly, bypassing the redundant static-branch re-check, while preserving all other correctness properties of the normal path (RCU-watching assertion, might_fault() for syscall tracepoints). The internal __do_trace_##name() symbol is not leaked to call sites; trace_invoke_##name() is the only new public API.
    if (trace_foo_enabled() && cond)
        trace_invoke_foo(args);  /* calls __do_trace_foo() directly */
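A rough userspace sketch of the shape the new helper could take, assuming a simplified declaration macro. MODEL_DECLARE_TRACE(), model_rcu_watching, key_checks, foo_probe() and guarded_site_checks() are hypothetical, and assert() stands in for the kernel's RCU-watching warning. The real __DECLARE_TRACE in include/linux/tracepoint.h differs in detail (TP_PROTO()/TP_ARGS(), probe iteration, the syscall variant's might_fault()).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; names are invented for this sketch. */
static bool model_rcu_watching = true;	/* models rcu_is_watching() */
static int key_checks;			/* evaluations of the modeled static key */

/*
 * Generates trace_<name>_enabled(), an internal __do_trace_<name>(), the
 * normal trace_<name>() that re-checks the key, and trace_invoke_<name>()
 * that skips the re-check but keeps the RCU-watching assertion.
 */
#define MODEL_DECLARE_TRACE(name, proto, args)				\
	static bool model_key_##name = true;				\
	static inline bool model_key_check_##name(void)		\
	{								\
		key_checks++;						\
		return model_key_##name;				\
	}								\
	static inline bool trace_##name##_enabled(void)		\
	{								\
		return model_key_check_##name();			\
	}								\
	/* Internal: calls the probe; not meant for call sites. */	\
	static inline void __do_trace_##name(proto)			\
	{								\
		name##_probe(args);					\
	}								\
	/* Normal path: re-checks the key. */				\
	static inline void trace_##name(proto)				\
	{								\
		if (model_key_check_##name()) {			\
			assert(model_rcu_watching);			\
			__do_trace_##name(args);			\
		}							\
	}								\
	/* Caller has already checked trace_<name>_enabled(). */	\
	static inline void trace_invoke_##name(proto)			\
	{								\
		assert(model_rcu_watching);				\
		__do_trace_##name(args);				\
	}

static int foo_last_value;
static void foo_probe(int value) { foo_last_value = value; }

MODEL_DECLARE_TRACE(foo, int value, value)

/* One guarded call site; returns how many key checks it cost. */
static int guarded_site_checks(int value, bool use_invoke)
{
	key_checks = 0;
	if (trace_foo_enabled() && value > 0) {
		if (use_invoke)
			trace_invoke_foo(value);
		else
			trace_foo(value);
	}
	return key_checks;
}
```

Here guarded_site_checks(7, false) reports two key evaluations and guarded_site_checks(7, true) reports one, while the probe receives the value in both cases. The macro takes a single-argument proto for brevity; multi-argument tracepoints would need a PARAMS()-style wrapper, as the kernel uses for TP_PROTO()/TP_ARGS().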
The first patch adds trace_invoke_##name() in three places in include/linux/tracepoint.h: __DECLARE_TRACE, __DECLARE_TRACE_SYSCALL, and the !TRACEPOINTS_ENABLED stub. The remaining 14 patches mechanically convert every guarded call site found in the tree, in kernel/, io_uring/, net/, accel/habanalabs, cpufreq/, devfreq/, dma-buf/, fsi/, drm/, HID, i2c/, spi/, scsi/ufs/, and btrfs/.
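The !TRACEPOINTS_ENABLED stub matters because converted call sites must still compile when tracepoints are configured out. A minimal userspace sketch of what that stub presumably looks like: MODEL_TRACEPOINTS_ENABLED is a hypothetical stand-in for the kernel's TRACEPOINTS_ENABLED, and the tracepoint bar, site_cond() and converted_site() are invented names.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * MODEL_TRACEPOINTS_ENABLED is a hypothetical switch for this sketch. It is
 * left undefined here, so the configured-out stub branch is compiled.
 */
#ifdef MODEL_TRACEPOINTS_ENABLED
static bool bar_key = true;
static int bar_events;
static inline bool trace_bar_enabled(void) { return bar_key; }
static inline void __do_trace_bar(int v) { (void)v; bar_events++; }
static inline void trace_bar(int v) { if (bar_key) __do_trace_bar(v); }
static inline void trace_invoke_bar(int v) { __do_trace_bar(v); }
#else
/* Configured out: the guard is constant false and both emitters are no-ops. */
static inline bool trace_bar_enabled(void) { return false; }
static inline void trace_bar(int v) { (void)v; }
static inline void trace_invoke_bar(int v) { (void)v; }
#endif

static int cond_evals;	/* how often the call site's extra condition ran */

static bool site_cond(int v)
{
	cond_evals++;
	return v > 0;
}

/* A converted call site: it must compile in both configurations. */
static int converted_site(int v)
{
	cond_evals = 0;
	if (trace_bar_enabled() && site_cond(v))
		trace_invoke_bar(v);
	return cond_evals;
}
```

As written (switch undefined), converted_site(5) returns 0: the guard is constant false, so the extra condition never runs and trace_invoke_bar() is never reached. Building with -DMODEL_TRACEPOINTS_ENABLED makes the same call return 1.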
This series is motivated by two comments in the discussion of Dmitry Ilvokhin's locking tracepoint instrumentation series. Peter Zijlstra observed that compilers cannot optimize static branches, so guarded call sites end up evaluating the static branch twice for no reason. Steven Rostedt suggested adding a proper API rather than exposing internal implementation details such as __do_trace_##name() directly to call sites:
https://lore.kernel.org/linux-trace-kernel/8298e098d3418cb446ef396f119edac58...
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Vineeth Pillai (Google) (15):
  tracepoint: Add trace_invoke_##name() API
  kernel: Use trace_invoke_##name() at guarded tracepoint call sites
  io_uring: Use trace_invoke_##name() at guarded tracepoint call sites
  net: Use trace_invoke_##name() at guarded tracepoint call sites
  accel/habanalabs: Use trace_invoke_##name() at guarded tracepoint call sites
  cpufreq: Use trace_invoke_##name() at guarded tracepoint call sites
  devfreq: Use trace_invoke_##name() at guarded tracepoint call sites
  dma-buf: Use trace_invoke_##name() at guarded tracepoint call sites
  fsi: Use trace_invoke_##name() at guarded tracepoint call sites
  drm: Use trace_invoke_##name() at guarded tracepoint call sites
  HID: Use trace_invoke_##name() at guarded tracepoint call sites
  i2c: Use trace_invoke_##name() at guarded tracepoint call sites
  spi: Use trace_invoke_##name() at guarded tracepoint call sites
  scsi: ufs: Use trace_invoke_##name() at guarded tracepoint call sites
  btrfs: Use trace_invoke_##name() at guarded tracepoint call sites
 drivers/accel/habanalabs/common/device.c | 12 ++++++------
 drivers/accel/habanalabs/common/mmu/mmu.c | 3 ++-
 drivers/accel/habanalabs/common/pci/pci.c | 4 ++--
 drivers/cpufreq/amd-pstate.c | 10 +++++-----
 drivers/cpufreq/cpufreq.c | 2 +-
 drivers/cpufreq/intel_pstate.c | 2 +-
 drivers/devfreq/devfreq.c | 2 +-
 drivers/dma-buf/dma-fence.c | 4 ++--
 drivers/fsi/fsi-master-aspeed.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +-
 drivers/gpu/drm/scheduler/sched_entity.c | 4 ++--
 drivers/hid/intel-ish-hid/ipc/pci-ish.c | 2 +-
 drivers/i2c/i2c-core-slave.c | 2 +-
 drivers/spi/spi-axi-spi-engine.c | 4 ++--
 drivers/ufs/core/ufshcd.c | 12 ++++++------
 fs/btrfs/extent_map.c | 4 ++--
 fs/btrfs/raid56.c | 4 ++--
 include/linux/tracepoint.h | 11 +++++++++++
 io_uring/io_uring.h | 2 +-
 kernel/irq_work.c | 2 +-
 kernel/sched/ext.c | 2 +-
 kernel/smp.c | 2 +-
 net/core/dev.c | 2 +-
 net/core/xdp.c | 2 +-
 net/openvswitch/actions.c | 2 +-
 net/openvswitch/datapath.c | 2 +-
 net/sctp/outqueue.c | 2 +-
 net/tipc/node.c | 2 +-
 30 files changed, 62 insertions(+), 50 deletions(-)