On 2025/12/9 21:48, Kevin Brodsky wrote:
On 04/12/2025 09:21, Jinjie Ruan wrote:
After switching arm64 to Generic Entry, the compiler no longer inlines
Did it inline it before this series?
Yes, as shown below:
    <do_el0_svc>:
      d503201f  nop
      d503201f  nop
      d503233f  paciasp
      a9be7bfd  stp x29, x30, [sp, #-32]!
      910003fd  mov x29, sp
      a90153f3  stp x19, x20, [sp, #16]
      aa0003f3  mov x19, x0
      d5384114  mrs x20, sp_el0
      f9400001  ldr x1, [x0]
      f9400282  ldr x2, [x20]
      f9008801  str x1, [x0, #272]
      f9402001  ldr x1, [x0, #64]
      b9011801  str w1, [x0, #280]
      373001e2  tbnz w2, #6, ffff80008002c0f0 <do_el0_svc+0x70>
      f278105f  tst x2, #0x1f00
      54000261  b.ne ffff80008002c108 <do_el0_svc+0x88> // b.any
      52803ac2  mov w2, #0x1d6 // #470
      97ffffb1  bl ffff80008002bf88 <invoke_syscall.constprop.0>
      f9400280  ldr x0, [x20]
      92783400  and x0, x0, #0x3fff00
      926bdc00  and x0, x0, #0xffffffffffe01fff
      b4000060  cbz x0, ffff80008002c0e0 <do_el0_svc+0x60>
      aa1303e0  mov x0, x19
      97ffc835  bl ffff80008001e1b0 <syscall_trace_exit>
      a94153f3  ldp x19, x20, [sp, #16]
      a8c27bfd  ldp x29, x30, [sp], #32
      d50323bf  autiasp
      d65f03c0  ret
      92804000  mov x0, #0xfffffffffffffdff // #-513
      f9000260  str x0, [x19]
      a94153f3  ldp x19, x20, [sp, #16]
      a8c27bfd  ldp x29, x30, [sp], #32
      d50323bf  autiasp
      d65f03c0  ret
      3100043f  cmn w1, #0x1
      54000140  b.eq ffff80008002c134 <do_el0_svc+0xb4> // b.none
      aa1303e0  mov x0, x19
      97ffc7c1  bl ffff80008001e018 <syscall_trace_enter>
      2a0003e1  mov w1, w0
      3100041f  cmn w0, #0x1
      54fffdc0  b.eq ffff80008002c0d8 <do_el0_svc+0x58> // b.none
      aa1303e0  mov x0, x19
      52803ac2  mov w2, #0x1d6 // #470
      97ffff97  bl ffff80008002bf88 <invoke_syscall.constprop.0>
      17ffffea  b ffff80008002c0d8 <do_el0_svc+0x58>
      928004a0  mov x0, #0xffffffffffffffda // #-38
      f9000260  str x0, [x19]
      17fffff5  b ffff80008002c110 <do_el0_svc+0x90>
      d53cd044  mrs x4, tpidr_el2
      d53cd040  mrs x0, tpidr_el2
      d53cd041  mrs x1, tpidr_el2
      00000000  udf #0
      d503201f  nop
      d503201f  nop
el0_svc_common() into do_el0_svc(). So force el0_svc_common() inline with __always_inline, which gives about a 1% performance uplift on perf bench basic syscall on Kunpeng 920, as shown below.
| Metric     | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | ------ |
| Total time | 2.195 [sec]    | 2.171 [sec]     | ↓1.1%  |
| usecs/op   | 0.219575       | 0.217192        | ↓1.1%  |
| ops/sec    | 4,554,260      | 4,604,225       | ↑1.1%  |
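
For reference, the Change column can be reproduced from the two measurement columns as a plain relative difference. The sketch below is only illustrative: pct_change() is a hypothetical helper name, not something from perf or the kernel tree.

```c
/*
 * Relative change from 'before' to 'after', in percent.
 * pct_change() is a hypothetical helper used only for this illustration.
 * Negative means the value went down, positive means it went up.
 */
static double pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Figures copied from the table above. */
static double total_time_change(void)
{
	return pct_change(2.195, 2.171);	/* seconds */
}

static double usecs_per_op_change(void)
{
	return pct_change(0.219575, 0.217192);
}

static double ops_per_sec_change(void)
{
	return pct_change(4554260.0, 4604225.0);
}
```

All three values round to 1.1% in magnitude, with the time metrics decreasing and ops/sec increasing, which matches the table.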
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
I think this is sensible - do_el0_svc() is clearly hot and the small increase in code size is completely justified. It also removes a performance regression when enabling CONFIG_COMPAT (without it el0_svc_common() has only one caller so it should be inlined regardless).
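
To illustrate the two-caller situation, here is a minimal userspace sketch, not the kernel code. It assumes GCC or Clang; my_always_inline, svc_common(), do_svc(), do_svc_compat(), the simplified syscall_fn_t signature and the toy tables are all hypothetical stand-ins. A static helper with a single caller is normally inlined by the optimizer anyway; once a second caller exists, the compiler is free to emit it out of line unless inlining is forced.

```c
/* Userspace stand-in for the kernel's __always_inline (GCC/Clang attribute). */
#define my_always_inline inline __attribute__((__always_inline__))

/* Hypothetical, simplified handler type; the real syscall_fn_t differs. */
typedef long (*syscall_fn_t)(long arg);

static long sys_double(long arg) { return 2 * arg; }
static long sys_negate(long arg) { return -arg; }

/* Toy tables standing in for the native and compat syscall tables. */
static const syscall_fn_t native_table[] = { sys_double, sys_negate };
static const syscall_fn_t compat_table[] = { sys_negate };

/*
 * Common path shared by both entry points. Without the attribute, the
 * compiler may keep one out-of-line copy because there are two callers;
 * with it, the body is expanded into each caller.
 */
static my_always_inline long svc_common(long arg, int scno, int sc_nr,
					const syscall_fn_t table[])
{
	if (scno < 0 || scno >= sc_nr)
		return -38;	/* -ENOSYS on Linux */
	return table[scno](arg);
}

/* First caller: the hot native path. */
long do_svc(long arg, int scno)
{
	return svc_common(arg, scno, 2, native_table);
}

/* Second caller: only present in the compat configuration. */
long do_svc_compat(long arg, int scno)
{
	return svc_common(arg, scno, 1, compat_table);
}
```

Because sc_nr and the table are compile-time constants at each call site, forcing the inline also lets the compiler specialize the bounds check per caller, at the cost of duplicating the body.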
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
 arch/arm64/kernel/syscall.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index 47e193a1cfff..5aa51da9ec25 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -66,8 +66,8 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
 	choose_random_kstack_offset(get_random_u16());
 }
 
-static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
-			   const syscall_fn_t syscall_table[])
+static __always_inline void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
+					   const syscall_fn_t syscall_table[])
 {
 	unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
 	unsigned long flags = read_thread_flags();