Otherwise, we see warnings like this:
[ 0.000000][ T0] ------------[ cut here ]------------
[ 0.000000][ T0] unexpected static_call insn opcode 0xf at kvm_vcpu_reload_apic_access_page+0x17/0x30
[ 0.000000][ T0] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/static_call.c:88 __static_call_validate+0x68/0x70
[ 0.000000][ T0] Modules linked in:
[ 0.000000][ T0] CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.151-00083-gf200c7260296 #68 fe3cb25cf78cb710722bb5acd1cadddd35172924
[ 0.000000][ T0] RIP: 0010:__static_call_validate+0x68/0x70
[ 0.000000][ T0] Code: 0f b6 4a 04 81 f1 c0 00 00 00 09 c1 74 cc 80 3d be 2c 02 02 00 75 c3 c6 05 b5 2c 02 02 01 48 c7 c7 38 4f c3 82 e8 e8 c8 09 00 <0f> 0b c3 00 00 cc cc 00 53 48 89 fb 48 63 15 31 71 06 02 e8 b0 b8
[ 0.000000][ T0] RSP: 0000:ffffffff82e03e70 EFLAGS: 00010046 ORIG_RAX: 0000000000000000
[ 0.000000][ T0] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
[ 0.000000][ T0] RDX: 0000000000000000 RSI: ffffffff82e03ce0 RDI: 0000000000000001
[ 0.000000][ T0] RBP: 0000000000000001 R08: 00000000ffffffff R09: ffffffff82eaab70
[ 0.000000][ T0] R10: ffffffff82e2e900 R11: 205d305420202020 R12: ffffffff82e51960
[ 0.000000][ T0] R13: ffffffff81038987 R14: ffffffff81038987 R15: 0000000000000001
[ 0.000000][ T0] FS: 0000000000000000(0000) GS:ffffffff83726000(0000) knlGS:0000000000000000
[ 0.000000][ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.000000][ T0] CR2: ffff888000014be8 CR3: 00000000037b2000 CR4: 00000000000000a0
[ 0.000000][ T0] Call Trace:
[ 0.000000][ T0]  <TASK>
[ 0.000000][ T0]  ? __warn+0x75/0xe0
[ 0.000000][ T0]  ? report_bug+0x81/0xe0
[ 0.000000][ T0]  ? kvm_vcpu_reload_apic_access_page+0x17/0x30
[ 0.000000][ T0]  ? kvm_vcpu_reload_apic_access_page+0x17/0x30
[ 0.000000][ T0]  ? early_fixup_exception+0x44/0xa0
[ 0.000000][ T0]  ? early_idt_handler_common+0x2f/0x40
[ 0.000000][ T0]  ? kvm_vcpu_reload_apic_access_page+0x17/0x30
[ 0.000000][ T0]  ? kvm_vcpu_reload_apic_access_page+0x17/0x30
[ 0.000000][ T0]  ? __static_call_validate+0x68/0x70
[ 0.000000][ T0]  ? arch_static_call_transform+0x5c/0x90
[ 0.000000][ T0]  ? __static_call_init+0x1ec/0x230
[ 0.000000][ T0]  ? static_call_init+0x32/0x70
[ 0.000000][ T0]  ? setup_arch+0x36/0x4f0
[ 0.000000][ T0]  ? start_kernel+0x67/0x400
[ 0.000000][ T0]  ? secondary_startup_64_no_verify+0xb1/0xbb
[ 0.000000][ T0]  </TASK>
[ 0.000000][ T0] ---[ end trace 8c8589c01f370686 ]---
Peter Zijlstra (4):
      arch: Introduce CONFIG_FUNCTION_ALIGNMENT
      x86/alternatives: Introduce int3_emulate_jcc()
      x86/alternatives: Teach text_poke_bp() to patch Jcc.d32 instructions
      x86/static_call: Add support for Jcc tail-calls

Thomas Gleixner (1):
      x86/asm: Differentiate between code and function alignment
 Makefile                             |  4 +-
 arch/Kconfig                         | 24 ++++++++++++
 arch/ia64/Kconfig                    |  1 +
 arch/ia64/Makefile                   |  2 +-
 arch/x86/Kconfig                     |  2 +
 arch/x86/boot/compressed/head_64.S   |  8 ++++
 arch/x86/include/asm/linkage.h       | 12 +++---
 arch/x86/include/asm/text-patching.h | 31 +++++++++++++++
 arch/x86/kernel/alternative.c        | 56 +++++++++++++++++++++++-----
 arch/x86/kernel/kprobes/core.c       | 38 ++++---------------
 arch/x86/kernel/static_call.c        | 50 +++++++++++++++++++++++--
 include/asm-generic/vmlinux.lds.h    |  4 +-
 include/linux/linkage.h              |  4 +-
 lib/Kconfig.debug                    |  1 +
 14 files changed, 183 insertions(+), 54 deletions(-)
From: Peter Zijlstra <peterz@infradead.org>
commit d49a0626216b95cd4bf696f6acf55f39a16ab0bb upstream.
Generic function-alignment infrastructure.
Architectures can select FUNCTION_ALIGNMENT_xxB symbols; the FUNCTION_ALIGNMENT symbol is then set to the largest such selected size, 0 otherwise.
From this the -falign-functions compiler argument and __ALIGN macro are set.
This incorporates the DEBUG_FORCE_FUNCTION_ALIGN_64B knob and future alignment requirements for x86_64 (later in this series) into a single place.
NOTE: this also removes the 0x90 filler byte from the generic __ALIGN primitive; that value makes no sense outside of x86.
NOTE: .balign 0 reverts to a no-op.
Requested-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I053b3c408d56988381feb8c8bdb5e27ea221755f
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220915111143.719248727@infradead.org
[cascardo: adjust context at arch/x86/Kconfig]
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
---
 Makefile                           |  4 ++--
 arch/Kconfig                       | 24 ++++++++++++++++++++++++
 arch/ia64/Kconfig                  |  1 +
 arch/ia64/Makefile                 |  2 +-
 arch/x86/Kconfig                   |  2 ++
 arch/x86/boot/compressed/head_64.S |  8 ++++++++
 arch/x86/include/asm/linkage.h     |  4 +---
 include/asm-generic/vmlinux.lds.h  |  4 ++--
 include/linux/linkage.h            |  4 ++--
 lib/Kconfig.debug                  |  1 +
 10 files changed, 44 insertions(+), 10 deletions(-)
diff --git a/Makefile b/Makefile
index bb73dba0e505..16f3838838d0 100644
--- a/Makefile
+++ b/Makefile
@@ -1000,8 +1000,8 @@ KBUILD_CFLAGS += $(CC_FLAGS_CFI)
 export CC_FLAGS_CFI
 endif
 
-ifdef CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B
-KBUILD_CFLAGS += -falign-functions=64
+ifneq ($(CONFIG_FUNCTION_ALIGNMENT),0)
+KBUILD_CFLAGS += -falign-functions=$(CONFIG_FUNCTION_ALIGNMENT)
 endif
 
 # arch Makefile may override CC so keep this after arch Makefile is included
diff --git a/arch/Kconfig b/arch/Kconfig
index 2e2dc0975ab4..a541ce263865 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1303,4 +1303,28 @@ source "kernel/gcov/Kconfig"
 
 source "scripts/gcc-plugins/Kconfig"
 
+config FUNCTION_ALIGNMENT_4B
+	bool
+
+config FUNCTION_ALIGNMENT_8B
+	bool
+
+config FUNCTION_ALIGNMENT_16B
+	bool
+
+config FUNCTION_ALIGNMENT_32B
+	bool
+
+config FUNCTION_ALIGNMENT_64B
+	bool
+
+config FUNCTION_ALIGNMENT
+	int
+	default 64 if FUNCTION_ALIGNMENT_64B
+	default 32 if FUNCTION_ALIGNMENT_32B
+	default 16 if FUNCTION_ALIGNMENT_16B
+	default 8 if FUNCTION_ALIGNMENT_8B
+	default 4 if FUNCTION_ALIGNMENT_4B
+	default 0
+
 endmenu
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 89869aff8ca2..8902d1517894 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -63,6 +63,7 @@ config IA64
 	select PCI_MSI_ARCH_FALLBACKS if PCI_MSI
 	select SET_FS
 	select ZONE_DMA32
+	select FUNCTION_ALIGNMENT_32B
 	default y
 	help
 	  The Itanium Processor Family is Intel's 64-bit successor to
diff --git a/arch/ia64/Makefile b/arch/ia64/Makefile
index 7e548c654a29..43cde968da88 100644
--- a/arch/ia64/Makefile
+++ b/arch/ia64/Makefile
@@ -23,7 +23,7 @@ KBUILD_AFLAGS_KERNEL := -mconstant-gp
 EXTRA :=
 
 cflags-y	:= -pipe $(EXTRA) -ffixed-r13 -mfixed-range=f12-f15,f32-f127 \
-		   -falign-functions=32 -frename-registers -fno-optimize-sibling-calls
+		   -frename-registers -fno-optimize-sibling-calls
 KBUILD_CFLAGS_KERNEL := -mconstant-gp
 
 GAS_STATUS	= $(shell $(srctree)/arch/ia64/scripts/check-gas "$(CC)" "$(OBJDUMP)")
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cfb1edd25437..65b2cff3c93b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -267,6 +267,8 @@ config X86
 	select HAVE_ARCH_KCSAN			if X86_64
 	select X86_FEATURE_NAMES		if PROC_FS
 	select PROC_PID_ARCH_STATUS		if PROC_FS
+	select FUNCTION_ALIGNMENT_16B		if X86_64 || X86_ALIGNMENT_16
+	select FUNCTION_ALIGNMENT_4B
 	imply IMA_SECURE_AND_OR_TRUSTED_BOOT	if EFI
 
 config INSTRUCTION_DECODER
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index c3d427c817c7..e189a16ae40a 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -37,6 +37,14 @@
 #include <asm/trapnr.h>
 #include "pgtable.h"
 
+/*
+ * Fix alignment at 16 bytes. Following CONFIG_FUNCTION_ALIGNMENT will result
+ * in assembly errors due to trying to move .org backward due to the excessive
+ * alignment.
+ */
+#undef __ALIGN
+#define __ALIGN .balign 16, 0x90
+
 /*
  * Locally defined symbols should be marked hidden:
  */
diff --git a/arch/x86/include/asm/linkage.h b/arch/x86/include/asm/linkage.h
index 5000cf59bdf5..3d7293d52950 100644
--- a/arch/x86/include/asm/linkage.h
+++ b/arch/x86/include/asm/linkage.h
@@ -13,10 +13,8 @@
 
 #ifdef __ASSEMBLY__
 
-#if defined(CONFIG_X86_64) || defined(CONFIG_X86_ALIGNMENT_16)
-#define __ALIGN		.p2align 4, 0x90
+#define __ALIGN		.balign CONFIG_FUNCTION_ALIGNMENT, 0x90;
 #define __ALIGN_STR	__stringify(__ALIGN)
-#endif
 
 #if defined(CONFIG_RETHUNK) && !defined(__DISABLE_EXPORTS) && !defined(BUILD_VDSO)
 #define RET	jmp __x86_return_thunk
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 8471717c5085..dd9ea351bc02 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -81,8 +81,8 @@
 #define RO_EXCEPTION_TABLE
 #endif
 
-/* Align . to a 8 byte boundary equals to maximum function alignment. */
-#define ALIGN_FUNCTION()  . = ALIGN(8)
+/* Align . function alignment. */
+#define ALIGN_FUNCTION()  . = ALIGN(CONFIG_FUNCTION_ALIGNMENT)
 
 /*
  * LD_DEAD_CODE_DATA_ELIMINATION option enables -fdata-sections, which
diff --git a/include/linux/linkage.h b/include/linux/linkage.h
index dbf8506decca..fc81e51330b2 100644
--- a/include/linux/linkage.h
+++ b/include/linux/linkage.h
@@ -69,8 +69,8 @@
 #endif
 
 #ifndef __ALIGN
-#define __ALIGN		.align 4,0x90
-#define __ALIGN_STR	".align 4,0x90"
+#define __ALIGN		.balign CONFIG_FUNCTION_ALIGNMENT
+#define __ALIGN_STR	__stringify(__ALIGN)
 #endif
 
 #ifdef __ASSEMBLY__
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 64d6292cf686..28faea9b5da6 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -424,6 +424,7 @@ config SECTION_MISMATCH_WARN_ONLY
 config DEBUG_FORCE_FUNCTION_ALIGN_64B
 	bool "Force all function address 64B aligned"
 	depends on EXPERT && (X86_64 || ARM64 || PPC32 || PPC64 || ARC)
+	select FUNCTION_ALIGNMENT_64B
 	help
 	  There are cases that a commit from one domain changes the function
 	  address alignment of other domains, and cause magic performance
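For readers who want to see the end result of the new knob outside the kernel: with CONFIG_FUNCTION_ALIGNMENT=16 the net effect is simply -falign-functions=16 on every compilation unit. A stand-alone user-space sketch follows (not part of the series; the aligned attribute stands in for the global compiler flag so the example needs no special build options):

/*
 * Standalone sketch, not from the series: check that a function entry is
 * 16-byte aligned, the alignment FUNCTION_ALIGNMENT_16B selects on x86-64.
 * Building everything with -falign-functions=16 (what the new Kconfig logic
 * adds to KBUILD_CFLAGS) has the same effect for all functions; the
 * attribute below forces it for this one function only.
 */
#include <stdint.h>
#include <stdio.h>

__attribute__((aligned(16)))
static void probe(void)
{
}

int main(void)
{
	uintptr_t addr = (uintptr_t)&probe;

	printf("probe() at %#lx, 16-byte aligned: %s\n",
	       (unsigned long)addr, (addr & 15) == 0 ? "yes" : "no");
	return 0;
}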
From: Thomas Gleixner <tglx@linutronix.de>
commit 8eb5d34e77c63fde8af21c691bcf6e3cd87f7829 upstream.
Create SYM_F_ALIGN to differentiate alignment requirements between SYM_CODE and SYM_FUNC.
This distinction is useful later when adding padding in front of functions; IOW this allows following the compiler's patchable-function-entry option.
[peterz: Changelog]
Change-Id: I4f9bc0507e5c3fdb3e0839806989efc305e0a758
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220915111143.824822743@infradead.org
[cascardo: adjust for missing commit c4691712b546 ("x86/linkage: Add ENDBR to SYM_FUNC_START*()")]
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
---
 arch/x86/include/asm/linkage.h | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/linkage.h b/arch/x86/include/asm/linkage.h
index 3d7293d52950..04a333c334ae 100644
--- a/arch/x86/include/asm/linkage.h
+++ b/arch/x86/include/asm/linkage.h
@@ -11,11 +11,15 @@
 #define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))
 #endif /* CONFIG_X86_32 */
 
-#ifdef __ASSEMBLY__
-
 #define __ALIGN		.balign CONFIG_FUNCTION_ALIGNMENT, 0x90;
 #define __ALIGN_STR	__stringify(__ALIGN)
 
+#define ASM_FUNC_ALIGN		__ALIGN_STR
+#define __FUNC_ALIGN		__ALIGN
+#define SYM_F_ALIGN		__FUNC_ALIGN
+
+#ifdef __ASSEMBLY__
+
 #if defined(CONFIG_RETHUNK) && !defined(__DISABLE_EXPORTS) && !defined(BUILD_VDSO)
 #define RET	jmp __x86_return_thunk
 #else /* CONFIG_RETPOLINE */
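The patchable-function-entry option mentioned in the changelog can also be tried directly in user space. A minimal sketch (assumes GCC >= 8 or a recent Clang; nothing here is from the patch itself):

/*
 * Sketch of the compiler feature this code/function alignment split
 * prepares for. The attribute form of -fpatchable-function-entry=5,5 asks
 * the compiler to emit 5 NOPs, all of them placed *before* the function's
 * entry symbol -- padding in front of the function, which only works out
 * if function alignment leaves room for it.
 */
#include <stdio.h>

__attribute__((patchable_function_entry(5, 5)))
static void traceable(void)
{
	puts("called through a patchable entry");
}

int main(void)
{
	traceable();
	return 0;
}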
From: Peter Zijlstra <peterz@infradead.org>
commit db7adcfd1cec4e95155e37bc066fddab302c6340 upstream.
Move the kprobe Jcc emulation into int3_emulate_jcc() so it can be used by more code -- specifically static_call() will need this.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20230123210607.057678245@infradead.org
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
---
 arch/x86/include/asm/text-patching.h | 31 +++++++++++++++
 arch/x86/kernel/kprobes/core.c       | 38 ++++---------------
 2 files changed, 39 insertions(+), 30 deletions(-)
diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
index c6015b407461..7281ce64e99d 100644
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -181,6 +181,37 @@ void int3_emulate_ret(struct pt_regs *regs)
 	unsigned long ip = int3_emulate_pop(regs);
 	int3_emulate_jmp(regs, ip);
 }
+
+static __always_inline
+void int3_emulate_jcc(struct pt_regs *regs, u8 cc, unsigned long ip, unsigned long disp)
+{
+	static const unsigned long jcc_mask[6] = {
+		[0] = X86_EFLAGS_OF,
+		[1] = X86_EFLAGS_CF,
+		[2] = X86_EFLAGS_ZF,
+		[3] = X86_EFLAGS_CF | X86_EFLAGS_ZF,
+		[4] = X86_EFLAGS_SF,
+		[5] = X86_EFLAGS_PF,
+	};
+
+	bool invert = cc & 1;
+	bool match;
+
+	if (cc < 0xc) {
+		match = regs->flags & jcc_mask[cc >> 1];
+	} else {
+		match = ((regs->flags & X86_EFLAGS_SF) >> X86_EFLAGS_SF_BIT) ^
+			((regs->flags & X86_EFLAGS_OF) >> X86_EFLAGS_OF_BIT);
+		if (cc >= 0xe)
+			match = match || (regs->flags & X86_EFLAGS_ZF);
+	}
+
+	if ((match && !invert) || (!match && invert))
+		ip += disp;
+
+	int3_emulate_jmp(regs, ip);
+}
+
 #endif /* !CONFIG_UML_X86 */
 
 #endif /* _ASM_X86_TEXT_PATCHING_H */
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 052ea7425c4d..893f040b97b7 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -463,50 +463,26 @@ static void kprobe_emulate_call(struct kprobe *p, struct pt_regs *regs)
 }
 NOKPROBE_SYMBOL(kprobe_emulate_call);
 
-static nokprobe_inline
-void __kprobe_emulate_jmp(struct kprobe *p, struct pt_regs *regs, bool cond)
+static void kprobe_emulate_jmp(struct kprobe *p, struct pt_regs *regs)
 {
 	unsigned long ip = regs->ip - INT3_INSN_SIZE + p->ainsn.size;
 
-	if (cond)
-		ip += p->ainsn.rel32;
+	ip += p->ainsn.rel32;
 	int3_emulate_jmp(regs, ip);
 }
-
-static void kprobe_emulate_jmp(struct kprobe *p, struct pt_regs *regs)
-{
-	__kprobe_emulate_jmp(p, regs, true);
-}
 NOKPROBE_SYMBOL(kprobe_emulate_jmp);
 
-static const unsigned long jcc_mask[6] = {
-	[0] = X86_EFLAGS_OF,
-	[1] = X86_EFLAGS_CF,
-	[2] = X86_EFLAGS_ZF,
-	[3] = X86_EFLAGS_CF | X86_EFLAGS_ZF,
-	[4] = X86_EFLAGS_SF,
-	[5] = X86_EFLAGS_PF,
-};
-
 static void kprobe_emulate_jcc(struct kprobe *p, struct pt_regs *regs)
 {
-	bool invert = p->ainsn.jcc.type & 1;
-	bool match;
+	unsigned long ip = regs->ip - INT3_INSN_SIZE + p->ainsn.size;
 
-	if (p->ainsn.jcc.type < 0xc) {
-		match = regs->flags & jcc_mask[p->ainsn.jcc.type >> 1];
-	} else {
-		match = ((regs->flags & X86_EFLAGS_SF) >> X86_EFLAGS_SF_BIT) ^
-			((regs->flags & X86_EFLAGS_OF) >> X86_EFLAGS_OF_BIT);
-		if (p->ainsn.jcc.type >= 0xe)
-			match = match || (regs->flags & X86_EFLAGS_ZF);
-	}
-	__kprobe_emulate_jmp(p, regs, (match && !invert) || (!match && invert));
+	int3_emulate_jcc(regs, p->ainsn.jcc.type, ip, p->ainsn.rel32);
 }
 NOKPROBE_SYMBOL(kprobe_emulate_jcc);
 
 static void kprobe_emulate_loop(struct kprobe *p, struct pt_regs *regs)
 {
+	unsigned long ip = regs->ip - INT3_INSN_SIZE + p->ainsn.size;
 	bool match;
 
 	if (p->ainsn.loop.type != 3) {	/* LOOP* */
@@ -534,7 +510,9 @@ static void kprobe_emulate_loop(struct kprobe *p, struct pt_regs *regs)
 	else if (p->ainsn.loop.type == 1)	/* LOOPE */
 		match = match && (regs->flags & X86_EFLAGS_ZF);
 
-	__kprobe_emulate_jmp(p, regs, match);
+	if (match)
+		ip += p->ainsn.rel32;
+	int3_emulate_jmp(regs, ip);
 }
 NOKPROBE_SYMBOL(kprobe_emulate_loop);
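For reference, the condition-code decoding that int3_emulate_jcc() implements can be exercised stand-alone. The following user-space sketch mirrors its logic; the flag bit values are the architectural EFLAGS positions, everything else (names, test values) is illustrative:

/*
 * User-space mirror of the condition test inside int3_emulate_jcc():
 * the low bit of the Jcc condition nibble inverts the sense, conditions
 * 0x0-0xb test single flags (or CF|ZF), 0xc-0xf test SF^OF plus ZF.
 */
#include <stdbool.h>
#include <stdio.h>

#define X86_EFLAGS_CF (1UL << 0)
#define X86_EFLAGS_PF (1UL << 2)
#define X86_EFLAGS_ZF (1UL << 6)
#define X86_EFLAGS_SF (1UL << 7)
#define X86_EFLAGS_OF (1UL << 11)

static bool jcc_taken(unsigned long flags, unsigned char cc)
{
	static const unsigned long jcc_mask[6] = {
		X86_EFLAGS_OF,			/* 0x0/0x1: JO/JNO */
		X86_EFLAGS_CF,			/* 0x2/0x3: JB/JNB */
		X86_EFLAGS_ZF,			/* 0x4/0x5: JE/JNE */
		X86_EFLAGS_CF | X86_EFLAGS_ZF,	/* 0x6/0x7: JBE/JA */
		X86_EFLAGS_SF,			/* 0x8/0x9: JS/JNS */
		X86_EFLAGS_PF,			/* 0xa/0xb: JP/JNP */
	};
	bool invert = cc & 1;
	bool match;

	if (cc < 0xc) {
		match = flags & jcc_mask[cc >> 1];
	} else {
		/* 0xc/0xd: JL/JGE use SF != OF; 0xe/0xf: JLE/JG add ZF */
		match = !!(flags & X86_EFLAGS_SF) ^ !!(flags & X86_EFLAGS_OF);
		if (cc >= 0xe)
			match = match || (flags & X86_EFLAGS_ZF);
	}
	return match ^ invert;
}

int main(void)
{
	printf("je  with ZF set: %d\n", jcc_taken(X86_EFLAGS_ZF, 0x4)); /* 1 */
	printf("jne with ZF set: %d\n", jcc_taken(X86_EFLAGS_ZF, 0x5)); /* 0 */
	printf("jg  with no flags set: %d\n", jcc_taken(0, 0xf));       /* 1 */
	return 0;
}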
From: Peter Zijlstra <peterz@infradead.org>
commit ac0ee0a9560c97fa5fe1409e450c2425d4ebd17a upstream.
In order to re-write Jcc.d32 instructions text_poke_bp() needs to be taught about them.
The biggest hurdle is that the whole machinery is currently made for 5 byte instructions and extending this would grow struct text_poke_loc which is currently a nice 16 bytes and used in an array.
However, since text_poke_loc contains a full copy of the (s32) displacement, it is possible to map the Jcc.d32 2 byte opcodes to Jcc.d8 1 byte opcode for the int3 emulation.
This then leaves the replacement bytes; fudge that by only storing the last 5 bytes and adding the rule that a 'length == 6' instruction will be prefixed with a 0x0f byte.
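In other words, a 6-byte 'jne' (0x0f 0x85 rel32) is stored as if it were the 2-byte 'jne' (0x75 rel8), with the full rel32 kept in tp->disp, and len == 6 tells the code to re-add the 0x0f prefix when writing bytes back. A stand-alone sketch of that opcode folding (illustrative only, not kernel code):

/*
 * Sketch of the opcode folding described above: Jcc.d32 encodes as
 * 0x0f 0x80..0x8f and Jcc.d8 as 0x70..0x7f, so the d8 opcode is simply the
 * second d32 opcode byte minus 0x10. With len == 6 recording that the
 * original instruction carried the 0x0f prefix, nothing is lost.
 */
#include <assert.h>
#include <stdio.h>

static unsigned char jcc_d32_to_d8(unsigned char second_byte)
{
	assert((second_byte & 0xf0) == 0x80);	/* must be 0x80..0x8f */
	return second_byte - 0x10;		/* e.g. 0x85 (jne) -> 0x75 */
}

int main(void)
{
	/* jne rel32 is 0f 85 xx xx xx xx; its short form is 75 xx */
	printf("0x0f 0x85 -> 0x%02x\n", jcc_d32_to_d8(0x85));
	return 0;
}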
Change-Id: Ie3f72c6b92f865d287c8940e5a87e59d41cfaa27
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20230123210607.115718513@infradead.org
[cascardo: there is no emit_call_track_retpoline]
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
---
 arch/x86/kernel/alternative.c | 56 +++++++++++++++++++++++-----
 1 file changed, 47 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index e5536edbae57..5614e6d219b7 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -351,6 +351,12 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
 	kasan_enable_current();
 }
 
+static inline bool is_jcc32(struct insn *insn)
+{
+	/* Jcc.d32 second opcode byte is in the range: 0x80-0x8f */
+	return insn->opcode.bytes[0] == 0x0f && (insn->opcode.bytes[1] & 0xf0) == 0x80;
+}
+
 #if defined(CONFIG_RETPOLINE) && defined(CONFIG_STACK_VALIDATION)
 
 /*
@@ -1201,6 +1207,11 @@ void text_poke_sync(void)
 	on_each_cpu(do_sync_core, NULL, 1);
 }
 
+/*
+ * NOTE: crazy scheme to allow patching Jcc.d32 but not increase the size of
+ * this thing. When len == 6 everything is prefixed with 0x0f and we map
+ * opcode to Jcc.d8, using len to distinguish.
+ */
 struct text_poke_loc {
 	/* addr := _stext + rel_addr */
 	s32 rel_addr;
@@ -1322,6 +1333,10 @@ noinstr int poke_int3_handler(struct pt_regs *regs)
 		int3_emulate_jmp(regs, (long)ip + tp->disp);
 		break;
 
+	case 0x70 ... 0x7f: /* Jcc */
+		int3_emulate_jcc(regs, tp->opcode & 0xf, (long)ip, tp->disp);
+		break;
+
 	default:
 		BUG();
 	}
@@ -1395,16 +1410,26 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries
 	 * Second step: update all but the first byte of the patched range.
 	 */
 	for (do_sync = 0, i = 0; i < nr_entries; i++) {
-		u8 old[POKE_MAX_OPCODE_SIZE] = { tp[i].old, };
+		u8 old[POKE_MAX_OPCODE_SIZE+1] = { tp[i].old, };
+		u8 _new[POKE_MAX_OPCODE_SIZE+1];
+		const u8 *new = tp[i].text;
 		int len = tp[i].len;
 
 		if (len - INT3_INSN_SIZE > 0) {
 			memcpy(old + INT3_INSN_SIZE,
 			       text_poke_addr(&tp[i]) + INT3_INSN_SIZE,
 			       len - INT3_INSN_SIZE);
+
+			if (len == 6) {
+				_new[0] = 0x0f;
+				memcpy(_new + 1, new, 5);
+				new = _new;
+			}
+
 			text_poke(text_poke_addr(&tp[i]) + INT3_INSN_SIZE,
-				  (const char *)tp[i].text + INT3_INSN_SIZE,
+				  new + INT3_INSN_SIZE,
 				  len - INT3_INSN_SIZE);
+
 			do_sync++;
 		}
 
@@ -1432,8 +1457,7 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries
 		 * The old instruction is recorded so that the event can be
 		 * processed forwards or backwards.
 		 */
-		perf_event_text_poke(text_poke_addr(&tp[i]), old, len,
-				     tp[i].text, len);
+		perf_event_text_poke(text_poke_addr(&tp[i]), old, len, new, len);
 	}
 
 	if (do_sync) {
@@ -1450,10 +1474,15 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries
 	 * replacing opcode.
 	 */
 	for (do_sync = 0, i = 0; i < nr_entries; i++) {
-		if (tp[i].text[0] == INT3_INSN_OPCODE)
+		u8 byte = tp[i].text[0];
+
+		if (tp[i].len == 6)
+			byte = 0x0f;
+
+		if (byte == INT3_INSN_OPCODE)
 			continue;
 
-		text_poke(text_poke_addr(&tp[i]), tp[i].text, INT3_INSN_SIZE);
+		text_poke(text_poke_addr(&tp[i]), &byte, INT3_INSN_SIZE);
 		do_sync++;
 	}
 
@@ -1471,9 +1500,11 @@ static void text_poke_loc_init(struct text_poke_loc *tp, void *addr,
 			       const void *opcode, size_t len, const void *emulate)
 {
 	struct insn insn;
-	int ret, i;
+	int ret, i = 0;
 
-	memcpy((void *)tp->text, opcode, len);
+	if (len == 6)
+		i = 1;
+	memcpy((void *)tp->text, opcode+i, len-i);
 	if (!emulate)
 		emulate = opcode;
 
@@ -1484,6 +1515,13 @@ static void text_poke_loc_init(struct text_poke_loc *tp, void *addr,
 	tp->len = len;
 	tp->opcode = insn.opcode.bytes[0];
 
+	if (is_jcc32(&insn)) {
+		/*
+		 * Map Jcc.d32 onto Jcc.d8 and use len to distinguish.
+		 */
+		tp->opcode = insn.opcode.bytes[1] - 0x10;
+	}
+
 	switch (tp->opcode) {
 	case RET_INSN_OPCODE:
 	case JMP32_INSN_OPCODE:
@@ -1500,7 +1538,6 @@ static void text_poke_loc_init(struct text_poke_loc *tp, void *addr,
 		BUG_ON(len != insn.length);
 	};
 
-
 	switch (tp->opcode) {
 	case INT3_INSN_OPCODE:
 	case RET_INSN_OPCODE:
@@ -1509,6 +1546,7 @@ static void text_poke_loc_init(struct text_poke_loc *tp, void *addr,
 	case CALL_INSN_OPCODE:
 	case JMP32_INSN_OPCODE:
 	case JMP8_INSN_OPCODE:
+	case 0x70 ... 0x7f: /* Jcc */
 		tp->disp = insn.immediate.value;
 		break;
From: Peter Zijlstra <peterz@infradead.org>
commit 923510c88d2b7d947c4217835fd9ca6bd65cc56c upstream.
Clang likes to create conditional tail calls like:
0000000000000350 <amd_pmu_add_event>:
 350:	0f 1f 44 00 00		nopl   0x0(%rax,%rax,1)
		351: R_X86_64_NONE	__fentry__-0x4
 355:	48 83 bf 20 01 00 00 00	cmpq   $0x0,0x120(%rdi)
 35d:	0f 85 00 00 00 00	jne    363 <amd_pmu_add_event+0x13>
		35f: R_X86_64_PLT32	__SCT__amd_pmu_branch_add-0x4
 363:	e9 00 00 00 00		jmp    368 <amd_pmu_add_event+0x18>
		364: R_X86_64_PLT32	__x86_return_thunk-0x4
Where 0x35d is a static call site that's turned into a conditional tail-call using the Jcc class of instructions.
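For context, the C shape that tends to produce such code is a static call behind a condition with nothing after it. A hypothetical kernel-style sketch (the predicate and the stub are stand-ins, not the literal upstream source):

/*
 * Hypothetical sketch of code that lets Clang emit a conditional tail-call
 * to a static call trampoline (the "jne __SCT__..." seen above): the call
 * is the last action in the function, so instead of branch+call+ret the
 * compiler may fold the branch and the tail-call into a single Jcc.
 */
#include <linux/perf_event.h>
#include <linux/static_call.h>

static void branch_add_stub(struct perf_event *event) { }
DEFINE_STATIC_CALL(amd_pmu_branch_add, branch_add_stub);

void amd_pmu_add_event(struct perf_event *event)
{
	if (event->attach_state)	/* stand-in for the real predicate */
		static_call(amd_pmu_branch_add)(event);
}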
Teach the in-line static call text patching about this.
Notably, since there is no conditional-ret, in that case patch the Jcc to point at an empty stub function that does the ret -- or the return thunk when needed.
Reported-by: "Erhard F." erhard_f@mailbox.org Change-Id: I99c8fc3f721e5d1c74f06710b38d4bac5230303a Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Ingo Molnar mingo@kernel.org Reviewed-by: Masami Hiramatsu (Google) mhiramat@kernel.org Link: https://lore.kernel.org/r/Y9Kdg9QjHkr9G5b5@hirez.programming.kicks-ass.net [cascardo: __static_call_validate didn't have the bool tramp argument] Signed-off-by: Thadeu Lima de Souza Cascardo cascardo@igalia.com --- arch/x86/kernel/static_call.c | 50 ++++++++++++++++++++++++++++++++--- 1 file changed, 47 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/static_call.c b/arch/x86/kernel/static_call.c
index e25050c7ff1e..0e0221271e72 100644
--- a/arch/x86/kernel/static_call.c
+++ b/arch/x86/kernel/static_call.c
@@ -9,6 +9,7 @@ enum insn_type {
 	NOP = 1,  /* site cond-call */
 	JMP = 2,  /* tramp / site tail-call */
 	RET = 3,  /* tramp / site cond-tail-call */
+	JCC = 4,
 };
 
 /*
@@ -25,12 +26,40 @@ static const u8 xor5rax[] = { 0x2e, 0x2e, 0x2e, 0x31, 0xc0 };
 
 static const u8 retinsn[] = { RET_INSN_OPCODE, 0xcc, 0xcc, 0xcc, 0xcc };
 
+static u8 __is_Jcc(u8 *insn) /* Jcc.d32 */
+{
+	u8 ret = 0;
+
+	if (insn[0] == 0x0f) {
+		u8 tmp = insn[1];
+		if ((tmp & 0xf0) == 0x80)
+			ret = tmp;
+	}
+
+	return ret;
+}
+
+extern void __static_call_return(void);
+
+asm (".global __static_call_return\n\t"
+     ".type __static_call_return, @function\n\t"
+     ASM_FUNC_ALIGN "\n\t"
+     "__static_call_return:\n\t"
+     ANNOTATE_NOENDBR
+     ANNOTATE_RETPOLINE_SAFE
+     "ret; int3\n\t"
+     ".size __static_call_return, . - __static_call_return \n\t");
+
 static void __ref __static_call_transform(void *insn, enum insn_type type,
 					  void *func, bool modinit)
 {
 	const void *emulate = NULL;
 	int size = CALL_INSN_SIZE;
 	const void *code;
+	u8 op, buf[6];
+
+	if ((type == JMP || type == RET) && (op = __is_Jcc(insn)))
+		type = JCC;
 
 	switch (type) {
 	case CALL:
@@ -56,6 +85,20 @@ static void __ref __static_call_transform(void *insn, enum insn_type type,
 		else
 			code = &retinsn;
 		break;
+
+	case JCC:
+		if (!func) {
+			func = __static_call_return;
+			if (cpu_feature_enabled(X86_FEATURE_RETHUNK))
+				func = x86_return_thunk;
+		}
+
+		buf[0] = 0x0f;
+		__text_gen_insn(buf+1, op, insn+1, func, 5);
+		code = buf;
+		size = 6;
+
+		break;
 	}
 
 	if (memcmp(insn, code, size) == 0)
@@ -67,13 +110,14 @@ static void __ref __static_call_transform(void *insn, enum insn_type type,
 	text_poke_bp(insn, code, size, emulate);
 }
 
-static void __static_call_validate(void *insn, bool tail)
+static void __static_call_validate(u8 *insn, bool tail)
 {
-	u8 opcode = *(u8 *)insn;
+	u8 opcode = insn[0];
 
 	if (tail) {
 		if (opcode == JMP32_INSN_OPCODE ||
-		    opcode == RET_INSN_OPCODE)
+		    opcode == RET_INSN_OPCODE ||
+		    __is_Jcc(insn))
 			return;
 	} else {
 		if (opcode == CALL_INSN_OPCODE ||
On Wed, Mar 13, 2024 at 07:42:50AM -0300, Thadeu Lima de Souza Cascardo wrote:
> Peter Zijlstra (4):
>       arch: Introduce CONFIG_FUNCTION_ALIGNMENT
>       x86/alternatives: Introduce int3_emulate_jcc()
>       x86/alternatives: Teach text_poke_bp() to patch Jcc.d32 instructions
>       x86/static_call: Add support for Jcc tail-calls
> 
> Thomas Gleixner (1):
>       x86/asm: Differentiate between code and function alignment
Is this not an issue on 6.1? I don't see d49a0626216b ("arch: Introduce CONFIG_FUNCTION_ALIGNMENT") in that tree.
On Sat, Mar 16, 2024 at 05:41:16AM -0400, Sasha Levin wrote:
> On Wed, Mar 13, 2024 at 07:42:50AM -0300, Thadeu Lima de Souza Cascardo wrote:
> > Peter Zijlstra (4):
> >       arch: Introduce CONFIG_FUNCTION_ALIGNMENT
> >       x86/alternatives: Introduce int3_emulate_jcc()
> >       x86/alternatives: Teach text_poke_bp() to patch Jcc.d32 instructions
> >       x86/static_call: Add support for Jcc tail-calls
> > 
> > Thomas Gleixner (1):
> >       x86/asm: Differentiate between code and function alignment
> 
> Is this not an issue on 6.1? I don't see d49a0626216b ("arch: Introduce
> CONFIG_FUNCTION_ALIGNMENT") in that tree.
> 
> -- 
> Thanks,
> Sasha
The fix is really the last patch, 923510c88d2b ("x86/static_call: Add support for Jcc tail-calls"). I see that 6.1 got 3,4,5:
9d80f3e60043 x86/static_call: Add support for Jcc tail-calls
c51a456b4179 x86/alternatives: Teach text_poke_bp() to patch Jcc.d32 instructions
75c066485bcb x86/alternatives: Introduce int3_emulate_jcc()
I can resubmit the series without the first two patches if that is preferred.
Thanks. Cascardo.
On Sat, Mar 16, 2024 at 07:02:19AM -0300, Thadeu Lima de Souza Cascardo wrote:
> On Sat, Mar 16, 2024 at 05:41:16AM -0400, Sasha Levin wrote:
> > On Wed, Mar 13, 2024 at 07:42:50AM -0300, Thadeu Lima de Souza Cascardo wrote:
> > > Peter Zijlstra (4):
> > >       arch: Introduce CONFIG_FUNCTION_ALIGNMENT
> > >       x86/alternatives: Introduce int3_emulate_jcc()
> > >       x86/alternatives: Teach text_poke_bp() to patch Jcc.d32 instructions
> > >       x86/static_call: Add support for Jcc tail-calls
> > > 
> > > Thomas Gleixner (1):
> > >       x86/asm: Differentiate between code and function alignment
> > 
> > Is this not an issue on 6.1? I don't see d49a0626216b ("arch: Introduce
> > CONFIG_FUNCTION_ALIGNMENT") in that tree.
> > 
> > -- 
> > Thanks,
> > Sasha
> 
> The fix is really the last patch, 923510c88d2b ("x86/static_call: Add
> support for Jcc tail-calls"). I see that 6.1 got 3,4,5:
> 
> 9d80f3e60043 x86/static_call: Add support for Jcc tail-calls
> c51a456b4179 x86/alternatives: Teach text_poke_bp() to patch Jcc.d32 instructions
> 75c066485bcb x86/alternatives: Introduce int3_emulate_jcc()
> 
> I can resubmit the series without the first two patches if that is preferred.
I'm just curious why we needed the first two patches on 5.15 but not on 6.1.
On Sat, Mar 16, 2024 at 07:01:46AM -0400, Sasha Levin wrote:
> On Sat, Mar 16, 2024 at 07:02:19AM -0300, Thadeu Lima de Souza Cascardo wrote:
> > On Sat, Mar 16, 2024 at 05:41:16AM -0400, Sasha Levin wrote:
> > > On Wed, Mar 13, 2024 at 07:42:50AM -0300, Thadeu Lima de Souza Cascardo wrote:
> > > > Peter Zijlstra (4):
> > > >       arch: Introduce CONFIG_FUNCTION_ALIGNMENT
> > > >       x86/alternatives: Introduce int3_emulate_jcc()
> > > >       x86/alternatives: Teach text_poke_bp() to patch Jcc.d32 instructions
> > > >       x86/static_call: Add support for Jcc tail-calls
> > > > 
> > > > Thomas Gleixner (1):
> > > >       x86/asm: Differentiate between code and function alignment
> > > 
> > > Is this not an issue on 6.1? I don't see d49a0626216b ("arch: Introduce
> > > CONFIG_FUNCTION_ALIGNMENT") in that tree.
> > > 
> > > -- 
> > > Thanks,
> > > Sasha
> > 
> > The fix is really the last patch, 923510c88d2b ("x86/static_call: Add
> > support for Jcc tail-calls"). I see that 6.1 got 3,4,5:
> > 
> > 9d80f3e60043 x86/static_call: Add support for Jcc tail-calls
> > c51a456b4179 x86/alternatives: Teach text_poke_bp() to patch Jcc.d32 instructions
> > 75c066485bcb x86/alternatives: Introduce int3_emulate_jcc()
> > 
> > I can resubmit the series without the first two patches if that is preferred.
> 
> I'm just curious why we needed the first two patches on 5.15 but not on 6.1.
> 
> -- 
> Thanks,
> Sasha
Because 923510c88d2b uses ASM_FUNC_ALIGN, which requires those two. I didn't look at the 6.1 backport for reference; there, that use was removed. It seems to have been a requirement for call depth tracking, which would not be necessary on 5.15 either.
I will send a v2.
Cascardo.
On Wed, Mar 13, 2024 at 07:42:50AM -0300, Thadeu Lima de Souza Cascardo wrote:
> Otherwise, we see warnings like this:
> 
> [ 0.000000][ T0] ------------[ cut here ]------------
> [ 0.000000][ T0] unexpected static_call insn opcode 0xf at kvm_vcpu_reload_apic_access_page+0x17/0x30
> [ 0.000000][ T0] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/static_call.c:88 __static_call_validate+0x68/0x70
> [...]
Now queued up, thanks.
greg k-h