Masami, thanks for verifying!
Hi Greg and Sasha,
On Tue, Jul 11, 2023 at 10:33:03AM +0900, Masami Hiramatsu wrote:
On Mon, 10 Jul 2023 08:57:03 -0700 Nathan Chancellor <nathan@kernel.org> wrote:
On Mon, Jul 10, 2023 at 09:14:13PM +0900, Masami Hiramatsu (Google) wrote:
I only build-tested, since I could not boot the kernel with CFI_CLANG=y. Does anyone know anything about this error?
[ 0.141030] MMIO Stale Data: Unknown: No mitigations
[ 0.153511] SMP alternatives: Using kCFI
[ 0.164593] Freeing SMP alternatives memory: 36K
[ 0.165053] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_kernel+0x472/0x48b
[ 0.166028] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.2-00002-g12b1b2fca8ef #126
[ 0.166028] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[ 0.166028] Call Trace:
[ 0.166028] <TASK>
[ 0.166028] dump_stack_lvl+0x6e/0xb0
[ 0.166028] panic+0x146/0x2f0
[ 0.166028] ? start_kernel+0x472/0x48b
[ 0.166028] __stack_chk_fail+0x14/0x20
[ 0.166028] start_kernel+0x472/0x48b
[ 0.166028] x86_64_start_reservations+0x24/0x30
[ 0.166028] x86_64_start_kernel+0xa6/0xbb
[ 0.166028] secondary_startup_64_no_verify+0x106/0x11b
[ 0.166028] </TASK>
[ 0.166028] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_kernel+0x472/0x48b ]---
This looks like https://github.com/ClangBuiltLinux/linux/issues/1815 to me. What version of LLVM are you using? This was fixed in 16.0.2. Commit 514ca14ed544 ("start_kernel: Add __no_stack_protector function attribute") should resolve it on the Linux side; it looks like that is in 6.5-rc1. Not sure if we should backport it or just let people upgrade their toolchains on older releases.
Thanks for the info. I confirmed that the commit fixed the boot issue. So I think it should be backported to the stable tree.
Would you please apply commit 514ca14ed544 ("start_kernel: Add __no_stack_protector function attribute") to linux-6.4.y? The series ending with commit 611d4c716db0 ("x86/hyperv: Mark hv_ghcb_terminate() as noreturn"), which shipped in 6.4, exposes an LLVM issue that affected 16.0.0 and 16.0.1 and was resolved in 16.0.2. When the kernel is built with one of those affected LLVM releases, the following crash occurs at boot:
[ 0.181667] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_kernel+0x3cf/0x3d0
[ 0.182621] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.3 #1
[ 0.182621] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[ 0.182621] Call Trace:
[ 0.182621] <TASK>
[ 0.182621] dump_stack_lvl+0x6a/0xa0
[ 0.182621] panic+0x124/0x2f0
[ 0.182621] ? start_kernel+0x3cf/0x3d0
[ 0.182621] ? acpi_enable+0x64/0xc0
[ 0.182621] __stack_chk_fail+0x14/0x20
[ 0.182621] start_kernel+0x3cf/0x3d0
[ 0.182621] x86_64_start_reservations+0x24/0x30
[ 0.182621] x86_64_start_kernel+0xab/0xb0
[ 0.182621] secondary_startup_64_no_verify+0x107/0x10b
[ 0.182621] </TASK>
[ 0.182621] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_kernel+0x3cf/0x3d0 ]---
514ca14ed544 aims to avoid this on the Linux side. I have verified that it applies cleanly to 6.4.3 and resolves the issue there, as has Masami.
If there are any issues or questions, please let me know.
Cheers, Nathan