Hello everyone,
After upgrading the version of QEMU used in our CI from 4.2.0 to 6.2.0, I noticed that our 4.9 arm64 big endian builds stopped booting properly. This is not clang specific; I could reproduce it with GCC 8.3.0 (the rootfs is at [1]):
$ make -skj"$(nproc)" ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- distclean defconfig
$ scripts/config -e CPU_BIG_ENDIAN
$ make -skj"$(nproc)" ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- olddefconfig Image.gz
$ qemu-system-aarch64 \
    -initrd rootfs.cpio \
    -append 'console=ttyAMA0 earlycon' \
    -cpu max \
    -machine virt,gic-version=max \
    -machine virtualization=true \
    -display none \
    -kernel arch/arm64/boot/Image.gz \
    -m 512m \
    -nodefaults \
    -serial mon:stdio
[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 4.9.296 (nathan@archlinux-ax161) (gcc version 8.3.0 (Debian 8.3.0-2) ) #1 SMP PREEMPT Fri Jan 7 19:10:49 UTC 2022
...
[    0.773924] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    0.773924]
[    0.776016] CPU: 0 PID: 1 Comm: init Not tainted 4.9.296 #1
[    0.776149] Hardware name: linux,dummy-virt (DT)
[    0.776375] Call trace:
[    0.777063] [<ffff000008088ba0>] dump_backtrace+0x0/0x1b0
[    0.777293] [<ffff000008088d64>] show_stack+0x14/0x20
[    0.777420] [<ffff0000088c2d18>] dump_stack+0x98/0xb8
[    0.777555] [<ffff0000088c0ee8>] panic+0x11c/0x278
[    0.777684] [<ffff0000080c4d20>] do_exit+0x940/0x970
[    0.777816] [<ffff0000080c4db8>] do_group_exit+0x38/0xa0
[    0.777974] [<ffff0000080cf698>] get_signal+0xb8/0x678
[    0.778111] [<ffff000008087ca8>] do_signal+0xd8/0x9b0
[    0.778248] [<ffff0000080888dc>] do_notify_resume+0x8c/0xa8
[    0.778392] [<ffff000008082ff4>] work_pending+0x8/0x10
[    0.778790] Kernel Offset: disabled
[    0.778891] Memory Limit: none
[    0.779241] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
I ended up bisecting QEMU down to cd3f80aba0 ("target/arm: Enable ARMv8.1-VHE in -cpu max") [2], which did not seem obviously broken. Since our 4.14 builds were fine, I did a reverse bisect of the kernel, which landed on commit ec347012bbec ("arm64: sysreg: Move to use definitions for all the SCTLR bits"). Getting that change to apply cleanly required the three other arm64 patches in this series, and making it build properly required the BUILD_BUG_ON header split (including bug.h might have been sufficient, but I did not want to risk any further breakage). I searched through mainline for any follow-up fixes to these commits that I might have missed and did not find any.
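For anyone unfamiliar with a reverse bisect (finding the commit that fixed a problem rather than the one that introduced it), here is a minimal sketch using git's custom bisect terms. The reverse_bisect helper and its arguments are hypothetical names for illustration, not the exact commands I ran; the test command is assumed to exit 0 when the tree behaves correctly (for example, when the kernel boots).

```shell
# reverse_bisect BROKEN_REV FIXED_REV CMD [ARGS...]
# Hypothetical helper: find the first commit at which CMD starts succeeding.
# Must be run from inside the repository being bisected.
reverse_bisect() {
    broken_rev=$1
    fixed_rev=$2
    shift 2

    # Rename the bisect terms so that "old" reads as "broken" and
    # "new" reads as "fixed".
    git bisect start --term-old broken --term-new fixed || return 1
    git bisect broken "$broken_rev" || return 1
    git bisect fixed "$fixed_rev" || return 1

    # "git bisect run" treats exit code 0 as the old term and 1-127
    # (except 125) as the new term, so CMD's result is inverted here:
    # CMD still failing -> exit 0 (broken), CMD passing -> exit 1 (fixed).
    git bisect run sh -c 'if "$@"; then exit 1; else exit 0; fi' sh "$@"
}
```

When the run finishes, git prints the first "fixed" commit and refs/bisect/fixed points at it; "git bisect reset" returns to the original checkout. A real boot test would also want to exit 125 for revisions that fail to build so that they are skipped, which this sketch does not handle.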
I am not sure if this series would be acceptable in 4.9, hence the RFC tag. If not, I am happy to just spin down our boot tests of arm64 big endian on 4.9, which is not a super valuable target, but I figured I would send the series and let others decide!
[1]: https://github.com/ClangBuiltLinux/boot-utils/tree/6cfa15992d375dfb874ca0677...
[2]: https://gitlab.com/qemu-project/qemu/-/commit/cd3f80aba0c559a6291f7c3e686422...

Cheers,
Nathan

Ian Abbott (1):
  bug: split BUILD_BUG stuff out into <linux/build_bug.h>

James Morse (1):
  arm64: sysreg: Move to use definitions for all the SCTLR bits

Mark Rutland (2):
  arm64: reduce el2_setup branching
  arm64: move !VHE work to end of el2_setup

Stefan Traby (1):
  arm64: Remove a redundancy in sysreg.h

 arch/arm64/include/asm/sysreg.h | 69 +++++++++++++++++++++++++--
 arch/arm64/kernel/head.S        | 49 ++++++++-----------
 arch/arm64/mm/proc.S            | 24 +---------
 include/linux/bug.h             | 72 +---------------------------
 include/linux/build_bug.h       | 84 +++++++++++++++++++++++++++++++++
 5 files changed, 170 insertions(+), 128 deletions(-)
 create mode 100644 include/linux/build_bug.h

base-commit: 710bf39c7aec32641ea63f6593db1df8c3e4a4d7