On Fri, Jan 07, 2022 at 12:43:30PM -0700, Nathan Chancellor wrote:
Hello everyone,
After upgrading the version of QEMU used in our CI from 4.2.0 to 6.2.0, I noticed that our 4.9 arm64 big endian builds stopped booting properly. This is not clang specific; I could reproduce it with GCC 8.3.0 (the rootfs is at [1]).
$ make -skj"$(nproc)" ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- distclean defconfig
$ scripts/config -e CPU_BIG_ENDIAN
$ make -skj"$(nproc)" ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- olddefconfig Image.gz
$ qemu-system-aarch64 \
    -initrd rootfs.cpio \
    -append 'console=ttyAMA0 earlycon' \
    -cpu max \
    -machine virt,gic-version=max \
    -machine virtualization=true \
    -display none \
    -kernel arch/arm64/boot/Image.gz \
    -m 512m \
    -nodefaults \
    -serial mon:stdio
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 4.9.296 (nathan@archlinux-ax161) (gcc version 8.3.0 (Debian 8.3.0-2) ) #1 SMP PREEMPT Fri Jan 7 19:10:49 UTC 2022
...
[ 0.773924] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 0.773924]
[ 0.776016] CPU: 0 PID: 1 Comm: init Not tainted 4.9.296 #1
[ 0.776149] Hardware name: linux,dummy-virt (DT)
[ 0.776375] Call trace:
[ 0.777063] [<ffff000008088ba0>] dump_backtrace+0x0/0x1b0
[ 0.777293] [<ffff000008088d64>] show_stack+0x14/0x20
[ 0.777420] [<ffff0000088c2d18>] dump_stack+0x98/0xb8
[ 0.777555] [<ffff0000088c0ee8>] panic+0x11c/0x278
[ 0.777684] [<ffff0000080c4d20>] do_exit+0x940/0x970
[ 0.777816] [<ffff0000080c4db8>] do_group_exit+0x38/0xa0
[ 0.777974] [<ffff0000080cf698>] get_signal+0xb8/0x678
[ 0.778111] [<ffff000008087ca8>] do_signal+0xd8/0x9b0
[ 0.778248] [<ffff0000080888dc>] do_notify_resume+0x8c/0xa8
[ 0.778392] [<ffff000008082ff4>] work_pending+0x8/0x10
[ 0.778790] Kernel Offset: disabled
[ 0.778891] Memory Limit: none
[ 0.779241] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
I bisected QEMU down to cd3f80aba0 ("target/arm: Enable ARMv8.1-VHE in -cpu max") [2], which did not seem obviously broken. Our 4.14 builds were fine, so I did a reverse bisect, which landed on kernel commit ec347012bbec ("arm64: sysreg: Move to use definitions for all the SCTLR bits"). Getting that change to apply cleanly involved applying the three other arm64 patches in this series, and making it build properly required the BUILD_BUG_ON header split (including bug.h might have been sufficient, but I did not want to risk any further breakage). I searched through mainline for any fix commits that I might have missed and did not find any.
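For anyone wanting to repeat this kind of bisect, here is a minimal sketch of a log classifier that could sit at the end of a "git bisect run" script. It is not the script used for the bisects above. The function name classify_boot_log and the idea of capturing the serial output to a file are assumptions for illustration. The exit codes follow the documented "git bisect run" convention (0 for good, 125 for skip, other values from 1 to 127 for bad).

```shell
#!/bin/sh
# Hypothetical helper: decide a bisect verdict from a captured boot log.
# Assumes the caller has already built the tree and saved the QEMU serial
# output to the file passed as $1.
classify_boot_log() {
    log=$1

    # An empty or missing log means this revision could not be tested
    # (for example, the build failed), so ask git bisect to skip it.
    [ -s "$log" ] || return 125

    # The failure reported here shows up as a kernel panic when init dies.
    if grep -q 'Kernel panic - not syncing' "$log"; then
        return 1    # bad
    fi

    # Assumption: no panic in the log means the boot was fine. A hang
    # would need a separate check, such as a timeout in the caller.
    return 0        # good
}
```

A wrapper would build the tree, boot it under a timeout with the serial output redirected to a file, and finish with classify_boot_log on that file. For a reverse bisect (looking for the commit that fixed the boot), the verdicts would need to be inverted, since "git bisect run" always treats exit code 0 as the old state.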
I am not sure if this series would be acceptable in 4.9, hence the RFC tag. If not, I am happy to just spin down our boot tests of arm64 big endian on 4.9, which is not a super valuable target, but I figured I would send the series and let others decide!
Seems sane; keeping build coverage for 4.9 going for the next year is always good. I'll queue these up now, thanks!
greg k-h