Greetings:
Sending via plain text email -- apologies if you receive this twice.
If this isn't the process for reporting a regression in an LTS kernel per https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html, I'm happy to follow another process.
Kernel 6.1.149 introduced a regression, at least on our ARM Cortex-A57-based platforms, via commit 8f4dc4e54eed4bebb18390305eb1f721c00457e1 in arch/arm64/kernel/fpsimd.c: booting KVM VMs eventually leads to a spinlock recursion BUG and a crash of the box.
Reverting that commit via the diff below restores the old (working) behavior:
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 837d1937300a57..bc42163a7fd1f0 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1851,10 +1851,10 @@ void fpsimd_save_and_flush_cpu_state(void)
 	if (!system_supports_fpsimd())
 		return;
 	WARN_ON(preemptible());
-	get_cpu_fpsimd_context();
+	__get_cpu_fpsimd_context();
 	fpsimd_save();
 	fpsimd_flush_cpu_state();
-	put_cpu_fpsimd_context();
+	__put_cpu_fpsimd_context();
 }
 
 #ifdef CONFIG_KERNEL_MODE_NEON
It's not entirely clear to me whether this is specific to our firmware, specific to the ARM Cortex-A57, or more systemic, as we lack sufficiently varied hardware to tell. I've tested the latest 6.1 kernel in addition to the one in the log below, and have also tested a number of the firmware versions available for these boxes.
Steps to reproduce:
1. Boot a VM in qemu-system-aarch64 with the "-accel kvm" and "-cpu host" flags set -- no other arguments seem to matter.
2. Generate CPU load in the VM.
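For concreteness, an invocation along these lines is sufficient. Only "-accel kvm" and "-cpu host" matter; the machine type, core/memory sizes, disk image name, and the stress-ng load generator below are illustrative placeholders, not requirements:

```shell
# Boot a guest with KVM acceleration and the host CPU model.
# Everything besides -accel kvm and -cpu host is a placeholder.
qemu-system-aarch64 \
    -machine virt \
    -accel kvm \
    -cpu host \
    -smp 4 -m 4096 \
    -drive file=guest.img,format=raw,if=virtio \
    -nographic

# Inside the guest, generate sustained CPU load, e.g. with stress-ng
# (any CPU-bound workload appears to work):
stress-ng --cpu "$(nproc)" --timeout 10m
```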
Kernel log:
[sjc1] root@si-compute-kvm-e0fff70016b4:/#
[  805.905413] BUG: spinlock recursion on CPU#7, CPU 3/KVM/57616
[  805.905452]  lock: 0xffff3045ef850240, .magic: dead4ead, .owner: CPU 3/KVM/57616, .owner_cpu: 7
[  805.905477] CPU: 7 PID: 57616 Comm: CPU 3/KVM Tainted: G           O       6.1.152 #1
[  805.905495] Hardware name: SoftIron SoftIron Platform Mainboard/SoftIron Platform Mainboard, BIOS 1.31 May 11 2023
[  805.905516] Call trace:
[  805.905524]  dump_backtrace+0xe4/0x110
[  805.905538]  show_stack+0x20/0x30
[  805.905548]  dump_stack_lvl+0x6c/0x88
[  805.905561]  dump_stack+0x18/0x34
[  805.905571]  spin_dump+0x98/0xac
[  805.905583]  do_raw_spin_lock+0x70/0x128
[  805.905596]  _raw_spin_lock+0x18/0x28
[  805.905607]  raw_spin_rq_lock_nested+0x18/0x28
[  805.905620]  update_blocked_averages+0x70/0x550
[  805.905634]  run_rebalance_domains+0x50/0x70
[  805.905645]  handle_softirqs+0x198/0x328
[  805.905659]  __do_softirq+0x1c/0x28
[  805.905669]  ____do_softirq+0x18/0x28
[  805.905680]  call_on_irq_stack+0x30/0x48
[  805.905691]  do_softirq_own_stack+0x24/0x30
[  805.905703]  do_softirq+0x74/0x90
[  805.905714]  __local_bh_enable_ip+0x64/0x80
[  805.905727]  fpsimd_save_and_flush_cpu_state+0x5c/0x68
[  805.905740]  kvm_arch_vcpu_put_fp+0x4c/0x88
[  805.905752]  kvm_arch_vcpu_put+0x28/0x88
[  805.905764]  kvm_sched_out+0x38/0x58
[  805.905774]  __schedule+0x55c/0x6c8
[  805.905786]  schedule+0x60/0xa8
[  805.905796]  kvm_vcpu_block+0x5c/0x90
[  805.905807]  kvm_vcpu_halt+0x440/0x468
[  805.905818]  kvm_vcpu_wfi+0x3c/0x70
[  805.905828]  kvm_handle_wfx+0x18c/0x1f0
[  805.905840]  handle_exit+0xb8/0x148
[  805.905851]  kvm_arch_vcpu_ioctl_run+0x6c4/0x7b0
[  805.905863]  kvm_vcpu_ioctl+0x1d0/0x8b8
[  805.905874]  __arm64_sys_ioctl+0x9c/0xe0
[  805.905886]  invoke_syscall+0x78/0x108
[  805.905899]  el0_svc_common.constprop.3+0xb4/0xf8
[  805.905912]  do_el0_svc+0x78/0x88
[  805.905922]  el0_svc+0x48/0x78
[  805.905932]  el0t_64_sync_handler+0x40/0xc0
[  805.905943]  el0t_64_sync+0x18c/0x190
[  806.048300] hrtimer: interrupt took 2976 ns
[  826.924613] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
SoC 0 became not ready
SoC 0 became ready
Thanks,
-- Kenneth Van Alstyne, Jr.