On Tue, Jul 30, 2024 at 6:28 PM Max Kellermann <max.kellermann@ionos.com> wrote:
> I'll let you know when problems occur later, but until then, I agree
> with merging your revert instead of my patches.

Not sure if that's the same bug/cause (looks different), but 6.10.2 with your patch is still unstable:

rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 9-.... 15-.... } 521399 jiffies s: 2085 root: 0x1/.
rcu: blocking rcu_node structures (internal RCU debug): l=1:0-15:0x8200/.
Sending NMI from CPU 3 to CPUs 9:
NMI backtrace for cpu 9
CPU: 9 PID: 2756 Comm: kworker/9:2 Tainted: G D 6.10.2-cm4all2-vm+ #171
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Workqueue: ceph-msgr ceph_con_workfn
RIP: 0010:native_queued_spin_lock_slowpath+0x80/0x260
Code: 57 85 c0 74 10 0f b6 03 84 c0 74 09 f3 90 0f b6 03 84 c0 75 f7 b8 01 00 00 00 66 89 03 5b 5d 41 5c 41 5d c3 cc cc cc cc f3 90 <eb> 93 8b 37 b8 00 02 00 00 81 fe 00 01 00 00 74 07 eb a1 83 e8 01
RSP: 0018:ffffaf5880c03bb8 EFLAGS: 00000202
RAX: 0000000000000001 RBX: ffffa02bc37c9e98 RCX: ffffaf5880c03c90
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffa02bc37c9e98
RBP: ffffa02bc2f94000 R08: ffffaf5880c03c90 R09: 0000000000000010
R10: 0000000000000514 R11: 0000000000000000 R12: ffffaf5880c03c90
R13: ffffffffb4bcb2f0 R14: ffffa036c9e7e8e8 R15: ffffa02bc37c9e98
FS:  0000000000000000(0000) GS:ffffa036cf040000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055fecac48568 CR3: 000000030d82c002 CR4: 00000000001706b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <NMI>
 ? nmi_cpu_backtrace+0x83/0xf0
 ? nmi_cpu_backtrace_handler+0xd/0x20
 ? nmi_handle+0x56/0x120
 ? default_do_nmi+0x40/0x100
 ? exc_nmi+0xdc/0x100
 ? end_repeat_nmi+0xf/0x53
 ? __pfx_ceph_ino_compare+0x10/0x10
 ? native_queued_spin_lock_slowpath+0x80/0x260
 ? native_queued_spin_lock_slowpath+0x80/0x260
 ? native_queued_spin_lock_slowpath+0x80/0x260
 </NMI>
 <TASK>
 ? __pfx_ceph_ino_compare+0x10/0x10
 _raw_spin_lock+0x1e/0x30
 find_inode+0x6e/0xc0
 ? __pfx_ceph_ino_compare+0x10/0x10
 ? __pfx_ceph_set_ino_cb+0x10/0x10
 ilookup5_nowait+0x6d/0xa0
 ? __pfx_ceph_ino_compare+0x10/0x10
 iget5_locked+0x33/0xe0
 ceph_get_inode+0xb8/0xf0
 mds_dispatch+0xfe8/0x1ff0
 ? inet_recvmsg+0x4d/0xf0
 ceph_con_process_message+0x66/0x80
 ceph_con_v1_try_read+0xcfc/0x17c0
 ? __switch_to_asm+0x39/0x70
 ? finish_task_switch.isra.0+0x78/0x240
 ? __schedule+0x32a/0x1440
 ceph_con_workfn+0x339/0x4f0
 process_one_work+0x138/0x2e0
 worker_thread+0x2b9/0x3d0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xba/0xe0
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x30/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30
 </TASK>