On 16 Apr 16:40, Jan Kara wrote:
Can you please run RIP through ./scripts/faddr2line to see where exactly are we looping? I expect the loop iterating over marks to notify but better be sure.
How easily can you hit this? Are you able to run debug kernels / inspect crash dumps when the issue occurs? Also testing with the latest mainline kernel (4.16) would be welcome whether this isn't just an issue with the backport of fsnotify fixes from Miklos.
I do have one proper kernel crash dump for one of the lockups we saw
PID: 30407 TASK: ffff9584913b2180 CPU: 8 COMMAND: "python" #0 [ffff959cb7883d80] machine_kexec at ffffffff890561ff #1 [ffff959cb7883dd8] __crash_kexec at ffffffff890f6dde #2 [ffff959cb7883e90] panic at ffffffff89074f03 #3 [ffff959cb7883f10] watchdog_timer_fn at ffffffff89117388 #4 [ffff959cb7883f40] __hrtimer_run_queues at ffffffff890dc65c #5 [ffff959cb7883f88] hrtimer_interrupt at ffffffff890dcb76 #6 [ffff959cb7883fd8] smp_apic_timer_interrupt at ffffffff89802f6a #7 [ffff959cb7883ff0] apic_timer_interrupt at ffffffff8980227d --- <IRQ stack> --- #8 [ffffafa5c894f880] apic_timer_interrupt at ffffffff8980227d [exception RIP: unknown or invalid address] RIP: 0000000000000000 RSP: ffffffff8a696820 RFLAGS: 00000002 RAX: ffff95908f520c20 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff959c83c4d000 RSI: 0000000000000000 RDI: ffffafa5c894f9f8 RBP: 0000000053411000 R8: 0000000000000000 R9: ffff95908f520c48 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000001000 R13: 0000000000001000 R14: 0000000000001000 R15: 0000000053410000 ORIG_RAX: 0000000000000000 CS: 0000 SS: ffffffffffffff10 bt: WARNING: possibly bogus exception frame #9 [ffffafa5c894f928] fsnotify at ffffffff892293e7 #10 [ffffafa5c894f9e8] __fsnotify_parent at ffffffff89229686 #11 [ffffafa5c894fa48] __kernel_write at ffffffff891e9962 #12 [ffffafa5c894fa70] dump_emit at ffffffff892445af #13 [ffffafa5c894faa8] elf_core_dump at ffffffff8923f546 #14 [ffffafa5c894fc60] do_coredump at ffffffff89244c3f #15 [ffffafa5c894fda0] get_signal at ffffffff89083ed0 #16 [ffffafa5c894fe18] do_signal at ffffffff89028323 #17 [ffffafa5c894ff10] exit_to_usermode_loop at ffffffff8900308c #18 [ffffafa5c894ff38] prepare_exit_to_usermode at ffffffff89003753 RIP: 00007f69706935c3 RSP: 00007ffeb8c1b4a8 RFLAGS: 00010206 RAX: 00007f686d200034 RBX: 00005591f24f0170 RCX: 00007f68cb800000 RDX: 00007f696d200000 RSI: 0000000000000061 RDI: 00007f686d200034 RBP: 00007f686d200010 R8: ffffffffffffffff R9: 00000000000000ff R10: 00000000e0a9a400 R11: 0000000000000246 R12: 0000000100000000 R13: 0000000100000000 R14: 0000000000000000 R15: 0000000000000083 ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b
faddr2line gives "fsnotify at fs/notify/fsnotify.c:368" (it's a 4.14.22). So it does seem that you were right about the location.
This happens with systemd handling coredumps. It's using fsnotify to learn about new dumps.
Note that on this machine, the dumps are on a loop mount: /dev/loop0 /usr/cores ext4 rw,relatime,data=ordered 0 0