On Thu, 19 Jun 2025, Greg Chandler wrote:
So what I know for sure is this: The tulip driver on alpha (generic and DP264) oops/panic on physical disconnect, but only when an IP address is bound. It does not panic when no address is bound to the interface. It does not matter if the driver is compiled in, or if it is compiled as a module. It does not matter if all of the options are set for tulip or if none of them are: New bus configuration Use PCI shared mem for NIC registers Use RX polling (NAPI) Use Interrupt Mitigation The physical link does not auto-negotiate, and mii-tool does not seem to be able to force it with -F or -A like you would expect it to. The kernel does not drop the "Link is Up/Link is Down" messages when the PHY "links" The switch and interface both show LEDs as if linked at 10-Half-Duplex, and the lights turn off when the link is broken. Subsequently they do relink at 10-Half again if plugged back in. I did also attempt to test the kernel level stack for nfsroot, just to see if it worked prior to init launching everything else, and it did not. I used the same IP configuration for that test as all of the tests in these emails. All of the oops/panics seem to happen at: kernel/time/timer.c:1657 __timer_delete_sync+0x10c/0x150
FYI something's changed a while ago in how `del_timer_sync' is handled and I can see a similar warning nowadays with another network driver with the MIPS platform.
Since I'm the maintainer of said driver I mean to bisect it and figure out what's going here, but haven't found time so far owing to other commitments (and the driver otherwise works just fine regardless, so it's minor annoyance). If you beat me to it, then I'll gladly accept it, but otherwise I'm just letting you know you're not alone with this issue and that it's not specific to the DEC Tulip driver on your system.
For the record:
------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at kernel/time/timer.c:1563 __timer_delete_sync+0x110/0x118 Modules linked in: CPU: 0 PID: 0 Comm: swapper Tainted: G W 6.4.0-rc3-00030-gae62c49c0cef #21 Stack : 807a0000 80095a8c 00000000 00000004 806a0000 00000009 80c09dac 807d0000 807a0000 807056ec 80769fac 807a13f3 807d30c4 1000ec00 80c09d58 80787a18 00000000 00000000 807056ec 00000000 00000001 80c09c94 00000077 34633236 20202020 00000000 807d7311 20202020 807056ec 1000ec00 00000000 00000000 806fcb60 806fcb38 807a0000 00000001 00000000 fffffffe 00000000 807d0000 ... Call Trace: [<80048ecc>] show_stack+0x2c/0xf8 [<80645c88>] dump_stack_lvl+0x34/0x4c [<80641d00>] __warn+0xb4/0xe8 [<80641d84>] warn_slowpath_fmt+0x50/0x88 [<800b177c>] __timer_delete_sync+0x110/0x118 [<8040f4b0>] fza_interrupt+0x904/0x1004 [<80098d7c>] __handle_irq_event_percpu+0x84/0x188 [<80098f1c>] handle_irq_event+0x38/0xbc [<8009d4e4>] handle_level_irq+0xc8/0x208 [<80098110>] generic_handle_irq+0x44/0x5c [<8064f450>] do_IRQ+0x1c/0x28 [<80041cf0>] dec_irq_dispatch+0x10/0x20 [<80043754>] handle_int+0x14c/0x158 [<8008bf64>] do_idle+0x5c/0x15c [<8008c368>] cpu_startup_entry+0x20/0x28 [<8064657c>] kernel_init+0x0/0x114
---[ end trace 0000000000000000 ]---
-- the arrival of this particular device state change interrupt means the timer set up just in case the device gets stuck can be deleted, so I'm not sure why calling `del_timer_sync' to discard the timer has become a no-no now; this code is 20+ years old now, though I sat on it for a while and then it took some time and effort to get it upstream too. The issue has started sometime between 5.18 (clean boot) and 6.4 (quoted above).
Maybe it'll ring someone's bell and they'll chime in or otherwise I'll bisect it... sometime. Or feel free to start yourself with 5.18, as it's not terribly old, only a bit and certainly not so as 2.6 is.
Maciej