This is a from-scratch build (non-vendor/non-distribution)
Host/Target = alpha ev6
Kernel source = 6.12.12
My last working kernel on this was a 2.6.x, it's been a while since I've had time to bring this system up to date, so I don't know when this may have started. I had a 3.0.102 in there, but I didn't test the networking while using it.
Please let me know what I can do to help out with figuring this one out.
The kernel output:
[ 0.692382] net eth0: Digital DS21142/43 Tulip rev 65 at MMIO 0xa120000, 08:00:2b:86:ab:b1, IRQ 29
[ 0.710937] net eth1: Digital DS21142/43 Tulip rev 65 at MMIO 0xa121000, 08:00:2b:86:a8:5b, IRQ 30
[ 0.989257] e1000 0000:00:10.0 eth2: (PCI:33MHz:64-bit) 00:02:b3:f3:e6:3d
[ 0.990233] e1000 0000:00:10.0 eth2: Intel(R) PRO/1000 Network Connection
[ 6.088864] tulip 0000:00:09.0 eth126: renamed from eth0
[ 6.103512] tulip 0000:00:09.0 rename_eth126: renamed from eth126
[ 6.164059] e1000 0000:00:10.0 eth124: renamed from eth2
[ 6.172848] e1000 0000:00:10.0 eth0: renamed from eth124
[ 6.207028] tulip 0000:00:09.0 eth2: renamed from rename_eth126
[ 18.957998] net eth2: Using user-specified media 10baseT-FDX
[ 19.082021] net eth2: 21143 10baseT link beat good
I attempted to set the interface to 100Mb/FDX with mii-tool but didn't seem to be having any luck, so I disconnected the cord, and it dropped this immediately:
[ 195.170798] ------------[ cut here ]------------
[ 195.170798] WARNING: CPU: 0 PID: 0 at kernel/time/timer.c:1657 __timer_delete_sync+0x104/0x120
[ 195.170798] Modules linked in: loop
[ 195.170798] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.12.12 #4
[ 195.170798] fffffc0000c83ab0 fffffc0000388c94 fffffc0000326744 fffffc0000afbee8
[ 195.170798] 0000000000000000 fffffc0000388c94 fffffc00009e0d70 0000000000000000
[ 195.170798] fffffc0000afbee8 0000000000000679 fffffc0000388c94 0000000000000009
[ 195.170798] fffffc0000cf9100 0000000000000000 fffffc0000388c94 0000000000000000
[ 195.170798] fffffc00020e6000 fffffc00020e73d0 fffffffff0669000 fffffd000a120000
[ 195.170798] fffffc00007cff70 fffffc00024b5c00 0000000000000000 0000000000000122
[ 195.170798] Trace:
[ 195.170798] [<fffffc0000388c94>] __timer_delete_sync+0x104/0x120
[ 195.170798] [<fffffc0000326744>] __warn+0x194/0x1a0
[ 195.170798] [<fffffc0000388c94>] __timer_delete_sync+0x104/0x120
[ 195.170798] [<fffffc00009e0d70>] warn_slowpath_fmt+0x84/0xf0
[ 195.170798] [<fffffc0000388c94>] __timer_delete_sync+0x104/0x120
[ 195.170798] [<fffffc0000388c94>] __timer_delete_sync+0x104/0x120
[ 195.170798] [<fffffc00007cff70>] usb_hcd_poll_rh_status+0x140/0x1a0
[ 195.170798] [<fffffc0000919a2c>] tcp_orphan_update+0x6c/0x90
[ 195.170798] [<fffffc000038be64>] timekeeping_update+0xd4/0x290
[ 195.170798] [<fffffc0000780fdc>] t21142_lnk_change+0x1bc/0x790
[ 195.170798] [<fffffc000077a890>] tulip_interrupt+0x280/0xac0
[ 195.170798] [<fffffc0000372b90>] __handle_irq_event_percpu+0x60/0x180
[ 195.170798] [<fffffc0000372d30>] handle_irq_event_percpu+0x80/0xa0
[ 195.170798] [<fffffc0000372d98>] handle_irq_event+0x48/0xe0
[ 195.170798] [<fffffc0000377af0>] handle_level_irq+0xc0/0x1f0
[ 195.170798] [<fffffc0000315300>] handle_irq+0x70/0xe0
[ 195.170798] [<fffffc000031d6c0>] dp264_srm_device_interrupt+0x30/0x50
[ 195.170798] [<fffffc00003153dc>] do_entInt+0x6c/0x1c0
[ 195.170798] [<fffffc0000310cc0>] ret_from_sys_call+0x0/0x10
[ 195.170798] [<fffffc000035c69c>] pick_task_fair+0x3c/0x100
[ 195.170798] [<fffffc000035d89c>] task_non_contending+0x6c/0x2a0
[ 195.170798] [<fffffc000035fc28>] do_idle+0x58/0x190
[ 195.170798] [<fffffc00009eda20>] cpu_idle_poll.isra.0+0x0/0x60
[ 195.170798] [<fffffc00009eda50>] cpu_idle_poll.isra.0+0x30/0x60
[ 195.170798] [<fffffc0000360058>] cpu_startup_entry+0x38/0x50
[ 195.170798] [<fffffc00009edbc8>] rest_init+0xe8/0xec
[ 195.170798] [<fffffc000031001c>] _stext+0x1c/0x20
[ 195.170798] [<fffffc0000312460>] common_shutdown_1+0x0/0x150
[ 195.170798] ---[ end trace 0000000000000000 ]---
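The trace above is long and lists some entries more than once, so the interesting part (tulip_interrupt, then t21142_lnk_change, then __timer_delete_sync) is easy to miss. Here is a minimal sketch, assuming the dmesg text is available as a string, that pulls out just the symbol names. The helper names (frames, collapse, up_to) are made up for illustration, and the choice of do_entInt as the cut-off is my own assumption. Some entries (for example usb_hcd_poll_rh_status) may be stale addresses left on the stack rather than real callers, so treat the result as a hint, not a precise call chain.

```python
import re

# One backtrace entry looks like: [<fffffc0000780fdc>] t21142_lnk_change+0x1bc/0x790
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def frames(dmesg_text):
    """Return the symbol of every backtrace entry, in the order logged."""
    return FRAME_RE.findall(dmesg_text)


def collapse(symbols):
    """Drop consecutive repeats (the trace lists some frames several times)."""
    out = []
    for sym in symbols:
        if not out or out[-1] != sym:
            out.append(sym)
    return out


def up_to(symbols, stop):
    """Keep entries up to and including `stop` (hypothetical cut-off choice)."""
    return symbols[: symbols.index(stop) + 1] if stop in symbols else symbols


# A few entries copied from the report above; works on flattened text too.
SAMPLE = (
    "[ 195.170798] [<fffffc0000388c94>] __timer_delete_sync+0x104/0x120 "
    "[ 195.170798] [<fffffc0000388c94>] __timer_delete_sync+0x104/0x120 "
    "[ 195.170798] [<fffffc0000780fdc>] t21142_lnk_change+0x1bc/0x790 "
    "[ 195.170798] [<fffffc000077a890>] tulip_interrupt+0x280/0xac0 "
    "[ 195.170798] [<fffffc00003153dc>] do_entInt+0x6c/0x1c0 "
    "[ 195.170798] [<fffffc000035fc28>] do_idle+0x58/0x190 "
    "[ 195.170798] [<fffffc00009eda20>] cpu_idle_poll.isra.0+0x0/0x60"
)

if __name__ == "__main__":
    # Print everything from the warning site down to the interrupt entry point.
    for sym in up_to(collapse(frames(SAMPLE)), "do_entInt"):
        print(sym)
```

Feeding it the full dmesg output instead of SAMPLE works the same way, since the regular expression does not depend on line breaks.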
Kernel options that may be relevant:
CONFIG_ALPHA_DP264=y
CONFIG_NET_TULIP=y
CONFIG_TULIP=y
CONFIG_TULIP_MWI=y
CONFIG_TULIP_MMIO=y
CONFIG_TULIP_NAPI=y
CONFIG_TULIP_NAPI_HW_MITIGATION=y
Device info:
root@bigbang:~# lspci -vvv |more
00:09.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)
    Subsystem: Digital Equipment Corporation DE500B Fast Ethernet
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 255 (5000ns min, 10000ns max), Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 29
    Region 0: I/O ports at 8400 [size=128]
    Region 1: Memory at 0a120000 (32-bit, non-prefetchable) [size=1K]
    Expansion ROM at 0a000000 [disabled] [size=256K]
    Kernel driver in use: tulip

00:0b.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)
    Subsystem: Digital Equipment Corporation DE500B Fast Ethernet
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 255 (5000ns min, 10000ns max), Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 30
    Region 0: I/O ports at 8480 [size=128]
    Region 1: Memory at 0a121000 (32-bit, non-prefetchable) [size=1K]
    Expansion ROM at 0a040000 [disabled] [size=256K]
    Kernel driver in use: tulip
Howdy!
On 6/9/25 15:43, Greg Chandler wrote:
> This is a from-scratch build (non-vendor/non-distribution)
> Host/Target = alpha ev6
> Kernel source = 6.12.12
>
> My last working kernel on this was a 2.6.x, it's been a while since I've had time to bring this system up to date, so I don't know when this may have started. I had a 3.0.102 in there, but I didn't test the networking while using it.
>
> Please let me know what I can do to help out with figuring this one out.
I don't have an Alpha machine to try this on, but I do have a functional Cobalt Qube2 (MIPS 32/64) with these adapters connected directly over PCI:
00:07.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B- DisINTx-
    Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 64 (5000ns min, 10000ns max), Cache Line Size: 32 bytes
    Interrupt: pin A routed to IRQ 19
    Region 0: I/O ports at 1000 [size=128]
    Region 1: Memory at 12082000 (32-bit, non-prefetchable) [size=1K]
    Expansion ROM at 12000000 [disabled] [size=256K]
    Kernel driver in use: tulip

00:0c.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 64 (5000ns min, 10000ns max), Cache Line Size: 32 bytes
    Interrupt: pin A routed to IRQ 20
    Region 0: I/O ports at 1080 [size=128]
    Region 1: Memory at 12082400 (32-bit, non-prefetchable) [size=1K]
    Expansion ROM at 12040000 [disabled] [size=256K]
    Kernel driver in use: tulip
The machine is not currently on a switch that I can control, but I can certainly try to plug in the cable and see what happens. Give me a couple of days to get back to you, and if you don't hear back, please holler. Here are the bits of kernel configuration:

CONFIG_NET_TULIP=y
CONFIG_TULIP=y
# CONFIG_TULIP_MWI is not set
# CONFIG_TULIP_MMIO is not set
# CONFIG_TULIP_NAPI is not set
Thanks! I appreciate you getting back to me. I've got about 30 huge bugs I am shaking down, and this one just cropped up, so I haven't been able to put a lot of time into it yet. I rolled a full debug kernel last night to troubleshoot what appears to be a spinlock/mutex issue with something else, but I'm sure it'll help with this too.
If I find anything out I will also reply with the details....
On 2025/06/10 09:27, Florian Fainelli wrote:
> the machine is not currently on a switch that I can control, but I can certainly try to plug in the cable and see what happens, give me a couple of days to get back to you, and if you don't hear back, please holler.
I decided to test this again before I got sidetracked on my bigger issue. The kernel I reported this on was 6.12.12 on alpha. This is the same version, but after a make distclean and with just about every single debug option turned on.
I left the last line of the kernel boot in this output as well, showing "link beat good"
I pulled the plug and it happened again immediately. I waited 10 sec, and plugged it back in, and I do not get a "link up" type message that I would expect to see.
This should have popped up between: [ 1088.732841] ---[ end trace 0000000000000000 ]---
and this one: [ 1133.693755] ------------[ cut here ]------------
I waited for the switch to negotiate, which it only does at 10 half for this. I waited a few seconds more and pulled the plug again and got the second one:
[ 18.469717] net eth2: 21143 10baseT link beat good
[ 1088.732841] ------------[ cut here ]------------
[ 1088.732841] WARNING: CPU: 0 PID: 0 at kernel/time/timer.c:1657 __timer_delete_sync+0x104/0x120
[ 1088.732841] Modules linked in: loop
[ 1088.732841] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.12.12 #4
[ 1088.732841] fffffc0000c83ab0 fffffc0000388c94 fffffc0000326744 fffffc0000afbee8
[ 1088.732841] 0000000000000000 fffffc0000388c94 fffffc00009e0d70 0000000000000000
[ 1088.732841] fffffc0000afbee8 0000000000000679 fffffc0000388c94 0000000000000009
[ 1088.732841] fffffc0000cf9100 000000011f821600 fffffc0000388c94 0000000000000000
[ 1088.732841] fffffc00020e6000 fffffc00020e73d0 fffffffff0669000 fffffd000a120000
[ 1088.732841] fffffc0000358f10 fffffc000203e180 0000000000000008 fffffc000203e6b0
[ 1088.732841] Trace:
[ 1088.732841] [<fffffc0000388c94>] __timer_delete_sync+0x104/0x120
[ 1088.732841] [<fffffc0000326744>] __warn+0x194/0x1a0
[ 1088.732841] [<fffffc0000388c94>] __timer_delete_sync+0x104/0x120
[ 1088.732841] [<fffffc00009e0d70>] warn_slowpath_fmt+0x84/0xf0
[ 1088.732841] [<fffffc0000388c94>] __timer_delete_sync+0x104/0x120
[ 1088.732841] [<fffffc0000388c94>] __timer_delete_sync+0x104/0x120
[ 1088.732841] [<fffffc0000358f10>] try_to_wake_up+0x170/0x2d0
[ 1088.732841] [<fffffc0000358bb8>] wakeup_preempt+0x68/0xd0
[ 1088.732841] [<fffffc0000919a2c>] tcp_orphan_update+0x6c/0x90
[ 1088.732841] [<fffffc000038be64>] timekeeping_update+0xd4/0x290
[ 1088.732841] [<fffffc0000780fdc>] t21142_lnk_change+0x1bc/0x790
[ 1088.732841] [<fffffc000077a890>] tulip_interrupt+0x280/0xac0
[ 1088.732841] [<fffffc0000372b90>] __handle_irq_event_percpu+0x60/0x180
[ 1088.732841] [<fffffc0000372d30>] handle_irq_event_percpu+0x80/0xa0
[ 1088.732841] [<fffffc0000372d98>] handle_irq_event+0x48/0xe0
[ 1088.732841] [<fffffc0000377af0>] handle_level_irq+0xc0/0x1f0
[ 1088.732841] [<fffffc0000315300>] handle_irq+0x70/0xe0
[ 1088.732841] [<fffffc000031d6c0>] dp264_srm_device_interrupt+0x30/0x50
[ 1088.732841] [<fffffc00003153dc>] do_entInt+0x6c/0x1c0
[ 1088.732841] [<fffffc0000310cc0>] ret_from_sys_call+0x0/0x10
[ 1088.732841] [<fffffc000035c69c>] pick_task_fair+0x3c/0x100
[ 1088.732841] [<fffffc000035d89c>] task_non_contending+0x6c/0x2a0
[ 1088.732841] [<fffffc000035fc28>] do_idle+0x58/0x190
[ 1088.732841] [<fffffc00009eda20>] cpu_idle_poll.isra.0+0x0/0x60
[ 1088.732841] [<fffffc00009eda60>] cpu_idle_poll.isra.0+0x40/0x60
[ 1088.732841] [<fffffc0000360058>] cpu_startup_entry+0x38/0x50
[ 1088.732841] [<fffffc00009edbc8>] rest_init+0xe8/0xec
[ 1088.732841] [<fffffc000031001c>] _stext+0x1c/0x20
[ 1088.732841] ---[ end trace 0000000000000000 ]---
[ 1133.693755] ------------[ cut here ]------------
[ 1133.693755] WARNING: CPU: 0 PID: 0 at kernel/time/timer.c:1657 __timer_delete_sync+0x104/0x120
[ 1133.693755] Modules linked in: loop
[ 1133.693755] CPU: 0 UID: 0 PID: 0 Comm: swapper Tainted: G W 6.12.12 #4
[ 1133.693755] Tainted: [W]=WARN
[ 1133.693755] fffffc0000c83ab0 fffffc0000388c94 fffffc0000326744 fffffc0000afbee8
[ 1133.693755] 0000000000000000 fffffc0000388c94 fffffc00009e0d70 0000000000000000
[ 1133.693755] fffffc0000afbee8 0000000000000679 fffffc0000388c94 0000000000000009
[ 1133.693755] fffffc0000cf9100 000000011f821600 fffffc0000388c94 0000000000000000
[ 1133.693755] fffffc00020e6000 fffffc00020e73d0 fffffffff8668000 fffffd000a120000
[ 1133.693755] fffffc0000358f10 fffffc000203e180 0000000000000008 fffffc000203e6b0
[ 1133.693755] Trace:
[ 1133.693755] [<fffffc0000388c94>] __timer_delete_sync+0x104/0x120
[ 1133.693755] [<fffffc0000326744>] __warn+0x194/0x1a0
[ 1133.693755] [<fffffc0000388c94>] __timer_delete_sync+0x104/0x120
[ 1133.693755] [<fffffc00009e0d70>] warn_slowpath_fmt+0x84/0xf0
[ 1133.693755] [<fffffc0000388c94>] __timer_delete_sync+0x104/0x120
[ 1133.693755] [<fffffc0000388c94>] __timer_delete_sync+0x104/0x120
[ 1133.693755] [<fffffc0000358f10>] try_to_wake_up+0x170/0x2d0
[ 1133.693755] [<fffffc0000358bb8>] wakeup_preempt+0x68/0xd0
[ 1133.693755] [<fffffc0000919a2c>] tcp_orphan_update+0x6c/0x90
[ 1133.693755] [<fffffc000038be64>] timekeeping_update+0xd4/0x290
[ 1133.693755] [<fffffc0000780fdc>] t21142_lnk_change+0x1bc/0x790
[ 1133.693755] [<fffffc000077a890>] tulip_interrupt+0x280/0xac0
[ 1133.693755] [<fffffc0000372b90>] __handle_irq_event_percpu+0x60/0x180
[ 1133.693755] [<fffffc0000372d30>] handle_irq_event_percpu+0x80/0xa0
[ 1133.693755] [<fffffc0000372d98>] handle_irq_event+0x48/0xe0
[ 1133.693755] [<fffffc0000377af0>] handle_level_irq+0xc0/0x1f0
[ 1133.693755] [<fffffc0000315300>] handle_irq+0x70/0xe0
[ 1133.693755] [<fffffc000031d6c0>] dp264_srm_device_interrupt+0x30/0x50
[ 1133.693755] [<fffffc00003153dc>] do_entInt+0x6c/0x1c0
[ 1133.693755] [<fffffc0000310cc0>] ret_from_sys_call+0x0/0x10
[ 1133.693755] [<fffffc000035c69c>] pick_task_fair+0x3c/0x100
[ 1133.693755] [<fffffc000035d89c>] task_non_contending+0x6c/0x2a0
[ 1133.693755] [<fffffc000035fc28>] do_idle+0x58/0x190
[ 1133.693755] [<fffffc00009eda20>] cpu_idle_poll.isra.0+0x0/0x60
[ 1133.693755] [<fffffc00009eda50>] cpu_idle_poll.isra.0+0x30/0x60
[ 1133.693755] [<fffffc0000360058>] cpu_startup_entry+0x38/0x50
[ 1133.693755] [<fffffc00009edbc8>] rest_init+0xe8/0xec
[ 1133.693755] [<fffffc000031001c>] _stext+0x1c/0x20
[ 1133.693755] ---[ end trace 0000000000000000 ]---
After this, even though the link is shown at the switch, mii-tool confirms there is no link:
root@bigbang:~# mii-tool eth2
eth2: no link

Which I think is consistent with it not showing the link up message. I think the driver croaked, which is a pain as it's not a module. I can recompile to test that if someone thinks that would be helpful.
root@bigbang:~# dmesg |grep eth
[ 8.150386] net eth0: Digital DS21142/43 Tulip rev 65 at Port 0x8400, 08:00:2b:86:ab:b1, IRQ 29
[ 8.170894] net eth1: Digital DS21142/43 Tulip rev 65 at Port 0x8480, 08:00:2b:86:a8:5b, IRQ 30
[ 8.809565] e1000 0000:00:10.0 eth2: (PCI:33MHz:64-bit) 00:02:b3:f3:e6:3d
[ 8.811518] e1000 0000:00:10.0 eth2: Intel(R) PRO/1000 Network Connection
[ 8.813472] usbcore: registered new interface driver kaweth
[ 25.848619] tulip 0000:00:09.0 eth126: renamed from eth0
[ 26.400377] tulip 0000:00:09.0 rename_eth126: renamed from eth126
[ 26.861314] e1000 0000:00:10.0 eth124: renamed from eth2
[ 26.887681] e1000 0000:00:10.0 eth0: renamed from eth124
[ 27.122056] tulip 0000:00:09.0 eth2: renamed from rename_eth126
[ 68.313441] tulip 0000:00:0b.0 eth1: tulip_stop_rxtx() failed (CSR5 0xf0660000 CSR6 0xb3862002)
[ 68.791956] tulip 0000:00:09.0 eth2: tulip_stop_rxtx() failed (CSR5 0xf0660000 CSR6 0xb3862002)

root@bigbang:~# mii-tool eth2
eth2: no link

root@bigbang:~# mii-tool eth1
eth1: no link

root@bigbang:~# ifconfig -a
eth0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        inet 192.168.1.75 netmask 255.255.255.0 broadcast 192.168.1.255
        ether 00:02:b3:f3:e6:3d txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.1.76 netmask 255.255.255.0 broadcast 192.168.1.255
        inet6 fe80::a00:2bff:fe86:a85b prefixlen 64 scopeid 0x20<link>
        ether 08:00:2b:86:a8:5b txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 11 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 5 dropped 0 overruns 0 carrier 15 collisions 0

eth2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.1.77 netmask 255.255.255.0 broadcast 192.168.1.255
        inet6 fe80::a00:2bff:fe86:abb1 prefixlen 64 scopeid 0x20<link>
        ether 08:00:2b:86:ab:b1 txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 11 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 5 dropped 0 overruns 0 carrier 15 collisions 0
Clearly the interface doesn't work at all, which I hadn't noticed up until now. I have been using a USB stick to copy stuff, and a serial console for everything else.
I took a quick look at the motherboard and these are Intel 21143-TD chips vs DEC ones.
I had not noticed the "tulip_stop_rxtx" errors until now, more likely because I wasn't looking for them than because they weren't there.
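For anyone wanting to read the CSR5 value in those "tulip_stop_rxtx() failed" lines, here is a minimal sketch that extracts the transmit and receive process-state fields. The two masks are what I recall from the tulip driver header (CSR5_TS and CSR5_RS), so please treat them as an assumption and check them against tulip.h in your own tree. As far as I can tell, the message is printed when those fields do not both read zero after the driver asks the chip to stop. I am not attaching names to the state numbers here, since that needs the 21143 datasheet.

```python
# Masks as recalled from the tulip driver header (assumption, verify locally).
CSR5_TS = 0x00700000  # transmit process state, bits 22:20
CSR5_RS = 0x000E0000  # receive process state, bits 19:17


def process_states(csr5):
    """Return (tx_state, rx_state) as small integers extracted from CSR5."""
    tx_state = (csr5 & CSR5_TS) >> 20
    rx_state = (csr5 & CSR5_RS) >> 17
    return tx_state, rx_state


def still_running(csr5):
    """True if either process-state field is non-zero."""
    return (csr5 & (CSR5_TS | CSR5_RS)) != 0


if __name__ == "__main__":
    csr5 = 0xF0660000  # value from the dmesg lines in this report
    tx_state, rx_state = process_states(csr5)
    print(f"CSR5={csr5:#010x} tx_state={tx_state} rx_state={rx_state} "
          f"running={still_running(csr5)}")
```

With the value from this report, both fields come out non-zero, which matches the driver complaining that the stop did not complete.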
On 6/10/25 11:53, Greg Chandler wrote:
> I decided to test this again before I got sidetracked on my bigger issue. The kernel I reported this on was 6.12.12 on alpha, this is also that same version, but with a make distclean, and just about every single debug option turned on.
>
> I left the last line of the kernel boot in this output as well, showing "link beat good"
>
> I pulled the plug and it happened again immediately. I waited 10 sec, and plugged it back in, and I do not get a "link up" type message that I would expect to see.

Unfortunately, I was not able to reproduce this on my Cobalt Qube2 with the link UP and then pulling the cable. I could try other things if you want me to.
Hmm... I'm wondering if that means it's an alpha-only issue then, which would make this a much larger headache than it already is. Also thank you for checking, I appreciate you taking the time.
I assume those interfaces actually work, right? (A simple ping over that interface would be enough.) I posted in a subsequent message that mine do not appear to work at all.
My next step is to build that driver as a module, and see if it changes anything (I'm doubting it will). Then after that go dig up a different adapter, and see if it's the network stack or the driver.
I've been hard pressed over the last week to get a lot of diagnosing time.
On 2025/06/16 12:01, Florian Fainelli wrote:
> I was not able to reproduce this on my Cobalt Qube2 with the link being UP and then pulling the cable unfortunately, I could try other things if you want me to.
(please no top posting)
On 6/17/25 11:19, Greg Chandler wrote:
> Hmm... I'm wondering if that means it's an alpha-only issue then, which would make this a much larger headache than it already is. Also thank you for checking, I appreciate you taking the time.
>
> I assume those interfaces actually work, right? (simple ping over that interface would be enough) I posted in a subsequent message that mine do not appear to work at all.
Oh yeah, they work just fine:
udhcpc: broadcasting discover
[ 19.197697] net eth0: Setting full-duplex based on MII#1 link partner capability of cde1

# ping -c 1 192.168.254.123
PING 192.168.254.123 (192.168.254.123): 56 data bytes
64 bytes from 192.168.254.123: seq=0 ttl=64 time=2.902 ms

--- 192.168.254.123 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 2.902/2.902/2.902 ms

- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.03 sec 39.6 MBytes 33.1 Mbits/sec 0 sender
[ 5] 0.00-10.07 sec 39.8 MBytes 33.1 Mbits/sec receiver
> My next step is to build that driver as a module, and see if it changes anything (I'm doubting it will). Then after that go dig up a different adapter, and see if it's the network stack or the driver.
>
> I've been hard pressed over the last week to get a lot of diagnosing time.
Let me know if I can run experiments, I can load any kernel version on this Cobalt Qube2 meaning that bisections are possible.
Good luck!
On 2025/06/17 11:22, Florian Fainelli wrote:
> Let me know if I can run experiments, I can load any kernel version on this Cobalt Qube2 meaning that bisections are possible.
I thought I replied to the whole list, not just the sender, sorry for the repeat.... (full config is attached)
As a module, the system booted up but did not probe the module. This may be from a variety of issues, not limited to the fact I am still rolling this distro from scratch, and not all of the tools are 100% working yet. I'm having some issues with gcc/gdb so I have a large number of debugging options turned on in the kernel as well.
When the module is loaded with insmod:
[ 213.363172] tulip 0000:00:09.0: vgaarb: pci_notify
[ 213.363172] tulip 0000:00:09.0: assign IRQ: got 29
[ 213.363172] tulip 0000:00:09.0: enabling Mem-Wr-Inval
[ 213.369031] tulip0: EEPROM default media type Autosense
[ 213.369031] tulip0: Index #0 - Media 10baseT (#0) described by a 21142 Serial PHY (2) block
[ 213.369031] tulip0: Index #1 - Media 10baseT-FDX (#4) described by a 21142 Serial PHY (2) block
[ 213.370007] tulip0: Index #2 - Media 100baseTx (#3) described by a 21143 SYM PHY (4) block
[ 213.370007] tulip0: Index #3 - Media 100baseTx-FDX (#5) described by a 21143 SYM PHY (4) block
[ 213.376843] net eth1: Digital DS21142/43 Tulip rev 65 at MMIO 0xa120000, 08:00:2b:86:ab:b1, IRQ 29
[ 213.377820] tulip 0000:00:09.0: vgaarb: pci_notify
[ 213.377820] tulip 0000:00:0b.0: vgaarb: pci_notify
[ 213.377820] tulip 0000:00:0b.0: assign IRQ: got 30
[ 213.377820] tulip 0000:00:0b.0: enabling Mem-Wr-Inval
[ 213.384656] tulip1: EEPROM default media type Autosense
[ 213.384656] tulip1: Index #0 - Media 10baseT (#0) described by a 21142 Serial PHY (2) block
[ 213.384656] tulip1: Index #1 - Media 10baseT-FDX (#4) described by a 21142 Serial PHY (2) block
[ 213.384656] tulip1: Index #2 - Media 100baseTx (#3) described by a 21143 SYM PHY (4) block
[ 213.384656] tulip1: Index #3 - Media 100baseTx-FDX (#5) described by a 21143 SYM PHY (4) block
[ 213.391492] net eth2: Digital DS21142/43 Tulip rev 65 at MMIO 0xa121000, 08:00:2b:86:a8:5b, IRQ 30

root@bigbang:/lib/modules/6.12.12-SMP# mii-tool eth1
eth1: no link

root@bigbang:/lib/modules/6.12.12-SMP# mii-tool eth2
eth2: autonegotiation failed, link ok
When I pulled the plug I did not get the crash.. When I plugged it back in, I did not get a dmesg/kernel line for the link.
I bound the IP addresses to both, and did a ping test to the default gateway, resulting in: (I promise the network is properly configured)
root@bigbang:/lib/modules/6.12.12-SMP# ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
From 192.168.1.75 icmp_seq=1 Destination Host Unreachable
From 192.168.1.75 icmp_seq=2 Destination Host Unreachable
From 192.168.1.75 icmp_seq=3 Destination Host Unreachable
Upon pulling the cord out of the switch **after** the IP address was bound, we are back to this:
[ 593.769227] ------------[ cut here ]------------
[ 593.769227] WARNING: CPU: 0 PID: 33 at kernel/time/timer.c:1657 __timer_delete_sync+0x10c/0x150
[ 593.769227] Modules linked in: tulip
[ 593.769227] CPU: 0 UID: 0 PID: 33 Comm: lock_torture_wr Not tainted 6.12.12-SMP #1
[ 593.769227] fffffc0002aeba40 fffffc00011b0ac0 fffffc000032a4f0 fffffc0000f8f181
[ 593.769227] 0000000000000000 0000000000000000 fffffc000032a688 fffffc000119c690
[ 593.769227] 0000000000000000 fffffc0000f8f181 fffffc00003d297c fffffc000125c7e0
[ 593.769227] fffffffc0082f1c4 00000000efe4b99a fffffc00003d297c fffffc000b03b490
[ 593.769227] fffffc000b03b490 0000000000000000 fffffffff8668000 fffffd000a120000
[ 593.769227] fffffc0000366888 fffffc000020ce00 fffffc000020ce00 0000000000000008
[ 593.769227] Trace:
[ 593.769227] [<fffffc000032a4f0>] __warn+0x190/0x1a0
[ 593.769227] [<fffffc000032a688>] warn_slowpath_fmt+0x188/0x240
[ 593.769227] [<fffffc00003d297c>] __timer_delete_sync+0x10c/0x150
[ 593.769227] [<fffffc00003d297c>] __timer_delete_sync+0x10c/0x150
[ 593.769227] [<fffffc0000366888>] wakeup_preempt+0xb8/0xd0
[ 593.769227] [<fffffc0000366950>] ttwu_do_activate.isra.0+0xb0/0x1a0
[ 593.769227] [<fffffc0000369030>] try_to_wake_up+0x370/0x700
[ 593.769227] [<fffffc0000e29470>] _raw_spin_unlock_irqrestore+0x20/0x40
[ 593.769227] [<fffffc000036e1c4>] task_tick_fair+0x74/0x370
[ 593.769227] [<fffffc00003d46dc>] enqueue_hrtimer.isra.0+0x5c/0xc0
[ 593.769227] [<fffffc000037bc1c>] task_non_contending+0xcc/0x4f0
[ 593.769227] [<fffffc00003a1a80>] __handle_irq_event_percpu+0x60/0x190
[ 593.769227] [<fffffc000036d598>] sched_balance_update_blocked_averages+0xc8/0x2a0
[ 593.769227] [<fffffc00003a1cb8>] handle_irq_event+0x68/0x110
[ 593.769227] [<fffffc00003a8108>] handle_level_irq+0x108/0x240
[ 593.769227] [<fffffc00003158e0>] handle_irq+0x70/0xe0
[ 593.769227] [<fffffc0000320820>] dp264_srm_device_interrupt+0x30/0x50
[ 593.769227] [<fffffc0000315af4>] do_entInt+0x1a4/0x200
[ 593.769227] [<fffffc0000310d00>] ret_from_sys_call+0x0/0x10
[ 593.769227] [<fffffc0000392164>] torture_spin_lock_write_delay+0x74/0x180
[ 593.769227] [<fffffc00003d7cc8>] ktime_get+0x58/0x160
[ 593.769227] [<fffffc0000317fd0>] read_rpcc+0x0/0x10
[ 593.769227] [<fffffc0000317fd0>] read_rpcc+0x0/0x10
[ 593.769227] [<fffffc0000430aa8>] stutter_wait+0x88/0x110
[ 593.769227] [<fffffc0000391904>] lock_torture_writer+0x1d4/0x450
[ 593.769227] [<fffffc00003918cc>] lock_torture_writer+0x19c/0x450
[ 593.769227] [<fffffc000035a980>] kthread+0x150/0x190
[ 593.769227] [<fffffc0000391730>] lock_torture_writer+0x0/0x450
[ 593.769227] [<fffffc00003110d8>] ret_from_kernel_thread+0x18/0x20
[ 593.769227] [<fffffc000035a830>] kthread+0x0/0x190
[ 593.769227] ---[ end trace 0000000000000000 ]---
Adding some machine relevant stuff here (for the sake of being thorough)
The machine is a DEC DS10
root@bigbang:/lib/modules/6.12.12-SMP# lspci -vvv
00:07.0 ISA bridge: ULi Electronics Inc. M1533/M1535/M1543 PCI to ISA Bridge [Aladdin IV/V/V+] (rev c3)
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort+ <MAbort+ >SERR- <PERR- INTx-
    Latency: 0

00:09.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)
    Subsystem: Digital Equipment Corporation DE500B Fast Ethernet
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 255 (5000ns min, 10000ns max), Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 29
    Region 0: I/O ports at 8400 [size=128]
    Region 1: Memory at 0a120000 (32-bit, non-prefetchable) [size=1K]
    Expansion ROM at 0a000000 [disabled] [size=256K]
    Kernel driver in use: tulip
    Kernel modules: tulip

00:0b.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)
    Subsystem: Digital Equipment Corporation DE500B Fast Ethernet
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 255 (5000ns min, 10000ns max), Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 30
    Region 0: I/O ports at 8480 [size=128]
    Region 1: Memory at 0a121000 (32-bit, non-prefetchable) [size=1K]
    Expansion ROM at 0a040000 [disabled] [size=256K]
    Kernel driver in use: tulip
    Kernel modules: tulip

00:0d.0 IDE interface: ULi Electronics Inc. M5229 IDE (rev c1) (prog-if f0)
    Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 255 (500ns min, 1000ns max)
    Interrupt: pin A routed to IRQ 255
    Region 0: I/O ports at 01f0 [size=8]
    Region 1: I/O ports at 03f4
    Region 2: I/O ports at 0170 [size=8]
    Region 3: I/O ports at 0374
    Region 4: I/O ports at 8880 [size=16]
    Kernel driver in use: pata_ali

00:0e.0 VGA compatible controller: Texas Instruments TVP4020 [Permedia 2] (rev 01) (prog-if 00 [VGA controller])
    Subsystem: Elsa AG GLoria Synergy
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 255 (48000ns min, 48000ns max)
    Interrupt: pin A routed to IRQ 35
    Region 0: Memory at 0a080000 (32-bit, non-prefetchable) [size=128K]
    Region 1: Memory at 09000000 (32-bit, non-prefetchable) [size=8M]
    Region 2: Memory at 09800000 (32-bit, non-prefetchable) [size=8M]
    Expansion ROM at 0a100000 [disabled] [size=64K]
    Kernel driver in use: pm2fb

00:0f.0 USB controller: VIA Technologies, Inc. VT82xx/62xx/VX700/8x0/900 UHCI USB 1.1 Controller (rev 61) (prog-if 00 [UHCI])
    Subsystem: VIA Technologies, Inc. USB 1.1 UHCI controller
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 255, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 39
    Region 4: I/O ports at 8800 [size=32]
    Capabilities: [80] Power Management version 2
        Flags: PMEClk+ DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Kernel driver in use: uhci_hcd

00:0f.1 USB controller: VIA Technologies, Inc. VT82xx/62xx/VX700/8x0/900 UHCI USB 1.1 Controller (rev 61) (prog-if 00 [UHCI])
    Subsystem: VIA Technologies, Inc. USB 1.1 UHCI controller
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 255, Cache Line Size: 64 bytes
    Interrupt: pin B routed to IRQ 38
    Region 4: I/O ports at 8820 [size=32]
    Capabilities: [80] Power Management version 2
        Flags: PMEClk+ DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Kernel driver in use: uhci_hcd

00:0f.2 USB controller: VIA Technologies, Inc. USB 2.0 EHCI-Compliant Host-Controller (rev 63) (prog-if 20 [EHCI])
    Subsystem: VIA Technologies, Inc. USB 2.0 EHCI controller
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 255, Cache Line Size: 64 bytes
    Interrupt: pin C routed to IRQ 37
    Region 0: Memory at 0a122000 (32-bit, non-prefetchable) [size=256]
    Capabilities: [80] Power Management version 2
        Flags: PMEClk+ DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Kernel driver in use: ehci-pci

00:10.0 Ethernet controller: Intel Corporation 82544EI Gigabit Ethernet Controller (Fiber) (rev 02)
    Subsystem: Intel Corporation PRO/1000 XF Server Adapter
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 252 (63750ns min), Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 43
    Region 0: Memory at 0a0a0000 (32-bit, non-prefetchable) [size=128K]
    Region 1: Memory at 0a0c0000 (32-bit, non-prefetchable) [size=128K]
    Region 2: I/O ports at 8840 [size=32]
    Expansion ROM at 0a0e0000 [disabled] [size=128K]
    Capabilities: [dc] Power Management version 2
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [e4] PCI-X non-bridge device
        Command: DPERE- ERO+ RBC=512 OST=1
        Status: Dev=00:00.0 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=2048 DMOST=1 DMCRS=16 RSCEM- 266MHz- 533MHz-
    Capabilities: [f0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000 Data: 0000
    Kernel driver in use: e1000

00:11.0 RAID bus controller: VIA Technologies, Inc. VT6421 IDE/SATA Controller (rev 50)
    Subsystem: VIA Technologies, Inc. VT6421 IDE/SATA Controller
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 240
    Interrupt: pin A routed to IRQ 47
    Region 0: I/O ports at 8890 [size=16]
    Region 1: I/O ports at 88a0 [size=16]
    Region 2: I/O ports at 88b0 [size=16]
    Region 3: I/O ports at 88c0 [size=16]
    Region 4: I/O ports at 8860 [size=32]
    Region 5: I/O ports at 8000 [size=256]
    Expansion ROM at 0a110000 [disabled] [size=64K]
    Capabilities: [e0] Power Management version 2
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Kernel driver in use: sata_via
root@bigbang:/lib/modules/6.12.12-SMP# cat /proc/cpuinfo
cpu                     : Alpha
cpu model               : EV6
cpu variation           : 7
cpu revision            : 0
cpu serial number       :
system type             : Tsunami
system variation        : Webbrick
system revision         : 0
system serial number    : 4004DQMZ1055
cycle frequency [Hz]    : 462437186 est.
timer frequency [Hz]    : 1024.00
page size [bytes]       : 8192
phys. address bits      : 44
max. addr. space #      : 255
BogoMIPS                : 911.32
kernel unaligned acc    : 0 (pc=0,va=0)
user unaligned acc      : 0 (pc=0,va=0)
platform string         : AlphaServer DS10 466 MHz
cpus detected           : 1
cpus active             : 1
cpu active mask         : 0000000000000001
L1 Icache               : 64K, 2-way, 64b line
L1 Dcache               : 64K, 2-way, 64b line
L2 cache                : 2048K, 1-way, 64b line
L3 cache                : n/a
I also ran this setup again with the USB, video, and Intel NIC cards pulled, and the result is the same: once the IPs are bound, a link loss triggers the warning.
[ 363.702938] ------------[ cut here ]------------
[ 363.702938] WARNING: CPU: 0 PID: 34 at kernel/time/timer.c:1657 __timer_delete_sync+0x10c/0x150
[ 363.702938] Modules linked in: tulip
[ 363.702938] CPU: 0 UID: 0 PID: 34 Comm: lock_torture_wr Not tainted 6.12.12-SMP #1
[ 363.702938] fffffc0002aefa70 fffffc00011b0ac0 fffffc000032a4f0 fffffc0000f8f181
[ 363.702938] 0000000000000000 0000000000000000 fffffc000032a688 fffffc000119c690
[ 363.702938] 0000000000000000 fffffc0000f8f181 fffffc00003d297c fffffc000125c7e0
[ 363.702938] fffffffc008251c4 00000000fa83b2da fffffc00003d297c fffffc0003ded490
[ 363.702938] fffffc0003ded490 0000000000000000 fffffffff8668000 fffffd000a0c0000
[ 363.702938] 00000054ae626ca0 fffffc000285a600 fffffc000036692c fffffc000285af80
[ 363.702938] Trace:
[ 363.702938] [<fffffc000032a4f0>] __warn+0x190/0x1a0
[ 363.702938] [<fffffc000032a688>] warn_slowpath_fmt+0x188/0x240
[ 363.702938] [<fffffc00003d297c>] __timer_delete_sync+0x10c/0x150
[ 363.702938] [<fffffc00003d297c>] __timer_delete_sync+0x10c/0x150
[ 363.702938] [<fffffc000036692c>] ttwu_do_activate.isra.0+0x8c/0x1a0
[ 363.702938] [<fffffc0000369030>] try_to_wake_up+0x370/0x700
[ 363.702938] [<fffffc0000368e70>] try_to_wake_up+0x1b0/0x700
[ 363.702938] [<fffffc00003d43f8>] hrtimer_wakeup+0x28/0x40
[ 363.702938] [<fffffc00003d43d0>] hrtimer_wakeup+0x0/0x40
[ 363.702938] [<fffffc00003c31a4>] rcu_sched_clock_irq+0x714/0xea0
[ 363.702938] [<fffffc00003a1a80>] __handle_irq_event_percpu+0x60/0x190
[ 363.702938] [<fffffc00003e6758>] tick_handle_periodic+0x38/0xd0
[ 363.702938] [<fffffc00003731e8>] enqueue_task_fair+0x358/0x8b0
[ 363.702938] [<fffffc00003a1cb8>] handle_irq_event+0x68/0x110
[ 363.702938] [<fffffc00003a8108>] handle_level_irq+0x108/0x240
[ 363.702938] [<fffffc00003158e0>] handle_irq+0x70/0xe0
[ 363.702938] [<fffffc0000320820>] dp264_srm_device_interrupt+0x30/0x50
[ 363.702938] [<fffffc0000372538>] pick_task_fair+0x88/0x100
[ 363.702938] [<fffffc0000315af4>] do_entInt+0x1a4/0x200
[ 363.702938] [<fffffc0000310d00>] ret_from_sys_call+0x0/0x10
[ 363.702938] [<fffffc00003923dc>] __torture_rt_boost+0x5c/0x100
[ 363.702938] [<fffffc0000e291d8>] _raw_spin_lock+0x18/0x30
[ 363.702938] [<fffffc0000390330>] do_raw_spin_lock+0x0/0x140
[ 363.702938] [<fffffc00003903a0>] do_raw_spin_lock+0x70/0x140
[ 363.702938] [<fffffc0000e291d8>] _raw_spin_lock+0x18/0x30
[ 363.702938] [<fffffc0000390ac0>] torture_spin_lock_write_lock+0x20/0x40
[ 363.702938] [<fffffc000039183c>] lock_torture_writer+0x10c/0x450
[ 363.702938] [<fffffc000035a980>] kthread+0x150/0x190
[ 363.702938] [<fffffc0000391730>] lock_torture_writer+0x0/0x450
[ 363.702938] [<fffffc00003110d8>] ret_from_kernel_thread+0x18/0x20
[ 363.702938] [<fffffc000035a830>] kthread+0x0/0x190
[ 363.702938] ---[ end trace 0000000000000000 ]---
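The two traces (timestamps 593 and 363) differ in many frames, so it can help to see which symbols appear in both. The sketch below is illustrative only: the two lists were copied by hand from the traces, from the top down to `ret_from_sys_call`, and `common_in_order` is a hypothetical helper name.

```python
def common_in_order(first, second):
    """Symbols from `first` that also occur in `second`, de-duplicated,
    kept in the order they appear in `first`."""
    other = set(second)
    seen = set()
    result = []
    for sym in first:
        if sym in other and sym not in seen:
            seen.add(sym)
            result.append(sym)
    return result


# Symbols from the trace at 593.769227, top of stack first.
trace_593 = [
    "__warn", "warn_slowpath_fmt", "__timer_delete_sync", "__timer_delete_sync",
    "wakeup_preempt", "ttwu_do_activate.isra.0", "try_to_wake_up",
    "_raw_spin_unlock_irqrestore", "task_tick_fair", "enqueue_hrtimer.isra.0",
    "task_non_contending", "__handle_irq_event_percpu",
    "sched_balance_update_blocked_averages", "handle_irq_event",
    "handle_level_irq", "handle_irq", "dp264_srm_device_interrupt",
    "do_entInt", "ret_from_sys_call",
]

# Symbols from the trace at 363.702938, top of stack first.
trace_363 = [
    "__warn", "warn_slowpath_fmt", "__timer_delete_sync", "__timer_delete_sync",
    "ttwu_do_activate.isra.0", "try_to_wake_up", "try_to_wake_up",
    "hrtimer_wakeup", "hrtimer_wakeup", "rcu_sched_clock_irq",
    "__handle_irq_event_percpu", "tick_handle_periodic", "enqueue_task_fair",
    "handle_irq_event", "handle_level_irq", "handle_irq",
    "dp264_srm_device_interrupt", "pick_task_fair", "do_entInt",
    "ret_from_sys_call",
]

shared = common_in_order(trace_593, trace_363)
for sym in shared:
    print(sym)
```

What survives is the warning itself, two wake-up frames, and the interrupt entry path from `do_entInt` through `__handle_irq_event_percpu`. That would be consistent with the timer being deleted from hard-interrupt context, with the differing scheduler frames possibly being stale stack contents. This reading is an assumption and is not confirmed anywhere in this thread.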
I'm going to try one more thing: compile a generic (non-platform-specific) Alpha kernel and test it, but I don't think it'll change anything.
On 2025/06/18 13:59, Greg Chandler wrote:
On 2025/06/17 11:22, Florian Fainelli wrote:
(please no top posting)
On 6/17/25 11:19, Greg Chandler wrote:
Hmm... I'm wondering if that means it's an alpha-only issue then, which would make this a much larger headache than it already is. Also thank you for checking, I appreciate you taking the time.
I assume those interfaces actually work, right? (A simple ping over that interface would be enough.) I posted in a subsequent message that mine do not appear to work at all.
Oh yeah, they work just fine:
udhcpc: broadcasting discover
[ 19.197697] net eth0: Setting full-duplex based on MII#1 link partner capability of cde1

# ping -c 1 192.168.254.123
PING 192.168.254.123 (192.168.254.123): 56 data bytes
64 bytes from 192.168.254.123: seq=0 ttl=64 time=2.902 ms

--- 192.168.254.123 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 2.902/2.902/2.902 ms

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.03  sec  39.6 MBytes  33.1 Mbits/sec    0             sender
[  5]   0.00-10.07  sec  39.8 MBytes  33.1 Mbits/sec                  receiver
My next step is to build that driver as a module, and see if it changes anything (I'm doubting it will). Then after that go dig up a different adapter, and see if it's the network stack or the driver.
I've been hard pressed over the last week to get a lot of diagnosing time.
Let me know if I can run experiments, I can load any kernel version on this Cobalt Qube2 meaning that bisections are possible.
Good luck!
I thought I replied to the whole list, not just the sender, sorry for the repeat.... (full config is attached)
As a module, the system booted up but did not probe the module. This may be from a variety of issues, not limited to the fact I am still rolling this distro from scratch, and not all of the tools are 100% working yet. I'm having some issues with gcc/gdb so I have a large number of debugging options turned on in the kernel as well.
When the module is loaded with insmod:
[ 213.363172] tulip 0000:00:09.0: vgaarb: pci_notify
[ 213.363172] tulip 0000:00:09.0: assign IRQ: got 29
[ 213.363172] tulip 0000:00:09.0: enabling Mem-Wr-Inval
[ 213.369031] tulip0: EEPROM default media type Autosense
[ 213.369031] tulip0: Index #0 - Media 10baseT (#0) described by a 21142 Serial PHY (2) block
[ 213.369031] tulip0: Index #1 - Media 10baseT-FDX (#4) described by a 21142 Serial PHY (2) block
[ 213.370007] tulip0: Index #2 - Media 100baseTx (#3) described by a 21143 SYM PHY (4) block
[ 213.370007] tulip0: Index #3 - Media 100baseTx-FDX (#5) described by a 21143 SYM PHY (4) block
[ 213.376843] net eth1: Digital DS21142/43 Tulip rev 65 at MMIO 0xa120000, 08:00:2b:86:ab:b1, IRQ 29
[ 213.377820] tulip 0000:00:09.0: vgaarb: pci_notify
[ 213.377820] tulip 0000:00:0b.0: vgaarb: pci_notify
[ 213.377820] tulip 0000:00:0b.0: assign IRQ: got 30
[ 213.377820] tulip 0000:00:0b.0: enabling Mem-Wr-Inval
[ 213.384656] tulip1: EEPROM default media type Autosense
[ 213.384656] tulip1: Index #0 - Media 10baseT (#0) described by a 21142 Serial PHY (2) block
[ 213.384656] tulip1: Index #1 - Media 10baseT-FDX (#4) described by a 21142 Serial PHY (2) block
[ 213.384656] tulip1: Index #2 - Media 100baseTx (#3) described by a 21143 SYM PHY (4) block
[ 213.384656] tulip1: Index #3 - Media 100baseTx-FDX (#5) described by a 21143 SYM PHY (4) block
[ 213.391492] net eth2: Digital DS21142/43 Tulip rev 65 at MMIO 0xa121000, 08:00:2b:86:a8:5b, IRQ 30

root@bigbang:/lib/modules/6.12.12-SMP# mii-tool eth1
eth1: no link
root@bigbang:/lib/modules/6.12.12-SMP# mii-tool eth2
eth2: autonegotiation failed, link ok
When I pulled the plug, I did not get the crash. When I plugged it back in, I did not get a dmesg/kernel line for the link.
I bound the IP addresses to both interfaces and did a ping test to the default gateway (I promise the network is properly configured), resulting in:
root@bigbang:/lib/modules/6.12.12-SMP# ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
From 192.168.1.75 icmp_seq=1 Destination Host Unreachable
From 192.168.1.75 icmp_seq=2 Destination Host Unreachable
From 192.168.1.75 icmp_seq=3 Destination Host Unreachable
With the kernel recompiled fully from scratch as generic Alpha instead of DP264, I am still seeing the same thing. What is odd is that "dp264_srm_device_interrupt" still shows up in the trace. The next step is to make sure it's the tulip driver itself and not the network stack. I can't use the e1000 (Intel PRO/1000) card at the moment, so I need to find another PCI NIC to test.
root@bigbang:~# zcat /proc/config.gz |grep CONFIG_ALPHA
CONFIG_ALPHA=y
CONFIG_ALPHA_GENERIC=y
# CONFIG_ALPHA_ALCOR is not set
# CONFIG_ALPHA_DP264 is not set
# CONFIG_ALPHA_EIGER is not set
# CONFIG_ALPHA_LX164 is not set
# CONFIG_ALPHA_MARVEL is not set
# CONFIG_ALPHA_MIATA is not set
# CONFIG_ALPHA_MIKASA is not set
# CONFIG_ALPHA_NAUTILUS is not set
# CONFIG_ALPHA_NORITAKE is not set
# CONFIG_ALPHA_PC164 is not set
# CONFIG_ALPHA_RAWHIDE is not set
# CONFIG_ALPHA_RUFFIAN is not set
# CONFIG_ALPHA_RX164 is not set
# CONFIG_ALPHA_SX164 is not set
# CONFIG_ALPHA_SABLE is not set
# CONFIG_ALPHA_SHARK is not set
# CONFIG_ALPHA_TAKARA is not set
# CONFIG_ALPHA_TITAN is not set
# CONFIG_ALPHA_WILDFIRE is not set
CONFIG_ALPHA_BROKEN_IRQ_MASK=y
# CONFIG_ALPHA_WTINT is not set
CONFIG_ALPHA_LEGACY_START_ADDRESS=y
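To double-check which platform options are actually set in a listing like the one above, the grep output can be parsed rather than read by eye. A minimal sketch, assuming the usual `CONFIG_X=value` and `# CONFIG_X is not set` forms; `parse_kconfig` is a hypothetical helper, and the sample is a shortened copy of the output above.

```python
import re


def parse_kconfig(text):
    """Map option name to its value, or to None for '# ... is not set'."""
    options = {}
    for name, value in re.findall(r"\b(CONFIG_\w+)=(\S+)", text):
        options[name] = value
    for name in re.findall(r"#\s*(CONFIG_\w+) is not set", text):
        options[name] = None
    return options


# Shortened sample taken from the grep output above.
sample = (
    "CONFIG_ALPHA=y CONFIG_ALPHA_GENERIC=y "
    "# CONFIG_ALPHA_ALCOR is not set # CONFIG_ALPHA_DP264 is not set "
    "CONFIG_ALPHA_BROKEN_IRQ_MASK=y # CONFIG_ALPHA_WTINT is not set "
    "CONFIG_ALPHA_LEGACY_START_ADDRESS=y"
)

options = parse_kconfig(sample)
enabled = sorted(name for name, value in options.items() if value == "y")
print("enabled:", enabled)
print("DP264 set:", options.get("CONFIG_ALPHA_DP264") is not None)
```

This only confirms that the build is generic and that CONFIG_ALPHA_DP264 is off. One possible explanation for `dp264_srm_device_interrupt` still appearing, offered as an assumption rather than something established in this thread, is that a generic Alpha kernel builds in the per-platform support code and picks the matching one at boot.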
[ 204.662981] ------------[ cut here ]------------
[ 204.662981] WARNING: CPU: 0 PID: 0 at kernel/time/timer.c:1657 __timer_delete_sync+0x10c/0x150
[ 204.662981] Modules linked in: tulip
[ 204.662981] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.12-SMP #1
[ 204.662981] fffffc00011c7a20 fffffc0008b62000 fffffc00003314e0 fffffc0000fa108f
[ 204.662981] 0000000000000000 0000000000000000 fffffc0000331678 fffffc00011bc690
[ 204.662981] 0000000000000000 fffffc0000fa108f fffffc00003d99cc fffffc00012829e0
[ 204.662981] fffffd000a0c0060 00000000e0ccdeeb fffffc00003d99cc fffffc0008b62000
[ 204.662981] fffffc0008b63490 0000000000000000 fffffd000a0c0000 fffffd000a0c0070
[ 204.662981] fffffc0000370040 fffffc000283d580 0000002fa6dd421e 0000000000000000
[ 204.662981] Trace:
[ 204.662981] [<fffffc00003314e0>] __warn+0x190/0x1a0
[ 204.662981] [<fffffc0000331678>] warn_slowpath_fmt+0x188/0x240
[ 204.662981] [<fffffc00003d99cc>] __timer_delete_sync+0x10c/0x150
[ 204.662981] [<fffffc00003d99cc>] __timer_delete_sync+0x10c/0x150
[ 204.662981] [<fffffc0000370040>] try_to_wake_up+0x370/0x700
[ 204.662981] [<fffffc000036d960>] ttwu_do_activate.isra.0+0xb0/0x1a0
[ 204.662981] [<fffffc0000e37960>] _raw_spin_unlock_irqrestore+0x20/0x40
[ 204.662981] [<fffffc000036fe80>] try_to_wake_up+0x1b0/0x700
[ 204.662981] [<fffffc00003a8aa0>] __handle_irq_event_percpu+0x60/0x190
[ 204.662981] [<fffffc0000318214>] rtc_timer_interrupt+0x44/0xc0
[ 204.662981] [<fffffc00003a8c50>] handle_irq_event_percpu+0x80/0xa0
[ 204.662981] [<fffffc00003a8cd8>] handle_irq_event+0x68/0x110
[ 204.662981] [<fffffc00003af128>] handle_level_irq+0x108/0x240
[ 204.662981] [<fffffc00003159cc>] handle_irq+0x7c/0xf0
[ 204.662981] [<fffffc00003244b0>] dp264_srm_device_interrupt+0x30/0x50
[ 204.662981] [<fffffc000037d714>] pick_next_task_fair+0x114/0x200
[ 204.662981] [<fffffc0000315be4>] do_entInt+0x1a4/0x200
[ 204.662981] [<fffffc0000310d00>] ret_from_sys_call+0x0/0x10
[ 204.662981] [<fffffc0000e37898>] _raw_spin_unlock+0x18/0x30
[ 204.662981] [<fffffc000038769c>] update_dl_rq_load_avg+0x1bc/0x350
[ 204.662981] [<fffffc0000e2c54c>] cpu_idle_poll.isra.0+0x1c/0xa0
[ 204.662981] [<fffffc0000e2be60>] ct_kernel_enter_state+0x0/0x50
[ 204.662981] [<fffffc0000e2c580>] cpu_idle_poll.isra.0+0x50/0xa0
[ 204.662981] [<fffffc0000383318>] do_idle+0x78/0x1d0
[ 204.662981] [<fffffc0000383798>] cpu_startup_entry+0x58/0x70
[ 204.662981] [<fffffc0000e2c77c>] rest_init+0x11c/0x120
[ 204.662981] [<fffffc000031001c>] _stext+0x1c/0x20
[ 204.662981] [<fffffc0000312460>] do_entUnaUser+0x520/0x550
[ 204.662981] [<fffffc0000cbd6f8>] rtm_new_nexthop+0x14c8/0x1800
[ 204.662981] ---[ end trace 0000000000000000 ]---
On 2025/06/18 15:51, Greg Chandler wrote:
On 2025/06/18 13:59, Greg Chandler wrote:
On 2025/06/17 11:22, Florian Fainelli wrote:
(please no top posting)
On 6/17/25 11:19, Greg Chandler wrote:
Hmm... I'm wondering if that means it's an alpha-only issue then, which would make this a much larger headache than it already is. Also thank you for checking, I appreciate you taking the time.
I assume those interfaces actually work, right? (A simple ping over that interface would be enough.) I posted in a subsequent message that mine do not appear to work at all.
Oh yeah, they work just fine:
udhcpc: broadcasting discover
[ 19.197697] net eth0: Setting full-duplex based on MII#1 link partner capability of cde1

# ping -c 1 192.168.254.123
PING 192.168.254.123 (192.168.254.123): 56 data bytes
64 bytes from 192.168.254.123: seq=0 ttl=64 time=2.902 ms

--- 192.168.254.123 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 2.902/2.902/2.902 ms

[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.03 sec 39.6 MBytes 33.1 Mbits/sec 0 sender
[ 5] 0.00-10.07 sec 39.8 MBytes 33.1 Mbits/sec receiver
My next step is to build that driver as a module, and see if it changes anything (I'm doubting it will). Then after that go dig up a different adapter, and see if it's the network stack or the driver.
I've been hard pressed over the last week to get a lot of diagnosing time.
Let me know if I can run experiments, I can load any kernel version on this Cobalt Qube2 meaning that bisections are possible.
Good luck!
I thought I replied to the whole list, not just the sender, sorry for the repeat.... (full config is attached)
As a module, the system booted up but did not probe the module. This may be from a variety of issues, not limited to the fact I am still rolling this distro from scratch, and not all of the tools are 100% working yet. I'm having some issues with gcc/gdb so I have a large number of debugging options turned on in the kernel as well.
When the module is loaded with insmod:
[ 213.363172] tulip 0000:00:09.0: vgaarb: pci_notify
[ 213.363172] tulip 0000:00:09.0: assign IRQ: got 29
[ 213.363172] tulip 0000:00:09.0: enabling Mem-Wr-Inval
[ 213.369031] tulip0: EEPROM default media type Autosense
[ 213.369031] tulip0: Index #0 - Media 10baseT (#0) described by a 21142 Serial PHY (2) block
[ 213.369031] tulip0: Index #1 - Media 10baseT-FDX (#4) described by a 21142 Serial PHY (2) block
[ 213.370007] tulip0: Index #2 - Media 100baseTx (#3) described by a 21143 SYM PHY (4) block
[ 213.370007] tulip0: Index #3 - Media 100baseTx-FDX (#5) described by a 21143 SYM PHY (4) block
[ 213.376843] net eth1: Digital DS21142/43 Tulip rev 65 at MMIO 0xa120000, 08:00:2b:86:ab:b1, IRQ 29
[ 213.377820] tulip 0000:00:09.0: vgaarb: pci_notify
[ 213.377820] tulip 0000:00:0b.0: vgaarb: pci_notify
[ 213.377820] tulip 0000:00:0b.0: assign IRQ: got 30
[ 213.377820] tulip 0000:00:0b.0: enabling Mem-Wr-Inval
[ 213.384656] tulip1: EEPROM default media type Autosense
[ 213.384656] tulip1: Index #0 - Media 10baseT (#0) described by a 21142 Serial PHY (2) block
[ 213.384656] tulip1: Index #1 - Media 10baseT-FDX (#4) described by a 21142 Serial PHY (2) block
[ 213.384656] tulip1: Index #2 - Media 100baseTx (#3) described by a 21143 SYM PHY (4) block
[ 213.384656] tulip1: Index #3 - Media 100baseTx-FDX (#5) described by a 21143 SYM PHY (4) block
[ 213.391492] net eth2: Digital DS21142/43 Tulip rev 65 at MMIO 0xa121000, 08:00:2b:86:a8:5b, IRQ 30

root@bigbang:/lib/modules/6.12.12-SMP# mii-tool eth1
eth1: no link
root@bigbang:/lib/modules/6.12.12-SMP# mii-tool eth2
eth2: autonegotiation failed, link ok
When I pulled the plug I did not get the crash. When I plugged it back in, I did not get a dmesg/kernel line for the link.
I bound the IP addresses to both, and did a ping test to the default gateway, resulting in: (I promise the network is properly configured)
root@bigbang:/lib/modules/6.12.12-SMP# ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
From 192.168.1.75 icmp_seq=1 Destination Host Unreachable
From 192.168.1.75 icmp_seq=2 Destination Host Unreachable
From 192.168.1.75 icmp_seq=3 Destination Host Unreachable
Upon pulling the cord out of the switch **after** the IP address was bound, we are back to this:
[ 593.769227] ------------[ cut here ]------------
[ 593.769227] WARNING: CPU: 0 PID: 33 at kernel/time/timer.c:1657 __timer_delete_sync+0x10c/0x150
[ 593.769227] Modules linked in: tulip
[ 593.769227] CPU: 0 UID: 0 PID: 33 Comm: lock_torture_wr Not tainted 6.12.12-SMP #1
[ 593.769227] fffffc0002aeba40 fffffc00011b0ac0 fffffc000032a4f0 fffffc0000f8f181
[ 593.769227] 0000000000000000 0000000000000000 fffffc000032a688 fffffc000119c690
[ 593.769227] 0000000000000000 fffffc0000f8f181 fffffc00003d297c fffffc000125c7e0
[ 593.769227] fffffffc0082f1c4 00000000efe4b99a fffffc00003d297c fffffc000b03b490
[ 593.769227] fffffc000b03b490 0000000000000000 fffffffff8668000 fffffd000a120000
[ 593.769227] fffffc0000366888 fffffc000020ce00 fffffc000020ce00 0000000000000008
[ 593.769227] Trace:
[ 593.769227] [<fffffc000032a4f0>] __warn+0x190/0x1a0
[ 593.769227] [<fffffc000032a688>] warn_slowpath_fmt+0x188/0x240
[ 593.769227] [<fffffc00003d297c>] __timer_delete_sync+0x10c/0x150
[ 593.769227] [<fffffc00003d297c>] __timer_delete_sync+0x10c/0x150
[ 593.769227] [<fffffc0000366888>] wakeup_preempt+0xb8/0xd0
[ 593.769227] [<fffffc0000366950>] ttwu_do_activate.isra.0+0xb0/0x1a0
[ 593.769227] [<fffffc0000369030>] try_to_wake_up+0x370/0x700
[ 593.769227] [<fffffc0000e29470>] _raw_spin_unlock_irqrestore+0x20/0x40
[ 593.769227] [<fffffc000036e1c4>] task_tick_fair+0x74/0x370
[ 593.769227] [<fffffc00003d46dc>] enqueue_hrtimer.isra.0+0x5c/0xc0
[ 593.769227] [<fffffc000037bc1c>] task_non_contending+0xcc/0x4f0
[ 593.769227] [<fffffc00003a1a80>] __handle_irq_event_percpu+0x60/0x190
[ 593.769227] [<fffffc000036d598>] sched_balance_update_blocked_averages+0xc8/0x2a0
[ 593.769227] [<fffffc00003a1cb8>] handle_irq_event+0x68/0x110
[ 593.769227] [<fffffc00003a8108>] handle_level_irq+0x108/0x240
[ 593.769227] [<fffffc00003158e0>] handle_irq+0x70/0xe0
[ 593.769227] [<fffffc0000320820>] dp264_srm_device_interrupt+0x30/0x50
[ 593.769227] [<fffffc0000315af4>] do_entInt+0x1a4/0x200
[ 593.769227] [<fffffc0000310d00>] ret_from_sys_call+0x0/0x10
[ 593.769227] [<fffffc0000392164>] torture_spin_lock_write_delay+0x74/0x180
[ 593.769227] [<fffffc00003d7cc8>] ktime_get+0x58/0x160
[ 593.769227] [<fffffc0000317fd0>] read_rpcc+0x0/0x10
[ 593.769227] [<fffffc0000317fd0>] read_rpcc+0x0/0x10
[ 593.769227] [<fffffc0000430aa8>] stutter_wait+0x88/0x110
[ 593.769227] [<fffffc0000391904>] lock_torture_writer+0x1d4/0x450
[ 593.769227] [<fffffc00003918cc>] lock_torture_writer+0x19c/0x450
[ 593.769227] [<fffffc000035a980>] kthread+0x150/0x190
[ 593.769227] [<fffffc0000391730>] lock_torture_writer+0x0/0x450
[ 593.769227] [<fffffc00003110d8>] ret_from_kernel_thread+0x18/0x20
[ 593.769227] [<fffffc000035a830>] kthread+0x0/0x190
[ 593.769227] ---[ end trace 0000000000000000 ]---
Adding some machine relevant stuff here (for the sake of being thorough)
The machine is a DEC DS10
root@bigbang:/lib/modules/6.12.12-SMP# lspci -vvv
00:07.0 ISA bridge: ULi Electronics Inc. M1533/M1535/M1543 PCI to ISA Bridge [Aladdin IV/V/V+] (rev c3)
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort+ <MAbort+ >SERR- <PERR- INTx-
    Latency: 0

00:09.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)
    Subsystem: Digital Equipment Corporation DE500B Fast Ethernet
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 255 (5000ns min, 10000ns max), Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 29
    Region 0: I/O ports at 8400 [size=128]
    Region 1: Memory at 0a120000 (32-bit, non-prefetchable) [size=1K]
    Expansion ROM at 0a000000 [disabled] [size=256K]
    Kernel driver in use: tulip
    Kernel modules: tulip

00:0b.0 Ethernet controller: Digital Equipment Corporation DECchip 21142/43 (rev 41)
    Subsystem: Digital Equipment Corporation DE500B Fast Ethernet
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 255 (5000ns min, 10000ns max), Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 30
    Region 0: I/O ports at 8480 [size=128]
    Region 1: Memory at 0a121000 (32-bit, non-prefetchable) [size=1K]
    Expansion ROM at 0a040000 [disabled] [size=256K]
    Kernel driver in use: tulip
    Kernel modules: tulip

00:0d.0 IDE interface: ULi Electronics Inc. M5229 IDE (rev c1) (prog-if f0)
    Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 255 (500ns min, 1000ns max)
    Interrupt: pin A routed to IRQ 255
    Region 0: I/O ports at 01f0 [size=8]
    Region 1: I/O ports at 03f4
    Region 2: I/O ports at 0170 [size=8]
    Region 3: I/O ports at 0374
    Region 4: I/O ports at 8880 [size=16]
    Kernel driver in use: pata_ali

00:0e.0 VGA compatible controller: Texas Instruments TVP4020 [Permedia 2] (rev 01) (prog-if 00 [VGA controller])
    Subsystem: Elsa AG GLoria Synergy
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 255 (48000ns min, 48000ns max)
    Interrupt: pin A routed to IRQ 35
    Region 0: Memory at 0a080000 (32-bit, non-prefetchable) [size=128K]
    Region 1: Memory at 09000000 (32-bit, non-prefetchable) [size=8M]
    Region 2: Memory at 09800000 (32-bit, non-prefetchable) [size=8M]
    Expansion ROM at 0a100000 [disabled] [size=64K]
    Kernel driver in use: pm2fb

00:0f.0 USB controller: VIA Technologies, Inc. VT82xx/62xx/VX700/8x0/900 UHCI USB 1.1 Controller (rev 61) (prog-if 00 [UHCI])
    Subsystem: VIA Technologies, Inc. USB 1.1 UHCI controller
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 255, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 39
    Region 4: I/O ports at 8800 [size=32]
    Capabilities: [80] Power Management version 2
        Flags: PMEClk+ DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Kernel driver in use: uhci_hcd

00:0f.1 USB controller: VIA Technologies, Inc. VT82xx/62xx/VX700/8x0/900 UHCI USB 1.1 Controller (rev 61) (prog-if 00 [UHCI])
    Subsystem: VIA Technologies, Inc. USB 1.1 UHCI controller
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 255, Cache Line Size: 64 bytes
    Interrupt: pin B routed to IRQ 38
    Region 4: I/O ports at 8820 [size=32]
    Capabilities: [80] Power Management version 2
        Flags: PMEClk+ DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Kernel driver in use: uhci_hcd

00:0f.2 USB controller: VIA Technologies, Inc. USB 2.0 EHCI-Compliant Host-Controller (rev 63) (prog-if 20 [EHCI])
    Subsystem: VIA Technologies, Inc. USB 2.0 EHCI controller
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 255, Cache Line Size: 64 bytes
    Interrupt: pin C routed to IRQ 37
    Region 0: Memory at 0a122000 (32-bit, non-prefetchable) [size=256]
    Capabilities: [80] Power Management version 2
        Flags: PMEClk+ DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Kernel driver in use: ehci-pci

00:10.0 Ethernet controller: Intel Corporation 82544EI Gigabit Ethernet Controller (Fiber) (rev 02)
    Subsystem: Intel Corporation PRO/1000 XF Server Adapter
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 252 (63750ns min), Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 43
    Region 0: Memory at 0a0a0000 (32-bit, non-prefetchable) [size=128K]
    Region 1: Memory at 0a0c0000 (32-bit, non-prefetchable) [size=128K]
    Region 2: I/O ports at 8840 [size=32]
    Expansion ROM at 0a0e0000 [disabled] [size=128K]
    Capabilities: [dc] Power Management version 2
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [e4] PCI-X non-bridge device
        Command: DPERE- ERO+ RBC=512 OST=1
        Status: Dev=00:00.0 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=2048 DMOST=1 DMCRS=16 RSCEM- 266MHz- 533MHz-
    Capabilities: [f0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000 Data: 0000
    Kernel driver in use: e1000

00:11.0 RAID bus controller: VIA Technologies, Inc. VT6421 IDE/SATA Controller (rev 50)
    Subsystem: VIA Technologies, Inc. VT6421 IDE/SATA Controller
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 240
    Interrupt: pin A routed to IRQ 47
    Region 0: I/O ports at 8890 [size=16]
    Region 1: I/O ports at 88a0 [size=16]
    Region 2: I/O ports at 88b0 [size=16]
    Region 3: I/O ports at 88c0 [size=16]
    Region 4: I/O ports at 8860 [size=32]
    Region 5: I/O ports at 8000 [size=256]
    Expansion ROM at 0a110000 [disabled] [size=64K]
    Capabilities: [e0] Power Management version 2
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Kernel driver in use: sata_via
root@bigbang:/lib/modules/6.12.12-SMP# cat /proc/cpuinfo
cpu : Alpha
cpu model : EV6
cpu variation : 7
cpu revision : 0
cpu serial number :
system type : Tsunami
system variation : Webbrick
system revision : 0
system serial number : 4004DQMZ1055
cycle frequency [Hz] : 462437186 est.
timer frequency [Hz] : 1024.00
page size [bytes] : 8192
phys. address bits : 44
max. addr. space # : 255
BogoMIPS : 911.32
kernel unaligned acc : 0 (pc=0,va=0)
user unaligned acc : 0 (pc=0,va=0)
platform string : AlphaServer DS10 466 MHz
cpus detected : 1
cpus active : 1
cpu active mask : 0000000000000001
L1 Icache : 64K, 2-way, 64b line
L1 Dcache : 64K, 2-way, 64b line
L2 cache : 2048K, 1-way, 64b line
L3 cache : n/a
I also ran this setup again, with the USB/video/Intel NIC yanked out, and it's the same... Once the IPs are bound, a link loss pops the message: (with the other PCI cards pulled)
[ 363.702938] ------------[ cut here ]------------
[ 363.702938] WARNING: CPU: 0 PID: 34 at kernel/time/timer.c:1657 __timer_delete_sync+0x10c/0x150
[ 363.702938] Modules linked in: tulip
[ 363.702938] CPU: 0 UID: 0 PID: 34 Comm: lock_torture_wr Not tainted 6.12.12-SMP #1
[ 363.702938] fffffc0002aefa70 fffffc00011b0ac0 fffffc000032a4f0 fffffc0000f8f181
[ 363.702938] 0000000000000000 0000000000000000 fffffc000032a688 fffffc000119c690
[ 363.702938] 0000000000000000 fffffc0000f8f181 fffffc00003d297c fffffc000125c7e0
[ 363.702938] fffffffc008251c4 00000000fa83b2da fffffc00003d297c fffffc0003ded490
[ 363.702938] fffffc0003ded490 0000000000000000 fffffffff8668000 fffffd000a0c0000
[ 363.702938] 00000054ae626ca0 fffffc000285a600 fffffc000036692c fffffc000285af80
[ 363.702938] Trace:
[ 363.702938] [<fffffc000032a4f0>] __warn+0x190/0x1a0
[ 363.702938] [<fffffc000032a688>] warn_slowpath_fmt+0x188/0x240
[ 363.702938] [<fffffc00003d297c>] __timer_delete_sync+0x10c/0x150
[ 363.702938] [<fffffc00003d297c>] __timer_delete_sync+0x10c/0x150
[ 363.702938] [<fffffc000036692c>] ttwu_do_activate.isra.0+0x8c/0x1a0
[ 363.702938] [<fffffc0000369030>] try_to_wake_up+0x370/0x700
[ 363.702938] [<fffffc0000368e70>] try_to_wake_up+0x1b0/0x700
[ 363.702938] [<fffffc00003d43f8>] hrtimer_wakeup+0x28/0x40
[ 363.702938] [<fffffc00003d43d0>] hrtimer_wakeup+0x0/0x40
[ 363.702938] [<fffffc00003c31a4>] rcu_sched_clock_irq+0x714/0xea0
[ 363.702938] [<fffffc00003a1a80>] __handle_irq_event_percpu+0x60/0x190
[ 363.702938] [<fffffc00003e6758>] tick_handle_periodic+0x38/0xd0
[ 363.702938] [<fffffc00003731e8>] enqueue_task_fair+0x358/0x8b0
[ 363.702938] [<fffffc00003a1cb8>] handle_irq_event+0x68/0x110
[ 363.702938] [<fffffc00003a8108>] handle_level_irq+0x108/0x240
[ 363.702938] [<fffffc00003158e0>] handle_irq+0x70/0xe0
[ 363.702938] [<fffffc0000320820>] dp264_srm_device_interrupt+0x30/0x50
[ 363.702938] [<fffffc0000372538>] pick_task_fair+0x88/0x100
[ 363.702938] [<fffffc0000315af4>] do_entInt+0x1a4/0x200
[ 363.702938] [<fffffc0000310d00>] ret_from_sys_call+0x0/0x10
[ 363.702938] [<fffffc00003923dc>] __torture_rt_boost+0x5c/0x100
[ 363.702938] [<fffffc0000e291d8>] _raw_spin_lock+0x18/0x30
[ 363.702938] [<fffffc0000390330>] do_raw_spin_lock+0x0/0x140
[ 363.702938] [<fffffc00003903a0>] do_raw_spin_lock+0x70/0x140
[ 363.702938] [<fffffc0000e291d8>] _raw_spin_lock+0x18/0x30
[ 363.702938] [<fffffc0000390ac0>] torture_spin_lock_write_lock+0x20/0x40
[ 363.702938] [<fffffc000039183c>] lock_torture_writer+0x10c/0x450
[ 363.702938] [<fffffc000035a980>] kthread+0x150/0x190
[ 363.702938] [<fffffc0000391730>] lock_torture_writer+0x0/0x450
[ 363.702938] [<fffffc00003110d8>] ret_from_kernel_thread+0x18/0x20
[ 363.702938] [<fffffc000035a830>] kthread+0x0/0x190
[ 363.702938] ---[ end trace 0000000000000000 ]---
I'm going to try one more thing, which is compile a non-specific alpha kernel, and try it, but I don't think it'll change anything.
With the recompiled (fully from scratch) kernel, built for generic alpha instead of DP264, I am still seeing the same thing. What is odd is that I see "dp264_srm_device_interrupt" in the error though... The next step is to make sure it's the tulip driver itself and not the network stack. I can't use the e1000 card at the moment, so I need to find another PCI NIC to test.
root@bigbang:~# zcat /proc/config.gz |grep CONFIG_ALPHA
CONFIG_ALPHA=y
CONFIG_ALPHA_GENERIC=y
# CONFIG_ALPHA_ALCOR is not set
# CONFIG_ALPHA_DP264 is not set
# CONFIG_ALPHA_EIGER is not set
# CONFIG_ALPHA_LX164 is not set
# CONFIG_ALPHA_MARVEL is not set
# CONFIG_ALPHA_MIATA is not set
# CONFIG_ALPHA_MIKASA is not set
# CONFIG_ALPHA_NAUTILUS is not set
# CONFIG_ALPHA_NORITAKE is not set
# CONFIG_ALPHA_PC164 is not set
# CONFIG_ALPHA_RAWHIDE is not set
# CONFIG_ALPHA_RUFFIAN is not set
# CONFIG_ALPHA_RX164 is not set
# CONFIG_ALPHA_SX164 is not set
# CONFIG_ALPHA_SABLE is not set
# CONFIG_ALPHA_SHARK is not set
# CONFIG_ALPHA_TAKARA is not set
# CONFIG_ALPHA_TITAN is not set
# CONFIG_ALPHA_WILDFIRE is not set
CONFIG_ALPHA_BROKEN_IRQ_MASK=y
# CONFIG_ALPHA_WTINT is not set
CONFIG_ALPHA_LEGACY_START_ADDRESS=y
[ 204.662981] ------------[ cut here ]------------
[ 204.662981] WARNING: CPU: 0 PID: 0 at kernel/time/timer.c:1657 __timer_delete_sync+0x10c/0x150
[ 204.662981] Modules linked in: tulip
[ 204.662981] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.12-SMP #1
[ 204.662981] fffffc00011c7a20 fffffc0008b62000 fffffc00003314e0 fffffc0000fa108f
[ 204.662981] 0000000000000000 0000000000000000 fffffc0000331678 fffffc00011bc690
[ 204.662981] 0000000000000000 fffffc0000fa108f fffffc00003d99cc fffffc00012829e0
[ 204.662981] fffffd000a0c0060 00000000e0ccdeeb fffffc00003d99cc fffffc0008b62000
[ 204.662981] fffffc0008b63490 0000000000000000 fffffd000a0c0000 fffffd000a0c0070
[ 204.662981] fffffc0000370040 fffffc000283d580 0000002fa6dd421e 0000000000000000
[ 204.662981] Trace:
[ 204.662981] [<fffffc00003314e0>] __warn+0x190/0x1a0
[ 204.662981] [<fffffc0000331678>] warn_slowpath_fmt+0x188/0x240
[ 204.662981] [<fffffc00003d99cc>] __timer_delete_sync+0x10c/0x150
[ 204.662981] [<fffffc00003d99cc>] __timer_delete_sync+0x10c/0x150
[ 204.662981] [<fffffc0000370040>] try_to_wake_up+0x370/0x700
[ 204.662981] [<fffffc000036d960>] ttwu_do_activate.isra.0+0xb0/0x1a0
[ 204.662981] [<fffffc0000e37960>] _raw_spin_unlock_irqrestore+0x20/0x40
[ 204.662981] [<fffffc000036fe80>] try_to_wake_up+0x1b0/0x700
[ 204.662981] [<fffffc00003a8aa0>] __handle_irq_event_percpu+0x60/0x190
[ 204.662981] [<fffffc0000318214>] rtc_timer_interrupt+0x44/0xc0
[ 204.662981] [<fffffc00003a8c50>] handle_irq_event_percpu+0x80/0xa0
[ 204.662981] [<fffffc00003a8cd8>] handle_irq_event+0x68/0x110
[ 204.662981] [<fffffc00003af128>] handle_level_irq+0x108/0x240
[ 204.662981] [<fffffc00003159cc>] handle_irq+0x7c/0xf0
[ 204.662981] [<fffffc00003244b0>] dp264_srm_device_interrupt+0x30/0x50
[ 204.662981] [<fffffc000037d714>] pick_next_task_fair+0x114/0x200
[ 204.662981] [<fffffc0000315be4>] do_entInt+0x1a4/0x200
[ 204.662981] [<fffffc0000310d00>] ret_from_sys_call+0x0/0x10
[ 204.662981] [<fffffc0000e37898>] _raw_spin_unlock+0x18/0x30
[ 204.662981] [<fffffc000038769c>] update_dl_rq_load_avg+0x1bc/0x350
[ 204.662981] [<fffffc0000e2c54c>] cpu_idle_poll.isra.0+0x1c/0xa0
[ 204.662981] [<fffffc0000e2be60>] ct_kernel_enter_state+0x0/0x50
[ 204.662981] [<fffffc0000e2c580>] cpu_idle_poll.isra.0+0x50/0xa0
[ 204.662981] [<fffffc0000383318>] do_idle+0x78/0x1d0
[ 204.662981] [<fffffc0000383798>] cpu_startup_entry+0x58/0x70
[ 204.662981] [<fffffc0000e2c77c>] rest_init+0x11c/0x120
[ 204.662981] [<fffffc000031001c>] _stext+0x1c/0x20
[ 204.662981] [<fffffc0000312460>] do_entUnaUser+0x520/0x550
[ 204.662981] [<fffffc0000cbd6f8>] rtm_new_nexthop+0x14c8/0x1800
[ 204.662981] ---[ end trace 0000000000000000 ]---
Well I have good news, and that is, it's the driver, and not the platform or the network stack. The bad news is that it's the driver on this platform with this network stack...
I already had the e1000 driver compiled in, and I was able to find one of those lying around (finally).
[ 7.578121] e1000 0000:00:0f.0 eth124: renamed from eth2
[ 7.581050] e1000 0000:00:0f.0 eth3: renamed from eth124
[ 124.012631] e1000: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX

root@bigbang:~# ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.763 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=2.32 ms
64 bytes from 192.168.1.1: icmp_seq=3 ttl=64 time=2.76 ms
64 bytes from 192.168.1.1: icmp_seq=4 ttl=64 time=0.554 ms

Then I removed the ethernet cable, and all I see in the dmesg is this:
[ 124.012631] e1000: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 132.938408] tulip 0000:00:0b.0 eth1: tulip_stop_rxtx() failed (CSR5 0xf0660000 CSR6 0xb3862002)
[ 167.312414] e1000: eth3 NIC Link is Down

Which is exactly what we expect to see (the link down message).
So what I know for sure is this:
- The tulip driver on alpha (generic and DP264) oops/panic on physical disconnect, but only when an IP address is bound.
- It does not panic when no address is bound to the interface.
- It does not matter if the driver is compiled in, or if it is compiled as a module.
- It does not matter if all of the options are set for tulip or if none of them are:
    New bus configuration
    Use PCI shared mem for NIC registers
    Use RX polling (NAPI)
    Use Interrupt Mitigation
- The physical link does not auto-negotiate, and mii-tool does not seem to be able to force it with -F or -A like you would expect it to.
- The kernel does not drop the "Link is Up/Link is Down" messages when the PHY "links"
- The switch and interface both show LEDs as if linked at 10-Half-Duplex, and the lights turn off when the link is broken. Subsequently they do relink at 10-Half again if plugged back in.
- I did also attempt to test the kernel level stack for nfsroot, just to see if it worked prior to init launching everything else, and it did not. I used the same IP configuration for that test as all of the tests in these emails.
- All of the oops/panics seem to happen at: kernel/time/timer.c:1657 __timer_delete_sync+0x10c/0x150
I've attached my normal running config (non-diagnostic) set for the platform (DP264).
On Thu, 19 Jun 2025, Greg Chandler wrote:
So what I know for sure is this:
- The tulip driver on alpha (generic and DP264) oops/panic on physical disconnect, but only when an IP address is bound.
- It does not panic when no address is bound to the interface.
- It does not matter if the driver is compiled in, or if it is compiled as a module.
- It does not matter if all of the options are set for tulip or if none of them are:
    New bus configuration
    Use PCI shared mem for NIC registers
    Use RX polling (NAPI)
    Use Interrupt Mitigation
- The physical link does not auto-negotiate, and mii-tool does not seem to be able to force it with -F or -A like you would expect it to.
- The kernel does not drop the "Link is Up/Link is Down" messages when the PHY "links"
- The switch and interface both show LEDs as if linked at 10-Half-Duplex, and the lights turn off when the link is broken. Subsequently they do relink at 10-Half again if plugged back in.
- I did also attempt to test the kernel level stack for nfsroot, just to see if it worked prior to init launching everything else, and it did not. I used the same IP configuration for that test as all of the tests in these emails.
- All of the oops/panics seem to happen at: kernel/time/timer.c:1657 __timer_delete_sync+0x10c/0x150
FYI something's changed a while ago in how `del_timer_sync' is handled and I can see a similar warning nowadays with another network driver with the MIPS platform.
Since I'm the maintainer of said driver I mean to bisect it and figure out what's going on here, but haven't found time so far owing to other commitments (and the driver otherwise works just fine regardless, so it's a minor annoyance). If you beat me to it, then I'll gladly accept it, but otherwise I'm just letting you know you're not alone with this issue and that it's not specific to the DEC Tulip driver on your system.
For the record:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at kernel/time/timer.c:1563 __timer_delete_sync+0x110/0x118
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Tainted: G W 6.4.0-rc3-00030-gae62c49c0cef #21
Stack : 807a0000 80095a8c 00000000 00000004 806a0000 00000009 80c09dac 807d0000
        807a0000 807056ec 80769fac 807a13f3 807d30c4 1000ec00 80c09d58 80787a18
        00000000 00000000 807056ec 00000000 00000001 80c09c94 00000077 34633236
        20202020 00000000 807d7311 20202020 807056ec 1000ec00 00000000 00000000
        806fcb60 806fcb38 807a0000 00000001 00000000 fffffffe 00000000 807d0000
        ...
Call Trace:
[<80048ecc>] show_stack+0x2c/0xf8
[<80645c88>] dump_stack_lvl+0x34/0x4c
[<80641d00>] __warn+0xb4/0xe8
[<80641d84>] warn_slowpath_fmt+0x50/0x88
[<800b177c>] __timer_delete_sync+0x110/0x118
[<8040f4b0>] fza_interrupt+0x904/0x1004
[<80098d7c>] __handle_irq_event_percpu+0x84/0x188
[<80098f1c>] handle_irq_event+0x38/0xbc
[<8009d4e4>] handle_level_irq+0xc8/0x208
[<80098110>] generic_handle_irq+0x44/0x5c
[<8064f450>] do_IRQ+0x1c/0x28
[<80041cf0>] dec_irq_dispatch+0x10/0x20
[<80043754>] handle_int+0x14c/0x158
[<8008bf64>] do_idle+0x5c/0x15c
[<8008c368>] cpu_startup_entry+0x20/0x28
[<8064657c>] kernel_init+0x0/0x114

---[ end trace 0000000000000000 ]---
-- the arrival of this particular device state change interrupt means the timer set up just in case the device gets stuck can be deleted, so I'm not sure why calling `del_timer_sync' to discard the timer has become a no-no now; this code is 20+ years old now, though I sat on it for a while and then it took some time and effort to get it upstream too. The issue has started sometime between 5.18 (clean boot) and 6.4 (quoted above).
Maybe it'll ring someone's bell and they'll chime in or otherwise I'll bisect it... sometime. Or feel free to start yourself with 5.18, as it's not terribly old, only a bit and certainly not so as 2.6 is.
Maciej
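For readers following the traces: both warnings fire inside `__timer_delete_sync()` while an interrupt handler is on the stack. The sketch below is a userspace model only, written under the assumption that the check at that line has the form `WARN_ON(in_hardirq() && !(timer->flags & TIMER_IRQSAFE))`; the thread itself does not quote that source line. Every `model_*` / `MODEL_*` name is a hypothetical stand-in and not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's TIMER_IRQSAFE flag bit. */
#define MODEL_TIMER_IRQSAFE 0x1u

struct model_timer {
    unsigned int flags;
    bool pending;
};

/* Hypothetical stand-in for in_hardirq(): true while "handling" an interrupt. */
static bool model_in_hardirq;

/* Counts how many times the modelled warning fired. */
static int model_warnings;

/*
 * Model of the assumed check: a synchronous timer delete issued from
 * hard-IRQ context warns unless the timer is marked IRQ-safe, since
 * waiting there for a running callback could deadlock.
 * Returns 1 if the timer was pending, 0 otherwise.
 */
static int model_timer_delete_sync(struct model_timer *t)
{
    assert(t != NULL);
    if (model_in_hardirq && !(t->flags & MODEL_TIMER_IRQSAFE)) {
        model_warnings++;
        fprintf(stderr, "WARNING: sync timer delete in hard-IRQ context\n");
    }
    int was_pending = t->pending ? 1 : 0;
    t->pending = false;
    return was_pending;
}

/* Models a link-change handler invoked directly from the interrupt handler. */
static int model_link_change_from_irq(struct model_timer *t)
{
    model_in_hardirq = true;
    int ret = model_timer_delete_sync(t);
    model_in_hardirq = false;
    return ret;
}
```

In this model the timer is still deleted and only a warning is counted, which would be consistent with the observation above that the driver otherwise keeps working.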
Hi Maciej,
On 6/19/25 12:36, Maciej W. Rozycki wrote:
On Thu, 19 Jun 2025, Greg Chandler wrote:
So what I know for sure is this:
- The tulip driver on alpha (generic and DP264) oops/panic on physical disconnect, but only when an IP address is bound.
- It does not panic when no address is bound to the interface.
- It does not matter if the driver is compiled in, or if it is compiled as a module.
- It does not matter if all of the options are set for tulip or if none of them are:
    New bus configuration
    Use PCI shared mem for NIC registers
    Use RX polling (NAPI)
    Use Interrupt Mitigation
- The physical link does not auto-negotiate, and mii-tool does not seem to be able to force it with -F or -A like you would expect it to.
- The kernel does not drop the "Link is Up/Link is Down" messages when the PHY "links"
- The switch and interface both show LEDs as if linked at 10-Half-Duplex, and the lights turn off when the link is broken. Subsequently they do relink at 10-Half again if plugged back in.
- I did also attempt to test the kernel level stack for nfsroot, just to see if it worked prior to init launching everything else, and it did not. I used the same IP configuration for that test as all of the tests in these emails.
- All of the oops/panics seem to happen at: kernel/time/timer.c:1657 __timer_delete_sync+0x10c/0x150
FYI something's changed a while ago in how `del_timer_sync' is handled and I can see a similar warning nowadays with another network driver with the MIPS platform.
Since I'm the maintainer of said driver I mean to bisect it and figure out what's going on here, but haven't found time so far owing to other commitments (and the driver otherwise works just fine regardless, so it's a minor annoyance). If you beat me to it, then I'll gladly accept it, but otherwise I'm just letting you know you're not alone with this issue and that it's not specific to the DEC Tulip driver on your system.
For the record:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at kernel/time/timer.c:1563 __timer_delete_sync+0x110/0x118
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Tainted: G W 6.4.0-rc3-00030-gae62c49c0cef #21
Stack : 807a0000 80095a8c 00000000 00000004 806a0000 00000009 80c09dac 807d0000
        807a0000 807056ec 80769fac 807a13f3 807d30c4 1000ec00 80c09d58 80787a18
        00000000 00000000 807056ec 00000000 00000001 80c09c94 00000077 34633236
        20202020 00000000 807d7311 20202020 807056ec 1000ec00 00000000 00000000
        806fcb60 806fcb38 807a0000 00000001 00000000 fffffffe 00000000 807d0000
        ...
Call Trace:
[<80048ecc>] show_stack+0x2c/0xf8
[<80645c88>] dump_stack_lvl+0x34/0x4c
[<80641d00>] __warn+0xb4/0xe8
[<80641d84>] warn_slowpath_fmt+0x50/0x88
[<800b177c>] __timer_delete_sync+0x110/0x118
[<8040f4b0>] fza_interrupt+0x904/0x1004
[<80098d7c>] __handle_irq_event_percpu+0x84/0x188
[<80098f1c>] handle_irq_event+0x38/0xbc
[<8009d4e4>] handle_level_irq+0xc8/0x208
[<80098110>] generic_handle_irq+0x44/0x5c
[<8064f450>] do_IRQ+0x1c/0x28
[<80041cf0>] dec_irq_dispatch+0x10/0x20
[<80043754>] handle_int+0x14c/0x158
[<8008bf64>] do_idle+0x5c/0x15c
[<8008c368>] cpu_startup_entry+0x20/0x28
[<8064657c>] kernel_init+0x0/0x114

---[ end trace 0000000000000000 ]---
-- the arrival of this particular device state change interrupt means the timer set up just in case the device gets stuck can be deleted, so I'm not sure why calling `del_timer_sync' to discard the timer has become a no-no now; this code is 20+ years old now, though I sat on it for a while and then it took some time and effort to get it upstream too. The issue has started sometime between 5.18 (clean boot) and 6.4 (quoted above).
Maybe it'll ring someone's bell and they'll chime in or otherwise I'll bisect it... sometime. Or feel free to start yourself with 5.18, as it's not terribly old, only a bit and certainly not so as 2.6 is.
I am still not sure why I could not see that warning on my Cobalt Qube2 when trying to reproduce Greg's original issue; that is, with an IP assigned on the interface, yanking the cable did not trigger a timer warning. It could be that the machine is orders of magnitude slower and has a different CONFIG_HZ value that just made it less likely to be seen?
On Thu, 19 Jun 2025, Florian Fainelli wrote:
I am still not sure why I could not see that warning on my Cobalt Qube2 when trying to reproduce Greg's original issue; that is, with an IP assigned on the interface, yanking the cable did not trigger a timer warning. It could be that the machine is orders of magnitude slower and has a different CONFIG_HZ value that just made it less likely to be seen?
Can it have a different PHY attached? There's this code:
	if (tp->chip_id == PNIC2)
		tp->link_change = pnic2_lnk_change;
	else if (tp->flags & HAS_NWAY)
		tp->link_change = t21142_lnk_change;
	else if (tp->flags & HAS_PNICNWAY)
		tp->link_change = pnic_lnk_change;

in `tulip_init_one' and `pnic_lnk_change' won't ever trigger this, but the other two can; apparently the corresponding comment in `tulip_interrupt':

	/*
	 * NB: t21142_lnk_change() does a del_timer_sync(), so be careful if this
	 * call is ever done under the spinlock
	 */
hasn't been updated when `pnic2_lnk_change' was added. Also ISTM no link change handler is a valid option too, in which case `del_timer_sync' won't be called either. This is from a cursory glance only, so please take with a pinch of salt.
Maciej
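The selection quoted above can be modelled in plain C to see which configurations end up with a handler that may call `del_timer_sync'. This is only a sketch: the two flag values are those from the driver's `tbl_flag' enum, while the chip and handler enums and both function names are made up for illustration, and which handlers delete the timer synchronously is taken from the cursory reading above rather than verified.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as listed in the driver's tbl_flag enum. */
enum { HAS_NWAY = 0x00040, HAS_PNICNWAY = 0x00080 };

/* Hypothetical chip identifiers; only "is it a PNIC2" matters here. */
enum model_chip { MODEL_CHIP_OTHER, MODEL_CHIP_PNIC2 };

/* Hypothetical tags standing in for the function pointers. */
enum model_handler { LNK_NONE, LNK_PNIC2, LNK_T21142, LNK_PNIC };

/* Mirrors the if/else chain: chip check first, then the NWay flags. */
static enum model_handler pick_link_change(enum model_chip chip, unsigned int flags)
{
	if (chip == MODEL_CHIP_PNIC2)
		return LNK_PNIC2;
	if (flags & HAS_NWAY)
		return LNK_T21142;
	if (flags & HAS_PNICNWAY)
		return LNK_PNIC;
	return LNK_NONE;	/* no link change handler at all */
}

/* Per the reading above: only these two handlers delete the timer. */
static bool handler_may_sync_delete(enum model_handler h)
{
	return h == LNK_PNIC2 || h == LNK_T21142;
}
```

A board whose chip carries neither HAS_NWAY nor the PNIC2 id would, under this model, never reach the timer deletion from the interrupt handler.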
On 2025/06/19 14:53, Maciej W. Rozycki wrote:
Can it have a different PHY attached? There's this code:
	if (tp->chip_id == PNIC2)
		tp->link_change = pnic2_lnk_change;
	else if (tp->flags & HAS_NWAY)
		tp->link_change = t21142_lnk_change;
	else if (tp->flags & HAS_PNICNWAY)
		tp->link_change = pnic_lnk_change;

in `tulip_init_one' and `pnic_lnk_change' won't ever trigger this, but the other two can; apparently the corresponding comment in `tulip_interrupt':

	/*
	 * NB: t21142_lnk_change() does a del_timer_sync(), so be careful if this
	 * call is ever done under the spinlock
	 */
hasn't been updated when `pnic2_lnk_change' was added. Also ISTM no link change handler is a valid option too, in which case `del_timer_sync' won't be called either. This is from a cursory glance only, so please take with a pinch of salt.
Maciej
I'm not sure which of us that was directed at, but for my onboard tulips:
Micro Linear ML6698CH <- PHY Intel 21143-TD <- NIC
I know that the ML chips are most commonly used with 21143s and a very small smattering of others, I don't think they are all that common at least not since the late '90s.. I'm relatively certain all my DEC ISA/PCI nics use them though.
I found a link to the datasheet (If needed), but have had mixed luck with alldatasheets: https://www.alldatasheet.com/datasheet-pdf/pdf/75840/MICRO-LINEAR/ML6698CH.h...
Glancing over it I don't see anything about the link, I'll go stick my eyes in the driver a bit and see what stabs me in the eye....
On 2025/06/19 15:56, Greg Chandler wrote:
I'm not sure which of us that was directed at, but for my onboard tulips:
Micro Linear ML6698CH <- PHY Intel 21143-TD <- NIC
I know that the ML chips are most commonly used with 21143s and a very small smattering of others, I don't think they are all that common at least not since the late '90s.. I'm relatively certain all my DEC ISA/PCI nics use them though.
I found a link to the datasheet (If needed), but have had mixed luck with alldatasheets: https://www.alldatasheet.com/datasheet-pdf/pdf/75840/MICRO-LINEAR/ML6698CH.h...
Glancing over it I don't see anything about the link, I'll go stick my eyes in the driver a bit and see what stabs me in the eye....
That didn't take long.. The first thing to jab its thumb in my eye was this:

const struct tulip_chip_table tulip_tbl[] = {
  { }, /* placeholder for array, slot unused currently */
  { }, /* placeholder for array, slot unused currently */

  /* DC21140 */
  { "Digital DS21140 Tulip", 128, 0x0001ebef,
	HAS_MII | HAS_MEDIA_TABLE | CSR12_IN_SROM | HAS_PCI_MWI, tulip_timer,
	tulip_media_task },

  /* DC21142, DC21143 */
  { "Digital DS21142/43 Tulip", 128, 0x0801fbff,
	HAS_MII | HAS_MEDIA_TABLE | ALWAYS_CHECK_MII | HAS_ACPI | HAS_NWAY
	| HAS_INTR_MITIGATION | HAS_PCI_MWI, tulip_timer, t21142_media_task },
The alpha ev6 platform has, to my knowledge, never had ACPI, and this one surely doesn't. Checking my config, the ACPI variables aren't even listed, whereas on my other platforms they show up either enabled or commented out. It's possible that other alphas (ev67 or ev7s) may have it, but it's not likely. I know for sure that the ev4, ev45, ev5, and ev56 architectures did not, as the ACPI standard hadn't been ratified, or wasn't around long enough to make it into the production of the chipsets and boards.
I will see if I can find a link between not having ACPI and this issue, it's possible that the other instances you mentioned also have that same issue. Or that they do have ACPI and have it disabled for 10 reasons or another....
The second potential issue I see is that I don't know off-hand what PCI MWI is...
It's only found in the tulip driver and nowhere else in the kernel:
root@constellation:/tmp/tmp/linux-6.12.12/drivers/net/ethernet/dec/tulip# grep -R HAS_PCI_MWI ../../../../../
grep: ../../../../../drivers/net/ethernet/dec/tulip/tulip.ko: binary file matches
grep: ../../../../../drivers/net/ethernet/dec/tulip/eeprom.o: binary file matches
grep: ../../../../../drivers/net/ethernet/dec/tulip/interrupt.o: binary file matches
../../../../../drivers/net/ethernet/dec/tulip/tulip.h: HAS_PCI_MWI = 0x01000,
../../../../../drivers/net/ethernet/dec/tulip/tulip_core.c: HAS_MII | HAS_MEDIA_TABLE | CSR12_IN_SROM | HAS_PCI_MWI, tulip_timer,
../../../../../drivers/net/ethernet/dec/tulip/tulip_core.c: | HAS_INTR_MITIGATION | HAS_PCI_MWI, tulip_timer, t21142_media_task },
../../../../../drivers/net/ethernet/dec/tulip/tulip_core.c: HAS_MII | HAS_NWAY | HAS_8023X | HAS_PCI_MWI, pnic2_timer, },
../../../../../drivers/net/ethernet/dec/tulip/tulip_core.c: | HAS_NWAY | HAS_PCI_MWI, tulip_timer, tulip_media_task },
../../../../../drivers/net/ethernet/dec/tulip/tulip_core.c: if (!force_csr0 && (tp->flags & HAS_PCI_MWI))
grep: ../../../../../drivers/net/ethernet/dec/tulip/tulip.o: binary file matches
grep: ../../../../../drivers/net/ethernet/dec/tulip/tulip_core.o: binary file matches

It's defined in tulip.h as what looks like a table flag:

enum tbl_flag {
	HAS_MII			= 0x00001,
	HAS_MEDIA_TABLE		= 0x00002,
	CSR12_IN_SROM		= 0x00004,
	ALWAYS_CHECK_MII	= 0x00008,
	HAS_ACPI		= 0x00010,
	MC_HASH_ONLY		= 0x00020, /* Hash-only multicast filter. */
	HAS_PNICNWAY		= 0x00080,
	HAS_NWAY		= 0x00040, /* Uses internal NWay xcvr. */
	HAS_INTR_MITIGATION	= 0x00100,
	IS_ASIX			= 0x00200,
	HAS_8023X		= 0x00400,
	COMET_MAC_ADDR		= 0x00800,
	HAS_PCI_MWI		= 0x01000,
	HAS_PHY_IRQ		= 0x02000,
	HAS_SWAPPED_SEEPROM	= 0x04000,
	NEEDS_FAKE_MEDIA_TABLE	= 0x08000,
	COMET_PM		= 0x10000,
};
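To make the table entry easier to read, the flags of the "Digital DS21142/43 Tulip" row can be OR-ed together and tested bit by bit. The sketch below copies the enum values from tulip.h as quoted; the `ds21143_flags' constant and the `has_flag' helper are names invented for this example and do not exist in the driver.

```c
#include <assert.h>

/* Values copied from the tbl_flag enum in tulip.h as quoted. */
enum tbl_flag {
	HAS_MII			= 0x00001,
	HAS_MEDIA_TABLE		= 0x00002,
	CSR12_IN_SROM		= 0x00004,
	ALWAYS_CHECK_MII	= 0x00008,
	HAS_ACPI		= 0x00010,
	MC_HASH_ONLY		= 0x00020,
	HAS_PNICNWAY		= 0x00080,
	HAS_NWAY		= 0x00040,
	HAS_INTR_MITIGATION	= 0x00100,
	IS_ASIX			= 0x00200,
	HAS_8023X		= 0x00400,
	COMET_MAC_ADDR		= 0x00800,
	HAS_PCI_MWI		= 0x01000,
	HAS_PHY_IRQ		= 0x02000,
	HAS_SWAPPED_SEEPROM	= 0x04000,
	NEEDS_FAKE_MEDIA_TABLE	= 0x08000,
	COMET_PM		= 0x10000,
};

/* Flags of the DS21142/43 row of tulip_tbl[] as quoted (hypothetical name). */
static const unsigned int ds21143_flags =
	HAS_MII | HAS_MEDIA_TABLE | ALWAYS_CHECK_MII | HAS_ACPI | HAS_NWAY |
	HAS_INTR_MITIGATION | HAS_PCI_MWI;

/* Hypothetical helper: non-zero when flag f is set in flags. */
static int has_flag(unsigned int flags, enum tbl_flag f)
{
	return (flags & (unsigned int)f) != 0;
}
```

The combined value works out to 0x115b. HAS_NWAY is set and HAS_PNICNWAY is not, and masking out HAS_ACPI leaves HAS_NWAY untouched, so dropping that one flag would not change which link change handler the chip gets.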
On Thu, 19 Jun 2025, Greg Chandler wrote:
I'm not sure which of us that was directed at, but for my onboard tulips:
It was for Florian, as obviously your system does trigger the issue.
I found a link to the datasheet (If needed), but have had mixed luck with alldatasheets: https://www.alldatasheet.com/datasheet-pdf/pdf/75840/MICRO-LINEAR/ML6698CH.h...
There's no need to chase hw documentation as the issue isn't directly related to it.
As I noted in the earlier e-mail it seems a regression in the handling of `del_timer_sync', perhaps deliberate, introduced sometime between 5.18 and 6.4. I suggest that you try 5.18 (or 5.17 as it was 5.18.0-rc2 actually here that worked correctly) and see if it still triggers the problem and if it does not then bisect it (perhaps limiting the upper bound to 6.4 if it does trigger it for you, to save an iteration or a couple). Once you know the offender you'll likely know the solution. Or you can come back with results and ask for one if unsure.
HTH,
Maciej
On 2025/06/19 17:57, Maciej W. Rozycki wrote:
On Thu, 19 Jun 2025, Greg Chandler wrote:
I'm not sure which of us that was directed at, but for my onboard tulips:
It was for Florian, as obviously your system does trigger the issue.
I found a link to the datasheet (If needed), but have had mixed luck with alldatasheets: https://www.alldatasheet.com/datasheet-pdf/pdf/75840/MICRO-LINEAR/ML6698CH.h...
There's no need to chase hw documentation as the issue isn't directly related to it.
As I noted in the earlier e-mail it seems a regression in the handling of `del_timer_sync', perhaps deliberate, introduced sometime between 5.18 and 6.4. I suggest that you try 5.18 (or 5.17 as it was 5.18.0-rc2 actually here that worked correctly) and see if it still triggers the problem and if it does not then bisect it (perhaps limiting the upper bound to 6.4 if it does trigger it for you, to save an iteration or a couple). Once you know the offender you'll likely know the solution. Or you can come back with results and ask for one if unsure.
HTH,
Maciej
I haven't had keyboard time in quite a few days, but I've been looking over the code today. I removed HAS_ACPI from the 21142 setup, only to find later that it was only used in a single function to deal with sleep mode stuff. As I was reading over the driver, I've been taking a look at what could potentially drop in some of the debugging statements, and loaded the module with:
insmod ./tulip.ko tulip_debug=100
[16933.489376] tulip0: EEPROM default media type Autosense
[16933.489376] tulip0: Index #0 - Media 10baseT (#0) described by a 21142 Serial PHY (2) block
[16933.489376] tulip0: Index #1 - Media 10baseT-FDX (#4) described by a 21142 Serial PHY (2) block
[16933.489376] tulip0: Index #2 - Media 100baseTx (#3) described by a 21143 SYM PHY (4) block
[16933.489376] tulip0: Index #3 - Media 100baseTx-FDX (#5) described by a 21143 SYM PHY (4) block
[16933.498165] net eth0: Digital DS21142/43 Tulip rev 65 at MMIO 0xa120000, 08:00:2b:86:ab:b1, IRQ 29
[16933.498165] tulip 0000:00:09.0 eth0: Restarting 21143 autonegotiation, csr14=0003ffff
[16933.498165] tulip 0000:00:09.0: vgaarb: pci_notify
[16933.498165] tulip 0000:00:0b.0: vgaarb: pci_notify
[16933.498165] tulip 0000:00:0b.0: assign IRQ: got 30
[16933.498165] tulip 0000:00:0b.0 (unnamed net_device) (uninitialized): tulip_mwi_config()
[16933.498165] tulip 0000:00:0b.0 (unnamed net_device) (uninitialized): MWI config cacheline=16, csr0=01a09000
[16933.498165] tulip 0000:00:0b.0: enabling bus mastering
[16933.505001] tulip1: EEPROM default media type Autosense
[16933.505001] tulip1: Index #0 - Media 10baseT (#0) described by a 21142 Serial PHY (2) block
[16933.505001] tulip1: Index #1 - Media 10baseT-FDX (#4) described by a 21142 Serial PHY (2) block
[16933.505001] tulip1: Index #2 - Media 100baseTx (#3) described by a 21143 SYM PHY (4) block
[16933.505001] tulip1: Index #3 - Media 100baseTx-FDX (#5) described by a 21143 SYM PHY (4) block
[16933.513790] net eth1: Digital DS21142/43 Tulip rev 65 at MMIO 0xa121000, 08:00:2b:86:a8:5b, IRQ 30
[16933.513790] tulip 0000:00:0b.0 eth1: Restarting 21143 autonegotiation, csr14=0003ffff
[16933.513790] tulip 0000:00:0b.0: vgaarb: pci_notify
[16933.609494] tulip 0000:00:09.0 eth109: renamed from eth0
[16933.619259] tulip 0000:00:09.0 eth2: renamed from eth109
This popped up when I bound an IP address to the interface (but not before)
[17042.757875] tulip 0000:00:0b.0 eth1: tulip_up(), irq==30
[17042.757875] tulip 0000:00:0b.0 eth1: Restarting 21143 autonegotiation, csr14=0003ffff
[17042.757875] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17042.757875] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0660000
[17042.757875] tulip 0000:00:0b.0 eth1: Done tulip_up(), CSR0 f9a09000, CSR5 f0760000 CSR6 b2422202
[17042.757875] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17042.757875] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0660000
[17042.757875] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17042.757875] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0660000
[17042.758852] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17042.758852] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0660000
[17043.033266] tulip 0000:00:09.0 eth2: tulip_up(), irq==29
[17043.033266] tulip 0000:00:09.0 eth2: Restarting 21143 autonegotiation, csr14=0003ffff
[17043.033266] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17043.033266] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
[17043.034242] tulip 0000:00:09.0 eth2: Done tulip_up(), CSR0 f9a09000, CSR5 f0760000 CSR6 b2422202
[17043.034242] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17043.034242] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
[17043.034242] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17043.034242] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
[17043.035219] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17043.035219] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
[17043.330140] e1000: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[17044.690491] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0268010 new csr5=0xf0260000
[17044.690491] net eth2: 21143 link status interrupt cde1d2ce, CSR5 f0268010, fffbffff
[17044.690491] net eth2: Switching to 100baseTx-FDX based on link negotiation 01e0 & cde1 = 01e0
[17044.690491] tulip 0000:00:09.0 eth2: 21143 non-MII 100baseTx-FDX transceiver control 08af/00a0
[17044.690491] tulip 0000:00:09.0 eth2: Setting CSR15 to 08af0008/00a00008
[17044.690491] tulip 0000:00:09.0 eth2: Using media type 100baseTx-FDX, CSR12 is ce
[17044.690491] tulip 0000:00:09.0 eth2: Setting CSR6 83860200/b3862202 CSR12 cde1d2ce
[17044.690491] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17044.690491] tulip 0000:00:09.0 eth2: Transmit error, Tx status 7fffbc85
[17044.690491] tulip 0000:00:09.0 eth2: Transmit error, Tx status 7fffbc84
[17044.690491] tulip 0000:00:09.0 eth2: Transmit error, Tx status 7fffbc84
[17044.690491] tulip 0000:00:09.0 eth2: Transmit error, Tx status 7fffbc84
[17044.690491] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
[17044.691468] tulip 0000:00:09.0 eth2: interrupt csr5=0xf8668000 new csr5=0xf8668000
[17044.691468] net eth2: 21143 link status interrupt cde1d2cc, CSR5 f8668000, fffbffff
[17044.691468] net eth2: 21143 100baseTx-FDX link beat good
[17044.691468] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
[17044.691468] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0668010 new csr5=0xf0660000
[17044.691468] net eth2: 21143 link status interrupt 000002c8, CSR5 f0668010, fffbff7f
[17044.691468] net eth2: 21143 100baseTx-FDX link beat good
[17044.691468] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
[17045.493225] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17045.493225] tulip 0000:00:09.0 eth2: Transmit error, Tx status 7fffb000
[17045.493225] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
[17045.803772] net eth1: 21143 negotiation status 000021c6, 10baseT
[17045.803772] net eth1: 21143 negotiation failed, status 000021c6
[17045.803772] net eth1: Testing new 21143 media 100baseTx
[17045.803772] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0208100 new csr5=0xf0200000
[17045.803772] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0260000
[17045.803772] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status 7fffbc85
[17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status 7fffbc84
[17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status 7fffbc84
[17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status 7fffbc84
[17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status 7fffbc84
[17045.803772] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0660000
[17045.805725] tulip 0000:00:0b.0 eth1: tulip_stop_rxtx() failed (CSR5 0xf0660000 CSR6 0xb3862002)
[17046.053772] net eth2: 21143 negotiation status 000002c8, 100baseTx-FDX
[17046.053772] net eth2: Using NWay-set 100baseTx-FDX media, csr12 000002c8
I'm still working my way through the driver, but I figured I'd post the additional debug info in case anyone wanted it.
As I hit send on that last mail, I noticed a line that has not shown up before: [17044.690491] net eth2: Switching to 100baseTx-FDX based on link negotiation 01e0 & cde1 = 01e0
I looked down at the switch, and it was actually linked at 100MB/FDX; until now it has only linked at 10-Half.
The interface worked even with the errors above (I brought the intel adapter hard down and unplugged the cable to check).
The only thing I have changed is the ACPI disable, which should do literally nothing in this case, and loading the module with a debug flag. I am going to reboot the machine to clear out everything and see what exactly did this. I can't believe that turning on debugging fixed it, but I have seen much weirder stuff happen.
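The "01e0 & cde1 = 01e0" in that log line can be reproduced by hand. The sketch below assumes that 01e0 is the set of locally advertised abilities, that cde1 is the link partner's ability word taken from the upper half of CSR12 (the log shows CSR12 as cde1d2ce), and that the bits carry the standard MII ability meanings. The preference order (100 full, 100 half, 10 full, 10 half) is the usual autonegotiation priority and is an assumption about the driver, not a quote from it. Function and macro names are hypothetical; the returned media numbers are the "(#n)" indices from the EEPROM media listing in the debug output.

```c
#include <assert.h>
#include <stdint.h>

/* Standard MII ability bits. */
#define ABIL_10HALF	0x0020u
#define ABIL_10FULL	0x0040u
#define ABIL_100HALF	0x0080u
#define ABIL_100FULL	0x0100u

/* Common abilities: local advertisement AND partner word (CSR12 bits 31..16). */
static uint16_t negotiated_abilities(uint16_t advertised, uint32_t csr12)
{
	return (uint16_t)(advertised & (csr12 >> 16));
}

/* Pick the best common mode; returns the media number from the log
 * (5 = 100baseTx-FDX, 3 = 100baseTx, 4 = 10baseT-FDX, 0 = 10baseT),
 * or -1 when nothing is in common. */
static int pick_media(uint16_t negotiated)
{
	if (negotiated & ABIL_100FULL)
		return 5;
	if (negotiated & ABIL_100HALF)
		return 3;
	if (negotiated & ABIL_10FULL)
		return 4;
	if (negotiated & ABIL_10HALF)
		return 0;
	return -1;
}
```

With the values from the log, the common set is 0x01e0 and the best mode is media #5, 100baseTx-FDX, which matches both the "Switching to 100baseTx-FDX" message and what the switch showed.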
On 2025/06/24 16:18, Greg Chandler wrote:
On 2025/06/24 16:10, Greg Chandler wrote:
I haven't had keyboard time in quite a few days, but I've been looking over the code today. I removed HAS_ACPI from the 21142 setup, only to find later that it was only used in a single function to deal with sleep mode stuff. As I was reading over the driver, I've been taking a look at what could potentially drop in some of the debugging statements, and loaded the module with:
insmod ./tulip.ko tulip_debug=100
[16933.489376] tulip0: EEPROM default media type Autosense [16933.489376] tulip0: Index #0 - Media 10baseT (#0) described by a 21142 Serial PHY (2) block [16933.489376] tulip0: Index #1 - Media 10baseT-FDX (#4) described by a 21142 Serial PHY (2) block [16933.489376] tulip0: Index #2 - Media 100baseTx (#3) described by a 21143 SYM PHY (4) block [16933.489376] tulip0: Index #3 - Media 100baseTx-FDX (#5) described by a 21143 SYM PHY (4) block [16933.498165] net eth0: Digital DS21142/43 Tulip rev 65 at MMIO 0xa120000, 08:00:2b:86:ab:b1, IRQ 29 [16933.498165] tulip 0000:00:09.0 eth0: Restarting 21143 autonegotiation, csr14=0003ffff [16933.498165] tulip 0000:00:09.0: vgaarb: pci_notify [16933.498165] tulip 0000:00:0b.0: vgaarb: pci_notify [16933.498165] tulip 0000:00:0b.0: assign IRQ: got 30 [16933.498165] tulip 0000:00:0b.0 (unnamed net_device) (uninitialized): tulip_mwi_config() [16933.498165] tulip 0000:00:0b.0 (unnamed net_device) (uninitialized): MWI config cacheline=16, csr0=01a09000 [16933.498165] tulip 0000:00:0b.0: enabling bus mastering [16933.505001] tulip1: EEPROM default media type Autosense [16933.505001] tulip1: Index #0 - Media 10baseT (#0) described by a 21142 Serial PHY (2) block [16933.505001] tulip1: Index #1 - Media 10baseT-FDX (#4) described by a 21142 Serial PHY (2) block [16933.505001] tulip1: Index #2 - Media 100baseTx (#3) described by a 21143 SYM PHY (4) block [16933.505001] tulip1: Index #3 - Media 100baseTx-FDX (#5) described by a 21143 SYM PHY (4) block [16933.513790] net eth1: Digital DS21142/43 Tulip rev 65 at MMIO 0xa121000, 08:00:2b:86:a8:5b, IRQ 30 [16933.513790] tulip 0000:00:0b.0 eth1: Restarting 21143 autonegotiation, csr14=0003ffff [16933.513790] tulip 0000:00:0b.0: vgaarb: pci_notify [16933.609494] tulip 0000:00:09.0 eth109: renamed from eth0 [16933.619259] tulip 0000:00:09.0 eth2: renamed from eth109
This popped up when I bound an IP address to the interface (but not before):
[17042.757875] tulip 0000:00:0b.0 eth1: tulip_up(), irq==30
[17042.757875] tulip 0000:00:0b.0 eth1: Restarting 21143 autonegotiation, csr14=0003ffff
[17042.757875] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17042.757875] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0660000
[17042.757875] tulip 0000:00:0b.0 eth1: Done tulip_up(), CSR0 f9a09000, CSR5 f0760000 CSR6 b2422202
[17042.757875] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17042.757875] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0660000
[17042.757875] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17042.757875] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0660000
[17042.758852] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17042.758852] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0660000
[17043.033266] tulip 0000:00:09.0 eth2: tulip_up(), irq==29
[17043.033266] tulip 0000:00:09.0 eth2: Restarting 21143 autonegotiation, csr14=0003ffff
[17043.033266] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17043.033266] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
[17043.034242] tulip 0000:00:09.0 eth2: Done tulip_up(), CSR0 f9a09000, CSR5 f0760000 CSR6 b2422202
[17043.034242] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17043.034242] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
[17043.034242] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17043.034242] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
[17043.035219] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17043.035219] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
[17043.330140] e1000: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[17044.690491] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0268010 new csr5=0xf0260000
[17044.690491] net eth2: 21143 link status interrupt cde1d2ce, CSR5 f0268010, fffbffff
[17044.690491] net eth2: Switching to 100baseTx-FDX based on link negotiation 01e0 & cde1 = 01e0
[17044.690491] tulip 0000:00:09.0 eth2: 21143 non-MII 100baseTx-FDX transceiver control 08af/00a0
[17044.690491] tulip 0000:00:09.0 eth2: Setting CSR15 to 08af0008/00a00008
[17044.690491] tulip 0000:00:09.0 eth2: Using media type 100baseTx-FDX, CSR12 is ce
[17044.690491] tulip 0000:00:09.0 eth2: Setting CSR6 83860200/b3862202 CSR12 cde1d2ce
[17044.690491] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17044.690491] tulip 0000:00:09.0 eth2: Transmit error, Tx status 7fffbc85
[17044.690491] tulip 0000:00:09.0 eth2: Transmit error, Tx status 7fffbc84
[17044.690491] tulip 0000:00:09.0 eth2: Transmit error, Tx status 7fffbc84
[17044.690491] tulip 0000:00:09.0 eth2: Transmit error, Tx status 7fffbc84
[17044.690491] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
[17044.691468] tulip 0000:00:09.0 eth2: interrupt csr5=0xf8668000 new csr5=0xf8668000
[17044.691468] net eth2: 21143 link status interrupt cde1d2cc, CSR5 f8668000, fffbffff
[17044.691468] net eth2: 21143 100baseTx-FDX link beat good
[17044.691468] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
[17044.691468] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0668010 new csr5=0xf0660000
[17044.691468] net eth2: 21143 link status interrupt 000002c8, CSR5 f0668010, fffbff7f
[17044.691468] net eth2: 21143 100baseTx-FDX link beat good
[17044.691468] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
[17045.493225] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17045.493225] tulip 0000:00:09.0 eth2: Transmit error, Tx status 7fffb000
[17045.493225] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
[17045.803772] net eth1: 21143 negotiation status 000021c6, 10baseT
[17045.803772] net eth1: 21143 negotiation failed, status 000021c6
[17045.803772] net eth1: Testing new 21143 media 100baseTx
[17045.803772] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0208100 new csr5=0xf0200000
[17045.803772] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0260000
[17045.803772] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status 7fffbc85
[17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status 7fffbc84
[17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status 7fffbc84
[17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status 7fffbc84
[17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status 7fffbc84
[17045.803772] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0660000
[17045.805725] tulip 0000:00:0b.0 eth1: tulip_stop_rxtx() failed (CSR5 0xf0660000 CSR6 0xb3862002)
[17046.053772] net eth2: 21143 negotiation status 000002c8, 100baseTx-FDX
[17046.053772] net eth2: Using NWay-set 100baseTx-FDX media, csr12 000002c8
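For anyone reading the csr5 values in the log above, a rough decoder can save some hex arithmetic. This is only a sketch: the flag names and masks are what I remember of `enum status_bits` in the driver's tulip.h, and the receive/transmit process-state fields are taken as bits 19:17 and 22:20 per the 21143 documentation. Treat all of it as an assumption to check against your own tree. The function name `decode_csr5` is made up for illustration.

```python
# Hypothetical helper for reading tulip csr5 values from the debug log.
# Masks and names are assumptions based on enum status_bits in tulip.h;
# verify them against the source you are actually building.
CSR5_FLAGS = [
    (0x00001, "TxIntr"),
    (0x00002, "TxDied"),
    (0x00004, "TxNoBuf"),
    (0x00008, "TxJabber"),
    (0x00010, "TPLnkPass"),
    (0x00020, "TxFIFOUnderflow"),
    (0x00040, "RxIntr"),
    (0x00080, "RxNoBuf"),
    (0x00100, "RxDied"),
    (0x00200, "RxJabber"),
    (0x00800, "TimerInt"),
    (0x01000, "TPLnkFail"),
    (0x02000, "SystemError"),
    (0x08000, "AbnormalIntr"),
    (0x10000, "NormalIntr"),
]


def decode_csr5(value):
    """Split a csr5 value into named flag bits and the two process-state fields."""
    flags = [name for mask, name in CSR5_FLAGS if value & mask]
    return {
        "flags": flags,
        "rx_state": (value >> 17) & 0x7,  # receive process state, bits 19:17
        "tx_state": (value >> 20) & 0x7,  # transmit process state, bits 22:20
    }


if __name__ == "__main__":
    # Values copied from the log above.
    for raw in (0xF0670004, 0xF0268010, 0xF0208100):
        print(f"{raw:#010x}", decode_csr5(raw))
```

With these masks, the recurring 0xf0670004 reads as TxNoBuf plus NormalIntr, and the 0xf0268010 that precedes each "link status interrupt" line reads as TPLnkPass plus AbnormalIntr.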
I'm still working my way through the driver, but I figured I'd post the additional debug info in case anyone wanted it.
As I hit send on that last mail, I noticed a line that had not shown up before:
[17044.690491] net eth2: Switching to 100baseTx-FDX based on link negotiation 01e0 & cde1 = 01e0
I looked down at the switch, and it was actually linked at 100Mb/FDX. Until now it had only ever linked at 10-Half.
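The "01e0 & cde1 = 01e0" in that message is the advertised abilities ANDed with the link partner's abilities from the top half of CSR12. The sketch below models how I read that selection in t21142_lnk_change(). The ability bits are the standard autonegotiation ones (0x0020 10baseT, 0x0040 10baseT-FDX, 0x0080 100baseTx, 0x0100 100baseTx-FDX), and the media numbers are the ones printed in the EEPROM dump earlier. The 0x8000 "partner can negotiate" test and the priority order are my assumptions from the driver source, the driver's extra fallback for the no-common-ability case is left out, and `pick_media` is a made-up name.

```python
# Hypothetical model of the media selection done on a link-change interrupt.
# Media numbers match the "(#N)" indices printed in the EEPROM media table.
MEDIA_NAMES = {0: "10baseT", 3: "100baseTx", 4: "10baseT-FDX", 5: "100baseTx-FDX"}

# (ability bit, media number), highest priority first.
ABILITY_PRIORITY = ((0x0100, 5), (0x0080, 3), (0x0040, 4), (0x0020, 0))


def pick_media(sym_advertise, csr12):
    """Return (negotiated_bits, media_number or None) for a CSR12 value."""
    partner = csr12 >> 16               # link partner abilities, upper 16 bits
    negotiated = sym_advertise & partner
    if not (csr12 & 0x8000):            # assumption: partner cannot negotiate
        return negotiated, 0            # fall back to 10baseT half duplex
    for mask, media in ABILITY_PRIORITY:
        if negotiated & mask:
            return negotiated, media
    return negotiated, None             # no common ability found


if __name__ == "__main__":
    negotiated, media = pick_media(0x01E0, 0xCDE1D2CE)  # values from the log
    print(f"{negotiated:04x}", MEDIA_NAMES.get(media))
```

Fed the logged values, this picks media 5 (100baseTx-FDX), which matches both the log line and what the switch showed.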
The interface worked even with the errors above (I brought the Intel adapter hard down and unplugged the cable to check).
The only things I have changed are the ACPI disable, which should do literally nothing in this case, and loading the module with a debug flag. I am going to reboot the machine to clear everything out and see what exactly did this. I can't believe that turning on debugging fixed it, but I have seen much weirder stuff happen.
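To get a feel for what the "Transmit error, Tx status" values mean, here is a small sketch that maps a status word onto the error counters the driver appears to bump in tulip_interrupt(). The masks are from my reading of that function and should be treated as assumptions. The heartbeat check is omitted because it also depends on the duplex setting, and `classify_tx_status` is a made-up name.

```python
# Hypothetical classifier for tulip Tx descriptor status words.
# Masks are assumptions taken from a reading of tulip_interrupt().
TX_ERROR_SUMMARY = 0x8000
TX_COUNTERS = [
    (0x4104, "tx_aborted_errors"),
    (0x0C00, "tx_carrier_errors"),
    (0x0200, "tx_window_errors"),
    (0x0002, "tx_fifo_errors"),
]


def classify_tx_status(status):
    """Return the list of counters a given Tx status would increment."""
    if not status & TX_ERROR_SUMMARY:
        return []                       # no error summary bit, nothing to count
    return ["tx_errors"] + [name for mask, name in TX_COUNTERS if status & mask]


if __name__ == "__main__":
    # Values copied from the log above.
    for raw in (0x7FFFBC85, 0x7FFFBC84, 0x7FFFB000):
        print(f"{raw:08x}", classify_tx_status(raw))
```

Under these assumptions, 7fffbc85 and 7fffbc84 both count as carrier and aborted errors, which is roughly what you would expect for frames queued while the link was still being renegotiated.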
Another bit of info that might help as I am tracing through this. Debug levels 1-10 panic:
insmod ./tulip.ko tulip_debug=1
insmod ./tulip.ko tulip_debug=2
insmod ./tulip.ko tulip_debug=3
insmod ./tulip.ko tulip_debug=4
insmod ./tulip.ko tulip_debug=5
insmod ./tulip.ko tulip_debug=6
insmod ./tulip.ko tulip_debug=7
insmod ./tulip.ko tulip_debug=8
insmod ./tulip.ko tulip_debug=9
insmod ./tulip.ko tulip_debug=10
This does not, so hopefully that will narrow the search today:
insmod ./tulip.ko tulip_debug=100