On 2025/06/24 16:10, Greg Chandler wrote:
On 2025/06/19 17:57, Maciej W. Rozycki wrote:
On Thu, 19 Jun 2025, Greg Chandler wrote:
I am still not sure why I could not see that warning on my Cobalt Qube2 when trying to reproduce Greg's original issue; that is, with an IP assigned on the interface, yanking the cable did not trigger a timer warning. It could be that the machine is orders of magnitude slower and has a different CONFIG_HZ value, which just made it less likely to be seen?
Can it have a different PHY attached? There's this code:
	if (tp->chip_id == PNIC2)
		tp->link_change = pnic2_lnk_change;
	else if (tp->flags & HAS_NWAY)
		tp->link_change = t21142_lnk_change;
	else if (tp->flags & HAS_PNICNWAY)
		tp->link_change = pnic_lnk_change;
I'm not sure which of us that was directed at, but for my onboard tulips:
It was for Florian, as obviously your system does trigger the issue.
I found a link to the datasheet (if needed), but I have had mixed luck with alldatasheets: https://www.alldatasheet.com/datasheet-pdf/pdf/75840/MICRO-LINEAR/ML6698CH.h...
There's no need to chase hw documentation as the issue isn't directly related to it.
As I noted in the earlier e-mail it seems a regression in the handling of `del_timer_sync', perhaps deliberate, introduced sometime between 5.18 and 6.4. I suggest that you try 5.18 (or 5.17 as it was 5.18.0-rc2 actually here that worked correctly) and see if it still triggers the problem and if it does not then bisect it (perhaps limiting the upper bound to 6.4 if it does trigger it for you, to save an iteration or a couple). Once you know the offender you'll likely know the solution. Or you can come back with results and ask for one if unsure.
HTH,
Maciej
I haven't had keyboard time in quite a few days, but I've been looking over the code today. I removed HAS_ACPI from the 21142 setup, only to find later that it is only used in a single function to deal with sleep mode. While reading over the driver I have also been looking at what could trigger some of the debugging statements, and loaded the module with:
insmod ./tulip.ko tulip_debug=100
[16933.489376] tulip0: EEPROM default media type Autosense
[16933.489376] tulip0: Index #0 - Media 10baseT (#0) described by a 21142 Serial PHY (2) block
[16933.489376] tulip0: Index #1 - Media 10baseT-FDX (#4) described by a 21142 Serial PHY (2) block
[16933.489376] tulip0: Index #2 - Media 100baseTx (#3) described by a 21143 SYM PHY (4) block
[16933.489376] tulip0: Index #3 - Media 100baseTx-FDX (#5) described by a 21143 SYM PHY (4) block
[16933.498165] net eth0: Digital DS21142/43 Tulip rev 65 at MMIO 0xa120000, 08:00:2b:86:ab:b1, IRQ 29
[16933.498165] tulip 0000:00:09.0 eth0: Restarting 21143 autonegotiation, csr14=0003ffff
[16933.498165] tulip 0000:00:09.0: vgaarb: pci_notify
[16933.498165] tulip 0000:00:0b.0: vgaarb: pci_notify
[16933.498165] tulip 0000:00:0b.0: assign IRQ: got 30
[16933.498165] tulip 0000:00:0b.0 (unnamed net_device) (uninitialized): tulip_mwi_config()
[16933.498165] tulip 0000:00:0b.0 (unnamed net_device) (uninitialized): MWI config cacheline=16, csr0=01a09000
[16933.498165] tulip 0000:00:0b.0: enabling bus mastering
[16933.505001] tulip1: EEPROM default media type Autosense
[16933.505001] tulip1: Index #0 - Media 10baseT (#0) described by a 21142 Serial PHY (2) block
[16933.505001] tulip1: Index #1 - Media 10baseT-FDX (#4) described by a 21142 Serial PHY (2) block
[16933.505001] tulip1: Index #2 - Media 100baseTx (#3) described by a 21143 SYM PHY (4) block
[16933.505001] tulip1: Index #3 - Media 100baseTx-FDX (#5) described by a 21143 SYM PHY (4) block
[16933.513790] net eth1: Digital DS21142/43 Tulip rev 65 at MMIO 0xa121000, 08:00:2b:86:a8:5b, IRQ 30
[16933.513790] tulip 0000:00:0b.0 eth1: Restarting 21143 autonegotiation, csr14=0003ffff
[16933.513790] tulip 0000:00:0b.0: vgaarb: pci_notify
[16933.609494] tulip 0000:00:09.0 eth109: renamed from eth0
[16933.619259] tulip 0000:00:09.0 eth2: renamed from eth109
This popped up when I bound an IP address to the interface (but not before):
[17042.757875] tulip 0000:00:0b.0 eth1: tulip_up(), irq==30
[17042.757875] tulip 0000:00:0b.0 eth1: Restarting 21143 autonegotiation, csr14=0003ffff
[17042.757875] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17042.757875] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0660000
[17042.757875] tulip 0000:00:0b.0 eth1: Done tulip_up(), CSR0 f9a09000, CSR5 f0760000 CSR6 b2422202
[17042.757875] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17042.757875] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0660000
[17042.757875] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17042.757875] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0660000
[17042.758852] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17042.758852] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0660000
[17043.033266] tulip 0000:00:09.0 eth2: tulip_up(), irq==29
[17043.033266] tulip 0000:00:09.0 eth2: Restarting 21143 autonegotiation, csr14=0003ffff
[17043.033266] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17043.033266] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
[17043.034242] tulip 0000:00:09.0 eth2: Done tulip_up(), CSR0 f9a09000, CSR5 f0760000 CSR6 b2422202
[17043.034242] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17043.034242] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
[17043.034242] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17043.034242] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
[17043.035219] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17043.035219] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
[17043.330140] e1000: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[17044.690491] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0268010 new csr5=0xf0260000
[17044.690491] net eth2: 21143 link status interrupt cde1d2ce, CSR5 f0268010, fffbffff
[17044.690491] net eth2: Switching to 100baseTx-FDX based on link negotiation 01e0 & cde1 = 01e0
[17044.690491] tulip 0000:00:09.0 eth2: 21143 non-MII 100baseTx-FDX transceiver control 08af/00a0
[17044.690491] tulip 0000:00:09.0 eth2: Setting CSR15 to 08af0008/00a00008
[17044.690491] tulip 0000:00:09.0 eth2: Using media type 100baseTx-FDX, CSR12 is ce
[17044.690491] tulip 0000:00:09.0 eth2: Setting CSR6 83860200/b3862202 CSR12 cde1d2ce
[17044.690491] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17044.690491] tulip 0000:00:09.0 eth2: Transmit error, Tx status 7fffbc85
[17044.690491] tulip 0000:00:09.0 eth2: Transmit error, Tx status 7fffbc84
[17044.690491] tulip 0000:00:09.0 eth2: Transmit error, Tx status 7fffbc84
[17044.690491] tulip 0000:00:09.0 eth2: Transmit error, Tx status 7fffbc84
[17044.690491] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
[17044.691468] tulip 0000:00:09.0 eth2: interrupt csr5=0xf8668000 new csr5=0xf8668000
[17044.691468] net eth2: 21143 link status interrupt cde1d2cc, CSR5 f8668000, fffbffff
[17044.691468] net eth2: 21143 100baseTx-FDX link beat good
[17044.691468] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
[17044.691468] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0668010 new csr5=0xf0660000
[17044.691468] net eth2: 21143 link status interrupt 000002c8, CSR5 f0668010, fffbff7f
[17044.691468] net eth2: 21143 100baseTx-FDX link beat good
[17044.691468] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
[17045.493225] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17045.493225] tulip 0000:00:09.0 eth2: Transmit error, Tx status 7fffb000
[17045.493225] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
[17045.803772] net eth1: 21143 negotiation status 000021c6, 10baseT
[17045.803772] net eth1: 21143 negotiation failed, status 000021c6
[17045.803772] net eth1: Testing new 21143 media 100baseTx
[17045.803772] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0208100 new csr5=0xf0200000
[17045.803772] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0260000
[17045.803772] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0670004 new csr5=0xf0660000
[17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status 7fffbc85
[17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status 7fffbc84
[17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status 7fffbc84
[17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status 7fffbc84
[17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status 7fffbc84
[17045.803772] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0660000
[17045.805725] tulip 0000:00:0b.0 eth1: tulip_stop_rxtx() failed (CSR5 0xf0660000 CSR6 0xb3862002)
[17046.053772] net eth2: 21143 negotiation status 000002c8, 100baseTx-FDX
[17046.053772] net eth2: Using NWay-set 100baseTx-FDX media, csr12 000002c8
I'm still working my way through the driver, but I figured I'd post the additional debug info in case anyone wanted it.
As I hit send on that last mail, I noticed a line that has not shown up before:
[17044.690491] net eth2: Switching to 100baseTx-FDX based on link negotiation 01e0 & cde1 = 01e0
I looked down at the switch, and it was actually linked at 100 Mb/s full duplex. Until now it has only ever linked at 10 Mb/s half duplex.
The interface worked even with the errors above (I brought the Intel adapter hard down and unplugged its cable to check).
The only things I have changed are the ACPI disable, which should do literally nothing in this case, and loading the module with a debug flag. I am going to reboot the machine to clear out everything and see what exactly did this. I can't believe that turning on debugging fixed it, but I have seen much weirder stuff happen.