On Fri, 18 Jul 2025 09:07:26 +0000 John Ernberg wrote:
Thanks for the analysis, I think I may have misread the code. What I was saying is that we are restoring the carrier while we are still processing the previous carrier off event in the workqueue. My thinking was that if we deferred the netif_carrier_on() to the workqueue this race couldn't happen.
usbnet_bh() already checks netif_carrier_ok() - we're kinda duplicating the carrier state with this RX_PAUSED workaround.
I don't feel strongly about this, but deferring the carrier_on() the the workqueue would be a cleaner solution IMO.
I've been thinking about this idea, but I'm concerned for the opposite direction. I cannot think of a way to fully guarantee that the carrier isn't turned on again incorrectly if an off gets queued.
The most I came up with was adding an extra flag bit to set carrier on, and then test_and_clear_bit() it in the __handle_link_change() function. And also clear_bit() in the usbnet_link_change() function if an off arrives. I cannot convince myself that there isn't a way for that to go sideways. But perhaps that would be robust enough?
I think it should be robust enough.. Unless my grep skills are failing me - no drivers which call usbnet_link_change() twiddle the link state directly.
Give it a go, if you think your initial patch is cleaner -- it's fine.
I've also considered the possibility of just not re-submitting the INTR poll URB until the last one was fully processed when handling a link change. But that might cause havoc with ASIX and Sierra devices as they are calling usbnet_link_change() in other ways than through the .status-callback. I don't have any of these devices so I cannot test them for regressions. So this path feels quite dangerous. With a sub-driver property to enable this behavior it might work out?
Yeah, that seems more involved.