On Wed, Aug 24, 2022 at 01:22:12PM -0700, KernelCI bot wrote:
The KernelCI bisection bot identified 7eea9a60703ca ("usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling") as causing a boot regression on Panda in v5.18. The board stops detecting a link which breks NFS boot:
<6>[ 4.953613] smsc95xx 2-1.1:1.0 eth0: register 'smsc95xx' at usb-4a064c00.ehci-1.1, smsc95xx USB 2.0 Ethernet, 02:03:01:8c:13:b0 <6>[ 5.032928] smsc95xx 2-1.1:1.0 eth0: hardware isn't capable of remote wakeup <6>[ 5.044036] smsc95xx 2-1.1:1.0 eth0: Link is Down <6>[ 25.053924] Waiting up to 100 more seconds for network. <6>[ 45.074005] Waiting up to 80 more seconds for network. <6>[ 65.093933] Waiting up to 60 more seconds for network. <6>[ 85.123992] Waiting up to 40 more seconds for network. <6>[ 105.143951] Waiting up to 20 more seconds for network. <5>[ 125.084014] Sending DHCP requests ...... timed out! <6>[ 207.765808] smsc95xx 2-1.1:1.0 eth0: hardware isn't capable of remote wakeup <3>[ 207.773468] IP-Config: Reopening network devices... <6>[ 207.840332] smsc95xx 2-1.1:1.0 eth0: hardware isn't capable of remote wakeup <6>[ 207.851226] smsc95xx 2-1.1:1.0 eth0: Link is Down <6>[ 227.873931] Waiting up to 100 more seconds for network. <6>[ 247.873931] Waiting up to 80 more seconds for network.
We're seen other failures on this board in mainline which stop boot sooner so can't confirm if there's a similar issue there.
I've left the whole report, including links to more details like full logs and a tag for the bot below:
- This automated bisection report was sent to you on the basis *
- that you may be involved with the breaking commit it has *
- found. No manual investigation has been done to verify it, *
- and the root cause of the problem may be somewhere else. *
*
- If you do send a fix, please include this trailer: *
- Reported-by: "kernelci.org bot" bot@kernelci.org *
*
- Hope this helps! *
stable-rc/linux-5.18.y bisection: baseline.login on panda
Summary: Start: 22a992953741a Linux 5.18.19 Plain log: https://storage.kernelci.org/stable-rc/linux-5.18.y/v5.18.19/arm/multi_v7_de... HTML log: https://storage.kernelci.org/stable-rc/linux-5.18.y/v5.18.19/arm/multi_v7_de... Result: 7eea9a60703ca usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
Checks: revert: PASS verify: PASS
Parameters: Tree: stable-rc URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git Branch: linux-5.18.y Target: panda CPU arch: arm Lab: lab-baylibre Compiler: gcc-10 Config: multi_v7_defconfig Test case: baseline.login
Breaking commit found:
commit 7eea9a60703caf1b04eccba93cd103f1c8fc6809 Author: Lukas Wunner lukas@wunner.de Date: Thu May 12 10:42:05 2022 +0200
usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
[ Upstream commit 1ce8b37241ed291af56f7a49bbdbf20c08728e88 ] Link status of SMSC LAN95xx chips is polled once per second, even though they're capable of signaling PHY interrupts through the MAC layer. Forward those interrupts to the PHY driver to avoid polling. Benefits are reduced bus traffic, reduced CPU overhead and quicker interface bringup. Polling was introduced in 2016 by commit d69d16949346 ("usbnet: smsc95xx: fix link detection for disabled autonegotiation"). Back then, the LAN95xx driver neglected to enable the ENERGYON interrupt, hence couldn't detect link-up events when auto-negotiation was disabled. The proper solution would have been to enable the ENERGYON interrupt instead of polling. Since then, PHY handling was moved from the LAN95xx driver to the SMSC PHY driver with commit 05b35e7eb9a1 ("smsc95xx: add phylib support"). That PHY driver is capable of link detection with auto-negotiation disabled because it enables the ENERGYON interrupt. Note that signaling interrupts through the MAC layer not only works with the integrated PHY, but also with an external PHY, provided its interrupt pin is attached to LAN95xx's nPHY_INT pin. In the unlikely event that the interrupt pin of an external PHY is attached to a GPIO of the SoC (or not connected at all), the driver can be amended to retrieve the irq from the PHY's of_node. To forward PHY interrupts to phylib, it is not sufficient to call phy_mac_interrupt(). Instead, the PHY's interrupt handler needs to run so that PHY interrupts are cleared. That's because according to page 119 of the LAN950x datasheet, "The source of this interrupt is a level. The interrupt persists until it is cleared in the PHY." https://www.microchip.com/content/dam/mchp/documents/UNG/ProductDocuments/Da... Therefore, create an IRQ domain with a single IRQ for the PHY. In the future, the IRQ domain may be extended to support the 11 GPIOs on the LAN95xx. Normally the PHY interrupt should be masked until the PHY driver has cleared it. However masking requires a (sleeping) USB transaction and interrupts are received in (non-sleepable) softirq context. I decided not to mask the interrupt at all (by using the dummy_irq_chip's noop ->irq_mask() callback): The USB interrupt endpoint is polled in 1 msec intervals and normally that's sufficient to wake the PHY driver's IRQ thread and have it clear the interrupt. If it does take longer, worst thing that can happen is the IRQ thread is woken again. No big deal. Because PHY interrupts are now perpetually enabled, there's no need to selectively enable them on suspend. So remove all invocations of smsc95xx_enable_phy_wakeup_interrupts(). In smsc95xx_resume(), move the call of phy_init_hw() before usbnet_resume() (which restarts the status URB) to ensure that the PHY is fully initialized when an interrupt is handled. Tested-by: Oleksij Rempel o.rempel@pengutronix.de # LAN9514/9512/9500 Tested-by: Ferry Toth fntoth@gmail.com # LAN9514 Signed-off-by: Lukas Wunner lukas@wunner.de Reviewed-by: Andrew Lunn andrew@lunn.ch # from a PHY perspective Cc: Andre Edich andre.edich@microchip.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org
diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c index f5a208948d22a..358b170cc8fb7 100644 --- a/drivers/net/usb/smsc95xx.c +++ b/drivers/net/usb/smsc95xx.c @@ -18,6 +18,8 @@ #include <linux/usb/usbnet.h> #include <linux/slab.h> #include <linux/of_net.h> +#include <linux/irq.h> +#include <linux/irqdomain.h> #include <linux/mdio.h> #include <linux/phy.h> #include <net/selftests.h> @@ -53,6 +55,9 @@ #define SUSPEND_ALLMODES (SUSPEND_SUSPEND0 | SUSPEND_SUSPEND1 | \ SUSPEND_SUSPEND2 | SUSPEND_SUSPEND3) +#define SMSC95XX_NR_IRQS (1) /* raise to 12 for GPIOs */ +#define PHY_HWIRQ (SMSC95XX_NR_IRQS - 1)
struct smsc95xx_priv { u32 mac_cr; u32 hash_hi; @@ -61,6 +66,9 @@ struct smsc95xx_priv { spinlock_t mac_cr_lock; u8 features; u8 suspend_flags;
- struct irq_chip irqchip;
- struct irq_domain *irqdomain;
- struct fwnode_handle *irqfwnode; struct mii_bus *mdiobus; struct phy_device *phydev;
}; @@ -597,6 +605,8 @@ static void smsc95xx_mac_update_fullduplex(struct usbnet *dev) static void smsc95xx_status(struct usbnet *dev, struct urb *urb) {
- struct smsc95xx_priv *pdata = dev->driver_priv;
- unsigned long flags; u32 intdata;
if (urb->actual_length != 4) { @@ -608,11 +618,15 @@ static void smsc95xx_status(struct usbnet *dev, struct urb *urb) intdata = get_unaligned_le32(urb->transfer_buffer); netif_dbg(dev, link, dev->net, "intdata: 0x%08X\n", intdata);
- local_irq_save(flags);
- if (intdata & INT_ENP_PHY_INT_)
;
else netdev_warn(dev->net, "unexpected interrupt, intdata=0x%08X\n", intdata);generic_handle_domain_irq(pdata->irqdomain, PHY_HWIRQ);
- local_irq_restore(flags);
} /* Enable or disable Tx & Rx checksum offload engines */ @@ -1098,8 +1112,9 @@ static int smsc95xx_bind(struct usbnet *dev, struct usb_interface *intf) { struct smsc95xx_priv *pdata; bool is_internal_phy;
- char usb_path[64];
- int ret, phy_irq; u32 val;
- int ret;
printk(KERN_INFO SMSC_CHIPNAME " v" SMSC_DRIVER_VERSION "\n"); @@ -1139,10 +1154,38 @@ static int smsc95xx_bind(struct usbnet *dev, struct usb_interface *intf) if (ret) goto free_pdata;
- /* create irq domain for use by PHY driver and GPIO consumers */
- usb_make_path(dev->udev, usb_path, sizeof(usb_path));
- pdata->irqfwnode = irq_domain_alloc_named_fwnode(usb_path);
- if (!pdata->irqfwnode) {
ret = -ENOMEM;
goto free_pdata;
- }
- pdata->irqdomain = irq_domain_create_linear(pdata->irqfwnode,
SMSC95XX_NR_IRQS,
&irq_domain_simple_ops,
pdata);
- if (!pdata->irqdomain) {
ret = -ENOMEM;
goto free_irqfwnode;
- }
- phy_irq = irq_create_mapping(pdata->irqdomain, PHY_HWIRQ);
- if (!phy_irq) {
ret = -ENOENT;
goto remove_irqdomain;
- }
- pdata->irqchip = dummy_irq_chip;
- pdata->irqchip.name = SMSC_CHIPNAME;
- irq_set_chip_and_handler_name(phy_irq, &pdata->irqchip,
handle_simple_irq, "phy");
- pdata->mdiobus = mdiobus_alloc(); if (!pdata->mdiobus) { ret = -ENOMEM;
goto free_pdata;
}goto dispose_irq;
ret = smsc95xx_read_reg(dev, HW_CFG, &val); @@ -1175,6 +1218,7 @@ static int smsc95xx_bind(struct usbnet *dev, struct usb_interface *intf) goto unregister_mdio; }
- pdata->phydev->irq = phy_irq; pdata->phydev->is_internal = is_internal_phy;
/* detect device revision as different features may be available */ @@ -1217,6 +1261,15 @@ static int smsc95xx_bind(struct usbnet *dev, struct usb_interface *intf) free_mdio: mdiobus_free(pdata->mdiobus); +dispose_irq:
- irq_dispose_mapping(phy_irq);
+remove_irqdomain:
- irq_domain_remove(pdata->irqdomain);
+free_irqfwnode:
- irq_domain_free_fwnode(pdata->irqfwnode);
free_pdata: kfree(pdata); return ret; @@ -1229,6 +1282,9 @@ static void smsc95xx_unbind(struct usbnet *dev, struct usb_interface *intf) phy_disconnect(dev->net->phydev); mdiobus_unregister(pdata->mdiobus); mdiobus_free(pdata->mdiobus);
- irq_dispose_mapping(irq_find_mapping(pdata->irqdomain, PHY_HWIRQ));
- irq_domain_remove(pdata->irqdomain);
- irq_domain_free_fwnode(pdata->irqfwnode); netif_dbg(dev, ifdown, dev->net, "free pdata\n"); kfree(pdata);
} @@ -1253,29 +1309,6 @@ static u32 smsc_crc(const u8 *buffer, size_t len, int filter) return crc << ((filter % 2) * 16); } -static int smsc95xx_enable_phy_wakeup_interrupts(struct usbnet *dev, u16 mask) -{
- int ret;
- netdev_dbg(dev->net, "enabling PHY wakeup interrupts\n");
- /* read to clear */
- ret = smsc95xx_mdio_read_nopm(dev, PHY_INT_SRC);
- if (ret < 0)
return ret;
- /* enable interrupt source */
- ret = smsc95xx_mdio_read_nopm(dev, PHY_INT_MASK);
- if (ret < 0)
return ret;
- ret |= mask;
- smsc95xx_mdio_write_nopm(dev, PHY_INT_MASK, ret);
- return 0;
-}
static int smsc95xx_link_ok_nopm(struct usbnet *dev) { int ret; @@ -1442,7 +1475,6 @@ static int smsc95xx_enter_suspend3(struct usbnet *dev) static int smsc95xx_autosuspend(struct usbnet *dev, u32 link_up) { struct smsc95xx_priv *pdata = dev->driver_priv;
- int ret;
if (!netif_running(dev->net)) { /* interface is ifconfig down so fully power down hw */ @@ -1461,27 +1493,10 @@ static int smsc95xx_autosuspend(struct usbnet *dev, u32 link_up) } netdev_dbg(dev->net, "autosuspend entering SUSPEND1\n");
/* enable PHY wakeup events for if cable is attached */
ret = smsc95xx_enable_phy_wakeup_interrupts(dev,
PHY_INT_MASK_ANEG_COMP_);
if (ret < 0) {
netdev_warn(dev->net, "error enabling PHY wakeup ints\n");
return ret;
}
- netdev_info(dev->net, "entering SUSPEND1 mode\n"); return smsc95xx_enter_suspend1(dev); }
- /* enable PHY wakeup events so we remote wakeup if cable is pulled */
- ret = smsc95xx_enable_phy_wakeup_interrupts(dev,
PHY_INT_MASK_LINK_DOWN_);
- if (ret < 0) {
netdev_warn(dev->net, "error enabling PHY wakeup ints\n");
return ret;
- }
- netdev_dbg(dev->net, "autosuspend entering SUSPEND3\n"); return smsc95xx_enter_suspend3(dev);
} @@ -1547,13 +1562,6 @@ static int smsc95xx_suspend(struct usb_interface *intf, pm_message_t message) } if (pdata->wolopts & WAKE_PHY) {
ret = smsc95xx_enable_phy_wakeup_interrupts(dev,
(PHY_INT_MASK_ANEG_COMP_ | PHY_INT_MASK_LINK_DOWN_));
if (ret < 0) {
netdev_warn(dev->net, "error enabling PHY wakeup ints\n");
goto done;
}
- /* if link is down then configure EDPD and enter SUSPEND1,
*/
- otherwise enter SUSPEND0 below
@@ -1787,11 +1795,12 @@ static int smsc95xx_resume(struct usb_interface *intf) return ret; }
- phy_init_hw(pdata->phydev);
- ret = usbnet_resume(intf); if (ret < 0) netdev_warn(dev->net, "usbnet_resume error\n");
- phy_init_hw(pdata->phydev); return ret;
}
Git bisection log:
git bisect start # good: [9aa5a042881d4a99657f82c774e9e15353ebeb2d] Linux 5.18.14 git bisect good 9aa5a042881d4a99657f82c774e9e15353ebeb2d # bad: [22a992953741ad79c07890d3f4104585e52ef26b] Linux 5.18.19 git bisect bad 22a992953741ad79c07890d3f4104585e52ef26b # good: [77d5174ed962c259965226386f71575790646ec0] drm/mediatek: dpi: Remove output format of YUV git bisect good 77d5174ed962c259965226386f71575790646ec0 # good: [598bc9541e2f82a7fe8dfe23891201b355a56cda] usb: dwc3: core: Do not perform GCTL_CORE_SOFTRESET during bootup git bisect good 598bc9541e2f82a7fe8dfe23891201b355a56cda # good: [876f57cc94922896cc71dd4696013a7c0558c9b4] f2fs: give priority to select unpinned section for foreground GC git bisect good 876f57cc94922896cc71dd4696013a7c0558c9b4 # bad: [5f4e505909fe50a4e256704076594ee3def0b9b1] block: add a bdev_max_zone_append_sectors helper git bisect bad 5f4e505909fe50a4e256704076594ee3def0b9b1 # good: [b9c3401f7cac6ae291a16784dadcd1bf116218fe] x86/kprobes: Update kcb status flag after singlestepping git bisect good b9c3401f7cac6ae291a16784dadcd1bf116218fe # bad: [73ce2046e04ad488cecc66757c36cbe1bdf089d4] iommu/vt-d: avoid invalid memory access via node_online(NUMA_NO_NODE) git bisect bad 73ce2046e04ad488cecc66757c36cbe1bdf089d4 # good: [f624c94ad56b663693a9413d8c8c03635435f369] drm/vc4: drv: Adopt the dma configuration from the HVS or V3D component git bisect good f624c94ad56b663693a9413d8c8c03635435f369 # bad: [87c4896d5dd7fd9927c814cf3c6289f41de3b562] firmware: arm_scpi: Ensure scpi_info is not assigned if the probe fails git bisect bad 87c4896d5dd7fd9927c814cf3c6289f41de3b562 # good: [9b61971cab92a918cab7168d439a351b1c799aca] usbnet: smsc95xx: Avoid link settings race on interrupt reception git bisect good 9b61971cab92a918cab7168d439a351b1c799aca # bad: [e81395cfbe62518f41af79a1b287fc8fe7c96b37] usbnet: smsc95xx: Fix deadlock on runtime resume git bisect bad e81395cfbe62518f41af79a1b287fc8fe7c96b37 # bad: [7eea9a60703caf1b04eccba93cd103f1c8fc6809] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling git bisect bad 7eea9a60703caf1b04eccba93cd103f1c8fc6809
# first bad commit: [7eea9a60703caf1b04eccba93cd103f1c8fc6809] usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling
-=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#30878): https://groups.io/g/kernelci-results/message/30878 Mute This Topic: https://groups.io/mt/93235418/1131744 Group Owner: kernelci-results+owner@groups.io Unsubscribe: https://groups.io/g/kernelci-results/unsub [broonie@kernel.org] -=-=-=-=-=-=-=-=-=-=-=-
On Wed, Aug 24, 2022 at 11:47:03PM +0100, Mark Brown wrote:
On Wed, Aug 24, 2022 at 01:22:12PM -0700, KernelCI bot wrote: The KernelCI bisection bot identified 7eea9a60703ca ("usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling") as causing a boot regression on Panda in v5.18. The board stops detecting a link which breks NFS boot:
<6>[ 4.953613] smsc95xx 2-1.1:1.0 eth0: register 'smsc95xx' at usb-4a064c00.ehci-1.1, smsc95xx USB 2.0 Ethernet, 02:03:01:8c:13:b0 <6>[ 5.032928] smsc95xx 2-1.1:1.0 eth0: hardware isn't capable of remote wakeup <6>[ 5.044036] smsc95xx 2-1.1:1.0 eth0: Link is Down <6>[ 25.053924] Waiting up to 100 more seconds for network. <6>[ 45.074005] Waiting up to 80 more seconds for network. <6>[ 65.093933] Waiting up to 60 more seconds for network. <6>[ 85.123992] Waiting up to 40 more seconds for network. <6>[ 105.143951] Waiting up to 20 more seconds for network. <5>[ 125.084014] Sending DHCP requests ...... timed out!
Looks like v5.19 boots fine (with the offending commit), so only v5.18-stable and v5.15-stable are affected:
https://storage.kernelci.org/stable-rc/linux-5.19.y/v5.19.3-366-g32f80a5b58e...
The offending commit was a new feature introduced in v5.19, it deliberately wasn't tagged for stable. Aggressively backporting such new features always carries a risk of regressions.
Off the cuff I don't see that a prerequisite commit is missing in stable kernels. So just reverting the offending commit in v5.18-stable and v5.15-stable (but not in v5.19-stable) might be the simplest solution.
Thanks,
Lukas
linux-stable-mirror@lists.linaro.org