Please see the bisection report below about a boot failure.
Reports aren't automatically sent to the public while we're trialing new bisection features on kernelci.org but this one looks valid.
It appears to be due to the fact that the network interface is failing to get brought up:
[ 114.385000] Waiting up to 10 more seconds for network.
[ 124.355000] Sending DHCP requests ...# ..# .# timed out!
[ 212.355000] IP-Config: Reopening network devices...
[ 212.365000] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
#
I guess the board would boot fine without network if it didn't have ip=dhcp in the command line, so it's not strictly a kernel boot failure but still an ethernet issue.
There wasn't any failure reported by kernelci on linux-4.9.y so maybe this patch was applied by mistake on linux-4.4.y but I haven't investigated enough to prove this.
Thanks, Guillaume
On 10/05/2020 18:27, kernelci.org bot wrote:
* This automated bisection report was sent to you on the basis   *
* that you may be involved with the breaking commit it has       *
* found. No manual investigation has been done to verify it,     *
* and the root cause of the problem may be somewhere else.       *
*                                                                *
* If you do send a fix, please include this trailer:             *
*   Reported-by: "kernelci.org bot" <bot@kernelci.org>           *
*                                                                *
* Hope this helps!                                               *
stable/linux-4.4.y bisection: baseline.login on at91-sama5d4_xplained
Summary:
  Start:      e157447efd85b Linux 4.4.223
  Plain log:  https://storage.kernelci.org/stable/linux-4.4.y/v4.4.223/arm/multi_v7_defcon...
  HTML log:   https://storage.kernelci.org/stable/linux-4.4.y/v4.4.223/arm/multi_v7_defcon...
  Result:     0d1951fa23ba0 net: phy: Avoid polling PHY with PHY_IGNORE_INTERRUPTS

Checks:
  revert:     PASS
  verify:     PASS

Parameters:
  Tree:       stable
  URL:        https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
  Branch:     linux-4.4.y
  Target:     at91-sama5d4_xplained
  CPU arch:   arm
  Lab:        lab-baylibre
  Compiler:   gcc-8
  Config:     multi_v7_defconfig
  Test case:  baseline.login
Breaking commit found:
commit 0d1951fa23ba0d35a4c5498ff28d1c5206d6fcdd
Author: Florian Fainelli <f.fainelli@gmail.com>
Date:   Mon Jan 18 19:33:06 2016 -0800

    net: phy: Avoid polling PHY with PHY_IGNORE_INTERRUPTS

    commit d5c3d84657db57bd23ecd58b97f1c99dd42a7b80 upstream.

    Commit 2c7b49212a86 ("phy: fix the use of PHY_IGNORE_INTERRUPT") changed
    a hunk in phy_state_machine() in the PHY_RUNNING case which was not
    needed. The change essentially makes the PHY library treat PHY devices
    with PHY_IGNORE_INTERRUPT to keep polling for the PHY device, even
    though the intent is not to do it.

    Fix this by reverting that specific hunk, which makes the PHY state
    machine wait for state changes, and stay in the PHY_RUNNING state for as
    long as needed.

    Fixes: 2c7b49212a86 ("phy: fix the use of PHY_IGNORE_INTERRUPT")
    Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 7d2cf015c5e76..b242bec834f4b 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -912,10 +912,10 @@ void phy_state_machine(struct work_struct *work)
 		phydev->adjust_link(phydev->attached_dev);
 		break;
 	case PHY_RUNNING:
-		/* Only register a CHANGE if we are polling or ignoring
-		 * interrupts and link changed since latest checking.
+		/* Only register a CHANGE if we are polling and link changed
+		 * since latest checking.
 		 */
-		if (!phy_interrupt_is_valid(phydev)) {
+		if (phydev->irq == PHY_POLL) {
 			old_link = phydev->link;
 			err = phy_read_status(phydev);
 			if (err)
@@ -1015,8 +1015,13 @@ void phy_state_machine(struct work_struct *work)
 	dev_dbg(&phydev->dev, "PHY state change %s -> %s\n",
 		phy_state_to_str(old_state), phy_state_to_str(phydev->state));
 
-	queue_delayed_work(system_power_efficient_wq, &phydev->state_queue,
-			   PHY_STATE_TIME * HZ);
+	/* Only re-schedule a PHY state machine change if we are polling the
+	 * PHY, if PHY_IGNORE_INTERRUPT is set, then we will be moving
+	 * between states from phy_mac_interrupt()
+	 */
+	if (phydev->irq == PHY_POLL)
+		queue_delayed_work(system_power_efficient_wq, &phydev->state_queue,
+				   PHY_STATE_TIME * HZ);
 }
 
 void phy_mac_interrupt(struct phy_device *phydev, int new_link)
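As a rough illustration of what the two hunks above change, here is a minimal userspace sketch, not kernel code. The names struct phy_model and the *_before/*_after helpers are hypothetical and introduced only for this example. The PHY_POLL and PHY_IGNORE_INTERRUPT values and the phy_interrupt_is_valid() logic are assumed to match include/linux/phy.h in 4.4.

```c
#include <stdbool.h>

/* Assumed to match include/linux/phy.h in this kernel series. */
#define PHY_POLL             (-1)
#define PHY_IGNORE_INTERRUPT (-2)

/* Hypothetical stand-in for struct phy_device; only irq matters here. */
struct phy_model {
	int irq; /* PHY_POLL, PHY_IGNORE_INTERRUPT, or a real IRQ number (>= 0) */
};

/* Modelled on the kernel helper of the same name. */
static inline bool phy_interrupt_is_valid(const struct phy_model *phydev)
{
	return phydev->irq != PHY_POLL && phydev->irq != PHY_IGNORE_INTERRUPT;
}

/* PHY_RUNNING case: does the state machine read the link status itself? */
static inline bool reads_link_before(const struct phy_model *phydev)
{
	return !phy_interrupt_is_valid(phydev);
}

static inline bool reads_link_after(const struct phy_model *phydev)
{
	return phydev->irq == PHY_POLL;
}

/* End of phy_state_machine(): is the delayed work queued again? */
static inline bool requeues_before(const struct phy_model *phydev)
{
	(void)phydev; /* before the patch the work was always re-queued */
	return true;
}

static inline bool requeues_after(const struct phy_model *phydev)
{
	return phydev->irq == PHY_POLL;
}
```

Under this model, a PHY with a real interrupt line (irq >= 0) was re-queued every PHY_STATE_TIME seconds before the patch and is not re-queued afterwards, so it depends entirely on its interrupt firing. Whether that is what happens on at91-sama5d4_xplained is not established by this sketch.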
Git bisection log:
git bisect start
# good: [b63f449e18b130fdc372b9717e72c19b83fc4876] Linux 4.4.222
git bisect good b63f449e18b130fdc372b9717e72c19b83fc4876
# bad: [e157447efd85bb2e6f8deaabbb62663bccd9bad2] Linux 4.4.223
git bisect bad e157447efd85bb2e6f8deaabbb62663bccd9bad2
# bad: [5733a9f4a3df384097c92c532aed34bc698a9acd] net: dsa: slave: fix of-node leak and phy priority
git bisect bad 5733a9f4a3df384097c92c532aed34bc698a9acd
# good: [1ce6993b857318a4b8c674b1bbaaf79aced34136] net/mlx5e: Fix blue flame quota logic
git bisect good 1ce6993b857318a4b8c674b1bbaaf79aced34136
# good: [c32532162f8ea4beed50a20cf4f9b205c75fe1b1] serial: samsung: Fix possible out of bounds access on non-DT platform
git bisect good c32532162f8ea4beed50a20cf4f9b205c75fe1b1
# good: [25e8aad6f491da6ae330148da09585371a3790f2] Revert "ACPI / LPSS: allow to use specific PM domain during ->probe()"
git bisect good 25e8aad6f491da6ae330148da09585371a3790f2
# good: [2f3e56e4b6020812350190f1cada230d790ce0e8] powerpc/tm: Fix stack pointer corruption in __tm_recheckpoint()
git bisect good 2f3e56e4b6020812350190f1cada230d790ce0e8
# bad: [0d1951fa23ba0d35a4c5498ff28d1c5206d6fcdd] net: phy: Avoid polling PHY with PHY_IGNORE_INTERRUPTS
git bisect bad 0d1951fa23ba0d35a4c5498ff28d1c5206d6fcdd
# good: [4ebef63e925e37f5de2f9da8fc86a545e4e0b945] sctp: fix the transports round robin issue when init is retransmitted
git bisect good 4ebef63e925e37f5de2f9da8fc86a545e4e0b945
# good: [c175435fdf50c81ca2b6576f090cba31c3489209] NFC: nci: memory leak in nci_core_conn_create()
git bisect good c175435fdf50c81ca2b6576f090cba31c3489209
# first bad commit: [0d1951fa23ba0d35a4c5498ff28d1c5206d6fcdd] net: phy: Avoid polling PHY with PHY_IGNORE_INTERRUPTS
On Tue, May 12, 2020 at 06:54:29AM +0100, Guillaume Tucker wrote:
There wasn't any failure reported by kernelci on linux-4.9.y so maybe this patch was applied by mistake on linux-4.4.y but I haven't investigated enough to prove this.
It wasn't applied "by mistake", as the commit log for this says it resolves an issue that was created in 2c7b49212a86 ("phy: fix the use of PHY_IGNORE_INTERRUPT") which was in 3.11.
I'll go revert this now, as regressions are not good. Perhaps some other change that happened between 4.5 and 4.9 in this area keeps the error you are seeing from happening.
thanks,
greg k-h
Hi,
On 12/05/2020 06:54:29+0100, Guillaume Tucker wrote:
I guess the board would boot fine without network if it didn't have ip=dhcp in the command line, so it's not strictly a kernel boot failure but still an ethernet issue.
I think the resolution of this issue is 99f81afc139c6edd14d77a91ee91685a414a1c66. If this is taken, then I think f5aba91d7f186cba84af966a741a0346de603cd4 should also be backported.
On 5/12/2020 4:10 AM, Alexandre Belloni wrote:
I think the resolution of this issue is 99f81afc139c6edd14d77a91ee91685a414a1c66. If this is taken, then I think f5aba91d7f186cba84af966a741a0346de603cd4 should also be backported.
Agreed.
On Tue, May 12, 2020 at 01:29:06PM -0700, Florian Fainelli wrote:
On 5/12/2020 4:10 AM, Alexandre Belloni wrote:
I think the resolution of this issue is 99f81afc139c6edd14d77a91ee91685a414a1c66. If this is taken, then I think f5aba91d7f186cba84af966a741a0346de603cd4 should also be backported.
Agreed.
Okay, I've queued both for 4.4, thanks!
f5aba91d7f1 had a little conflict with missing 2b2427d06426 ("phy: micrel: Add ethtool statistics counters") but I've worked around that.
On Tue, May 12, 2020 at 05:15:19PM -0400, Sasha Levin wrote:
On Tue, May 12, 2020 at 01:29:06PM -0700, Florian Fainelli wrote:
On 5/12/2020 4:10 AM, Alexandre Belloni wrote:
I think the resolution of this issue is 99f81afc139c6edd14d77a91ee91685a414a1c66. If this is taken, then I think f5aba91d7f186cba84af966a741a0346de603cd4 should also be backported.
Agreed.
Okay, I've queued both for 4.4, thanks!
I notice 99f81afc139c was reverted in mainline with commit b43bd72835a5. The revert commit points out that:
"It was papering over the real problem, which is fixed by commit f555f34fdc58 ("net: phy: fix auto-negotiation stall due to unavailable interrupt")"
Maybe f555f34fdc58 should be backported to 4.4 instead of 99f81afc139c?
Thanks, -- Henri
On Fri, May 15, 2020 at 11:14:00AM +0300, Henri Rosten wrote:
On Tue, May 12, 2020 at 05:15:19PM -0400, Sasha Levin wrote:
On Tue, May 12, 2020 at 01:29:06PM -0700, Florian Fainelli wrote:
On 5/12/2020 4:10 AM, Alexandre Belloni wrote:
I think the resolution of this issue is 99f81afc139c6edd14d77a91ee91685a414a1c66. If this is taken, then I think f5aba91d7f186cba84af966a741a0346de603cd4 should also be backported.
Agreed.
Okay, I've queued both for 4.4, thanks!
I notice 99f81afc139c was reverted in mainline with commit b43bd72835a5. The revert commit points out that:
"It was papering over the real problem, which is fixed by commit f555f34fdc58 ("net: phy: fix auto-negotiation stall due to unavailable interrupt")"
Maybe f555f34fdc58 should be backported to 4.4 instead of 99f81afc139c?
Note that if f555f34fdc58 is taken, then I believe 215d08a85b9a should also be backported.
-- Henri
linux-stable-mirror@lists.linaro.org